CN112633340A - Target detection model training method and device, detection method and device, and storage medium - Google Patents
Target detection model training method and device, detection method and device, and storage medium
- Publication number
- CN112633340A (application CN202011475085.0A)
- Authority
- CN
- China
- Prior art keywords
- target
- detection model
- filter
- filters
- target detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The application relates to a target detection model training method, a target detection method, a device, and a storage medium, wherein the target detection model training method comprises the following steps: acquiring a training image, and labeling a sample target in the training image; inputting the training image into a target detection model to obtain a predicted target of the training image, wherein the target detection model comprises a backbone network, the backbone network comprises a plurality of convolutional layers, each convolutional layer comprises a plurality of filter groups, each filter group comprises a predetermined number of filters obtained by rotating and/or flipping one filter, and weights are shared among the filters of the same filter group; and training the target detection model with the objectives of minimizing the difference between the predicted target and the sample target and minimizing the cosine similarity between the filters of each filter group. Because the filters of the same group share the same parameters, the number of irrelevant filters is reduced and the number of parameters of the target detection model is effectively reduced, while the effectiveness of feature extraction and the accuracy of target detection are preserved.
Description
Technical Field
The application belongs to the technical field of target detection, and particularly relates to a target detection model training method and device, a detection method and device based on the target detection model, and a storage medium.
Background
Object detection in images is one of the four classical tasks in computer vision; unlike object recognition, it requires detecting multiple objects present in the same picture. Because of the complexity of the algorithms, a neural network model must contain a large number of trainable parameters to achieve a good detection effect, which makes the model inefficient; existing methods for reducing the number of parameters cause the detection accuracy of the neural network model to drop.
Therefore, how to reduce the number of parameters and the volume of a neural network model while ensuring its detection accuracy is an urgent problem to be solved.
Disclosure of Invention
The application provides a target detection model training method, a detection method, a device, and a storage medium, which aim to solve the technical problem of the large number of parameters in neural network models.
In order to solve the technical problem, the application adopts a technical scheme that: a target detection model training method, the method comprising: acquiring a training image, and processing the training image to label a sample target in the training image; inputting the training image into the target detection model to obtain a predicted target of the training image, wherein the target detection model comprises a backbone network, the backbone network comprises a plurality of convolutional layers, each convolutional layer comprises a plurality of filter groups, each filter group comprises a predetermined number of filters obtained by rotating and/or flipping one filter, and weights are shared among the filters of the same filter group; and training the target detection model with the objectives of minimizing the difference between the predicted target and the sample target and minimizing the cosine similarity between the filters of each filter group.
According to an embodiment of the present application, training the target detection model with the objectives of minimizing the difference between the predicted target and the sample target and minimizing the cosine similarity between the filters of each filter bank includes: training the target detection model by using a back propagation gradient algorithm to minimize a preset loss function; the preset loss function comprises the sum of a target box loss function, a classification loss function, a confidence loss function, and a filter bank loss function, the filter bank loss function being r = α′·n·Σ_{i≠j} (k_i·k_j) / tr(KK^T), where α′ is a constant, k_i is the ith filter in the filter bank, k_j is the jth filter in the filter bank, n is the predetermined number, K is a filter bank matrix, and tr(KK^T) is the trace of K multiplied by its transpose.
According to an embodiment of the present application, sharing weights among the filters of the same filter group includes: in the back propagation gradient algorithm, sharing the weights and the weight corrections among the filters of the same filter group.
According to an embodiment of the present application, the target detection model further includes a feature enhancement network and a detection head module sequentially connected to the backbone network.
According to an embodiment of the application, each filter bank comprises eight filters obtained from one filter by no rotation, rotation by 90°, 180°, and 270°, and the symmetric transformations of these four.
In order to solve the above technical problem, the present application adopts another technical solution: a target detection model-based detection method, the method comprising: acquiring a target image; inputting the target image into the target detection model to obtain a detection result of the target image; the target detection model comprises a backbone network, the backbone network comprises a plurality of convolutional layers, each convolutional layer comprises a plurality of filter groups, each filter group comprises a predetermined number of filters obtained by rotating and/or overturning one filter, and weights are shared among the filters of the same filter group.
According to an embodiment of the present application, the detection result includes a target box value of an initial target, an initial classification result of the initial target, and an initial confidence of the initial target, and the method includes: obtaining the classification index with the maximum probability in the initial classification result, and obtaining a final classification result by looking it up in an index table; acquiring the target box value of the initial target, and obtaining an initial target box by using a target box conversion method; and re-scoring the initial confidence of the initial target box to screen out a final target detection result.
According to an embodiment of the present application, the target detection model is obtained by training according to any one of the above training methods.
In order to solve the above technical problem, the present application adopts another technical solution: an electronic device comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement any of the above methods.
In order to solve the above technical problem, the present application adopts another technical solution: a computer readable storage medium having stored thereon program data which, when executed by a processor, implements any of the methods described above.
The beneficial effects of this application are: different from the prior art, each filter group of the backbone network of the target detection model comprises a predetermined number of filters obtained by rotating and/or flipping one filter, and the transformed filters of the same group share the same parameters. Similar features can therefore be extracted from multiple angles through the rotated and symmetrically transformed weights, the number of irrelevant filters is reduced, and the number of parameters of the target detection model is effectively reduced, while the effectiveness of feature extraction and the accuracy of target detection are ensured.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without inventive effort:
FIG. 1 is a schematic flow chart diagram illustrating one embodiment of object detection model training according to the present application;
FIG. 2 is a schematic diagram of the fourth-order dihedral group used in training the target detection model according to an embodiment of the present application;
FIG. 3 is a schematic flow chart diagram illustrating an embodiment of a target detection model-based detection method according to the present application;
FIG. 4 is a block diagram of an embodiment of an object detection model training apparatus according to the present application;
FIG. 5 is a block diagram of an embodiment of an object detection model-based detection apparatus according to the present application;
FIG. 6 is a block diagram of an embodiment of an electronic device of the present application;
FIG. 7 is a block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1 and 2, fig. 1 is a schematic flowchart illustrating an embodiment of a target detection model training method according to the present application; FIG. 2 is a schematic diagram of the fourth-order dihedral group used in training the target detection model according to an embodiment of the present application.
An embodiment of the present application provides a target detection model training method, including the following steps:
S101: acquiring a training image, and processing the training image to label a sample target in the training image.
A training image is acquired, and the sample targets in the training image are labeled to obtain a data set. Specifically, the training image may be detected and labeled by an existing target detection model, such as a standard YOLOv4 target detection model, to obtain the data set of the training image. When the target detection model is trained with the training images, the data set is divided according to a cross-validation method, so that as much useful information as possible is obtained from the limited data and a more stable target detection model is obtained. It should be noted that the training images form an image sequence containing a sufficient number of images to train the target detection model effectively.
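For illustration only (the disclosure names cross-validation without fixing the number of folds or a library; scikit-learn and the file names below are assumptions), the data set split might be sketched as:

```python
from sklearn.model_selection import KFold

# Hypothetical file list standing in for the labeled training images.
image_paths = [f"train_{i:04d}.jpg" for i in range(1000)]

# Divide the data set by cross-validation so that every image is used for
# validation exactly once across the five folds.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(kfold.split(image_paths)):
    train_set = [image_paths[i] for i in train_idx]
    val_set = [image_paths[i] for i in val_idx]
    # one training run of the detector per fold would start here
```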
S102: inputting the training image into the target detection model to obtain a predicted target of the training image.
An initial target detection model is constructed. The target detection model comprises a backbone network, the backbone network comprises a plurality of convolutional layers, and each convolutional layer comprises a plurality of filter banks. Unlike conventional convolutional neural network models, each filter bank of the backbone network constructed in the present application includes a predetermined number of filters obtained by rotating and/or flipping one filter, and weights are shared among the filters of the same filter bank. The shared weights include the weight corrections.
Through statistics, the inventors of the present application found that, during back propagation training of a convolutional neural network, filters in the same convolutional layer converge to similar weights: the weights of different filters are often symmetric to each other, or can be obtained from one another by a rotation or symmetric transformation.
Because mutually independent filters tend toward such symmetric similarity after training, each filter bank of the backbone network constructed by the method comprises a predetermined number of filters obtained by rotating and/or flipping one filter, and the transformed filters of the same group share the same parameters. Similar features can therefore be extracted from multiple angles through the rotated and symmetrically transformed weights, the number of irrelevant filters is reduced, and the number of parameters of the target detection model is effectively reduced, while the effectiveness of feature extraction and the accuracy of target detection are ensured.
When constructing a convolutional layer of the backbone network, the first randomly generated filter in each filter group is the unit filter, the other filters in the group, obtained by rotation and/or flipping, are the generator filters, and the predetermined number of filters in each group is at least two. For example, as shown in FIG. 2, according to the properties of the fourth-order dihedral group, each filter bank includes eight filters obtained from one filter by no rotation, rotation by 90°, 180°, and 270°, and the symmetric transformations of these four. Through these transformations, the filter can extract similar features in 8 different directions. The eight transformed symmetric convolution filters share the same parameters. In back propagation training, the weight corrections obtained by the eight convolution filters of each group are superimposed, and the basic parameters are corrected together.
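As an illustrative sketch (PyTorch is an assumption; the disclosure does not prescribe a framework), the eight fourth-order dihedral group variants of one unit filter can be generated as follows:

```python
import torch

def d4_transforms(base: torch.Tensor) -> list:
    """Return the eight D4 (fourth-order dihedral group) variants of a 2-D
    filter: rotations by 0/90/180/270 degrees of the filter and of its mirror."""
    mirrored = torch.flip(base, dims=[-1])            # symmetric (flip) transform
    return [torch.rot90(f, k, dims=[-2, -1])          # the four rotations
            for f in (base, mirrored) for k in range(4)]

unit_filter = torch.randn(3, 3)                       # randomly generated unit filter
bank = d4_transforms(unit_filter)                     # one bank of eight filters
assert len(bank) == 8
```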
In one embodiment, the target detection model in the present application can be constructed and initialized according to the structure of the standard YOLOv4 target detection network, including sequentially connected backbone network, feature enhancement network, and detection head modules. The backbone network uses a modified CSPDarknet53 network in which the convolutional layers are constructed with the method of the present application. The backbone of the original CSPDarknet53 comprises 5 convolution modules and 52 convolutional layers in total, with the convolutional layers of each module arranged in units of 2; the numbers of filters in the convolutional layers are 32, 64, 128, 256, 512, and 1024, and the connections between convolution modules likewise comprise 32, 64, 128, 256, 512, and 1024 convolution filters. In the convolutional layers of the present application, the number of filters in each convolutional layer is initialized to 1/8 of the original, and these filters are used to construct groups of 8 filters each according to the fourth-order dihedral group generators. After construction, the total number of filters is the same as in the backbone of the original CSPDarknet53 network, but the filters of each group share weights, share weight corrections during back propagation, and extract the features of the group's unit filter in different directions under the different generator filter transformations. (Supplementary note: the standard CSPDarknet53 is a backbone structure built on the YOLOv3 backbone network Darknet53 by drawing on the 2019 CSPNet design, and contains 5 CSP modules (cross-stage partial connection modules). The standard YOLOv4 network improves accuracy by nearly 10 points relative to YOLOv3 with almost no loss of speed, so YOLOv4 is a faster and more accurate detection model whose training can be completed on a single 1080Ti or 2080Ti.)
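Building on the sketch above, a minimal illustration of such a convolutional layer (hypothetical class and parameter names, PyTorch assumed): it stores only 1/8 of the filters as trainable parameters and expands them into their eight variants at each forward pass, so the eight filters of a group share weights by construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class D4SharedConv2d(nn.Module):
    """Convolution whose out_ch filters are the D4 expansions of out_ch // 8
    trainable base filters, so each group of eight shares one set of weights."""
    def __init__(self, in_ch: int, out_ch: int, kernel: int = 3):
        super().__init__()
        assert out_ch % 8 == 0
        self.base = nn.Parameter(
            0.01 * torch.randn(out_ch // 8, in_ch, kernel, kernel))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mirrored = torch.flip(self.base, dims=[-1])
        variants = [torch.rot90(w, k, dims=[-2, -1])
                    for w in (self.base, mirrored) for k in range(4)]
        weight = torch.cat(variants, dim=0)            # (out_ch, in_ch, k, k)
        return F.conv2d(x, weight, padding=weight.shape[-1] // 2)

layer = D4SharedConv2d(in_ch=3, out_ch=32)             # only 4 trainable base filters
out = layer(torch.randn(1, 3, 64, 64))                 # out: (1, 32, 64, 64)
```

Because the eight variants are differentiable functions of one shared parameter, back propagation automatically superimposes their weight corrections onto the base filter, matching the behavior described above.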
After the feature detection of the backbone network, a standard feature enhancement network is used. (The standard feature enhancement architecture, based on the feature pyramid framework, strengthens the propagation of features between layers, adding a bottom-up enhancement path to reinforce low-dimensional features in the detection and high-dimensional feature extraction tasks. Using cross connections, the feature output extracted from each convolutional layer is added to the same-stage feature map of the top-down path, which is then sent to the next stage.)
After the feature enhancement network, the standard YOLOv3 detection heads are connected in sequence through convolutions. (The YOLOv3 network consists of the feature extraction network Darknet53 and the YOLOv3 detection heads; the detection heads detect the confidence, category, and position of targets on 3 feature maps of different scales, so that finer-grained features can be detected, which benefits the detection of small targets.)
Leaky ReLU is used as the activation function of the target detection model.
The training image is input into the target detection model to obtain the predicted target of the training image.
S103: training the target detection model with the objectives of minimizing the difference between the predicted target and the sample target and minimizing the cosine similarity between the filters of each filter bank.
In existing approaches, the loss function comprises the sum of a target box loss function, a classification loss function, and a confidence loss function.
The preset loss function comprises the sum of a target box loss function, a classification loss function, a confidence loss function, and a filter bank loss function: Loss = Loss_cls + Loss_conf + Loss_box + λ·r, where Loss_cls is the classification loss function, Loss_conf is the confidence loss function, Loss_box is the target box loss function, r is the filter bank loss function, and λ is a coefficient. The target detection model is optimally trained by minimizing the preset loss function Loss.
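As a trivial sketch of this composition (λ's value is not specified in the disclosure; 0.1 below is an assumed example):

```python
def preset_loss(loss_cls, loss_conf, loss_box, r, lam=0.1):
    """Loss = Loss_cls + Loss_conf + Loss_box + λ·r; lam = 0.1 stands in
    for the unspecified coefficient λ."""
    return loss_cls + loss_conf + loss_box + lam * r
```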
According to the method, the cosine similarity between the filters of each filter bank is calculated, and this cosine similarity is minimized to suppress the generation of similar filters after rotation or symmetric transformation. For a filter bank matrix K, the constraint term r is calculated as follows:

r = α · Σ_{i≠j} (k_i·k_j) / (‖k_i‖·‖k_j‖)

where α is a constant, k_i is the ith filter in the filter bank, k_j is the jth filter in the filter bank, and ‖·‖ denotes the Frobenius norm. All filters in the filter bank matrix K have equal Frobenius norms because they are rotated or flipped copies of the same filter. Thus, assuming that the filter bank contains a predetermined number n of filters, so that tr(KK^T) = n·‖k_i‖², the above equation can be converted to:

r = α′ · n · Σ_{i≠j} (k_i·k_j) / tr(KK^T)

where α′ is a constant, k_i is the ith filter in the filter bank, k_j is the jth filter in the filter bank, n is the predetermined number, K is the filter bank matrix, and tr(KK^T) is the trace of K multiplied by its transpose.
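A sketch of this constraint term (PyTorch assumed; the function name is illustrative), with the n filters of one bank flattened into the rows of K:

```python
import torch

def filter_bank_regularizer(K: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """r = alpha * n * sum_{i != j} (k_i . k_j) / tr(K K^T).
    K: (n, d) matrix whose n rows are the flattened filters of one bank;
    equal Frobenius norms let the cosine denominators reduce to tr(K K^T)/n."""
    gram = K @ K.t()                                   # pairwise dot products
    n = K.shape[0]
    off_diag = gram.sum() - torch.trace(gram)          # drop the i == j terms
    return alpha * n * off_diag / torch.trace(gram)

# Degenerate example: eight identical filters give the maximal penalty.
bank = torch.randn(1, 9).repeat(8, 1)
print(filter_bank_regularizer(bank))                   # ≈ tensor(56.)
```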
Some filters possess rotation invariance: the series of filters generated from them by rotation and symmetric transformation produce similar feature extraction results for the same input, which increases computational complexity and reduces network efficiency. The method therefore adds an anti-symmetry constraint when constructing the preset loss function. By minimizing the cosine similarity among the filters of each filter bank, the occurrence of rotation-invariant filters is effectively suppressed, the occurrence of redundant parameters is further suppressed, and the feature extraction efficiency of the algorithm is increased.
Further, the target detection model is trained using a back propagation gradient algorithm so that the preset loss function is minimized. The training images are processed in batches of size batchsize, the learning rate learnrate is initialized, the number of training periods epoch is initialized, and the target detection model is trained with a gradient descent training method.
As shown in step S102, the process of obtaining the filter banks when constructing the convolutional layers of the backbone network of the present application can be expressed as:

k_si = k_i,  K_di = F(k_i),  i = 1, 2, ..., N

where k_i is the first filter randomly generated in each filter group, i.e. the unit filter, also denoted k_si; F(·) denotes the rotation and symmetric transformations described above; K_di is the filter matrix obtained after transforming k_i with the generator filters; and k_di denotes the elements of K_di other than the unit filter. For each filter k_i, the filters k_di together with k_si form a filter bank with shared weights and corrections, and these filters are used on the same convolutional layer at the same time.
Due to weight multiplexing, the total gradient in the back propagation gradient calculation is obtained as the sum of two parts: the gradient with respect to the unit filter k_si itself, and the gradients with respect to the transformed filters k_di mapped back through the corresponding inverse transformations:

∂Loss/∂k_i = ∂Loss/∂k_si + Σ_d F_d⁻¹(∂Loss/∂k_di)
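This superposition can be checked directly when the transformed filters are built as differentiable views of the unit filter; the following PyTorch sketch (an illustration, not part of the disclosure) shows all eight corrections accumulating on the shared parameter:

```python
import torch

base = torch.randn(3, 3, requires_grad=True)           # unit filter k_i
mirrored = torch.flip(base, dims=[1])
variants = [torch.rot90(w, k, dims=[0, 1])             # the eight D4 variants
            for w in (base, mirrored) for k in range(4)]

loss = sum(v.sum() for v in variants)                  # stand-in for a real loss
loss.backward()
print(base.grad)   # every entry is 8.0: the eight corrections superimpose
```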
and repeatedly and iteratively updating the parameters of the target detection model until the training cycle number reaches epoch and then stopping training.
In back propagation gradient training, the weight corrections obtained by the predetermined number of filters of each group are superimposed, and the basic parameters of the target detection model are corrected together. This effectively reduces overfitting during model training and accelerates the training of the model parameters. It also reduces the influence on the detection result of the non-uniform distribution of features across directions and of the mismatch in feature distribution between the training set and the test set.
Each filter group of the backbone network of the target detection model comprises a predetermined number of filters obtained by rotating and/or flipping one filter, and the transformed filters of the same group share the same parameters, so similar features can be extracted from multiple angles through the rotated and symmetrically transformed weights; the number of irrelevant filters is reduced, the number of parameters of the target detection model is effectively reduced, and the effectiveness of feature extraction and the accuracy of target detection are ensured.
Referring to fig. 3, fig. 3 is a schematic flowchart illustrating an embodiment of a detection method based on a target detection model according to the present application.
Another embodiment of the present application provides a detection method based on a target detection model, including the following steps:
S201: acquiring a target image.
A target image is acquired. The target image can be a digital image, or can be obtained by preprocessing a video image: an analog or digital video stream is converted into digital images, the standard RGB image is normalized so that pixel values lie between [-1, 1], and the processed video image frame is sent into the target detection model.
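A sketch of the preprocessing step, assuming 8-bit RGB frames (the exact scaling constants are an assumption consistent with mapping [0, 255] to [-1, 1]):

```python
import numpy as np

def normalize_frame(frame_rgb: np.ndarray) -> np.ndarray:
    """Scale 8-bit RGB pixel values from [0, 255] to [-1, 1] before the
    frame is sent into the target detection model."""
    return frame_rgb.astype(np.float32) / 127.5 - 1.0
```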
S202: inputting the target image into the target detection model to obtain a detection result of the target image.
The target image is input into the target detection model to obtain the detection result of the target image. The target detection model comprises a backbone network, the backbone network comprises a plurality of convolutional layers, each convolutional layer comprises a plurality of filter groups, each filter group comprises a predetermined number of filters obtained by rotating and/or flipping one filter, and weights are shared among the filters of the same filter group. The shared weights include the weight corrections.
Because mutually independent filters tend toward symmetric similarity after training, each filter bank of the backbone network constructed by the method comprises a predetermined number of filters obtained by rotating and/or flipping one filter, and the transformed filters of the same group share the same parameters, so similar features can be extracted from multiple angles through the rotated and symmetrically transformed weights; the number of irrelevant filters is reduced, the number of parameters of the target detection model is effectively reduced, and the effectiveness of feature extraction and the accuracy of target detection are ensured.
When constructing the convolutional layers of the backbone network, the first randomly generated filter in each filter group is the unit filter, the other filters in the group, obtained by rotation and/or symmetric transformation, are the generator filters, and the predetermined number of filters in each group is at least two. For example, according to the properties of the fourth-order dihedral group, each filter bank includes eight filters obtained from one filter by no rotation, rotation by 90°, 180°, and 270°, and the symmetric transformations of these four. Through these transformations, the filter can extract similar features in 8 different directions. The eight transformed symmetric convolution filters share the same parameters. In back propagation training, the weight corrections obtained by the eight convolution filters of each group are superimposed, and the basic parameters are corrected together.
The target detection model of the present application can be obtained by training through the target detection model training method in any of the embodiments described above.
S203: screening the detection result to obtain a final target detection result.
The detection result comprises the target box of the initial target, the initial classification result of the initial target, and the initial confidence of the initial target.
Screening the detection result, wherein the step of obtaining the final target detection result comprises the following steps:
and obtaining the classification index with the maximum probability in the initial classification result, and obtaining the final classification result of the initial target by contrasting the index table.
The target box value of the initial target is acquired, and the initial target box is obtained by using a target box conversion method. Specifically, the regression values of the initial target are taken, and the result is converted using the standard YOLOv4 target box conversion to output the initial target box.
The initial confidence of each initial target box is re-scored, and standard Matrix NMS screening is used to select the high-confidence initial target boxes as the target boxes of the final targets; the final target detection result, comprising the target box, the final classification result, and the confidence of each final target, is then displayed. (Matrix NMS re-scores the confidence of each target box by computing, for that box, the maximum IoU against all same-category boxes with a higher confidence than itself, and screens the target boxes accordingly.)
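A simplified sketch of this re-scoring (PyTorch assumed; the Gaussian decay and single-class setting are simplifications of full Matrix NMS, whose exact form is not reproduced in this text):

```python
import torch

def rescore(scores: torch.Tensor, ious: torch.Tensor,
            sigma: float = 0.5) -> torch.Tensor:
    """Decay each box's confidence by its worst overlap with any box that
    scores higher. scores: (N,) sorted descending; ious: (N, N) pairwise
    IoU in the same order."""
    upper = torch.triu(ious, diagonal=1)       # IoU against higher-scored boxes
    decay_iou = upper.max(dim=0).values        # max such overlap per box
    return scores * torch.exp(-decay_iou ** 2 / sigma)
```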
By this method, video collected by a camera in real time can be initialized into a video image stream, and the video image frames can be sent into the target detection model to obtain accurate target detection results. The target detection model has a small number of parameters, while the effectiveness of feature extraction and the accuracy of the target detection result are effectively guaranteed.
Referring to fig. 4, fig. 4 is a schematic diagram of a framework of an embodiment of a target detection model training apparatus according to the present application.
The present application further provides a target detection model training apparatus 30, which includes an obtaining module 31, a network module 32, and a processing module 33, to implement the target detection model training method of the corresponding embodiments above. Specifically, the obtaining module 31 obtains a training image, and the processing module 33 processes the training image to label a sample target in the training image; the processing module 33 inputs the training image into the network module 32 to obtain a predicted target of the training image; the network module 32 includes the target detection model, which comprises a backbone network, the backbone network comprises a plurality of convolutional layers, each convolutional layer comprises a plurality of filter banks, each filter bank comprises a predetermined number of filters obtained by rotating and/or flipping one filter, and the filters of the same filter bank share weights; the processing module 33 trains the target detection model with the objectives of minimizing the difference between the predicted target and the sample target and minimizing the cosine similarity between the filters of each filter bank.
Each filter group of the backbone network of the target detection model of the training apparatus 30 includes a predetermined number of filters obtained by rotating and/or flipping one filter, and the transformed filters of the same group share the same parameters, so that similar features can be extracted from multiple angles through the rotated and symmetrically transformed weights; the number of irrelevant filters is reduced, the number of parameters of the target detection model is effectively reduced, and the effectiveness of feature extraction and the accuracy of target detection are ensured.
Referring to fig. 5, fig. 5 is a schematic diagram of a frame of an embodiment of a detection apparatus based on a target detection model according to the present application.
The present application further provides a target detection model-based detection apparatus 40, which includes an obtaining module 41, a network module 42, and a processing module 43, to implement the target detection model-based detection method of the corresponding embodiments above. Specifically, the obtaining module 41 acquires a target image; the obtaining module 41 inputs the target image into the network module 42 to obtain a detection result of the target image, and the network module 42 includes the target detection model; the target detection model comprises a backbone network, the backbone network comprises a plurality of convolutional layers, each convolutional layer comprises a plurality of filter groups, each filter group comprises a predetermined number of filters obtained by rotating and/or flipping one filter, and weights are shared among the filters of the same filter group.
The detection apparatus 40 can initialize video collected by a camera in real time into a video image stream and send the video image frames into the target detection model to obtain accurate target detection results. The target detection model has a small number of parameters, while the effectiveness of feature extraction and the accuracy of the target detection result are effectively guaranteed.
Referring to fig. 6, fig. 6 is a schematic diagram of a frame of an embodiment of an electronic device according to the present application.
Yet another embodiment of the present application provides an electronic device 50, which includes a memory 51 and a processor 52 coupled to each other, wherein the processor 52 is configured to execute program instructions stored in the memory 51 to implement the target detection model training method of any of the above embodiments and the target detection model-based detection method of any of the above embodiments. In a particular implementation scenario, the electronic device 50 may include, but is not limited to, a microcomputer or a server; the electronic device 50 may also be a mobile device such as a notebook computer or a tablet computer, which is not limited herein.
Specifically, the processor 52 is configured to control itself and the memory 51 to implement the steps of the target detection model training method of any of the above embodiments and the target detection model-based detection method of any of the above embodiments. The processor 52 may also be referred to as a CPU (Central Processing Unit). The processor 52 may be an integrated circuit chip having signal processing capabilities. The processor 52 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 52 may be jointly implemented by a plurality of integrated circuit chips.
Referring to fig. 7, fig. 7 is a block diagram illustrating an embodiment of a computer-readable storage medium according to the present application.
Yet another embodiment of the present application provides a computer-readable storage medium 60 on which program data 61 are stored; when executed by a processor, the program data 61 implement the target detection model training method of any of the above embodiments and the target detection model-based detection method of any of the above embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is merely one type of logical division, and an actual implementation may have another division, for example, a unit or a component may be combined or integrated with another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on network elements. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium 60. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a readable storage medium 60 and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned readable storage medium 60 includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings, or which are directly or indirectly applied to other related technical fields, are intended to be included within the scope of the present application.
Claims (10)
1. A method for training an object detection model, the method comprising:
acquiring a training image, and processing the training image to label a sample target in the training image;
inputting the training image into the target detection model to obtain a predicted target of the training image; the target detection model comprises a backbone network, wherein the backbone network comprises a plurality of convolutional layers, each convolutional layer comprises a plurality of filter groups, each filter group comprises a predetermined number of filters obtained by rotating and/or flipping one filter, and weights are shared among the filters of the same filter group;
and training the target detection model with the objectives of minimizing the difference between the predicted target and the sample target and minimizing the cosine similarity between the filters of each filter group.
2. The method of claim 1, wherein training the target detection model with the objectives of minimizing the difference between the predicted target and the sample target and minimizing the cosine similarity between the filters of each filter bank comprises:
training the target detection model by using a back propagation gradient algorithm to minimize a preset loss function; the preset loss function comprises the sum of a target box loss function, a classification loss function, a confidence loss function, and a filter bank loss function, the filter bank loss function being:
r = α′·n·Σ_{i≠j} (k_i·k_j) / tr(KK^T), where α′ is a constant, k_i is the ith filter in the filter bank, k_j is the jth filter in the filter bank, n is the predetermined number, K is a filter bank matrix, and tr(KK^T) is the trace of K multiplied by its transpose.
3. The method of claim 1, wherein sharing weights among the filters of the same filter bank comprises:
in the back propagation gradient algorithm, sharing the weights and the weight corrections among the filters of the same filter bank.
4. The method of claim 1, wherein the target detection model further comprises a feature enhancement network and a detection head module connected in series with the backbone network.
5. The method of claim 1, wherein each filter bank comprises eight filters obtained from one filter by no rotation, rotation by 90°, 180°, and 270°, and the symmetric transformations of these four.
6. A detection method based on an object detection model is characterized by comprising the following steps:
acquiring a target image;
inputting the target image into the target detection model to obtain a detection result of the target image; the target detection model comprises a backbone network, the backbone network comprises a plurality of convolutional layers, each convolutional layer comprises a plurality of filter groups, each filter group comprises a predetermined number of filters obtained by rotating and/or flipping one filter, and weights are shared among the filters of the same filter group.
7. The method of claim 6, wherein the detection result comprises a target box value of an initial target, an initial classification result of the initial target, and an initial confidence of the initial target, the method comprising:
obtaining the classification index with the maximum probability in the initial classification result, and obtaining a final classification result by looking it up in an index table;
acquiring the target box value of the initial target, and obtaining an initial target box by using a target box conversion method;
and re-scoring the initial confidence of the initial target box to screen out a final target detection result.
8. The method of claim 6, wherein the object detection model is trained by the training method of any one of claims 1-5.
9. An electronic device comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the method of any of claims 1 to 8.
10. A computer-readable storage medium, on which program data are stored, which program data, when being executed by a processor, carry out the method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011475085.0A CN112633340B (en) | 2020-12-14 | 2020-12-14 | Target detection model training and detection method, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011475085.0A CN112633340B (en) | 2020-12-14 | 2020-12-14 | Target detection model training and detection method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112633340A true CN112633340A (en) | 2021-04-09 |
CN112633340B CN112633340B (en) | 2024-04-02 |
Family
ID=75312807
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011475085.0A Active CN112633340B (en) | 2020-12-14 | 2020-12-14 | Target detection model training and detection method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112633340B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112781634A (en) * | 2021-04-12 | 2021-05-11 | 南京信息工程大学 | BOTDR distributed optical fiber sensing system based on YOLOv4 convolutional neural network |
CN113378635A (en) * | 2021-05-08 | 2021-09-10 | 北京迈格威科技有限公司 | Target attribute boundary condition searching method and device of target detection model |
CN113487022A (en) * | 2021-06-17 | 2021-10-08 | 千芯半导体科技(北京)有限公司 | High-precision compression method and device suitable for hardware circuit and electronic equipment |
- 2020-12-14: Application CN202011475085.0A (CN) filed; granted as CN112633340B (status: Active)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104811276A (en) * | 2015-05-04 | 2015-07-29 | 东南大学 | DL-CNN (deep leaning-convolutional neutral network) demodulator for super-Nyquist rate communication |
US20170344879A1 (en) * | 2016-05-31 | 2017-11-30 | Linkedln Corporation | Training a neural network using another neural network |
CN108416250A (en) * | 2017-02-10 | 2018-08-17 | 浙江宇视科技有限公司 | Demographic method and device |
KR102037484B1 (en) * | 2019-03-20 | 2019-10-28 | 주식회사 루닛 | Method for performing multi-task learning and apparatus thereof |
CN111325169A (en) * | 2020-02-26 | 2020-06-23 | 河南理工大学 | Deep video fingerprint algorithm based on capsule network |
CN111695522A (en) * | 2020-06-15 | 2020-09-22 | 重庆邮电大学 | In-plane rotation invariant face detection method and device and storage medium |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112781634A (en) * | 2021-04-12 | 2021-05-11 | 南京信息工程大学 | BOTDR distributed optical fiber sensing system based on YOLOv4 convolutional neural network |
CN113378635A (en) * | 2021-05-08 | 2021-09-10 | 北京迈格威科技有限公司 | Target attribute boundary condition searching method and device of target detection model |
CN113487022A (en) * | 2021-06-17 | 2021-10-08 | 千芯半导体科技(北京)有限公司 | High-precision compression method and device suitable for hardware circuit and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN112633340B (en) | 2024-04-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||