CN114677504A - Target detection method, device, equipment terminal and readable storage medium - Google Patents
- Publication number
- CN114677504A (application number CN202210600445.8A)
- Authority
- CN
- China
- Prior art keywords
- attention
- feature map
- extraction
- feature
- intermediate feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The application relates to a target detection method, an apparatus, a device terminal and a readable storage medium. The target detection method preprocesses each training picture in a training set through an input end to obtain a preprocessed training set; extracts features from each training picture in the preprocessed training set with a feature extraction unit to obtain intermediate feature maps of different scales; obtains, according to the size of each intermediate feature map, at least two attention subunits that respectively perform feature extraction on the intermediate feature maps to produce the corresponding attention extraction feature maps; merges each intermediate feature map with its corresponding attention extraction feature map to obtain the target feature maps; detects each target feature map through a prediction output unit to generate corresponding prediction values; and performs a loss function calculation on the prediction values to generate a corresponding target detection model. The method improves the overall accuracy of target detection.
Description
Technical Field
The present application relates to the field of image processing, and in particular, to a target detection method, apparatus, device terminal, and readable storage medium.
Background
With the widespread application of deep convolutional neural networks in computer vision, real-time target detection models represented by the YOLO algorithm have delivered good detection performance in industrial fields and practical application scenarios.
The YOLOv5-Lite model improves on the previous-generation YOLOv4: it trains faster and has a smaller model size, which facilitates rapid deployment.
In practical applications, the varying distances and complex environments of a shooting scene produce large numbers of targets of many different sizes; however, features cannot be extracted and collected in a manner targeted to each size, so overall target detection accuracy is low.
Disclosure of Invention
In view of this, the present application provides a target detection method, an apparatus, a device terminal and a readable storage medium that overcome the inability of the YOLOv5-Lite model to perform targeted feature extraction and collection for targets of various sizes, thereby improving the overall detection accuracy of the YOLOv5-Lite model.
A target detection method is applied to a YOLOv5-Lite network, the YOLOv5-Lite network comprises an input end, a feature extraction unit, an attention unit and a prediction output unit which are sequentially connected, the attention unit comprises a plurality of different attention subunits, and the target detection method comprises the following steps:
acquiring picture input data as a training set;
preprocessing each training picture in the training set through an input end to obtain a preprocessed training set;
extracting the features of each training picture in the preprocessed training set based on a feature extraction unit to obtain intermediate feature maps with different scales;
according to the size of each intermediate feature map, at least two attention subunits are obtained to respectively perform feature extraction on each intermediate feature map so as to obtain an attention extraction feature map corresponding to each intermediate feature map;
respectively carrying out feature combination on each intermediate feature map and the attention extraction feature maps corresponding to the intermediate feature maps to obtain each target feature map;
respectively detecting each target feature map through a prediction output unit to generate corresponding prediction values;
and calculating a loss function according to the corresponding predicted value to obtain an optimized gradient, and updating the weight and the bias until the loss function is converged to generate a corresponding target detection model.
In one embodiment, the target detection method further comprises:
acquiring picture input data as a test set;
and testing the test set according to the target detection model, and outputting a corresponding target detection result.
In one embodiment, the feature extraction unit includes a Backbone unit and a Neck unit connected in sequence, the Backbone unit is connected with the input end, and the output end of the Neck unit is connected with the attention unit; the step of performing feature extraction on each training picture in the preprocessed training set based on the feature extraction unit to obtain intermediate feature maps with different scales includes:
performing slicing operation and convolution operation on each training picture in the preprocessed training set based on a backbone unit to obtain an initial feature map;
and performing secondary feature extraction on the initial feature map based on a Neck unit to obtain intermediate feature maps with different scales.
In one embodiment, the attention unit includes a first attention subunit and a second attention subunit, the intermediate feature maps have three scales, and the step of obtaining at least two attention subunits according to the size of each intermediate feature map to respectively perform feature extraction on each intermediate feature map to obtain an attention extraction feature map corresponding to each intermediate feature map includes:
performing feature extraction on the intermediate feature map of the first scale through a first attention subunit to obtain a corresponding first attention extraction feature map;
and respectively extracting the features of the intermediate feature maps in the second scale and the third scale through a second attention subunit to obtain a second attention extraction feature map and a third attention extraction feature map, wherein the first scale, the second scale and the third scale are sequentially reduced.
In one embodiment, the first attention subunit is a compression and excitation module and the second attention subunit is a convolution block attention module.
In one embodiment, the attention unit includes a first attention subunit, a second attention subunit and a third attention subunit, the intermediate feature maps have three scales, and the step of obtaining at least two attention subunits according to the size of each intermediate feature map to respectively perform feature extraction on each intermediate feature map to obtain an attention extraction feature map corresponding to each intermediate feature map includes:
performing feature extraction on the intermediate feature map of the first scale through a first attention subunit to obtain a corresponding first attention extraction feature map;
performing feature extraction on the intermediate feature map of the second scale through a second attention subunit to obtain a second attention extraction feature map;
and performing feature extraction on the intermediate feature map of the third scale through a third attention subunit to obtain a third attention extraction feature map, wherein the first scale, the second scale and the third scale are sequentially reduced.
In an embodiment, a batch normalization layer is further connected between the feature extraction unit and the attention unit, and the step of obtaining at least two attention subunits to respectively perform feature extraction on each intermediate feature map according to the size of each intermediate feature map to obtain an attention extraction feature map corresponding to each intermediate feature map further includes:
respectively normalizing the intermediate feature maps of different scales based on the batch normalization layer, and adjusting the weight of each channel in the intermediate feature map of each size with a preset dynamic adjustment factor, to obtain normalized intermediate feature maps of different scales.
In one embodiment, the formula employed in the normalization process is:

y_i = γ_i · (x_i − μ_B) / √(σ_B² + ε) + β

wherein y_i represents the normalized intermediate feature map corresponding to the ith channel, m represents the number of channels of each input intermediate feature map, γ_i represents the preset dynamic adjustment factor corresponding to the ith channel, x_i represents the input intermediate feature map corresponding to the ith channel, μ_B represents the mean of the input m-channel intermediate feature maps, σ_B² represents the overall variance of the input m-channel intermediate feature maps, and ε and β both represent constants.
In one embodiment, the loss function is:

L = Σ_(x,y) l(f(x, W), y) + λ · Σ_j |γ_j|

wherein L represents the overall loss function value of the YOLOv5-Lite network, λ represents a penalty factor, x represents an input target feature map, f(x) represents a predicted value, y represents the corresponding real value, l(x, y) represents the loss function value for x and y, W represents the weight corresponding to each channel, Σ_j |γ_j| represents summing the absolute values of the weights under the L1 norm, i and j each represent a positive integer variable, γ_i represents the preset dynamic adjustment factor corresponding to the ith channel, and γ_j represents the jth preset dynamic adjustment factor.
In addition, a target detection apparatus is provided, applied to a YOLOv5-Lite network; the YOLOv5-Lite network comprises an input end, a feature extraction unit, an attention unit and a prediction output unit connected in sequence, the attention unit comprises a plurality of different attention subunits, and the target detection apparatus comprises:
the training set generation module is used for acquiring picture input data as a training set;
the preprocessing module is used for preprocessing each training picture in the training set through an input end to obtain a preprocessed training set;
the first feature map generation module is used for extracting features of each training picture in the preprocessed training set based on the feature extraction unit so as to obtain intermediate feature maps with different scales;
the second feature map generation module is used for acquiring at least two attention subunits to respectively perform feature extraction on each intermediate feature map according to the size of each intermediate feature map so as to obtain an attention extraction feature map corresponding to each intermediate feature map;
the predicted value generation module is used for respectively carrying out feature combination on each intermediate feature map and the attention extraction feature maps corresponding to the intermediate feature maps to obtain each target feature map;
the prediction value generation module is used for respectively detecting each target characteristic diagram through the prediction output unit so as to generate a corresponding prediction value;
and the detection model generation module is used for calculating a loss function according to the corresponding predicted value to obtain an optimized gradient, and updating the weight and the bias until the loss function is converged to generate a corresponding target detection model.
In addition, a device terminal is provided, which includes a processor and a memory, the memory being used for storing a computer program and the processor running the computer program to make the device terminal execute the above target detection method.
Furthermore, a readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the above target detection method.
The target detection method is applied to a YOLOv5-Lite network comprising an input end, a feature extraction unit, an attention unit and a prediction output unit connected in sequence, where the attention unit comprises a plurality of different attention subunits. The method acquires picture input data as a training set; preprocesses each training picture through the input end to obtain a preprocessed training set; extracts features from each preprocessed training picture with the feature extraction unit to obtain intermediate feature maps of different scales; and, according to the size of each intermediate feature map, obtains at least two attention subunits that respectively perform feature extraction on the intermediate feature maps to produce the corresponding attention extraction feature maps. When the resulting target detection model detects targets of various sizes in an image, it can therefore extract the relevant feature information through the attention subunit matched to the size of each intermediate feature map; that is, feature extraction and collection are performed in a targeted manner for targets of each size. Each intermediate feature map is then merged with its corresponding attention extraction feature map to obtain the target feature maps: on one hand, the attention extraction feature map contributes additional information extracted from the original intermediate feature map; on the other hand, the original intermediate feature map itself is retained, so merging the two yields more useful feature information and further improves the overall detection accuracy for targets of all sizes.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic application environment diagram of a target detection method provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a target detection method according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of another target detection method provided in the embodiments of the present application;
FIG. 4 is a schematic flowchart of a method for obtaining intermediate feature maps of different scales according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an attention unit according to an embodiment of the present disclosure;
fig. 6 is a flowchart illustrating a method for obtaining attention extraction feature maps corresponding to respective intermediate feature maps according to an embodiment of the present application;
FIG. 7 is a block diagram of another attention unit configuration provided in an embodiment of the present application;
fig. 8 is a schematic flowchart of another method for obtaining attention extraction feature maps corresponding to respective intermediate feature maps according to an embodiment of the present application;
FIG. 9 is a schematic diagram of an application environment of another target detection method provided in an embodiment of the present application;
FIG. 10 is a schematic flow chart diagram illustrating a further method for detecting an object according to an embodiment of the present disclosure;
fig. 11 is a block diagram of a target detection apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. The embodiments described below and their technical features may be combined with each other without conflict.
As shown in fig. 1, a schematic diagram of the application environment of a target detection method is provided. Fig. 1 is a structural block diagram of a YOLOv5-Lite network, which includes an input end 11, a feature extraction unit 12, an attention unit 13 and a prediction output unit 14 connected in sequence; the attention unit 13 includes a plurality of different attention subunits.
As shown in fig. 2, a target detection method is provided, including:
step S110, acquiring picture input data as a training set.
When performing target detection, a training set needs to be established in order to obtain a target detection model, so a large amount of picture input data is acquired as the training set.
Step S120, preprocessing each training picture in the training set through the input end to obtain a preprocessed training set.
Each training picture in the training set is further preprocessed, since many of the captured pictures in the picture input data are not yet labeled. The preprocessing may also include at least one of data enhancement, adaptive anchor frame calculation and adaptive picture scaling, yielding the preprocessed training set.
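As an illustration only, the following sketch shows what the adaptive picture scaling step could look like; the letterbox-style resize, the function name adaptive_scale and the parameter values are assumptions in line with common YOLOv5 practice, not details fixed by this embodiment:

```python
import cv2
import numpy as np

def adaptive_scale(img, new_shape=640, pad_value=114):
    """Resize a 3-channel image while keeping its aspect ratio, padding
    the shorter side with a constant value so the output is square."""
    h, w = img.shape[:2]
    r = min(new_shape / h, new_shape / w)              # scale ratio
    nh, nw = round(h * r), round(w * r)                # unpadded size
    resized = cv2.resize(img, (nw, nh), interpolation=cv2.INTER_LINEAR)
    top, left = (new_shape - nh) // 2, (new_shape - nw) // 2
    out = np.full((new_shape, new_shape, 3), pad_value, dtype=img.dtype)
    out[top:top + nh, left:left + nw] = resized
    return out, r, (left, top)   # ratio and offset let label boxes be mapped back
```

Returning the scale ratio and padding offset allows the label boxes of each training picture to be rescaled consistently with the image.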
Step S130, extracting the features of each training picture in the preprocessed training set based on the feature extraction unit to obtain intermediate feature maps with different scales.
The YOLOv5-Lite network generally comprises a plurality of feature extraction units, through which feature extraction is performed on each training picture in the preprocessed training set to obtain intermediate feature maps of different scales.
Step S140, at least two attention subunits are obtained according to the size of each intermediate feature map to perform feature extraction on each intermediate feature map respectively, so as to obtain an attention extraction feature map corresponding to each intermediate feature map.
An appropriate attention subunit is selected according to the size of each intermediate feature map to perform feature extraction on it, obtaining the attention extraction feature map corresponding to each intermediate feature map.
In an embodiment, three intermediate feature maps of different scales are obtained. In this case, according to the size of each intermediate feature map, at least two attention subunits may be obtained to perform feature extraction on the intermediate feature maps: one attention subunit extracts features from the intermediate feature map of one scale, and another extracts features from the intermediate feature maps of the remaining two scales.
In this embodiment, a corresponding attention subunit is adopted for each intermediate feature map of a different scale, so that when the target detection model detects targets of various sizes in a picture, the corresponding feature information can be extracted through the attention subunit matched to the size of each intermediate feature map; that is, feature extraction and collection are performed in a targeted manner for targets of each size.
Step S150, respectively carrying out feature combination on each intermediate feature map and the attention extraction feature maps corresponding to the intermediate feature maps to obtain each target feature map.
On one hand, the attention extraction feature map extracts more information from the original intermediate feature map; on the other hand, the original intermediate feature map is retained. Merging the information of the two feature maps therefore yields more useful feature information and further improves the overall detection accuracy for targets of all sizes, as sketched below.
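The merge operator itself is not fixed by this step. The sketch below assumes channel concatenation followed by a 1×1 convolution; element-wise addition would be a lighter alternative, and the class and parameter names are illustrative:

```python
import torch
import torch.nn as nn

class FeatureMerge(nn.Module):
    """Merge an intermediate feature map with its attention extraction
    feature map: the original map is retained and concatenated with the
    attention features, then fused back to the original channel count."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, intermediate, attention):
        # both inputs share the same (B, C, H, W) shape
        return self.fuse(torch.cat([intermediate, attention], dim=1))
```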
In step S160, the prediction output unit detects each target feature map to generate a corresponding prediction value.
The prediction output unit generally corresponds to the head part of the YOLOv5-Lite network.
Step S170, calculating a loss function according to the corresponding predicted values to obtain an optimized gradient, and updating the weights and biases until the loss function converges to generate a corresponding target detection model.
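A minimal training-loop sketch for step S170 follows; the optimizer choice, learning rate and epoch count are assumptions, and loss_fn stands for the loss function described later:

```python
import torch

def train(model, loader, loss_fn, epochs=100, lr=0.01):
    """Compute the loss, back-propagate to obtain the optimized gradient,
    and update weights and biases until the loss converges."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, targets in loader:
            loss = loss_fn(model(images), targets)
            opt.zero_grad()
            loss.backward()   # optimized gradient
            opt.step()        # weight and bias update
    return model              # the trained target detection model
```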
The target detection method is applied to a YOLOv5-Lite network comprising an input end, a feature extraction unit, an attention unit and a prediction output unit connected in sequence, where the attention unit comprises a plurality of different attention subunits. By obtaining at least two attention subunits that separately perform feature extraction on the intermediate feature maps to produce the corresponding attention extraction feature maps, the target detection model can, when detecting targets of various sizes in a picture, extract the relevant feature information through the attention subunit matched to the size of each intermediate feature map; that is, targets of each size receive targeted feature extraction and collection. Each intermediate feature map is then merged with its corresponding attention extraction feature map: on one hand, the attention extraction feature map contributes additional information extracted from the original intermediate feature map; on the other hand, the original intermediate feature map itself is retained, so combining the two yields more useful feature information and further improves the overall detection accuracy for targets of all sizes.
In one embodiment, as shown in fig. 3, the target detection method further includes:
and step S180, acquiring picture input data as a test set.
And S190, testing the test set according to the target detection model, and outputting a corresponding target detection result.
In one embodiment, as shown in fig. 1, the feature extraction unit 12 includes a Backbone unit and a Neck unit connected in sequence, the Backbone unit is connected to the input end 11, and the output end of the Neck unit is connected to the attention unit 13; as shown in fig. 4, step S130 includes:
Step S132, performing slicing and convolution operations on each training picture in the preprocessed training set based on the Backbone unit to obtain an initial feature map.
Step S134, performing secondary feature extraction on the initial feature map based on the Neck unit to obtain intermediate feature maps of different scales.
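The slicing operation is not spelled out here; the sketch below assumes the Focus-style slicing used in YOLOv5, in which alternate pixels are regrouped into channels before the convolution:

```python
import torch
import torch.nn as nn

class SliceConv(nn.Module):
    """Slicing followed by convolution, as in the Backbone unit:
    (B, C, H, W) -> (B, 4C, H/2, W/2) -> (B, out_ch, H/2, W/2)."""
    def __init__(self, in_ch=3, out_ch=32):
        super().__init__()
        self.conv = nn.Conv2d(4 * in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        # gather every second pixel into four sub-images on the channel axis
        sliced = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                            x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(sliced)
```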
In one embodiment, as shown in fig. 5, the attention unit 13 includes a first attention subunit 13a and a second attention subunit 13b; as shown in fig. 6, step S140 includes:
Step S141, performing feature extraction on the intermediate feature map of the first scale through the first attention subunit to obtain a corresponding first attention extraction feature map.
Step S142, respectively performing feature extraction on the intermediate feature maps of the second and third scales through the second attention subunit to obtain a second attention extraction feature map and a third attention extraction feature map, wherein the first scale, the second scale and the third scale decrease in turn.
In this embodiment, the first attention subunit performs feature extraction on the intermediate feature map of the largest scale (i.e., the first scale), while the second attention subunit handles the intermediate feature maps of the smaller scales, from which more feature information can then be extracted. In other words, the corresponding feature information is extracted through the attention subunit matched to the size of each intermediate feature map, so that targets of various sizes receive targeted feature extraction and collection, which improves the overall detection accuracy for targets of all sizes.
In one embodiment, the first attention subunit is a compression and excitation module and the second attention subunit is a convolution block attention module.
The compression and excitation module refers to the Squeeze-and-Excitation (SE) module, and the convolution block attention module refers to the Convolutional Block Attention Module (CBAM).
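For illustration, a minimal SE subunit and a hypothetical dispatch helper are sketched below; cbam stands for any CBAM implementation (channel attention followed by spatial attention), and the reduction ratio is an assumed value:

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Minimal SE attention subunit: global average pooling compresses each
    channel, a two-layer bottleneck produces per-channel excitation weights."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)  # squeeze, then excite
        return x * w                                       # reweight channels

def apply_attention(feature_maps, se, cbam):
    """Dispatch by scale: SE on the largest intermediate feature map,
    CBAM on the two smaller ones, as in this embodiment."""
    largest, mid, small = feature_maps   # scales decrease in this order
    return se(largest), cbam(mid), cbam(small)
```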
In one embodiment, as shown in fig. 7, the attention unit includes a first attention subunit 13a, a second attention subunit 13b and a third attention subunit 13c; as shown in fig. 8, step S140 includes:
Step S143, performing feature extraction on the intermediate feature map of the first scale through the first attention subunit to obtain a corresponding first attention extraction feature map.
Step S144, performing feature extraction on the intermediate feature map of the second scale through the second attention subunit to obtain a second attention extraction feature map.
Step S145, performing feature extraction on the intermediate feature map of the third scale through the third attention subunit to obtain a third attention extraction feature map, wherein the first scale, the second scale and the third scale decrease in turn.
In this embodiment, the first attention subunit performs feature extraction on the intermediate feature map of the largest scale (i.e., the first scale), the second attention subunit handles the intermediate feature map of the smaller second scale, and the third attention subunit handles the intermediate feature map of the still smaller third scale, so that more feature information can be extracted from the smaller-scale intermediate feature maps. The corresponding feature information is thus extracted through a dedicated attention subunit according to the size of each intermediate feature map, achieving targeted feature extraction and collection for targets of various sizes and further improving the overall detection accuracy.
In one embodiment, as shown in fig. 9, a batch normalization layer 15 is further connected between the feature extraction unit 12 and the attention unit 13, and as shown in fig. 10, step S140 further includes:
and S200, respectively carrying out standardization processing on the intermediate characteristic diagrams with different scales based on the batch standardization layer, and adjusting the weight of each channel in the intermediate characteristic diagram with each size by adopting a preset dynamic adjustment factor to obtain the standardized intermediate characteristic diagrams with different scales.
In this embodiment, the intermediate feature maps are normalized by the batch normalization layer, and a preset dynamic adjustment factor is added, where the preset dynamic adjustment factor can reflect the degree of information change in each intermediate feature map, that is, the variance of the batch normalization layer, in other words, the variance can reflect the degree of information change, and the larger the variance is, the larger the degree of information change is, the richer the information therein is, and the higher the importance is, whereas the smaller the variance is, the smaller the degree of information change is, and the smaller the importance is, so that by setting the batch normalization layer, the feature map information can be better extracted by the subsequent attention unit.
In the subsequent steps S140 to S150, the normalized intermediate feature maps of different scales are processed, while steps S160 to S170 remain unchanged, as shown in fig. 10; that is:
step S140, at least two attention subunits are obtained to respectively perform feature extraction on each normalized intermediate feature map according to the size of each normalized intermediate feature map, so as to obtain an attention extraction feature map corresponding to each normalized intermediate feature map.
Step S150, respectively carrying out feature merging on each normalized intermediate feature map and the attention extraction feature maps corresponding to each normalized intermediate feature map to obtain each target feature map.
In one embodiment, the formula employed in the normalization process is:

y_i = γ_i · (x_i − μ_B) / √(σ_B² + ε) + β

wherein y_i represents the normalized intermediate feature map corresponding to the ith channel, m represents the number of channels of each input intermediate feature map, γ_i represents the preset dynamic adjustment factor corresponding to the ith channel, x_i represents the input intermediate feature map corresponding to the ith channel, μ_B represents the mean of the input m-channel intermediate feature maps, σ_B² represents the overall variance of the input m-channel intermediate feature maps, and ε and β both represent constants.
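A minimal sketch of this normalization with the per-channel preset dynamic adjustment factor is given below; it uses training-mode batch statistics only, and the running-statistics bookkeeping of a full batch normalization layer is omitted:

```python
import torch
import torch.nn as nn

class AdjustableBatchNorm(nn.Module):
    """Batch normalization where the learnable per-channel factor gamma acts
    as the preset dynamic adjustment factor weighting each channel:
    y_i = gamma_i * (x_i - mu_B) / sqrt(var_B + eps) + beta."""
    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(channels))  # dynamic adjustment factor
        self.beta = nn.Parameter(torch.zeros(channels))
        self.eps = eps

    def forward(self, x):                                # x: (B, C, H, W)
        mu = x.mean(dim=(0, 2, 3), keepdim=True)
        var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
        x_hat = (x - mu) / torch.sqrt(var + self.eps)
        return self.gamma.view(1, -1, 1, 1) * x_hat + self.beta.view(1, -1, 1, 1)
```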
In one embodiment, the loss function is:

L = Σ_(x,y) l(f(x, W), y) + λ · Σ_j |γ_j|

wherein L represents the overall loss function value of the YOLOv5-Lite network, λ represents a penalty factor, x represents an input target feature map, f(x) represents a predicted value, y represents the corresponding real value, l(x, y) represents the loss function value for x and y, W represents the weight corresponding to each channel, Σ_j |γ_j| represents summing the absolute values of the weights under the L1 norm, i and j each represent a positive integer variable, γ_i represents the preset dynamic adjustment factor corresponding to the ith channel, and γ_j represents the jth preset dynamic adjustment factor.
On the basis of the embodiment shown in fig. 8, a batch normalization layer is arranged, so that the overall loss function of the YOLOv5-Lite network includes the L1 penalty term λ · Σ_j |γ_j| on the preset dynamic adjustment factors; the loss function can thus be adjusted, improving the overall accuracy of target detection.
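To illustrate how the penalty term enters training, a hedged sketch follows; base_loss_fn stands for the detection loss of the network, gammas for the adjustment factors collected from the batch normalization layer, and the penalty value standing in for λ is an assumed placeholder:

```python
import torch

def total_loss(pred, target, base_loss_fn, gammas, penalty=1e-4):
    """Overall loss sketch: the detection loss plus an L1 penalty on the
    preset dynamic adjustment factors, driving low-importance channels
    toward zero weight."""
    data_term = base_loss_fn(pred, target)          # l(f(x), y)
    l1_term = sum(g.abs().sum() for g in gammas)    # sum_j |gamma_j|
    return data_term + penalty * l1_term
```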
In addition, as shown in fig. 11, a target detection apparatus 300 is also provided, applied to the YOLOv5-Lite network shown in fig. 1. The target detection apparatus 300 includes:
a training set generating module 310, configured to obtain picture input data as a training set;
the preprocessing module 320 is used for preprocessing each training picture in the training set through an input end to obtain a preprocessed training set;
a first feature map generation module 330, configured to perform feature extraction on each training picture in the preprocessed training set based on the feature extraction unit to obtain intermediate feature maps of different scales;
a second feature map generation module 340, configured to obtain at least two attention subunits according to the size of each intermediate feature map, and perform feature extraction on each intermediate feature map respectively to obtain an attention extraction feature map corresponding to each intermediate feature map;
the target feature map generation module 350 is configured to perform feature merging on each intermediate feature map and the attention extraction feature maps corresponding to the intermediate feature maps, so as to obtain each target feature map;
and the predicted value generation module 360 detects each target feature map through the prediction output unit to generate corresponding predicted values.
And the detection model generation module 370 performs loss function calculation according to the corresponding predicted value to obtain an optimized gradient, and performs weight and bias updating until the loss function converges to generate a corresponding target detection model.
In addition, an apparatus terminal is provided, which includes a processor and a memory, the memory is used for storing a computer program, and the processor runs the computer program to make the apparatus terminal execute the above object detection method.
Furthermore, a readable storage medium is provided, which stores a computer program, which when executed by a processor implements the above object detection method.
The division of the units in the device is only used for illustration, and in other embodiments, the device may be divided into different units as needed to complete all or part of the functions of the device. For the specific limitations of the above device, reference may be made to the limitations of the above method, which are not described herein again.
The above description is only an embodiment of the present application and is not intended to limit its scope; all equivalent structures or equivalent process transformations made using the contents of the specification and the drawings, such as combinations of technical features between embodiments or direct or indirect applications to other related technical fields, are likewise included in the scope of the present application.
In addition, structural elements having the same or similar characteristics may be identified by the same or different reference numerals. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
In this application, the word "for example" is used to mean "serving as an example, instance, or illustration". Any embodiment described herein as "for example" is not necessarily to be construed as preferred or advantageous over other embodiments. The previous description is provided to enable any person skilled in the art to make and use the present application. In the foregoing description, various details have been set forth for the purpose of explanation.
It will be apparent to one of ordinary skill in the art that the present application may be practiced without these specific details. In other instances, well-known structures and processes are not shown in detail to avoid obscuring the description of the present application with unnecessary detail. Thus, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
Claims (12)
1. An object detection method applied to a YOLOv5-Lite network, the YOLOv5-Lite network comprising an input terminal, a feature extraction unit, an attention unit and a prediction output unit which are connected in sequence, the attention unit comprising a plurality of different attention sub-units, the object detection method comprising:
acquiring picture input data as a training set;
preprocessing each training picture in the training set through the input end to obtain a preprocessed training set;
extracting the features of each training picture in the preprocessed training set based on the feature extraction unit to obtain intermediate feature maps with different scales;
according to the size of each intermediate feature map, at least two attention subunits are obtained to respectively perform feature extraction on each intermediate feature map so as to obtain an attention extraction feature map corresponding to each intermediate feature map;
respectively carrying out feature combination on each intermediate feature map and the attention extraction feature maps corresponding to the intermediate feature maps to obtain each target feature map;
detecting each target feature map through the prediction output unit to generate corresponding prediction values; and calculating a loss function according to the corresponding predicted value to obtain an optimized gradient, and updating the weight and the bias until the loss function is converged to generate a corresponding target detection model.
2. The target detection method according to claim 1, characterized in that the target detection method further comprises:
acquiring picture input data as a test set;
and testing the test set according to the target detection model, and outputting a corresponding target detection result.
3. The target detection method according to claim 1, wherein the feature extraction unit includes a Backbone unit and a Neck unit connected in sequence, the Backbone unit is connected to the input end, an output end of the Neck unit is connected to the attention unit, and the step of performing feature extraction on each training picture in the preprocessed training set based on the feature extraction unit to obtain intermediate feature maps with different scales includes: performing slicing operation and convolution operation on each training picture in the preprocessed training set based on the Backbone unit to obtain an initial feature map;
and performing secondary feature extraction on the initial feature map based on the Neck unit to obtain intermediate feature maps with different scales.
4. The target detection method according to claim 1, wherein the attention unit includes a first attention subunit and a second attention subunit, the intermediate feature maps have three scales, and the step of obtaining at least two attention subunits to perform feature extraction on each intermediate feature map according to the size of each intermediate feature map to obtain an attention extraction feature map corresponding to each intermediate feature map includes:
performing feature extraction on the intermediate feature map of the first scale through the first attention subunit to obtain a corresponding first attention extraction feature map;
and respectively extracting features of the intermediate feature maps of the second scale and the third scale through the second attention subunit to obtain a second attention extraction feature map and a third attention extraction feature map, wherein the first scale, the second scale and the third scale are sequentially reduced.
5. The method of claim 4, wherein the first attention subunit is a compression and excitation module and the second attention subunit is a convolution block attention module.
6. The target detection method according to claim 1, wherein the attention unit includes a first attention subunit, a second attention subunit and a third attention subunit, the intermediate feature maps have three scales, and the step of obtaining at least two attention subunits to perform feature extraction on each intermediate feature map respectively according to the size of each intermediate feature map to obtain the attention extraction feature map corresponding to each intermediate feature map comprises:
performing feature extraction on the intermediate feature map of the first scale through the first attention subunit to obtain a corresponding first attention extraction feature map;
performing feature extraction on the intermediate feature map of the second scale through the second attention subunit to obtain a second attention extraction feature map;
and performing feature extraction on the intermediate feature map of the third scale through the third attention subunit to obtain a third attention extraction feature map, wherein the first scale, the second scale and the third scale are sequentially reduced.
7. The target detection method according to claim 1, wherein a batch normalization layer is further connected between the feature extraction unit and the attention unit, and the step of obtaining at least two attention subunits to respectively perform feature extraction on each intermediate feature map according to the size of each intermediate feature map to obtain the attention extraction feature map corresponding to each intermediate feature map further comprises:
respectively normalizing the intermediate feature maps of different scales based on the batch normalization layer, and adjusting the weight of each channel in the intermediate feature map of each size with a preset dynamic adjustment factor, to obtain normalized intermediate feature maps of different scales.
8. The target detection method according to claim 7, wherein the formula employed in the normalization process is:

y_i = γ_i · (x_i − μ_B) / √(σ_B² + ε) + β

wherein y_i represents the normalized intermediate feature map corresponding to the ith channel, m represents the number of channels of each input intermediate feature map, γ_i represents the preset dynamic adjustment factor corresponding to the ith channel, x_i represents the input intermediate feature map corresponding to the ith channel, μ_B represents the mean of the input m-channel intermediate feature maps, σ_B² represents the overall variance of the input m-channel intermediate feature maps, and ε and β both represent constants.
9. The target detection method according to claim 8, wherein the loss function is:

L = Σ_(x,y) l(f(x, W), y) + λ · Σ_j |γ_j|

wherein L represents the overall loss function value of the YOLOv5-Lite network, λ represents a penalty factor, x represents an input target feature map, f(x) represents a predicted value, y represents the corresponding real value, l(x, y) represents the loss function value for x and y, W represents the weight corresponding to each channel, Σ_j |γ_j| represents summing the absolute values of the weights under the L1 norm, i and j each represent a positive integer variable, γ_i represents the preset dynamic adjustment factor corresponding to the ith channel, and γ_j represents the jth preset dynamic adjustment factor.
10. A target detection apparatus applied to a YOLOv5-Lite network, the YOLOv5-Lite network comprising an input end, a feature extraction unit, an attention unit and a prediction output unit connected in sequence, the attention unit comprising a plurality of different attention subunits, the target detection apparatus comprising:
the training set generation module is used for acquiring picture input data as a training set;
the preprocessing module is used for preprocessing each training picture in the training set through the input end to obtain a preprocessed training set;
the first feature map generation module is used for extracting features of each training picture in the preprocessed training set based on a feature extraction unit so as to obtain intermediate feature maps with different scales;
the second feature map generation module is used for acquiring at least two attention subunits to respectively perform feature extraction on each intermediate feature map according to the size of each intermediate feature map so as to obtain an attention extraction feature map corresponding to each intermediate feature map;
the predicted value generation module is used for respectively carrying out feature combination on each intermediate feature map and the attention extraction feature maps corresponding to the intermediate feature maps to obtain each target feature map;
the prediction value generation module is used for respectively detecting each target characteristic diagram through the prediction output unit so as to generate a corresponding prediction value;
and the detection model generation module is used for calculating a loss function according to the corresponding predicted value to obtain an optimized gradient, and updating the weight and the bias until the loss function is converged to generate a corresponding target detection model.
11. A device terminal, characterized in that it comprises a processor and a memory for storing a computer program, the processor running the computer program to cause the device terminal to perform the target detection method of any one of claims 1 to 9.
12. A readable storage medium, characterized in that the readable storage medium stores a computer program which, when executed by a processor, implements the target detection method of any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210600445.8A CN114677504B (en) | 2022-05-30 | 2022-05-30 | Target detection method, device, equipment terminal and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114677504A true CN114677504A (en) | 2022-06-28 |
CN114677504B CN114677504B (en) | 2022-11-15 |
Family
ID=82081145
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210600445.8A Active CN114677504B (en) | 2022-05-30 | 2022-05-30 | Target detection method, device, equipment terminal and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114677504B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113688723A (en) * | 2021-08-21 | 2021-11-23 | 河南大学 | Infrared image pedestrian target detection method based on improved YOLOv5 |
CN113920107A (en) * | 2021-10-29 | 2022-01-11 | 西安工程大学 | Insulator damage detection method based on improved yolov5 algorithm |
CN114359851A (en) * | 2021-12-02 | 2022-04-15 | 广州杰赛科技股份有限公司 | Unmanned target detection method, device, equipment and medium |
CN114220015A (en) * | 2021-12-21 | 2022-03-22 | 一拓通信集团股份有限公司 | Improved YOLOv 5-based satellite image small target detection method |
CN114005105A (en) * | 2021-12-30 | 2022-02-01 | 青岛以萨数据技术有限公司 | Driving behavior detection method and device and electronic equipment |
CN114494415A (en) * | 2021-12-31 | 2022-05-13 | 北京建筑大学 | Method for detecting, identifying and measuring gravel pile by automatic driving loader |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115049851A (en) * | 2022-08-15 | 2022-09-13 | 深圳市爱深盈通信息技术有限公司 | Target detection method, device and equipment terminal based on YOLOv5 network |
CN115049851B (en) * | 2022-08-15 | 2023-01-17 | 深圳市爱深盈通信息技术有限公司 | Target detection method, device and equipment terminal based on YOLOv5 network |
Also Published As
Publication number | Publication date |
---|---|
CN114677504B (en) | 2022-11-15 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication ||
| SE01 | Entry into force of request for substantive examination ||
| GR01 | Patent grant ||
| TR01 | Transfer of patent right | Effective date of registration: 20230704. Address after: 13C-18, Caihong Building, Caihong Xindu, No. 3002, Caitian South Road, Gangsha Community, Futian Street, Futian District, Shenzhen, Guangdong 518033; Patentee after: Core Computing Integrated (Shenzhen) Technology Co.,Ltd. Address before: 518000 1001, building G3, TCL International e city, Shuguang community, Xili street, Nanshan District, Shenzhen City, Guangdong Province; Patentee before: Shenzhen Aishen Yingtong Information Technology Co.,Ltd. |