CN114677504A - Target detection method, device, equipment terminal and readable storage medium - Google Patents

Target detection method, device, equipment terminal and readable storage medium Download PDF

Info

Publication number
CN114677504A
Authority
CN
China
Prior art keywords
attention
feature map
extraction
feature
intermediate feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210600445.8A
Other languages
Chinese (zh)
Other versions
CN114677504B (en)
Inventor
陈磊
周有喜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Core Computing Integrated Shenzhen Technology Co ltd
Original Assignee
Shenzhen Aishen Yingtong Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Aishen Yingtong Information Technology Co Ltd filed Critical Shenzhen Aishen Yingtong Information Technology Co Ltd
Priority to CN202210600445.8A priority Critical patent/CN114677504B/en
Publication of CN114677504A publication Critical patent/CN114677504A/en
Application granted granted Critical
Publication of CN114677504B publication Critical patent/CN114677504B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a target detection method, an apparatus, a device terminal and a readable storage medium. The target detection method preprocesses each training picture in a training set through an input end to obtain a preprocessed training set; performs feature extraction on each training picture in the preprocessed training set based on a feature extraction unit to obtain intermediate feature maps of different scales; according to the size of each intermediate feature map, obtains at least two attention subunits to respectively perform feature extraction on each intermediate feature map and obtain the corresponding attention extraction feature maps; respectively performs feature merging on each intermediate feature map and its corresponding attention extraction feature map to obtain each target feature map; detects each target feature map through a prediction output unit to generate corresponding predicted values; and performs loss function calculation according to the corresponding predicted values to generate a corresponding target detection model. The target detection method improves the accuracy of target detection as a whole.

Description

Target detection method, device, equipment terminal and readable storage medium
Technical Field
The present application relates to the field of image processing, and in particular, to a target detection method, apparatus, device terminal, and readable storage medium.
Background
With the wide application of deep convolutional neural networks in the field of computer vision, real-time target detection models represented by the YOLO algorithm have achieved good detection results in industrial fields and practical application scenes.
The YOLOv5-Lite model improves on the previous-generation YOLOv4: it trains faster and has a smaller model size, which facilitates rapid deployment of the model.
In practical applications, a shooting scene produces large numbers of targets of various sizes, both near and far and in complex environments; however, existing models cannot perform targeted feature extraction and collection for targets of each size separately, so the overall target detection accuracy is not high.
Disclosure of Invention
In view of this, the present application provides a target detection method, an apparatus, a device terminal, and a readable storage medium, which overcome the drawback that the YOLOv5-Lite model cannot perform targeted feature extraction and collection separately when detecting targets of various sizes, and improve the overall detection accuracy of the YOLOv5-Lite model.
A target detection method is applied to a YOLOv5-Lite network, the YOLOv5-Lite network comprises an input end, a feature extraction unit, an attention unit and a prediction output unit which are sequentially connected, the attention unit comprises a plurality of different attention subunits, and the target detection method comprises the following steps:
acquiring picture input data as a training set;
preprocessing each training picture in the training set through an input end to obtain a preprocessed training set;
extracting the features of each training picture in the preprocessed training set based on a feature extraction unit to obtain intermediate feature maps with different scales;
according to the size of each intermediate feature map, at least two attention subunits are obtained to respectively perform feature extraction on each intermediate feature map so as to obtain an attention extraction feature map corresponding to each intermediate feature map;
respectively carrying out feature combination on each intermediate feature map and the attention extraction feature maps corresponding to the intermediate feature maps to obtain each target feature map;
respectively detecting each target feature map through a prediction output unit to generate corresponding predicted values;
and calculating a loss function according to the corresponding predicted value to obtain an optimized gradient, and updating the weight and the bias until the loss function is converged to generate a corresponding target detection model.
In one embodiment, the target detection method further comprises:
acquiring picture input data as a test set;
and testing the test set according to the target detection model, and outputting a corresponding target detection result.
In one embodiment, the feature extraction unit includes a Backbone unit and a Neck unit which are connected in sequence, the Backbone unit is connected to the input end, the output end of the Neck unit is connected to the attention unit, and the step of performing feature extraction on each training picture in the preprocessed training set based on the feature extraction unit to obtain intermediate feature maps of different scales includes:
performing slicing operation and convolution operation on each training picture in the preprocessed training set based on a backbone unit to obtain an initial feature map;
and performing secondary feature extraction on the initial feature map based on a Neck unit to obtain intermediate feature maps with different scales.
In one embodiment, the attention unit includes a first attention subunit and a second attention subunit, the intermediate feature maps have three scales, and the step of obtaining at least two attention subunits according to the size of each intermediate feature map to respectively perform feature extraction on each intermediate feature map to obtain an attention extraction feature map corresponding to each intermediate feature map includes:
performing feature extraction on the intermediate feature map of the first scale through a first attention subunit to obtain a corresponding first attention extraction feature map;
and respectively extracting the features of the intermediate feature maps in the second scale and the third scale through a second attention subunit to obtain a second attention extraction feature map and a third attention extraction feature map, wherein the first scale, the second scale and the third scale are sequentially reduced.
In one embodiment, the first attention subunit is a compression and excitation module and the second attention subunit is a convolution block attention module.
In one embodiment, the attention unit includes a first attention subunit, a second attention subunit, and a third attention subunit, the intermediate feature maps have three scales, and the step of obtaining at least two attention subunits according to the size of each intermediate feature map to respectively perform feature extraction on each intermediate feature map to obtain an attention extraction feature map corresponding to each intermediate feature map includes:
performing feature extraction on the intermediate feature map of the first scale through a first attention subunit to obtain a corresponding first attention extraction feature map;
performing feature extraction on the intermediate feature map of the second scale through a second attention subunit to obtain a second attention extraction feature map;
and performing feature extraction on the intermediate feature map of the third scale through a third attention subunit to obtain a third attention extraction feature map, wherein the first scale, the second scale and the third scale are sequentially reduced.
In an embodiment, a batch normalization layer is further connected between the feature extraction unit and the attention unit, and the step of obtaining at least two attention subunits to respectively perform feature extraction on each intermediate feature map according to the size of each intermediate feature map to obtain an attention extraction feature map corresponding to each intermediate feature map further includes:
respectively normalizing the intermediate feature maps of different scales based on the batch normalization layer, and adjusting the weight of each channel in the intermediate feature map of each size by using a preset dynamic adjustment factor, so as to obtain normalized intermediate feature maps of different scales.
In one embodiment, the formula employed in the normalization process is:

y_i = γ_i · (x_i − μ_B) / √(σ_B² + ε) + β_i

wherein y_i represents the normalized intermediate feature map corresponding to the ith channel, m represents the number of channels of each input intermediate feature map, γ_i represents the preset dynamic adjustment factor corresponding to the ith channel, x_i represents the input intermediate feature map corresponding to the ith channel, μ_B represents the mean of the input m-channel intermediate feature maps, σ_B² represents the overall variance of the input m-channel intermediate feature maps, and ε and β_i both represent constants.
In one embodiment, the loss function is:

L = Σ_(x,y) l(f(x), y) + λ Σ_j g(γ_j), with g(γ) = |γ|

wherein L represents the overall loss function value of the YOLOv5-Lite network, λ represents a penalty factor, x represents the input target feature map, f(x) represents the predicted value, y represents the corresponding real value, l(f(x), y) represents the loss function value for x and y, γ represents the weight corresponding to each channel, g(γ) represents performing absolute-value summation on the weights γ with the L1 norm, i and j each represent positive integer variables, γ_i represents the preset dynamic adjustment factor corresponding to the ith channel, and γ_j represents the jth preset dynamic adjustment factor.
In addition, an object detection device is provided, which is applied to a YOLOv5-Lite network, the YOLOv5-Lite network comprises an input end, a feature extraction unit, an attention unit and a prediction output unit which are connected in sequence, the attention unit comprises a plurality of different attention subunits, and the object detection device comprises:
the training set generation module is used for acquiring picture input data as a training set;
the preprocessing module is used for preprocessing each training picture in the training set through an input end to obtain a preprocessed training set;
the first feature map generation module is used for extracting features of each training picture in the preprocessed training set based on the feature extraction unit so as to obtain intermediate feature maps with different scales;
the second feature map generation module is used for acquiring at least two attention subunits to respectively perform feature extraction on each intermediate feature map according to the size of each intermediate feature map so as to obtain an attention extraction feature map corresponding to each intermediate feature map;
the target feature map generation module is used for respectively performing feature merging on each intermediate feature map and the attention extraction feature maps corresponding to the intermediate feature maps to obtain each target feature map;
the prediction value generation module is used for respectively detecting each target characteristic diagram through the prediction output unit so as to generate a corresponding prediction value;
and the detection model generation module is used for calculating a loss function according to the corresponding predicted value to obtain an optimized gradient, and updating the weight and the bias until the loss function is converged to generate a corresponding target detection model.
In addition, an apparatus terminal is provided, which includes a processor and a memory, the memory is used for storing a computer program, and the processor runs the computer program to make the apparatus terminal execute the above object detection method.
Furthermore, a readable storage medium is provided, which stores a computer program, which when executed by a processor implements the above object detection method.
The above target detection method is applied to a YOLOv5-Lite network that comprises an input end, a feature extraction unit, an attention unit and a prediction output unit connected in sequence, the attention unit comprising a plurality of different attention subunits. The method acquires picture input data as a training set, preprocesses each training picture in the training set through the input end to obtain a preprocessed training set, performs feature extraction on each training picture in the preprocessed training set based on the feature extraction unit to obtain intermediate feature maps of different scales, obtains at least two attention subunits according to the size of each intermediate feature map to respectively perform feature extraction on each intermediate feature map and obtain the corresponding attention extraction feature maps, and respectively performs feature merging on each intermediate feature map and its corresponding attention extraction feature map. Because at least two attention subunits are used to extract features from the intermediate feature maps, the resulting target detection model can, when detecting targets of each size in an image, extract the corresponding feature information through the attention subunit matched to the size of each intermediate feature map; that is, targeted feature extraction and collection are performed for targets of each size. Meanwhile, each intermediate feature map is merged with its corresponding attention extraction feature map to obtain each target feature map: the attention extraction feature map extracts more information from the original intermediate feature map, while the original intermediate feature map is retained, so merging the information of the two feature maps yields more useful feature information and further improves the detection accuracy for targets of all sizes as a whole.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic application environment diagram of a target detection method provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a target detection method according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of another target detection method provided in the embodiments of the present application;
FIG. 4 is a schematic flowchart of a method for obtaining intermediate feature maps of different scales according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an attention unit according to an embodiment of the present disclosure;
fig. 6 is a flowchart illustrating a method for obtaining attention extraction feature maps corresponding to respective intermediate feature maps according to an embodiment of the present application;
FIG. 7 is a block diagram of another attention unit configuration provided in an embodiment of the present application;
fig. 8 is a schematic flowchart of another method for obtaining attention extraction feature maps corresponding to respective intermediate feature maps according to an embodiment of the present application;
FIG. 9 is a schematic diagram of an application environment of another target detection method provided in an embodiment of the present application;
FIG. 10 is a schematic flow chart diagram illustrating a further method for detecting an object according to an embodiment of the present disclosure;
fig. 11 is a block diagram of a target detection apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. The embodiments described below and their technical features may be combined with each other without conflict.
As shown in fig. 1, an application environment schematic diagram of an object detection method is provided, fig. 1 is a schematic structural block diagram of a YOLOv5-Lite network, the YOLOv5-Lite network includes an input end 11, a feature extraction unit 12, an attention unit 13 and a prediction output unit 14, which are connected in sequence, and the attention unit 13 includes a plurality of different attention subunits.
As shown in fig. 2, there is provided an object detection method including:
step S110, acquiring picture input data as a training set.
When the target detection is performed, a training set needs to be established to obtain a target detection model, and a large amount of picture input data needs to be acquired as the training set.
And step S120, preprocessing each training picture in the training set through the input end to obtain a preprocessed training set.
Each training picture in the training set needs to be further preprocessed, because many of the captured pictures in the picture input data have not yet been labeled. In addition, the preprocessing may include at least one of data enhancement, adaptive anchor frame calculation and adaptive picture scaling, so as to obtain the preprocessed training set.
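As an illustration of the adaptive picture scaling mentioned above, the following is a minimal sketch assuming an OpenCV-based pipeline; the target size, padding color and stride values are illustrative assumptions and are not specified by the patent.

```python
import cv2

def letterbox(img, new_size=640, color=(114, 114, 114), stride=32):
    """Resize while keeping the aspect ratio, then pad to a stride-aligned rectangle."""
    h, w = img.shape[:2]
    r = min(new_size / h, new_size / w)                  # scale ratio
    new_w, new_h = int(round(w * r)), int(round(h * r))
    pad_w = (new_size - new_w) % stride                  # minimal rectangular padding
    pad_h = (new_size - new_h) % stride
    img = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    top, bottom = pad_h // 2, pad_h - pad_h // 2
    left, right = pad_w // 2, pad_w - pad_w // 2
    return cv2.copyMakeBorder(img, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=color)
```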
And step S130, extracting the features of each training picture in the preprocessed training set based on the feature extraction unit to obtain intermediate feature maps with different scales.
The YOLOv5-Lite network generally comprises a plurality of feature extraction units, and the YOLOv5-Lite network performs feature extraction on each training picture in the preprocessed training set through the plurality of feature extraction units to obtain intermediate feature maps with different scales.
Step S140, at least two attention subunits are obtained according to the size of each intermediate feature map to perform feature extraction on each intermediate feature map respectively, so as to obtain an attention extraction feature map corresponding to each intermediate feature map.
According to the size of each intermediate feature map, a suitable corresponding attention subunit is adopted to perform feature extraction on that intermediate feature map, so as to obtain the attention extraction feature map corresponding to each intermediate feature map.
In an embodiment, three intermediate feature maps with different scales are obtained, at this time, according to the size of each intermediate feature map, at least two attention subunits may be obtained to perform feature extraction on each intermediate feature map respectively, so as to obtain an attention extraction feature map corresponding to each intermediate feature map, where one attention subunit is used to perform feature extraction on the intermediate feature map of one scale, and the other attention subunit is used to perform feature extraction on the intermediate feature maps of the remaining two scales.
In this embodiment, corresponding attention subunits are respectively adopted to perform feature extraction on each intermediate feature graph with different scales, so that when the target detection model detects targets with various sizes in a picture, corresponding feature information can be respectively extracted through the corresponding attention subunits according to the sizes of the intermediate feature graphs, that is, feature extraction and collection can be respectively performed on the targets with various sizes in a targeted manner.
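By way of illustration only, the following PyTorch-style sketch shows one way such size-based routing could be organized; the class name AttentionRouter and the idea of passing the subunits in as constructor arguments are assumptions made for this sketch, not structures taken from the patent.

```python
import torch.nn as nn

class AttentionRouter(nn.Module):
    """Route each intermediate feature map to the attention subunit chosen for its scale."""
    def __init__(self, subunit_first, subunit_second, subunit_third):
        super().__init__()
        # In the two-subunit embodiment described below, subunit_second and subunit_third
        # would be two instances of the same module type (e.g. CBAM), one per channel count.
        self.subunit_first = subunit_first
        self.subunit_second = subunit_second
        self.subunit_third = subunit_third

    def forward(self, feats):
        # feats is ordered from the first (largest) scale to the third (smallest) scale
        f1, f2, f3 = feats
        return (self.subunit_first(f1),
                self.subunit_second(f2),
                self.subunit_third(f3))
```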
And step S150, respectively carrying out feature combination on each intermediate feature map and the attention extraction feature maps corresponding to the intermediate feature maps to obtain each target feature map.
On the one hand, the attention extraction feature map extracts more information from the original intermediate feature map; on the other hand, the original intermediate feature map is retained. Merging the information of the two feature maps therefore yields more useful feature information, which further improves the detection accuracy for targets of all sizes as a whole.
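The patent does not state which merge operator is used, so the sketch below assumes channel concatenation followed by a 1x1 convolution that restores the original channel count; treat it as one possible reading of the feature-merging step.

```python
import torch
import torch.nn as nn

class FeatureMerge(nn.Module):
    """Merge an intermediate feature map with its attention extraction feature map."""
    def __init__(self, channels):
        super().__init__()
        # a 1x1 convolution restores the original channel count after concatenation
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, intermediate, attention_out):
        # keep both the original map and the attention-refined map, then fuse them
        return self.fuse(torch.cat([intermediate, attention_out], dim=1))
```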
In step S160, the prediction output unit detects each target feature map to generate a corresponding prediction value.
The prediction output unit generally corresponds to the Head part of the YOLOv5-Lite network.
And S170, calculating a loss function according to the corresponding predicted value to obtain an optimized gradient, and updating the weight and the bias until the loss function is converged to generate a corresponding target detection model.
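A minimal training-loop sketch for step S170 is given below, assuming a PyTorch model; the optimizer, learning rate, epoch count and convergence tolerance are illustrative assumptions rather than values from the patent.

```python
import torch

def train(model, loader, criterion, epochs=100, lr=1e-3, tol=1e-4):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    prev_loss = float("inf")
    for epoch in range(epochs):
        running = 0.0
        for images, targets in loader:
            preds = model(images)              # predicted values for each target feature map
            loss = criterion(preds, targets)   # overall loss function value
            optimizer.zero_grad()
            loss.backward()                    # optimized gradient
            optimizer.step()                   # weight and bias update
            running += loss.item()
        running /= max(len(loader), 1)
        if abs(prev_loss - running) < tol:     # simple convergence check
            break
        prev_loss = running
    return model
```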
The above target detection method is applied to a YOLOv5-Lite network that comprises an input end, a feature extraction unit, an attention unit and a prediction output unit connected in sequence, the attention unit comprising a plurality of different attention subunits. By obtaining at least two attention subunits to respectively perform feature extraction on each intermediate feature map and obtain the corresponding attention extraction feature maps, the resulting target detection model can, when detecting targets of various sizes in a picture, extract the corresponding feature information through the attention subunit matched to the size of each intermediate feature map; that is, targeted feature extraction and collection are performed for targets of each size. Meanwhile, each intermediate feature map is merged with its corresponding attention extraction feature map: the attention extraction feature map extracts more information from the original intermediate feature map, while the original intermediate feature map is retained, so combining the information of the two feature maps yields more useful feature information and further improves the detection accuracy for targets of all sizes as a whole.
In one embodiment, as shown in fig. 3, the target detection method further includes:
and step S180, acquiring picture input data as a test set.
And S190, testing the test set according to the target detection model, and outputting a corresponding target detection result.
In one embodiment, as shown in fig. 1, the feature extraction unit 12 includes a backbone unit and a Neck unit connected in sequence, the backbone unit is connected to the input end 11, and the output end of the Neck unit is connected to the attention unit 13, as shown in fig. 4, and the step S130 includes:
step S132, based on the backbone unit, slicing operation and convolution operation are carried out on each training picture in the preprocessed training set, so as to obtain an initial feature map.
And S134, performing secondary feature extraction on the initial feature map based on the Neck unit to obtain intermediate feature maps with different scales.
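For reference, the slicing-plus-convolution idea of step S132 resembles the Focus layer used in YOLOv5-family backbones; the sketch below shows that layer in PyTorch, with channel counts chosen as assumptions, and the actual YOLOv5-Lite Backbone may be organized differently.

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Slice the input into four sub-images and fuse them with one convolution."""
    def __init__(self, in_ch=3, out_ch=32, k=3):
        super().__init__()
        # slicing turns (C, H, W) into (4C, H/2, W/2); the convolution then mixes channels
        self.conv = nn.Conv2d(4 * in_ch, out_ch, k, stride=1, padding=k // 2)

    def forward(self, x):
        sliced = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                            x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(sliced)
```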
In one embodiment, as shown in fig. 5, the attention unit 13 includes a first attention subunit 13a and a second attention subunit 13b, as shown in fig. 6, step S140 includes:
in step S141, feature extraction is performed on the intermediate feature map of the first scale through the first attention subunit to obtain a corresponding first attention extraction feature map.
And step S142, respectively performing feature extraction on the intermediate feature maps of the second scale and the third scale through a second attention subunit to obtain a second attention extraction feature map and a third attention extraction feature map, wherein the first scale, the second scale and the third scale are sequentially reduced.
In this embodiment, the first attention subunit performs feature extraction on the intermediate feature map with the largest scale (i.e., the intermediate feature map of the first scale), and the second attention subunit performs feature extraction on the intermediate feature maps with smaller scales, so that more feature information can be extracted from the smaller-scale intermediate feature maps. In other words, the corresponding feature information is extracted through the attention subunit matched to the size of each intermediate feature map, so that targeted feature extraction and collection are performed for targets of each size, and the overall detection accuracy for targets of all sizes is improved.
In one embodiment, the first attention subunit is a compression and excitation module and the second attention subunit is a convolution block attention module.
The compression and excitation module is the SE (Squeeze-and-Excitation) module, and the convolution block attention module is the CBAM (Convolutional Block Attention Module).
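The following PyTorch sketches of an SE block and a CBAM block illustrate the two attention subunits named above; the reduction ratio and spatial kernel size are illustrative assumptions, and the CBAM channel-attention branch is simplified here to reuse the SE block.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight channels using globally pooled statistics."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x).view(x.size(0), -1, 1, 1)   # per-channel attention weights
        return x * w

class CBAM(nn.Module):
    """Convolutional block attention: channel attention followed by spatial attention."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.channel_att = SEBlock(channels, reduction)   # simplified channel branch
        self.spatial_att = nn.Sequential(
            nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2), nn.Sigmoid())

    def forward(self, x):
        x = self.channel_att(x)
        # spatial attention map from channel-wise max and mean projections
        s = torch.cat([x.max(dim=1, keepdim=True)[0],
                       x.mean(dim=1, keepdim=True)], dim=1)
        return x * self.spatial_att(s)
```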
In one embodiment, as shown in fig. 7, the attention unit includes a first attention subunit 13a, a second attention subunit 13b and a third attention subunit 13c, as shown in fig. 8, and step S140 includes:
and step S143, performing feature extraction on the intermediate feature map of the first scale through the first attention subunit to obtain a corresponding first attention extraction feature map.
And step S144, performing feature extraction on the intermediate feature map of the second scale through a second attention subunit to obtain a second attention extraction feature map.
And S145, performing feature extraction on the intermediate feature map of the third scale through a third attention subunit to obtain a third attention extraction feature map, wherein the first scale, the second scale and the third scale are sequentially reduced.
In this embodiment, the first attention subunit performs feature extraction on the intermediate feature map with the largest scale (that is, the intermediate feature map of the first scale), the second attention subunit performs feature extraction on the smaller intermediate feature map of the second scale, and the third attention subunit performs feature extraction on the still smaller intermediate feature map of the third scale, so that more feature information can be extracted from the smaller-scale intermediate feature maps. In other words, the corresponding feature information is extracted through the attention subunit matched to the size of each intermediate feature map, so that targeted feature extraction and collection are performed for targets of each size, and the overall detection accuracy for targets of all sizes is improved.
In one embodiment, as shown in fig. 9, a batch normalization layer 15 is further connected between the feature extraction unit 12 and the attention unit 13, and as shown in fig. 10, step S140 further includes:
and S200, respectively carrying out standardization processing on the intermediate characteristic diagrams with different scales based on the batch standardization layer, and adjusting the weight of each channel in the intermediate characteristic diagram with each size by adopting a preset dynamic adjustment factor to obtain the standardized intermediate characteristic diagrams with different scales.
In this embodiment, the intermediate feature maps are normalized by the batch normalization layer, and a preset dynamic adjustment factor is added. The preset dynamic adjustment factor reflects the degree of information change in each intermediate feature map, that is, the variance of the batch normalization layer. The variance reflects the degree of information change: the larger the variance, the greater the degree of information change, the richer the information, and the higher the importance; conversely, the smaller the variance, the smaller the degree of information change and the lower the importance. By providing the batch normalization layer, the subsequent attention unit can therefore extract the feature map information better.
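To make the mechanism explicit, the sketch below re-implements batch normalization with an explicit per-channel scale parameter playing the role of the preset dynamic adjustment factor; in practice nn.BatchNorm2d already exposes such a factor as its weight attribute, so this block is illustrative only.

```python
import torch
import torch.nn as nn

class AdjustableBN(nn.Module):
    """Batch normalization with an explicit per-channel adjustment (scale) factor."""
    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(channels))    # per-channel adjustment factor
        self.beta = nn.Parameter(torch.zeros(channels))
        self.eps = eps

    def forward(self, x):
        mean = x.mean(dim=(0, 2, 3), keepdim=True)                   # per-channel mean
        var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)     # per-channel variance
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return self.gamma.view(1, -1, 1, 1) * x_hat + self.beta.view(1, -1, 1, 1)
```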
In the process of performing the subsequent steps S140 to S150, the normalized intermediate feature maps with different scales need to be processed, and the steps S160 to S170 are not changed, as shown in fig. 10, that is:
step S140, at least two attention subunits are obtained to respectively perform feature extraction on each normalized intermediate feature map according to the size of each normalized intermediate feature map, so as to obtain an attention extraction feature map corresponding to each normalized intermediate feature map.
And step S150, respectively carrying out feature merging on each normalized intermediate feature map and the attention extraction feature maps corresponding to each normalized intermediate feature map to obtain each target feature map.
In one embodiment, the formula employed in the normalization process is:

y_i = γ_i · (x_i − μ_B) / √(σ_B² + ε) + β_i

wherein y_i represents the normalized intermediate feature map corresponding to the ith channel, m represents the number of channels of each input intermediate feature map, γ_i represents the preset dynamic adjustment factor corresponding to the ith channel, x_i represents the input intermediate feature map corresponding to the ith channel, μ_B represents the mean of the input m-channel intermediate feature maps, σ_B² represents the overall variance of the input m-channel intermediate feature maps, and ε and β_i both represent constants.
In one embodiment, the loss function is:

L = Σ_(x,y) l(f(x), y) + λ Σ_j g(γ_j), with g(γ) = |γ|

wherein L represents the overall loss function value of the YOLOv5-Lite network, λ represents a penalty factor, x represents the input target feature map, f(x) represents the predicted value, y represents the corresponding real value, l(f(x), y) represents the loss function value for x and y, γ represents the weight corresponding to each channel, g(γ) represents performing absolute-value summation on the weights γ with the L1 norm, i and j each represent positive integer variables, γ_i represents the preset dynamic adjustment factor corresponding to the ith channel, and γ_j represents the jth preset dynamic adjustment factor.
On the basis of the embodiment shown in fig. 8, a batch normalization layer is arranged, so that the overall loss function of the YOLOv5-Lite network comprises the penalty term λ Σ_j g(γ_j) on the preset dynamic adjustment factors, and the loss function can therefore be adjusted, improving the accuracy of the overall target detection.
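As a hedged illustration of a loss of this form, the helper below adds an L1 penalty on the batch-normalization scale factors to an ordinary detection loss; the penalty value and the assumption that the factors live in nn.BatchNorm2d layers are choices made for the sketch, not details fixed by the patent.

```python
import torch
import torch.nn as nn

def total_loss(det_loss, model, penalty=1e-4):
    """det_loss: detection loss l(f(x), y); penalty: the factor applied to the L1 term."""
    l1_term = torch.zeros((), device=det_loss.device)
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            l1_term = l1_term + m.weight.abs().sum()   # sum of |gamma| over channels
    return det_loss + penalty * l1_term
```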
In addition, as shown in fig. 11, there is also provided an object detection apparatus 300 applied to the YOLOv5-Lite network shown in fig. 1, the object detection apparatus 300 including:
a training set generating module 310, configured to obtain picture input data as a training set;
the preprocessing module 320 is used for preprocessing each training picture in the training set through an input end to obtain a preprocessed training set;
a first feature map generation module 330, configured to perform feature extraction on each training picture in the preprocessed training set based on the feature extraction unit to obtain intermediate feature maps of different scales;
a second feature map generation module 340, configured to obtain at least two attention subunits according to the size of each intermediate feature map, and perform feature extraction on each intermediate feature map respectively to obtain an attention extraction feature map corresponding to each intermediate feature map;
the target feature map generation module 350 is configured to perform feature merging on each intermediate feature map and the attention extraction feature maps corresponding to the intermediate feature maps, so as to obtain each target feature map;
and the predicted value generation module 360 detects each target feature map through the prediction output unit to generate corresponding predicted values.
And the detection model generation module 370 performs loss function calculation according to the corresponding predicted value to obtain an optimized gradient, and performs weight and bias updating until the loss function converges to generate a corresponding target detection model.
In addition, an apparatus terminal is provided, which includes a processor and a memory, the memory is used for storing a computer program, and the processor runs the computer program to make the apparatus terminal execute the above object detection method.
Furthermore, a readable storage medium is provided, which stores a computer program, which when executed by a processor implements the above object detection method.
The division of the units in the device is only used for illustration, and in other embodiments, the device may be divided into different units as needed to complete all or part of the functions of the device. For the specific limitations of the above device, reference may be made to the limitations of the above method, which are not described herein again.
That is, the above description is only an embodiment of the present application, and not intended to limit the scope of the present application, and all equivalent structures or equivalent flow transformations made by using the contents of the specification and the drawings, such as mutual combination of technical features between various embodiments, or direct or indirect application to other related technical fields, are included in the scope of the present application.
In addition, structural elements having the same or similar characteristics may be identified by the same or different reference numerals. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
In this application, the word "for example" is used to mean "serving as an example, instance, or illustration". Any embodiment described herein as "for example" is not necessarily to be construed as preferred or advantageous over other embodiments. The previous description is provided to enable any person skilled in the art to make and use the present application. In the foregoing description, various details have been set forth for the purpose of explanation.
It will be apparent to one of ordinary skill in the art that the present application may be practiced without these specific details. In other instances, well-known structures and processes are not shown in detail to avoid obscuring the description of the present application with unnecessary detail. Thus, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.

Claims (12)

1. An object detection method applied to a YOLOv5-Lite network, the YOLOv5-Lite network comprising an input terminal, a feature extraction unit, an attention unit and a prediction output unit which are connected in sequence, the attention unit comprising a plurality of different attention sub-units, the object detection method comprising:
acquiring picture input data as a training set;
preprocessing each training picture in the training set through the input end to obtain a preprocessed training set;
extracting the features of each training picture in the preprocessed training set based on the feature extraction unit to obtain intermediate feature maps with different scales;
according to the size of each intermediate feature map, at least two attention subunits are obtained to respectively perform feature extraction on each intermediate feature map so as to obtain an attention extraction feature map corresponding to each intermediate feature map;
respectively carrying out feature combination on each intermediate feature map and the attention extraction feature maps corresponding to the intermediate feature maps to obtain each target feature map;
detecting each target feature map through the prediction output unit to generate corresponding predicted values; and
calculating a loss function according to the corresponding predicted values to obtain an optimized gradient, and updating the weights and biases until the loss function converges to generate a corresponding target detection model.
2. The object detection method according to claim 1, characterized in that the object detection method further comprises:
acquiring picture input data as a test set;
and testing the test set according to the target detection model, and outputting a corresponding target detection result.
3. The target detection method according to claim 1, wherein the feature extraction unit includes a Backbone unit and a Neck unit which are connected in sequence, the Backbone unit is connected to the input end, the output end of the Neck unit is connected to the attention unit, and the step of performing feature extraction on each training picture in the preprocessed training set based on the feature extraction unit to obtain intermediate feature maps of different scales includes:
performing a slicing operation and a convolution operation on each training picture in the preprocessed training set based on the Backbone unit to obtain an initial feature map;
and performing secondary feature extraction on the initial feature map based on the Neck unit to obtain intermediate feature maps with different scales.
4. The object detection method according to claim 1, wherein the attention unit includes a first attention subunit and a second attention subunit, the intermediate feature maps have three scales, and the step of obtaining at least two attention subunits according to the size of each intermediate feature map to respectively perform feature extraction on each intermediate feature map to obtain an attention extraction feature map corresponding to each intermediate feature map includes:
performing feature extraction on the intermediate feature map of the first scale through the first attention subunit to obtain a corresponding first attention extraction feature map;
and respectively extracting features of the intermediate feature maps of the second scale and the third scale through the second attention subunit to obtain a second attention extraction feature map and a third attention extraction feature map, wherein the first scale, the second scale and the third scale are sequentially reduced.
5. The method of claim 4, wherein the first attention subunit is a compression and excitation module and the second attention subunit is a convolution block attention module.
6. The object detection method according to claim 1, wherein the attention unit includes a first attention subunit, a second attention subunit, and a third attention subunit, the intermediate feature maps have three scales, and the step of obtaining at least two attention subunits to perform feature extraction on each intermediate feature map respectively according to the size of each intermediate feature map to obtain the attention extraction feature map corresponding to each intermediate feature map comprises:
performing feature extraction on the intermediate feature map of the first scale through the first attention subunit to obtain a corresponding first attention extraction feature map;
performing feature extraction on the intermediate feature map of the second scale through the second attention subunit to obtain a second attention extraction feature map;
and performing feature extraction on the intermediate feature map of the third scale through the third attention subunit to obtain a third attention extraction feature map, wherein the first scale, the second scale and the third scale are sequentially reduced.
7. The object detection method according to claim 1, wherein a batch normalization layer is further connected between the feature extraction unit and the attention unit, and the step of obtaining at least two attention subunits to respectively perform feature extraction on each intermediate feature map according to the size of each intermediate feature map to obtain the attention extraction feature map corresponding to each intermediate feature map further comprises:
respectively normalizing the intermediate feature maps of different scales based on the batch normalization layer, and adjusting the weight of each channel in the intermediate feature map of each size by using a preset dynamic adjustment factor, so as to obtain normalized intermediate feature maps of different scales.
8. The object detection method according to claim 7, wherein the formula employed in the normalization process is:

y_i = γ_i · (x_i − μ_B) / √(σ_B² + ε) + β_i

wherein y_i represents the normalized intermediate feature map corresponding to the ith channel, m represents the number of channels of each input intermediate feature map, γ_i represents the preset dynamic adjustment factor corresponding to the ith channel, x_i represents the input intermediate feature map corresponding to the ith channel, μ_B represents the mean of the input m-channel intermediate feature maps, σ_B² represents the overall variance of the input m-channel intermediate feature maps, and ε and β_i both represent constants.
9. The object detection method of claim 8, wherein the loss function is:

L = Σ_(x,y) l(f(x), y) + λ Σ_j g(γ_j), with g(γ) = |γ|

wherein L represents the overall loss function value of the YOLOv5-Lite network, λ represents a penalty factor, x represents the input target feature map, f(x) represents the predicted value, y represents the corresponding real value, l(f(x), y) represents the loss function value for x and y, γ represents the weight corresponding to each channel, g(γ) represents performing absolute-value summation on the weights γ with the L1 norm, i and j each represent positive integer variables, γ_i represents the preset dynamic adjustment factor corresponding to the ith channel, and γ_j represents the jth preset dynamic adjustment factor.
10. An object detection device applied to a YOLOv5-Lite network, the YOLOv5-Lite network comprising an input terminal, a feature extraction unit, an attention unit and a prediction output unit connected in sequence, the attention unit comprising a plurality of different attention sub-units, the object detection device comprising:
the training set generation module is used for acquiring picture input data as a training set;
the preprocessing module is used for preprocessing each training picture in the training set through the input end to obtain a preprocessed training set;
the first feature map generation module is used for extracting features of each training picture in the preprocessed training set based on a feature extraction unit so as to obtain intermediate feature maps with different scales;
the second feature map generation module is used for acquiring at least two attention subunits to respectively perform feature extraction on each intermediate feature map according to the size of each intermediate feature map so as to obtain an attention extraction feature map corresponding to each intermediate feature map;
the target feature map generation module is used for respectively carrying out feature combination on each intermediate feature map and the attention extraction feature maps corresponding to the intermediate feature maps to obtain each target feature map;
the prediction value generation module is used for respectively detecting each target characteristic diagram through the prediction output unit so as to generate a corresponding prediction value;
and the detection model generation module is used for calculating a loss function according to the corresponding predicted value to obtain an optimized gradient, and updating the weight and the bias until the loss function is converged to generate a corresponding target detection model.
11. A device terminal, characterized in that it comprises a processor and a memory for storing a computer program, the processor running the computer program to cause the device terminal to perform the object detection method of any one of claims 1 to 9.
12. A readable storage medium, characterized in that the readable storage medium stores a computer program which, when executed by a processor, implements the object detection method of any one of claims 1 to 9.
CN202210600445.8A 2022-05-30 2022-05-30 Target detection method, device, equipment terminal and readable storage medium Active CN114677504B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210600445.8A CN114677504B (en) 2022-05-30 2022-05-30 Target detection method, device, equipment terminal and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210600445.8A CN114677504B (en) 2022-05-30 2022-05-30 Target detection method, device, equipment terminal and readable storage medium

Publications (2)

Publication Number Publication Date
CN114677504A true CN114677504A (en) 2022-06-28
CN114677504B CN114677504B (en) 2022-11-15

Family

ID=82081145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210600445.8A Active CN114677504B (en) 2022-05-30 2022-05-30 Target detection method, device, equipment terminal and readable storage medium

Country Status (1)

Country Link
CN (1) CN114677504B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049851A (en) * 2022-08-15 2022-09-13 深圳市爱深盈通信息技术有限公司 Target detection method, device and equipment terminal based on YOLOv5 network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688723A (en) * 2021-08-21 2021-11-23 河南大学 Infrared image pedestrian target detection method based on improved YOLOv5
CN113920107A (en) * 2021-10-29 2022-01-11 西安工程大学 Insulator damage detection method based on improved yolov5 algorithm
CN114005105A (en) * 2021-12-30 2022-02-01 青岛以萨数据技术有限公司 Driving behavior detection method and device and electronic equipment
CN114220015A (en) * 2021-12-21 2022-03-22 一拓通信集团股份有限公司 Improved YOLOv 5-based satellite image small target detection method
CN114359851A (en) * 2021-12-02 2022-04-15 广州杰赛科技股份有限公司 Unmanned target detection method, device, equipment and medium
CN114494415A (en) * 2021-12-31 2022-05-13 北京建筑大学 Method for detecting, identifying and measuring gravel pile by automatic driving loader

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688723A (en) * 2021-08-21 2021-11-23 河南大学 Infrared image pedestrian target detection method based on improved YOLOv5
CN113920107A (en) * 2021-10-29 2022-01-11 西安工程大学 Insulator damage detection method based on improved yolov5 algorithm
CN114359851A (en) * 2021-12-02 2022-04-15 广州杰赛科技股份有限公司 Unmanned target detection method, device, equipment and medium
CN114220015A (en) * 2021-12-21 2022-03-22 一拓通信集团股份有限公司 Improved YOLOv 5-based satellite image small target detection method
CN114005105A (en) * 2021-12-30 2022-02-01 青岛以萨数据技术有限公司 Driving behavior detection method and device and electronic equipment
CN114494415A (en) * 2021-12-31 2022-05-13 北京建筑大学 Method for detecting, identifying and measuring gravel pile by automatic driving loader

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049851A (en) * 2022-08-15 2022-09-13 深圳市爱深盈通信息技术有限公司 Target detection method, device and equipment terminal based on YOLOv5 network
CN115049851B (en) * 2022-08-15 2023-01-17 深圳市爱深盈通信息技术有限公司 Target detection method, device and equipment terminal based on YOLOv5 network

Also Published As

Publication number Publication date
CN114677504B (en) 2022-11-15

Similar Documents

Publication Publication Date Title
CN110210560B (en) Incremental training method, classification method and device, equipment and medium of classification network
WO2020098250A1 (en) Character recognition method, server, and computer readable storage medium
CN110188829B (en) Neural network training method, target recognition method and related products
CN108846404B (en) Image significance detection method and device based on related constraint graph sorting
CN109413510B (en) Video abstract generation method and device, electronic equipment and computer storage medium
CN111914908B (en) Image recognition model training method, image recognition method and related equipment
CN110929836B (en) Neural network training and image processing method and device, electronic equipment and medium
CN112257738A (en) Training method and device of machine learning model and classification method and device of image
CN114241505B (en) Method and device for extracting chemical structure image, storage medium and electronic equipment
CN111738270B (en) Model generation method, device, equipment and readable storage medium
CN111179270A (en) Image co-segmentation method and device based on attention mechanism
CN111931867B (en) New coronary pneumonia X-ray image classification method and system based on lightweight model
CN115240280A (en) Construction method of human face living body detection classification model, detection classification method and device
CN114677504B (en) Target detection method, device, equipment terminal and readable storage medium
CN115937571A (en) Device and method for detecting sphericity of glass for vehicle
CN113112518A (en) Feature extractor generation method and device based on spliced image and computer equipment
CN114267089B (en) Method, device and equipment for identifying forged image
CN115937596A (en) Target detection method, training method and device of model thereof, and storage medium
CN115205547A (en) Target image detection method and device, electronic equipment and storage medium
CN109671055A (en) Pulmonary nodule detection method and device
CN110135428B (en) Image segmentation processing method and device
CN111814846A (en) Training method and recognition method of attribute recognition model and related equipment
CN112446428B (en) Image data processing method and device
CN111179245B (en) Image quality detection method, device, electronic equipment and storage medium
CN115049851B (en) Target detection method, device and equipment terminal based on YOLOv5 network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230704

Address after: 13C-18, Caihong Building, Caihong Xindu, No. 3002, Caitian South Road, Gangsha Community, Futian Street, Futian District, Shenzhen, Guangdong 518033

Patentee after: Core Computing Integrated (Shenzhen) Technology Co.,Ltd.

Address before: 518000 1001, building G3, TCL International e city, Shuguang community, Xili street, Nanshan District, Shenzhen City, Guangdong Province

Patentee before: Shenzhen Aishen Yingtong Information Technology Co.,Ltd.
