CN114842320A - Robot target detection method and system based on DW-SEnet model - Google Patents

Robot target detection method and system based on DW-SEnet model Download PDF

Info

Publication number
CN114842320A
CN114842320A
Authority
CN
China
Prior art keywords
feature
target detection
image
compression
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210265173.0A
Other languages
Chinese (zh)
Inventor
张洪 (Zhang Hong)
于源卓 (Yu Yuanzhuo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN202210265173.0A priority Critical patent/CN114842320A/en
Publication of CN114842320A publication Critical patent/CN114842320A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention relates to a robot target detection method based on a DW-SEnet model, which comprises: acquiring image information of a target to be detected and preprocessing it to obtain an input image; performing a depth separable convolution DW operation on the input image to extract image features; during forward propagation, using a compression excitation module SE to calculate importance weights for the different feature channels of the image features, and suppressing or enhancing the features of each channel according to these weights to obtain feature images; constructing a plurality of candidate detection frames from the feature images, locating the target to be detected, and classifying the images within the candidate detection frames according to their features to obtain a classification result; and eliminating redundant candidate detection frames through non-maximum suppression and labeling the classification results to obtain the target detection result. The target detection method provided by the invention has fewer model parameters, a higher detection speed and higher accuracy in image target detection.

Description

Robot target detection method and system based on DW-SEnet model
Technical Field
The invention relates to the technical field of robot vision detection, in particular to a robot target detection method and system based on a DW-SEnet model.
Background
Mobile robots are ubiquitous in daily life, and the demands on their intelligence keep growing. The successive appearance of various cameras has made robotic target detection possible. Target detection means finding the exact position of an object in an image and classifying it by some method.
Target detection for mobile robots differs from conventional target detection: most mobile robots do not have high-performance processors, and many even use a single-chip microcomputer as the processor. Networks with overly complex parameters are therefore unsuitable for target detection on mobile robots. Research in this area has mostly used a PC as the host computer for image processing and has concentrated on improving detection precision, but given the limited processing capacity of most mobile robots, a lightweight algorithm model is crucial. Among one-stage detectors, the YOLO series and the SSD algorithm are widely applied to mobile robots, and the SSD algorithm performs particularly well in terms of detection time and model size. The SSD algorithm uses a single neural network: the picture is fed in, bounding boxes are generated by the network, and the images inside them are classified. The SSD network is formed by adding extra layers to a modified VGG16 deep learning backbone. The fully connected layers of VGG16 are not needed for target detection, so they are removed and replaced with new convolutional layers for additional feature extraction. A 300 × 300 image is propagated forward through the network; as successive convolutions continuously reduce the feature-map size, bounding boxes of different but fixed sizes are generated, the network classifies the picture inside each bounding box, and redundant boxes are finally removed to produce the detection result. Because the SSD target detection algorithm has many network parameters and its classification network has a relatively high error rate, both detection speed and detection accuracy still need improvement.
Therefore, a mobile robot target detection method with few network parameters, short detection time and high detection accuracy is urgently needed.
Disclosure of Invention
The technical problem to be solved by the invention is therefore to overcome the above problems in the prior art by providing a robot target detection method and system based on a DW-SEnet model. DW convolution replaces conventional convolution, which reduces the computation and the number of parameters required and effectively increases the computation speed. The compression excitation module SE calculates weights for the different feature channels, and image features are suppressed or promoted according to these weights; suppressing unimportant features effectively improves the accuracy and speed of image classification, thereby effectively shortening the detection time and improving the detection accuracy.
In order to solve the technical problem, the invention provides a robot target detection method based on a DW-SEnet model, which comprises the following steps:
S1: acquiring image information of a target to be detected, and preprocessing the image information to obtain an input image;
S2: performing a depth separable convolution DW operation on the input image to extract image features;
S3: during forward propagation, using a compression excitation module SE to calculate importance weights for the different feature channels of the image features, and suppressing or enhancing the features of each channel according to these weights to obtain feature images;
S4: constructing a plurality of candidate detection frames from the feature images, locating the target to be detected, and classifying the images within the candidate detection frames according to their features to obtain a classification result;
S5: eliminating redundant candidate detection frames through non-maximum suppression, and labeling the classification result to obtain a target detection result.
In one embodiment of the present invention, in S1, an RGB image of the object to be detected is taken by the mobile robot using an RGB-D vision camera.
In one embodiment of the invention, the depth separable convolution DW and the compression excitation module SE constitute a DW-SEnet detection model.
In one embodiment of the invention, the detection model comprises 4 convolutional Block networks of the same type, and an Inception structure is added between the different Block network modules.
In an embodiment of the present invention, the Inception structure includes the compression excitation module SE, the compression excitation module SE includes a compression unit, and the method for performing the compression operation by the compression unit includes:
compressing the features along the spatial dimension, so that each two-dimensional feature channel is turned into a real number; this real number has a global receptive field and characterizes the global distribution of responses on that feature channel, so that layers close to the input also obtain a global receptive field. The compression formula is:
z_c = (1 / (w × h)) Σ_{i=1}^{w} Σ_{j=1}^{h} u_c(i, j)
where z_c is the global receptive field, w is the feature matrix length, h is the feature matrix width, u_c(i, j) is the convolution result of the c-th feature channel at position (i, j), i is the feature index along the length direction, and j is the feature index along the width direction.
In one embodiment of the present invention, the compression excitation module SE comprises an excitation unit, and the method for performing the excitation operation by the excitation unit includes:
generating a weight for each feature channel through a parameter w, where the parameter w is used to model the correlation between feature channels:
s = σ(g(z, w)) = σ(w_2 δ(w_1 z))
where s is the set of correlation weights of the different feature channels, σ(·) is the Sigmoid function, z is the result obtained by the compression unit, δ(·) is the ReLU activation function, and w_1 and w_2 are the coefficients to be learned by the fully connected layers;
and applying the output weights to each feature channel through a Reweight operation to complete the recalibration of the original features in the channel dimension.
In one embodiment of the present invention, the classification of the image in S4 employs a Softmax function.
In addition, the invention also provides a robot target detection system based on the DW-SEnet model, which comprises:
an image acquisition module, used for acquiring image information of a target to be detected and preprocessing the image information to obtain an input image;
a feature extraction module, used for performing a depth separable convolution DW operation on the input image to extract image features;
a weight excitation module, used for calculating, during forward propagation, importance weights for the different feature channels of the image features with a compression excitation module SE, and suppressing or enhancing the features of each channel according to these weights to obtain feature images;
a target positioning and classification module, used for constructing a plurality of candidate detection frames from the feature images, locating the target to be detected, and classifying the images within the candidate detection frames according to their features to obtain a classification result;
and a target detection module, used for eliminating redundant candidate detection frames through non-maximum suppression and labeling the classification results to obtain target detection results.
In an embodiment of the present invention, in the weight excitation module, the method for performing the compression operation by the compression excitation module SE includes:
compressing the features along the spatial dimension, so that each two-dimensional feature channel is turned into a real number; this real number has a global receptive field and characterizes the global distribution of responses on that feature channel, so that layers close to the input also obtain a global receptive field. The compression formula is:
z_c = (1 / (w × h)) Σ_{i=1}^{w} Σ_{j=1}^{h} u_c(i, j)
where z_c is the global receptive field, w is the feature matrix length, h is the feature matrix width, u_c(i, j) is the convolution result of the c-th feature channel at position (i, j), i is the feature index along the length direction, and j is the feature index along the width direction.
In an embodiment of the present invention, in the weight excitation module, the method for the compression excitation module SE to perform the excitation operation includes:
generating a weight for each feature channel through a parameter w, where the parameter w is used to model the correlation between feature channels:
s = σ(g(z, w)) = σ(w_2 δ(w_1 z))
where s is the set of correlation weights of the different feature channels, σ(·) is the Sigmoid function, z is the result obtained by the compression unit, δ(·) is the ReLU activation function, and w_1 and w_2 are the coefficients to be learned by the fully connected layers;
and applying the output weights to each feature channel through a Reweight operation to complete the recalibration of the original features in the channel dimension.
Compared with the prior art, the technical scheme of the invention has the following advantages:
the method is based on a ResNet model, and combines a depth separable convolution DW and a compression excitation module SE to construct a DW-SEnet detection model so as to realize the detection of the target; DW convolution is adopted to replace the traditional convolution, so that the calculation amount and parameters required by the operation are reduced, and the calculation speed is effectively improved; and weights of different characteristic channels are calculated through the compression excitation module SE, different image characteristics are inhibited or promoted according to the weights, unimportant characteristics are inhibited, and the accuracy and the speed of image classification are effectively improved, so that the detection time is effectively shortened, and the detection accuracy is improved.
Drawings
In order that the present disclosure may be more readily and clearly understood, reference will now be made in detail to the present disclosure, examples of which are illustrated in the accompanying drawings.
FIG. 1 is a schematic flow chart of the robot target detection method based on the DW-SEnet model according to the present invention.
FIG. 2 is a schematic diagram of a depth separable convolution DW in accordance with the present invention.
Fig. 3 is a schematic diagram of a compression excitation module SE according to the present invention.
Fig. 4 is a schematic diagram of the Re_SSD network structure in the present invention.
FIG. 5 is a schematic diagram of a detection model according to the present invention.
Detailed Description
The present invention is further described below in conjunction with the following figures and specific examples so that those skilled in the art may better understand the present invention and practice it, but the examples are not intended to limit the present invention.
Example one
Referring to fig. 1, an embodiment of the present invention provides a method for detecting a robot target based on a DW-SEnet model, including the following steps:
S1: acquiring image information of a target to be detected, and preprocessing the image information to obtain an input image;
S2: performing a depth separable convolution DW operation on the input image to extract image features;
S3: during forward propagation, using a compression excitation module SE to calculate importance weights for the different feature channels of the image features, and suppressing or enhancing the features of each channel according to these weights to obtain feature images;
S4: constructing a plurality of candidate detection frames from the feature images, locating the target to be detected, and classifying the images within the candidate detection frames according to their features to obtain a classification result;
S5: eliminating redundant candidate detection frames through non-maximum suppression, and labeling the classification result to obtain a target detection result.
In the robot target detection method based on the DW-SEnet model disclosed by the embodiment of the invention, the detection method is based on a ResNet model and combines the depth separable convolution DW and the compression excitation module SE to construct the DW-SEnet detection model.
The detection model comprises 4 convolutional Block networks of the same type, and an Inception structure is added between the different Block network modules.
In the robot target detection method based on the DW-SEnet model disclosed by the embodiment of the invention, the DW-SEnet detection model is constructed on the basis of a ResNet model by combining a depth separable convolution DW module and a compression excitation module SE, so as to detect the target. DW convolution replaces conventional convolution, which reduces the computation and the number of parameters required and effectively increases the computation speed. The compression excitation module SE calculates weights for the different feature channels, and image features are suppressed or promoted according to these weights; suppressing unimportant features effectively improves the accuracy and speed of image classification, thereby effectively shortening the detection time and improving the detection accuracy.
In the method for detecting a robot target based on the DW-SEnet model disclosed in the embodiment of the present invention, in implementation S1, the mobile robot uses an RGB-D vision camera to capture RGB images of the target to be detected, and preprocesses the RGB images to obtain input images whose size meets the requirements. The preprocessing operations include resizing and padding.
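As a concrete illustration of this preprocessing step, the sketch below resizes an RGB image and pads it onto a square canvas; the 300 × 300 target size (taken from the SSD input size mentioned in the background), the function name and the use of OpenCV are illustrative assumptions rather than details specified by the patent.

```python
import numpy as np
import cv2  # OpenCV is assumed available for image handling

def preprocess(rgb_image: np.ndarray, target_size: int = 300) -> np.ndarray:
    """Resize the longer side to `target_size` and pad the remainder (illustrative sketch)."""
    h, w = rgb_image.shape[:2]
    scale = target_size / max(h, w)
    resized = cv2.resize(rgb_image, (int(round(w * scale)), int(round(h * scale))))
    canvas = np.zeros((target_size, target_size, 3), dtype=resized.dtype)  # padded square canvas
    canvas[:resized.shape[0], :resized.shape[1]] = resized                 # paste into the top-left corner
    return canvas
```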
In the robot target detection method based on the DW-SEnet model disclosed in the embodiment of the present invention, for implementation S2, please refer to Fig. 2. Unlike the conventional convolution operation, in DW convolution one convolution kernel is responsible for one channel, and one channel is convolved by only one convolution kernel. In conventional convolution, each kernel operates on every channel of the input image simultaneously. For a 5 × 5 pixel, three-channel color input image (shape 5 × 5 × 3), DW convolution first performs a convolution that, unlike conventional convolution, is carried out entirely in a two-dimensional plane. The number of convolution kernels equals the number of channels of the previous layer (channels and kernels correspond one to one), so a three-channel image produces 3 feature images after this operation.
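The following PyTorch sketch illustrates the depthwise-then-pointwise structure described above; the kernel size, channel counts and class name are illustrative assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

class DWSeparableConv(nn.Module):
    """Depthwise separable convolution: per-channel 3x3 convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # groups=in_channels makes each kernel operate on exactly one input channel
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   padding=1, groups=in_channels, bias=False)
        # the 1x1 convolution mixes the per-channel outputs into out_channels feature maps
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# Example: a 5x5, three-channel input as in the description above
x = torch.randn(1, 3, 5, 5)
y = DWSeparableConv(3, 16)(x)  # -> shape (1, 16, 5, 5)
```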
In the robot target detection method based on the DW-SEnet model disclosed in the embodiment of the present invention, for implementation S3, please refer to Fig. 3. The input is x with c1 feature channels; after successive convolution and pooling, the number of channels of the feature matrix becomes c2, and a compression-excitation operation is then performed to obtain the importance of the different feature channels.
A compression operation is performed first: the features are compressed along the spatial dimension, so that each two-dimensional feature channel is turned into a real number. This real number has a global receptive field and characterizes the global distribution of responses on that feature channel, so that layers close to the input can also obtain a global receptive field. The compression formula is:
z_c = (1 / (w × h)) Σ_{i=1}^{w} Σ_{j=1}^{h} u_c(i, j)
where z_c is the global receptive field, w is the feature matrix length, h is the feature matrix width, u_c(i, j) is the convolution result of the c-th feature channel at position (i, j), i is the feature index along the length direction, and j is the feature index along the width direction.
Next, an excitation operation is performed; this is a mechanism similar to a gate in a recurrent neural network. A weight is generated for each feature channel through a parameter w, where w is learned to explicitly model the correlation between feature channels:
s = σ(g(z, w)) = σ(w_2 δ(w_1 z))
where s is the set of correlation weights of the different feature channels, σ(·) is the Sigmoid function, z is the result obtained by the compression unit, δ(·) is the ReLU activation function, and w_1 and w_2 are the coefficients to be learned by the fully connected layers; because two fully connected layers are used in the excitation operation, w is split into w_1 and w_2.
Finally, a Reweight operation is performed: the weights output by the excitation are the importance weights of each feature channel, and they are applied to each feature channel to complete the recalibration of the original features in the channel dimension.
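The compression, excitation and Reweight steps described above can be sketched in PyTorch as follows; the reduction ratio of 16 and the class name are assumptions made for illustration and are not specified by the patent.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze (global average pool), excitation (two fully connected layers), and reweight."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)           # z_c: average over the w x h plane per channel
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # w_1
            nn.ReLU(inplace=True),                       # delta
            nn.Linear(channels // reduction, channels),  # w_2
            nn.Sigmoid(),                                # sigma
        )

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = u.shape
        z = self.squeeze(u).view(b, c)                   # compression: one real number per channel
        s = self.excite(z).view(b, c, 1, 1)              # excitation: per-channel importance weight
        return u * s                                     # reweight: rescale each feature channel

x = torch.randn(2, 64, 38, 38)
out = SEBlock(64)(x)  # same shape; channels are suppressed or enhanced by the learned weights
```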
In the robot target detection method based on the DW-SEnet model disclosed in the embodiment of the invention, for implementation S4, the default bounding boxes of the feature maps are predicted against the ground truth, which removes feature resampling and effectively improves the running speed of the algorithm. Operations such as horizontal flipping and cropping are applied to the input image to improve prediction accuracy. Because targets in a picture have different sizes, default boxes with different length-width ratios are used. Assuming there are m feature maps, the scale of the default bounding boxes of the k-th feature map is:
s_k = s_min + (s_max − s_min)(k − 1) / (m − 1)
where s_min is 0.2 and s_max is 0.9, i.e. the feature scale of the lowest layer is 0.2 and that of the highest layer is 0.9; k ∈ [1, m] is the index of the feature map, and m is the total number of feature maps. Meanwhile, different length-width ratios a ∈ {1, 2, 3, 1/2, 1/3} are set for the default bounding boxes; from the ratio a and the scale s_k, the width and height of each candidate detection frame are calculated as
w_k^a = s_k · √a and h_k^a = s_k / √a.
When the length-width ratio of the default bounding box is 1, an additional default bounding box with scale
s_k' = √(s_k · s_{k+1})
is used.
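Under the formulas above, the per-layer scales and default-box sizes can be computed as in the following sketch; the choice of m = 6 feature maps and the treatment of the extra aspect-ratio-1 box on the last layer (using 1.0 as the next scale) are illustrative assumptions.

```python
import math

def default_box_sizes(m: int = 6, s_min: float = 0.2, s_max: float = 0.9,
                      aspect_ratios=(1.0, 2.0, 3.0, 0.5, 1.0 / 3.0)):
    """Return, for each of the m feature maps, a list of (width, height) default-box sizes."""
    scales = [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]
    boxes_per_layer = []
    for k, s_k in enumerate(scales, start=1):
        boxes = [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in aspect_ratios]
        # extra box for aspect ratio 1: geometric mean of this scale and the next one
        s_next = scales[k] if k < m else 1.0
        boxes.append((math.sqrt(s_k * s_next),) * 2)
        boxes_per_layer.append(boxes)
    return boxes_per_layer
```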
In the robot target detection method based on the DW-SEnet model disclosed in the embodiment of the present invention, for implementation S5, all detected candidate detection frames are sorted by their scores. The detection frame A with the highest score is selected, a threshold b is set, and the intersection over union (IoU) between A and each of the remaining detection frames is calculated; any frame whose IoU with A exceeds the threshold b, i.e. a frame with a high overlap rate, is deleted. Frames that do not overlap the current frame, or whose overlap is very small (IoU below the threshold b), remain unprocessed; these are re-sorted, the frame with the highest score among them is selected, the IoU values between it and the other remaining frames are computed, and frames whose IoU exceeds the threshold are deleted again. This process is iterated until all detection frames have been processed, and the final detection result is output.
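The pruning procedure described above corresponds to standard non-maximum suppression; a plain-Python sketch is given below, with the IoU threshold b left as a parameter because the patent does not fix its value.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, b: float = 0.5):
    """Keep the highest-scoring boxes, discarding any box whose IoU with a kept box exceeds b."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= b]
    return keep
```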
In order to further illustrate the beneficial effects of the present invention, simulation experiments were performed as follows:
under the experimental environment of a Window10 system, a video card GTX1650, a CPU i5-9300H and a memory 16G; pythroch is used as a deep learning frame, a target detection standard dataset Pascal Voc dataset is adopted, and a mobile ROBOT common dataset is recorded to form a VOC _ ROBOT dataset, wherein 600 pictures are used as a verification set, and 1300 pictures are used as a training set. And respectively comparing the model with other common models in terms of parameter size, detection accuracy and detection time.
In order to compare the model parameters, the network models trained on YOLOv3, SSD, and DW _ SENet were compared and adjusted, and the specific results are shown in table 1.
TABLE 1 comparison of model parameters
Network model Parameter size/M
YOLOv3 235
SSD 148
DW_SENet 109
As can be seen from Table 1, the proposed model occupies a noticeably smaller space, only about 50% of the size of the mainstream YOLOv3 target detection network. The model adopted by the invention uses ResNet as the basic classification network for target detection; compared with the VGG16 used by the SSD target detection model, it not only has fewer parameters but also higher accuracy, and the DW convolution further greatly reduces the space occupied by the network model.
To test the detection accuracy of the target detection models, the Pascal VOC2012 dataset is first used to train the common networks DW_SENet, SSD, YOLO, YOLOv2, YOLOv3 and Faster RCNN; the results obtained are shown in Table 2:
TABLE 2 accuracy of different network models
Network model mAP
FasterRCNN 0.708
YOLO 0.518
YOLOv2 0.704
YOLOv3 0.713
SSD 0.701
DW_SENet 0.711
According to the experimental results, the proposed method achieves relatively high target detection accuracy among the common algorithms compared. On the VOC2012 dataset, the improved algorithm achieves accuracy comparable to the mainstream algorithms while having a smaller parameter model.
The SSD target detection algorithm and the algorithm of the invention are trained on the VOC_ROBOT dataset, and the training results are shown in Table 3:
TABLE 3 VOC _ ROBOT data set training result accuracy
(Table 3 appears as an image in the original publication; its contents are not available in text form.)
As can be seen from Table 3, on the common mobile-robot dataset the accuracy of the invention for the various targets is improved to a certain extent compared with the conventional SSD algorithm. The detection time for the same pictures is then compared with that of the conventional algorithms in the same environment: 5 pictures are selected for detection, each group is detected 150 times, and the average time is taken for comparison. The results are shown in Table 4.
TABLE 4 comparison of the test times of different models
(Table 4 appears as an image in the original publication; its contents are not available in text form.)
Through comparison, the detection time of the network model adopted by the invention is shorter than that of the conventional target detection algorithm.
In summary, the target detection method provided by the invention has fewer model parameters, a higher detection speed and higher accuracy in image target detection.
Example two
In the following, a robot target detection system based on a DW-SEnet model disclosed in the second embodiment of the present invention is introduced, and a robot target detection system based on a DW-SEnet model described below and a robot target detection method based on a DW-SEnet model described above may be referred to correspondingly.
The embodiment II of the invention discloses a robot target detection system based on a DW-SEnet model, which comprises:
an image acquisition module, used for acquiring image information of a target to be detected and preprocessing the image information to obtain an input image;
a feature extraction module, used for performing a depth separable convolution DW operation on the input image to extract image features;
a weight excitation module, used for calculating, during forward propagation, importance weights for the different feature channels of the image features with a compression excitation module SE, and suppressing or enhancing the features of each channel according to these weights to obtain feature images;
a target positioning and classification module, used for constructing a plurality of candidate detection frames from the feature images, locating the target to be detected, and classifying the images within the candidate detection frames according to their features to obtain a classification result;
and a target detection module, used for eliminating redundant candidate detection frames through non-maximum suppression and labeling the classification results to obtain target detection results.
In an embodiment of the present invention, in the weight excitation module, the method for performing the compression operation by the compression excitation module SE includes:
compressing the features along the spatial dimension, so that each two-dimensional feature channel is turned into a real number; this real number has a global receptive field and characterizes the global distribution of responses on that feature channel, so that layers close to the input also obtain a global receptive field. The compression formula is:
z_c = (1 / (w × h)) Σ_{i=1}^{w} Σ_{j=1}^{h} u_c(i, j)
where z_c is the global receptive field, w is the feature matrix length, h is the feature matrix width, u_c(i, j) is the convolution result of the c-th feature channel at position (i, j), i is the feature index along the length direction, and j is the feature index along the width direction.
In an embodiment of the present invention, in the weight excitation module, the method for the compression excitation module SE to perform the excitation operation includes:
generating a weight for each feature channel through a parameter w, where the parameter w is used to model the correlation between feature channels:
s = σ(g(z, w)) = σ(w_2 δ(w_1 z))
where s is the set of correlation weights of the different feature channels, σ(·) is the Sigmoid function, z is the result obtained by the compression unit, δ(·) is the ReLU activation function, and w_1 and w_2 are the coefficients to be learned by the fully connected layers;
and applying the output weights to each feature channel through a Reweight operation to complete the recalibration of the original features in the channel dimension.
The robot target detection system based on the DW-SEnet model of this embodiment is used to implement the aforementioned robot target detection method based on the DW-SEnet model; its specific implementation can therefore be found in the embodiment sections of the method above and is not described again here.
In addition, since the robot target detection system based on the DW-SEnet model of this embodiment is used to implement the aforementioned method, its role corresponds to that of the method, and details are omitted here.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to enumerate all embodiments here, and obvious variations or modifications may be made without departing from the spirit or scope of the invention.

Claims (10)

1. A robot target detection method based on a DW-SEnet model is characterized by comprising the following steps:
S1: acquiring image information of a target to be detected, and preprocessing the image information to obtain an input image;
S2: performing a depth separable convolution DW operation on the input image to extract image features;
S3: during forward propagation, using a compression excitation module SE to calculate importance weights for the different feature channels of the image features, and suppressing or enhancing the features of each channel according to these weights to obtain feature images;
S4: constructing a plurality of candidate detection frames from the feature images, locating the target to be detected, and classifying the images within the candidate detection frames according to their features to obtain a classification result;
S5: eliminating redundant candidate detection frames through non-maximum suppression, and labeling the classification result to obtain a target detection result.
2. The DW-SEnet model-based robot target detection method of claim 1, wherein: in S1, an RGB image of the object to be detected is taken by the mobile robot using an RGB-D vision camera.
3. The DW-SEnet model-based robot target detection method of claim 1, wherein: the depth separable convolution DW and the compression excitation module SE constitute a DW-SEnet detection model.
4. The DW-SEnet model-based robot target detection method of claim 3, characterized in that: the detection model comprises 4 convolutional Block networks of the same type, and an Inception structure is added between the different Block network modules.
5. The DW-SEnet model-based robot target detection method of claim 4, wherein the Inception structure comprises the compression excitation module SE, the compression excitation module SE comprises a compression unit, and the method for performing the compression operation by the compression unit comprises:
compressing the features along the spatial dimension, so that each two-dimensional feature channel is turned into a real number; this real number has a global receptive field and characterizes the global distribution of responses on that feature channel, so that layers close to the input also obtain a global receptive field. The compression formula is:
z_c = (1 / (w × h)) Σ_{i=1}^{w} Σ_{j=1}^{h} u_c(i, j)
where z_c is the global receptive field, w is the feature matrix length, h is the feature matrix width, u_c(i, j) is the convolution result of the c-th feature channel at position (i, j), i is the feature index along the length direction, and j is the feature index along the width direction.
6. The DW-SEnet model-based robot target detection method of claim 5, wherein the compression excitation module SE comprises an excitation unit, and the method for performing the excitation operation by the excitation unit comprises:
generating a weight for each feature channel through a parameter w, where the parameter w is used to model the correlation between feature channels:
s = σ(g(z, w)) = σ(w_2 δ(w_1 z))
where s is the set of correlation weights of the different feature channels, σ(·) is the Sigmoid function, z is the result obtained by the compression unit, δ(·) is the ReLU activation function, and w_1 and w_2 are the coefficients to be learned by the fully connected layers;
and applying the output weights to each feature channel through a Reweight operation to complete the recalibration of the original features in the channel dimension.
7. The DW-SEnet model-based robot target detection method of claim 1, wherein: the classification of the image in S4 uses the Softmax function.
8. A robot target detection system based on a DW-SEnet model is characterized by comprising:
an image acquisition module, used for acquiring image information of a target to be detected and preprocessing the image information to obtain an input image;
a feature extraction module, used for performing a depth separable convolution DW operation on the input image to extract image features;
a weight excitation module, used for calculating, during forward propagation, importance weights for the different feature channels of the image features with a compression excitation module SE, and suppressing or enhancing the features of each channel according to these weights to obtain feature images;
a target positioning and classification module, used for constructing a plurality of candidate detection frames from the feature images, locating the target to be detected, and classifying the images within the candidate detection frames according to their features to obtain a classification result;
and a target detection module, used for eliminating redundant candidate detection frames through non-maximum suppression and labeling the classification results to obtain target detection results.
9. The DW-SEnet model-based robot target detection system of claim 8, wherein in the weight excitation module, the method of the compression excitation module SE performing a compression operation comprises:
compressing the features along the spatial dimension, so that each two-dimensional feature channel is turned into a real number; this real number has a global receptive field and characterizes the global distribution of responses on that feature channel, so that layers close to the input also obtain a global receptive field. The compression formula is:
z_c = (1 / (w × h)) Σ_{i=1}^{w} Σ_{j=1}^{h} u_c(i, j)
where z_c is the global receptive field, w is the feature matrix length, h is the feature matrix width, u_c(i, j) is the convolution result of the c-th feature channel at position (i, j), i is the feature index along the length direction, and j is the feature index along the width direction.
10. The DW-SEnet model-based robot target detection system of claim 9, wherein in the weight excitation module, the compression excitation module SE comprises an excitation unit, and the method for performing the excitation operation by the excitation unit comprises:
generating a weight for each feature channel through a parameter w, where the parameter w is used to model the correlation between feature channels:
s = σ(g(z, w)) = σ(w_2 δ(w_1 z))
where s is the set of correlation weights of the different feature channels, σ(·) is the Sigmoid function, z is the result obtained by the compression unit, δ(·) is the ReLU activation function, and w_1 and w_2 are the coefficients to be learned by the fully connected layers;
and applying the output weights to each feature channel through a Reweight operation to complete the recalibration of the original features in the channel dimension.
CN202210265173.0A 2022-03-17 2022-03-17 Robot target detection method and system based on DW-SEnet model Pending CN114842320A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210265173.0A CN114842320A (en) 2022-03-17 2022-03-17 Robot target detection method and system based on DW-SEnet model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210265173.0A CN114842320A (en) 2022-03-17 2022-03-17 Robot target detection method and system based on DW-SEnet model

Publications (1)

Publication Number Publication Date
CN114842320A true CN114842320A (en) 2022-08-02

Family

ID=82562338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210265173.0A Pending CN114842320A (en) 2022-03-17 2022-03-17 Robot target detection method and system based on DW-SEnet model

Country Status (1)

Country Link
CN (1) CN114842320A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427920A (en) * 2018-02-26 2018-08-21 杭州电子科技大学 A kind of land and sea border defense object detection method based on deep learning
CN110569901A (en) * 2019-09-05 2019-12-13 北京工业大学 Channel selection-based countermeasure elimination weak supervision target detection method
CN112734732A (en) * 2021-01-11 2021-04-30 石家庄铁道大学 Railway tunnel leaky cable clamp detection method based on improved SSD algorithm
CN112836651A (en) * 2021-02-04 2021-05-25 浙江理工大学 Gesture image feature extraction method based on dynamic fusion mechanism
US20210365716A1 (en) * 2018-09-11 2021-11-25 Intel Corporation Method and system of deep supervision object detection for reducing resource usage
CN113822185A (en) * 2021-09-09 2021-12-21 安徽农业大学 Method for detecting daily behavior of group health pigs
CN114005003A (en) * 2021-12-09 2022-02-01 齐齐哈尔大学 Remote sensing scene image classification method based on channel multi-packet fusion
CN114120019A (en) * 2021-11-08 2022-03-01 贵州大学 Lightweight target detection method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XIAOFEI CHAO et al.: "Construction of Apple Leaf Diseases Identification Networks Based on Xception Fused by SE Module", pages 1-14 *
常虹 (Chang Hong): 《机器学习应用视角》 [Machine Learning: An Application Perspective], China Machine Press, pages 283-284 *
邬可 (Wu Ke) et al.: "基于压缩激励残差网络与特征融合的行人重识别" [Person re-identification based on squeeze-and-excitation residual network and feature fusion], vol. 57, no. 18, pages 97-103 *

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN110930454B (en) Six-degree-of-freedom pose estimation algorithm based on boundary box outer key point positioning
CN108416266B (en) Method for rapidly identifying video behaviors by extracting moving object through optical flow
CN107229904B (en) Target detection and identification method based on deep learning
WO2020228446A1 (en) Model training method and apparatus, and terminal and storage medium
CN114202672A (en) Small target detection method based on attention mechanism
CN111738344B (en) Rapid target detection method based on multi-scale fusion
CN113469073A (en) SAR image ship detection method and system based on lightweight deep learning
CN111627050B (en) Training method and device for target tracking model
CN110222718B (en) Image processing method and device
CN110610210B (en) Multi-target detection method
CN111768415A (en) Image instance segmentation method without quantization pooling
CN110532959B (en) Real-time violent behavior detection system based on two-channel three-dimensional convolutional neural network
CN110930378A (en) Emphysema image processing method and system based on low data demand
CN111242026A (en) Remote sensing image target detection method based on spatial hierarchy perception module and metric learning
CN114627290A (en) Mechanical part image segmentation algorithm based on improved DeepLabV3+ network
CN112308825A (en) SqueezeNet-based crop leaf disease identification method
US20220215617A1 (en) Viewpoint image processing method and related device
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
CN115272691A (en) Training method, recognition method and equipment for steel bar binding state detection model
CN112132145B (en) Image classification method and system based on model extended convolutional neural network
CN113033371A (en) CSP model-based multi-level feature fusion pedestrian detection method
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
CN117037215A (en) Human body posture estimation model training method, estimation device and electronic equipment
CN109583584B (en) Method and system for enabling CNN with full connection layer to accept indefinite shape input

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination