CN112580529A - Mobile robot perception identification method, device, terminal and storage medium - Google Patents
Mobile robot perception identification method, device, terminal and storage medium
Info
- Publication number
- CN112580529A (application number CN202011533569.6A)
- Authority
- CN
- China
- Prior art keywords
- mobile robot
- pred
- detection
- training
- center
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 230000008447 perception Effects 0.000 title claims abstract description 56
- 238000000034 method Methods 0.000 title claims abstract description 43
- 238000003860 storage Methods 0.000 title claims description 9
- 238000001514 detection method Methods 0.000 claims abstract description 80
- 238000012549 training Methods 0.000 claims abstract description 50
- 238000011176 pooling Methods 0.000 claims abstract description 22
- 238000001914 filtration Methods 0.000 claims abstract description 12
- 238000004364 calculation method Methods 0.000 claims abstract description 7
- 238000010586 diagram Methods 0.000 claims description 20
- 230000006870 function Effects 0.000 claims description 12
- 230000007423 decrease Effects 0.000 claims description 3
- 238000011478 gradient descent method Methods 0.000 claims description 3
- 238000005070 sampling Methods 0.000 description 5
- 230000008569 process Effects 0.000 description 4
- 238000013528 artificial neural network Methods 0.000 description 3
- 230000008878 coupling Effects 0.000 description 3
- 238000010168 coupling process Methods 0.000 description 3
- 238000005859 coupling reaction Methods 0.000 description 3
- 230000000694 effects Effects 0.000 description 3
- 230000001629 suppression Effects 0.000 description 3
- 230000017105 transposition Effects 0.000 description 3
- 238000004891 communication Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 238000012545 processing Methods 0.000 description 2
- 238000005096 rolling process Methods 0.000 description 2
- 238000013527 convolutional neural network Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000009826 distribution Methods 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000012216 screening Methods 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
Abstract
The invention discloses a mobile robot perception identification method, which comprises the following steps: training a mobile robot perception recognition model based on the CenterNet target detection algorithm; acquiring a real-time image of the mobile robot, feeding the real-time image into the trained mobile robot perception recognition model for detection, and outputting detection frames; and filtering the detection frames with maximum pooling layers of different window sizes, and outputting the mobile robot frames that reach the detection confidence threshold. In this method, the picture to be detected is fed into a perception recognition model trained on the fully convolutional CenterNet target detection algorithm, and redundant detection frames are filtered out by maximum pooling layers of different scales; this reduces the GPU computation time at the edge, makes the confidence threshold easier to set, and improves both the accuracy and the speed of mutual recognition between robots.
Description
[ technical field ]
The invention relates to the technical field of robots, and in particular to a mobile robot perception identification method, device, terminal and storage medium.
[ background of the invention ]
With the development of the robot industry, more and more intelligent robots are entering people's daily lives as assistants for tasks such as food delivery, express delivery and buying coffee. A robot can perceive its surroundings through sensors such as cameras and laser scanners and move freely in environments such as office buildings and hotels. A single robot can rarely cover the customer demand of an entire building, so a group of robots is needed to serve multiple customers. Multi-robot (multi-agent) decision making coordinates the decisions of the individual agents to reach a solution that is optimal for the group as a whole, and resource-contention scenarios arise, for example when multiple robots pass through a gate or take an elevator together. Accurate mutual perception and recognition between robots is therefore the first step toward such a multi-agent system.
Meanwhile, with the development of deep learning, anchor-free target detection algorithms have drawn attention from both industry and the research community. The CenterNet-based target detection algorithm is fast, accurate and easy to deploy, and is increasingly applied to image target perception and recognition. Because CenterNet is a detection algorithm without a non-maximum suppression (NMS) module (NMS is non-parallel and therefore time-consuming on a GPU), it instead uses a maximum pooling layer with a 3 × 3 window to filter out redundant detection frames. In actual scenes, however, the confidence threshold is difficult to set and the false detection rate tends to be high, which reduces the accuracy of mutual recognition between robots.
In view of the above, it is desirable to provide a method, an apparatus, a terminal and a storage medium for mobile robot sensing and recognition to overcome the above-mentioned drawbacks.
[ summary of the invention ]
The invention aims to provide a mobile robot perception identification method, device, terminal and storage medium, so as to solve the problems of the existing CenterNet-based target detection model that the confidence threshold is difficult to set and the false detection rate is easily high in practical application scenes.
In order to achieve the above object, a first aspect of the present invention provides a mobile robot sensing and recognizing method, including the steps of:
training a mobile robot perception recognition model based on a CenterNet target detection algorithm;
acquiring a real-time image of the mobile robot, feeding the real-time image into the trained mobile robot perception recognition model for detection, and outputting detection frames;
and filtering the detection frames with maximum pooling layers of different window sizes, and outputting the mobile robot frames that reach the detection confidence threshold.
In a preferred embodiment, the step of training the mobile robot perception recognition model based on the CenterNet target detection algorithm comprises the following steps:
collecting robot image data which are acquired by a plurality of robots at different positions randomly and from multiple angles, marking a target square frame of the robot in the image, and establishing a training set;
building a mobile robot perception recognition model based on a Centernet target detection algorithm; the mobile robot perception identification model comprises a backbone network and a detection head which are sequentially connected; the detection head comprises a central probability branch, a size branch and a central point offset branch;
initializing network parameters of the mobile robot perception identification model, and generating an initial weight and an initial bias;
inputting all images of the training set into the initialized mobile robot perception recognition model, and extracting a feature map of each input image through the backbone network; generating the feature map center point probability predicted values through the center probability branch; generating the size predicted values of the feature map through the size branch; and generating the predicted offset of each feature point from the center point through the center point offset branch;
and calculating a loss value according to a preset loss function, performing back propagation, updating the weights and biases of the mobile robot perception recognition model through repeated cycles of forward propagation and back propagation until a preset iteration stop condition is reached, and generating the trained mobile robot perception recognition model.
In a preferred embodiment, given a robot target box (x1, y1, x2, y2) on an image, the size training target value wh_gt and the center point offset training target value offset_gt are defined as follows:
w_gt = log(x2 - x1), h_gt = log(y2 - y1),
center_x = (x1 + x2) / 2, center_y = (y1 + y2) / 2,
ox_gt = center_x + 0.5 - [center_x + 0.5],
oy_gt = center_y + 0.5 - [center_y + 0.5];
where w_gt and h_gt are the training target values for the width and height of the target box, center_x and center_y are the coordinate values of the center point of the target box, ox_gt and oy_gt are the training target values for the offset of each feature point from the center point of the target box, and the square brackets [ ] denote rounding down;
the training target value s_gt of the target center probability on the feature map is defined as follows:
in a preferred embodiment, the predetermined loss function is:
Loss = Loss_score + 0.1 * Loss_wh + Loss_offset; where
Loss_wh = ||w_gt - w_pred|| + ||h_gt - h_pred||,
Loss_offset = ||ox_gt - ox_pred|| + ||oy_gt - oy_pred||,
pos = (s_gt == 1),
neg = (s_gt < 1),
negweight = (1 - s_gt)^4,
Loss_pos = log(s_pred) * (1 - s_pred)^2 * pos,
Loss_score = Loss_pos + Loss_neg;
where w_pred and h_pred are the predicted width and height of the target box regressed at each feature point, ox_pred and oy_pred are the predicted offsets of each feature point from the center point, and s_pred is the predicted center point probability on the feature map.
In a preferred embodiment, the preset loss function is minimized using stochastic gradient descent with momentum; training is terminated after 120 rounds of training, and the network parameters of the mobile robot perception recognition model are saved. The momentum parameter is set to 0.9, the L2 regularization penalty coefficient for the convolution parameters is set to 0.000125, and the learning rate decays slowly according to a polynomial schedule.
In a preferred embodiment, the step of filtering the detection frames with maximum pooling layers of different window sizes and outputting the mobile robot frames reaching the detection confidence threshold is performed by the following formulas:
index3 = (MaxPooling3(s_pred) == s_pred)
index5 = (MaxPooling5(s_pred) == s_pred)
index7 = (MaxPooling7(s_pred) == s_pred)
heatmap = 0.5 * index3 * s_pred + 0.3 * index5 * s_pred + 0.2 * index7 * s_pred
location = (heatmap >= threshold)
where MaxPooling3 denotes a maximum pooling layer with a 3 × 3 window, MaxPooling5 a maximum pooling layer with a 5 × 5 window, and MaxPooling7 a maximum pooling layer with a 7 × 7 window; s_pred is the predicted feature map center point probability, threshold is the detection confidence threshold, heatmap is the feature heat map, and location denotes the position of the mobile robot on the feature heat map.
In a preferred embodiment, the mobile robot block information is obtained by the following formula:
center = location + 0.5 + offset_pred,
wh = e^(wh_pred);
where center is the center point of the mobile robot frame, wh is the width-and-height value of the mobile robot frame, offset_pred is the predicted offset of the feature point from the center point, and wh_pred is the predicted size of the mobile robot frame regressed at the feature point.
A second aspect of the present invention provides a mobile robot sensing and recognizing apparatus, including:
the training module is used for training a mobile robot perception recognition model based on a CenterNet target detection algorithm;
the detection module is used for acquiring a real-time image of the mobile robot, feeding the real-time image into the trained mobile robot perception recognition model for detection, and outputting detection frames;
and the filtering output module is used for filtering the detection frames with maximum pooling layers of different window sizes and outputting the mobile robot frames reaching the detection confidence threshold.
A third aspect of the present invention provides a terminal, which includes a memory, a processor, and a mobile robot sensing and recognizing program stored in the memory and executable on the processor, wherein the mobile robot sensing and recognizing program, when executed by the processor, implements the steps of the mobile robot sensing and recognizing method according to any one of the above embodiments.
A fourth aspect of the present invention provides a computer-readable storage medium, in which a mobile robot sensing and recognizing program is stored, and the mobile robot sensing and recognizing program, when executed by a processor, implements the steps of the mobile robot sensing and recognizing method according to any one of the above embodiments.
According to the mobile robot perception identification method described above, the picture to be detected is fed into a perception recognition model trained on the fully convolutional CenterNet target detection algorithm, and redundant detection frames are filtered out by maximum pooling layers of different scales; this reduces the GPU computation time at the edge, makes the confidence threshold easier to set, and improves both the accuracy and the speed of mutual recognition between robots.
[ description of the drawings ]
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flowchart of a mobile robot sensing and recognizing method provided by the present invention;
fig. 2 is a flowchart illustrating a sub-step of step S11 in the perception identification method for a mobile robot shown in fig. 1;
FIG. 3 is a schematic diagram of a convolution block in the model constructed by the method shown in FIG. 1;
FIG. 4 is a schematic diagram of a residual convolution module in the model constructed by the method shown in FIG. 1;
FIG. 5 is a schematic diagram of a backbone network in a model constructed by the method shown in FIG. 1;
FIG. 6 is a schematic diagram of a detection head in the model constructed by the method shown in FIG. 1;
FIG. 7 is a schematic diagram of the robot perception recognition detector network constructed by the method shown in FIG. 1;
FIG. 8 is a block diagram of a mobile robotic perception-recognition device;
fig. 9 is a block diagram of a training module in the mobile robot sensing and recognizing apparatus shown in fig. 8.
[ detailed description of the embodiments ]
In order to make the objects, technical solutions and advantageous effects of the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and the detailed description. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be noted that, regarding convolution: in the field of computer vision, convolution kernels (filters) are usually small matrices, such as 3 × 3 or 5 × 5, while digital images are relatively large two-dimensional (or multi-dimensional) matrices (tensors); a convolutional neural network learns features (patterns) from simple to complex, layer by layer.
In an embodiment of the present invention, a first aspect is to provide a mobile robot sensing recognition method for recognizing a plurality of robots with each other. As shown in FIG. 1, the following steps S11-S13 are included.
And step S11, training a mobile robot perception recognition model based on a CenterNet target detection algorithm.
CenterNet, also known as Objects as Points, is an algorithm based on a fully convolutional deep neural network; it extracts features from an input picture, performs target detection by predicting the position of the center point of an object, and can also predict the size of the object. Further, as shown in FIG. 2, step S11 includes the following sub-steps S111-S115.
Step S111, collecting robot image data acquired by a plurality of robots at random positions and from multiple angles, labeling the target box of each robot in the images, and establishing a training set. Specifically, several robots collect image data of one another inside buildings, at random positions and from multiple angles. To increase the variety of robot appearances and thus the data richness, advertisement stickers can be pasted on the robots at random and sundries can be placed on them. The robots in the collected pictures are then annotated with rectangular bounding boxes.
Step S112, building a mobile robot perception recognition model based on a Centernet target detection algorithm; the mobile robot perception identification model comprises a backbone network and a detection head which are connected in sequence; the detection head includes a center probability branch, a size branch, and a center point offset branch.
In this step, Fig. 3 is a schematic diagram of a convolution block, which consists of a convolution layer (Cin, Cout, K, S), a batch normalization layer and an activation layer; the convolution layer is the basic unit of a visual deep neural network and has attributes such as the convolution window (Kernel, abbreviated K), the convolution stride (Stride, abbreviated S), and the numbers of input and output channels (Cin, Cout). Fig. 4 is a schematic diagram of a residual convolution module, which contains a plurality of convolution blocks. Fig. 5 is a schematic diagram of the backbone network. In an actual scene, recognizing a nearby robot in the picture is more valuable than recognizing a distant one, so the detector (comprising the backbone network and the detection head) only needs to focus on the recognition rate for nearby robots; accordingly, the backbone network of the preferred embodiment of the present invention detects the target object on the 8× down-sampled feature layer, where 8× denotes the down-sampling factor; for example, if the input picture is 320 × 320, the feature layer size is 40 × 40. Fig. 6 is a schematic diagram of the detection head of the preferred embodiment, which contains three branches predicting the target center probability on the feature map (score), the target size (w, h) and the center point offset (offset_x, offset_y). Fig. 7 is a schematic diagram of the CenterNet-based robot perception recognition detector network in this embodiment.
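As an illustration of the architecture just described, the following is a minimal PyTorch sketch of the convolution block and the three-branch detection head; the intermediate channel widths and the choice of ReLU activation are assumptions, since they are not specified above.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Convolution block as in Fig. 3: conv (Cin, Cout, K, S) + batch norm + activation."""
    def __init__(self, cin, cout, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, kernel_size=k, stride=s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(cout)
        self.act = nn.ReLU(inplace=True)  # activation type is an assumption

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class DetectionHead(nn.Module):
    """Three-branch head as in Fig. 6: center probability, size (w, h), center offset."""
    def __init__(self, cin, num_classes=1):
        super().__init__()
        self.score = nn.Sequential(ConvBlock(cin, cin), nn.Conv2d(cin, num_classes, 1), nn.Sigmoid())
        self.wh = nn.Sequential(ConvBlock(cin, cin), nn.Conv2d(cin, 2, 1))      # (w, h) in log scale
        self.offset = nn.Sequential(ConvBlock(cin, cin), nn.Conv2d(cin, 2, 1))  # (offset_x, offset_y)

    def forward(self, feat):
        return self.score(feat), self.wh(feat), self.offset(feat)

# For a 320 x 320 input and an 8x down-sampling backbone, `feat` would be 40 x 40.
```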
Step S113, initializing the network parameters of the mobile robot perception recognition model and generating an initial weight and an initial bias. In specific embodiments, the initialization may be performed using ImageNet pre-trained weights.
Step S114, inputting all images of the training set into the initialized mobile robot perception recognition model, and extracting a feature map of each input image through the backbone network; generating the feature map center point probability predicted values through the center probability branch; generating the size predicted values of the feature map through the size branch; and generating the predicted offset of each feature point from the center point through the center point offset branch.
Specifically, given a robot target box (x1, y1, x2, y2) on an image, the size training target value wh_gt and the center point offset training target value offset_gt are defined as follows:
w_gt = log(x2 - x1), h_gt = log(y2 - y1),
center_x = (x1 + x2) / 2, center_y = (y1 + y2) / 2,
ox_gt = center_x + 0.5 - [center_x + 0.5],
oy_gt = center_y + 0.5 - [center_y + 0.5];
where w_gt and h_gt are the training target values for the width and height of the target box, center_x and center_y are the X and Y coordinates of the center point of the target box, ox_gt and oy_gt are the training target values for the X and Y offsets of each feature point from the center point of the target box, and the square brackets [ ] denote rounding down;
the training target value s_gt of the target center probability on the feature map is defined as follows:
it can be understood that since the thermodynamic diagram of the down-sampling by 8 times is directly obtained by adopting the full convolution depth neural network, the anchors (anchors) do not need to be set in advance, so that the network parameter quantity and the calculation quantity are greatly reduced. The number of channels of the thermodynamic diagram is equal to the number of target categories to be detected, the first 100 peak values of the thermodynamic diagram are used as target center points extracted by the network, and finally, the final target center points are obtained by setting confidence threshold values for screening.
In the CenterNet algorithm, every up-sampling operation is preceded by a deformable convolution, which allows the receptive field of the network to adapt to the object instead of being confined to a 3 × 3 rectangular convolution window. Meanwhile, the resolution of the 8× down-sampled feature map (the output of the convolutional layers) is much higher than in typical detection networks, so both large and small targets can be detected well without a feature pyramid network. CenterNet also does not need NMS (Non-Maximum Suppression), because all detected center points are taken from the peaks of the heat map, which already amounts to a form of non-maximum suppression; since NMS is very time-consuming, this reduces the computation time on the edge-side GPU. CenterNet uses a fully convolutional backbone network for encoding and decoding, and the up-sampling uses transposed convolution, which differs greatly from the bilinear interpolation used in ordinary up-sampling and restores the semantic and positional information of the image better.
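A minimal sketch of one up-sampling stage of the kind described, doubling the feature map resolution with a transposed convolution rather than bilinear interpolation; the channel counts and the placement of batch normalization and ReLU are assumptions, and the deformable convolution preceding the up-sampling is omitted for brevity.

```python
import torch
import torch.nn as nn

class UpsampleStage(nn.Module):
    """One decoder stage: 2x up-sampling with a transposed convolution."""
    def __init__(self, cin, cout):
        super().__init__()
        self.up = nn.ConvTranspose2d(cin, cout, kernel_size=4, stride=2, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(cout)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.up(x)))

# A 20 x 20 feature map becomes 40 x 40 after one stage:
print(UpsampleStage(256, 128)(torch.zeros(1, 256, 20, 20)).shape)  # torch.Size([1, 128, 40, 40])
```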
Step S115, calculating a loss value according to the preset loss function and performing back propagation, updating the weights and biases of the mobile robot perception recognition model through repeated cycles of forward propagation and back propagation until a preset iteration stop condition is reached, and generating the trained mobile robot perception recognition model.
Specifically, the predetermined loss function is:
Loss = Loss_score + 0.1 * Loss_wh + Loss_offset; where
Loss_wh = ||w_gt - w_pred|| + ||h_gt - h_pred||,
Loss_offset = ||ox_gt - ox_pred|| + ||oy_gt - oy_pred||,
pos = (s_gt == 1),
neg = (s_gt < 1),
negweight = (1 - s_gt)^4,
Loss_pos = log(s_pred) * (1 - s_pred)^2 * pos,
Loss_score = Loss_pos + Loss_neg;
where w_pred and h_pred are the predicted width and height of the target box regressed at each feature point, ox_pred and oy_pred are the predicted X and Y offsets of each feature point from the center point, and s_pred is the predicted center point probability on the feature map.
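As an illustration, the following is a minimal PyTorch sketch of this loss. The formula for Loss_neg is not reproduced above, so the negative-sample term below follows the standard CenterNet focal-loss form; likewise, the L1 norm for the size and offset terms, their restriction to positive center locations, and the overall sign and normalization are assumptions.

```python
import torch

def detection_loss(s_pred, wh_pred, off_pred, s_gt, wh_gt, off_gt, eps=1e-6):
    """Loss = Loss_score + 0.1 * Loss_wh + Loss_offset for (N, C, H, W) score maps
    and (N, 2, H, W) size/offset maps."""
    pos = (s_gt == 1).float()
    neg = (s_gt < 1).float()
    negweight = (1 - s_gt) ** 4
    num_pos = pos.sum().clamp(min=1)

    loss_pos = torch.log(s_pred + eps) * (1 - s_pred) ** 2 * pos
    loss_neg = torch.log(1 - s_pred + eps) * s_pred ** 2 * negweight * neg  # assumed standard CenterNet form
    loss_score = -(loss_pos.sum() + loss_neg.sum()) / num_pos

    mask = pos.max(dim=1, keepdim=True).values  # supervise size/offset only at object centers (assumption)
    loss_wh = (torch.abs(wh_gt - wh_pred) * mask).sum() / num_pos
    loss_offset = (torch.abs(off_gt - off_pred) * mask).sum() / num_pos
    return loss_score + 0.1 * loss_wh + loss_offset
```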
Further, in the model training process, the preset loss function is minimized using stochastic gradient descent with momentum; training is terminated after 120 rounds of training, and the network parameters of the mobile robot perception recognition model are saved. The momentum parameter is set to 0.9, the L2 regularization penalty coefficient for the convolution parameters is set to 0.000125, and the learning rate decays slowly according to a polynomial schedule.
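A sketch of this training configuration (SGD with momentum 0.9, an L2 penalty of 0.000125 and polynomial learning-rate decay over 120 rounds of training); the base learning rate and the polynomial power are assumptions, and the weight decay is applied here to all parameters rather than only to the convolution parameters for simplicity.

```python
import torch

def build_optimizer(model, base_lr=0.01, rounds=120):
    """SGD with momentum and L2 penalty, plus a slowly decaying polynomial learning-rate schedule."""
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                                momentum=0.9, weight_decay=0.000125)
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lambda r: (1.0 - r / rounds) ** 0.9)  # polynomial decay; power 0.9 is an assumption
    return optimizer, scheduler
```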
And step S12 is executed, real-time images of the mobile robot are collected and substituted into the trained mobile robot perception recognition model for detection, and a detection frame is output.
Specifically, the top 100 peaks of the feature heat map are taken as the target center points extracted by the network, and detection frames based on these peak center points are output. Detection frames whose center points have lower confidence may be referred to as redundant frames.
Step S13, filtering the detection frames with maximum pooling layers of different window sizes, and outputting the mobile robot frames that reach the detection confidence threshold.
Specifically, the detection frame is filtered by the following formula:
index3 = (MaxPooling3(s_pred) == s_pred)
index5 = (MaxPooling5(s_pred) == s_pred)
index7 = (MaxPooling7(s_pred) == s_pred)
heatmap = 0.5 * index3 * s_pred + 0.3 * index5 * s_pred + 0.2 * index7 * s_pred
location = (heatmap >= threshold)
where MaxPooling3 denotes a maximum pooling layer with a 3 × 3 window, MaxPooling5 a maximum pooling layer with a 5 × 5 window, and MaxPooling7 a maximum pooling layer with a 7 × 7 window; s_pred is the predicted feature map center point probability, threshold is the detection confidence threshold, heatmap is the feature heat map, and location denotes the position of the mobile robot on the feature heat map.
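The following is a minimal PyTorch sketch of this multi-scale max-pooling filter; padding each pooling window so that its output aligns with s_pred, and the default threshold value, are assumptions.

```python
import torch.nn.functional as F

def filter_detections(s_pred, threshold=0.3):
    """Keep only locations where the (N, C, H, W) center probability map s_pred is a local
    maximum at the 3x3, 5x5 and 7x7 scales, then threshold the weighted heat map."""
    index3 = (F.max_pool2d(s_pred, 3, stride=1, padding=1) == s_pred).float()
    index5 = (F.max_pool2d(s_pred, 5, stride=1, padding=2) == s_pred).float()
    index7 = (F.max_pool2d(s_pred, 7, stride=1, padding=3) == s_pred).float()
    heatmap = 0.5 * index3 * s_pred + 0.3 * index5 * s_pred + 0.2 * index7 * s_pred
    location = heatmap >= threshold
    return heatmap, location
```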
Further, the mobile robot frame information is obtained by the following formulas, so that the mobile robot can be framed on the input picture, thereby realizing mutual perception and recognition among the robots.
center = location + 0.5 + offset_pred,
wh = e^(wh_pred);
where center is the center point of the mobile robot frame, wh is the width-and-height value of the mobile robot frame, offset_pred is the predicted offset of the feature point from the center point, and wh_pred is the predicted size of the mobile robot frame regressed at the feature point.
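A minimal sketch of this decoding step, turning the filtered locations and the per-location offset and size predictions back into boxes on the input picture; gathering the predictions at the kept locations and scaling the centers by the 8× down-sampling stride are assumptions about details the text leaves implicit.

```python
import torch

def decode_boxes(location, offset_pred, wh_pred, stride=8):
    """Convert a (H, W) boolean location mask plus (2, H, W) offset and log-size maps
    into (x1, y1, x2, y2) boxes in input-image coordinates."""
    ys, xs = torch.nonzero(location, as_tuple=True)
    cx = (xs.float() + 0.5 + offset_pred[0, ys, xs]) * stride  # center = location + 0.5 + offset_pred
    cy = (ys.float() + 0.5 + offset_pred[1, ys, xs]) * stride
    w = torch.exp(wh_pred[0, ys, xs])                          # wh = e^(wh_pred); sizes assumed in image pixels
    h = torch.exp(wh_pred[1, ys, xs])
    return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=1)
```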
In summary, in the mobile robot perception identification method provided by the invention, the picture to be detected is fed into a perception recognition model trained on the fully convolutional CenterNet target detection algorithm, and several maximum pooling layers of different scales are used to filter out redundant detection frames; this reduces the GPU computation time at the edge, makes the confidence threshold easier to set, and improves both the accuracy and the speed of mutual recognition between robots.
A second aspect of the present invention provides a mobile robot sensing and recognizing device 100, which runs on the edge-side GPU of a robot and is used for perceiving and recognizing other robots. It should be noted that the implementation principle and implementation of the mobile robot sensing and recognizing device 100 are consistent with those of the mobile robot perception identification method described above, and are therefore not repeated here.
As shown in fig. 8, the mobile robot sensing and recognizing device 100 includes:
the training module 10 is used for training a mobile robot perception recognition model based on a CenterNet target detection algorithm;
the detection module 20 is used for acquiring a real-time image of the mobile robot, feeding the real-time image into the trained mobile robot perception recognition model for detection, and outputting detection frames;
and the filtering output module 30 is configured to filter the detection frames with maximum pooling layers of different window sizes, and output the mobile robot frames reaching the detection confidence threshold.
Further, as shown in fig. 9, the training module 10 includes:
the training set establishing unit 11 is used for collecting robot image data which are acquired by a plurality of robots at different positions randomly and from multiple angles, marking a target square frame of the robot in the image and establishing a training set;
the model building unit 12 is used for building a mobile robot perception recognition model based on a Centernet target detection algorithm; the mobile robot perception identification model comprises a backbone network and a detection head which are connected in sequence; the detection head comprises a central probability branch, a size branch and a central point offset branch;
the initialization unit 13 is configured to initialize network parameters of the mobile robot sensing recognition model, and generate an initial weight and an initial bias;
a feature extraction unit 14, configured to input all images of the training set into the initialized mobile robot sensing recognition model, and extract a feature map of the input image through a backbone network; generating a feature graph center point probability predicted value through a center probability branch; generating a size predicted value of the feature map through the size branch; generating an offset predicted value of each feature point from the central point through the central point offset branch;
and the training unit 15 is used for calculating a loss value according to a preset loss function, performing back propagation, updating the weights and biases of the mobile robot perception recognition model through repeated cycles of forward propagation and back propagation until a preset iteration stop condition is reached, and generating the trained mobile robot perception recognition model.
In yet another aspect, the present invention provides a terminal (not shown in the drawings), where the terminal includes a memory, a processor, and a mobile robot sensing and recognizing program stored in the memory and capable of running on the processor, and when the mobile robot sensing and recognizing program is executed by the processor, the mobile robot sensing and recognizing program implements the steps of the mobile robot sensing and recognizing method according to any one of the foregoing embodiments.
The present invention further provides a computer-readable storage medium (not shown in the drawings), in which a mobile robot sensing and recognizing program is stored, and when the mobile robot sensing and recognizing program is executed by a processor, the mobile robot sensing and recognizing program implements the steps of the mobile robot sensing and recognizing method according to any one of the above embodiments.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed system or apparatus/terminal device and method can be implemented in other ways. For example, the above-described system or apparatus/terminal device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The invention is not limited solely to that described in the specification and embodiments, and additional advantages and modifications will readily occur to those skilled in the art, so that the invention is not limited to the specific details, representative apparatus, and illustrative examples shown and described herein, without departing from the spirit and scope of the general concept as defined by the appended claims and their equivalents.
Claims (10)
1. A mobile robot perception identification method is characterized by comprising the following steps:
training a mobile robot perception recognition model based on a CenterNet target detection algorithm;
acquiring a real-time image of the mobile robot, feeding the real-time image into the trained mobile robot perception recognition model for detection, and outputting detection frames;
and filtering the detection frames with maximum pooling layers of different window sizes, and outputting the mobile robot frames that reach the detection confidence threshold.
2. The mobile robot perception identification method of claim 1, wherein the training of the mobile robot perception recognition model based on the CenterNet target detection algorithm comprises:
collecting robot image data which are acquired by a plurality of robots at different positions randomly and from multiple angles, marking a target square frame of the robot in the image, and establishing a training set;
building a mobile robot perception recognition model based on a Centernet target detection algorithm; the mobile robot perception identification model comprises a backbone network and a detection head which are sequentially connected; the detection head comprises a central probability branch, a size branch and a central point offset branch;
initializing network parameters of the mobile robot perception identification model, and generating an initial weight and an initial bias;
inputting all images of the training set into the initialized mobile robot perception recognition model, and extracting a feature map of each input image through the backbone network; generating the feature map center point probability predicted values through the center probability branch; generating the size predicted values of the feature map through the size branch; and generating the predicted offset of each feature point from the center point through the center point offset branch;
and calculating a loss value according to a preset loss function, performing back propagation, updating the weights and biases of the mobile robot perception recognition model through repeated cycles of forward propagation and back propagation until a preset iteration stop condition is reached, and generating the trained mobile robot perception recognition model.
3. The mobile robot perception identification method of claim 2, wherein, given a robot target box (x1, y1, x2, y2) on an image, the size training target value wh_gt and the center point offset training target value offset_gt are defined as follows:
w_gt = log(x2 - x1), h_gt = log(y2 - y1),
center_x = (x1 + x2) / 2, center_y = (y1 + y2) / 2,
ox_gt = center_x + 0.5 - [center_x + 0.5],
oy_gt = center_y + 0.5 - [center_y + 0.5];
where w_gt and h_gt are the training target values for the width and height of the target box, center_x and center_y are the coordinate values of the center point of the target box, ox_gt and oy_gt are the training target values for the offset of each feature point from the center point of the target box, and the square brackets [ ] denote rounding down;
the training target value s_gt of the target center probability on the feature map is defined as follows:
4. The mobile robot perception identification method of claim 3, wherein the preset loss function is:
Loss = Loss_score + 0.1 * Loss_wh + Loss_offset; where
Loss_wh = ||w_gt - w_pred|| + ||h_gt - h_pred||,
Loss_offset = ||ox_gt - ox_pred|| + ||oy_gt - oy_pred||,
pos = (s_gt == 1),
neg = (s_gt < 1),
negweight = (1 - s_gt)^4,
Loss_pos = log(s_pred) * (1 - s_pred)^2 * pos,
Loss_score = Loss_pos + Loss_neg;
where w_pred and h_pred are the predicted width and height of the target box regressed at each feature point, ox_pred and oy_pred are the predicted offsets of each feature point from the center point, and s_pred is the predicted center point probability on the feature map.
5. The mobile robot perception identification method according to claim 4, wherein the preset loss function is minimized using stochastic gradient descent with momentum, training is terminated after 120 rounds of training, and the network parameters of the mobile robot perception recognition model are saved; the momentum parameter is set to 0.9, the L2 regularization penalty coefficient for the convolution parameters is set to 0.000125, and the learning rate decays slowly according to a polynomial schedule.
6. The mobile robot perception identification method according to any one of claims 1 to 5, wherein the step of filtering the detection frames with maximum pooling layers of different window sizes and outputting the mobile robot frames reaching the detection confidence threshold is performed by the following formulas:
index3 = (MaxPooling3(s_pred) == s_pred)
index5 = (MaxPooling5(s_pred) == s_pred)
index7 = (MaxPooling7(s_pred) == s_pred)
heatmap = 0.5 * index3 * s_pred + 0.3 * index5 * s_pred + 0.2 * index7 * s_pred
location = (heatmap >= threshold)
where MaxPooling3 denotes a maximum pooling layer with a 3 × 3 window, MaxPooling5 a maximum pooling layer with a 5 × 5 window, and MaxPooling7 a maximum pooling layer with a 7 × 7 window; s_pred is the predicted feature map center point probability, threshold is the detection confidence threshold, heatmap is the feature heat map, and location denotes the position of the mobile robot on the feature heat map.
7. The mobile robot perception identification method of claim 6, wherein the mobile robot frame information is obtained by the following formulas:
center = location + 0.5 + offset_pred,
wh = e^(wh_pred);
where center is the center point of the mobile robot frame, wh is the width-and-height value of the mobile robot frame, offset_pred is the predicted offset of the feature point from the center point, and wh_pred is the predicted size of the mobile robot frame regressed at the feature point.
8. A mobile robotic perception identification device, comprising:
the training module is used for training a mobile robot perception recognition model based on a CenterNet target detection algorithm;
the detection module is used for acquiring a real-time image of the mobile robot, substituting the real-time image into the mobile robot perception recognition model after training for detection, and outputting a detection frame;
and the filtering output module is used for filtering the detection frames with maximum pooling layers of different window sizes and outputting the mobile robot frames reaching the detection confidence threshold.
9. A terminal, characterized in that the terminal comprises a memory, a processor and a mobile robot sensing and recognition program stored in the memory and executable on the processor, the mobile robot sensing and recognition program, when executed by the processor, implementing the steps of the mobile robot sensing and recognition method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a mobile robot perception-recognition program, which when executed by a processor implements the steps of the mobile robot perception-recognition method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011533569.6A CN112580529B (en) | 2020-12-22 | 2020-12-22 | Mobile robot perception recognition method, device, terminal and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011533569.6A CN112580529B (en) | 2020-12-22 | 2020-12-22 | Mobile robot perception recognition method, device, terminal and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112580529A true CN112580529A (en) | 2021-03-30 |
CN112580529B CN112580529B (en) | 2024-08-20 |
Family
ID=75139029
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011533569.6A Active CN112580529B (en) | 2020-12-22 | 2020-12-22 | Mobile robot perception recognition method, device, terminal and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112580529B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180181864A1 (en) * | 2016-12-27 | 2018-06-28 | Texas Instruments Incorporated | Sparsified Training of Convolutional Neural Networks |
CN110532894A (en) * | 2019-08-05 | 2019-12-03 | 西安电子科技大学 | Remote sensing target detection method based on boundary constraint CenterNet |
CN110633731A (en) * | 2019-08-13 | 2019-12-31 | 杭州电子科技大学 | Single-stage anchor-frame-free target detection method based on staggered sensing convolution |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113688989A (en) * | 2021-08-31 | 2021-11-23 | 中国平安人寿保险股份有限公司 | Deep learning network acceleration method, device, equipment and storage medium |
CN113688989B (en) * | 2021-08-31 | 2024-04-19 | 中国平安人寿保险股份有限公司 | Deep learning network acceleration method, device, equipment and storage medium |
CN115063410A (en) * | 2022-08-04 | 2022-09-16 | 中建电子商务有限责任公司 | Steel pipe counting method based on anchor-free target detection |
Also Published As
Publication number | Publication date |
---|---|
CN112580529B (en) | 2024-08-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |