CN112580529A - Mobile robot perception identification method, device, terminal and storage medium - Google Patents

Mobile robot perception identification method, device, terminal and storage medium

Info

Publication number
CN112580529A
Authority
CN
China
Prior art keywords
mobile robot
pred
detection
training
center
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011533569.6A
Other languages
Chinese (zh)
Other versions
CN112580529B (en)
Inventor
秦豪 (Qin Hao)
赵明 (Zhao Ming)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yogo Robot Co Ltd
Original Assignee
Shanghai Yogo Robot Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yogo Robot Co Ltd filed Critical Shanghai Yogo Robot Co Ltd
Priority to CN202011533569.6A priority Critical patent/CN112580529B/en
Publication of CN112580529A publication Critical patent/CN112580529A/en
Application granted granted Critical
Publication of CN112580529B publication Critical patent/CN112580529B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses a mobile robot perception identification method, which comprises the following steps: training a mobile robot perception recognition model based on the CenterNet target detection algorithm; acquiring a real-time image from the mobile robot, feeding it into the trained perception recognition model for detection, and outputting detection frames; and filtering the detection frames with maximum pooling layers of different window sizes, and outputting the mobile robot frames that reach the detection confidence threshold. In this method, the picture to be detected is fed into a perception identification model trained on the fully convolutional CenterNet target detection algorithm, and redundant detection frames are filtered out with maximum pooling layers of several different scales, which reduces GPU computation time at the edge, makes the confidence threshold easier to set, and improves the accuracy and speed of mutual recognition between robots.

Description

Mobile robot perception identification method, device, terminal and storage medium
[ technical field ]
The invention relates to the technical field of robots, and in particular to a mobile robot perception identification method, device, terminal, and storage medium.
[ background of the invention ]
With the development of the robot industry, more and more intelligent robots are entering people's daily lives as assistants for takeout delivery, express delivery, buying coffee, and the like. Relying on sensors such as cameras and laser, a robot can perceive its surroundings and move freely in environments such as office buildings and hotels. Usually a single robot is far from able to carry the customer demand of a whole building, and an intelligent group of multiple robots is needed to meet the business demands of multiple customers. Multi-robot (multi-agent) decision-making coordinates the decisions of the individual agents to reach a solution that is optimal in overall performance, and there are resource-contention scenarios such as multiple robots passing through gates or taking elevators. Accurate mutual perception and recognition between robots is therefore the first step of such a multi-agent system.
Meanwhile, with the development of deep learning, anchor-free target detection algorithms have drawn the attention of industry and the research community. Target detection based on CenterNet is fast, accurate, and easy to deploy, and is increasingly applied to image target perception and recognition. CenterNet is a detection algorithm without a non-maximum suppression (NMS) module: because NMS is not parallelizable, running it on a GPU is time-consuming, so CenterNet instead uses a maximum pooling layer with a window size of 3 × 3 to filter out redundant detection frames. In actual scenes, however, the confidence threshold is then difficult to set and the false detection rate is easily high, which reduces the accuracy of mutual recognition between robots.
In view of the above, it is desirable to provide a method, an apparatus, a terminal and a storage medium for mobile robot sensing and recognition to overcome the above-mentioned drawbacks.
[ summary of the invention ]
The invention aims to provide a mobile robot perception identification method, device, terminal, and storage medium, so as to solve the problems of existing CenterNet-based target detection models in practical application scenes, namely that the confidence threshold is difficult to set and the false detection rate is easily high.
In order to achieve the above object, a first aspect of the present invention provides a mobile robot sensing and recognizing method, including the steps of:
training a mobile robot perception recognition model based on a CenterNet target detection algorithm;
acquiring a real-time image of the mobile robot, substituting the real-time image into the trained perception recognition model of the mobile robot for detection, and outputting a detection frame;
and filtering the detection frames with maximum pooling layers of different window sizes, and outputting the mobile robot frames that reach the detection confidence threshold.
In a preferred embodiment, the step of training the mobile robot perception recognition model based on the CenterNet target detection algorithm comprises the following steps:
collecting robot image data which are acquired by a plurality of robots at different positions randomly and from multiple angles, marking a target square frame of the robot in the image, and establishing a training set;
building a mobile robot perception recognition model based on a Centernet target detection algorithm; the mobile robot perception identification model comprises a backbone network and a detection head which are sequentially connected; the detection head comprises a central probability branch, a size branch and a central point offset branch;
initializing network parameters of the mobile robot perception identification model, and generating an initial weight and an initial bias;
inputting all images of the training set into the initialized mobile robot perception recognition model, and extracting a characteristic diagram of the input images through the backbone network; generating a feature map central point probability predicted value through the central probability branch; generating a size prediction value of the feature map through the size branch; generating an offset predicted value of each characteristic point from the central point through the central point offset branch;
and calculating a loss value according to a preset loss function, performing back propagation, updating the weights and biases of the mobile robot perception recognition model through repeated cycles of forward and back propagation until a preset iteration stop condition is reached, and generating a trained mobile robot perception recognition model.
In a preferred embodiment, given a robot target box (x1, y1, x2, y2) on an image, the size training target wh_gt and the center-point offset training target offset_gt are defined as follows:
w_gt = log(x2 - x1), h_gt = log(y2 - y1),
center_x = (x1 + x2)/2, center_y = (y1 + y2)/2,
ox_gt = center_x + 0.5 - [center_x + 0.5],
oy_gt = center_y + 0.5 - [center_y + 0.5];
w_gt and h_gt are the training target values of the width and height of the target box, center_x and center_y are the coordinates of the center point of the target box, ox_gt and oy_gt are the training target values of the offset of each feature point from the center point of the target box, and the square bracket [ ] denotes rounding down;
the training target value s_gt of the target center probability of the feature map is defined as follows:
[formula given as an image in the original publication; not reproduced here]
in a preferred embodiment, the predetermined loss function is:
Loss=Lossscore+0.1*Losswh+Lossoffset(ii) a Wherein,
Losswh=||wgt-wpred||+||hgt-hpred||,
Lossoffset=||oxgt-oxpred||+||oygt-oypred||,
pos=(sgt==1)
neg=(sgt<1)
negweight=(1-sgt)4
Losspos=log(spred)*(1-spred)2*pos
Figure BDA0002852637160000041
Lossscore=Losspos+Lossneg
wherein, Wpred、hpredRespectively returning the width and height dimension predicted values, ox, of the target frame for each feature pointpred、oypredFor the predicted value of the offset of each feature point from the center point, SpredAnd the probability prediction value is the central point probability of the feature map.
In a preferred embodiment, the preset loss function is minimized using stochastic gradient descent with momentum; training is terminated after 120 training epochs, and the network parameters of the mobile robot perception recognition model are saved. The momentum parameter is set to 0.9, the L2 regularization penalty coefficient of the convolution parameters is set to 0.000125, and the learning rate decays slowly according to a polynomial schedule.
In a preferred embodiment, the step of filtering the detection frames with maximum pooling layers of different window sizes and outputting the mobile robot frames that reach the detection confidence threshold is performed by the following formulas:
index3 = (MaxPooling3(s_pred) == s_pred)
index5 = (MaxPooling5(s_pred) == s_pred)
index7 = (MaxPooling7(s_pred) == s_pred)
heatmap = 0.5 * index3 * s_pred + 0.3 * index5 * s_pred + 0.2 * index7 * s_pred
location = (heatmap >= threshold)
where MaxPooling3 denotes a maximum pooling layer with a window size of 3 × 3, MaxPooling5 a maximum pooling layer with a window size of 5 × 5, and MaxPooling7 a maximum pooling layer with a window size of 7 × 7; s_pred is the predicted center-point probability of the feature map, threshold is the detection confidence threshold, heatmap is the feature heat map, and location denotes the positions of the mobile robots on the feature heat map.
In a preferred embodiment, the mobile robot frame information is obtained by the following formulas:
center = location + 0.5 + offset_pred,
wh = e^(wh_pred);
wherein center is the center point of the mobile robot frame, wh is the width and height value of the mobile robot frame, offset_pred is the predicted offset of the feature point from the center point, and wh_pred is the size prediction regressed by the feature point for the mobile robot box.
A second aspect of the present invention provides a mobile robot sensing and recognizing apparatus, including:
the training module is used for training a mobile robot perception recognition model based on a CenterNet target detection algorithm;
the detection module is used for acquiring a real-time image of the mobile robot, substituting the real-time image into the mobile robot perception recognition model after training for detection, and outputting a detection frame;
and the filtering output module is used for filtering the detection frames with maximum pooling layers of different window sizes and outputting the mobile robot frames that reach the detection confidence threshold.
A third aspect of the present invention provides a terminal, which includes a memory, a processor, and a mobile robot sensing and recognizing program stored in the memory and executable on the processor, wherein the mobile robot sensing and recognizing program, when executed by the processor, implements the steps of the mobile robot sensing and recognizing method according to any one of the above embodiments.
A fourth aspect of the present invention provides a computer-readable storage medium, in which a mobile robot sensing and recognizing program is stored, and the mobile robot sensing and recognizing program, when executed by a processor, implements the steps of the mobile robot sensing and recognizing method according to any one of the above embodiments.
According to the mobile robot perception identification method, the picture to be detected is fed into a perception identification model trained on the fully convolutional CenterNet target detection algorithm, and redundant detection frames are filtered out with maximum pooling layers of several different scales, which reduces GPU computation time at the edge, makes the confidence threshold easier to set, and improves the accuracy and speed of mutual recognition between robots.
[ description of the drawings ]
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flowchart of a mobile robot sensing and recognizing method provided by the present invention;
fig. 2 is a flowchart illustrating a sub-step of step S11 in the perception identification method for a mobile robot shown in fig. 1;
FIG. 3 is a schematic diagram of a convolution block in the model constructed by the method of FIG. 1;
FIG. 4 is a schematic diagram of a residual convolution module in the model constructed by the method shown in FIG. 1;
FIG. 5 is a schematic diagram of a backbone network in a model constructed by the method shown in FIG. 1;
FIG. 6 is a schematic diagram of a detection head in the model constructed by the method shown in FIG. 1;
FIG. 7 is a schematic diagram of the robot perception recognition detector network constructed by the method shown in FIG. 1;
FIG. 8 is a block diagram of a mobile robotic perception-recognition device;
fig. 9 is a block diagram of a training module in the mobile robot sensing and recognizing apparatus shown in fig. 8.
[ detailed description of the embodiments ]
In order to make the objects, technical solutions and advantageous effects of the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and the detailed description. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be noted that, regarding convolution: in the field of computer vision, convolution kernels (filters) are usually small matrices, such as 3 × 3 or 5 × 5, while digital images are relatively large two-dimensional (or multidimensional) matrices (tensors); a convolutional neural network learns features (patterns) from simple to complex, layer by layer.
In an embodiment of the present invention, a first aspect is to provide a mobile robot sensing recognition method for recognizing a plurality of robots with each other. As shown in FIG. 1, the following steps S11-S13 are included.
And step S11, training a mobile robot perception recognition model based on a CenterNet target detection algorithm.
CenterNet: also known as Objects as Points, an algorithm based on a fully convolutional deep neural network; it extracts features from the input picture, detects targets by predicting the position of each object's center point, and also predicts the object's size. Further, as shown in FIG. 2, step S11 includes the following sub-steps S111-S115.
And step S111, collecting robot image data acquired by a plurality of robots at random positions and from multiple angles, labeling the target boxes of the robots in the images, and establishing a training set. Specifically, a plurality of robots are used to collect robot image data throughout buildings, at random positions and from multiple angles. To increase the richness of the data, advertisement patterns can be randomly pasted on the robots, sundries can be placed on them, and so on. In addition, the robots in the collected pictures are annotated with rectangular bounding boxes.
Step S112, building a mobile robot perception recognition model based on a Centernet target detection algorithm; the mobile robot perception identification model comprises a backbone network and a detection head which are connected in sequence; the detection head includes a center probability branch, a size branch, and a center point offset branch.
In this step, fig. 3 is a schematic diagram of a convolution block, which includes a convolution layer (Cin, Cout, K, S), a batch normalization layer, and an activation layer; the convolution layer is the basic unit of a visual deep neural network and has attributes such as the convolution window (Kernel, K for short), the convolution stride (Stride, S for short), and the numbers of input/output channels (Cin, Cout). Fig. 4 is a schematic diagram of a residual convolution module, which includes a plurality of convolution blocks. Fig. 5 is a schematic diagram of the backbone network. In an actual scene, recognizing a nearby robot in the picture is more valuable than recognizing a distant one, so the detector (comprising the backbone network and the detection head) only needs to focus on the recognition rate of nearby robots; for this reason, the backbone network of the preferred embodiment of the present invention detects the target object at the 8× down-sampling layer, where 8× denotes the down-sampling factor: if the input picture is 320 × 320, the feature layer size is 40 × 40. Fig. 6 is a schematic diagram of the detection head of the preferred embodiment of the present invention, which includes three branches for predicting the target center probability (score) on the feature map, the target size (w, h), and the center point offset (offset_x, offset_y). Fig. 7 is a schematic diagram of the CenterNet-based robot perception recognition detector network of this embodiment.
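By way of illustration only, the following PyTorch-style sketch shows how the convolution block of fig. 3 and the three-branch detection head of fig. 6 might be organized. The channel counts, the ReLU activation, and the sigmoid on the score branch are assumptions for illustration and are not prescribed by the embodiment.

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """Convolution block (fig. 3): Conv(Cin, Cout, K, S) -> batch normalization -> activation."""
    def __init__(self, cin, cout, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, kernel_size=k, stride=s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(cout)
        self.act = nn.ReLU(inplace=True)  # the exact activation is an assumption

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class DetectionHead(nn.Module):
    """Detection head (fig. 6): center probability (score), size (w, h), and center offset branches."""
    def __init__(self, cin, num_classes=1):  # one class: the mobile robot
        super().__init__()
        self.score = nn.Sequential(ConvBlock(cin, cin), nn.Conv2d(cin, num_classes, 1), nn.Sigmoid())
        self.wh = nn.Sequential(ConvBlock(cin, cin), nn.Conv2d(cin, 2, 1))
        self.offset = nn.Sequential(ConvBlock(cin, cin), nn.Conv2d(cin, 2, 1))

    def forward(self, feat):
        # feat: backbone output at 1/8 resolution, e.g. 40 x 40 for a 320 x 320 input
        return self.score(feat), self.wh(feat), self.offset(feat)
```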
Step S113 initializes the network parameters of the mobile robot perception recognition model and generates the initial weights and biases. In particular embodiments, the initialization may be performed using ImageNet pre-training weights.
Step S114, inputting all images of the training set into the initialized mobile robot perception recognition model, and extracting a characteristic diagram of the input images through a backbone network; generating a feature graph center point probability predicted value through a center probability branch; generating a size predicted value of the feature map through the size branch; and generating an offset predicted value of each feature point from the center point through the center point offset branch.
In particular, given a robot target box (x1, y1, x2, y2) on an image, the size training target wh_gt and the center-point offset training target offset_gt are defined as follows:
w_gt = log(x2 - x1), h_gt = log(y2 - y1),
center_x = (x1 + x2)/2, center_y = (y1 + y2)/2,
ox_gt = center_x + 0.5 - [center_x + 0.5],
oy_gt = center_y + 0.5 - [center_y + 0.5];
w_gt and h_gt are the training target values of the width and height of the target box, center_x and center_y are the X and Y coordinates of the center point of the target box, ox_gt and oy_gt are the training target values of the X and Y offsets of each feature point from the center point of the target box, and the square bracket [ ] denotes rounding down;
the training target value s_gt of the target center probability of the feature map is defined as follows:
[formula given as an image in the original publication; not reproduced here]
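As a minimal sketch of these definitions, the targets for one annotated box could be computed as follows. The assumption that the box coordinates are already expressed on the 1/8-resolution feature grid is ours, and the definition of s_gt (given only as an image above) is not reproduced.

```python
import math

def encode_targets(x1, y1, x2, y2):
    """Training targets for one robot box, per the definitions above.
    Assumes (x1, y1, x2, y2) are already expressed on the 1/8-resolution feature grid."""
    w_gt = math.log(x2 - x1)
    h_gt = math.log(y2 - y1)
    center_x = (x1 + x2) / 2.0
    center_y = (y1 + y2) / 2.0
    # the square bracket [ ] in the text denotes rounding down (floor)
    ox_gt = center_x + 0.5 - math.floor(center_x + 0.5)
    oy_gt = center_y + 0.5 - math.floor(center_y + 0.5)
    return w_gt, h_gt, ox_gt, oy_gt
```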
it can be understood that since the thermodynamic diagram of the down-sampling by 8 times is directly obtained by adopting the full convolution depth neural network, the anchors (anchors) do not need to be set in advance, so that the network parameter quantity and the calculation quantity are greatly reduced. The number of channels of the thermodynamic diagram is equal to the number of target categories to be detected, the first 100 peak values of the thermodynamic diagram are used as target center points extracted by the network, and finally, the final target center points are obtained by setting confidence threshold values for screening.
In the CenterNet algorithm, every up-sampling stage is preceded by a deformable convolution, which allows the network's receptive field to adapt to the object instead of being confined to a 3 × 3 rectangular convolution window. Meanwhile, the resolution of the 8×-down-sampled feature map (the output of the convolution layers) is much higher than in typical networks, so large and small targets can both be detected well without a feature pyramid network. CenterNet does not need NMS (non-maximum suppression), because all detected center points are taken from peaks of the heat map, which already amounts to a non-maximum suppression process; since NMS is very time-consuming, this reduces the computation time on the edge GPU. CenterNet uses a fully convolutional backbone for encoding and decoding; up-sampling uses transposed convolution, which differs greatly from the bilinear interpolation used in ordinary up-sampling and better restores the semantic and positional information of the image.
And step S115, calculating a loss value according to a preset loss function, performing back propagation, and updating the weights and biases of the mobile robot perception recognition model through repeated cycles of forward and back propagation until a preset iteration stop condition is reached, thereby generating the trained mobile robot perception recognition model.
Specifically, the preset loss function is:
Loss = Loss_score + 0.1 * Loss_wh + Loss_offset; wherein,
Loss_wh = ||w_gt - w_pred|| + ||h_gt - h_pred||,
Loss_offset = ||ox_gt - ox_pred|| + ||oy_gt - oy_pred||,
pos = (s_gt == 1),
neg = (s_gt < 1),
negweight = (1 - s_gt)^4,
Loss_pos = log(s_pred) * (1 - s_pred)^2 * pos,
Loss_neg: [formula given as an image in the original publication; not reproduced here],
Loss_score = Loss_pos + Loss_neg;
wherein w_pred and h_pred are the width and height predictions regressed by each feature point for the target box, ox_pred and oy_pred are the predicted X and Y offsets of each feature point from the center point, and s_pred is the predicted center-point probability of the feature map.
Further, in the model training process, the preset loss function is minimized using stochastic gradient descent with momentum; training is terminated after 120 training epochs, and the network parameters of the mobile robot perception recognition model are saved. The momentum parameter is set to 0.9, the L2 regularization penalty coefficient of the convolution parameters is set to 0.000125, and the learning rate decays slowly according to a polynomial schedule.
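A hedged PyTorch-style sketch of this loss is given below. The negative-sample term Loss_neg corresponds to a formula that appears only as an image in the original publication; the form used here is the standard CenterNet focal-loss negative term and is therefore an assumption, as are the overall sign, the normalization by the number of positive cells, and the restriction of the size and offset terms to positive cells.

```python
import torch

def detection_loss(s_pred, w_pred, h_pred, ox_pred, oy_pred,
                   s_gt, w_gt, h_gt, ox_gt, oy_gt):
    """Loss = Loss_score + 0.1 * Loss_wh + Loss_offset over the feature map."""
    pos = (s_gt == 1).float()
    neg = (s_gt < 1).float()
    neg_weight = (1.0 - s_gt) ** 4
    num_pos = pos.sum().clamp(min=1.0)

    loss_pos = torch.log(s_pred.clamp(min=1e-6)) * (1.0 - s_pred) ** 2 * pos
    # assumed standard CenterNet negative term; the patent's exact expression is only an image
    loss_neg = torch.log((1.0 - s_pred).clamp(min=1e-6)) * s_pred ** 2 * neg_weight * neg
    loss_score = -(loss_pos + loss_neg).sum() / num_pos

    loss_wh = ((w_gt - w_pred).abs() * pos + (h_gt - h_pred).abs() * pos).sum() / num_pos
    loss_offset = ((ox_gt - ox_pred).abs() * pos + (oy_gt - oy_pred).abs() * pos).sum() / num_pos
    return loss_score + 0.1 * loss_wh + loss_offset
```

In training, such a loss would be minimized with stochastic gradient descent with momentum 0.9 and an L2 weight-decay of 0.000125 as described above, under a polynomially decaying learning rate; the initial learning rate is not specified in the text.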
And step S12 is executed, real-time images of the mobile robot are collected and substituted into the trained mobile robot perception recognition model for detection, and a detection frame is output.
Specifically, the top 100 peaks of the feature heat map are taken as the target center points extracted by the network, and detection frames based on these multiple peak center points are output. The detection frames whose center points have lower confidence can be called redundant frames.
And step S13, filtering the detection frames with maximum pooling layers of different window sizes, and outputting the mobile robot frames that reach the detection confidence threshold.
Specifically, the detection frame is filtered by the following formula:
index3 = (MaxPooling3(s_pred) == s_pred)
index5 = (MaxPooling5(s_pred) == s_pred)
index7 = (MaxPooling7(s_pred) == s_pred)
heatmap = 0.5 * index3 * s_pred + 0.3 * index5 * s_pred + 0.2 * index7 * s_pred
location = (heatmap >= threshold)
where MaxPooling3 denotes a maximum pooling layer with a window size of 3 × 3, MaxPooling5 a maximum pooling layer with a window size of 5 × 5, and MaxPooling7 a maximum pooling layer with a window size of 7 × 7; s_pred is the predicted center-point probability of the feature map, threshold is the detection confidence threshold, heatmap is the feature heat map, and location denotes the positions of the mobile robots on the feature heat map.
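A minimal sketch of this multi-window filtering, assuming s_pred is laid out as a (1, 1, H, W) tensor, could look as follows; the function name and tensor layout are illustrative assumptions.

```python
import torch.nn.functional as F

def filter_peaks(s_pred, threshold):
    """Multi-scale max-pooling filter: a cell survives a window only if it is the local maximum there."""
    def local_max_mask(k):
        pooled = F.max_pool2d(s_pred, kernel_size=k, stride=1, padding=k // 2)
        return (pooled == s_pred).float()

    index3, index5, index7 = local_max_mask(3), local_max_mask(5), local_max_mask(7)
    heatmap = 0.5 * index3 * s_pred + 0.3 * index5 * s_pred + 0.2 * index7 * s_pred
    location = heatmap >= threshold   # positions of mobile robots on the feature heat map
    return heatmap, location
```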
Further, the mobile robot frame information is obtained by the following formulas, so that the mobile robot is framed in the input picture, thereby realizing mutual perception and recognition between the robots.
center = location + 0.5 + offset_pred,
wh = e^(wh_pred);
wherein center is the center point of the mobile robot frame, wh is the width and height value of the mobile robot frame, offset_pred is the predicted offset of the feature point from the center point, and wh_pred is the size prediction regressed by the feature point for the mobile robot box.
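Continuing the sketch above, the frame information could be decoded from the filtered peaks as follows; treating the exponential as the inverse of the log-encoded size target and the (1, 2, H, W) layout of the prediction maps are assumptions.

```python
import torch

def decode_frames(location, offset_pred, wh_pred):
    """Recover center = location + 0.5 + offset_pred and wh = exp(wh_pred) at each surviving peak."""
    ys, xs = torch.nonzero(location[0, 0], as_tuple=True)   # peak coordinates on the feature grid
    center_x = xs.float() + 0.5 + offset_pred[0, 0, ys, xs]
    center_y = ys.float() + 0.5 + offset_pred[0, 1, ys, xs]
    w = torch.exp(wh_pred[0, 0, ys, xs])   # exp undoes the log-encoded size target w_gt = log(x2 - x1)
    h = torch.exp(wh_pred[0, 1, ys, xs])
    return center_x, center_y, w, h
```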
In summary, in the mobile robot perception identification method provided by the invention, the picture to be detected is fed into a perception identification model trained on the fully convolutional CenterNet target detection algorithm, and redundant detection frames are filtered out with several maximum pooling layers of different scales, which reduces GPU computation time at the edge, makes the confidence threshold easier to set, and improves the accuracy and speed of mutual recognition between robots.
A second aspect of the present invention provides a mobile robot sensing and recognizing device 100, which runs on the robot's edge GPU and is used for perceiving and recognizing other robots. It should be noted that the implementation principle and implementation of the mobile robot sensing and recognizing device 100 are consistent with those of the mobile robot sensing and recognizing method described above, and are therefore not repeated here.
As shown in fig. 8, the mobile robot sensing and recognizing device 100 includes:
the training module 10 is used for training a mobile robot perception recognition model based on a CenterNet target detection algorithm;
the detection module 20 is used for acquiring a real-time image of the mobile robot, substituting the real-time image into the trained mobile robot perception identification model for detection, and outputting a detection frame;
and the filtering output module 30 is configured to filter the detection frames with maximum pooling layers of different window sizes and output the mobile robot frames that reach the detection confidence threshold.
Further, as shown in fig. 9, the training module 10 includes:
the training set establishing unit 11 is used for collecting robot image data which are acquired by a plurality of robots at different positions randomly and from multiple angles, marking a target square frame of the robot in the image and establishing a training set;
the model building unit 12 is used for building a mobile robot perception recognition model based on a Centernet target detection algorithm; the mobile robot perception identification model comprises a backbone network and a detection head which are connected in sequence; the detection head comprises a central probability branch, a size branch and a central point offset branch;
the initialization unit 13 is configured to initialize network parameters of the mobile robot sensing recognition model, and generate an initial weight and an initial bias;
a feature extraction unit 14, configured to input all images of the training set into the initialized mobile robot sensing recognition model, and extract a feature map of the input image through a backbone network; generating a feature graph center point probability predicted value through a center probability branch; generating a size predicted value of the feature map through the size branch; generating an offset predicted value of each feature point from the central point through the central point offset branch;
and the training unit 15 is used for calculating a loss value according to a preset loss function, performing back propagation, and updating the weights and biases of the mobile robot perception recognition model through repeated cycles of forward and back propagation until a preset iteration stop condition is reached, thereby generating the trained mobile robot perception recognition model.
In yet another aspect, the present invention provides a terminal (not shown in the drawings), where the terminal includes a memory, a processor, and a mobile robot sensing and recognizing program stored in the memory and capable of running on the processor, and when the mobile robot sensing and recognizing program is executed by the processor, the mobile robot sensing and recognizing program implements the steps of the mobile robot sensing and recognizing method according to any one of the foregoing embodiments.
The present invention further provides a computer-readable storage medium (not shown in the drawings), in which a mobile robot sensing and recognizing program is stored, and when the mobile robot sensing and recognizing program is executed by a processor, the mobile robot sensing and recognizing program implements the steps of the mobile robot sensing and recognizing method according to any one of the above embodiments.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed system or apparatus/terminal device and method can be implemented in other ways. For example, the above-described system or apparatus/terminal device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The invention is not limited solely to that described in the specification and embodiments, and additional advantages and modifications will readily occur to those skilled in the art, so that the invention is not limited to the specific details, representative apparatus, and illustrative examples shown and described herein, without departing from the spirit and scope of the general concept as defined by the appended claims and their equivalents.

Claims (10)

1. A mobile robot perception identification method is characterized by comprising the following steps:
training a mobile robot perception recognition model based on a CenterNet target detection algorithm;
acquiring a real-time image of the mobile robot, substituting the real-time image into the trained perception recognition model of the mobile robot for detection, and outputting a detection frame;
and filtering the detection frames with maximum pooling layers of different window sizes, and outputting the mobile robot frames that reach the detection confidence threshold.
2. The mobile robot-aware recognition method of claim 1, wherein the training of the mobile robot-aware recognition model based on the CenterNet target detection algorithm comprises:
collecting robot image data which are acquired by a plurality of robots at different positions randomly and from multiple angles, marking a target square frame of the robot in the image, and establishing a training set;
building a mobile robot perception recognition model based on a Centernet target detection algorithm; the mobile robot perception identification model comprises a backbone network and a detection head which are sequentially connected; the detection head comprises a central probability branch, a size branch and a central point offset branch;
initializing network parameters of the mobile robot perception identification model, and generating an initial weight and an initial bias;
inputting all images of the training set into the initialized mobile robot perception recognition model, and extracting a characteristic diagram of the input images through the backbone network; generating a feature map central point probability predicted value through the central probability branch; generating a size prediction value of the feature map through the size branch; generating an offset predicted value of each characteristic point from the central point through the central point offset branch;
and calculating a loss value according to a preset loss function, performing back propagation, updating the weights and biases of the mobile robot perception recognition model through repeated cycles of forward and back propagation until a preset iteration stop condition is reached, and generating a trained mobile robot perception recognition model.
3. The mobile robot-aware recognition method of claim 2, wherein given a robot target box (x1, y1, x2, y2) on an image, the size training target wh_gt and the offset training target offset_gt are defined as follows:
w_gt = log(x2 - x1), h_gt = log(y2 - y1),
center_x = (x1 + x2)/2, center_y = (y1 + y2)/2,
ox_gt = center_x + 0.5 - [center_x + 0.5],
oy_gt = center_y + 0.5 - [center_y + 0.5];
w_gt and h_gt are the training target values of the width and height of the target box, center_x and center_y are the coordinates of the center point of the target box, ox_gt and oy_gt are the training target values of the offset of each feature point from the center point of the target box, and the square bracket [ ] denotes rounding down;
the training target value s_gt of the target center probability of the feature map is defined as follows:
[formula given as an image in the original publication; not reproduced here]
4. The mobile robot-aware recognition method of claim 3, wherein the preset loss function is:
Loss = Loss_score + 0.1 * Loss_wh + Loss_offset; wherein,
Loss_wh = ||w_gt - w_pred|| + ||h_gt - h_pred||,
Loss_offset = ||ox_gt - ox_pred|| + ||oy_gt - oy_pred||,
pos = (s_gt == 1),
neg = (s_gt < 1),
negweight = (1 - s_gt)^4,
Loss_pos = log(s_pred) * (1 - s_pred)^2 * pos,
Loss_neg: [formula given as an image in the original publication; not reproduced here],
Loss_score = Loss_pos + Loss_neg;
wherein w_pred and h_pred are the width and height predictions regressed by each feature point for the target box, ox_pred and oy_pred are the predicted offsets of each feature point from the center point, and s_pred is the predicted center-point probability of the feature map.
5. The mobile robot sensing recognition method according to claim 4, wherein the preset loss function is minimized using stochastic gradient descent with momentum, training is terminated after 120 training epochs, and the network parameters of the mobile robot sensing recognition model are saved; the momentum parameter is set to 0.9, the L2 regularization penalty coefficient of the convolution parameters is set to 0.000125, and the learning rate decays slowly according to a polynomial schedule.
6. The mobile robot-aware recognition method of any one of claims 1 to 5, wherein the step of filtering the detection boxes with maximum pooling layers of different window sizes and outputting the mobile robot boxes that reach the detection confidence threshold is performed by the following formulas:
index3 = (MaxPooling3(s_pred) == s_pred)
index5 = (MaxPooling5(s_pred) == s_pred)
index7 = (MaxPooling7(s_pred) == s_pred)
heatmap = 0.5 * index3 * s_pred + 0.3 * index5 * s_pred + 0.2 * index7 * s_pred
location = (heatmap >= threshold)
where MaxPooling3 denotes a maximum pooling layer with a window size of 3 × 3, MaxPooling5 a maximum pooling layer with a window size of 5 × 5, and MaxPooling7 a maximum pooling layer with a window size of 7 × 7; s_pred is the predicted center-point probability of the feature map, threshold is the detection confidence threshold, heatmap is the feature heat map, and location denotes the positions of the mobile robots on the feature heat map.
7. The mobile robot sensing recognition method of claim 6, wherein the mobile robot box information is obtained by the following formula:
center = location + 0.5 + offset_pred,
wh = e^(wh_pred);
wherein center is the center point of the mobile robot frame, wh is the width and height value of the mobile robot frame, offset_pred is the predicted offset of the feature point from the center point, and wh_pred is the size prediction regressed by the feature point for the mobile robot box.
8. A mobile robotic perception identification device, comprising:
the training module is used for training a mobile robot perception recognition model based on a CenterNet target detection algorithm;
the detection module is used for acquiring a real-time image of the mobile robot, substituting the real-time image into the mobile robot perception recognition model after training for detection, and outputting a detection frame;
and the filtering output module is used for filtering the detection frames with maximum pooling layers of different window sizes and outputting the mobile robot frames that reach the detection confidence threshold.
9. A terminal, characterized in that the terminal comprises a memory, a processor and a mobile robot sensing and recognition program stored in the memory and executable on the processor, the mobile robot sensing and recognition program, when executed by the processor, implementing the steps of the mobile robot sensing and recognition method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a mobile robot perception-recognition program, which when executed by a processor implements the steps of the mobile robot perception-recognition method according to any one of claims 1 to 7.
CN202011533569.6A 2020-12-22 2020-12-22 Mobile robot perception recognition method, device, terminal and storage medium Active CN112580529B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011533569.6A CN112580529B (en) 2020-12-22 2020-12-22 Mobile robot perception recognition method, device, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011533569.6A CN112580529B (en) 2020-12-22 2020-12-22 Mobile robot perception recognition method, device, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN112580529A (zh) 2021-03-30
CN112580529B CN112580529B (en) 2024-08-20

Family

ID=75139029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011533569.6A Active CN112580529B (en) 2020-12-22 2020-12-22 Mobile robot perception recognition method, device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN112580529B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688989A (en) * 2021-08-31 2021-11-23 中国平安人寿保险股份有限公司 Deep learning network acceleration method, device, equipment and storage medium
CN115063410A (en) * 2022-08-04 2022-09-16 中建电子商务有限责任公司 Steel pipe counting method based on anchor-free target detection

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180181864A1 (en) * 2016-12-27 2018-06-28 Texas Instruments Incorporated Sparsified Training of Convolutional Neural Networks
CN110532894A (en) * 2019-08-05 2019-12-03 西安电子科技大学 Remote sensing target detection method based on boundary constraint CenterNet
CN110633731A (en) * 2019-08-13 2019-12-31 杭州电子科技大学 Single-stage anchor-frame-free target detection method based on staggered sensing convolution

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180181864A1 (en) * 2016-12-27 2018-06-28 Texas Instruments Incorporated Sparsified Training of Convolutional Neural Networks
CN110532894A (en) * 2019-08-05 2019-12-03 西安电子科技大学 Remote sensing target detection method based on boundary constraint CenterNet
CN110633731A (en) * 2019-08-13 2019-12-31 杭州电子科技大学 Single-stage anchor-frame-free target detection method based on staggered sensing convolution

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688989A (en) * 2021-08-31 2021-11-23 中国平安人寿保险股份有限公司 Deep learning network acceleration method, device, equipment and storage medium
CN113688989B (en) * 2021-08-31 2024-04-19 中国平安人寿保险股份有限公司 Deep learning network acceleration method, device, equipment and storage medium
CN115063410A (en) * 2022-08-04 2022-09-16 中建电子商务有限责任公司 Steel pipe counting method based on anchor-free target detection

Also Published As

Publication number Publication date
CN112580529B (en) 2024-08-20

Similar Documents

Publication Publication Date Title
CN105938559B (en) Use the Digital Image Processing of convolutional neural networks
CN106709461B (en) Activity recognition method and device based on video
CN107369166B (en) Target tracking method and system based on multi-resolution neural network
US9710697B2 (en) Method and system for exacting face features from data of face images
CN110222718B (en) Image processing method and device
CN111738344A (en) Rapid target detection method based on multi-scale fusion
CN113807361B (en) Neural network, target detection method, neural network training method and related products
CN112580529B (en) Mobile robot perception recognition method, device, terminal and storage medium
CN110532959B (en) Real-time violent behavior detection system based on two-channel three-dimensional convolutional neural network
CN112308087B (en) Integrated imaging identification method based on dynamic vision sensor
CN111950702A (en) Neural network structure determining method and device
CN114565092A (en) Neural network structure determining method and device
CN115222896A (en) Three-dimensional reconstruction method and device, electronic equipment and computer-readable storage medium
Xu et al. Tackling small data challenges in visual fire detection: a deep convolutional generative adversarial network approach
CN111428566A (en) Deformation target tracking system and method
CN113449548A (en) Method and apparatus for updating object recognition model
CN114663769A (en) Fruit identification method based on YOLO v5
CN117252928B (en) Visual image positioning system for modular intelligent assembly of electronic products
CN117237858B (en) Loop detection method
CN110659641B (en) Text recognition method and device and electronic equipment
CN111797849A (en) User activity identification method and device, storage medium and electronic equipment
CN111611917A (en) Model training method, feature point detection device, feature point detection equipment and storage medium
CN111611852A (en) Method, device and equipment for training expression recognition model
CN116758419A (en) Multi-scale target detection method, device and equipment for remote sensing image
CN111797986A (en) Data processing method, data processing device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant