CN114626452A - Model training method, model detection method, robot, and storage medium - Google Patents

Model training method, model detection method, robot, and storage medium

Info

Publication number
CN114626452A
CN114626452A
Authority
CN
China
Prior art keywords
image
feature map
neural network
model
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210236681.6A
Other languages
Chinese (zh)
Inventor
刘力格
许铭淏
程冉
孙涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Midea Robozone Technology Co Ltd
Original Assignee
Midea Robozone Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Midea Robozone Technology Co Ltd filed Critical Midea Robozone Technology Co Ltd
Priority to CN202210236681.6A priority Critical patent/CN114626452A/en
Publication of CN114626452A publication Critical patent/CN114626452A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The application discloses a model training method, a detection method, a robot, and a storage medium for detecting mirror surface objects. The model training method for detecting a mirror surface object comprises the following steps: acquiring a training image, wherein the training image comprises a first image with a first mirror-like area and a second image with a second mirror-like area; constructing a preset neural network model; alternately training the preset neural network model by using the first image and the second image; and obtaining a trained mirror surface object detection model when the preset neural network model meets the test conditions. By constructing a preset neural network model that shares object feature information, the method exploits the strong similarity between glass and mirrors; at the same time, using data sets of both glass and mirror objects greatly expands the training data of the preset neural network model, so that the trained model has better accuracy and generalization.

Description

Model training method, model detection method, robot, and storage medium
Technical Field
The present application relates to the field of image recognition technology in the field of computer vision, and in particular, to a model training method, a detection method, a robot, and a storage medium.
Background
In the field of robotic perception, sensor accuracy is of paramount importance. For natural objects with high refractivity and high reflectivity, such as glass and mirrors, many traditional sensors are prone to anomalies such as holes in the sensed data. The detection of glass-like objects therefore remains a challenging but significant computer vision task.
Existing robots recognize glass and mirror surfaces with low accuracy, which can cause a robot to collide with them. That is, current methods for recognizing glass and mirror surfaces cannot simultaneously satisfy the requirement of pixel-level segmentation of glass and mirror objects, lack wide applicability, and cannot keep the model lightweight while ensuring recognition accuracy.
Disclosure of Invention
In view of the above, the present application is directed to solving, at least to some extent, one of the problems in the related art. To this end, the present application aims to provide a model training method, a detection method, a robot, and a storage medium.
The application provides a model training method for detecting a mirror surface object. The model training method comprises the following steps: acquiring a training image, wherein the training image comprises a first image with a first mirror-like area and a second image with a second mirror-like area; constructing a preset neural network model; alternately training the preset neural network model by using the first image and the second image; and obtaining a trained mirror surface object detection model under the condition that the preset neural network model meets the test conditions.
In some embodiments, the acquiring training images comprises: acquiring images to be processed, which are respectively provided with a first mirror surface object and a second mirror surface object, in a scene through a camera; classifying the image to be processed to obtain the first image and the second image; and respectively obtaining mask images corresponding to the first image and the second image according to the labeling information of the image to be processed.
In some embodiments, the alternately training the pre-set neural network model using the first image and the second image comprises: processing the first image or the second image through the preset neural network model to obtain a predicted image; calculating the error of the preset neural network model according to the mask image and the predicted image; and optimizing the preset neural network model according to the error.
In some embodiments, the processing the first image or the second image through the preset neural network to obtain a predicted image includes: taking the first image or the second image as an input image of the preset neural network model; performing feature extraction on the input image to obtain a multi-level feature map; fusing the multi-level feature maps to obtain a multi-level fused feature map; processing the multi-level fusion feature map and synthesizing a target feature map; predicting according to the target feature map and a first prediction module to obtain a first prediction result; and generating the predicted image according to the first prediction result.
In some embodiments, the extracting the features of the input image to obtain a multi-level feature map includes: and performing feature extraction on the input image by using ResNeXt to obtain a multi-level feature map.
In some embodiments, the fusing the multi-level feature maps to obtain a multi-level fused feature map includes: sequentially up-sampling the feature map of each level to obtain a plurality of up-sampled feature maps; and fusing the up-sampled feature map corresponding to each level's feature map with the feature map of the previous level to obtain the fused feature map.
In some embodiments, the fusing the up-sampling feature maps corresponding to the feature maps of each level with the feature map of the previous level to obtain the fused feature map includes: predicting according to the up-sampling feature map of the current level and a second prediction module to obtain a second prediction result; processing the second prediction result to obtain an attention diagram; multiplying the attention diagram with the feature diagram of the previous level to obtain an intermediate feature diagram; and adding the intermediate feature map and the up-sampling feature map of the current level to obtain the fused feature map.
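The attention-based fusion described above can be sketched in NumPy. This is a minimal sketch of one pyramid level; the sigmoid used to turn the second prediction result into an attention map is an assumption, since the patent does not name the activation, and `fuse_level` is a hypothetical function name:

```python
import numpy as np

def sigmoid(x):
    # Assumed activation that maps prediction logits to [0, 1].
    return 1.0 / (1.0 + np.exp(-x))

def fuse_level(prev_feat, upsampled_feat, second_pred):
    """Fuse one pyramid level, following the steps in the text:
    1. attention map   = sigmoid(second prediction result)  (assumed)
    2. intermediate    = attention * previous-level feature (element-wise)
    3. fused feature   = intermediate + up-sampled current-level feature
    """
    attention = sigmoid(second_pred)
    intermediate = attention * prev_feat
    return intermediate + upsampled_feat

prev = np.ones((1, 4, 4))           # previous-level feature map
up = np.full((1, 4, 4), 0.5)        # up-sampled current-level feature map
pred = np.zeros((1, 4, 4))          # second prediction result (logits)
fused = fuse_level(prev, up, pred)  # sigmoid(0) = 0.5, so fused is 0.5*1 + 0.5 = 1.0
```

In the real model the multiplication and addition are element-wise over 512-channel feature maps, but the per-level data flow is the same.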
In some embodiments, the step of calculating the error of the preset neural network model from the mask image and the prediction image is performed by the following loss function:
L = ω_pred · L_pred + ω_aux · L_aux
wherein L_pred and L_aux are the Lovász hinge errors of the first prediction module and the second prediction module, respectively, and ω_pred and ω_aux are the weights of the first and second prediction module errors.
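The weighted loss above can be sketched in NumPy. The Lovász hinge implementation follows the standard flat binary formulation (sort errors, weight by the gradient of the Lovász extension of the Jaccard loss); the weight values `w_pred` and `w_aux` are placeholders, not values from the patent:

```python
import numpy as np

def lovasz_grad(gt_sorted):
    """Gradient of the Lovász extension of the Jaccard loss with
    respect to errors sorted in decreasing order."""
    gts = gt_sorted.sum()
    intersection = gts - np.cumsum(gt_sorted)
    union = gts + np.cumsum(1.0 - gt_sorted)
    jaccard = 1.0 - intersection / union
    jaccard[1:] = jaccard[1:] - jaccard[:-1]
    return jaccard

def lovasz_hinge_flat(logits, labels):
    """Binary Lovász hinge loss over flattened pixels.
    labels in {0, 1}; logits are raw per-pixel scores."""
    signs = 2.0 * labels - 1.0
    errors = 1.0 - logits * signs
    order = np.argsort(-errors)            # sort errors in decreasing order
    grad = lovasz_grad(labels[order])
    return float(np.dot(np.maximum(errors[order], 0.0), grad))

def total_loss(logits_pred, logits_aux, labels, w_pred=1.0, w_aux=0.5):
    # L = w_pred * L_pred + w_aux * L_aux, mirroring the patent's formula.
    # The weight values here are assumed placeholders.
    return (w_pred * lovasz_hinge_flat(logits_pred, labels)
            + w_aux * lovasz_hinge_flat(logits_aux, labels))
```

A confident, correct prediction (large-margin logits of the right sign) drives both hinge terms, and hence the total loss, to zero.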
In some embodiments, the predicting according to the target feature map and the first prediction module to obtain the first prediction result includes: and predicting the target characteristic diagram by using an SPhead module to obtain the first prediction result.
The application also provides a detection method using the mirror surface object detection model. The mirror surface object detection model is obtained by training according to the model training method in any one of the above embodiments, and the detection method includes: acquiring an image to be detected; processing the image to be detected by using the mirror surface object detection model to obtain a detection result; and generating a detection image according to the detection result.
The application also provides a robot. The robot comprises a processor and a memory, the memory having stored thereon a computer program which, when executed by the processor, implements the detection method described above.
The present application also provides a non-transitory computer-readable storage medium containing a computer program. The computer program, when executed by one or more processors, implements the detection method described above.
According to the model training method and the detection method for detecting the mirror surface object, the robot and the storage medium, the preset neural network model sharing the object characteristic information is constructed, the characteristic that glass and a mirror have strong similarity is utilized, meanwhile, the data sets of two objects of the glass and the mirror are used, the training data of the preset neural network model are greatly expanded, and the trained preset neural network model has better precision and generalization.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart diagram of a model training method according to some embodiments of the present application;
FIG. 2 is a schematic diagram of a model training apparatus according to some embodiments of the present application;
FIG. 3 is a schematic diagram of a structural framework of a neural network model according to some embodiments of the present disclosure;
FIG. 4 is a schematic flow chart diagram of a model training method according to some embodiments of the present application;
FIG. 5 is a schematic flow chart diagram of a model training method according to some embodiments of the present application;
FIG. 6 is a schematic flow chart diagram of a model training method according to some embodiments of the present application;
FIG. 7 is a schematic flow chart diagram of a model training method according to some embodiments of the present application;
FIG. 8 is a schematic flow chart diagram of a model training method according to some embodiments of the present application;
FIG. 9 is a schematic flow chart diagram of a model training method according to some embodiments of the present application;
FIG. 10 is a partial schematic diagram illustrating the prediction results of partial mirror class images in the training set of the model training method according to some embodiments of the present application;
FIG. 11 is a graphical illustration of the predicted outcome of a portion of the glass images in the training set of the model training method of some embodiments of the present application;
FIG. 12 is a schematic flow chart diagram of a model training method according to some embodiments of the present application;
FIG. 13 is a schematic flow chart of a detection method according to certain embodiments of the present application;
FIG. 14 is a schematic structural view of a detection device according to certain embodiments of the present application;
FIG. 15 is a schematic structural view of a robot according to certain embodiments of the present application;
FIG. 16 is a schematic diagram of a computer-readable storage medium according to some embodiments of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative and are only for the purpose of explaining the present application and are not to be construed as limiting the present application.
In the description of the present application, the features defined as "first" and "second" may explicitly or implicitly include one or more of the features described. In the description of the present application, "a plurality" means two or more unless specifically defined otherwise.
The present application may repeat reference numerals and/or letters in the various examples, which have been repeated for purposes of simplicity and clarity and do not in themselves dictate a relationship between the various embodiments and/or arrangements discussed.
Current work on detecting mirror and glass objects mainly applies such detection in display, medical, and industrial production settings. For example, reflection detection is applied in the field of image display; mirror detection can also be applied in the field of electronic device display, automatically mirroring the displayed image left and right to keep the content correct when it detects that a user is viewing the display through a mirror reflection. In addition, light-reflection regions may be removed, by methods such as optical flow, from multiple frames of images acquired by a camera. Both of these approaches process specular-reflection and high-reflection areas to ensure the quality of the displayed image. When mirror and glass object detection is applied in the medical field, detection and enhancement of the light-reflection regions of medical images are required.
In industrial production, much work is likewise devoted to detecting and re-identifying reflective areas. Most mirror and glass object detection methods focus on defect detection and quality assessment of reflective planes. For example, defect detection on refrigerator surfaces processes the light-reflection region with a generative adversarial network. As another example, the conditional constraint of high reflectance, also generated by means of high brightness, is used for quality inspection of manufactured products, but semantic features are ignored. As yet another example, detection is accomplished with a new type of sensor (e.g., with special structured light), whose primary purpose is defect and stain detection. However, such methods cannot satisfy the input conditions of the detection method of the present application and cannot locate the mirror and glass regions of an entire image. Unlike all of the industrial methods mentioned above, the new-sensor approach does detect the position and size of transparent objects (glass) in a breakthrough manner, but its core method combines various mechanical structures to measure glass size, and it cannot achieve real-time, multi-level detection and positioning of glass and mirror objects from RGB image input alone.
In the field of computer vision, various methods detect glass and mirror-like objects with deep learning. However, although most such methods use deep learning to detect glass, their detection target is limited to glassware in the category of laboratory equipment, whereas the detection range of the present method is wider, covering not only large-area glass windows and wall screens but also small glass products such as glass cups; traditional detection methods can only detect objects at the image level and have lower accuracy. Some methods perform mirror detection through multi-task learning, but are limited by their data sets and by remarkably large neural networks, and lack the generalization and lightweight properties needed for real-time inference and large-scale deployment. Other methods use a Transformer model for glass segmentation, but such models have low generalization and universality.
In view of the above, referring to fig. 1, the present application provides a model training method for detecting a mirror object, the model training method comprising:
01: acquiring a training image, wherein the training image comprises a first image with a first mirror-like area and a second image with a second mirror-like area;
03: constructing a preset neural network model;
05: alternately training a preset neural network model by using the first image and the second image;
07: and obtaining a trained mirror surface object detection model under the condition that the preset neural network model meets the test conditions.
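The alternate-training step (05) can be sketched as a schedule that interleaves batches from the two data sets, so the shared model sees glass and mirror examples in alternation. This is a minimal sketch; `alternate_training_schedule` and the batch lists are hypothetical names, not the patent's implementation:

```python
def alternate_training_schedule(glass_batches, mirror_batches):
    """Interleave batches from the glass and mirror data sets so the
    preset neural network model is trained alternately on both
    object types (step 05)."""
    schedule = []
    for g, m in zip(glass_batches, mirror_batches):
        schedule.append(("glass", g))
        schedule.append(("mirror", m))
    return schedule

# Example: two toy "data sets" of batch indices.
order = alternate_training_schedule([0, 1], [0, 1])
# order == [("glass", 0), ("mirror", 0), ("glass", 1), ("mirror", 1)]
```

In a real training loop, each scheduled entry would trigger one forward pass, error computation against the corresponding mask image, and an optimizer step.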
Referring to fig. 2, the present application further provides a model training device 10. The model training device 10 comprises a training image acquisition module 11, a preset model construction module 13, a training module 15 and a model output module 17.
Step 01 may be implemented by the training image obtaining module 11, step 03 may be implemented by the preset model building module 13, step 05 may be implemented by the training module 15, and step 07 may be implemented by the model output module 17.
That is, the training image acquiring module 11 is configured to acquire a training image, where the training image includes a first image with a first mirror-like area and a second image with a second mirror-like area; the preset model building module 13 is used for building a preset neural network model; the training module 15 is used for alternately training the preset neural network model by using the first image and the second image; the model output module 17 is configured to obtain a trained mirror surface object detection model under the condition that the preset neural network model meets the test condition.
Specifically, the first type mirror surface region may be a glass type mirror surface region, the second type mirror surface region may be a mirror type mirror surface region, and correspondingly, the first image having the first type mirror surface region refers to an image including glass. The second image having a second type of mirror surface area refers to an image containing a mirror. That is, the model training method of the present application first acquires two images including a mirror and glass.
Then, a preset neural network model is constructed. The preset neural network model comprises a feature extraction module and a glass and mirror recognition module. The first image and the second image can be firstly input into a feature extraction module of a preset neural network model, and then the output result of the feature extraction module is input into a recognition module of the glass and the mirror, and the output result of the recognition module is a segmentation result of the glass and the mirror.
Then, the preset neural network model is alternately trained by using the first image and the second image. So, utilized the similar department of glass and mirror, come the study through using two kinds of data sets of glass and mirror simultaneously, when having compensatied a kind of object of individual recognition, predetermine the not enough problem of generalization of neural network model, realized both having promoted the precision that detects predetermine the neural network model, strengthened the function of predetermineeing the neural network model again.
And finally, obtaining a trained mirror surface object detection model under the condition that the preset neural network model meets the test conditions.
The test condition means that the acquired training images are divided into a training set and a verification set, and the verification set is used to verify the preset neural network model trained on the training set. When the agreement between the verification result and the expected result reaches a preset threshold, that is, when the accuracy of the preset neural network model in identifying mirror surface objects reaches the preset threshold, the model satisfies the test condition, training is complete, and the trained mirror surface object detection model is obtained. The preset threshold may be, for example, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%, and is not limited herein. The structural framework of the constructed preset neural network model of the present application is shown in fig. 3.
Therefore, the model training method, the detection method, the robot, and the storage medium for detecting mirror surface objects construct a preset neural network model that shares object feature information and exploit the strong similarity between glass and mirrors; at the same time, the data sets of the two object types greatly expand the training data of the neural network model, so that the trained model has better accuracy and generalization.
Referring to fig. 4, step 01 includes:
011: acquiring images to be processed, which are respectively provided with a first mirror surface object and a second mirror surface object, in a scene through a camera;
012: classifying the image to be processed to obtain a first image and a second image;
013: and respectively obtaining mask images corresponding to the first image and the second image according to the labeling information of the image to be processed.
Referring to fig. 2, step 011, step 012 and step 013 can be implemented by the training image obtaining module 11. That is, the training image obtaining module 11 is configured to collect, by using a camera, to-be-processed images of a scene respectively having a first mirror surface object and a second mirror surface object; classifying the image to be processed to obtain a first image and a second image; and respectively obtaining mask images corresponding to the first image and the second image according to the labeling information of the image to be processed.
Specifically, the first mirror surface object is a mirror and the second mirror surface object is glass, or the first mirror surface object is glass and the second mirror surface object is a mirror. The first mirror surface object and the second mirror surface object may also be other two mirror surface objects with higher similarity but different types, and are not limited herein.
Specifically, training images can be acquired by collecting, indoors and outdoors with an RGB camera, images of scenes containing glass and mirrors, and dividing them into two types: first images containing glass and second images containing mirrors. At least 3000 images of each type can then be collected across different scenes to construct a data set, and the resulting data set is divided into a training set and a verification set.
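The collection-and-split step can be sketched as follows. The 80/20 ratio and the file-name pattern are assumptions for illustration; the patent only states that the data set is divided into a training set and a verification set:

```python
import random

def split_dataset(image_paths, train_ratio=0.8, seed=0):
    """Shuffle and split a list of image paths into a training set
    and a verification set (the 80/20 ratio is an assumed choice)."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)  # fixed seed for reproducibility
    cut = int(len(paths) * train_ratio)
    return paths[:cut], paths[cut:]

# Example with the patent's minimum of 3000 images per class
# (hypothetical file names).
train, val = split_dataset([f"glass_{i:04d}.png" for i in range(3000)])
```

The same split would be applied independently to the glass and mirror data sets.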
Then, the image data sets need to be labeled, specifically, the images in the two data sets can be manually labeled, and a mask image is labeled on the glass or mirror part in the image according to the labeling information of the image to be processed, that is, the mask images corresponding to the first image and the second image can be obtained respectively.
A mask is a bitmap that selects which pixels are to be processed and which are not. Mask processing divides the image into a region of interest and a non-interest region: the region of interest is white, meaning its pixels are all non-zero, and the non-interest region is black, meaning its pixels are all 0. It can be understood that after an AND operation between the original image and the mask image, the resulting image retains only the region of interest of the original image. In this way, the trained preset neural network model has higher accuracy and can achieve accurate pixel-level segmentation of the region of interest.
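The AND operation described above can be sketched in NumPy, where a binary mask (255 inside the region of interest, 0 outside) zeroes out everything else:

```python
import numpy as np

def apply_mask(image, mask):
    """Bitwise AND of an 8-bit image with its mask: mask pixels are
    255 (all bits set) inside the region of interest and 0 outside,
    so the AND keeps ROI pixels unchanged and zeroes the rest."""
    return np.bitwise_and(image, mask)

image = np.array([[10, 20], [30, 40]], dtype=np.uint8)
mask = np.array([[255, 0], [0, 255]], dtype=np.uint8)  # ROI on the diagonal
result = apply_mask(image, mask)
# result == [[10, 0], [0, 40]]: only the region of interest remains
```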
Computer vision tasks for image annotation can include object detection, edge detection, segmentation, pose prediction, key point identification, and image classification; correspondingly, the annotation information of the present application includes object detection annotation information, edge annotation information, segmentation and pose prediction information, key point information, and image classification information.
Therefore, the mask image corresponding to the first image and the second image can be obtained through the steps, and a foundation can be laid for calculating the error of the preset neural network model and optimizing the preset neural network model according to the error.
Referring to fig. 5, step 05 includes:
051: processing the first image or the second image through a preset neural network model to obtain a predicted image;
052: calculating the error of the preset neural network model according to the mask image and the predicted image;
053: and optimizing a preset neural network model according to the error.
Please refer to fig. 2, steps 051, 052 and 053 may be implemented by the training module 15. That is, the training module 15 is configured to process the first image or the second image through the preset neural network model to obtain a predicted image; calculating the error of the preset neural network model according to the mask image and the predicted image; and optimizing a preset neural network model according to the error.
Specifically, the first image or the second image is processed through a preset neural network model to obtain a predicted image, namely, the image of the labeled glass and mirror data set is used as a model input, and the labeled mask image is used as a target result of the model output, so that the error of the preset neural network model can be calculated.
The error of the preset neural network model may refer to the pixel deviation between the predicted image and the mask image. Specifically, taking the mask image as the reference, any pixel region of the predicted image that does not match the mask image is an error region: for example, if a pixel belonging to the region of interest is 0 in the predicted image (a partial miss), or a pixel in an originally non-interest region is non-zero (a spurious detection), a pixel deviation exists between the predicted image and the mask image. Therefore, by comparing the predicted image and the mask image pixel by pixel, the specific pixel deviation between them can be computed, and thus the error of the preset neural network model.
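The pixel-by-pixel comparison can be sketched as follows. The mismatch ratio shown is one simple way to quantify the deviation, not necessarily the patent's exact metric (which, per the loss section, is the Lovász hinge error):

```python
import numpy as np

def pixel_deviation(predicted, mask):
    """Fraction of pixels where the binarized prediction disagrees
    with the ground-truth mask (an assumed, illustrative measure)."""
    pred_bin = predicted > 0
    mask_bin = mask > 0
    return float(np.mean(pred_bin != mask_bin))

mask = np.array([[1, 1], [0, 0]])
pred = np.array([[1, 0], [0, 1]])   # one missed pixel, one spurious pixel
err = pixel_deviation(pred, mask)   # 2 of 4 pixels disagree, so err = 0.5
```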
Then, the preset neural network model can be optimized according to the error to obtain the preset neural network model with higher precision, so that the result of identifying the glass or mirror in the image through the preset neural network model is more accurate.
Therefore, the preset neural network model in the model training method of the application simultaneously uses the data sets of two objects of glass and mirror, the training data of the preset neural network model is greatly expanded, the preset neural network model has better precision and generalization, and the robot can effectively recognize and position the glass and the mirror in a complex scene, so that collision is avoided.
Referring to fig. 6, step 051 includes:
0511: taking the first image or the second image as an input image of a preset neural network model;
0512: performing feature extraction on an input image to obtain a multi-level feature map;
0513: fusing the multi-level feature maps to obtain a multi-level fusion feature map;
0514: processing the multi-level fusion feature map and synthesizing a target feature map;
0515: predicting according to the target feature map and a first prediction module to obtain a first prediction result;
0516: and generating a prediction image according to the first prediction result.
Referring to fig. 2, step 0511, step 0512, step 0513, step 0514, step 0515 and step 0516 may be implemented by the training module 15. That is, the training module 15 is configured to use the first image or the second image as an input image of the preset neural network; performing feature extraction on an input image to obtain a multi-level feature map; fusing the multi-level feature maps to obtain a multi-level fusion feature map; processing the multi-level fusion feature map and synthesizing a target feature map; predicting according to the target feature map and a first prediction module to obtain a first prediction result; and generating a prediction image according to the first prediction result.
Specifically, an image of a glass or mirror data set is input into a preset neural network model as an input image, a convolutional neural network is adopted to perform feature extraction on the input image to obtain a multi-level feature map, and the feature map dimension of each level is reduced to 512 by using 1 × 1 convolution.
Next, the multi-level feature maps are fused to obtain a fused feature map at each level, namely the multi-level fused feature maps. It can be understood that deep feature maps contain more semantic information, which facilitates classification, and that detection using multi-level fused convolutional features works better than using any single-layer feature map.
Then, the fused feature map at each level is processed with a 3 × 3 convolution, which eliminates the aliasing effect. Aliasing is the frequency error introduced when converting a signal from its original analog form to digital form: if the sampling rate is too low to capture the correct frequency information, the spectrum of the sampled signal is distorted.
Finally, the processed multi-level fusion feature maps are upsampled to the same size and added together to form the target feature map. The target feature map is input into the first prediction module, which outputs a 2-channel semantic segmentation of the glass and mirror regions; this segmentation result is the first prediction result, and the predicted image can be generated from it. The first prediction module is a prediction head module.
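The synthesis and prediction step can be sketched as follows in NumPy (nearest-neighbour upsampling and random weights are stand-ins for the trained layers; the resolutions are assumptions):

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def conv1x1(x, out_channels, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((out_channels, x.shape[0])) * 0.01
    return np.einsum("oc,chw->ohw", w, x)

# Assumed fused maps, all 512 channels, at 1x, 1/2x, 1/4x, 1/8x resolution.
fused = [np.random.rand(512, 64 // s, 64 // s) for s in (1, 2, 4, 8)]

# Bring every level to the finest resolution and sum them into one map.
target = sum(upsample_nearest(f, s) for f, s in zip(fused, (1, 2, 4, 8)))

# A 2-channel prediction head: one channel per class (glass, mirror).
logits = conv1x1(target, out_channels=2)
print(logits.shape)
```

The 2-channel output corresponds to the two semantic segmentation results (glass and mirror) described above.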
Therefore, the model training method obtains the first prediction result by applying feature extraction, fusion, aliasing elimination, and synthesis to the first image and the second image, and generates the predicted image, which yields a more realistic prediction.
That is, this model training method for detecting mirror-surface objects requires only RGB images as input, achieves real-time multi-level detection and localization of glass and mirror objects through an efficient deep-learning method, and offers strong generalization capability together with good inference speed and quality.
Meanwhile, the model training method exploits the strong similarity between glass and mirrors, enabling the preset neural network model to segment glass and mirror objects simultaneously, which improves segmentation precision for both and gives the model stronger generalization and universality. Moreover, because the method uses data sets of both glass and mirror objects, the training data of the preset neural network model is greatly expanded, and the trained model achieves better precision and generalization.
Referring to fig. 7, step 0512 includes:
05121: and performing feature extraction on the input image by using ResNeXt to obtain a multi-level feature map.
Referring to fig. 2, step 05121 may be implemented by the training module 15. That is, the training module 15 is configured to perform feature extraction on the input image by using ResNeXt to obtain a multi-level feature map.
It will be appreciated that the network structure of ResNeXt is relatively simple, which helps prevent overfitting to a particular data set and also makes ResNeXt easier to customize and modify.
Specifically, the feature extraction may extract features such as edges, corners, and the like in the input image. The purpose of feature extraction is to examine each pixel to determine whether the pixel represents a feature.
Therefore, the model training method uses ResNeXt to perform feature extraction on the glass and mirror images to obtain the multi-level feature map, which makes obtaining the multi-level feature map simpler and more efficient.
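The core idea of ResNeXt — splitting channels into groups of equal "cardinality", transforming each group independently, then aggregating — can be illustrated with a toy NumPy sketch (random linear maps stand in for trained convolutions; all sizes are assumptions):

```python
import numpy as np

def aggregated_transform(x, cardinality=32):
    """
    Toy sketch of ResNeXt's aggregated transformation: split the channels
    of a (C, H, W) map into `cardinality` groups, apply an independent
    linear map to each group, and sum the results.
    """
    c = x.shape[0]
    group = c // cardinality
    rng = np.random.default_rng(0)
    out = np.zeros_like(x[:group])
    for g in range(cardinality):
        w = rng.standard_normal((group, group)) * 0.01
        out = out + np.einsum("oc,chw->ohw", w, x[g * group:(g + 1) * group])
    return out

x = np.random.rand(64, 16, 16)   # assumed input map: 64 channels
y = aggregated_transform(x)
print(y.shape)
```

In the real network each group is a small bottleneck branch rather than a single linear map, but the split-transform-aggregate structure is the same.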
Referring to fig. 8, step 0513 includes:
05131: sequentially upsampling each level of characteristic diagram to obtain a plurality of upsampled characteristic diagrams;
05132: and fusing the up-sampling feature map corresponding to each level of feature map with the feature map of the previous level to obtain a fused feature map.
Referring to fig. 2, steps 05131 and 05132 may be implemented by training module 15. That is, the training module 15 is configured to sequentially up-sample each level of feature map to obtain a plurality of up-sampled feature maps; and fusing the up-sampling feature map corresponding to each level of feature map with the feature map of the previous level to obtain a fused feature map.
Specifically, each level's feature map may be upsampled by a factor of 2, layer by layer, to obtain a plurality of upsampled feature maps; the upsampled feature map corresponding to each level is then fused with the feature map of the previous layer by element-wise addition to obtain a fused feature map. The upsampling factor can be set according to user requirements and is not limited here.
It will be understood that upsampling means resampling a digital signal at a rate higher than the rate at which the signal was originally obtained (for example, when it was sampled from an analog signal).
Therefore, by upsampling each level's feature map by a factor of 2, layer by layer, the model training method obtains multiple upsampled feature maps of higher resolution, so that the fused feature map obtained by fusion is more faithful.
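The layer-by-layer top-down fusion can be sketched as a simple loop in NumPy (nearest-neighbour ×2 upsampling stands in for the learned upsampling; the level sizes are assumptions):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

# Assumed 512-channel maps ordered from deepest (smallest) to shallowest.
levels = [np.random.rand(512, 8, 8),
          np.random.rand(512, 16, 16),
          np.random.rand(512, 32, 32),
          np.random.rand(512, 64, 64)]

fused = [levels[0]]
for shallower in levels[1:]:
    # Upsample the previously fused (deeper) map by 2x and add it to the
    # shallower level of matching spatial size.
    fused.append(upsample2x(fused[-1]) + shallower)

print([f.shape for f in fused])
```

Each fused map thus carries semantic information from the deeper levels while keeping the resolution of its own level.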
Referring to fig. 9, step 05132 includes:
051321: predicting according to the up-sampling feature map of the current level and a second prediction module to obtain a second prediction result;
051322: processing the second prediction result to obtain an attention diagram;
051323: multiplying the attention diagram with the feature diagram of the previous level to obtain an intermediate feature diagram;
051324: and adding the intermediate feature map and the up-sampling feature map of the current level to obtain a fused feature map.
Referring to fig. 2, steps 051321, 051322, 051323, and 051324 may be implemented by training module 15. That is, the training module 15 is configured to predict according to the upsampling feature map of the current level and the second prediction module to obtain a second prediction result; processing the second prediction result to obtain an attention diagram; multiplying the attention diagram with the feature diagram of the previous level to obtain an intermediate feature diagram; and adding the intermediate feature map and the up-sampling feature map of the current level to obtain a fused feature map.
Specifically, referring to fig. 3, the upsampled feature map of the current level is first input to the second prediction module for a separate prediction, yielding a second prediction result; this result serves as an auxiliary output, and its error against the label mask is computed. The second prediction module is a new SPHead module, i.e., an auxiliary prediction head module. The second prediction result may be a partial prediction result as shown in figs. 10 and 11. As can be seen from those figures, the second prediction result recognizes the edges of the glass or mirror more accurately than the first prediction result, producing a more precise image.
That is, the target characteristic diagram of the present application is predicted twice by the first prediction module and the second prediction module, so that the recognition accuracy of the glass or the mirror can be further ensured, and the robot can be effectively enabled to recognize and position the glass and the mirror in a complex scene, thereby avoiding collision.
Then, the prediction result can be processed by a Sigmoid function to obtain a probability map with a value range of 0 to 1, and the probability map is subtracted from 1 to obtain an attention map.
Multiplying the attention map by the feature map of the previous layer yields an intermediate feature map, which is then added to the feature map of the current layer to obtain the fused feature map.
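The attention-guided fusion just described can be sketched in a few lines of NumPy (the map sizes and random inputs are assumptions; in the real model the auxiliary logits come from the trained SPHead module):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Assumed inputs, all at the same spatial size:
current_up  = np.random.rand(512, 32, 32)   # upsampled current-level map
previous    = np.random.rand(512, 32, 32)   # previous-level feature map
second_pred = np.random.randn(1, 32, 32)    # raw auxiliary logits

prob = sigmoid(second_pred)       # probability map with values in (0, 1)
attention = 1.0 - prob            # emphasize regions the auxiliary head missed
intermediate = attention * previous   # broadcast over the 512 channels
fused = intermediate + current_up     # final fused feature map
```

Subtracting the probability map from 1 makes the attention map large exactly where the auxiliary prediction is uncertain, steering the fusion toward those regions.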
Therefore, the obtained fusion characteristic diagram can further ensure the identification precision of the glass or the mirror, and the robot can effectively identify and position the glass and the mirror in a complex scene, so that collision is avoided.
In an embodiment of the present application, step 032, calculating the error of the preset neural network model from the mask image and the predicted image, uses the following loss function:

L = ω_pred · L_pred + ω_aux · L_aux

where L_pred and L_aux are the Lovász hinge errors of the first prediction module and the second prediction module, respectively, and ω_pred and ω_aux are the weights of those errors.

In this way, the error between the mask image and the predicted image can be computed from this formula, making it possible to judge whether the prediction reaches the standard.
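As a small numeric sketch of this weighted sum (the error values and weights below are assumptions for illustration, not values from the application):

```python
# Assumed per-module errors (e.g. Lovász hinge values) and their weights:
loss_pred, loss_aux = 0.4, 0.2   # errors of the first / second prediction module
w_pred, w_aux = 1.0, 0.5         # the corresponding weights (assumed)

# Total loss L = w_pred * L_pred + w_aux * L_aux
total = w_pred * loss_pred + w_aux * loss_aux
print(total)  # 0.5
```

Weighting the auxiliary error lower than the main error is a common choice when the auxiliary head only guides training, though the application does not fix specific weight values.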
Referring to fig. 12, step 0515 includes:
05151: and predicting the target characteristic diagram by using an SPhead module to obtain a first prediction result.
Referring to fig. 2, step 05151 may be implemented by training module 15. That is, the training module 15 is configured to predict the target feature map by using the SPHead module to obtain a first prediction result.
Specifically, the first prediction results obtained may be partial prediction results as shown in fig. 10 and fig. 11.
Referring to fig. 13, the present application further provides a detection method using the mirror surface object detection model. The mirror surface object detection model is obtained by training according to the model training method, and the detection method comprises the following steps:
02: acquiring an image to be detected;
04: processing an image to be detected by using a mirror surface object detection model to obtain a detection result;
06: and generating a detection image according to the detection result.
Referring to fig. 14, the present application further provides a detection apparatus 20 using a mirror object detection model, wherein the detection apparatus 20 includes a detection image obtaining module 22, a processing module 24, and an image generating module 26.
Step 02 may be implemented by the detection image obtaining module 22, step 04 may be implemented by the processing module 24, and step 06 may be implemented by the image generation module 26. That is, the detection image obtaining module 22 is used for obtaining an image to be detected; the processing module 24 is configured to process the image to be detected by using the mirror surface object detection model to obtain a detection result; and the image generation module 26 is used for generating a detection image according to the detection result.
It can be understood that, since the detection method of the present application employs the mirror surface object detection model obtained by the model training method described above, the accuracy of identifying the glass or mirror can be further ensured.
Specifically, the acquired image to be detected may be an image containing a mirror or glass, collected by a camera or sensor of the robot.
And then, processing the image to be detected by using the trained mirror surface object detection model to obtain a detection result.
Finally, a detection image similar to the black-and-white layout of fig. 10 or fig. 11 is generated from the detection result, where the white area indicates where a mirror or glass is present and the black area indicates the surrounding physical structures.
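Generating such a black-and-white detection image from the model's per-pixel output can be sketched as simple thresholding (the probability values and the 0.5 threshold below are assumptions for illustration):

```python
import numpy as np

def detection_image(prob_map, threshold=0.5):
    """White (255) where glass/mirror is detected, black (0) elsewhere."""
    return np.where(prob_map > threshold, 255, 0).astype(np.uint8)

# Tiny assumed probability map for demonstration:
prob = np.array([[0.9, 0.2],
                 [0.4, 0.7]])
img = detection_image(prob)
print(img)  # [[255   0]
            #  [  0 255]]
```

The resulting 8-bit mask is the kind of image the robot can consume directly for obstacle avoidance.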
In this way, the robot can perform accurate obstacle avoidance according to the detection image, effectively recognizing and locating glass and mirrors in complex scenes and thereby avoiding collisions.
The mirror surface object detection model used in this detection method exploits the strong similarity between glass and mirrors and is trained on data sets of both object types, which greatly expands its training data and gives the trained model better precision and generalization.
Referring to fig. 15, the present application further provides a robot 100, wherein the robot 100 comprises a computer program 110 and a processor 120. The detection method described in the above embodiments can be implemented when the processor 120 executes the computer program 110.
The robot 100 of the present application can be an intelligent robot such as a service robot, an underwater robot, an entertainment robot, a military robot, or an agricultural robot. Service robots have a particularly wide range of applications, mainly covering maintenance, repair, transportation, cleaning, security, rescue, and monitoring.
When the robot 100 of the present application implements the detection method, it constructs a preset neural network model that shares object feature information, exploits the strong similarity between glass and mirrors, and uses data sets of both object types simultaneously, so the training data of the preset neural network model is greatly expanded and the trained model has better precision and generalization.
Referring to fig. 16, the present application further provides a computer readable storage medium 200, the computer readable storage medium 200 comprising a computer program 210 and a processor 220. The computer program 210, when executed by one or more processors 220, implements the detection method of any of the above embodiments. For example, the computer program 210, when executed by the processor 220, implements the steps of the following detection method:
02: acquiring an image to be detected;
04: processing an image to be detected by using a mirror surface object detection model to obtain a detection result;
06: and generating a detection image according to the detection result.
The computer-readable storage medium 200 of the present application, by constructing a preset neural network model that shares object feature information, exploits the strong similarity between glass and mirrors and uses data sets of both object types simultaneously, so the training data of the preset neural network model is greatly expanded and the trained model has better precision and generalization.
The above examples only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (12)

1. A model training method for detecting a mirror surface object is characterized by comprising the following steps:
acquiring a training image, wherein the training image comprises a first image with a first mirror-like area and a second image with a second mirror-like area;
constructing a preset neural network model;
alternately training the preset neural network model by using the first image and the second image;
and obtaining a trained mirror surface object detection model under the condition that the preset neural network model meets the test conditions.
2. The model training method of claim 1, wherein the obtaining training images comprises:
acquiring images to be processed, which are respectively provided with a first mirror surface object and a second mirror surface object, in a scene through a camera;
classifying the image to be processed to obtain the first image and the second image;
and respectively obtaining mask images corresponding to the first image and the second image according to the labeling information of the image to be processed.
3. The model training method of claim 2, wherein the alternately training the preset neural network model using the first image and the second image comprises:
processing the first image or the second image through the preset neural network model to obtain a predicted image;
calculating the error of the preset neural network model according to the mask image and the predicted image;
and optimizing the preset neural network model according to the error.
4. The model training method according to claim 3, wherein the processing the first image or the second image through the preset neural network model to obtain a predicted image comprises:
taking the first image or the second image as an input image of the preset neural network model;
performing feature extraction on the input image to obtain a multi-level feature map;
fusing the multi-level feature maps to obtain a multi-level fused feature map;
processing the multi-level fusion feature map and synthesizing a target feature map;
predicting according to the target feature map and a first prediction module to obtain a first prediction result;
and generating the predicted image according to the first prediction result.
5. The model training method according to claim 4, wherein the performing feature extraction on the input image to obtain a multi-level feature map comprises:
and performing feature extraction on the input image by using ResNeXt to obtain a multi-level feature map.
6. The model training method according to claim 4, wherein the fusing the multi-level feature maps to obtain a multi-level fused feature map comprises:
sequentially up-sampling the characteristic diagrams of each level to obtain a plurality of up-sampled characteristic diagrams;
and fusing the up-sampling feature map corresponding to the feature map of each level with the feature map of the previous level to obtain the fused feature map.
7. The model training method according to claim 6, wherein the fusing the upsampled feature map corresponding to the feature map of each level with the feature map of the previous level to obtain the fused feature map comprises:
predicting according to the up-sampling feature map of the current level and a second prediction module to obtain a second prediction result;
processing the second prediction result to obtain an attention diagram;
multiplying the attention diagram with the feature diagram of the previous level to obtain an intermediate feature diagram;
and adding the intermediate feature map and the up-sampling feature map of the current level to obtain the fused feature map.
8. The model training method according to claim 7, wherein the step of calculating the error of the preset neural network model from the mask image and the prediction image is calculated by the following loss function:

L = ω_pred · L_pred + ω_aux · L_aux

wherein L_pred and L_aux are the Lovász hinge errors of the first prediction module and the second prediction module, respectively, and ω_pred and ω_aux are the weights of the errors of the first prediction module and the second prediction module.
9. The model training method of claim 4, wherein the predicting according to the target feature map and the first prediction module to obtain the first prediction result comprises:
and predicting the target characteristic diagram by using an SPhead module to obtain the first prediction result.
10. A detection method using a specular object detection model trained according to the model training method of any one of claims 1 to 9, the detection method comprising:
acquiring an image to be detected;
processing the image to be detected by using the mirror surface object detection model to obtain a detection result;
and generating a detection image according to the detection result.
11. A robot comprising a processor and a memory, said memory having stored thereon a computer program which, when executed by said processor, carries out the detection method of claim 10.
12. A non-transitory computer-readable storage medium containing a computer program, wherein the computer program, when executed by one or more processors, implements the detection method of claim 10.
CN202210236681.6A 2022-03-11 2022-03-11 Model training method, model detection method, robot, and storage medium Pending CN114626452A (en)

Published as CN114626452A on 2022-06-14.


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination