CN116542881A

CN116542881A - Robot vision image processing method

Info

Publication number: CN116542881A
Application number: CN202310816532.1A
Authority: CN
Inventors: 焦文文; 何小英; 王旭; 余江浩; 司嘉怡; 彭虹铭; 张震宇; 徐家平; 王婧宁; 徐志洪; 何国瑞; 李坤; 袁苡恒
Original assignee: Chengdu College of University of Electronic Science and Technology of China
Current assignee: Chengdu College of University of Electronic Science and Technology of China
Priority date: 2023-07-05
Filing date: 2023-07-05
Publication date: 2023-08-04

Abstract

The invention discloses a robot vision image processing method, which relates to the technical field of image data processing and comprises the following steps: preprocessing two-dimensional or three-dimensional data obtained by scanning a robot vision sensor into gray images with dark background and bright objects to be detected, and taking the gray images as original images; performing binarization and noise reduction filtering treatment on the original image, and obtaining the contour information of each measured object through two-dimensional differential operation and double-threshold treatment; dividing each measured object image from the original image according to the contour information of each measured object; and extracting the feature vector of each measured object image through the transverse and longitudinal scanning feature fusion model so as to finish the robot vision image processing. The method discards irrelevant data such as background and the like, greatly reduces the operand of visual image feature extraction, fully acquires the transverse and longitudinal texture information of the measured object, does not involve convolution floating point multiplication operation with multiple layers, and has the characteristics of small operand, low hardware calculation force dependency, high accuracy and strong robustness.

Description

Robot vision image processing method

Technical Field

The invention relates to the technical field of image data processing, in particular to a robot vision image processing method.

Background

Because of the complexity of the image data itself, visual image processing techniques typically use a neural network with depth, characterized by the inclusion of a large number of convolutional layers, involving extensive floating-point multiplication operations. Even the YOLO model algorithm, known as a lightweight network, contains multiple convolution layers and pooling layers.

However, robots often need to be lightweight in hardware design and cannot be loaded with computer systems that are as powerful as workstations and servers. As such, it is often difficult for a robot to locally perform visual image processing, and it is often necessary to upload the acquired image data to a cloud end. However, with the spread of the application area of robots, robots have been applied in various directions such as mechanical fault detection, mountain geological detection, house quality inspection detection, agricultural monitoring, etc., and in order to ensure timeliness of processing, the directions almost need to be processed in local real time, so that a visual image processing method suitable for local operation of robots needs to be constructed, and has the advantages of small operation amount, low hardware calculation force dependency, high accuracy and strong robustness.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides a robot vision image processing method which solves the problems that the existing image processing method comprises a plurality of convolution layers, involves huge floating point multiplication operation, is difficult to load on a robot hardware device and is not suitable for processing a robot local vision image.

In order to achieve the aim of the invention, the invention adopts the following technical scheme:

a robot vision image processing method, comprising the steps of:

s1, preprocessing two-dimensional or three-dimensional data obtained by scanning a robot vision sensor into gray images with dark background and bright objects to be tested, and taking the gray images as original images;

s2, binarizing and noise reduction filtering processing is carried out on the original image, and contour information of each measured object is obtained through two-dimensional differential operation and double-threshold processing;

s3, dividing each object image from the original image according to the contour information of each object;

and S4, extracting the feature vector of each measured object image through the transverse and longitudinal scanning feature fusion model so as to finish the robot vision image processing.

The beneficial effects of the invention are as follows: the invention takes the preprocessed gray level image as an object, and divides each measured object image from the original gray level image through the measured object contour information extraction, and carries out the transverse scanning feature extraction, the longitudinal scanning feature extraction and the feature fusion on the measured object image to obtain the measured object features, thereby realizing the visual image processing. Irrelevant data such as background is discarded, the operand of visual image feature extraction is greatly reduced, the transverse and longitudinal texture information of the measured object is fully obtained, convolution floating point multiplication operation with multiple layers is not involved, and the method has the characteristics of small operand, low hardware calculation force dependency, high accuracy and strong robustness.

Further, the step S2 is to perform binarization processing on the original image by the following formula:

，

wherein, the liquid crystal display device comprises a liquid crystal display device,for binarized image +.>Go->Values of column pixels, +.>Is the first->Row of linesValues of column pixels, +.>For maximum pixel value, +.>Is a first threshold.

Further, the step S2 is to perform noise reduction filtering processing on the binarized image by:

a1, constructionFilter window matrix of size, < >>Is a positive integer;

a2, carrying out noise reduction and filtering treatment on the binarized image according to the filtering window matrix.

Further, the A1 is constructed by the following formulaSize filter window matrix:

，

wherein, the liquid crystal display device comprises a liquid crystal display device,for the filter window matrix->Go->Values of column elements, ">Is of circumference rate>For the line of the filter coefficients,for column filter coefficients, +.>For rank filtering cosine correlation, < >>Is a sine function +.>As cosine function +.>Is a natural constant.

Further, the A2 performs noise reduction and filtering processing on the binarized image according to the filter window matrix by the following formula:

，

wherein, the liquid crystal display device comprises a liquid crystal display device,for the noise-reduced filtered image +.>Go->Values of column pixels, +.>For the filter window matrix->Go->Values of column elements, ">For binarized image +.>Go->Values of column pixels.

The beneficial effects of the above-mentioned further scheme are: in the process of extracting the outline information of each measured object, the original image is converted into a binary image, and the difference between the measured object and the background is pulled open; in the design of the filter window matrix, a two-dimensional element value operation formula with row-column correlation is created, wherein the row filter coefficient and the column filter coefficient in the formula are both exponential coefficients and proportional coefficients, and the window matrix can be flexibly adjusted so as to be suitable for filtering various noises obeying a normal statistical distribution rule; the expression of noise reduction and filtering adopts a two-dimensional window moving mode to filter the image, so that image impurities are removed, and the accuracy of contour recognition of the subsequent measured object is improved.

Further, the two-dimensional differential operation in S2 is:

，

wherein, the liquid crystal display device comprises a liquid crystal display device,for the first edge image +>Go->Values of column pixels, +.>For the noise-reduced filtered image +.>Go->Values of column pixels, +.>For the noise-reduced filtered image +.>Go->Values of column pixels.

Further, the dual threshold processing in S2 includes the steps of:

b1, setting a second threshold value, and processing the first edge image by the following formula to obtain a second edge image:

，

wherein, the liquid crystal display device comprises a liquid crystal display device,for the second edge image +.>Go->Values of column pixels, +.>Is a second threshold, ++>Is the maximum pixel value;

b2, setting a third threshold value smaller than the second threshold value, and processing the first edge image by the following formula to obtain a third edge image:

，

wherein, the liquid crystal display device comprises a liquid crystal display device,for the third edge image +>Go->Values of column pixels, +.>Is a third threshold;

b3, judging whether the bright lines in the second edge image are continuous, if so, obtaining the contour information of each measured object, and if not, jumping to the step B4;

and B4, searching pixel points from the bright pixel points in the third edge image to enable the bright lines in the second edge image to be communicated, and obtaining continuous bright lines as contour information of each measured object.

The further method has the beneficial effects that: the binary image has only a minimum value of 0 and a maximum value, so that the difference operation can directly find the image contour, but since the difference operation in the discrete domain is equivalent to the difference operation in the continuous domain, the pixels around the contour are also affected by the derivative of the differential, and therefore, further processing is required to extract the real contour information. In the further processing, two different thresholds are set to binarize the first edge image having the thick edge information. Because in the binary image, the pixel point with the value of 0 is black, the pixel point with the value of the maximum value is bright, the image conversion is conservative due to the larger threshold value, some real edge points are removed, so that bright lines representing edges are discontinuous, and the bright pixel points can be found in the binary image obtained through smaller threshold value processing to ensure that the edge pixel points are continuous, and complete real contour information is reconstructed.

Further, the transversal and longitudinal scanning feature fusion model in S4 includes:

a sweep sub-model for according toAnd extracting a transverse scanning feature map of the measured object image through the first scanning window vector of the dimension:

，

wherein, the liquid crystal display device comprises a liquid crystal display device,is a positive integer>For the sweep pattern +.>Go->Values of column pixels, +.>For the first scanning window vector +.>Value of element->For the object image +.>Go->Values of column pixels;

longitudinal scanning sub-model for according toAnd extracting a longitudinal scanning characteristic diagram of the measured object image through a second scanning window vector of the dimension:

，

wherein, the liquid crystal display device comprises a liquid crystal display device,is a positive integer>For longitudinal scan feature map->Go->Values of column pixels, +.>For the second scanning window vector +.>Value of element->For the object image +.>Go->Values of column pixels;

the feature fusion sub-model is used for processing the transverse scanning feature map and the longitudinal scanning feature map of the measured object image to obtain feature vectors of the measured object image.

Further, the feature fusion submodel is a BP fully connected neural network, and the method for processing the transverse scanning feature map and the longitudinal scanning feature map of the measured object image to obtain the feature vector of the measured object image comprises the following steps:

c1, sequentially splicing each row of the transverse scanning feature map to form a transverse scanning feature vector;

c2, sequentially splicing each column of the longitudinal scanning feature images as a sub-vector to form a longitudinal scanning feature vector;

and C3, splicing the transverse scanning feature vector and the longitudinal scanning feature vector into a preprocessing feature vector, sending the preprocessing feature vector into an input layer of the BP full-connection neural network, and taking output data of the BP full-connection neural network as the feature vector of the measured object image.

The beneficial effects of the above-mentioned further scheme are: two different scanning window vectors are arranged to respectively carry out transverse scanning and longitudinal scanning on the image of the measured object, compared with a matrix convolution kernel convolution operation, the moving window multiplication mode of the vectors greatly reduces the number of element multiplication, and the two transverse scanning and the longitudinal scanning can obtain two-dimensional texture information of the measured object, and the extraction effect is no less than that of the depth multi-level convolution operation. Although the transverse scanning characteristic diagram and the longitudinal scanning characteristic diagram fully obtain two-dimensional texture information of a measured object, the element quantity is huge, compared with the prior art that characteristic data are compressed by using a plurality of pooling layers and convolution layers in a matching mode, the method provided by the invention has the characteristics of small operation quantity, high accuracy and strong robustness by performing rolling-free pooling operation processing on the BP full-connected neural network with the widest range and the most stable BP full-connected neural network, so that the measured object image characteristic vector adapting to the user requirement dimension is obtained, the operation quantity of visual image characteristic extraction is greatly reduced, and the problems of gradient disappearance, gradient explosion and the like are not faced like a multi-level deep network.

Drawings

Fig. 1 is a flowchart of a robot vision image processing method according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention is provided to facilitate understanding of the present invention by those skilled in the art, but it should be understood that the present invention is not limited to the scope of the embodiments, and all the inventions which make use of the inventive concept are protected by the spirit and scope of the present invention as defined and defined in the appended claims to those skilled in the art.

As shown in fig. 1, in one embodiment of the present invention, a robot vision image processing method includes the steps of:

s1, preprocessing two-dimensional or three-dimensional data obtained by scanning a robot vision sensor into gray-scale images with dark background and bright objects to be tested, and taking the gray-scale images as original images.

In the embodiment, the laser radar is adopted as a robot vision sensor, the bridge body is used as a measured object, whether the bridge body has cracks or not is detected, three-dimensional point cloud data are obtained, and gray level images are obtained after projection and gray level conversion.

S2, binarizing and noise reduction filtering processing is carried out on the original image, and contour information of each measured object is obtained through two-dimensional differential operation and double-threshold processing.

S2, performing binarization processing on the original image by the following formula:

，

S2, carrying out noise reduction and filtering processing on the binarized image by the following steps:

a1, constructionFilter window matrix of size, < >>Is a positive integer;

，

A2, according to the filter window matrix, denoising and filtering the binarized image by the following formula:

，

In the process of extracting the outline information of each measured object, the original image is converted into a binary image, and the difference between the measured object and the background is pulled open; in the design of the filter window matrix, a two-dimensional element value operation formula with row-column correlation is created, wherein the row filter coefficient and the column filter coefficient in the formula are both exponential coefficients and proportional coefficients, and the window matrix can be flexibly adjusted so as to be suitable for filtering various noises obeying a normal statistical distribution rule; the expression of noise reduction and filtering adopts a two-dimensional window moving mode to filter the image, so that image impurities are removed, and the accuracy of contour recognition of the subsequent measured object is improved.

The two-dimensional difference operation in S2 is:

，

The double thresholding in S2 includes the steps of:

，

and B3, judging whether the bright lines in the second edge image are continuous, if so, obtaining the contour information of each measured object, and if not, jumping to the step B4.

The binary image has only a minimum value of 0 and a maximum value, so that the difference operation can directly find the image contour, but since the difference operation in the discrete domain is equivalent to the difference operation in the continuous domain, the pixels around the contour are also affected by the derivative of the differential, and therefore, further processing is required to extract the real contour information. In the further processing, two different thresholds are set to binarize the first edge image having the thick edge information. Because in the binary image, the pixel point with the value of 0 is black, the pixel point with the value of the maximum value is bright, the image conversion is conservative due to the larger threshold value, some real edge points are removed, so that bright lines representing edges are discontinuous, and the bright pixel points can be found in the binary image obtained through smaller threshold value processing to ensure that the edge pixel points are continuous, and complete real contour information is reconstructed.

S3, dividing each object image from the original image according to the contour information of each object.

The transverse and longitudinal scanning characteristic fusion model in the S4 comprises the following steps:

，

The feature fusion sub-model is a BP full-connection neural network, and the method for processing the transverse scanning feature map and the longitudinal scanning feature map of the measured object image to obtain the feature vector of the measured object image comprises the following steps:

The invention sets two different scanning window vectors to respectively carry out transverse scanning and longitudinal scanning on the measured object image, compared with the convolution operation of a matrix type convolution kernel, the vector window-shifting multiplication mode greatly reduces the number of element multiplication, and the two scanning window vectors can obtain the two-dimensional texture information of the measured object, and the extraction effect is no more than that of the depth multi-level convolution operation. Although the transverse scanning characteristic diagram and the longitudinal scanning characteristic diagram fully obtain two-dimensional texture information of a measured object, the element quantity is huge, compared with the prior art that characteristic data are compressed by using a plurality of pooling layers and convolution layers in a matching mode, the method provided by the invention has the characteristics of small operation quantity, high accuracy and strong robustness by performing rolling-free pooling operation processing on the BP full-connected neural network with the widest range and the most stable BP full-connected neural network, so that the measured object image characteristic vector adapting to the user requirement dimension is obtained, the operation quantity of visual image characteristic extraction is greatly reduced, and the problems of gradient disappearance, gradient explosion and the like are not faced like a multi-level deep network.

The BP fully-connected neural network of this embodiment includes a three-layer feedforward network, whose output layer uses a sigmod function as an activation function, and the output layer has 400 neurons, that is, the feature vector of the measured object image is a 0, 1 vector of 400 dimensions. The feature vector is used as data obtained by processing a robot vision image and is input into an existing traditional classification model, so that bridge body crack detection can be realized.

In summary, the invention takes the preprocessed gray level image as the object, and the object contour information is extracted to segment each object image from the original gray level image, and the object image is subjected to horizontal scanning feature extraction, vertical scanning feature extraction and feature fusion to obtain the object features, thereby realizing the visual image processing. Irrelevant data such as background is discarded, the operand of visual image feature extraction is greatly reduced, the transverse and longitudinal texture information of the measured object is fully obtained, convolution floating point multiplication operation with multiple layers is not involved, and the method has the characteristics of small operand, low hardware calculation force dependency, high accuracy and strong robustness.

The principles and embodiments of the present invention have been described in detail with reference to specific examples, which are provided to facilitate understanding of the method and core ideas of the present invention; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present invention, the present description should not be construed as limiting the present invention in view of the above.

Those of ordinary skill in the art will recognize that the embodiments described herein are for the purpose of aiding the reader in understanding the principles of the present invention and should be understood that the scope of the invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations from the teachings of the present disclosure without departing from the spirit thereof, and such modifications and combinations remain within the scope of the present disclosure.

Claims

1. A method for processing a robot vision image, comprising the steps of:

2. The robot vision image processing method according to claim 1, wherein the S2 binarizes the original image by:

，

wherein, the liquid crystal display device comprises a liquid crystal display device,for binarized image +.>Go->Values of column pixels, +.>Is the first->Go->Values of column pixels, +.>For maximum pixel value, +.>Is a first threshold.

3. The robot vision image processing method according to claim 2, wherein the S2 performs noise reduction filtering processing on the binarized image by:

a1, constructionFilter window matrix of size, < >>Is a positive integer;

4. A method of processing a robotic visual image as claimed in claim 3, wherein A1 is constructed bySize filter window matrix:

，

wherein, the liquid crystal display device comprises a liquid crystal display device,for the filter window matrix->Go->Values of column elements, ">Is of circumference rate>For the line filter coefficients>For column filter coefficients, +.>For rank filtering cosine correlation, < >>Is a sine function +.>As cosine function +.>Is a natural constant.

5. The method according to claim 4, wherein the A2 performs noise reduction filtering processing on the binarized image according to a filter window matrix by:

，

wherein, the liquid crystal display device comprises a liquid crystal display device,for the noise-reduced filtered image +.>Go->Values of column pixels, +.>Is the filter window matrixGo->Values of column elements, ">For binarized image +.>Go->Values of column pixels.

6. The method according to claim 5, wherein the two-dimensional difference operation in S2 is:

，

7. The robot vision image processing method of claim 6, wherein the double threshold processing in S2 includes the steps of:

，

8. The method according to claim 1, wherein the transverse and longitudinal scan feature fusion model in S4 includes:

，

wherein, the liquid crystal display device comprises a liquid crystal display device,is a positive integer>For longitudinal scan feature map->Go->Values of column pixels, +.>For a second scanning windowVector->Value of element->For the object image +.>Go->Values of column pixels;

9. The method for processing the robot vision image according to claim 8, wherein the feature fusion sub-model is a BP fully connected neural network, and the method for processing the sweep feature map and the longitudinal scan feature map of the object image to obtain the feature vector of the object image comprises the following steps: