CN117292199A - Segment bolt identification and positioning method for lightweight YOLOV7 - Google Patents

Segment bolt identification and positioning method for lightweight YOLOV7

Info

Publication number
CN117292199A
Authority
CN
China
Prior art keywords
lightweight
yolov7
bolt
segment
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311308300.1A
Other languages
Chinese (zh)
Inventor
肖艳秋
贺振东
王一鸣
刘洁
莫海川
王鹏鹏
崔光珍
孙春亚
黄荣杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University of Light Industry
Original Assignee
Zhengzhou University of Light Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University of Light Industry filed Critical Zhengzhou University of Light Industry
Priority to CN202311308300.1A priority Critical patent/CN117292199A/en
Publication of CN117292199A publication Critical patent/CN117292199A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/048: Activation functions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/0002: Inspection of images, e.g. flaw detection
    • G06T7/0004: Industrial image inspection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/60: Analysis of geometric attributes
    • G06T7/66: Analysis of geometric attributes of image moments or centre of gravity
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/70: Determining position or orientation of objects or cameras
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20084: Artificial neural networks [ANN]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/30: Subject of image; Context of image processing
    • G06T2207/30108: Industrial image inspection
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing


Abstract

The invention discloses a segment bolt identification and positioning method based on lightweight YOLOV7, which comprises the steps of constructing a lightweight YOLOV7 model and carrying out target identification and detection on the inner-groove bolts of segments to be spliced; the lightweight YOLOV7 model is adopted as the basis, the backbone network is replaced with MobileNetV3, and a mixed attention mechanism is introduced; ORB feature matching is combined to generate measurement target points and find the pixel position of the bolt on the target duct piece; an industrial camera is then used to acquire the position of the target bolt, and the position coordinates of the bolt are obtained from the relation between the world coordinate system and the calibrated camera coordinates. According to the invention, replacing the backbone network with MobileNetV3 increases the detection speed of the network; introducing the mixed attention mechanism increases the attention paid to the target bolt; and YOLOV7 adds a new feature fusion layer on a shallow basis to preserve feature information to the maximum extent.

Description

Segment bolt identification and positioning method for lightweight YOLOV7
Technical Field
The invention relates to automatic segment assembly technology, in particular to a segment bolt identification and positioning method based on a lightweight YOLOV7.
Background
With the continuous development of industrial automation, segment assembly has become an important link in infrastructure construction and is essential for improving production efficiency and quality. At present, it plays a key role in the construction of urban highways, railways and bridges. In the segment assembling process, identifying and positioning the segment bolts is a key task; accurate identification and positioning of the segment bolts helps workers perform the assembly work quickly and accurately, improving installation efficiency and quality. However, the conventional manual identification method not only requires a great deal of manpower and is time-consuming and labor-intensive, but is also susceptible to human error.
In recent years, rapid developments in computer vision and deep learning have provided new solutions for automated segment bolt identification and positioning. Among them, object detection technology is widely applied to object recognition and positioning tasks in industrial scenes. In particular, deep-learning-based target detection methods, such as the R-CNN and YOLO (You Only Look Once) series of models, have become a hot spot of research and application owing to their fast detection speed and high accuracy.
Accurate positioning of the duct piece bolts is key to realizing automatic duct piece assembly. Wada proposed a method that combines the six degrees of freedom of rotation, heave, slip, pitch, yaw and roll with a laser sensor to improve the efficiency of measuring the segment. Wang et al. designed a novel hydraulic system based on electro-hydraulic proportional control technology, established kinematic and dynamic models, determined the displacement of each actuator, and finally calculated the position of the segment. This method demands high precision from the control algorithm and sensors, has weak anti-interference capability, and is not suitable for positioning segment bolts against a complex background. Zhang et al. established a mathematical model of the moving object, performed real-time positioning detection of the object throughout the whole process, and fused machine vision with position information to feed back the object's position in real time. Such target position detection methods can be used for positioning the segment bolts.
With the continuous advancement of deep learning research in the field of vision, target recognition algorithms such as the YOLO series and Faster R-CNN have emerged for target object detection. From an object detected by deep learning, the pixel information of the object on the camera's imaging plane can be obtained. Du et al. proposed a real-time identification and detection method for rolling non-cooperative targets, combining the detected images with a PnP algorithm to obtain the 6D poses of the targets, selecting segmented images using depth information, and transmitting the relative positions of the capture points to a manipulator. Finally, physical experiments under different illumination conditions were completed on a six-degree-of-freedom air-floating platform using a rotating non-cooperative target. This approach can be applied to the identification of duct pieces and, combined with other sensor information, can measure the coordinates of the target object in a three-dimensional coordinate system to complete pose measurement.
Detection of the segment bolts is critical to the whole assembly process. The design of YOLOV7 is simple and effective: by converting the target detection task into a regression problem, it avoids the candidate-box generation and screening process of traditional two-stage detection methods and reduces the amount of computation and complexity. The method uses a single neural network to simultaneously predict the position and category of the target, and offers good interpretability and practicality. However, the current identification of bolts in a segment faces two key problems: 1) the background is complex and data sets are scarce; 2) model training is slow and accuracy is low.
Disclosure of Invention
In order to solve the defects in the prior art, the invention aims to provide a light-weight YOLOV7 segment bolt identification and positioning method.
In order to achieve the purpose of the invention, the technical scheme adopted by the invention is as follows:
a method for identifying and positioning a lightweight YOLOV7 segment bolt comprises the following steps:
(1) Constructing a lightweight YOLOV7 model, and carrying out target identification detection on bolts in the inner grooves of the segments to be spliced; adopting a lightweight YOLOV7 model as a basic target detector, replacing a backbone network with MobileNetV3, and introducing a mixed attention mechanism;
(2) Generating measurement target points by combining the improved lightweight YOLOV7 algorithm with ORB feature matching, and finding the pixel position of the bolt on the target duct piece; then acquiring the position of the target bolt with an industrial camera, and obtaining the accurate position coordinates of the bolt in the three-dimensional world by combining the acquired depth information and pixel information.
Further, in step (1), the improved lightweight YOLOV7 network structure includes an input layer, a backbone network, a neck network and a regression output layer, and the backbone network includes an improved MobileNetV3 module and an SPPCSPC module.
Further, the improved MobileNetV3 network comprises an inverted residual structure, a mixed attention module CBAM, an intermediate expansion module, an up-sampling module, a perceptron structure and a dilated (hole) convolution.
Further, the mixed attention module comprises a channel attention module and a spatial attention module, which carry out channel attention and spatial attention respectively; the channel information of the feature map is aggregated by two pooling operations, and the results are concatenated and convolved by a standard convolution layer to generate an attention map, with the two modules connected in series.
Further, the inverted residual structure includes a 1x1 convolution kernel for channel expansion, followed by a dw (depthwise) convolution kernel, and finally a 1x1 convolution kernel for channel reduction.
Further, in the improved multi-layer perceptron structure in the MobileNetV3 module, a 1×1 convolution layer is added after the 5×5 convolution layer in the bottleneck module to serve as the fully connected layer of the perceptron, an h-swish activation function is introduced to form a perceptron embedded in the deep network, and a dilated convolution is introduced into the MobileNetV3 module.
Further, in step (2), an industrial camera is mounted on the segment erector, and an industrial camera coordinate system O_A-X_AY_AZ_A is established, where O_A is the camera optical center; the internal and external parameters of the camera are calibrated, the parameter matrix of the camera is acquired, and the accurate coordinates of the segment bolt in the three-dimensional world are obtained according to the relationship between the world coordinate system and the calibrated camera coordinates.
Further, the internal parameter matrix k of the industrial camera has the form:

k = [ k_x   0    u_0
      0     k_y  v_0
      0     0    1  ]

where k_x and k_y are the scale factors of the camera along the X-axis and Y-axis, and (u_0, v_0) is the intersection point of the optical axis of the industrial camera with the imaging plane;

describing the relation between the bolt, with camera-frame coordinates (x_1, y_1, z_1), and its image point by the internal reference matrix, the pixel coordinates (u, v) satisfy:

z_1 · [u, v, 1]^T = k · [x_1, y_1, z_1]^T
further, the process of calibrating the parameters outside the camera is expressed by the following formula:
P=R*X+T
where P is the point coordinates in the camera coordinate system, R is the rotation matrix, X is the point coordinates in the world coordinate system, and T is the translation vector.
Compared with the prior art, the invention adopts a lightweight YOLOV7 model as the basic target detector and replaces the backbone network with MobileNetV3, which increases the detection speed of the network; a mixed attention mechanism is introduced, and by fusing spatial attention and channel attention the attention paid to the target bolt during detection is increased, improving the performance of the model in the segment bolt recognition and positioning task; and YOLOV7 adds a new feature fusion layer on a shallow basis to preserve feature information to the maximum extent.
Drawings
FIG. 1 is an improved lightweight YOLOV7 network;
FIG. 2 is a modified MobileNet V3 network;
fig. 3 is a mixed attention module CBAM.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples. The following examples are only for more clearly illustrating the technical solutions of the present invention and are not intended to limit the scope of protection of the present application.
The invention relates to a method for identifying and positioning a lightweight YOLOV7 segment bolt, which comprises the following steps:
(1) Constructing a lightweight YOLOV7 model, and carrying out target identification detection on bolts in the inner grooves of the segments to be spliced; the lightweight YOLOV7 model is adopted as a basic target detector, a backbone network is replaced by MobileNetV3, and a mixed attention mechanism is introduced to improve the performance of the model in segment bolt recognition and positioning tasks;
the improved lightweight YOLOV7 network overall network structure diagram is shown in fig. 1, firstly, the backbone network of YOLOV7 is replaced by mobilenet v3, the self-contained attention module SE of mobilenet v3 is replaced by a mixed attention module, and the attention degree of a target bolt is increased while the detection precision of the network is improved. Secondly, because the size of the target bolt is greatly changed in the assembling process, in order to enhance the characteristic fusion effect of the target, an additional pre-measuring head and the residual connection from a shallower backbone network are added so as to reserve the characteristic information to the maximum extent. Meanwhile, the low-level and high-resolution characteristic information is introduced into the characteristic fusion layer, so that the added prediction head is more sensitive to a long-distance segment bolt target. The four different scale features are more suitable for the scale change of the segment bolts.
The network structure of the improved YOLOV7 comprises four parts: an input layer (Input), a backbone network (Backbone), a neck network (Neck) and a regression output layer (Head). The improved backbone network includes a MobileNetV3 module and an SPPCSPC module.
The SPPCSPC module adds several parallel MaxPool operations within a series of convolutions, avoiding problems such as image distortion caused by image processing operations and preventing the convolutional neural network from extracting duplicate features from the picture. The SPP part extracts features from the same feature map at different scales, which helps improve detection precision, while the CSP part reduces the amount of computation and improves inference speed.
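For illustration, the following is a minimal PyTorch sketch of an SPPCSPC-style block of the kind described above: a CSP-style split in which one branch runs convolutions plus several parallel max-pools while the other acts as a shortcut. The kernel sizes (5, 9, 13), channel widths and layer names are assumptions for the sketch, not values taken from the patent.

```python
import torch
import torch.nn as nn

class Conv(nn.Module):
    """Conv-BN-SiLU block used throughout YOLOv7-style networks."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SPPCSPC(nn.Module):
    """Spatial pyramid pooling with a CSP-style split: one branch runs
    convolutions plus parallel max-pools, the other is a shortcut."""
    def __init__(self, c_in, c_out, pool_sizes=(5, 9, 13)):
        super().__init__()
        c_ = c_out // 2
        self.cv1 = Conv(c_in, c_, 1)
        self.cv2 = Conv(c_in, c_, 1)           # shortcut (CSP) branch
        self.cv3 = Conv(c_, c_, 3)
        self.cv4 = Conv(c_, c_, 1)
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in pool_sizes)
        self.cv5 = Conv(c_ * (len(pool_sizes) + 1), c_, 1)
        self.cv6 = Conv(c_, c_, 3)
        self.cv7 = Conv(c_ * 2, c_out, 1)

    def forward(self, x):
        y = self.cv4(self.cv3(self.cv1(x)))
        y = torch.cat([y] + [p(y) for p in self.pools], dim=1)
        y = self.cv6(self.cv5(y))
        return self.cv7(torch.cat([y, self.cv2(x)], dim=1))
```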
The backbone network also comprises a Focus module: before the image enters the backbone network, features of different levels are extracted from the image through depthwise convolution, and the feature map is downsampled through the slicing operation in the Focus module so that the original image information is retained as much as possible.
The neck network adopts a structure of combining a feature pyramid network FPN and a path aggregation network PAN.
As shown in fig. 2, the modified MobileNetV3 network includes an inverted residual structure, a mixed attention module CBAM, an intermediate expansion module and an upsampling module. The MLP layer in the network model is a perceptron structure constructed on the basis of a 5×5 convolution; network layers can be added flexibly, and the fully connected layer consists of a 1×1 convolution. At the same time, dilated convolution is introduced into the last two bottleneck modules of MobileNetV3 to obtain receptive fields of different sizes and richer multi-scale information.
MobileNetV3 uses neural architecture search to automatically find an optimal network structure. By searching a large space of candidate network structures, an efficient structure suitable for mobile devices can be found, and the searched structure can be transferred to different tasks and data sets. MobileNetV3 introduces an inverted residual structure to construct depthwise separable convolution blocks, which increases the nonlinear transformation capability while keeping the model lightweight. The inverted residual structure includes a 1x1 convolution kernel for channel expansion, followed by a dw (depthwise) convolution kernel, and finally a 1x1 convolution kernel for channel reduction.
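As a concrete illustration of this structure, here is a minimal PyTorch sketch of an inverted residual block with an optional dilation, following the 1x1 expansion, depthwise convolution and 1x1 projection described above. The h-swish activation follows the description; the remaining defaults (kernel size, names) are assumptions for the sketch.

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV3-style inverted residual: 1x1 expansion -> depthwise conv
    -> 1x1 projection, with an optional dilation for the last bottlenecks."""
    def __init__(self, c_in, c_exp, c_out, k=3, stride=1, dilation=1):
        super().__init__()
        pad = dilation * (k - 1) // 2
        self.use_shortcut = stride == 1 and c_in == c_out
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_exp, 1, bias=False),       # 1x1 channel expansion
            nn.BatchNorm2d(c_exp),
            nn.Hardswish(),
            nn.Conv2d(c_exp, c_exp, k, stride, pad,      # depthwise (dw) convolution
                      groups=c_exp, dilation=dilation, bias=False),
            nn.BatchNorm2d(c_exp),
            nn.Hardswish(),
            nn.Conv2d(c_exp, c_out, 1, bias=False),      # 1x1 channel reduction
            nn.BatchNorm2d(c_out),
        )

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_shortcut else y
```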
The mixed attention module CBAM, as shown in fig. 3, comprises two independent sub-modules, a channel attention module (Channel Attention Module, CAM) and a spatial attention module (Spatial Attention Module, SAM), which perform channel attention and spatial attention respectively. The channel information of the feature map is aggregated by two pooling operations, and the results are concatenated and convolved by a standard convolution layer to generate an attention map, with the two modules connected in series.
Given an intermediate feature map F ∈ R^(C×H×W), CBAM derives a one-dimensional channel attention map M_c ∈ R^(C×1×1) and a two-dimensional spatial attention map M_s ∈ R^(1×H×W), which are applied in sequence:

F' = M_c(F) ⊗ F
F'' = M_s(F') ⊗ F'

where ⊗ denotes element-wise multiplication.
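A minimal PyTorch sketch of the CBAM module as defined by the two formulas above is given below; the reduction ratio of 16 and the 7x7 spatial-attention kernel are common defaults assumed here, not values stated in the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Channel attention (CAM): global avg- and max-pooled descriptors pass
    through a shared MLP and are summed into a per-channel weight M_c."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    """Spatial attention (SAM): channel-wise avg and max maps are concatenated
    and convolved into a single-channel spatial weight map M_s."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """CAM and SAM applied in series: F' = Mc(F) * F, F'' = Ms(F') * F'."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = self.ca(x) * x
        return self.sa(x) * x
```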
(2) Generating measurement target points by combining the improved lightweight YOLOV7 algorithm with ORB feature matching, and finding the pixel position of the bolt on the target duct piece; then acquiring the position of the target bolt with an industrial camera, and obtaining the accurate position coordinates of the bolt in the three-dimensional world by combining the acquired depth information and pixel information.
The improved ORB feature matching algorithm can perform stable feature matching with scale invariance and rotation invariance; a matching library is built from bolt pictures taken at different angles, so that duct piece bolts at different angles and distances can be matched. The improved ORB feature matching algorithm comprises the following steps:
1) Detecting characteristic points;
Feature points are extracted with the FAST-9 algorithm. For an image block of an intra-segment groove bolt, the moments of the block are defined as:

m_pq = Σ_(x,y) x^p · y^q · I(x, y)

The centroid of the image block is found from the moments:

C = ( m_10 / m_00 , m_01 / m_00 )

A direction vector OC is constructed from the center O of the image block to its centroid C, and the feature point orientation is defined as:

θ = atan2(m_01, m_10)
2) Computing the feature point descriptors;

This is realized with the BRIEF algorithm: an S×S neighborhood window is taken centered on the feature point, two points x and y are selected in the window, their pixel values p(x) and p(y) are compared, and the following assignment is made:

τ(p; x, y) = 1 if p(x) < p(y), and 0 otherwise
3) Matching the characteristic points;
and finding out the corresponding relation of the feature points between different images, measuring the distances of descriptors for all the feature points in the two images, and then sequencing to obtain the nearest one as a matching point.
The industrial camera is installed on the segment erector, and an industrial camera coordinate system O_A-X_AY_AZ_A is established, where O_A is the camera optical center. The internal and external parameters of the camera are calibrated with Zhang's calibration method to obtain the parameter matrices of the camera. By combining the acquired depth information and pixel information, the accurate coordinates of the segment bolt in the three-dimensional world can be obtained according to the relation between the world coordinate system and the calibrated camera coordinates, completing the automatic assembly of the segment.
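Zhang's calibration can be performed with OpenCV using a planar checkerboard; a minimal sketch is given below, in which the board dimensions, square size and image path pattern are assumptions for illustration.

```python
import glob
import cv2
import numpy as np

def calibrate_with_checkerboard(image_glob, board_size=(9, 6), square_mm=25.0):
    """Zhang-style calibration from several checkerboard views: collect
    object/image point pairs, then solve for the intrinsics and distortion."""
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2) * square_mm

    obj_points, img_points, shape = [], [], None
    for path in glob.glob(image_glob):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, board_size)
        if found:
            obj_points.append(objp)
            img_points.append(corners)
            shape = gray.shape[::-1]

    # K holds k_x, k_y, u_0, v_0; rvecs/tvecs are the per-view extrinsics.
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, shape, None, None)
    return K, dist
```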
(2.1) Camera internal parameter calibration;
the camera imaging principle is aperture imaging, and an object in the three-dimensional world is mapped into an inverted real image on an imaging plane provided with a photosensitive element through a camera aperture. The real image converts the optical signal into an electric signal through a photosensitive element, and a digital image is obtained through conversion and amplification processing. Obtaining an industrial camera internal parameter matrix k, which is in the form of:
k = [ k_x   0    u_0
      0     k_y  v_0
      0     0    1  ]

where k_x and k_y are the scale factors of the camera along the X-axis and Y-axis, and (u_0, v_0) is the intersection point of the optical axis of the industrial camera with the imaging plane. For example, if the bolt has coordinates (x_1, y_1, z_1) in the camera coordinate system, then describing the relation between the bolt and its image point with the internal reference matrix, the pixel coordinates (u, v) satisfy:

z_1 · [u, v, 1]^T = k · [x_1, y_1, z_1]^T
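A short NumPy sketch of this intrinsic projection follows; the numbers used for k_x, k_y, u_0 and v_0 are purely illustrative, not calibration results from the patent.

```python
import numpy as np

def project_to_pixels(point_cam, K):
    """Project a bolt point given in the camera frame to pixel coordinates
    (u, v) with the intrinsic matrix K = [[k_x, 0, u_0], [0, k_y, v_0], [0, 0, 1]]."""
    x1, y1, z1 = point_cam
    u, v, w = K @ np.array([x1, y1, z1])
    return u / w, v / w

# Illustrative values only:
K = np.array([[1200.0,    0.0, 640.0],
              [   0.0, 1200.0, 512.0],
              [   0.0,    0.0,   1.0]])
u, v = project_to_pixels((300.0, 200.0, 700.0), K)
print(round(u, 1), round(v, 1))
```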
(2.2) Camera external parameter calibration
External parameter calibration of the camera refers to the process of determining the position and orientation of the camera in the world coordinate system. Through external parameter calibration, the camera coordinate system and the world coordinate system can be brought into correspondence, so that the position of an object in the world coordinate system can be determined from the image. The external parameters of the camera are typically represented by a rotation matrix (Rotation Matrix) and a translation vector (Translation Vector): the rotation matrix describes the rotational relationship between the camera coordinate system and the world coordinate system, and the translation vector describes the translational relationship between them.
The process of external parameter calibration can be expressed by the following formula:
P=R*X+T
where P is the point coordinates in the camera coordinate system, R is the rotation matrix, X is the point coordinates in the world coordinate system, and T is the translation vector. The rotation matrix R and translation vector T can be solved by the known point coordinates X in the world coordinate system and the corresponding point coordinates P in the camera coordinate system.
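The relation P = R*X + T can be sketched directly, and R and T can be recovered from known world-to-pixel correspondences. One common way to do this is OpenCV's PnP solver, shown below as an assumed illustration rather than the patent's own solving procedure.

```python
import cv2
import numpy as np

def world_to_camera(X_world, R, T):
    """P = R * X + T: map a world-frame point into the camera frame."""
    return R @ np.asarray(X_world, dtype=float) + np.asarray(T, dtype=float).ravel()

def estimate_extrinsics(world_points, pixel_points, K, dist=None):
    """Recover R and T from known world points and their pixel projections
    (at least four correspondences are needed)."""
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(world_points, dtype=np.float32),
        np.asarray(pixel_points, dtype=np.float32),
        K, dist)
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> rotation matrix
    return R, tvec
```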
(2.3) The positional relationship of the camera in the world coordinate system is described with the camera's internal and external parameter matrices as follows:

z_c · [u, v, 1]^T = M_1 · M_2 · [X_w, Y_w, Z_w, 1]^T

where R and t respectively describe the rotation of the world coordinate axes relative to the camera coordinate axes and the translation of the world coordinate origin relative to the camera optical center; M_1 is called the internal parameter matrix of the camera, and M_2 = [R t] is called the external parameter matrix of the camera.
In order to verify the effectiveness of the improved YOLOV7 algorithm, segment bolt images under different backgrounds were selected as the data set, with each image containing at least one segment inner-groove bolt. Because the segment assembly site environment is complex and the sample size is small, the data were expanded to 2000 samples using geometric transformations (flipping, rotation, cropping, deformation and scaling) and labeled one by one to create the data set.
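A minimal OpenCV sketch of the geometric augmentations mentioned above (flip, rotate, crop, deform/scale) is given here; the specific angle, scale and crop ranges are assumptions, and in practice the bounding-box labels must be transformed in the same way as the image.

```python
import random
import cv2

def augment_geometric(image):
    """Apply a random flip, rotation, scaling and crop to one bolt image."""
    h, w = image.shape[:2]
    out = image.copy()
    if random.random() < 0.5:                        # horizontal flip
        out = cv2.flip(out, 1)
    angle = random.uniform(-15, 15)                  # small rotation
    scale = random.uniform(0.8, 1.2)                 # scaling / deformation
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    out = cv2.warpAffine(out, M, (w, h), borderMode=cv2.BORDER_REFLECT)
    # random crop, then resize back to the original size
    ch, cw = int(h * 0.9), int(w * 0.9)
    y0 = random.randint(0, h - ch)
    x0 = random.randint(0, w - cw)
    return cv2.resize(out[y0:y0 + ch, x0:x0 + cw], (w, h))
```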
The training set is fed into the neural network with 4 samples per batch; the accuracy fluctuates greatly over the first 20 iterations, settles at about 0.8 as the loss decreases, and overfitting appears at 100 iterations. Training the neural network with the added mixed attention mechanism on the same data set produces the same overfitting, so the data are expanded to 2000 samples by data augmentation and the mixed attention mechanism is added for testing.
Although training is slightly slower after the attention mechanism is added, the training effect is better than the previous result. For the same segment image, when the neural network predicts a 'backup segment', the network's attention to the key region before and after adding the attention mechanism is observed. Without the attention mechanism, the neural network focuses not only on part of the "target segment" region but also on extraneous regions in some images when identifying the 'backup segment', so the attention mechanism is added to improve this problem.
The industrial camera is installed on the duct piece splicing machine (segment erector). The lifting mechanism of the splicing machine consists of left and right lifting hydraulic cylinders, and the industrial camera is fixed at the lower end of the right lifting hydraulic cylinder; the optical axis of the industrial camera is parallel to the axis of the lifting hydraulic cylinder, so that the camera can rotate and lift together with the rotating mechanism and the right lifting hydraulic cylinder. An industrial camera coordinate system O_A-X_AY_AZ_A is established, where O_A is the industrial camera optical center, the Y_A axis is parallel to the axis of the translation mechanism, the Z_A axis is parallel to the axis of the lifting mechanism, and the X_A axis follows from the left-hand rule; bolt information is then acquired at different positions and angles with the industrial camera.
The positions of the segment bolts are measured from different angles and distances, and the measurement results of the positions of the segment bolts are shown in a table 1.
TABLE 1 numerical table of actual errors of segment bolts
No. Actual coordinates (x, y, z)/mm Measured coordinates (x, y, z)/mm Per-axis error/mm
1 (-180,360,423) (-181.215,359.303,421.656) (1.215,0.697,1.344)
2 (300,200,700) (299.426,199.156,701.223) (0.574,0.844,1.223)
3 (400,326,532) (400.568,325.344,533.662) (0.568,0.656,1.662)
4 (500,486,320) (500.582,487.102,320.633) (0.582,1.102,0.633)
5 (-200,100,500) (-200.326,101.546,500.692) (0.326,1.546,0.692)
6 (350,230,450) (350.852,231.521,450.237) (0.852,1.521,0.273)
As can be seen from the above measurement results, the error of the measured values is no more than 3 mm on each of the three axes. Therefore, the measuring method can meet the requirements of segment grabbing.
Compared with the prior art, the lightweight YOLOV7 model is designed and optimized, improving the computational efficiency and real-time performance of the model; the backbone network is replaced with MobileNetV3, which increases the detection speed of the network; in addition, a mixed attention mechanism is introduced, and fusing spatial attention and channel attention enhances the model's ability to recognize and position the key segment bolts. The invention collects and constructs a segment bolt data set for model training and evaluation; the recognition accuracy of the improved YOLOV7 algorithm reaches 96%, the maximum error of the segment bolt position coordinates does not exceed 3 mm, and the industrial requirements are met.
While the applicant has described and illustrated the embodiments of the present invention in detail with reference to the drawings, it should be understood by those skilled in the art that the above embodiments are only preferred embodiments of the present invention, and the detailed description is only for the purpose of helping the reader to better understand the spirit of the present invention, and not to limit the scope of the present invention, but any improvements or modifications based on the spirit of the present invention should fall within the scope of the present invention.

Claims (9)

1. The method for identifying and positioning the lightweight YOLOV7 duct piece bolt is characterized by comprising the following steps of:
(1) Constructing a lightweight YOLOV7 model, and carrying out target identification detection on bolts in the inner grooves of the segments to be spliced; adopting a lightweight YOLOV7 model as a basic target detector, replacing a backbone network with MobileNetV3, and introducing a mixed attention mechanism;
(2) Generating measurement target points by combining the improved lightweight YOLOV7 algorithm with ORB feature matching, and finding the pixel position of the bolt on the target duct piece; then acquiring the position of the target bolt with an industrial camera, and obtaining the accurate position coordinates of the bolt in the three-dimensional world by combining the acquired depth information and pixel information.
2. The lightweight YOLOV7 segment bolt identification and positioning method of claim 1, wherein in step (1) the improved lightweight YOLOV7 network structure comprises an input layer, a backbone network, a neck network and a regression output layer, the backbone network comprising an improved MobileNetV3 module and an SPPCSPC module.
3. The method for identifying and positioning the segment bolts of the lightweight YOLOV7 according to claim 2, wherein the improved MobileNetV3 network comprises an inverted residual structure, a mixed attention module CBAM, an intermediate expansion module, an up-sampling module, a perceptron structure and a dilated (hole) convolution.
4. The method for recognizing and positioning a segment bolt of a lightweight YOLOV7 according to claim 3, wherein the mixed attention module includes a channel attention module and a spatial attention module for channel attention and spatial attention respectively; the channel information of the feature map is aggregated by two pooling operations, and the results are concatenated and convolved by a standard convolution layer to generate an attention map, with the two modules connected in series.
5. A lightweight YOLOV7 segment bolt identification and location method according to claim 3 wherein the inverted residual structure comprises a 1x1 convolution kernel for channel number expansion followed by a dw convolution kernel and finally a 1x1 convolution kernel for channel reduction.
6. The method for recognizing and positioning the segment bolts of the lightweight YOLOV7 according to claim 3, wherein, in the improved multi-layer perceptron structure in the MobileNetV3 module, a 1×1 convolution layer is added after the 5×5 convolution layer in the bottleneck module to serve as the fully connected layer of the perceptron, an h-swish activation function is introduced to form a perceptron embedded in the deep network, and a dilated convolution is introduced into the MobileNetV3 module.
7. The method for recognizing and positioning a segment bolt of a lightweight YOLOV7 according to claim 1, wherein in step (2) an industrial camera is mounted on the segment erector and an industrial camera coordinate system O_A-X_AY_AZ_A is established, where O_A is the camera optical center; the internal and external parameters of the camera are calibrated, the parameter matrix of the camera is acquired, and the accurate coordinates of the segment bolt in the three-dimensional world are obtained according to the relationship between the world coordinate system and the calibrated camera coordinates.
8. The method for identifying and positioning the segment bolts of the lightweight YOLOV7 according to claim 7, wherein the internal parameter matrix k of the industrial camera has the form:

k = [ k_x   0    u_0
      0     k_y  v_0
      0     0    1  ]

where k_x and k_y are the scale factors of the camera along the X-axis and Y-axis, and (u_0, v_0) is the intersection point of the optical axis of the industrial camera with the imaging plane;

describing the relation between the bolt, with camera-frame coordinates (x_1, y_1, z_1), and its image point by the internal reference matrix, the pixel coordinates (u, v) satisfy:

z_1 · [u, v, 1]^T = k · [x_1, y_1, z_1]^T
9. The method for identifying and positioning the segment bolts of the lightweight YOLOV7 according to claim 7, wherein the process of calibrating the camera's external parameters is represented by the following formula:
P=R*X+T
where P is the point coordinates in the camera coordinate system, R is the rotation matrix, X is the point coordinates in the world coordinate system, and T is the translation vector.
CN202311308300.1A 2023-10-10 2023-10-10 Segment bolt identification and positioning method for lightweight YOLOV7 Pending CN117292199A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311308300.1A CN117292199A (en) 2023-10-10 2023-10-10 Segment bolt identification and positioning method for lightweight YOLOV7

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311308300.1A CN117292199A (en) 2023-10-10 2023-10-10 Segment bolt identification and positioning method for lightweight YOLOV7

Publications (1)

Publication Number Publication Date
CN117292199A true CN117292199A (en) 2023-12-26

Family

ID=89256942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311308300.1A Pending CN117292199A (en) 2023-10-10 2023-10-10 Segment bolt identification and positioning method for lightweight YOLOV7

Country Status (1)

Country Link
CN (1) CN117292199A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination