CN114820563A - Industrial component size estimation method and system based on multi-view stereo vision - Google Patents

Industrial component size estimation method and system based on multi-view stereo vision

Info

Publication number
CN114820563A
CN114820563A CN202210535712.8A
Authority
CN
China
Prior art keywords
industrial
cameras
stereo vision
scale estimation
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210535712.8A
Other languages
Chinese (zh)
Inventor
汪建基
刘蒴
蒋明吕
郑南宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202210535712.8A priority Critical patent/CN114820563A/en
Publication of CN114820563A publication Critical patent/CN114820563A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85Stereo camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30108Industrial image inspection
    • G06T2207/30164Workpiece; Machine component

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Geometry (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an industrial part size estimation method and system based on multi-viewpoint stereo vision, comprising the following processes: performing multi-view visual acquisition of the industrial component with a plurality of cameras whose internal parameters are pre-calibrated, to obtain a plurality of multi-view industrial component images; selecting an image suitable for scale estimation from the plurality of multi-view industrial component images through an SfM algorithm using the internal parameters of the plurality of cameras, calculating the image pose of the image suitable for scale estimation and generating a sparse point cloud; performing dense reconstruction through a deep-learning MVS algorithm using the internal parameters of the plurality of cameras, the image suitable for scale estimation, the image pose and the sparse point cloud, to obtain dense three-dimensional stereo vision of the industrial part; and performing scale estimation of the industrial component through a consistency check algorithm using the dense three-dimensional stereo vision. The invention solves the size estimation of industrial parts with a stereo vision method and balances the trade-off between estimation accuracy and the cost of expensive equipment.

Description

Industrial component size estimation method and system based on multi-view stereo vision
Technical Field
The invention belongs to the field of industrial part size estimation, and particularly relates to an industrial part size estimation method and system based on multi-view stereo vision.
Background
The size estimation of industrial parts is an essential link in modern mass production. Traditional inspection is usually carried out manually, piece by piece; it is time-consuming and laborious, easily influenced by the subjective factors of the inspector, and cannot guarantee the efficiency and precision of detection. As the degree of automation in production keeps rising, with more automated product lines and faster throughput, human visual inspection increasingly fails to meet the requirements of the industrial field, so machine vision methods are urgently needed to improve estimation efficiency and precision. However, most existing machine vision solutions rely on expensive hardware, such as depth cameras and multiple GPUs. Precision can also vary greatly across different applications, making the overall estimation method unstable, slow and costly; meanwhile, previous machine-vision scale estimation methods still only provide low-precision, coarse size estimates. How to balance estimation precision against the limitation of expensive equipment in the size estimation of industrial parts is a problem to be solved in the field.
Disclosure of Invention
In order to solve the problems in the prior art, the invention aims to provide an industrial part size estimation method and system based on multi-view stereo vision.
The technical scheme adopted by the invention is as follows:
an industrial part size estimation method based on multi-view stereo vision comprises the following processes:
carrying out multi-view visual acquisition on the industrial component through a plurality of cameras with pre-calibrated internal parameters to obtain a plurality of multi-view industrial component images;
selecting an image suitable for scale estimation from the plurality of multi-view industrial component images through an SfM algorithm using the internal parameters of the plurality of cameras, calculating the image pose of the image suitable for scale estimation and generating a sparse point cloud;
performing dense reconstruction through a deep-learning MVS algorithm using the internal parameters of the plurality of cameras, the images suitable for scale estimation, the image poses and the sparse point cloud, to obtain dense three-dimensional stereo vision of the industrial components;
and performing scale estimation of the industrial component through a consistency check algorithm using the dense three-dimensional stereo vision.
Preferably, the process of acquiring a plurality of multi-view industrial component images by performing multi-view visual acquisition on the industrial component through a plurality of cameras with pre-calibrated internal parameters includes:
when the industrial component stably moves in the visual fields of a plurality of cameras with pre-calibrated internal parameters, multi-view industrial component images under various spatial combinations are acquired by utilizing a target detection algorithm and a data acquisition script.
Preferably, the SfM algorithm adopts an incremental SfM algorithm.
Preferably, the process of performing dense reconstruction by using the internal references of the plurality of cameras, the image suitable for scale estimation, the image pose and the sparse point cloud through an MVS algorithm of deep learning to obtain dense three-dimensional stereo vision of the industrial component includes:
processing the internal parameters of the cameras, the images suitable for scale estimation, the image poses and the sparse point cloud; extracting features at multiple resolution levels; advancing depth map estimation from coarse to fine; and, after the depth maps are estimated, performing point cloud fusion to obtain the dense three-dimensional stereo vision of the industrial component.
Preferably, the process of performing the scale estimation of the industrial component by the consistency check algorithm using the dense three-dimensional stereo vision comprises:
the method comprises the steps of conducting down-sampling on the dense three-dimensional stereo vision, fitting an assembly plane in down-sampled point cloud based on a 3D RANSAC algorithm to obtain parameters of an assembly plane equation, transforming the assembly plane to a parallel XOY plane according to normal vectors in the parameters of the assembly plane equation to conduct point cloud alignment, detecting peripheral contour points of the XOY plane points, filtering noise outside the peripheral contour points, conducting parameter fitting on the peripheral contour points based on a two-dimensional RANSAC algorithm to obtain a scale proportion of the dense three-dimensional stereo vision and industrial components, and estimating the actual physical size of the industrial components according to the virtual size of the industrial components in the dense three-dimensional stereo vision to obtain the scale estimation of the industrial components.
Preferably, when calibrating the internal parameters of the plurality of cameras, the plurality of cameras are calibrated according to a chessboard calibration plate to obtain the internal parameters of each camera.
Preferably, the chessboard calibration plate adopts a chessboard pattern formed by alternate black and white rectangles.
Preferably, when the plurality of cameras are calibrated according to the chessboard calibration plate, the calibration plate is photographed at different positions, angles and postures under good indoor lighting conditions, each camera takes a plurality of photos, and finally the internal parameters of each camera are obtained according to the orthogonality and normalization constraints.
The invention also provides an industrial part size estimation system based on multi-viewpoint stereo vision, which comprises the following steps:
an image acquisition unit: adopting a plurality of cameras with pre-calibrated internal parameters for carrying out multi-view visual acquisition on the industrial component to obtain a plurality of multi-view industrial component images;
an SfM calculation unit: used for selecting an image suitable for scale estimation from the plurality of multi-view industrial component images through an SfM algorithm using the internal parameters of the plurality of cameras, calculating the image pose of the image suitable for scale estimation and generating a sparse point cloud;
an MVS calculation unit: used for performing dense reconstruction through a deep-learning MVS algorithm using the internal parameters of the plurality of cameras, the images suitable for scale estimation, the image poses and the sparse point cloud, to obtain dense three-dimensional stereo vision of the industrial component;
a consistency checking unit: used for performing scale estimation of the industrial component through a consistency check algorithm using the dense three-dimensional stereo vision.
Preferably, the plurality of cameras are connected via Power over Ethernet (PoE).
Compared with the prior art, the invention has the following beneficial effects:
the method for estimating the size of the industrial part based on the multi-view stereo vision adopts a method of target detection, SfM, MVS and consistency check, firstly builds a reasonable spatial layout of a multi-view camera, obtains a multi-view image of the industrial part by using a target detection algorithm and a data acquisition script, calculates a camera pose and a sparse point cloud by using the SfM for the acquired image, gradually generates a depth map from roughness to fineness by using the MVS of deep learning, generates a 3D point cloud from low resolution to high resolution, finally performs consistency check on the dense point cloud of the industrial part, filters abnormal points, and finally realizes size estimation of the industrial part. The invention can be practically applied to industrial production lines. It has the following advantages: (1) the spatial layout of a multi-view camera which is most suitable for an industrial production line can be set up, multi-view, omnibearing and multi-level images are obtained for stereoscopic vision reconstruction, and the quality of reconstructed point cloud is determined to a great extent by good image quality; (2) the SfM algorithm can ensure that the obtained camera pose has robustness and integrity, and meanwhile, the efficiency can be ensured; (3) the learning-based MVS algorithm can learn the low texture area of the industrial component, so that the integrity of dense three-dimensional stereo vision can be greatly improved, which is very important for the structural reconstruction of the industrial component; (4) the consistency test can filter out irrelevant noise in the dense three-dimensional stereoscopic vision background, so that the structure is more robust, and the scale estimation accuracy can be ensured by the calculated scale proportion of the dense three-dimensional stereoscopic vision and the actual industrial component. In summary, the present invention utilizes multi-view images for stereoscopic reconstruction, and the reconstructed dense three-dimensional stereoscopy enables robust estimation of the dimensions of industrial components.
Drawings
FIG. 1 is a diagram of simulated industrial components in an embodiment of the present invention.
Fig. 2 is a reasonable spatial layout for image acquisition in an embodiment of the present invention.
Fig. 3 shows the pose of the camera and the sparse point cloud reconstructed using incremental SfM in the embodiment of the present invention.
Fig. 4 is a flowchart of the deep-learning-based MVS in an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a consistency check on a three-dimensional point cloud according to an embodiment of the present invention.
FIG. 6 is a flow chart of the industrial part size estimation method based on multi-view stereo vision according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the detailed description.
In order to estimate the size of an industrial part so as to meet task requirements, the invention provides an industrial part size estimation method and system based on multi-viewpoint stereo vision. The method completes four processes in sequence: target detection, SfM, MVS and consistency checking. Multi-viewpoint images of the industrial component are obtained; the camera poses and a sparse point cloud are calculated; depth maps are generated step by step from coarse to fine and from low resolution to high resolution, then re-projected into space to generate the dense three-dimensional stereo vision; a consistency check is performed on the dense point cloud of the industrial component; and finally the size estimation of the industrial component is achieved.
Referring to fig. 6, the method for estimating the size of the industrial component based on the multi-view stereo vision of the present invention specifically includes the following steps:
s1: calibrating the plurality of cameras according to the chessboard calibration version to obtain the internal parameters of each camera;
the chessboard is a calibration board formed by black and white square spaces and can be used as a calibration object for calibrating the camera. The calibration images need to be shot by using a calibration plate at different positions, different angles and different postures, and at least 3 images are needed, preferably 10-20 images. And finally, obtaining the internal parameters of the camera according to the orthogonal and normalized constraints.
S2: performing multi-view visual acquisition on the industrial component, collecting a plurality of multi-view industrial component images at intervals with a detection algorithm and a data acquisition script while the industrial component moves through the field of view of the acquisition equipment;
As shown in fig. 1, multi-view visual acquisition is performed on this simulated industrial part. As shown in fig. 2, a multi-view camera spatial layout best suited to scale estimation is built, the camera acquisition devices are connected via PoE (Power over Ethernet), and while the industrial component is stably within the camera fields of view and moves from position 3 to position 1, multi-view images under various spatial combinations are acquired using a target detection algorithm and a data acquisition script.
S3: using the internal parameters of the plurality of cameras obtained in S1 and the plurality of multi-view industrial component images obtained in S2, images suitable for scale estimation are selected through SfM (Structure from Motion). An image suitable for scale estimation generally needs to satisfy the following conditions: (1) the image contains enough features, and (2) enough of its features can be matched with the features of other images. The selection of the images suitable for scale estimation can be configured by the technician according to requirements and specific conditions, and is not limited here. The image poses of the selected images suitable for scale estimation are then calculated and a sparse point cloud is generated, and the camera internal parameters, the selected images suitable for scale estimation, the image poses and the sparse point cloud are saved;
the incremental SfM method is more robust and accurate, although slower, so the incremental SfM algorithm is to be adopted in the industrial component estimation. The input at this stage is the camera parameters obtained at S1
Figure BDA0003647996460000051
(i is the camera number, K) i For this camera, internal reference, N K Number of cameras) and a plurality of images y ═ { I ═ obtained in S2 j |j=1...N I J is the number of the image, I j Representing the image, N I Number of images), the output is the image pose estimate P ═ { P of the selected image c ∈SE(3)}(p c Pose estimation of image, SE (3) representation rotation plus displacement) and sparse point cloud
Figure BDA0003647996460000061
(k is the number of the sparse point cloud, X k Is a sparse point cloud, N k The number of sparse point clouds), as shown in fig. 3, the result of the incremental SfM calculation is obtainedAnd finally, storing camera internal parameters, N selected images with the size of W multiplied by H (W represents the horizontal pixel number, and H represents the vertical pixel number), image poses and sparse point clouds. Wherein the camera internal parameters in the step S1 are selectable items, and if not adopted, the camera internal parameters calculated by the incremental SfM are saved.
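The patent does not name a specific SfM implementation; the sketch below assumes the open-source COLMAP command-line tools are available and that the acquired images sit in a working directory, both assumptions made only for illustration.

```python
# Incremental SfM sketch using the COLMAP CLI (an assumed tool choice).
# Feature extraction -> exhaustive matching -> incremental mapping, which
# registers images one by one, estimates their poses and triangulates the
# sparse point cloud. Pre-calibrated intrinsics can alternatively be fixed
# through COLMAP's ImageReader options instead of being re-estimated.
import subprocess

db, imgs, sparse = "work/database.db", "work/images", "work/sparse"

subprocess.run(["colmap", "feature_extractor",
                "--database_path", db, "--image_path", imgs], check=True)
subprocess.run(["colmap", "exhaustive_matcher",
                "--database_path", db], check=True)
subprocess.run(["colmap", "mapper",
                "--database_path", db, "--image_path", imgs,
                "--output_path", sparse], check=True)
```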
S4: inputting the camera internal parameters, the selected images suitable for scale estimation, the image poses and the sparse point cloud saved in S3 into a deep-learning MVS (multi-view stereo) for dense reconstruction, to obtain the dense three-dimensional stereo vision of the industrial component;
As shown in fig. 4, PatchmatchNet is a novel, learnable cascaded pipeline for high-resolution multi-view stereo. The camera internal parameters, the N selected images of size W × H (W is the number of horizontal pixels, H the number of vertical pixels), the image poses and the sparse point cloud saved in S3 are taken as input; features are extracted at several resolution levels, and depth map estimation is advanced from coarse to fine. After the depth maps are estimated, point cloud fusion is performed to obtain the three-dimensional dense point cloud (i.e., the dense three-dimensional stereo vision) of the industrial part.
S5: performing scale estimation of the industrial component through a consistency check algorithm using the obtained dense three-dimensional stereo vision.
As shown in fig. 5, after the point cloud data is obtained from S4, the point cloud is down-sampled to ensure robustness, and an assembly plane in the down-sampled point cloud is fitted based on the 3D RANSAC algorithm to obtain the parameters of the assembly plane equation. The assembly point cloud (i.e., the assembly plane) is transformed to be parallel to the XOY plane according to the normal vector in those parameters, for point cloud alignment; the peripheral contour points of the XOY-plane points are detected and the noise outside the peripheral contour is filtered out; parameters are fitted to the peripheral contour points based on a two-dimensional RANSAC algorithm to obtain the scale ratio between the dense point cloud and the industrial component; and the actual physical size of the industrial component is estimated from its virtual size in the three-dimensional dense point cloud, giving the scale estimation of the industrial component.
Compared with results reconstructed by traditional methods, the invention uses learning-based MVS; current learning-based MVS methods reach the standard of industrial requirements in both precision and completeness, so industrial efficiency and precision can be greatly improved, the error of the size estimation of industrial parts can reach the centimeter level, and the speed meets the requirements of industrial production lines. Through the consistency check algorithm, the correct scale ratio between the three-dimensional point cloud and the industrial component can be obtained, and the dimensions of the industrial component can then be obtained by comparison.
Examples
The method for estimating the size of the industrial part based on the multi-viewpoint stereo vision comprises the following steps:
the method comprises the following steps: firstly, calibrating 4 cameras according to a chessboard calibration version to obtain internal parameters of each camera;
the calibration plate needs to be a chessboard pattern formed by black and white rectangles, and the manufacturing precision requirement is high. The calibration board was photographed at different positions, different angles, and different postures under good indoor lighting conditions, with 16 pictures taken per camera for a total of 64 pictures. Finally, obtaining the internal reference K of the camera according to the orthogonal and normalized constraints i ,i=1...4。
Step two: carrying out 4-viewpoint visual acquisition of the simulated industrial component;
According to the camera spatial arrangement shown in fig. 2, the 4 camera acquisition devices are connected via PoE (Power over Ethernet). While the industrial component is stably within the camera fields of view and moves from position 3 to position 1, multi-viewpoint images under various spatial combinations are acquired using the target detection algorithm and the data acquisition script, with approximately 16 images acquired per camera, 64 images in total.
Step three: using the internal parameters K_i of the 4 cameras obtained in step one, selecting 22 images suitable for scale estimation through SfM (Structure from Motion), calculating the poses of the selected images and generating a sparse point cloud, and saving the camera parameters, the selected images suitable for scale estimation, their image poses and the sparse point cloud;
the incremental SfM method is more robust and accurate, although slower, so the incremental SfM algorithm is to be adopted in the industrial component estimation. The delivery of this stageObtaining camera internal parameters in the first step
Figure BDA0003647996460000071
(i is the camera number, K) i For this camera, internal reference, N K Number of cameras) and the plurality of images y ═ I { I } obtained in step two i |i=1...N I J is the number of the image, I j Representing the image, N I Number of images), the output is the image pose estimate P ═ { P of the selected image c ∈SE(3)}(p c Pose estimation of image, SE (3) representation rotation plus displacement) and sparse point cloud
Figure BDA0003647996460000072
(k is the number of the sparse point cloud, X k Is a sparse point cloud, N k The number of the sparse point clouds), and finally storing camera internal parameters, N selected images with the size of W multiplied by H (W represents the number of horizontal pixels, and H represents the number of vertical pixels), the image pose of the image suitable for scale estimation and the sparse point clouds. And if not, storing the camera internal parameters obtained by the incremental SfM calculation.
Step four: inputting camera internal parameters, 22 selected images, 22 image poses and sparse point clouds stored in the third step into an MVS (multi-view stereo) adopting deep learning to carry out dense reconstruction so as to obtain dense three-dimensional stereo vision of the industrial component;
the PatchmatchNet is composed of multi-scale feature extraction, learning-based Patchmatch in a coarse-to-fine framework, and a spatial refinement module. Given 22 input images of size 2592x2048, the 17 th image and the remaining 21 images are used to represent the reference image and the source image, respectively. Pixel features are extracted using a similar Feature Pyramid Network (FPN), extracting features at 3 resolution levels, which enables depth map estimation to be advanced in a coarse-to-fine manner. And finally, after estimating the depth map, performing 3D point cloud fusion to obtain 3D dense point cloud (namely dense three-dimensional stereo vision) of the industrial part.
Step five: using the dense three-dimensional stereo vision obtained in step four, performing scale estimation of the industrial part through a consistency check algorithm.
After the point cloud data is obtained from step four, the point cloud is down-sampled to ensure robustness, and an assembly plane in the down-sampled point cloud is fitted based on the 3D RANSAC algorithm to obtain the parameters of the assembly plane equation. The assembly point cloud (i.e., the assembly plane) is transformed to be parallel to the XOY plane according to the normal vector in those parameters, for point cloud alignment; the peripheral contour points of the XOY-plane points are detected and the noise outside the peripheral contour is filtered out; parameters are fitted to the peripheral contour points based on a two-dimensional RANSAC algorithm to obtain the scale ratio between the 3D dense point cloud and the industrial component; and the actual physical size of the industrial component is estimated from its virtual size in the 3D dense point cloud.
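A minimal sketch of the down-sampling, plane fitting and alignment steps is given below, using Open3D as an illustrative library choice (the patent does not prescribe one). The reference length `known_side_mm` used to turn the fitted plane into a scale ratio is an assumed value, and the bounding extent stands in for the 2D RANSAC contour fit.

```python
# Consistency-check sketch: down-sample, fit the assembly plane with 3D RANSAC,
# rotate it parallel to the XOY plane, and derive a (model units -> mm) scale ratio.
import numpy as np
import open3d as o3d

def rot_align_to_z(n: np.ndarray) -> np.ndarray:
    """Rotation matrix taking unit normal n to +Z (Rodrigues formula)."""
    z = np.array([0.0, 0.0, 1.0])
    v, c = np.cross(n, z), float(np.dot(n, z))
    s = np.linalg.norm(v)
    if s < 1e-12:
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    Kx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + Kx + Kx @ Kx * ((1 - c) / s**2)

pcd = o3d.io.read_point_cloud("dense/fused.ply")
pcd = pcd.voxel_down_sample(voxel_size=0.01)             # down-sample for robustness

# 3D RANSAC plane fit: ax + by + cz + d = 0
(a, b, c, d), inliers = pcd.segment_plane(distance_threshold=0.01,
                                          ransac_n=3, num_iterations=2000)
n = np.array([a, b, c]) / np.linalg.norm([a, b, c])

R = rot_align_to_z(n)                                    # align plane normal with +Z
plane_pts = np.asarray(pcd.select_by_index(inliers).points) @ R.T
xy = plane_pts[:, :2]                                    # assembly plane now parallel to XOY

# Crude stand-in for the 2D RANSAC contour fit: use the bounding extent as edge length.
side_model_units = xy[:, 0].max() - xy[:, 0].min()
known_side_mm = 120.0                                    # assumed physical reference length
scale = known_side_mm / side_model_units                 # mm per model unit
print("scale ratio:", scale)
```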
The direction of the screw points in the point cloud of the simulated industrial part is determined by a statistical method; the number of points in different radian intervals is obtained by a double-interval sampling method; the position of the screw is obtained under a mean-value threshold; finally, scanning is performed along the screw direction to collect the screw points and count the number of points at different lengths, and the length of the screw is calculated using the scale ratio between the point cloud and the industrial part, as illustrated in the sketch below.
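A minimal sketch of the final conversion, turning the virtual extent of the screw points along the estimated screw axis into a physical length via the scale ratio, is given below; `screw_pts`, `axis` and `scale` are assumed to come from the preceding statistical direction estimation, double-interval sampling and consistency check, which are not reproduced here.

```python
# Physical screw length from projected point extent and the scale ratio (sketch).
import numpy as np

def screw_length_mm(screw_pts: np.ndarray, axis: np.ndarray, scale: float) -> float:
    """screw_pts: N x 3 screw points; axis: screw direction; scale: mm per model unit."""
    axis = axis / np.linalg.norm(axis)
    s = screw_pts @ axis                 # signed position of each point along the axis
    return float((s.max() - s.min()) * scale)
```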
In conclusion, a reasonable spatial layout of multi-view cameras is first established; when an industrial part passes steadily through the fields of view of the multi-view cameras, multi-view images of the industrial part are obtained using a target detection algorithm and a data acquisition script; the acquired images, together with the camera calibration results, are input into SfM to calculate the camera poses and the sparse point cloud; finally, a deep-learning-based MVS generates depth maps step by step from coarse to fine and from low resolution to high resolution, and the depth maps are re-projected into space to generate the dense three-dimensional stereo vision. The quality of the dense three-dimensional stereo vision is closely related to the accuracy of the scale estimation, so the collected multi-viewpoint high-resolution images, the camera calibration, the incremental SfM and the deep-learning-based MVS all make indispensable contributions to the quality of the reconstructed dense three-dimensional stereo vision and to the stability and robustness of the scale estimation. Finally, a consistency check is performed on the dense three-dimensional stereo vision of the industrial component, abnormal points are filtered out, and the size estimation of the industrial component is achieved. The invention can be practically applied on industrial production lines; compared with manual inspection it can save a large amount of manpower and material resources, can significantly improve inspection accuracy, and has great economic value.

Claims (10)

1. An industrial part size estimation method based on multi-view stereo vision is characterized by comprising the following processes:
carrying out multi-view visual acquisition on the industrial component through a plurality of cameras with pre-calibrated internal parameters to obtain a plurality of multi-view industrial component images;
selecting an image suitable for scale estimation from the plurality of multi-view industrial component images through an SfM algorithm using the internal parameters of the plurality of cameras, calculating the image pose of the image suitable for scale estimation and generating a sparse point cloud;
performing dense reconstruction through a deep-learning MVS algorithm using the internal parameters of the plurality of cameras, the images suitable for scale estimation, the image poses and the sparse point cloud, to obtain dense three-dimensional stereo vision of the industrial components;
and performing scale estimation of the industrial component through a consistency check algorithm using the dense three-dimensional stereo vision.
2. The method for estimating the size of the industrial part based on the multi-view stereo vision as claimed in claim 1, wherein the process of acquiring a plurality of multi-view industrial part images by using a plurality of cameras with pre-calibrated internal parameters to perform multi-view visual acquisition on the industrial part comprises:
when the industrial component stably moves in the visual fields of a plurality of cameras with pre-calibrated internal parameters, multi-view industrial component images under various spatial combinations are acquired by utilizing a target detection algorithm and a data acquisition script.
3. The method for estimating the size of the industrial part based on the multi-view stereo vision as claimed in claim 1, wherein the SfM algorithm adopts an incremental SfM algorithm.
4. The method for estimating the size of the industrial part based on the multi-view stereo vision according to claim 1, wherein the process of performing dense reconstruction through an MVS algorithm of deep learning by using the internal parameters of a plurality of cameras, the image suitable for scale estimation, the image pose and the sparse point cloud to obtain the dense three-dimensional stereo vision of the industrial part comprises the following steps:
processing the internal parameters of the cameras, the images suitable for scale estimation, the image poses and the sparse point cloud; extracting features at multiple resolution levels; advancing depth map estimation from coarse to fine; and, after the depth maps are estimated, performing point cloud fusion to obtain the dense three-dimensional stereo vision of the industrial component.
5. The method for estimating the size of the industrial part based on the multi-viewpoint stereo vision as claimed in claim 1, wherein the process of performing the scale estimation of the industrial part by the consistency check algorithm by using the dense three-dimensional stereo vision comprises:
the method comprises the steps of conducting down-sampling on the dense three-dimensional stereo vision, fitting an assembly plane in down-sampled point cloud based on a 3D RANSAC algorithm to obtain parameters of an assembly plane equation, transforming the assembly plane to a parallel XOY plane according to normal vectors in the parameters of the assembly plane equation to conduct point cloud alignment, detecting peripheral contour points of the XOY plane points, filtering noise outside the peripheral contour points, conducting parameter fitting on the peripheral contour points based on a two-dimensional RANSAC algorithm to obtain a scale proportion of the dense three-dimensional stereo vision and industrial components, and estimating the actual physical size of the industrial components according to the virtual size of the industrial components in the dense three-dimensional stereo vision to obtain the scale estimation of the industrial components.
6. The method as claimed in claim 1, wherein when calibrating the internal parameters of the plurality of cameras, the plurality of cameras are calibrated according to a chessboard calibration plate to obtain the internal parameters of each camera.
7. The method as claimed in claim 6, wherein the chessboard calibration plate adopts a chessboard pattern formed by alternating black and white rectangles.
8. The method as claimed in claim 7, wherein when the plurality of cameras are calibrated according to the chessboard calibration plate, the calibration plate is photographed at different positions, angles and postures under good indoor lighting conditions, each camera takes a plurality of photos, and finally the internal parameters of each camera are obtained according to the orthogonality and normalization constraints.
9. An industrial part size estimation system based on multi-view stereo vision, comprising:
an image acquisition unit: adopting a plurality of cameras with pre-calibrated internal parameters for carrying out multi-view visual acquisition on the industrial component to obtain a plurality of multi-view industrial component images;
an SfM calculation unit: used for selecting an image suitable for scale estimation from the plurality of multi-view industrial component images through an SfM algorithm using the internal parameters of the plurality of cameras, calculating the image pose of the image suitable for scale estimation and generating a sparse point cloud;
an MVS calculation unit: used for performing dense reconstruction through a deep-learning MVS algorithm using the internal parameters of the plurality of cameras, the images suitable for scale estimation, the image poses and the sparse point cloud, to obtain dense three-dimensional stereo vision of the industrial component;
a consistency checking unit: used for performing scale estimation of the industrial component through a consistency check algorithm using the dense three-dimensional stereo vision.
10. The system of claim 9, wherein the plurality of cameras are connected via Power over Ethernet (PoE).
CN202210535712.8A 2022-05-17 2022-05-17 Industrial component size estimation method and system based on multi-view stereo vision Pending CN114820563A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210535712.8A CN114820563A (en) 2022-05-17 2022-05-17 Industrial component size estimation method and system based on multi-view stereo vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210535712.8A CN114820563A (en) 2022-05-17 2022-05-17 Industrial component size estimation method and system based on multi-view stereo vision

Publications (1)

Publication Number Publication Date
CN114820563A true CN114820563A (en) 2022-07-29

Family

ID=82515668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210535712.8A Pending CN114820563A (en) 2022-05-17 2022-05-17 Industrial component size estimation method and system based on multi-view stereo vision

Country Status (1)

Country Link
CN (1) CN114820563A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115761137A (en) * 2022-11-24 2023-03-07 之江实验室 High-precision curved surface reconstruction method and device based on mutual fusion of normal vector and point cloud data
CN115761137B (en) * 2022-11-24 2023-12-22 之江实验室 High-precision curved surface reconstruction method and device based on mutual fusion of normal vector and point cloud data
CN116363302A (en) * 2023-03-06 2023-06-30 郑州大学 Pipeline three-dimensional reconstruction and pit quantification method based on multi-view geometry
CN116363302B (en) * 2023-03-06 2024-05-28 郑州大学 Pipeline three-dimensional reconstruction and pit quantification method based on multi-view geometry

Similar Documents

Publication Publication Date Title
TWI729995B (en) Generating a merged, fused three-dimensional point cloud based on captured images of a scene
CN104713885B (en) A kind of structure light for pcb board on-line checking aids in binocular measuring method
Strecha et al. On benchmarking camera calibration and multi-view stereo for high resolution imagery
Furukawa et al. Accurate camera calibration from multi-view stereo and bundle adjustment
Jordt-Sedlazeck et al. Refractive structure-from-motion on underwater images
CN110728671B (en) Dense reconstruction method of texture-free scene based on vision
CN104574393B (en) A kind of three-dimensional pavement crack pattern picture generates system and method
CN109360246B (en) Stereoscopic vision three-dimensional displacement measurement method based on synchronous subarea search
CN104537707B (en) Image space type stereoscopic vision moves real-time measurement system online
CN114820563A (en) Industrial component size estimation method and system based on multi-view stereo vision
Zou et al. A method of stereo vision matching based on OpenCV
CN103743352A (en) Three-dimensional deformation measuring method based on multi-camera matching
CN109373912A (en) A kind of non-contact six-freedom displacement measurement method based on binocular vision
Eichhardt et al. Affine correspondences between central cameras for rapid relative pose estimation
CN114792345A (en) Calibration method based on monocular structured light system
CN115272080A (en) Global deformation measurement method and system based on image stitching
EP3906530B1 (en) Method for 3d reconstruction of an object
GB2569609A (en) Method and device for digital 3D reconstruction
CN117372647A (en) Rapid construction method and system of three-dimensional model for building
CN109741389B (en) Local stereo matching method based on region base matching
Yang et al. 3D reconstruction through measure based image selection
Su et al. An automatic calibration system for binocular stereo imaging
CN113808070A (en) Binocular digital speckle image related parallax measurement method
CN112766313A (en) Crystal segmentation and positioning method, device, equipment and medium based on U-net structure
Jadhav et al. Volume measurement of object using computer vision

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination