CN113033435A - Whole vehicle chassis detection method based on multi-view vision fusion

Whole vehicle chassis detection method based on multi-view vision fusion

Info

Publication number
CN113033435A
CN113033435A
Authority
CN
China
Prior art keywords
chassis
images
vehicle
pixels
image
Prior art date
Legal status
Granted
Application number
CN202110343920.3A
Other languages
Chinese (zh)
Other versions
CN113033435B (en)
Inventor
田涌涛
李成龙
Current Assignee
Suzhou Parking Intelligent Technology Co., Ltd.
Original Assignee
Suzhou Parking Intelligent Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Suzhou Parking Intelligent Technology Co., Ltd.
Priority to CN202110343920.3A
Publication of CN113033435A
Application granted
Publication of CN113033435B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/34 Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/08 Detecting or categorising vehicles

Abstract

The invention discloses a whole-vehicle chassis detection method based on multi-view vision fusion. A plurality of vision cameras are added to a mechanical device that automatically grips/places a vehicle to capture images of multiple parts of the vehicle's exterior, particularly the chassis; an image fusion technique, combined with historical detection data, then generates a complete image of the vehicle chassis in real time. The invention provides fast, preservable evidence of whether the chassis exterior is damaged during the handover process of whole-vehicle logistics, offers technical support for automated whole-vehicle logistics, improves the handover efficiency of whole-vehicle logistics, and clarifies responsibility for scrapes and scratches incurred in vehicle logistics.

Description

Whole vehicle chassis detection method based on multi-view vision fusion
Technical Field
The invention relates to a whole vehicle chassis detection method based on multi-view vision fusion, and belongs to the technical field of vehicle detection.
Background
In an automaker's whole-vehicle logistics chain, a new vehicle must pass through several logistics stages between leaving the factory assembly line and arriving at a 4S dealership. Every loading and handling stage carries risks such as scratch damage to the vehicle's exterior. To establish responsibility for damage incurred in transit, the vehicle must be inspected and confirmed repeatedly, by hand or by eye, at each handover; such checks are error-prone, and careless inspection often makes it impossible to determine who is responsible for the damage.
Disclosure of Invention
The invention aims to provide a whole-vehicle chassis detection method based on multi-view vision fusion, in which a plurality of vision cameras are added to the mechanical structure that automatically grips/places a vehicle to capture images of multiple parts of the vehicle's exterior, particularly the chassis, and an image fusion technique generates a complete image of the vehicle chassis in real time. For the handover process of whole-vehicle logistics, this preserves evidence of whether the chassis exterior is damaged.
To achieve this aim, the invention provides the following technical solution:
a vehicle chassis detection method based on multi-view vision fusion, comprising the following steps:
S1, mounting a plurality of vision cameras at distributed positions on a mechanical device that grips/places the vehicle, each vision camera covering part of the vehicle's chassis area, the union of the areas covered by all the vision cameras covering the entire chassis;
S2, capturing a local chassis image with every vision camera while the mechanical device grips/places the vehicle;
S3, extracting keypoints from each local chassis image and matching the feature vectors of keypoints between pairs of different local chassis images; a keypoint is a small region that stands out in the image, and its feature vector represents the intensity pattern around it;
S4, estimating a homography matrix from the successfully matched features using the RANSAC algorithm, computing an affine transformation matrix for each pair of successfully matched images from the homography matrix, and fusing all local chassis images by a linear-gradient method to obtain a complete panoramic image of the vehicle chassis;
S5, running a neural-network object detector on the chassis panoramic image to locate the vehicle's four hubs or tires, computing a projective transformation matrix for the panoramic image from these positions, and applying the projective correction to the panoramic image to obtain the final synthesized chassis panoramic image.
Further, in step S1, the vision camera is mounted on the mechanical device through a position adjustment mechanism.
Further, in step S3, extracting keypoints from each local chassis image and matching the feature vectors of keypoints between two different local chassis images comprises the following steps:
S31, applying an automatic white-balance algorithm to each captured local chassis image for image enhancement;
S32, extracting keypoints from each local chassis image using the brightness relationships of connected pixels;
S33, for each keypoint, creating a feature vector from the brightness relationships of the pixels within the keypoint's neighborhood;
S34, matching the feature vectors of keypoints from two different local chassis images, and judging the matching relationship between the two images from the similarity of the keypoints' feature vectors.
Further, in step S32, extracting keypoints from each local chassis image using the brightness relationships of connected pixels comprises the following steps:
S321, given a pixel, comparing the brightness of the N pixels within a region centered on that pixel and dividing them into three classes:
pixels brighter than the given pixel by more than a preset brightness threshold are class I, pixels darker than the given pixel by more than the preset brightness threshold are class II, and all remaining pixels are class III;
S322, counting connected pixels within the region; if the number of connected class-I or class-II pixels exceeds a preset count threshold, taking the given pixel as a keypoint;
S323, repeating steps S321 to S322 until every pixel of the whole image has been processed.
Further, in step S33, creating, for each keypoint, a feature vector from the brightness relationships of the pixels within the keypoint's neighborhood comprises the following steps:
S331, smoothing the local chassis image with a Gaussian kernel;
S332, selecting a keypoint;
S333, randomly selecting a pair of pixels, one after the other, within a defined square neighborhood centered on the given keypoint;
S334, comparing the brightness of the two randomly selected pixels: if the first pixel is brighter than the second, setting the corresponding bit of the given keypoint's descriptor to 1, otherwise setting it to 0;
S335, repeating steps S333 to S334 M times for the given keypoint and placing the M brightness-comparison results into the keypoint's binary feature vector;
S336, repeating steps S332 to S335 until all keypoints have been processed.
Further, in step S34, matching the feature vectors of keypoints from two different local chassis images and judging the matching relationship between the two images from the similarity of the keypoints' feature vectors comprises the following steps:
S341, taking one of the local chassis images as the training image;
S342, computing the ORB descriptor of the training image and storing it in memory, the ORB descriptor comprising the binary feature vectors of its keypoints;
S343, computing the ORB descriptors of the other local chassis images (the query images) and matching their keypoints against the ORB descriptor of the training image with a matching function.
Further, in step S343, matching keypoints against the ORB descriptor of the training image with a matching function comprises:
computing a similarity, such as the standard Euclidean distance, between any two keypoints of the two images as the match quality.
Further, in step S343, matching keypoints against the ORB descriptor of the training image with a matching function comprises:
determining whether the feature vectors of any two keypoints of the two images contain descriptor sequences whose similarity exceeds a preset similarity threshold.
Further, in step S4, fusing all local chassis images by the linear-gradient method to obtain a complete panoramic image of the vehicle chassis comprises:
S41, matching the keypoints extracted from all local chassis images using their feature descriptors, identifying the same keypoints across different local chassis images, and generating several groups of matched pairs;
S42, determining the relative positions of all local chassis images from the matched pairs, with reference to the shooting parameters of the current logistics stage, the shooting parameters of historical logistics environments, and historical chassis images of the vehicle;
S43, adjusting the orientation of each local chassis image so that all images share a consistent orientation;
S44, projecting all local chassis images onto a sphere or a cylinder via projective transformation;
S45, computing the stitching region between adjacent local chassis images, processing the pixels of the stitching region with a preset fusion algorithm, and removing misaligned pixels between adjacent local chassis images to obtain the final chassis panoramic image; the preset fusion algorithm weights the pixels of the stitching region by their distance from the seam.
Further, the detection method further comprises the following steps:
S6, packaging the acquired chassis panoramic image and the corresponding local-chassis-image processing data into a detection data file for the current logistics stage;
S7, hashing the detection data file, uploading the resulting hash value to a cloud server, and writing the hash value into the corresponding block of a blockchain.
The invention has the following beneficial effects:
(1) It provides technical support for automated whole-vehicle logistics, improves the handover efficiency of whole-vehicle logistics, and clarifies responsibility for scrapes and scratches incurred in vehicle logistics.
(2) It reduces the inspection workload of personnel, reduces manual oversights, and effectively saves cost.
(3) Images are collected automatically by a mechanical device, ensuring the safety of the detection operation.
(4) Detection data are stored using blockchain technology, ensuring the security of the detection data.
(5) Historical detection data are introduced into the fusion process, effectively improving the fusion quality and speeding up fusion.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical solutions of the present invention more clearly understood and to implement them in accordance with the contents of the description, the following detailed description is given with reference to the preferred embodiments of the present invention and the accompanying drawings.
Drawings
Fig. 1 is a flowchart of a vehicle chassis detection method based on multi-view vision fusion according to an embodiment of the present invention.
Fig. 2 is an exemplary diagram of a fusion result of a chassis panoramic image according to an embodiment of the present invention.
Fig. 3 is a schematic representation of one of the feature vectors.
Fig. 4 is a diagram of the neighborhood corresponding to one of the keypoints.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly: for example, as a fixed connection, a removable connection, or an integral connection; as a mechanical or an electrical connection; as a direct connection, an indirect connection through an intermediate medium, or internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art on a case-by-case basis.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Fig. 1 is a flowchart of a vehicle chassis detection method based on multi-view vision fusion according to an embodiment of the present invention. The embodiment is applicable to detecting a whole vehicle chassis with a server, multi-view vision equipment, and the like.
Referring to fig. 1, the method for detecting the chassis of the whole vehicle specifically includes:
and S1, a plurality of vision cameras are distributed and installed on a mechanical device for holding/placing the vehicle, each vision camera corresponds to a part of the chassis area of the vehicle, and the sum of the chassis areas corresponding to all the vision cameras completely covers all the chassis areas of the vehicle.
Relative to the camera object distance (within 200 mm), the chassis of a typical passenger car is far too large (over 3000 mm) to be imaged by a single camera. The invention therefore proposes mounting several visible-light cameras on the mechanical device that grips/places the vehicle, capturing local images of the chassis at different positions, and then stitching a panoramic picture of the entire chassis with image fusion technology. Because vehicle body dimensions vary, the invention further proposes mounting each vision camera on the mechanical device through a position adjustment mechanism, so that the shooting position can be adjusted quickly for vehicles with different dimensions and different shooting requirements. For example, a vision camera may be mounted with a quick-release clamping fixture, moved quickly along a short guide rail arranged on the mechanical device, or the mechanical device may even reserve several mounting positions with fixing components so that the vision cameras can be attached and detached quickly.
S2, capturing a local chassis image with every vision camera while the mechanical device grips/places the vehicle.
S3, extracting keypoints from each local chassis image and matching the feature vectors of keypoints between two different local chassis images; a keypoint is a small region that stands out in the image, and its feature vector represents the intensity pattern around it. Specifically, the method comprises the following steps:
and S31, performing image enhancement processing on the shot local chassis image by adopting an automatic white balance algorithm.
S32, extracting keypoints from each local chassis image using the brightness relationships of connected pixels. This comprises: S321, given a pixel, comparing the brightness of the N pixels within a region centered on that pixel and dividing them into three classes: pixels brighter than the given pixel by more than a preset brightness threshold are class I, pixels darker than the given pixel by more than the preset brightness threshold are class II, and all remaining pixels are class III. S322, counting connected pixels within the region; if the number of connected class-I or class-II pixels exceeds a preset count threshold, taking the given pixel as a keypoint. S323, repeating steps S321 to S322 until every pixel of the whole image has been processed.
This embodiment uses ORB to quickly create feature vectors for the keypoints in an image; these vectors can be used to identify part features in the image. ORB is extremely fast and, to some extent, robust to noise and to image transformations such as rotation and scaling.
Specifically, ORB first finds special regions in the image, called keypoints. Keypoints are small regions that stand out in the image, such as corners, or pixels whose values change sharply from light to dark. ORB then computes a corresponding feature vector for each keypoint. In this embodiment, the feature vectors created by the ORB algorithm contain only 1s and 0s and are called binary feature vectors. The order of the 1s and 0s varies with the particular keypoint and the pixel area around it. The vector represents the intensity pattern around the keypoint, so multiple feature vectors can together identify a larger region, or even a particular object, in the image.
For example, FAST selects keypoints quickly as follows: given a pixel p, the 16 pixels on a circle around p are each classified as brighter than p, darker than p, or similar to p. For a given threshold h, brighter pixels are those with luminance above Ip + h, darker pixels those with luminance below Ip − h, and similar pixels those with luminance between these two values, where Ip is the luminance of p. After the pixels are classified, p is selected as a keypoint if more than 8 contiguous pixels on the circle are all darker or all brighter than p.
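As an illustration, here is a minimal sketch of such a segment test in Python; the circle offsets, the helper names, and the default values of h and the contiguity count are illustrative assumptions, not taken from the patent.

```python
import numpy as np

# Offsets of 16 pixels on a circle of radius 3 around a candidate pixel p,
# as used by FAST-style detectors.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def longest_circular_run(labels):
    """Length of the longest circular run of equal non-zero labels."""
    doubled = labels + labels
    best = cur = 0
    for i, v in enumerate(doubled):
        cur = cur + 1 if v != 0 and i > 0 and doubled[i - 1] == v else (1 if v != 0 else 0)
        best = max(best, cur)
    return min(best, len(labels))

def is_keypoint(gray, y, x, h=20, n_contiguous=9):
    """Segment test: p is a keypoint if more than 8 contiguous circle pixels
    are all brighter than Ip + h (class I) or all darker than Ip - h (class II)."""
    ip = int(gray[y, x])
    labels = []
    for dx, dy in CIRCLE:
        v = int(gray[y + dy, x + dx])
        labels.append(1 if v > ip + h else (-1 if v < ip - h else 0))  # 0 = class III
    return longest_circular_run(labels) >= n_contiguous
```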
S33, for each keypoint, creating a feature vector from the brightness relationships of the pixels in the keypoint's neighborhood. This comprises:
S331, smoothing the local chassis image with a Gaussian kernel; S332, selecting a keypoint; S333, randomly selecting a pair of pixels, one after the other, within a defined square neighborhood centered on the given keypoint;
S334, comparing the brightness of the two randomly selected pixels: if the first pixel is brighter than the second, setting the corresponding bit of the given keypoint's descriptor to 1, otherwise setting it to 0; S335, repeating steps S333 to S334 M times for the given keypoint and placing the M brightness-comparison results into the keypoint's binary feature vector; S336, repeating steps S332 to S335 until all keypoints have been processed.
This embodiment uses binary feature vectors (also called binary descriptors), feature vectors that contain only 1s and 0s. Each keypoint is described by a binary feature vector, typically a 128- to 512-bit string containing only 1s and 0s. Together, these feature vectors can represent an object. Fig. 3 is a schematic representation of one of the feature vectors.
A given image is first smoothed with a Gaussian kernel to prevent the descriptors from being overly sensitive to high-frequency noise. Second, for a given keypoint, a pair of pixels is randomly selected within a well-defined neighborhood around the keypoint, called a patch, which is a square with a particular pixel width and height. Fig. 4 is a diagram of the neighborhood corresponding to one of the keypoints. The light-gray square in Fig. 4 marks the first pixel of a random pair; it is drawn from a Gaussian distribution centered on the keypoint with standard deviation (dispersion) σ. The dark-gray square in Fig. 4 marks the second pixel of the random pair; it is drawn from a Gaussian distribution centered on the first pixel with standard deviation σ/2. Experience shows that this Gaussian sampling improves the feature-matching rate. Finally, a binary descriptor is built for the keypoint by comparing the brightness of the two pixels: for example, if the first pixel is brighter than the second, the corresponding bit in the descriptor is assigned the value 1, otherwise 0. For a 256-bit vector, this embodiment repeats the process 256 times for the same keypoint before moving to the next keypoint, then places the 256 pixel-brightness comparison results into the keypoint's binary feature vector. The process is repeated until a vector has been created for every keypoint in the image.
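A minimal sketch of this descriptor construction, assuming a BRIEF-style test pattern with the Gaussian sampling described above; the function name, bit count M, and choice of σ are illustrative.

```python
import numpy as np

def binary_descriptor(gray, keypoint, m=256, sigma=31 / 5.0, seed=0):
    """Build an M-bit binary descriptor for keypoint = (x, y).
    The image is assumed to have been smoothed with a Gaussian kernel already.
    A fixed seed keeps the random test pattern identical for every keypoint."""
    rng = np.random.default_rng(seed)
    x0, y0 = keypoint
    h, w = gray.shape
    bits = np.zeros(m, dtype=np.uint8)
    for i in range(m):
        # First pixel: Gaussian around the keypoint, standard deviation sigma.
        x1, y1 = rng.normal([x0, y0], sigma)
        # Second pixel: Gaussian around the first pixel, standard deviation sigma/2.
        x2, y2 = rng.normal([x1, y1], sigma / 2)
        xs = np.clip([x1, x2], 0, w - 1).astype(int)
        ys = np.clip([y1, y2], 0, h - 1).astype(int)
        # Bit is 1 if the first pixel is brighter than the second, else 0.
        bits[i] = 1 if gray[ys[0], xs[0]] > gray[ys[1], xs[1]] else 0
    return bits
```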
S34, matching the feature vectors of keypoints from two different local chassis images, and judging the matching relationship between the two images from the similarity of the keypoints' feature vectors.
Specifically, one of the local images is designated the training image; similar features are then searched for in the other local images, each of which is called a query image.
The first step is to compute the ORB descriptor of the training image and store it in memory; this descriptor contains the binary feature vectors that describe the keypoints of the training image. The second step is to compute and save the ORB descriptor of the query image. Once the descriptors of the training and query images are available, the final step is to match the keypoints of the two images using these descriptors, which is usually done with a matching function.
The purpose of the matching function is to match the keypoints of two different images by comparing their descriptors and determining whether they are close enough to match. When the matching function compares two keypoints, it grades the match quality by some metric that represents the similarity of the keypoints' feature vectors. This metric can be viewed as, for example, the standard Euclidean distance between the two keypoints' descriptors; some metrics directly test whether the feature vectors contain 1s and 0s in a similar order. Note that different matching functions use different metrics to judge match quality. For binary descriptors such as those used by ORB, the Hamming metric is typically used because it is very fast to compute.
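A minimal sketch of this matching stage using OpenCV's ORB implementation and a brute-force Hamming matcher; the file names and the distance cutoff are placeholders.

```python
import cv2

# Compute ORB keypoints and binary descriptors for the training and query images.
orb = cv2.ORB_create(nfeatures=2000)
train_img = cv2.imread("chassis_part_1.png", cv2.IMREAD_GRAYSCALE)
query_img = cv2.imread("chassis_part_2.png", cv2.IMREAD_GRAYSCALE)
kp_t, des_t = orb.detectAndCompute(train_img, None)
kp_q, des_q = orb.detectAndCompute(query_img, None)

# Brute-force matcher with the Hamming metric, the usual choice for binary
# descriptors; crossCheck keeps only mutual best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_q, des_t), key=lambda m: m.distance)

# m.distance is the Hamming match quality; keep the closest candidates.
good = [m for m in matches if m.distance < 40]
```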
S4, estimating a homography matrix from the successfully matched features using the RANSAC algorithm, computing an affine transformation matrix for each pair of successfully matched images from the homography matrix, and fusing all local chassis images by a linear-gradient method to obtain a complete panoramic image of the vehicle chassis.
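Continuing the matching sketch above, a minimal sketch of the RANSAC estimation step with OpenCV; the reprojection threshold is an illustrative assumption.

```python
import numpy as np
import cv2

# Coordinates of the matched keypoints (kp_q, kp_t, good from the sketch above).
src = np.float32([kp_q[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp_t[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

# RANSAC discards mismatched pairs while estimating the homography between
# the two local chassis images; 5.0 px is the reprojection threshold.
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Warp the query image into the training image's frame so the two local
# chassis images share one coordinate system before blending.
h, w = train_img.shape[:2]
aligned = cv2.warpPerspective(query_img, H, (w, h))
```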
A vehicle passes through several logistics stages, and each stage requires its own fusion of chassis photographs. Owing to the particularities of vehicle transport, the shooting parameters of each stage (including external factors such as ambient light) make the captured local images differ slightly; but because the subject is the same vehicle chassis, this embodiment proposes introducing historical detection data into the fusion process. For example, feature points in non-edge regions can additionally be selected based on historical chassis images, and the relative positions, image orientations, and so on of the local chassis images obtained quickly, further improving the fusion quality or speeding up fusion.
Specifically, the method comprises the following steps:
and S41, matching the extracted key points on all the local chassis images by using the feature descriptors to obtain the same key points in different local chassis images, and generating a plurality of groups of matched pairs. And S42, analyzing and obtaining the relative positions of all local chassis images according to the obtained matching pairs by referring to the shooting parameters of the current logistics link, the shooting parameters of the historical logistics environment and the historical chassis images of the vehicles. S43, the screen direction of each partial chassis image is adjusted so as to be the same. S44, because the different stitching of the perspective angles of the multiple cameras can destroy the consistency of the field of view, all images are projected on a spherical surface or a cylindrical surface through the projection transformation. S45, calculating to obtain a splicing region of adjacent local chassis images, processing pixels of the splicing region according to a preset fusion algorithm, and removing staggered pixels between the adjacent local chassis images to obtain a final chassis panoramic image; the preset fusion algorithm is to perform weighting processing on pixels in the splicing region according to the distance between the pixels and the joint.
A seam is the most similar line within the overlap region between two images. Once the seam between two images has been found, the pixels near the seam are blended and the misalignment between the images removed to produce the stitched result. The blending may, for example, weight positions near the seam by their distance from it, and the weights can be computed quickly from the shooting parameters of the current logistics stage, the shooting parameters of historical logistics environments, and the corresponding historical weights. Fig. 2 is an exemplary diagram of a fused chassis panoramic image according to an embodiment of the present invention.
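A minimal sketch of such distance-weighted (linear gradient) blending; a purely horizontal overlap between two adjacent local images is assumed for simplicity.

```python
import numpy as np

def linear_blend(left, right, overlap):
    """Blend two horizontally adjacent images whose last/first `overlap`
    columns cover the same chassis strip. Pixel weights fall off linearly
    with distance from the seam, as in the preset fusion algorithm."""
    alpha = np.linspace(1.0, 0.0, overlap)            # weight of the left image
    alpha = alpha.reshape(1, -1, 1) if left.ndim == 3 else alpha.reshape(1, -1)
    blended = alpha * left[:, -overlap:] + (1.0 - alpha) * right[:, :overlap]
    return np.concatenate(
        [left[:, :-overlap], blended.astype(left.dtype), right[:, overlap:]], axis=1)
```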
S5, running a neural-network object detector on the chassis panoramic image to locate the vehicle's four hubs or tires, computing a projective transformation matrix for the panoramic image from these positions, and applying the projective correction to the panoramic image to obtain the final synthesized chassis panoramic image.
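A minimal sketch of this correction step: the four detected hub/tire centers are mapped to the corners of an axis-aligned rectangle. The detector itself is not shown, and the target wheelbase/track sizes in pixels and the margin are illustrative assumptions.

```python
import numpy as np
import cv2

def rectify_panorama(panorama, wheel_centers, wheelbase_px=2800, track_px=1600,
                     margin=200):
    """wheel_centers: four detected hub/tire centers in the raw panorama,
    ordered front-left, front-right, rear-left, rear-right."""
    src = np.float32(wheel_centers)
    # Target: the wheels form an axis-aligned rectangle of track x wheelbase.
    dst = np.float32([[margin, margin],
                      [margin + track_px, margin],
                      [margin, margin + wheelbase_px],
                      [margin + track_px, margin + wheelbase_px]])
    P = cv2.getPerspectiveTransform(src, dst)  # the projection transformation matrix
    size = (2 * margin + track_px, 2 * margin + wheelbase_px)
    return cv2.warpPerspective(panorama, P, size)
```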
In some examples, the detection method further comprises the steps of:
and S4, packaging the acquired chassis panoramic image and the corresponding local chassis image processing data into a detection data file of the current logistics link. And S5, performing hash processing on the detection data file, uploading the corresponding hash value to the cloud server, and loading the hash value to the corresponding block of the block chain. The data security and the traceability of the logistics handover process are ensured through the characteristic that the block records can not be tampered. Preferably, the shooting parameters of the historical logistics environment and the historical chassis image of the vehicle are downloaded from the corresponding block of the block chain.
The above embodiments express only several embodiments of the present invention, and their description is relatively specific and detailed, but they are not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, and these fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A vehicle chassis detection method based on multi-view vision fusion, characterized by comprising the following steps:
S1, mounting a plurality of vision cameras at distributed positions on a mechanical device that grips/places the vehicle, each vision camera covering part of the vehicle's chassis area, the union of the areas covered by all the vision cameras covering the entire chassis;
S2, capturing a local chassis image with every vision camera while the mechanical device grips/places the vehicle;
S3, extracting keypoints from each local chassis image and matching the feature vectors of keypoints between pairs of different local chassis images; a keypoint is a small region that stands out in the image, and its feature vector represents the intensity pattern around it;
S4, estimating a homography matrix from the successfully matched features using the RANSAC algorithm, computing an affine transformation matrix for each pair of successfully matched images from the homography matrix, and fusing all local chassis images by a linear-gradient method to obtain a complete panoramic image of the vehicle chassis;
S5, running a neural-network object detector on the chassis panoramic image to locate the vehicle's four hubs or tires, computing a projective transformation matrix for the panoramic image from these positions, and applying the projective correction to the panoramic image to obtain the final synthesized chassis panoramic image.
2. The vehicle chassis detection method based on multi-view vision fusion of claim 1, wherein in step S1 the vision cameras are mounted on the mechanical device through a position adjustment mechanism.
3. The vehicle chassis detection method based on multi-view vision fusion of claim 1, wherein in step S3 extracting keypoints from each local chassis image and matching the feature vectors of keypoints between two different local chassis images comprises the following steps:
S31, applying an automatic white-balance algorithm to each captured local chassis image for image enhancement;
S32, extracting keypoints from each local chassis image using the brightness relationships of connected pixels;
S33, for each keypoint, creating a feature vector from the brightness relationships of the pixels within the keypoint's neighborhood;
S34, matching the feature vectors of keypoints from two different local chassis images, and judging the matching relationship between the two images from the similarity of the keypoints' feature vectors.
4. The vehicle chassis detection method based on multi-view vision fusion of claim 3, wherein in step S32 extracting keypoints from each local chassis image using the brightness relationships of connected pixels comprises the following steps:
S321, given a pixel, comparing the brightness of the N pixels within a region centered on that pixel and dividing them into three classes:
pixels brighter than the given pixel by more than a preset brightness threshold are class I, pixels darker than the given pixel by more than the preset brightness threshold are class II, and all remaining pixels are class III;
S322, counting connected pixels within the region; if the number of connected class-I or class-II pixels exceeds a preset count threshold, taking the given pixel as a keypoint;
S323, repeating steps S321 to S322 until every pixel of the whole image has been processed.
5. The vehicle chassis detection method based on multi-view vision fusion of claim 3, wherein in step S33 creating, for each keypoint, a feature vector from the brightness relationships of the pixels within the keypoint's neighborhood comprises the following steps:
S331, smoothing the local chassis image with a Gaussian kernel;
S332, selecting a keypoint;
S333, randomly selecting a pair of pixels, one after the other, within a defined square neighborhood centered on the given keypoint;
S334, comparing the brightness of the two randomly selected pixels: if the first pixel is brighter than the second, setting the corresponding bit of the given keypoint's descriptor to 1, otherwise setting it to 0;
S335, repeating steps S333 to S334 M times for the given keypoint and placing the M brightness-comparison results into the keypoint's binary feature vector;
S336, repeating steps S332 to S335 until all keypoints have been processed.
6. The vehicle chassis detection method based on multi-view vision fusion of claim 5, wherein in step S34 matching the feature vectors of keypoints from two different local chassis images and judging the matching relationship between the two images from the similarity of the keypoints' feature vectors comprises the following steps:
S341, taking one of the local chassis images as the training image;
S342, computing the ORB descriptor of the training image and storing it in memory, the ORB descriptor comprising the binary feature vectors of its keypoints;
S343, computing the ORB descriptors of the other local chassis images and matching their keypoints against the ORB descriptor of the training image with a matching function.
7. The vehicle chassis detection method based on multi-view vision fusion of claim 6, wherein in step S343 matching keypoints against the ORB descriptor of the training image with a matching function comprises:
computing a similarity, such as the standard Euclidean distance, between any two keypoints of the two images as the match quality.
8. The vehicle chassis detection method based on multi-view vision fusion of claim 6, wherein in step S343 matching keypoints against the ORB descriptor of the training image with a matching function comprises:
determining whether the feature vectors of any two keypoints of the two images contain descriptor sequences whose similarity exceeds a preset similarity threshold.
9. The vehicle chassis detection method based on multi-view vision fusion of claim 6, wherein in step S4 fusing all local chassis images by the linear-gradient method to obtain a complete vehicle chassis panoramic image comprises:
S41, matching the keypoints extracted from all local chassis images using their feature descriptors, identifying the same keypoints across different local chassis images, and generating several groups of matched pairs;
S42, determining the relative positions of all local chassis images from the matched pairs, with reference to the shooting parameters of the current logistics stage, the shooting parameters of historical logistics environments, and historical chassis images of the vehicle;
S43, adjusting the orientation of each local chassis image so that all images share a consistent orientation;
S44, projecting all local chassis images onto a sphere or a cylinder via projective transformation;
S45, computing the stitching region between adjacent local chassis images, processing the pixels of the stitching region with a preset fusion algorithm, and removing misaligned pixels between adjacent local chassis images to obtain the final chassis panoramic image; the preset fusion algorithm weights the pixels of the stitching region by their distance from the seam.
10. The vehicle chassis detection method based on multi-view vision fusion of claim 1, characterized in that the detection method further comprises the following steps:
S6, packaging the acquired chassis panoramic image and the corresponding local-chassis-image processing data into a detection data file for the current logistics stage;
S7, hashing the detection data file, uploading the resulting hash value to a cloud server, and writing the hash value into the corresponding block of a blockchain.
CN202110343920.3A 2021-03-31 2021-03-31 Whole vehicle chassis detection method based on multi-vision fusion Active CN113033435B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110343920.3A CN113033435B (en) 2021-03-31 2021-03-31 Whole vehicle chassis detection method based on multi-vision fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110343920.3A CN113033435B (en) 2021-03-31 2021-03-31 Whole vehicle chassis detection method based on multi-vision fusion

Publications (2)

Publication Number Publication Date
CN113033435A true CN113033435A (en) 2021-06-25
CN113033435B CN113033435B (en) 2023-11-14

Family

ID=76453565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110343920.3A Active CN113033435B (en) 2021-03-31 2021-03-31 Whole vehicle chassis detection method based on multi-vision fusion

Country Status (1)

Country Link
CN (1) CN113033435B (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110026770A1 (en) * 2009-07-31 2011-02-03 Jonathan David Brookshire Person Following Using Histograms of Oriented Gradients
CN109409208A (en) * 2018-09-10 2019-03-01 东南大学 A kind of vehicle characteristics extraction and matching process based on video

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HOU Yueqing; XU Guili; ZHU Shipeng: "Research on vision-based axle-type detection for vehicle weighing", Computer Measurement & Control, no. 09

Also Published As

Publication number Publication date
CN113033435B (en) 2023-11-14

Similar Documents

Publication Publication Date Title
JP6503569B2 (en) Image analysis to certify products
US7715596B2 (en) Method for controlling photographs of people
CN113038018B (en) Method and device for assisting user in shooting vehicle video
CN108683907A (en) Optics module picture element flaw detection method, device and equipment
CN109413411B (en) Black screen identification method and device of monitoring line and server
US20160335523A1 (en) Method and apparatus for detecting incorrect associations between keypoints of a first image and keypoints of a second image
Ghosh et al. Quantitative evaluation of image mosaicing in multiple scene categories
CN110189322A (en) Measurement method of planeness, device, equipment, storage medium and system
CN110189343B (en) Image labeling method, device and system
CN110087049A (en) Automatic focusing system, method and projector
CN114820334A (en) Image restoration method and device, terminal equipment and readable storage medium
CN113469201A (en) Image acquisition equipment offset detection method, image matching method, system and equipment
US10599925B2 (en) Method of detecting fraud of an iris recognition system
CN111611994B (en) Image extraction method, device, electronic equipment and storage medium
CN108021913A (en) Certificate photograph information identifying method and device
CN106997366B (en) Database construction method, augmented reality fusion tracking method and terminal equipment
CN113033435B (en) Whole vehicle chassis detection method based on multi-vision fusion
CN111275756B (en) Spool positioning method and device
CN107016330A (en) The method detected by pre-recorded image projection to fraud
JP2021128796A (en) Object recognition system and object recognition method
CN116977316A (en) Full-field detection and quantitative evaluation method for damage defects of complex-shape component
CN110189350A (en) A kind of the determination method, apparatus and storage medium of pupil edge
CN116051876A (en) Camera array target recognition method and system of three-dimensional digital model
JPH1151611A (en) Device and method for recognizing position and posture of object to be recognized
CN115994996A (en) Collation apparatus, storage medium, and collation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant