CN114004895A - Moving object image feature point identification and reconstruction method based on multi-view vision

Moving object image feature point identification and reconstruction method based on multi-view vision

Info

Publication number
CN114004895A
CN114004895A (application CN202111210015.7A)
Authority
CN
China
Prior art keywords
points
pixel
region
point
auxiliary identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111210015.7A
Other languages
Chinese (zh)
Inventor
撒国栋
撒国良
吴昊
殷兴耀
刘振宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Science and Technology ZUST
Original Assignee
Zhejiang University of Science and Technology ZUST
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Science and Technology ZUST filed Critical Zhejiang University of Science and Technology ZUST
Priority to CN202111210015.7A priority Critical patent/CN114004895A/en
Publication of CN114004895A publication Critical patent/CN114004895A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10024 Color image

Abstract

The invention discloses a moving object image feature point identification and reconstruction method based on multi-view vision. The multi-view cameras are calibrated; the images of the moving object are preprocessed and converted to Lab space; color samples of the background and of the ROI region are acquired and the ROI region is segmented from the image; the image is processed with human-computer interaction to obtain all feature points and auxiliary identification points; the feature points are sorted from near to far by their distance to a straight line constructed from the auxiliary identification points; and the actual coordinates of each feature point are back-calculated from its different image coordinates in the images shot by the multi-view cameras at the same moment, yielding the three-dimensional reconstruction. The method is simple to operate, identifies targets effectively and with high precision, and is suitable for three-dimensional reconstruction of complex scenes and for identifying and tracking static and dynamic targets.

Description

Moving object image feature point identification and reconstruction method based on multi-view vision
Technical Field
The invention relates to a method for identifying and reconstructing target points in an image, and in particular to a method for identifying and reconstructing feature points in images of a moving object based on multi-view vision.
Background
Stereoscopic vision is an important technique in the field of computer vision; its purpose is to reconstruct the three-dimensional geometric information of a scene. It is already applied in industrial automation, medicine, construction and other fields, and its range of application is very wide. Its basic principle is to obtain the camera parameters through calibration, capture images of the target, and convert the two-dimensional information in the images into three-dimensional information in space. The technique offers high computational efficiency, fast processing, adequate precision and low cost, requires no contact with the target object, and is therefore well suited to product identification and inspection, identification and inspection on industrial automation equipment, and similar tasks.
However, as its range of application widens, the demands placed on stereoscopic vision keep increasing. Traditional binocular stereo vision has shortcomings when the target is a moving object. It uses two cameras to photograph the same target and calculates coordinates from the parallax between two pictures taken at the same moment; when applied to a moving object it can fail to work, with two main deficiencies:
(1) Cameras are generally installed in fixed positions, and with a binocular set-up the field of view is limited, so nothing outside the field of view can be captured. The method is therefore suited mainly to recognizing stationary objects; if the target is a moving object, the traditional method is difficult to use normally.
(2) The traditional binocular method is not suited to monotonous scenes that lack texture. If the photographed target has a uniform texture, matching becomes difficult and the three-dimensional state of the scene and target cannot be restored accurately. In a motion scene the contour and texture also change, which the traditional method cannot accommodate.
In view of the above, a moving object requires a larger field of view to keep the photographed object within the measurement range at all times, and a new identification method is needed to recognize the target object more reliably; traditional stereoscopic vision can provide neither.
Disclosure of Invention
To solve the problems described in the background, the invention provides a moving object image feature point identification and reconstruction method based on multi-view vision, addressing the limited field of view of current general-purpose stereoscopic vision and its inability to handle moving objects with insufficient texture. Because the method captures images with multi-view vision, the rate at which valid images are identified and acquired is markedly improved.
To achieve this, the invention adopts the following technical scheme:
step (1): calibrating the multi-view vision camera by adopting a Zhangyingyou calibration method;
step (2): reading an image shot by a multi-view camera aiming at a moving object, carrying out distortion correction on the image, preprocessing Gaussian noise reduction and converting an image color space into a Lab space;
and (3): acquiring color sampling data of a background and an ROI (region of interest), dividing an image into a background region and an ROI region according to the color sampling data, and segmenting the ROI region from the image;
and (4): selecting different feature points and auxiliary identification points in the identification ROI area according to human-computer interaction to obtain color sampling values of the feature points and the auxiliary identification points, identifying the feature point area and the auxiliary identification point area according to the color sampling values of the feature points and the auxiliary identification points, further obtaining the number of the feature points and the auxiliary identification points, and comparing the number of the feature points and the auxiliary identification points to obtain all the feature points and the auxiliary identification points;
and (5): constructing a marking straight line by using the auxiliary identification points, and sequencing the characteristic points from near to far according to the distance from the points to the straight line;
and (6): and reversely calculating the actual coordinates of the feature points by using the different image coordinates in different images shot by the multi-view camera at the same time by the same feature points to carry out three-dimensional reconstruction.
Through steps (3) to (5) the invention accurately extracts the feature points of the ROI region in each image and matches them across the multi-view cameras, realizing accurate capture of the target object.
In step (2), the conversion from the RGB color space to the Lab color space passes through the intermediate XYZ color space; the specific calculation is as follows:
[X, Y, Z]^T = H [R*, G*, B*]^T
R* = R/255, G* = G/255, B* = B/255
L = 116 f(Y/Yn) - 16
a = 500 [f(X/Xn) - f(Y/Yn)]
b = 200 [f(Y/Yn) - f(Z/Zn)]
f(t) = t^(1/3) if t > (6/29)^3, and f(t) = (1/3)(29/6)^2 t + 4/29 otherwise
where R, G, B are the RGB values of a pixel in the original image; R*, G*, B* are the RGB values scaled to the range 0 to 1; X, Y, Z are the tristimulus values in the intermediate XYZ color space; L, a, b are the three values of the Lab color space, L being the lightness, a the component from green to red and b the component from blue to yellow; H is a coefficient matrix, and Xn, Yn, Zn are respectively the sums of the coefficients in the first, second and third rows of H; f(t) is the correction function needed to compute L, a and b, whose argument t takes the values X/Xn, Y/Yn and Z/Zn in turn.
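Purely as an illustration of the formulas above, the conversion can be sketched in Python. The patent's coefficient matrix H is given only as an image, so the sketch substitutes the standard sRGB-to-XYZ (D65) matrix, and the threshold in f(t) follows the usual CIELAB definition; all names are illustrative.

```python
import numpy as np

# Assumed stand-in for the coefficient matrix H (standard sRGB -> XYZ, D65);
# the patent's own H is shown only as an image and may differ.
H = np.array([[0.4124, 0.3576, 0.1805],
              [0.2126, 0.7152, 0.0722],
              [0.0193, 0.1192, 0.9505]])
Xn, Yn, Zn = H.sum(axis=1)  # row sums of H, as defined in the text


def f(t):
    # CIELAB correction function (threshold assumed from the standard definition)
    delta = 6.0 / 29.0
    return np.where(t > delta ** 3, np.cbrt(t), t / (3 * delta ** 2) + 4.0 / 29.0)


def rgb_to_lab(rgb):
    """rgb: (..., 3) array with values in 0..255; returns (..., 3) Lab values."""
    rgb_scaled = np.asarray(rgb, dtype=np.float64) / 255.0   # R*, G*, B*
    xyz = rgb_scaled @ H.T                                    # X, Y, Z
    fx = f(xyz[..., 0] / Xn)
    fy = f(xyz[..., 1] / Yn)
    fz = f(xyz[..., 2] / Zn)
    L = np.maximum(116.0 * fy - 16.0, 0.0)   # negative L is clipped to 0
    a = 500.0 * (fx - fy)
    b = 200.0 * (fy - fz)
    return np.stack([L, a, b], axis=-1)
```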
In step (3), several marker pixels are selected in the ROI region and in the background region through human-computer interaction; the color values of the marker pixels of each region are averaged to give that region's color sampling value. Once the color sampling values are obtained, the ROI region is extracted from the image as follows:
For each pixel p, the Euclidean distance d_pk below decides whether the pixel belongs to the ROI region or to the background region:
d_pk = sqrt((a_p - a_k)^2 + (b_p - b_k)^2)
where d_pk is the distance between pixel p and the color sample of region k, and k is the region index indicating whether the region is the background region or the ROI region; a_p, b_p are the a-channel and b-channel components of pixel p in Lab color space, and a_k, b_k are the a-channel and b-channel components of the color sample of region k.
If the distance d_p0 of pixel p to the background sample is not greater than its distance d_p1 to the ROI sample, pixel p belongs to the background region; if d_p0 > d_p1, pixel p belongs to the ROI region.
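A minimal sketch of this nearest-sample decision in Python, assuming a Lab image produced as above and the two hand-sampled region means; the function name and array layout are illustrative assumptions.

```python
import numpy as np

def segment_roi(lab_img, bg_sample, roi_sample):
    """lab_img: (H, W, 3) Lab image; *_sample: (L, a, b) mean of the marked pixels.
    Returns a boolean mask that is True where a pixel is assigned to the ROI."""
    a, b = lab_img[..., 1], lab_img[..., 2]
    # d_p0: distance to the background sample; d_p1: distance to the ROI sample
    d_p0 = np.hypot(a - bg_sample[1], b - bg_sample[2])
    d_p1 = np.hypot(a - roi_sample[1], b - roi_sample[2])
    return d_p1 < d_p0   # a pixel joins the ROI when it is closer to the ROI sample
```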
An auxiliary identification point is a color point of another color preset on the moving object and used to match the points identified in different images; it lies inside the ROI region. Each moving object carries a preset auxiliary identification point. In a specific implementation, a marker sticker bearing a marker pattern is attached to the moving object to serve as the auxiliary identification point, and the position of the auxiliary identification point on the moving object is known.
The feature points are color marker points set up to capture the three-dimensional features of the target object; they are either attached additionally or chosen from surface positions that have structural features, and their positions on the moving object are unknown.
Both the feature points and the auxiliary identification points lie inside the ROI region.
Step (4) proceeds as follows:
(4.1) The color sampling values of the feature points and auxiliary identification points in the ROI region are obtained as follows: first, a pixel inside a feature point/auxiliary identification point in the ROI region is designated as the seed pixel by human-computer interaction and pushed onto an empty stack; the point region is then grown with the following stack-based algorithm:
1) Pop the pixel at the top of the stack; call it the popped pixel;
2) mark the popped pixel as belonging to the feature point/auxiliary identification point;
3) compute the Euclidean distance between the popped pixel and each of its four neighbors (above, below, left, right): if a neighbor's Euclidean distance is smaller than the preset similarity threshold T_e and the neighbor has not yet been marked as belonging to the feature point or auxiliary identification point, push that neighbor onto the stack;
4) iterate 1) to 3) until, at the current iteration, no neighboring pixel has a Euclidean distance below the preset similarity threshold T_e, or all neighboring pixels have already been marked as feature points or auxiliary identification points; then stop.
The average color of all pixels finally marked as belonging to the feature point is taken as the feature point color sampling value L2, a2, b2, and the average color of all pixels finally marked as belonging to the auxiliary identification point is taken as the auxiliary identification point color sampling value L3, a3, b3.
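The stack-based growth of (4.1) could look like the sketch below; the similarity threshold T_e and the seed coordinate come from the human-computer interaction and are passed in as parameters here. This is an illustrative reading of the steps above, not the exact patented implementation.

```python
import numpy as np

def grow_point_region(lab_img, seed, T_e):
    """Stack-based region growing around a user-selected seed pixel (y, x).
    Returns the mask of marked pixels and their mean Lab color (the color sample)."""
    h, w = lab_img.shape[:2]
    marked = np.zeros((h, w), dtype=bool)
    stack = [seed]                               # seed pixel pushed onto the empty stack
    while stack:
        y, x = stack.pop()                       # 1) pop the top-of-stack pixel
        if marked[y, x]:
            continue
        marked[y, x] = True                      # 2) mark it as belonging to the point
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # 3) four-neighborhood
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not marked[ny, nx]:
                d = np.hypot(lab_img[ny, nx, 1] - lab_img[y, x, 1],
                             lab_img[ny, nx, 2] - lab_img[y, x, 2])
                if d < T_e:                      # similar enough: push and keep growing
                    stack.append((ny, nx))
    ys, xs = np.nonzero(marked)
    return marked, lab_img[ys, xs].mean(axis=0)  # color sample L, a, b
```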
(4.2) For each pixel p, the Euclidean distance d_pk below is used to judge whether the pixel belongs to a feature point region or an auxiliary identification point region:
d_pk = sqrt((a_p - a_k)^2 + (b_p - b_k)^2)
where d_pk is the distance between pixel p and the color sample of region/point k, with k indexing the ROI sample (k = 1), the feature point sample (k = 2) and the auxiliary identification point sample (k = 3); a_p, b_p are the a-channel and b-channel components of pixel p in Lab color space, and a_k, b_k are the a-channel and b-channel components of the color sample of region/point k.
If d_p1 < d_p2 and d_p1 < d_p3, the pixel belongs neither to a feature point region nor to an auxiliary identification point region;
if d_p2 < d_p1 and d_p2 < d_p3, the pixel belongs to a feature point region;
if d_p3 < d_p1 and d_p3 < d_p2, the pixel belongs to an auxiliary identification point region.
(4.3) Circle fitting is applied to every identified feature point region/auxiliary identification point region, and the fitted circle center is taken as the center coordinate of the region, i.e. as the coordinate of the feature point/auxiliary identification point. The number of feature points/auxiliary identification points is counted from the number of such coordinates and compared with the actual, known number on the moving object; if the counts agree, the identification succeeds and the method proceeds to the next step.
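One plausible way to realize (4.3) with OpenCV is sketched below: connected components are taken over the binary mask of one point class, and the minimum enclosing circle of each component stands in for the fitted circle; the expected point count is supplied by the caller. The helper name and the small-area filter are illustrative assumptions.

```python
import cv2
import numpy as np

def extract_point_centers(mask, expected_count, min_area=5):
    """mask: boolean image of one point class (feature points or auxiliary points).
    Returns the circle centers if their number matches the known count, else None."""
    num, labels = cv2.connectedComponents(mask.astype(np.uint8))
    centers = []
    for i in range(1, num):                              # label 0 is the background
        ys, xs = np.nonzero(labels == i)
        if len(xs) < min_area:
            continue                                     # discard tiny speckles
        pts = np.column_stack([xs, ys]).astype(np.float32)
        (cx, cy), _radius = cv2.minEnclosingCircle(pts)  # stand-in for circle fitting
        centers.append((cx, cy))
    # quantity comparison: identification succeeds only if the count matches
    return centers if len(centers) == expected_count else None
```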
In step (5), the feature points are sorted from near to far by their distance to the marking straight line in order to decide whether feature points in different images shot by the multi-view cameras at the same moment are the same point; feature points that occupy the same rank in different images are regarded as the same feature point.
The distance and positional relationship of a feature point with respect to the marking straight line are essentially unchanged regardless of the camera shooting angle, which is why this is adopted as the feature point matching criterion.
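A sketch of this ordering, under the assumption that the marking straight line is defined by two auxiliary identification points (the text does not spell out how the line is constructed); the names are illustrative.

```python
import numpy as np

def sort_by_line_distance(feature_pts, line_p1, line_p2):
    """Sort 2-D feature points from near to far by their distance to the marking
    line through line_p1 and line_p2. feature_pts: iterable of (x, y) tuples."""
    p1 = np.asarray(line_p1, dtype=float)
    p2 = np.asarray(line_p2, dtype=float)
    direction = p2 - p1
    norm = np.linalg.norm(direction)

    def dist(pt):
        v = np.asarray(pt, dtype=float) - p1
        # point-to-line distance: |cross(direction, v)| / |direction|
        return abs(direction[0] * v[1] - direction[1] * v[0]) / norm

    return sorted(feature_pts, key=dist)
```

Feature points that occupy the same rank in two images are then treated as the same physical point.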
In step (6), the same feature point is matched across the different images shot by the multi-view cameras at the same moment, and its three-dimensional world coordinates are calculated using the camera parameters obtained by the calibration in step (1).
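For step (6), the matched pixel coordinates can be triangulated with the projection matrices obtained from the calibration in step (1). The two-view OpenCV sketch below is only an illustration; a multi-view rig can triangulate camera pairs in turn or solve a joint least-squares problem.

```python
import cv2
import numpy as np

def triangulate_pair(P1, P2, pts1, pts2):
    """P1, P2: 3x4 projection matrices of two calibrated cameras.
    pts1, pts2: (N, 2) matching pixel coordinates of the same feature points.
    Returns the reconstructed 3-D world coordinates as an (N, 3) array."""
    pts1 = np.asarray(pts1, dtype=np.float64).T          # 2xN, as OpenCV expects
    pts2 = np.asarray(pts2, dtype=np.float64).T
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)      # 4xN homogeneous points
    return (X_h[:3] / X_h[3]).T                          # dehomogenize
```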
In summary, the cameras are calibrated; the captured images are read in and converted to the Lab color space; the ROI region is extracted from each image according to the sampling values of the different regions; the color sampling values of the feature points and auxiliary identification points are then obtained, and the ROI region is color-identified again to yield the feature point regions; whether the identification succeeded is judged by comparing counts; finally, matching is performed with the help of the auxiliary identification points, the three-dimensional coordinates are back-calculated, and the reconstruction is carried out.
When identifying the feature points, the invention first obtains the images captured by the multi-view cameras and samples the color values of the different image regions and color points; during sampling a cluster center is chosen through human-computer interaction, and the average color of the pixels obtained by growing the region of similar color is used as the sampled color. The ROI region is then segmented by the nearest-neighbor principle, target colors are further identified and segmented inside the ROI region, the identified target feature points are checked by count and sorted by position, and finally the shape identification and coordinate acquisition of the feature points are achieved.
Owing to the above technical scheme, the invention has the following beneficial effects:
the invention uses multi-vision, can realize larger coverage of the visual field, and can realize accurate capture of the target object for the moving object.
The method has wide applicability: by relying on direct color features it improves identification effectiveness, and because the color features do not change while the object moves, accuracy is maintained.
The method is simple to operate, identifies targets effectively and with high precision, and is suitable for three-dimensional reconstruction of complex scenes and for identifying and tracking static and dynamic targets, for example in unmanned aerial vehicle systems without feedback networking, intelligent vehicle tracking over complex terrain, or animal tracking systems.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic view of a shooting model according to an embodiment of the method of the present invention;
FIG. 3 is a diagram of the recognition results of an embodiment of the method of the present invention;
In FIG. 2, 1 and 2 are feature points of different colors, 3 is an auxiliary identification point, and 4 is a shooting camera; in FIG. 3, 1 and 2 are the identified feature points of different colors, and 3 is the identified auxiliary identification point.
Detailed Description
The invention is described in further detail below with reference to the following figures and specific embodiments:
as shown in fig. 1, the embodiment and implementation process of the present invention are as follows:
step (1): calibrating the multi-view vision camera by adopting a Zhangyingyou calibration method;
step (2): reading in images shot by the multi-view camera for a moving object, wherein the aircraft is taken as a target object in the embodiment, and the simplified shooting model is shown in fig. 2. And carrying out distortion correction, preprocessing Gaussian noise reduction and converting the image color space into a Lab space, and taking the image in the preprocessed Lab color space as a solving target.
The conversion from the RGB color space to the Lab color space passes through the intermediate XYZ color space; the specific calculation is as follows:
[X, Y, Z]^T = H [R*, G*, B*]^T
R* = R/255, G* = G/255, B* = B/255
L = 116 f(Y/Yn) - 16
a = 500 [f(X/Xn) - f(Y/Yn)]
b = 200 [f(Y/Yn) - f(Z/Zn)]
f(t) = t^(1/3) if t > (6/29)^3, and f(t) = (1/3)(29/6)^2 t + 4/29 otherwise
where R, G, B are the RGB values of a pixel in the original image; R*, G*, B* are the RGB values scaled to the range 0 to 1; and X, Y, Z are the tristimulus values in the intermediate XYZ color space, computed by the matrix equation above. L, a, b are the three values of the Lab color space, L being the lightness, a the component from green to red and b the component from blue to yellow. H is a coefficient matrix; Xn, Yn, Zn are respectively the sums of the coefficients in the first, second and third rows of H, which ensures that the XYZ values are mapped into the same range as the RGB values. f(t) is the correction function needed to compute L, a and b, whose argument t takes the values X/Xn, Y/Yn and Z/Zn in turn.
In one embodiment a fixed coefficient matrix H is used (given in the original filing as an image and not reproduced here). If the calculated value of L is negative, it is set to 0.
And (3): acquiring color sampling data of a background and an ROI (region of interest), dividing an image into a background region and an ROI region according to the color sampling data, and segmenting the ROI region from the image;
selecting a plurality of marking pixels of the ROI area and the background area through man-machine interaction, and averaging the color values of the marking pixels of the ROI area and the background area after obtaining the color values to be used as respective color sampling values; in the embodiment, 10-20 points are randomly selected from different areas respectively, and the average value is taken as a sampling value after the color values are obtained.
After obtaining the color sampling values, the ROI area is extracted from the image:
let the color sampling value of the background region be L0、a0、b0Color sampling value of ROI region is L1、a1、b1ROI region pixel set is Ui(i is more than 0 and less than m), the pixels of the full image form a full set UmThen U isi∈Um. Meanwhile, each pixel in the image stores a three-channel Lab value, and the three-channel Lab color value of a pixel p point is set to be Lp、ap、bp
For each pixel p, the Euclidean distance d_pk below decides whether the pixel belongs to the ROI region or to the background region:
d_pk = sqrt((a_p - a_k)^2 + (b_p - b_k)^2)
where d_pk is the distance between pixel p and the color sample of region k, and k is the region index indicating whether the region is the background region (k = 0) or the ROI region (k = 1); a_p, b_p are the a-channel and b-channel components of pixel p in Lab color space, and a_k, b_k are the a-channel and b-channel components of the color sample of region k, specifically a0, b0 for the background region and a1, b1 for the ROI region.
If d_p0 is not greater than d_p1, pixel p belongs to the background region; if d_p0 > d_p1, pixel p belongs to the ROI region.
And (4): selecting different feature points and auxiliary identification points in the identification ROI area according to human-computer interaction to obtain color sampling values of the feature points and the auxiliary identification points, identifying the feature point area and the auxiliary identification point area according to the color sampling values of the feature points and the auxiliary identification points, further obtaining the number of the feature points and the auxiliary identification points, and comparing the number of the feature points and the auxiliary identification points to obtain all the feature points and the auxiliary identification points;
(4.1) The color sampling values of the feature points and auxiliary identification points in the ROI region are obtained as follows: a pixel inside a feature point/auxiliary identification point in the ROI region is designated as the seed pixel by human-computer interaction and pushed onto an empty stack, and the point region is then grown with the following stack-based algorithm:
1) Pop the pixel at the top of the stack; call it the popped pixel;
2) mark the popped pixel as belonging to the feature point/auxiliary identification point;
3) compute the Euclidean distance between the popped pixel and each of its four neighbors (above, below, left, right): if a neighbor's Euclidean distance is smaller than the preset similarity threshold T_e and the neighbor has not yet been marked as belonging to the feature point or auxiliary identification point, push that neighbor onto the stack;
4) iterate 1) to 3) until, at the current iteration, no neighboring pixel has a Euclidean distance below the preset similarity threshold T_e, or all neighboring pixels have already been marked as feature points or auxiliary identification points; then stop.
The Euclidean distance in 3) is calculated in the same way as in step (3).
The average color of all pixels finally marked as belonging to the feature point is taken as the feature point color sampling value L2, a2, b2, and the average color of all pixels finally marked as belonging to the auxiliary identification point is taken as the auxiliary identification point color sampling value L3, a3, b3.
The calculation in (4.1) is a preprocessing step and needs to be performed only once in an experiment; its purpose is to obtain the color sampling values of the feature points and auxiliary identification points.
(4.2) Let the color sampling value of the ROI region be L1, a1, b1, that of the feature point be L2, a2, b2 and that of the auxiliary identification point be L3, a3, b3. For each pixel p, the Euclidean distance d_pk below is used to judge whether the pixel belongs to a feature point region or an auxiliary identification point region:
d_pk = sqrt((a_p - a_k)^2 + (b_p - b_k)^2)
where d_pk is the distance between pixel p and the color sample of region/point k; a_p, b_p are the a-channel and b-channel components of pixel p in Lab color space, and a_k, b_k are the a-channel and b-channel components of the color sample of region/point k, i.e. a1, b1 for the ROI region, a2, b2 for the feature point and a3, b3 for the auxiliary identification point.
If d_p1 < d_p2 and d_p1 < d_p3, the pixel belongs neither to a feature point region nor to an auxiliary identification point region;
if d_p2 < d_p1 and d_p2 < d_p3, the pixel belongs to a feature point region;
if d_p3 < d_p1 and d_p3 < d_p2, the pixel belongs to an auxiliary identification point region.
(4.3) After the contiguous pixels are grouped by connected-component processing, circle fitting is applied to every identified feature point region/auxiliary identification point region, and the fitted circle center is taken as the center coordinate of the region, i.e. as the coordinate of the feature point/auxiliary identification point. The number of feature points/auxiliary identification points is counted from the number of such coordinates and compared with the actual, known number on the moving object; if the counts agree, the identification succeeds and the method proceeds to the next step; otherwise the identification fails, the image is discarded, and the procedure returns to step (1) to continue.
In this embodiment two groups of feature points with different colors are used, so the feature point color sampling values obtained are L2, a2, b2 and L3, a3, b3; averaging the colors of all pixels belonging to the auxiliary identification point gives the auxiliary identification point color sampling value L4, a4, b4.
This calculation is a preprocessing step, needs to be performed only once per experiment, and serves to obtain the color sampling values of the feature points and auxiliary identification points.
The ROI region is then processed again with the same kind of computation to obtain the feature point regions. The sampling mean of the ROI is L1, a1, b1, the feature point color sampling values are L2, a2, b2 and L3, a3, b3, and the auxiliary identification point color sampling value is L4, a4, b4; the Euclidean distance d_pk is again used to decide which region a pixel belongs to, computed as:
d_pk = sqrt((a_p - a_k)^2 + (b_p - b_k)^2)
where pixel p traverses the whole image. If d_p1 < d_p2, d_p1 < d_p3 and d_p1 < d_p4, the pixel belongs to the non-feature-point region; if d_p2 < d_p1, d_p2 < d_p3 and d_p2 < d_p4, the pixel belongs to the region of feature point 1; if d_p3 < d_p1, d_p3 < d_p2 and d_p3 < d_p4, the pixel belongs to the region of feature point 2; if d_p4 < d_p1, d_p4 < d_p2 and d_p4 < d_p3, the pixel belongs to the auxiliary identification point region. Circle fitting is applied to the identified feature point and auxiliary identification point regions, and the fitted circle centers are taken as the feature point coordinates.
The identified feature points are then checked by count: in this embodiment 4 feature points are set for each color, so if exactly 4 feature points of a color are identified, the identification is considered successful and the coordinates of all identified feature points are computed; if the number is not 4, the identification is considered to have failed, the image is discarded, and the procedure returns to step (1) to continue.
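The four-way decision above is the two-class rule of step (3) extended to more color samples; a compact arg-min sketch is given below, with the sample order assumed to be ROI background color, feature color 1, feature color 2, auxiliary identification point.

```python
import numpy as np

def label_pixels(lab_img, samples):
    """lab_img: (H, W, 3) Lab image restricted to the ROI.
    samples: (K, 3) array of color samples (L, a, b), one per candidate class.
    Returns an (H, W) label image holding the index of the nearest sample."""
    ab = lab_img[..., 1:3]                                   # only a and b are compared
    samples_ab = np.asarray(samples, dtype=float)[:, 1:3]
    d = np.linalg.norm(ab[..., None, :] - samples_ab[None, None, :, :], axis=-1)
    return np.argmin(d, axis=-1)
```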
and (5): constructing a marking straight line by using the auxiliary identification points, and sequencing the characteristic points from near to far according to the distance from the points to the straight line;
and sequencing the feature points according to the distance from the feature points to the marking straight line from near to far to determine whether the feature points in different images shot by the multi-view camera at the same time are the same, and regarding the feature points with the same sequence in different images as the same feature point.
And (6): and reversely calculating the actual coordinates of the feature points by using the different image coordinates in different images shot by the multi-view camera at the same time by the same feature points to carry out three-dimensional reconstruction.
The same feature point is matched across the different images shot by the multi-view cameras at the same moment, and its three-dimensional world coordinates are calculated using the camera parameters obtained by the calibration in step (1).

Claims (6)

1. A moving object image feature point identification and reconstruction method based on multi-view vision is characterized by comprising the following steps:
step (1): calibrating the multi-view vision camera by adopting a Zhangyingyou calibration method;
step (2): reading an image shot by a multi-view camera aiming at a moving object, carrying out distortion correction on the image, preprocessing Gaussian noise reduction and converting an image color space into a Lab space;
and (3): acquiring color sampling data of a background and an ROI (region of interest), dividing an image into a background region and an ROI region according to the color sampling data, and segmenting the ROI region from the image;
and (4): selecting different feature points and auxiliary identification points in the identification ROI area according to human-computer interaction to obtain color sampling values of the feature points and the auxiliary identification points, identifying the feature point area and the auxiliary identification point area according to the color sampling values of the feature points and the auxiliary identification points, further obtaining the number of the feature points and the auxiliary identification points, and comparing the number of the feature points and the auxiliary identification points to obtain all the feature points and the auxiliary identification points;
and (5): constructing a marking straight line by using the auxiliary identification points, and sequencing the characteristic points from near to far according to the distance from the points to the straight line;
and (6): and reversely calculating the actual coordinates of the feature points by using the different image coordinates in different images shot by the multi-view camera at the same time by the same feature points to carry out three-dimensional reconstruction.
2. The moving object image feature point identification and reconstruction method based on multi-view vision as claimed in claim 1, characterized in that in step (2) the conversion from the RGB color space to the Lab color space passes through the intermediate XYZ color space, the specific calculation being:
[X, Y, Z]^T = H [R*, G*, B*]^T
R* = R/255, G* = G/255, B* = B/255
L = 116 f(Y/Yn) - 16
a = 500 [f(X/Xn) - f(Y/Yn)]
b = 200 [f(Y/Yn) - f(Z/Zn)]
f(t) = t^(1/3) if t > (6/29)^3, and f(t) = (1/3)(29/6)^2 t + 4/29 otherwise
where R, G, B are the RGB values of a pixel in the original image; R*, G*, B* are the RGB values scaled to the range 0 to 1; X, Y, Z are the tristimulus values in the intermediate XYZ color space; L, a, b are the three values of the Lab color space, L being the lightness, a the component from green to red and b the component from blue to yellow; H is a coefficient matrix, and Xn, Yn, Zn are respectively the sums of the coefficients in the first, second and third rows of H; f(t) is the correction function needed to compute L, a and b, whose argument t takes the values X/Xn, Y/Yn and Z/Zn in turn.
3. The moving object image feature point identification and reconstruction method based on multi-view vision as claimed in claim 1, characterized in that in step (3) several marker pixels are selected in the ROI region and in the background region through human-computer interaction, and the color values of the marker pixels of each region are averaged to give that region's color sampling value; once the color sampling values are obtained, the ROI region is extracted from the image as follows:
for each pixel p, the Euclidean distance d_pk below decides whether the pixel belongs to the ROI region or to the background region:
d_pk = sqrt((a_p - a_k)^2 + (b_p - b_k)^2)
where d_pk is the distance between pixel p and the color sample of region k, and k is the region index indicating whether the region is the background region or the ROI region; a_p, b_p are the a-channel and b-channel components of pixel p in Lab color space, and a_k, b_k are the a-channel and b-channel components of the color sample of region k;
if the distance d_p0 of pixel p to the background sample is not greater than its distance d_p1 to the ROI sample, pixel p belongs to the background region; if d_p0 > d_p1, pixel p belongs to the ROI region.
4. The moving object image feature point identification and reconstruction method based on multi-view vision as claimed in claim 1, characterized in that step (4) is specifically as follows:
(4.1) the color sampling values of the feature points and auxiliary identification points in the ROI region are obtained as follows: a pixel inside a feature point/auxiliary identification point in the ROI region is designated as the seed pixel by human-computer interaction and pushed onto an empty stack, and the point region is then grown with the following stack-based algorithm:
1) pop the pixel at the top of the stack; call it the popped pixel;
2) mark the popped pixel as belonging to the feature point/auxiliary identification point;
3) compute the Euclidean distance between the popped pixel and each of its four neighbors (above, below, left, right): if a neighbor's Euclidean distance is smaller than the preset similarity threshold T_e and the neighbor has not yet been marked as belonging to the feature point or auxiliary identification point, push that neighbor onto the stack;
4) iterate 1) to 3) until, at the current iteration, no neighboring pixel has a Euclidean distance below the preset similarity threshold T_e, or all neighboring pixels have already been marked as feature points or auxiliary identification points; then stop;
the average color of all pixels finally marked as belonging to the feature point is taken as the feature point color sampling value L2, a2, b2, and the average color of all pixels finally marked as belonging to the auxiliary identification point is taken as the auxiliary identification point color sampling value L3, a3, b3;
(4.2) for each pixel p, the Euclidean distance d_pk below is used to judge whether the pixel belongs to a feature point region or an auxiliary identification point region:
d_pk = sqrt((a_p - a_k)^2 + (b_p - b_k)^2)
where d_pk is the distance between pixel p and the color sample of region/point k, with k indexing the ROI sample (k = 1), the feature point sample (k = 2) and the auxiliary identification point sample (k = 3); a_p, b_p are the a-channel and b-channel components of pixel p in Lab color space, and a_k, b_k are the a-channel and b-channel components of the color sample of region/point k;
if d_p1 < d_p2 and d_p1 < d_p3, the pixel belongs neither to a feature point region nor to an auxiliary identification point region;
if d_p2 < d_p1 and d_p2 < d_p3, the pixel belongs to a feature point region;
if d_p3 < d_p1 and d_p3 < d_p2, the pixel belongs to an auxiliary identification point region;
(4.3) circle fitting is applied to every identified feature point region/auxiliary identification point region, and the fitted circle center is taken as the center coordinate of the region, i.e. as the coordinate of the feature point/auxiliary identification point; the number of feature points/auxiliary identification points is counted from the number of such coordinates and compared with the actual, known number on the moving object; if the counts agree, the identification succeeds and the method proceeds to the next step.
5. The moving object image feature point identification and reconstruction method based on multi-view vision as claimed in claim 1, characterized in that in step (5) the feature points are sorted from near to far by their distance to the marking straight line to determine whether feature points in different images shot by the multi-view cameras at the same moment are the same point, and feature points occupying the same rank in different images are regarded as the same feature point.
6. The moving object image feature point identification and reconstruction method based on multi-view vision as claimed in claim 1, characterized in that in step (6), the same feature point in different images shot by the multi-view camera at the same time is matched, and the three-dimensional world coordinates of the feature point are calculated by using the camera parameters obtained by calibration in step (1).
CN202111210015.7A 2021-10-18 2021-10-18 Moving object image feature point identification and reconstruction method based on multi-view vision Pending CN114004895A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111210015.7A CN114004895A (en) 2021-10-18 2021-10-18 Moving object image feature point identification and reconstruction method based on multi-view vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111210015.7A CN114004895A (en) 2021-10-18 2021-10-18 Moving object image feature point identification and reconstruction method based on multi-view vision

Publications (1)

Publication Number Publication Date
CN114004895A true CN114004895A (en) 2022-02-01

Family

ID=79923029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111210015.7A Pending CN114004895A (en) 2021-10-18 2021-10-18 Moving object image feature point identification and reconstruction method based on multi-view vision

Country Status (1)

Country Link
CN (1) CN114004895A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116672082A (en) * 2023-07-24 2023-09-01 苏州铸正机器人有限公司 Navigation registration method and device of operation navigation ruler
CN116672082B (en) * 2023-07-24 2024-03-01 苏州铸正机器人有限公司 Navigation registration method and device of operation navigation ruler

Similar Documents

Publication Publication Date Title
CN110569704B (en) Multi-strategy self-adaptive lane line detection method based on stereoscopic vision
CN106683137B (en) Artificial mark based monocular and multiobjective identification and positioning method
CN107424142B (en) Weld joint identification method based on image significance detection
CN110378931A (en) A kind of pedestrian target motion track acquisition methods and system based on multi-cam
CN110084243B (en) File identification and positioning method based on two-dimensional code and monocular camera
CN109584281B (en) Overlapping particle layering counting method based on color image and depth image
CN109211198B (en) Intelligent target detection and measurement system and method based on trinocular vision
CN107392929B (en) Intelligent target detection and size measurement method based on human eye vision model
CN109859226B (en) Detection method of checkerboard corner sub-pixels for graph segmentation
CN110610505A (en) Image segmentation method fusing depth and color information
CN111260788B (en) Power distribution cabinet switch state identification method based on binocular vision
CN111383333B (en) Sectional SFM three-dimensional reconstruction method
CN111107337B (en) Depth information complementing method and device, monitoring system and storage medium
CN109461132B (en) SAR image automatic registration method based on feature point geometric topological relation
CN113538569B (en) Weak texture object pose estimation method and system
CN111046843A (en) Monocular distance measurement method under intelligent driving environment
CN106952262B (en) Ship plate machining precision analysis method based on stereoscopic vision
CN111932601A (en) Dense depth reconstruction method based on YCbCr color space light field data
CN106709432B (en) Human head detection counting method based on binocular stereo vision
CN112489193A (en) Three-dimensional reconstruction method based on structured light
CN117036641A (en) Road scene three-dimensional reconstruction and defect detection method based on binocular vision
CN110378995B (en) Method for three-dimensional space modeling by using projection characteristics
CN114004895A (en) Moving object image feature point identification and reconstruction method based on multi-view vision
CN109961461B (en) Multi-moving-object tracking method based on three-dimensional layered graph model
CN109064536B (en) Page three-dimensional reconstruction method based on binocular structured light

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination