CN113077519A - Multi-camera external parameter automatic calibration method based on human skeleton extraction - Google Patents

Multi-camera external parameter automatic calibration method based on human skeleton extraction

Info

Publication number
CN113077519A
CN113077519A (application CN202110289301.0A)
Authority
CN
China
Prior art keywords: camera; image; points; cameras; joint points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110289301.0A
Other languages
Chinese (zh)
Other versions
CN113077519B (en)
Inventor
关俊志
耿虎军
高峰
柴兴华
陈彦桥
张泽勇
李晨阳
王雅涵
彭会湘
陈韬亦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 54 Research Institute
Original Assignee
CETC 54 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 54 Research Institute filed Critical CETC 54 Research Institute
Priority to CN202110289301.0A priority Critical patent/CN113077519B/en
Publication of CN113077519A publication Critical patent/CN113077519A/en
Application granted granted Critical
Publication of CN113077519B publication Critical patent/CN113077519B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T 7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06F 17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06T 2207/10016: Video; Image sequence
    • G06T 2207/10028: Range image; Depth image; 3D point clouds
    • G06T 2207/30196: Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Multimedia (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)

Abstract

The invention discloses a multi-camera external parameter automatic calibration method based on human skeleton extraction, belonging to the technical field of computer vision. Each frame of image is processed, and the positions of the human skeletal joint points in the image are extracted by a deep learning method; the coordinate system of any one camera is selected as the world coordinate system, and the external parameters of the other cameras are calculated through the essential matrix; the scale of the translation vector is calculated using human body size information. The method takes the human skeletal joint points as feature points and the point cloud formed by their motion trajectories as a virtual calibration object, then calculates the essential matrix between cameras, obtains the relative pose between the cameras through essential matrix decomposition, and completes real-time online accurate external parameter calibration of a multi-camera system.

Description

Multi-camera external parameter automatic calibration method based on human skeleton extraction
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a method for calibrating online external parameters of a multi-camera system through pedestrian skeleton extraction.
Background
In computer vision and artificial intelligence, applications of multi-camera systems such as scene reconstruction, smart-city safety monitoring, airport monitoring, motion capture, sports video analysis and industrial measurement all require accurate and fast external parameter calibration of the multi-camera system. The camera external parameters are the set of parameters describing attributes such as the position and orientation of the camera in a world coordinate system, so they must be calibrated after the cameras are installed; multi-camera external parameter calibration is the process of obtaining these parameters for every camera in the system.
Conventional calibration methods rely on known scene structure information; they usually involve manufacturing an accurate calibration object, a complex calibration procedure and high-precision prior calibration information, and require skilled operation by a professional. Moreover, every time the position of the camera set changes, recalibration is required.
Disclosure of Invention
In order to solve these technical problems, the invention provides a multi-camera external parameter automatic calibration method that uses the pedestrians commonly present in a scene as the calibration object, enables online real-time calibration of the camera system, and provides a basis for later applications such as scene understanding and monitoring.
In order to achieve this purpose, the technical scheme adopted by the invention is as follows:
A multi-camera external parameter automatic calibration method based on human skeleton extraction comprises the following steps:
(1) enabling a single pedestrian to walk in a camera monitoring area, and simultaneously recording videos by a plurality of cameras to obtain synchronized videos;
(2) extracting, from each video, frames with the same frame indices, showing the pedestrian at different positions;
(3) processing each frame of image, and extracting the pedestrian skeletal joint points in the image by using a deep learning algorithm to obtain the image pixel coordinates of each skeletal joint point;
(4) calculating the image physical-size coordinates of each skeletal joint point from its image pixel coordinates according to the known camera internal parameters;
(5) selecting the coordinate system of any one camera as the world coordinate system, and calculating the external parameters of the other cameras by using the image physical-size coordinates of the skeletal joint points and the essential matrix.
Wherein, the specific mode of the step (3) is as follows:
(301) performing neural network prediction on each frame of image to obtain a heatmap and the part affinity fields of each skeletal joint point;
(302) extracting the image positions and confidence scores of the joints from the heatmaps by applying a non-maximum suppression algorithm;
(303) finding the limb links by using the extracted joint information and the part affinity fields to obtain all connections, wherein each connection is regarded as a limb;
(304) regarding the limbs that share a joint as the limbs of the same person, assembling the limbs to form the person, and obtaining the image pixel coordinates of all the skeletal joint points.
Wherein, the specific mode of the step (5) is as follows:
(501) recording the discrete three-dimensional point cloud of the three-dimensional positions of the human skeletal joint points under the different cameras as $P_{k,t}^i$, wherein $k$ is the camera index, $i$ is the skeletal joint index, and $t$ denotes the time instant;
(502) arbitrarily selecting the camera coordinate system of one camera as the world coordinate system, the discrete three-dimensional point cloud of this camera being $P_{1,t}^i$; then
$$P_{1,t}^i = R_k P_{k,t}^i + c_k,$$
wherein the rotation matrix $R_k$ and the translation vector $c_k$ are the external parameters of camera $k$;
(503) selecting at least eight pairs of matched skeletal joint points to calculate the essential matrix $E_k$, and then decomposing the essential matrix $E_k$ to obtain $c_k$ and $R_k$; the essential matrix $E_k$ is calculated as follows:
an imaged skeletal joint point and the optical centers of the two cameras form a plane, i.e. the three vectors $P_{1,t}^i$, $R_k P_{k,t}^i$ and $c_k$ are coplanar, which gives
$$(P_{1,t}^i)^{\top} \left( c_k \times R_k P_{k,t}^i \right) = 0.$$
Substituting $P_{1,t}^i = z_{1,t}^i x_{1,t}^i$ and $P_{k,t}^i = z_{k,t}^i x_{k,t}^i$ into the above formula and eliminating the depths $z_{1,t}^i$ and $z_{k,t}^i$ yields
$$(x_{1,t}^i)^{\top} E_k \, x_{k,t}^i = 0,$$
wherein $E_k = [c_k]_{\times} R_k$ is the essential matrix, $[c_k]_{\times}$ is the antisymmetric matrix of the vector $c_k$, $x_{1,t}^i$ and $x_{k,t}^i$ are respectively the image physical-size coordinates of skeletal joint point $i$ at time $t$ in the selected camera and in the camera labelled $k$, and $z_{1,t}^i$ and $z_{k,t}^i$ are respectively the corresponding depth coordinates;
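The computation in step (503) can be illustrated with the classical linear eight-point algorithm. The sketch below (numpy; the function names are ours, not the patent's) assumes ideal, noise-free normalized correspondences; a practical implementation would add coordinate normalization and RANSAC:

```python
import numpy as np

def skew(c):
    """Antisymmetric (cross-product) matrix [c]_x of a 3-vector c."""
    return np.array([[0.0, -c[2], c[1]],
                     [c[2], 0.0, -c[0]],
                     [-c[1], c[0], 0.0]])

def essential_from_points(x1, xk):
    """Linear eight-point estimate of E_k from N >= 8 normalized
    correspondences satisfying x1^T E_k xk = 0, followed by projection
    onto the essential manifold (two equal singular values, one zero)."""
    # Each correspondence gives one linear equation in the 9 entries of E.
    A = np.stack([np.kron(a, b) for a, b in zip(x1, xk)])   # N x 9
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)            # null vector of A, row-major E
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt
```

Each tracked skeletal joint at each time instant contributes one correspondence, so even a short walking sequence supplies far more than the eight-pair minimum.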
(504) with the $c_k$ and $R_k$ obtained in step (503) and triangulation, calculating the coordinates $P_k^a$ and $P_k^b$ of two different skeletal joint points in the coordinate system of the camera labelled $k$; the distance between the two joint points is $d = \| P_k^a - P_k^b \|$, and with the known actual physical length $L$ between the two skeletal joint points the scale information $\lambda_k$ is calculated as
$$\lambda_k = L / d;$$
(505) applying the scale information to the translation vector, the actual translation vector of each camera being $\lambda_k c_k$.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention provides an effective multi-camera system calibration method that obtains a good calibration result without additional calibration objects or a complicated calibration process.
2. The method is simple and easy to implement, and can perform automatic online calibration without taking the multi-camera system out of service, thereby greatly improving calibration efficiency.
3. Feature point matching, scale calculation and online calibration of a multi-camera system have been research hotspots in the field, and at present, common methods are roughly divided into two types: one type is a calibration method based on the traditional calibration object, and although the method can obtain good effect, the method has high requirement on the manufacturing precision of the calibration object, the calibration flow is complicated, and online calibration cannot be realized; the other type is a self-calibration method, a specially-made calibration object is not needed in the method, the corresponding relation between cameras is established by depending on feature points in an image, but the method cannot establish the corresponding relation of the feature points under the condition that the visual angle between the cameras is large, so that the application difficulty in a real scene is high, and the translation vector has no actual scale information. In view of the above, the invention firstly uses human skeleton joint points as feature points, uses point cloud formed by motion tracks of the human skeleton joint points as a virtual calibration object, calculates the relative pose between cameras by an essential matrix principle, and provides a scale calculation method based on human physical dimensions to solve the problem of scale uncertainty in camera calibration. This approach is an important innovation over the prior art.
Drawings
Fig. 1 is a flowchart of a calibration method of a multi-camera system according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a human skeleton extracted by a deep learning algorithm in the embodiment of the present invention.
Fig. 3 is a schematic diagram of an essential matrix adopted in the embodiment of the present invention.
Detailed description of the invention
In order to facilitate understanding and implementation by those of ordinary skill in the art, the present invention is further described in detail below with reference to the accompanying drawings and examples; it is to be understood that the embodiments described herein are merely illustrative and explanatory and do not limit the present invention.
A multi-camera external parameter automatic calibration method based on human skeleton extraction comprises the following steps:
step 1, after a multi-camera system is installed, enabling a single pedestrian to walk in a camera monitoring area, and simultaneously recording videos by multiple cameras to obtain synchronized videos;
step 2, extracting, from each video, frames with the same frame indices, showing the pedestrian at different positions;
step 3, processing each frame of image, extracting the pedestrian skeletal joint points in the image by using a convolutional neural network, and obtaining the pixel coordinates of each joint point:
step 3.1, performing neural network prediction on the image to obtain a heatmap (Heatmap) and the part affinity fields (Part Affinity Field) of each skeletal joint point;
step 3.2, extracting the image positions and confidence scores of the joints from the heatmaps by applying a non-maximum suppression (NMS) algorithm;
step 3.3, finding the limb links by utilizing the joint information and the part affinity fields to obtain all connections, wherein each connection is regarded as a limb;
step 3.4, after all the limbs are obtained, regarding the limbs that share a joint as the limbs of the same person, assembling the limbs to form a person, and obtaining the image pixel coordinates of the person's skeletal joint points.
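Step 3.2 can be sketched as a simple non-maximum suppression over one joint's heatmap; the function name and the confidence threshold below are illustrative assumptions, not part of the patent:

```python
import numpy as np

def heatmap_peaks(heatmap, threshold=0.1):
    """Joint candidates from a single-joint heatmap: keep pixels that
    exceed the threshold and are strictly greater than their four
    neighbours, returning (x, y, confidence) tuples."""
    h = np.pad(heatmap, 1, mode="constant")   # zero border simplifies edge handling
    center = h[1:-1, 1:-1]
    peaks = ((center > threshold)
             & (center > h[:-2, 1:-1])        # pixel above
             & (center > h[2:, 1:-1])         # pixel below
             & (center > h[1:-1, :-2])        # pixel to the left
             & (center > h[1:-1, 2:]))        # pixel to the right
    ys, xs = np.nonzero(peaks)
    return [(int(x), int(y), float(heatmap[y, x])) for y, x in zip(ys, xs)]
```

In a full implementation this runs once per joint type, and the resulting candidates are then scored against the part affinity fields in step 3.3.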
step 4, calculating the image physical-size coordinates of each skeletal joint point according to the known camera internal parameters, and recording them as $x_{k,t}^i$, wherein $k$ is the camera index, $i$ is the skeletal joint index, and $t$ denotes the time instant;
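Step 4 amounts to multiplying homogeneous pixel coordinates by the inverse of the intrinsic matrix. A minimal sketch (numpy; the intrinsic values in the test are assumed examples, not from the patent):

```python
import numpy as np

def pixel_to_normalized(px, K):
    """Convert pixel coordinates (u, v) into normalized image coordinates
    x = K^{-1} [u, v, 1]^T -- the 'image physical-size coordinates' that
    enter the epipolar constraint."""
    px = np.asarray(px, dtype=float)
    uv1 = np.hstack([px, np.ones((len(px), 1))])   # homogeneous pixels, N x 3
    return (np.linalg.inv(K) @ uv1.T).T            # N x 3, third component 1
```

For instance, with focal lengths $f_x = f_y = 1000$ and principal point $(960, 540)$, the pixel $(960, 540)$ maps to $(0, 0, 1)$.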
step 5, selecting any one camera coordinate system as the world coordinate system, and calculating the external parameters of the other cameras through the essential matrix:
step 5.1, recording the discrete three-dimensional point cloud of the three-dimensional positions of the human skeletal joint points under the different cameras as $P_{k,t}^i$, wherein $k$, $i$ and $t$ are defined as above;
step 5.2, selecting the camera coordinate system of the camera labelled 1 as the world coordinate system, wherein the discrete three-dimensional point cloud of this camera is $P_{1,t}^i$ and
$$P_{1,t}^i = R_k P_{k,t}^i + c_k,$$
wherein the rotation matrix $R_k$ and the translation vector $c_k$ are the external parameters of camera $k$;
step 5.3, selecting several pairs of matched skeletal joint points to calculate the essential matrix $E_k$, and then decomposing the essential matrix $E_k$ to obtain $c_k$ and $R_k$; the essential matrix $E_k$ is calculated as follows:
an imaged skeletal joint point and the optical centers of the two cameras form a plane, i.e. the three vectors $P_{1,t}^i$, $R_k P_{k,t}^i$ and $c_k$ are coplanar, so that
$$(P_{1,t}^i)^{\top} \left( c_k \times R_k P_{k,t}^i \right) = 0.$$
Substituting $P_{1,t}^i = z_{1,t}^i x_{1,t}^i$ and $P_{k,t}^i = z_{k,t}^i x_{k,t}^i$ into the above formula and eliminating $z_{1,t}^i$ and $z_{k,t}^i$ gives
$$(x_{1,t}^i)^{\top} E_k \, x_{k,t}^i = 0,$$
wherein $E_k = [c_k]_{\times} R_k$ is the essential matrix and $[c_k]_{\times}$ is the antisymmetric matrix of the vector $c_k$;
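The decomposition of $E_k$ into $c_k$ and $R_k$ mentioned in step 5.3 is conventionally done via SVD, which yields four candidate poses; the physically valid one is the candidate that places the triangulated joints in front of both cameras. A minimal sketch (numpy; names and structure are ours, not the patent's):

```python
import numpy as np

def decompose_essential(E):
    """Decompose an essential matrix E = [c]_x R into its four candidate
    (R, c) pairs.  The physically correct pair is then selected by a
    positive-depth (cheirality) check on triangulated points."""
    U, _, Vt = np.linalg.svd(E)
    # Force proper rotations (det = +1) before forming the candidates.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    c = U[:, 2]                        # translation direction, ||c|| = 1
    return [(R1, c), (R1, -c), (R2, c), (R2, -c)]
```

The true pose always appears among the four candidates, which is why a single positive-depth check on one triangulated joint suffices to disambiguate.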
step 5.4, the $c_k$ obtained in step 5.3 is normalized to unit length, i.e. $\| c_k \| = 1$; in practice $\| c_k \| \neq \| c_m \|$ in general, i.e. the distances from the different cameras to camera 1 are unequal, so scale information must be calculated. Taking two different skeletal joint points, the $c_k$ and $R_k$ obtained above and triangulation give their coordinates $P_k^a$ and $P_k^b$ in the coordinate system of the camera labelled $k$; the distance between them is $d = \| P_k^a - P_k^b \|$. If the actual physical length between the two skeletal joint points is known to be $L$, for example the average length of a human arm, then
$$\lambda_k = L / d.$$
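The triangulation and scale computation of step 5.4 can be sketched as follows (numpy; a simple midpoint triangulation in the reference camera's frame, which suffices here because the distance between two joints is the same in either camera's frame; function names are ours):

```python
import numpy as np

def triangulate_midpoint(x1, xk, R, c):
    """Midpoint triangulation of one joint seen as normalized rays x1 (in
    the reference camera) and xk (in camera k), given the scale-free pose
    P1 = R @ Pk + c with ||c|| = 1.  Returns the joint's 3-D position in
    the reference camera's frame."""
    d1 = x1 / np.linalg.norm(x1)                # ray direction from the reference camera
    dk = (R @ xk) / np.linalg.norm(R @ xk)      # ray direction from camera k, rotated
    # Closest approach: solve s*d1 - t*dk = c in least squares for the depths.
    A = np.stack([d1, -dk], axis=1)
    s, t = np.linalg.lstsq(A, c, rcond=None)[0]
    return (s * d1 + (c + t * dk)) / 2.0

def translation_scale(Pa, Pb, true_length):
    """Scale lambda_k: known physical length between two skeletal joints
    divided by their scale-free triangulated distance."""
    return true_length / np.linalg.norm(Pa - Pb)
```

With a known limb length (e.g. an assumed average forearm length), `translation_scale` recovers the metric factor applied to the unit translation vector in step 5.5.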
step 5.5, finally, applying the calculated scale information to the translation vector, the actual translation vector of each camera being $\lambda_k c_k$.
The following is a more specific example.
Referring to fig. 1, a multi-camera external parameter automatic calibration method based on human skeleton extraction comprises the following steps:
step 1, after a multi-camera system is installed, enabling a single pedestrian to walk in a camera monitoring area, and simultaneously recording videos by multiple cameras to obtain synchronized videos;
step 2, extracting, from each video, frames with the same frame indices, showing the pedestrian at different positions;
step 3, processing each frame of image, extracting the pedestrian skeletal joint points in the image by using a convolutional neural network, and obtaining the pixel coordinates of each joint point, as shown in fig. 2; the method comprises the following substeps:
step 3.1, performing neural network prediction on the image to obtain a heatmap (Heatmap) and the part affinity fields (Part Affinity Field) of each joint point;
step 3.2, extracting the image positions and confidence scores of the joints from the heatmaps by applying a non-maximum suppression (NMS) algorithm;
step 3.3, finding the limb links by utilizing the joint information and the part affinity fields to obtain all connections, wherein each connection is regarded as a limb;
step 3.4, after all the limbs are obtained, regarding the limbs that share a joint as the limbs of the same person, assembling the limbs to form a person, and obtaining the image pixel coordinates of the human skeletal joint points; the specific skeleton extraction algorithm is given in document [1]:
[1] Z. Cao, G. Hidalgo Martinez, T. Simon, S.-E. Wei and Y. A. Sheikh. OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2019.2929257.
step 4, calculating the image physical-size coordinates of each skeletal joint point according to the known camera internal parameters, and recording them as $x_{k,t}^i$, wherein $k$ is the camera index, $i$ is the skeletal joint index, and $t$ denotes the time instant;
step 5, selecting any one camera coordinate system as the world coordinate system, and calculating the external parameters of the other cameras through the essential matrix; this comprises the following substeps:
step 5.1, recording the discrete three-dimensional point cloud of the three-dimensional positions of the human skeletal joint points under the different cameras as $P_{k,t}^i$, wherein $k$, $i$ and $t$ are defined as above;
step 5.2, selecting the camera coordinate system of the camera labelled 1 as the world coordinate system, wherein the discrete three-dimensional point cloud of this camera is $P_{1,t}^i$ and
$$P_{1,t}^i = R_k P_{k,t}^i + c_k,$$
wherein the rotation matrix $R_k$ and the translation vector $c_k$ are the external parameters of camera $k$;
step 5.3, obtaining the essential matrix $E_k$ by calculating at least eight pairs of matched skeletal joint points, and then decomposing the essential matrix $E_k$ to obtain $c_k$ and $R_k$; the specific algorithm is given in document [2]:
[2] H. C. Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. Nature, vol. 293, pages 133-135, September 1981.
The essential matrix $E_k$ is calculated as follows: an imaged skeletal joint point and the optical centers of the two cameras form a plane, i.e. the three vectors $P_{1,t}^i$, $R_k P_{k,t}^i$ and $c_k$ are coplanar, as shown in fig. 3, and thus
$$(P_{1,t}^i)^{\top} \left( c_k \times R_k P_{k,t}^i \right) = 0.$$
Substituting $P_{1,t}^i = z_{1,t}^i x_{1,t}^i$ and $P_{k,t}^i = z_{k,t}^i x_{k,t}^i$ into the above formula and eliminating $z_{1,t}^i$ and $z_{k,t}^i$ gives
$$(x_{1,t}^i)^{\top} E_k \, x_{k,t}^i = 0,$$
wherein $E_k = [c_k]_{\times} R_k$ is the essential matrix and $[c_k]_{\times}$ is the antisymmetric matrix of the vector $c_k$;
step 5.4, the $c_k$ obtained in step 5.3 is normalized to unit length, i.e. $\| c_k \| = 1$; in practice $\| c_k \| \neq \| c_m \|$ in general, $k$ and $m$ being two different camera labels, i.e. the distances from the different cameras to camera 1 are unequal, so scale information must be calculated. Taking two different skeletal joint points, the $c_k$ and $R_k$ calculated above and triangulation give their coordinates $P_k^a$ and $P_k^b$ in the coordinate system of the camera labelled $k$; the distance between them is $d = \| P_k^a - P_k^b \|$. If the actual physical length between the two skeletal joint points is known to be $L$, for example the average length of a human arm, then
$$\lambda_k = L / d.$$
step 5.5, finally, applying the calculated scale information to the translation vector, the actual translation vector of each camera being $\lambda_k c_k$. The reprojection error of this method is 1.4 pixels, the attitude error is 0.5 degrees and the offset error is 1.0 percent, so the calibration result is accurate.
In summary, the method processes each frame of image and extracts the positions of the human skeletal joint points in the image with a deep learning method; the coordinate system of any one camera is selected as the world coordinate system, and the external parameters of the other cameras are calculated through the essential matrix; the scale of the translation vector is calculated from human body size information. The invention takes human skeletal joint points as feature points, takes the point cloud formed by their motion trajectories as a virtual calibration object, solves for the camera rotation matrix and translation vector through the essential matrix, and proposes a translation-vector scale calculation method based on human body dimension information, completing real-time online accurate external parameter calibration of a multi-camera system.
The above description is only one embodiment of the present invention, and is not intended to limit the present invention. Any modification, improvement or the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (3)

1. A multi-camera external parameter automatic calibration method based on human skeleton extraction, characterized by comprising the following steps:
(1) enabling a single pedestrian to walk in a camera monitoring area, and simultaneously recording videos by a plurality of cameras to obtain synchronized videos;
(2) intercepting images of pedestrians with the same frame number at different positions from each video;
(3) processing each frame of image, and extracting pedestrian bone joint points in the image by using a deep learning algorithm to obtain image pixel coordinates of each bone joint point;
(4) calculating the image physical dimension coordinates of each bone joint point by using the image pixel coordinates of each bone joint point according to the known camera internal parameters;
(5) and selecting any one camera coordinate system as a world coordinate system, and calculating the external parameters of other cameras by using the image physical dimension coordinates of the skeletal joint points and the essential matrix.
2. The multi-camera external parameter automatic calibration method based on human skeleton extraction according to claim 1, characterized in that the specific manner of step (3) is as follows:
(301) performing neural network prediction on each frame of image to obtain a heatmap and the part affinity fields of each skeletal joint point;
(302) extracting the image positions and confidence scores of the joints from the heatmaps by applying a non-maximum suppression algorithm;
(303) finding the limb links by using the extracted joint information and the part affinity fields to obtain all connections, wherein each connection is regarded as a limb;
(304) regarding the limbs that share a joint as the limbs of the same person, assembling the limbs to form the person, and obtaining the image pixel coordinates of all the skeletal joint points.
3. The multi-camera external parameter automatic calibration method based on human skeleton extraction according to claim 1, characterized in that the specific manner of step (5) is as follows:
(501) recording the discrete three-dimensional point cloud of the three-dimensional positions of the human skeletal joint points under the different cameras as $P_{k,t}^i$, wherein $k$ is the camera index, $i$ is the skeletal joint index, and $t$ denotes the time instant;
(502) arbitrarily selecting the camera coordinate system of one camera as the world coordinate system, the discrete three-dimensional point cloud of this camera being $P_{1,t}^i$; then
$$P_{1,t}^i = R_k P_{k,t}^i + c_k,$$
wherein the rotation matrix $R_k$ and the translation vector $c_k$ are the external parameters of camera $k$;
(503) selecting a plurality of pairs of matched skeletal joint points to calculate the essential matrix $E_k$, and then decomposing the essential matrix $E_k$ to obtain $c_k$ and $R_k$; the essential matrix $E_k$ is calculated as follows:
an imaged skeletal joint point and the optical centers of the two cameras form a plane, i.e. the three vectors $P_{1,t}^i$, $R_k P_{k,t}^i$ and $c_k$ are coplanar, which gives
$$(P_{1,t}^i)^{\top} \left( c_k \times R_k P_{k,t}^i \right) = 0.$$
Substituting $P_{1,t}^i = z_{1,t}^i x_{1,t}^i$ and $P_{k,t}^i = z_{k,t}^i x_{k,t}^i$ into the above formula and eliminating $z_{1,t}^i$ and $z_{k,t}^i$ yields
$$(x_{1,t}^i)^{\top} E_k \, x_{k,t}^i = 0,$$
wherein $E_k = [c_k]_{\times} R_k$ is the essential matrix, $[c_k]_{\times}$ is the antisymmetric matrix of the vector $c_k$, $x_{1,t}^i$ and $x_{k,t}^i$ are respectively the image physical-size coordinates of skeletal joint point $i$ at time $t$ in the selected camera and in the camera labelled $k$, and $z_{1,t}^i$ and $z_{k,t}^i$ are respectively the corresponding depth coordinates;
(504) with the $c_k$ and $R_k$ obtained in step (503) and triangulation, calculating the coordinates $P_k^a$ and $P_k^b$ of two different skeletal joint points in the coordinate system of the camera labelled $k$; the distance between the two joint points is $d = \| P_k^a - P_k^b \|$, and with the known actual physical length $L$ between the two skeletal joint points the scale information $\lambda_k$ is calculated as
$$\lambda_k = L / d;$$
(505) applying the scale information to the translation vector, the actual translation vector of each camera being $\lambda_k c_k$.
CN202110289301.0A 2021-03-18 2021-03-18 Multi-phase external parameter automatic calibration method based on human skeleton extraction Active CN113077519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110289301.0A CN113077519B (en) 2021-03-18 2021-03-18 Multi-phase external parameter automatic calibration method based on human skeleton extraction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110289301.0A CN113077519B (en) 2021-03-18 2021-03-18 Multi-phase external parameter automatic calibration method based on human skeleton extraction

Publications (2)

Publication Number Publication Date
CN113077519A true CN113077519A (en) 2021-07-06
CN113077519B CN113077519B (en) 2022-12-09


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113925497A (en) * 2021-10-22 2022-01-14 吉林大学 Automobile passenger riding posture extraction method based on binocular vision measurement system
CN114241602A (en) * 2021-12-16 2022-03-25 北京理工大学 Multi-purpose rotational inertia measuring and calculating method based on deep learning
CN114758016A (en) * 2022-06-15 2022-07-15 超节点创新科技(深圳)有限公司 Camera equipment calibration method, electronic equipment and storage medium
WO2023071939A1 (en) * 2021-10-27 2023-05-04 华为技术有限公司 Parameter determination method and related device
CN117934572A (en) * 2024-03-21 2024-04-26 南京起源跃动科技有限公司 Method for aligning 3D skeleton coordinate system with VR head display coordinate system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102034238A (en) * 2010-12-13 2011-04-27 西安交通大学 Multi-camera system calibrating method based on optical imaging test head and visual graph structure
CN103456016A (en) * 2013-09-06 2013-12-18 同济大学 Body-feeling camera network calibration method unrelated to visual angles
CN108288291A (en) * 2018-06-07 2018-07-17 北京轻威科技有限责任公司 Multi-camera calibration based on a single-point calibration object
CN110458897A (en) * 2019-08-13 2019-11-15 北京积加科技有限公司 Multi-cam automatic calibration method and system, monitoring method and system
CN110969668A (en) * 2019-11-22 2020-04-07 大连理工大学 Stereoscopic calibration algorithm of long-focus binocular camera
CN111028271A (en) * 2019-12-06 2020-04-17 浩云科技股份有限公司 Multi-camera personnel three-dimensional positioning and tracking system based on human skeleton detection
CN111667540A (en) * 2020-06-09 2020-09-15 中国电子科技集团公司第五十四研究所 Multi-camera system calibration method based on pedestrian head recognition
CN111739103A (en) * 2020-06-18 2020-10-02 苏州炫感信息科技有限公司 Multi-camera calibration system based on single-point calibration object
CN112001926A (en) * 2020-07-04 2020-11-27 西安电子科技大学 RGBD multi-camera calibration method and system based on multi-dimensional semantic mapping and application


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Anh Minh Truong et al.: "Automatic Multi-Camera Extrinsic Parameter Calibration Based on Pedestrian Torsors", Sensors *



Similar Documents

Publication Publication Date Title
CN113077519B (en) Multi-phase external parameter automatic calibration method based on human skeleton extraction
Xiang et al. Monocular total capture: Posing face, body, and hands in the wild
CN107392964B (en) The indoor SLAM method combined based on indoor characteristic point and structure lines
CN108154550B (en) RGBD camera-based real-time three-dimensional face reconstruction method
US9235928B2 (en) 3D body modeling, from a single or multiple 3D cameras, in the presence of motion
CN111126304A (en) Augmented reality navigation method based on indoor natural scene image deep learning
CN103099623B (en) Extraction method of kinesiology parameters
CN109919141A (en) A kind of recognition methods again of the pedestrian based on skeleton pose
CN107194991B (en) Three-dimensional global visual monitoring system construction method based on skeleton point local dynamic update
US20150243035A1 (en) Method and device for determining a transformation between an image coordinate system and an object coordinate system associated with an object of interest
CN104794722A (en) Dressed human body three-dimensional bare body model calculation method through single Kinect
CN111862299A (en) Human body three-dimensional model construction method and device, robot and storage medium
Wang et al. Outdoor markerless motion capture with sparse handheld video cameras
CN114529605A (en) Human body three-dimensional attitude estimation method based on multi-view fusion
CN112401369A (en) Body parameter measuring method, system, equipment, chip and medium based on human body reconstruction
CN111998862A (en) Dense binocular SLAM method based on BNN
CN117671738B (en) Human body posture recognition system based on artificial intelligence
CN108073855A (en) A kind of recognition methods of human face expression and system
CN113255487A (en) Three-dimensional real-time human body posture recognition method
CN115376034A (en) Motion video acquisition and editing method and device based on human body three-dimensional posture space-time correlation action recognition
Cao et al. Camera calibration using symmetric objects
CN113284249B (en) Multi-view three-dimensional human body reconstruction method and system based on graph neural network
CN114882106A (en) Pose determination method and device, equipment and medium
CN112365589B (en) Virtual three-dimensional scene display method, device and system
CN111667540B (en) Multi-camera system calibration method based on pedestrian head recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant