CN113705388A - Method and system for positioning space positions of multiple persons in real time based on camera information - Google Patents

Method and system for positioning space positions of multiple persons in real time based on camera information

Info

Publication number
CN113705388A
CN113705388A (Application CN202110931717.8A)
Authority
CN
China
Prior art keywords
video
rays
point
points
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110931717.8A
Other languages
Chinese (zh)
Other versions
CN113705388B (en)
Inventor
顾海军
肖世锋
包飞
贾明
田楠
罗清
黄健
侯丽娟
刘岱
曲量
张晔
彭莉
石育
鲁谨慈
黄祥勇
刘祖军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Hunan Electric Power Co Ltd
Xiangxi Power Supply Co of State Grid Hunan Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Hunan Electric Power Co Ltd
Xiangxi Power Supply Co of State Grid Hunan Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Hunan Electric Power Co Ltd, Xiangxi Power Supply Co of State Grid Hunan Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202110931717.8A
Publication of CN113705388A
Application granted
Publication of CN113705388B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method and a system for positioning the spatial positions of multiple persons in real time based on camera information. The method comprises the following steps: 1) calibrating each video camera channel to obtain the camera's internal parameters, external parameters and distortion coefficients; 2) acquiring video images, detecting the heads of persons in the video images, and calculating the center position of each head; 3) obtaining rays for each video channel from the camera calibration and the head center positions in the video images, each ray starting at the camera's optical center and passing through a head center position detected in the video image; 4) calculating the minimum distance between all rays from different video channels, and taking the midpoint of the shortest segment connecting each pair of rays as a valid derived point; 5) clustering the valid derived points, discarding false cluster points, and taking the remaining cluster points as the spatial positions of the persons. The invention has the advantages of good real-time performance and high positioning accuracy.

Description

Method and system for positioning space positions of multiple persons in real time based on camera information
Technical Field
The invention relates generally to the technical field of visual detection, and in particular to a method and system for positioning the spatial positions of multiple persons in real time based on camera information.
Background
Positioning the spatial positions of multiple persons based on camera information means acquiring indoor video data through several cameras and deducing the spatial positions of the persons from that data. It is the core of monitoring worker behavior in a high-voltage room, and has become an indispensable link in tasks such as detecting whether the safety supervisor leaves the post, detecting whether workers leave the safety supervisor's line of sight, identifying the three-way handover process, and identifying the identities of workers during operations.
Methods for positioning multi-person spatial positions based on camera information fall into two categories. The first locates a person's spatial position by reconstructing the person's 3D contour: the human body is segmented in each video channel, homography matrices are computed at different heights, and the segmentation results are multiplied by the homography matrices to obtain the 3D contour. The second establishes associations across the video images of different cameras: detected persons in different camera views are identified as the same person, and that person's spatial position is then located from the corresponding image regions. Both approaches have drawbacks when locating the spatial position of a person. The computation time required by the 3D contour method makes real-time operation difficult, and accurate 3D contour reconstruction requires many video channels, which further limits its practical application. The association approach is affected by occlusion, imaging distortion, image boundary effects and similar factors, which reduce the accuracy of associating the same person across different video images and thus degrade positioning accuracy.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the problems in the prior art, the invention provides a method and a system for positioning the spatial positions of multiple persons in real time based on camera information, with good real-time performance and high positioning accuracy.
To solve the above technical problems, the technical solution provided by the invention is as follows:
a method for positioning the space position of multiple persons in real time based on camera information comprises the following steps:
1) calibrating each path of video camera to obtain internal parameters, external parameters and distortion coefficients of the camera;
2) acquiring a video image, detecting the head of a person in the video image, and calculating the center position of the head;
3) obtaining rays in each path of video image through the calibration of a video camera and the center position of the head of a person in the video image; each ray starts from the optical center of the video image and passes through the head center position of the person detected in the video image;
4) calculating the minimum distance between all rays in the video images of different paths, and taking the midpoint of the connecting line of the minimum distance between all the rays as an effective derivation point;
5) and clustering the effective derived points, kicking off the false gathering points, and taking the rest gathering points as the spatial positions of the personnel.
As a further improvement of the above technical solution:
the specific process of the step 3) is as follows:
3.1) normalizing the pixel coordinates of the points before distortion to obtain normalized coordinates of distortion-removed points;
and 3.2) acquiring the position of the video camera in a world coordinate system and the direction of the ray, and then obtaining the corresponding ray according to the coordinate of the distortion removing point.
In step 4), suppose n1 person heads are detected in video channel A, yielding rays ra1, ra2, …, ran1, and n2 person heads are detected in video channel B, yielding rays rb1, rb2, …, rbn2. For ray rai of channel A and ray rbj of channel B, the distance is defined as min(Dist(p, rai) + Dist(p, rbj)) subject to the constraint Dist(p, rai) = Dist(p, rbj); that is, a spatial point p is sought whose summed distance to rays rai and rbj is minimal, this minimum is defined as the distance between rays rai and rbj, and the constraint guarantees the uniqueness of p. The point p is called the minimum-distance derived point of rays rai and rbj; if the minimum distance between rays rai and rbj is less than a given threshold, p is considered a valid minimum-distance derived point.
In step 5), the cluster points of the valid derived points are potential spatial positions of persons. The cluster points are sorted, and the corresponding rays are preferentially assigned to the cluster points ranked first; if a later cluster point is left with only one ray, or no assignable ray, it is considered a false cluster point. False cluster points are discarded, and the remaining cluster points are the spatial positions of the persons.
The sorting principle considers two factors: the height corresponding to a cluster point should lie between 1.5 and 1.85 meters, and the class corresponding to the cluster point should contain as many minimum-distance derived points as possible.
In step 5), the clustering adopts the mean shift method, the density peak method or hierarchical clustering.
The process of clustering by the mean shift method is:
the valid points are weighted in three-dimensional space with a Gaussian kernel function, and each valid point initially serves as a class center; for each class center, the points whose distance to the class center is smaller than a preset bandwidth are found and form its cluster; for each cluster, the vectors from the class center to each point in the cluster are computed and summed to give an offset vector, and the class center is moved along the offset vector to serve as the new class center; the class centers are moved iteratively in this way until convergence; class centers whose mutual distance is smaller than the preset bandwidth are merged, and each remaining class center is taken as the three-dimensional position of one head.
The invention also discloses a system for positioning the spatial positions of multiple persons in real time based on camera information, which comprises:
a first module for calibrating each video camera channel to obtain the camera's internal parameters, external parameters and distortion coefficients;
a second module for acquiring video images, detecting the heads of persons in the video images and calculating the center position of each head;
a third module for obtaining rays for each video channel from the camera calibration and the head center positions in the video images, each ray starting at the camera's optical center and passing through a head center position detected in the video image;
a fourth module for calculating the minimum distance between all rays from different video channels and taking the midpoint of the shortest segment connecting each pair of rays as a valid derived point;
and a fifth module for clustering the valid derived points, discarding false cluster points, and taking the remaining cluster points as the spatial positions of the persons.
The invention further discloses a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program carries out the steps of the above method for positioning the spatial positions of multiple persons in real time based on camera information.
The invention also discloses a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, performs the steps of the above method for positioning the spatial positions of multiple persons in real time based on camera information.
Compared with the prior art, the invention has the advantages that:
the invention relates to a method for positioning the space position of multiple persons in real time based on camera information, which comprises the steps of firstly, calibrating a video camera to determine the internal reference, the external reference and the distortion coefficient of the camera, detecting the heads in video images in all paths and obtaining the central position of the heads; calibrating the center position of the head in the image by a camera, wherein each center position of the head corresponds to a ray starting from an optical center, and the real head position is on the ray; calculating the minimum distance between the rays in the video images of different paths, and determining a point on the space from the minimum distance, wherein the point is equal to the two rays in distance and the sum of the distances is equal to the minimum distance between the two rays; if the minimum distance is smaller than a given threshold value, the determined point is called an effective derivation point of the minimum distance between the rays; clustering the effective lead-out points, and eliminating false clustering points by using the height of a person and only corresponding information of one ray, wherein the positions of the rest clustering points are the spatial positions of the person; the invention has good real-time performance and high positioning precision, is not limited to the space positioning of personnel in the high-pressure chamber, and is suitable for the space positioning of a plurality of personnel in any scene.
The invention adopts deep learning to detect person heads; each head center position defines a ray from the camera's optical center, and the true head position lies on that ray. Because only the heads are 3D-reconstructed by intersecting rays, the poor real-time performance of traditional full 3D reconstruction is avoided, so real-time performance is good; and because deep learning is used to detect heads and only part of the target is reconstructed, 3D reconstruction precision is greatly improved, so positioning accuracy is high.
Drawings
FIG. 1 is a flow chart of an embodiment of the method of the present invention.
FIG. 2 is a diagram illustrating a multi-person positioning reconstruction result in a specific application of the method of the present invention; it shows the images captured by three different cameras at the same time instant, the boxes are the detection results, and boxes of the same gray level indicate the same person.
Detailed Description
The invention is further described below with reference to the figures and the specific embodiments of the description.
According to the camera model, the spatial position of a person's head lies on the ray from the optical center through the head position in the image. For the same person, each video channel in which the head is detected yields a corresponding ray, and the intersection of the rays from different video channels is the 3D spatial position of the head. Based on this fact, the invention provides a method for positioning the spatial positions of multiple persons in real time based on camera information, as shown in FIG. 1, which comprises the following steps:
1) calibrating each video camera channel to obtain the camera's internal parameters, external parameters and distortion coefficients;
2) acquiring video images, detecting the heads of persons in the video images, and calculating the center position of each head;
3) obtaining rays for each video channel from the camera calibration and the head center positions in the video images; each ray starts at the camera's optical center and passes through a head center position detected in the video image;
4) calculating the minimum distance between all rays from different video channels, and taking the midpoint of the shortest segment connecting each pair of rays as a valid derived point;
5) clustering the valid derived points, discarding false cluster points, and taking the remaining cluster points as the spatial positions of the persons.
In this method, the video cameras are first calibrated to determine their internal parameters, external parameters and distortion coefficients, and heads are detected in the video images of all channels to obtain their center positions. Through the camera calibration, each head center position in an image corresponds to a ray starting at the optical center, and the true head position lies on that ray. The minimum distance between rays from different video channels is calculated, and from it a spatial point is determined that is equidistant from the two rays and whose summed distance to them equals the minimum distance between them; if this minimum distance is smaller than a given threshold, the determined point is called a valid minimum-distance derived point. The valid derived points are clustered, and false cluster points are eliminated using the person-height constraint and the fact that each ray corresponds to only one position; the positions of the remaining cluster points are the spatial positions of the persons. The invention has good real-time performance and high positioning accuracy, is not limited to positioning personnel in a high-voltage room, and is suitable for the spatial positioning of multiple persons in any scene.
In a specific embodiment, in step 1), the video camera calibration is implemented on the checkerboard calibration principle: checkerboard grids are laid out on the ground, and the camera's internal parameters, external parameters and distortion coefficients are solved from the relationships among the checkerboard grid points. This calibration approach is mature, reliable and simple to operate.
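By way of illustration, a minimal sketch of such a checkerboard calibration using OpenCV follows. The board geometry (9x6 inner corners, 30 mm squares) and the image folder are assumptions of the example, not values fixed by the invention.

```python
import glob

import cv2
import numpy as np

PATTERN = (9, 6)      # assumed inner-corner grid of the printed checkerboard
SQUARE_MM = 30.0      # assumed square size

# 3D corner coordinates in the board's own plane (Z = 0).
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_MM

obj_points, img_points = [], []
for path in glob.glob("calib/cam1/*.jpg"):  # assumed per-channel image folder
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# Intrinsics K, distortion coefficients (k1, k2, p1, p2, k3), per-view extrinsics.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```

cv2.calibrateCamera internally performs the joint refinement of intrinsics, extrinsics and distortion coefficients that the embodiment below describes.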
In a specific embodiment, in step 2), the detection of person heads in the video images may employ a deep-learning detection network or another detection network.
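For illustration only, the sketch below reduces detection boxes to center positions. Since the patent does not fix a particular network, a generic COCO person detector from torchvision stands in here for the head-trained detector the method actually assumes.

```python
import torch
import torchvision

# Stand-in detector: COCO person boxes; a real deployment would use a
# network trained on head annotations, since the method detects heads.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def box_centers(frame_bgr, score_thresh=0.7):
    """Return the (cx, cy) pixel centers of detected boxes in one frame
    (frame_bgr: HxWx3 uint8 numpy array in BGR order)."""
    img = torch.from_numpy(frame_bgr[:, :, ::-1].copy()).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        out = model([img])[0]
    centers = []
    for box, score, label in zip(out["boxes"], out["scores"], out["labels"]):
        if label.item() == 1 and score.item() >= score_thresh:  # COCO class 1 = person
            x1, y1, x2, y2 = box.tolist()
            centers.append(((x1 + x2) / 2.0, (y1 + y2) / 2.0))
    return centers
```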
In a specific embodiment, the specific process of step 3) is:
3.1) normalizing the pixel coordinates of each point before undistortion to obtain the normalized coordinates of the distortion-free point;
3.2) obtaining the position of the video camera in the world coordinate system and the direction of the ray, and then deriving the corresponding ray from the distortion-free point coordinates.
Specifically, the implementation of step 3.1) is as follows:
Given the image center (cx, cy), the focal lengths fx, fy in the two directions, and the distortion coefficients k1, k2, k3, the pixel coordinates of a point before undistortion can be read from the txt file as point = [x, y].
The relationship between the coordinates before and after undistortion is:
x1 = u(1 + k1·r^2 + k2·r^4 + k3·r^6)
y1 = v(1 + k1·r^2 + k2·r^4 + k3·r^6)
where x1, y1 are the normalized coordinates of the observed (distorted) point and u, v are the distortion-free normalized coordinates.
Since r^2 = u^2 + v^2 and r is itself unknown, u and v are solved iteratively by fixed-point iteration, as follows:
1. First normalize the pixel coordinates of the point before undistortion:
x1 = (x − cx)/fx
y1 = (y − cy)/fy
2. Then take x1, y1 as the initial values of u, v.
3. Compute currs = (1 + k1·r^2 + k2·r^4 + k3·r^6) from the current values of u and v.
4. Update u = x1/currs and v = y1/currs.
5. Repeat steps 3 and 4 five times to obtain the distortion-free normalized coordinates u, v.
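A minimal sketch of this fixed-point undistortion, transcribing steps 1 to 5 above; the update in step 4 is the division by currs implied by the relation x1 = u·currs.

```python
def undistort_normalized(x, y, cx, cy, fx, fy, k1, k2, k3, iters=5):
    """Fixed-point undistortion of one detected pixel (steps 1-5 above)."""
    # Step 1: normalize the pixel coordinates of the observed point.
    x1 = (x - cx) / fx
    y1 = (y - cy) / fy
    # Step 2: initialize the distortion-free estimate with x1, y1.
    u, v = x1, y1
    # Steps 3-5: recompute the radial factor and divide it out, five times.
    for _ in range(iters):
        r2 = u * u + v * v
        currs = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
        u, v = x1 / currs, y1 / currs
    return u, v
```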
Specifically, the implementation of step 3.2) is as follows:
To obtain the ray corresponding to the distortion-free point coordinates, the position of the video camera in the world coordinate system and the direction of the ray must be solved.
Let C denote the video camera coordinate system and W the world coordinate system.
First, solve the position of the video camera in the world coordinate system.
Let the coordinates of the camera's optical center be X_W in the world coordinate system and X_C in the camera coordinate system. The transformation relations are:
X_C = R_W_C·X_W + T_W_C
X_W = R_C_W·X_C + T_C_W
where R_W_C and R_C_W are transposes of each other.
We need X_W. The optical center in the camera coordinate system is X_C = [0, 0, 0]^T, so X_W = T_C_W.
From the formulas above, the conversion relation X_W = T_C_W = −R_C_W·T_W_C is obtained.
Then determine the direction of the ray. In the camera coordinate system the direction of the ray is direction = [u, v, 1]^T; in the world coordinate system this direction is rotated, i.e. the direction becomes R_C_W·direction.
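A small sketch of this ray construction with numpy, assuming the extrinsics are supplied as the rotation R_W_C and translation T_W_C defined above:

```python
import numpy as np

def camera_ray(u, v, R_wc, T_wc):
    """Ray (origin, unit direction) in world coordinates for one
    distortion-free normalized point (u, v), given X_C = R_wc @ X_W + T_wc."""
    R_cw = R_wc.T                             # R_C_W is the transpose of R_W_C
    origin = -R_cw @ T_wc                     # optical center: X_W = -R_C_W @ T_W_C
    direction = R_cw @ np.array([u, v, 1.0])  # rotate [u, v, 1]^T into the world frame
    return origin, direction / np.linalg.norm(direction)
```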
In a specific embodiment, in step 4), n1 person heads are detected in video channel A, yielding rays ra1, ra2, …, ran1, and n2 person heads are detected in video channel B, yielding rays rb1, rb2, …, rbn2. For ray rai of channel A and ray rbj of channel B, the distance is defined as min(Dist(p, rai) + Dist(p, rbj)) subject to the constraint Dist(p, rai) = Dist(p, rbj); that is, a spatial point p is sought whose summed distance to rays rai and rbj is minimal. This minimum is defined as the distance between rays rai and rbj, and the constraint guarantees the uniqueness of p. The point p is called the minimum-distance derived point of rays rai and rbj, and if the minimum distance between rays rai and rbj is less than a given threshold (e.g. 150 cm), the derived point is considered valid.
Specifically, the minimum distance between two rays in step 4) is computed as the optimization problem
min over ta, tb of ||ra(ta) − rb(tb)||^2
where ra(t) = (xa + ma·t, ya + na·t, za + la·t) and rb(t) = (xb + mb·t, yb + nb·t, zb + lb·t) are the ray equations.
The optimization problem can be solved by differentiation: setting the derivatives with respect to ta and tb to zero yields the minimizing parameters ta*, tb*. The derived point p of the two rays is then determined as the midpoint
p = (ra(ta*) + rb(tb*)) / 2.
If ||ra(ta*) − rb(tb*)||^2 is less than the square of 150 cm, p is considered a valid minimum-distance derived point.
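Setting the two derivatives to zero gives a closed-form solution; the sketch below is one way to write it, assuming unit direction vectors and coordinates in centimeters, and treating the rays as full lines (a production version might additionally restrict ta and tb to non-negative values, since these are rays rather than lines):

```python
import numpy as np

def derived_point(oa, da, ob, db, thresh_cm=150.0):
    """Minimum-distance derived point p of two rays (origin o, unit
    direction d), or None if their closest approach exceeds the threshold."""
    w = oa - ob
    b = da @ db                      # cosine between the two directions
    d, e = da @ w, db @ w
    denom = 1.0 - b * b              # zero for (near-)parallel rays
    if denom < 1e-9:
        return None                  # parallel rays: no unique closest pair
    ta = (b * e - d) / denom         # parameters of the closest points
    tb = (e - b * d) / denom
    pa, pb = oa + ta * da, ob + tb * db
    if np.linalg.norm(pa - pb) >= thresh_cm:
        return None                  # rays pass too far apart: no valid point
    return (pa + pb) / 2.0           # midpoint of the shortest segment
```

In the embodiment below, this is evaluated for every ray pair drawn from two different video channels.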
In a specific embodiment, in step 5), the valid minimum-distance derived points are clustered. The clustering method is mean shift, the kernel function is a three-dimensional Gaussian, the covariance matrix of the Gaussian is diagonal, and the diagonal elements are 100. Of course, clustering the valid minimum-distance derived points is not limited to mean shift; other methods such as density peak clustering and hierarchical clustering can also be adopted.
Specifically, the cluster points of the valid derived points are potential spatial positions of persons. The cluster points are sorted; the sorting principle considers two factors: the height corresponding to a cluster point should lie between 1.5 and 1.85 meters, and the class corresponding to the cluster point should contain as many minimum-distance derived points as possible. For cluster points ranked first, the corresponding rays are assigned preferentially. If a later cluster point is left with only one ray, or no assignable ray, it is considered a false cluster point; false cluster points are discarded, and the remaining cluster points are the spatial positions of the persons.
The method for eliminating false cluster points is as follows. The cluster points are first divided into two categories: type I cluster points, whose height is between 1.5 and 1.85 meters, and type II cluster points, whose height is not. The type I cluster points are sorted from large to small by the number of minimum-distance derived points contained in their corresponding class. After the type I cluster points are sorted, the type II cluster points are sorted by the same principle, and all type II cluster points are ranked after the type I cluster points. Considering that each minimum-distance derived point corresponds to two rays in different video cameras, the valid derived points of the higher-ranked cluster points are processed first: the rays corresponding to those derived points are marked, and each ray may be marked at most once. For a cluster point processed later, if fewer than two of its rays remain unmarked, it is considered a false cluster point.
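A sketch of this two-type ranking and greedy ray assignment; the cluster representation (a center plus the ids of its supporting rays, two per derived point) and the use of the z coordinate as height in centimeters are assumptions of the example:

```python
def prune_clusters(clusters, min_h=150.0, max_h=185.0):
    """clusters: list of (center_xyz, ray_ids). Returns the centers kept
    as person positions after false cluster points are discarded."""
    def rank(cluster):
        center, ray_ids = cluster
        type_i = min_h <= center[2] <= max_h   # type I: plausible standing height
        # Type I before type II, each sorted by supporting ray count, descending.
        return (not type_i, -len(ray_ids))

    kept, used = [], set()
    for center, ray_ids in sorted(clusters, key=rank):
        free = [r for r in ray_ids if r not in used]
        if len(free) < 2:        # fewer than two unclaimed rays: false cluster
            continue
        used.update(free)        # each ray may be claimed at most once
        kept.append(center)
    return kept
```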
The invention described above is further detailed below with reference to a specific embodiment, whose results are shown in FIG. 2:
(1) Four channels of video image data are acquired and aligned by time point; each channel's video image is input into a deep detection network, and the head regions of persons in the images are detected.
(2) The four video cameras are calibrated. First a template is printed and pasted on a plane, several template images are shot from different angles, and their feature points are detected. The camera's internal and external parameters are then solved under the ideal distortion-free assumption and refined with maximum-likelihood estimation; the actual radial distortion coefficients are computed by least squares; finally the internal parameters, external parameters and distortion coefficients are jointly refined by the maximum-likelihood method to improve estimation accuracy, yielding the video camera's internal parameters, external parameters and distortion coefficients.
(3) Rays are obtained from the calibration information and the detection results. The optical center is determined from the camera calibration information, and rays are formed that start at the optical center and pass through the detected head centers in the image. According to the camera imaging model, the true spatial position of the head lies on the ray. To obtain the ray, the head center coordinates from image detection are normalized using the camera intrinsics and the distortion model, and the normalized coordinates are converted into a ray equation through the camera extrinsics.
(4) The minimum distances between rays derived from different video channels, and the corresponding derived points, are calculated. The four cameras are paired pairwise, giving six groups of computations (channels 1-2, 1-3, 1-4, 2-3, 2-4 and 3-4); each group's computation is as described in the summary of the invention. The minimum distance between any two rays from different video channels is computed by finding one point on each ray such that the distance between the two points is minimal; the midpoint of the two points found is the minimum-distance derived point between the rays. If the minimum distance is less than 150 cm, the derived point is called a valid point.
(5) The derived points are clustered using the mean shift method with the following settings: the valid points are weighted in three-dimensional space with a Gaussian kernel, each valid point initially serves as a class center, and the bandwidth is 150 cm; for each class center, the points less than 150 cm from the class center are found as its cluster; for each cluster, the vectors from the class center to each point in the cluster are computed and summed to give an offset vector, and the class center is moved along the offset vector to serve as the new class center; the class centers are moved iteratively in this way until convergence (the class centers no longer change); class centers less than 150 cm apart are merged, and each class center is taken as the three-dimensional position of one head.
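A minimal sketch of this mean shift variant, combining the 150 cm bandwidth with the diagonal Gaussian covariance of 100 given earlier; the iteration cap and convergence tolerance are assumptions of the example:

```python
import numpy as np

def mean_shift(points, bandwidth=150.0, var=100.0, max_iters=50, tol=1e-3):
    """Gaussian-kernel mean shift over the valid derived points (N x 3, cm).
    Returns the merged class centers, one candidate head position each."""
    pts = np.asarray(points, dtype=float)
    centers = pts.copy()                  # every valid point starts as a class center
    for _ in range(max_iters):
        moved = 0.0
        for i, c in enumerate(centers):
            d2 = np.sum((pts - c) ** 2, axis=1)
            near = d2 < bandwidth ** 2    # the cluster of this class center
            w = np.exp(-d2[near] / (2.0 * var))  # Gaussian weights, diagonal cov = 100
            new_c = (w[:, None] * pts[near]).sum(axis=0) / w.sum()
            moved = max(moved, float(np.linalg.norm(new_c - c)))
            centers[i] = new_c
        if moved < tol:                   # converged: class centers no longer change
            break
    merged = []                           # merge class centers closer than the bandwidth
    for c in centers:
        if all(np.linalg.norm(c - m) >= bandwidth for m in merged):
            merged.append(c)
    return merged
```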
(6) False cluster points are eliminated using prior information such as height and ray-matching characteristics. The reconstructed heights are grouped: a height in (150 cm, 185 cm) is considered a standing head position. Since the reprojection error is smallest in the standing state, cluster points whose height falls in this interval are sorted first. A reconstructed three-dimensional ray can belong to only one cluster point; a ray that belongs to several cluster points and whose height indicates a non-standing state is considered abnormal, and the head box corresponding to that ray is eliminated.
The invention adopts deep-learning technology to detect person heads; each head center defines a ray from the video camera's optical center, and the true head position lies on that ray. The rays of the same person from different video cameras correspond, and the intersection of rays across video channels gives the spatial coordinates of the head. The spatial coordinates are clustered, and false cluster points are eliminated using prior information such as height and the fact that each ray corresponds to only one spatial coordinate; the positions of the remaining cluster points are the spatial positions of the heads. Because only the heads are 3D-reconstructed by ray intersection, the poor real-time performance of traditional 3D reconstruction is avoided; because deep learning is used to detect heads, 3D reconstruction precision is greatly improved when only part of the target is reconstructed.
The invention also discloses a system for positioning the spatial positions of multiple persons in real time based on camera information, which comprises:
a first module for calibrating each video camera channel to obtain the camera's internal parameters, external parameters and distortion coefficients;
a second module for acquiring video images, detecting the heads of persons in the video images and calculating the center position of each head;
a third module for obtaining rays for each video channel from the camera calibration and the head center positions in the video images, each ray starting at the camera's optical center and passing through a head center position detected in the video image;
a fourth module for calculating the minimum distance between all rays from different video channels and taking the midpoint of the shortest segment connecting each pair of rays as a valid derived point;
and a fifth module for clustering the valid derived points, discarding false cluster points, and taking the remaining cluster points as the spatial positions of the persons.
This system for positioning the spatial positions of multiple persons in real time based on camera information corresponds to the above method and has the same advantages.
The invention further discloses a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program carries out the steps of the above method for positioning the spatial positions of multiple persons in real time based on camera information. The invention also discloses a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, performs the steps of the above method. All or part of the flow of the method embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. The memory may be used to store computer programs and/or modules, and the processor performs various functions by running or executing the computer programs and/or modules stored in the memory and by invoking data stored in the memory. The memory may include high-speed random access memory, and may also include non-volatile memory such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another solid-state storage device.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiment; all technical solutions within the idea of the present invention belong to its protection scope. It should be noted that modifications and refinements made by those skilled in the art without departing from the principle of the present invention shall also fall within the protection scope of the present invention.

Claims (10)

1. A method for positioning the spatial positions of multiple persons in real time based on camera information, characterized by comprising the following steps:
1) calibrating each video camera channel to obtain the camera's internal parameters, external parameters and distortion coefficients;
2) acquiring video images, detecting the heads of persons in the video images, and calculating the center position of each head;
3) obtaining rays for each video channel from the camera calibration and the head center positions in the video images; each ray starts at the camera's optical center and passes through a head center position detected in the video image;
4) calculating the minimum distance between all rays from different video channels, and taking the midpoint of the shortest segment connecting each pair of rays as a valid derived point;
5) clustering the valid derived points, discarding false cluster points, and taking the remaining cluster points as the spatial positions of the persons.
2. The method for positioning the spatial positions of multiple persons in real time based on camera information according to claim 1, wherein the specific process of step 3) is:
3.1) normalizing the pixel coordinates of each point before undistortion to obtain the normalized coordinates of the distortion-free point;
3.2) obtaining the position of the video camera in the world coordinate system and the direction of the ray, and then deriving the corresponding ray from the distortion-free point coordinates.
3. The method for positioning the spatial positions of multiple persons in real time based on camera information according to claim 1, wherein in step 4), n1 person heads are detected in video channel A, yielding rays ra1, ra2, …, ran1, and n2 person heads are detected in video channel B, yielding rays rb1, rb2, …, rbn2; for ray rai of channel A and ray rbj of channel B, the distance is defined as min(Dist(p, rai) + Dist(p, rbj)) subject to the constraint Dist(p, rai) = Dist(p, rbj), that is, a spatial point p is sought whose summed distance to rays rai and rbj is minimal; this minimum is defined as the distance between rays rai and rbj, and the constraint guarantees the uniqueness of p; the point p is called the minimum-distance derived point of rays rai and rbj; if the minimum distance between rays rai and rbj is less than a given threshold, p is considered a valid minimum-distance derived point.
4. The method for positioning the spatial positions of multiple persons in real time based on camera information according to any one of claims 1 to 3, wherein in step 5), the cluster points of the valid derived points are potential spatial positions of persons; the cluster points are sorted, and the corresponding rays are preferentially assigned to the cluster points ranked first; if a later cluster point is left with only one ray, or no assignable ray, it is considered a false cluster point; false cluster points are discarded, and the remaining cluster points are the spatial positions of the persons.
5. The method for positioning the spatial positions of multiple persons in real time based on camera information according to claim 4, wherein the sorting principle considers two factors: the height corresponding to a cluster point should lie between 1.5 and 1.85 meters, and the class corresponding to the cluster point should contain as many minimum-distance derived points as possible.
6. The method for positioning the spatial positions of multiple persons in real time based on camera information according to claim 5, wherein in step 5), the clustering adopts the mean shift method, the density peak method or hierarchical clustering.
7. The method for positioning the spatial positions of multiple persons in real time based on camera information according to claim 6, wherein the process of clustering by the mean shift method is:
the valid points are weighted in three-dimensional space with a Gaussian kernel function, and each valid point initially serves as a class center; for each class center, the points whose distance to the class center is smaller than a preset bandwidth are found and form its cluster; for each cluster, the vectors from the class center to each point in the cluster are computed and summed to give an offset vector, and the class center is moved along the offset vector to serve as the new class center; the class centers are moved iteratively in this way until convergence; class centers whose mutual distance is smaller than the preset bandwidth are merged, and each remaining class center is taken as the three-dimensional position of one head.
8. A system for positioning the spatial positions of multiple persons in real time based on camera information, characterized by comprising:
a first module for calibrating each video camera channel to obtain the camera's internal parameters, external parameters and distortion coefficients;
a second module for acquiring video images, detecting the heads of persons in the video images and calculating the center position of each head;
a third module for obtaining rays for each video channel from the camera calibration and the head center positions in the video images, each ray starting at the camera's optical center and passing through a head center position detected in the video image;
a fourth module for calculating the minimum distance between all rays from different video channels and taking the midpoint of the shortest segment connecting each pair of rays as a valid derived point;
and a fifth module for clustering the valid derived points, discarding false cluster points, and taking the remaining cluster points as the spatial positions of the persons.
9. A computer-readable storage medium on which a computer program is stored, characterized in that, when executed by a processor, the computer program carries out the steps of the method for positioning the spatial positions of multiple persons in real time based on camera information according to any one of claims 1 to 7.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the computer program, when executed by the processor, performs the steps of the method for positioning the spatial positions of multiple persons in real time based on camera information according to any one of claims 1 to 7.
CN202110931717.8A 2021-08-13 2021-08-13 Method and system for positioning spatial positions of multiple persons in real time based on camera shooting information Active CN113705388B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110931717.8A CN113705388B (en) 2021-08-13 2021-08-13 Method and system for positioning spatial positions of multiple persons in real time based on camera shooting information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110931717.8A CN113705388B (en) 2021-08-13 2021-08-13 Method and system for positioning spatial positions of multiple persons in real time based on camera shooting information

Publications (2)

Publication Number Publication Date
CN113705388A 2021-11-26
CN113705388B (en) 2024-01-12

Family

ID=78652652

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110931717.8A Active CN113705388B (en) 2021-08-13 2021-08-13 Method and system for positioning spatial positions of multiple persons in real time based on camera shooting information

Country Status (1)

Country Link
CN (1) CN113705388B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140327780A1 (en) * 2011-11-29 2014-11-06 Xovis Ag Method and device for monitoring a monitoring region
CN105700029A (en) * 2016-01-22 2016-06-22 清华大学 Method, device and system for inspecting object based on cosmic ray
CN107436969A (en) * 2017-07-03 2017-12-05 四川大学 A kind of three-dimensional multi-target orientation method based on genetic algorithm
CN107767420A (en) * 2017-08-16 2018-03-06 华中科技大学无锡研究院 A kind of scaling method of underwater stereoscopic vision system
CN108263389A (en) * 2018-01-26 2018-07-10 深圳市九洲源科技有限公司 A kind of vehicle front false target device for eliminating and method
CN110458940A (en) * 2019-07-24 2019-11-15 兰州未来新影文化科技集团有限责任公司 The processing method and processing unit of motion capture
CN111028271A (en) * 2019-12-06 2020-04-17 浩云科技股份有限公司 Multi-camera personnel three-dimensional positioning and tracking system based on human skeleton detection
CN111080712A (en) * 2019-12-06 2020-04-28 浩云科技股份有限公司 Multi-camera personnel positioning, tracking and displaying method based on human body skeleton detection
CN111079859A (en) * 2019-12-31 2020-04-28 哈尔滨工程大学 Passive multi-station multi-target direction finding cross positioning and false point removing method
CN111598001A (en) * 2020-05-18 2020-08-28 哈尔滨理工大学 Apple tree pest and disease identification method based on image processing
CN112381025A (en) * 2020-11-23 2021-02-19 恒大新能源汽车投资控股集团有限公司 Driver attention detection method and device, electronic equipment and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LAURA LEAL-TAIXE: "Branch-and-price global optimization for multi-view multi-target", 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 1987-1994
MANISH KUSHWAHA et al.: "Collaborative 3D target tracking in distributed smart camera networks for wide-area surveillance", Journal of Sensor and Actuator Networks, vol. 2, no. 2, pages 316-353
MURTAZA TAJ et al.: "Multi-view multi-object detection and tracking", Computer Vision, pages 263-280
杨志国 et al.: "Improved clustering algorithm for SAR target detection", College of Electronic Science and Engineering, National University of Defense Technology, vol. 13, no. 11, pages 2132-2138
赵倩: "Research on key technologies of multi-target tracking in a multi-view environment", China Masters' Theses Full-text Database (Information Science and Technology), no. 2019, pages 138-605

Also Published As

Publication number Publication date
CN113705388B (en) 2024-01-12

Similar Documents

Publication Publication Date Title
CN111462200B (en) Cross-video pedestrian positioning and tracking method, system and equipment
CN106709950B (en) Binocular vision-based inspection robot obstacle crossing wire positioning method
US11763485B1 (en) Deep learning based robot target recognition and motion detection method, storage medium and apparatus
Spreeuwers Fast and accurate 3D face recognition: using registration to an intrinsic coordinate system and fusion of multiple region classifiers
CN109190508B (en) Multi-camera data fusion method based on space coordinate system
JP6448223B2 (en) Image recognition system, image recognition apparatus, image recognition method, and computer program
CN110378931A (en) A kind of pedestrian target motion track acquisition methods and system based on multi-cam
CN107578376B (en) Image splicing method based on feature point clustering four-way division and local transformation matrix
Tang et al. ESTHER: Joint camera self-calibration and automatic radial distortion correction from tracking of walking humans
CN110084243A (en) It is a kind of based on the archives of two dimensional code and monocular camera identification and localization method
CN108257155A (en) A kind of extension target tenacious tracking point extracting method based on part and Global-Coupling
CN110267101A (en) A kind of unmanned plane video based on quick three-dimensional picture mosaic takes out frame method automatically
CN114898353B (en) License plate recognition method based on video sequence image characteristics and information
CN113255608A (en) Multi-camera face recognition positioning method based on CNN classification
CN115239882A (en) Crop three-dimensional reconstruction method based on low-light image enhancement
CN110458019B (en) Water surface target detection method for eliminating reflection interference under scarce cognitive sample condition
CN110909617B (en) Living body face detection method and device based on binocular vision
Theiner et al. Tvcalib: Camera calibration for sports field registration in soccer
CN112418250A (en) Optimized matching method for complex 3D point cloud
CN117036404A (en) Monocular thermal imaging simultaneous positioning and mapping method and system
CN112131984A (en) Video clipping method, electronic device and computer-readable storage medium
CN114998532B (en) Three-dimensional image visual transmission optimization method based on digital image reconstruction
CN114399731B (en) Target positioning method under supervision of single coarse point
CN113705388A (en) Method and system for positioning space positions of multiple persons in real time based on camera information
Abdel-Wahab et al. Efficient reconstruction of large unordered image datasets for high accuracy photogrammetric applications

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant