CN104318547B

CN104318547B - GPU-accelerated multi-binocular-camera stitching intelligent analysis system

Info

Publication number: CN104318547B
Application number: CN201410528309.8A
Authority: CN
Inventors: 尚凌辉; 高勇; 王弘玥; 刘家佳; 余天明; 施展
Original assignee: ZHEJIANG ICARE VISION TECHNOLOGY Co Ltd
Current assignee: ZHEJIANG ICARE VISION TECHNOLOGY Co Ltd
Priority date: 2014-10-09
Filing date: 2014-10-09
Publication date: 2017-06-23
Anticipated expiration: 2034-10-09
Also published as: CN104318547A

Abstract

The present invention relates to a GPU-accelerated multi-binocular-camera stitching intelligent analysis system. The invention uses binocular cameras as video capture devices. Each binocular camera is mounted on the ceiling and shoots the ground vertically from above, which ensures that the captured targets do not occlude one another. When a single binocular camera cannot cover the scene, multiple cameras are installed and their views are stitched together to form the visible range. Based on the vertically captured depth map and color image, a variety of high-accuracy intelligent video analysis functions can be realized. The two output video channels of each binocular camera are processed in parallel on a GPU to accelerate the analysis. Because the invention uses depth information in combination with RGB information, the stability of target detection is greatly improved, which provides a good basis for subsequent intelligent analysis.

Description

GPU-accelerated multi-binocular-camera stitching intelligent analysis system

Technical field

The invention belongs to the technical field of intelligent video surveillance and relates to a GPU-accelerated multi-binocular-camera stitching intelligent analysis system.

Background

At present, video-based intelligent analysis technology is widely used in many industries, including banking, transportation, and public security. In practice, however, the results are often unsatisfactory, mainly for the following reasons:

1) To save cost, many intelligent video analysis systems reuse cameras that are already installed, whose shooting angle and imaging quality are often unfavorable for intelligent analysis.

2) An ordinary 2D camera cannot determine the distance between a target and the camera, so targets at different distances along the same optical axis easily overlap and occlude one another, which leads to misjudgments in intelligent analysis.

3) The computing platform for intelligent analysis has limited capability, so algorithms usually have to undergo optimizations that sacrifice analysis performance before they can meet real-time processing requirements.

Summary of the invention

In view of the shortcomings of the prior art, the present invention provides a binocular video intelligent analysis system based on vertical shooting, stitching, and GPU-accelerated computation.

The technical solution adopted by the present invention to solve the technical problem is as follows:

The present invention uses binocular cameras as video capture devices. Each binocular camera is mounted on the ceiling and shoots the ground vertically from above, which ensures that the captured targets do not occlude one another. When a single binocular camera cannot cover the scene, multiple cameras are installed and their views are stitched together to form the visible range. Based on the vertically captured depth map and color image, a variety of high-accuracy intelligent video analysis functions can be realized.

The stitching is specifically as follows: the pixels in each target region of each camera are projected onto the ground according to their depth information; the projections are then calibrated using the preset multi-camera offsets, and the projected targets are associated, so that multi-camera target detection is realized in a global coordinate system.

The two output video channels of each binocular camera are processed in parallel on a GPU to accelerate the analysis.

Further, the association of the projected targets is specifically as follows:

Suppose a target of a certain height lies in the region shared by two cameras. Let the camera height be z0 and let the two cameras have identical specifications, each with an imaging range of W*H. If the image coordinates of the target are (x, y) and its height is z, then the global coordinates (Xw, Yw) of the target in the first camera are:

(X_w, Y_w) = \left( \left( \frac{W - 1}{2} - x \right) \frac{z}{z_0} + x, \ \left( \frac{H - 1}{2} - y \right) \frac{z}{z_0} + y \right)

Given that the offset of the second camera relative to the first camera is (dx, dy), the global coordinates (Xw, Yw) of the target in the second camera are:

(X_w, Y_w) = \left( \left( \frac{W - 1}{2} - x \right) \frac{z}{z_0} + x + dx, \ \left( \frac{H - 1}{2} - y \right) \frac{z}{z_0} + y + dy \right)

All points of a target are clustered with MeanShift, and the resulting projected center point represents the target. The distances between all projected center points under adjacent cameras are compared against a threshold Td, and targets whose distance is less than Td are regarded as the same target, which removes duplicate targets.

Further, when the camera lens is fixed, the camera shoots vertically downward, and the distance from the target to the lens is fixed, the imaging size, which varies with how heavy or thin a person is, is assumed to follow a Gaussian distribution model:

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right]

Here μ is the mean imaging size of the target at this height, and σ is the standard deviation of the imaging error caused by differences in body build. Assume the above model was established statistically with the target at a fixed distance D from the lens. When the camera height and the target height change, the pinhole imaging principle states that the imaging size of a target is inversely proportional to its distance from the lens. It follows that when the camera height is H and the target height is h, the imaging size should follow the Gaussian distribution model:

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right] \cdot \frac{D}{H - h}

Thus, when the camera mounting height is known, the imaging size model of a target can be estimated from the depth map, which provides effective prior knowledge for subsequent intelligent analysis.

Beneficial effects of the present invention:

1) Depth information is used in combination with RGB information, which greatly improves the stability of target detection and provides a good basis for subsequent intelligent analysis.

2) Mounting the cameras on the ceiling and shooting vertically toward the ground effectively solves the problem of target occlusion. In addition, depth information is used to project targets onto the ground, which enables target association across multiple cameras and solves the problem of the small visible range caused by ceiling mounting.

3) The computing platform uses a GPU; combined with parallel optimization of the algorithms, this greatly increases computation speed and enables high-resolution, multi-camera depth map intelligent analysis.

4) Prior statistics on targets of different body builds are combined with depth information to model the imaging size of a target, which provides prior knowledge for subsequent intelligent analysis.

Brief description of the drawings

Fig. 1 is a schematic diagram of multi-camera imaging;

Fig. 2 is a top view of multi-camera imaging;

Fig. 3 is a schematic diagram of the binocular camera installation;

Fig. 4 is a schematic diagram of the binocular cameras and the ATM operating areas.

Detailed description of the embodiments

The invention is further described below with reference to the embodiments and the accompanying drawings:

The present invention uses binocular cameras as video capture devices. Each binocular camera outputs two analog video signals, which represent the left view and the right view respectively. The binocular camera must be mounted on the ceiling and shoot the ground vertically from above, which ensures that the captured targets do not occlude one another. When a single binocular camera cannot cover the scene, multiple cameras can be installed and stitched together to form a larger visible range. The main principle is as follows:

The pixels in each target region of each camera are projected onto the ground according to their depth information. The projections are then calibrated using the preset multi-camera offsets (dx, dy), and the projected targets are associated, so that multi-camera target detection is realized in a global coordinate system.

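As shown in Fig. 1 and Fig. 2, a target of a certain height lies in the region shared by two cameras. Let the camera height be z0. Assuming the two cameras have identical specifications, each with an imaging range of W*H, and the target has image coordinates (x, y) and height z, the global coordinates (Xw, Yw) of the target in camera A are:

(X_w, Y_w) = \left( \left( \frac{W - 1}{2} - x \right) \frac{z}{z_0} + x, \ \left( \frac{H - 1}{2} - y \right) \frac{z}{z_0} + y \right)

Given that the offset of camera B relative to camera A is (dx, dy), the global coordinates (Xw, Yw) of the target in camera B are:

(X_w, Y_w) = \left( \left( \frac{W - 1}{2} - x \right) \frac{z}{z_0} + x + dx, \ \left( \frac{H - 1}{2} - y \right) \frac{z}{z_0} + y + dy \right)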
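The ground projection described above can be sketched in a few lines of Python. This is only an illustration built from the coordinate formulas stated in claim 2, not the patented implementation. The function name `project_to_ground` and the convention that the offset (dx, dy) is expressed in the same units as the image coordinates are assumptions made for the example.

```python
def project_to_ground(x, y, z, z0, width, height, offset=(0.0, 0.0)):
    """Map an image point of a target to global ground coordinates (Xw, Yw).

    x, y    : image coordinates of the target point
    z       : height of the target point above the ground
    z0      : mounting height of the camera (same unit as z)
    width   : W of the W*H imaging range
    height  : H of the W*H imaging range
    offset  : calibrated (dx, dy) of this camera relative to the first camera
              (assumed here to share the unit of x and y)
    """
    if z0 <= 0:
        raise ValueError("camera height z0 must be positive")
    scale = z / z0
    # A point above the ground is pulled toward the image center by z / z0.
    xw = ((width - 1) / 2 - x) * scale + x + offset[0]
    yw = ((height - 1) / 2 - y) * scale + y + offset[1]
    return xw, yw
```

For camera A the offset stays at (0, 0); for camera B the calibrated (dx, dy) is passed in, so both cameras report targets in the same global frame. A point on the ground (z = 0) keeps its image coordinates, and a point at the image center is unaffected by its height.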

A human region generally contains multiple pixels. Because the cameras view the target from different positions, the distribution of foreground points for the same target differs between cameras, and so does the distribution after projection onto the ground. Therefore, all points of a target are clustered with MeanShift, and the resulting projected center point represents the target. The distances between all projected center points under adjacent cameras are compared against a threshold Td, and targets whose distance is less than Td are regarded as the same target, which removes duplicate targets.
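The clustering and de-duplication step can be sketched as follows. The source names MeanShift and the threshold Td but gives no kernel or bandwidth, so the flat kernel, the `bandwidth` parameter, and the function names `mean_shift_center` and `merge_targets` are assumptions for illustration only.

```python
import math


def mean_shift_center(points, bandwidth, max_iter=50, tol=1e-3):
    """Return the mode of one target's projected ground points.

    Uses a flat-kernel mean shift started from the centroid: the center is
    repeatedly moved to the mean of the points within `bandwidth` of it.
    """
    pts = [(float(px), float(py)) for px, py in points]
    if not pts:
        raise ValueError("at least one point is required")
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    for _ in range(max_iter):
        inside = [p for p in pts if math.dist(p, (cx, cy)) <= bandwidth]
        if not inside:
            break
        nx = sum(p[0] for p in inside) / len(inside)
        ny = sum(p[1] for p in inside) / len(inside)
        shift = math.dist((nx, ny), (cx, cy))
        cx, cy = nx, ny
        if shift < tol:
            break
    return cx, cy


def merge_targets(centers_a, centers_b, td):
    """Combine the centers of two adjacent cameras, dropping duplicates.

    A center from camera B closer than `td` to any center from camera A is
    treated as the same target and discarded.
    """
    merged = list(centers_a)
    for cb in centers_b:
        if all(math.dist(cb, ca) >= td for ca in centers_a):
            merged.append(cb)
    return merged
```

Both functions expect coordinates that are already in the global frame, so `td` is a distance in that frame. Comparing camera B only against camera A reflects the source's statement that the comparison is made between adjacent cameras.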

After all cameras are installed, they are connected to a GPU-based computing platform, which is typically an x86 server containing a GPU or an embedded device containing a GPU. The computing platform mainly performs the following tasks:

1: Binocular video input.

The binocular cameras in this scheme use analog signal output, and each camera has two video channels.

2: Binocular depth map computation.

Computing a depth map from the left view and the right view is relatively complex but very well suited to parallel computation. The algorithm used in the present invention has been heavily parallelized and runs at very high speed on a GPU, which enables real-time depth map computation at high resolution for multiple cameras.
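The source does not specify which stereo algorithm is used. As a generic illustration only, the sketch below shows the standard relation for a rectified stereo pair, Z = f·B/d, which converts a disparity into a depth, followed by the conversion from depth to target height for a camera looking straight down. The names `focal_px` and `baseline_m` and all numeric values are hypothetical.

```python
import math


def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth along the optical axis for a rectified stereo pair: Z = f * B / d.

    disparity_px : horizontal shift of a pixel between left and right views
    focal_px     : focal length in pixels (hypothetical calibration value)
    baseline_m   : distance between the two lenses in meters (hypothetical)
    """
    if disparity_px <= 0:
        return math.inf  # no valid match, treat the point as infinitely far
    return focal_px * baseline_m / disparity_px


def target_height(depth_m, camera_height_m):
    """Height above the ground of a point seen by a downward-looking camera."""
    return camera_height_m - depth_m
```

Each pixel is converted independently of every other pixel, which is the property that makes this kind of computation map well onto a GPU.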

3: Specific intelligent video analysis functions.

Based on the vertically captured depth map and color image, a variety of high-accuracy intelligent video analysis functions can be realized, such as passenger flow counting and behavior analysis.

When the camera lens is fixed, the camera shoots vertically downward, and the distance from the target to the lens is fixed, the imaging size, which varies with how heavy or thin a person is, is assumed to follow a Gaussian distribution:

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right]

Here μ is the mean imaging size of the target at this height, and σ is the standard deviation of the imaging error caused by differences in body build.

Assume the above model was established statistically with the target at a fixed distance D from the lens. When the camera height and the target height change, the pinhole imaging principle states that the imaging size of a target is inversely proportional to its distance from the lens. It follows that when the camera height is H and the target height is h, the imaging size should follow the Gaussian model:

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right] \cdot \frac{D}{H - h}
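The size prior can be illustrated with a short Python sketch. It follows one reading of the text, stated here as an assumption: because imaging size is inversely proportional to the distance from the target to the lens, a size measured at reference distance D is rescaled by D/(H - h), and both the mean μ and the standard deviation σ are scaled by that factor. The source writes the factor as a multiplier on f(x), so this interpretation and the function names are illustrative rather than definitive.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Gaussian density of imaging size x with mean mu and std sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (
        math.sqrt(2 * math.pi) * sigma
    )


def scale_factor(ref_distance, camera_height, target_height):
    """Pinhole scaling D / (H - h) from the reference distance to the actual one."""
    distance = camera_height - target_height
    if distance <= 0:
        raise ValueError("target must be below the camera")
    return ref_distance / distance


def expected_size(mu, sigma, ref_distance, camera_height, target_height):
    """Rescale the size statistics learned at distance D to heights H and h."""
    k = scale_factor(ref_distance, camera_height, target_height)
    return mu * k, sigma * k
```

With the target height h read from the depth map and the mounting height H known, `expected_size` gives the range of imaging sizes to expect for a person, which a detector could use to reject candidates that are far too large or too small.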

As shown in Fig. 3, take behavior analysis in a self-service ATM lobby as an example:

Mounting height: about 3 meters;

Mounting angle: vertical, 90 degrees;

Mounting position: on the ceiling near the ATMs;

When the mounting height is 2.8 m and the maximum human height is assumed to be 1.8 m, a single binocular camera covers about 2 m × 2 m.

According to measurements, one ATM is about 0.8-1 m wide, so one binocular camera can basically cover the area in front of 2 ATMs. Taking a medium-sized lobby with 4 ATMs and an area of 3 m × 5 m as an example, installing two binocular cameras basically meets the requirement; see Fig. 4.
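The camera count in this example follows from simple arithmetic, sketched below. The function name `cameras_needed` is hypothetical, and the sketch assumes the ATMs stand side by side in one row and ignores the overlap between adjacent cameras that stitching relies on.

```python
import math


def cameras_needed(num_atms, atm_width_m, coverage_width_m):
    """Estimate how many binocular cameras cover a row of ATMs.

    num_atms         : number of ATMs in the row
    atm_width_m      : width of one ATM in meters
    coverage_width_m : ground width covered by one camera in meters
    """
    atms_per_camera = math.floor(coverage_width_m / atm_width_m)
    if atms_per_camera < 1:
        raise ValueError("one camera cannot cover a single ATM")
    return math.ceil(num_atms / atms_per_camera)
```

With a 2 m coverage width, both ends of the 0.8-1 m ATM width range give 2 ATMs per camera, so 4 ATMs need 2 cameras, which matches the figure stated in the text.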

The two binocular cameras (4 video channels) are connected to the GPU-based computing platform, which runs depth-map-based behavior analysis algorithms and provides intelligent analysis functions such as "ATM vandalism", "robbery", "falling down", and "tailing after withdrawal".

In summary, the depth camera in the present invention obtains not only ordinary color information but also the depth information of the scene, so the distance between a target and the camera can be determined, which improves target detection. The cameras are mounted looking vertically downward, which ensures that targets in the scene do not occlude one another. In addition, because the mounting method and camera specifications are controlled, the scale of targets at different mounting heights can be calibrated in advance, which provides valuable prior knowledge for subsequent algorithmic analysis. The computing platform is based on a GPU architecture. After parallel optimization, algorithms on the GPU can run tens of times faster than on a CPU, which provides more computing resources and allows more complex algorithms to be run for better results.

The above are only preferred embodiments of the present invention and are not intended to limit its scope. It should be understood that the present invention is not limited to the implementations described herein; these implementations are described to help those skilled in the art practice the invention.

Claims

1. A GPU-accelerated multi-binocular-camera stitching intelligent analysis system, characterized in that:

The system uses binocular cameras as video capture devices, wherein each binocular camera is mounted on the ceiling and shoots the ground vertically from above, which ensures that the captured targets do not occlude one another; when a single binocular camera cannot cover the scene, multiple cameras are installed and their views are stitched together to form the visible range; based on the vertically captured depth map and color image, a variety of high-accuracy intelligent video analysis functions can be realized;

The stitching is specifically as follows: the pixels in each target region of each camera are projected onto the ground according to their depth information; the projections are then calibrated using the preset multi-camera offsets, and the projected targets are associated, so that multi-camera target detection is realized in a global coordinate system.

2. The GPU-accelerated multi-binocular-camera stitching intelligent analysis system according to claim 1, characterized in that the association of the projected targets is specifically as follows:

Suppose a target of a certain height lies in the region shared by two cameras. Let the camera height be z0 and let the two cameras have identical specifications, each with an imaging range of W*H. If the image coordinates of the target are (x, y) and its height is z, then the global coordinates (Xw, Yw) of the target in the first camera are:

(X_w, Y_w) = \left( \left( \frac{W - 1}{2} - x \right) \frac{z}{z_0} + x, \ \left( \frac{H - 1}{2} - y \right) \frac{z}{z_0} + y \right)

Given that the offset of the second camera relative to the first camera is (dx, dy), the global coordinates (Xw, Yw) of the target in the second camera are:

(X_w, Y_w) = \left( \left( \frac{W - 1}{2} - x \right) \frac{z}{z_0} + x + dx, \ \left( \frac{H - 1}{2} - y \right) \frac{z}{z_0} + y + dy \right)

All points of a target are clustered with MeanShift, and the resulting projected center point represents the target; the distances between all projected center points under adjacent cameras are compared against a threshold Td, and targets whose distance is less than Td are regarded as the same target, which removes duplicate targets.

3. The GPU-accelerated multi-binocular-camera stitching intelligent analysis system according to claim 1, characterized in that:

When the camera lens is fixed, the camera shoots vertically downward, and the distance from the target to the lens is fixed, the imaging size, which varies with how heavy or thin a person is, is assumed to follow a Gaussian distribution model:

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right]

Here μ is the mean imaging size of the target at this height, and σ is the standard deviation of the imaging error caused by differences in body build; assume the above model was established statistically with the target at a fixed distance D from the lens; when the camera height and the target height change, the pinhole imaging principle states that the imaging size of a target is inversely proportional to its distance from the lens; it follows that when the camera height is H and the target height is h, the imaging size should follow the Gaussian distribution model:

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right] \cdot \frac{D}{H - h}

Thus, when the camera mounting height is known, the imaging size model of a target can be estimated from the depth map, which provides effective prior knowledge for subsequent intelligent analysis.