CN110276233A

CN110276233A - A multi-camera collaborative tracking system based on deep learning

Info

Publication number: CN110276233A
Application number: CN201810232732.1A
Authority: CN
Inventors: 于耀; 李炎峻; 周余
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2018-03-15
Filing date: 2018-03-15
Publication date: 2019-09-24

Abstract

The present invention is an algorithm for tracking pedestrian trajectories in video and belongs to the field of computer vision. It addresses the problem that pedestrian tracking is not robust under occlusion and similar conditions. The invention proposes an algorithm for pedestrian tracking in a multi-camera environment. Its core is an end-to-end multi-camera collaborative tracking algorithm: several cameras capture the target area; pedestrians are detected and identified in every frame of each video stream; pedestrians are tracked according to the detection results; the trajectories of the same pedestrian under different cameras are then matched and fused; finally the pedestrian's position in every frame is obtained, which constitutes the pedestrian's trajectory. To handle the occlusion problem common in pedestrian tracking, a generative adversarial network is applied: when occlusion occurs during tracking, the network generates an unoccluded picture of the next frame, and pedestrian detection and tracking are carried out on the generated picture.

Description

A multi-camera collaborative tracking system based on deep learning

Technical field

The present invention mainly addresses the tension between the demand for mining the value of pedestrian trajectories in indoor environments and the high difficulty of pedestrian trajectory tracking, and proposes a multi-camera collaborative tracking system based on deep learning.

Background art

In the era of big data, pedestrian trajectories carry significant value. In environments such as shopping malls, if pedestrian trajectories can be extracted effectively, the placement of sales counters can be further optimized, generating considerable commercial value. Cheap and widely deployed cameras make it easy to collect pictures of pedestrians, but extracting pedestrian trajectories from those pictures remains a difficult problem.

Most existing pedestrian tracking algorithms are traditional ones such as Kalman filtering. They perform well when pedestrian density is low and motion trajectories are simple, but still struggle when pedestrians are dense and trajectories are complex. The tracking-by-detection approach handles dense scenes better: all pedestrians are first detected in every frame, and the detections in each frame of the video are then matched and merged. However, both the detection module and the tracking module still lack algorithms that are real-time and accurate.

To solve these problems in the prior art, a multi-camera collaborative tracking system based on deep learning is constructed here, which performs end-to-end video acquisition, detection, and tracking of pedestrians.

Summary of the invention

Purpose of the invention: in an indoor environment, multiple cameras capture the target area; pedestrians are detected and identified in every frame of the video stream; pedestrians are tracked according to the detection results; the trajectories of the same pedestrian under different cameras are fused; and finally the pedestrian's position in every frame is obtained, constituting the pedestrian's trajectory.

In view of the problems in the prior art, the invention proposes an online multi-camera collaborative tracking algorithm, which mainly comprises the following steps:

Step 1: Each camera captures video of pedestrians, and the pedestrians in every frame are detected.

Step 2: The pedestrians detected by each single camera are tracked online, yielding the trajectories of multiple pedestrians in a single-camera environment.

Step 3: The pedestrian trajectories from the multiple cameras are mapped onto a common ground plane, and feature matching and fusion of the trajectories are performed.

For Step 1, pedestrians in the picture are detected with a method based on convolutional neural networks. In the training stage, we label by hand the bounding box of each pedestrian in a picture and record it as the top-left coordinates together with the width and height of the box, label = (x, y, w, h), where x and y are the coordinates of the top-left corner of the pedestrian's bounding box and w and h are its width and height. The picture is fed into the neural network; after several convolutional layers produce a feature map, an object discrimination network decides whether each region is a candidate object, and each candidate object is then passed to a classification network that decides whether it is a pedestrian.

In the object discrimination network, a selected box is labeled as a positive example if its overlap with the ground-truth box exceeds a certain threshold, and as a negative example otherwise. The loss function of this network consists of two terms. The first is a cross-entropy loss, which indicates whether the box is judged to be an object. The second is a squared loss, which measures the difference between the ground-truth box and the predicted box.
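The positive/negative labeling rule for candidate boxes can be sketched as follows. This is a minimal illustration, assuming that "overlap" means intersection over union (IoU) and that the threshold is 0.5; the source specifies neither, and the function names `iou` and `label_candidates` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes; (x, y) is the top-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def label_candidates(candidates, ground_truth, threshold=0.5):
    """Label each candidate box 1 (positive) or 0 (negative).

    The threshold value is an assumption; the source only says "a certain threshold".
    """
    labels = []
    for cand in candidates:
        best = max((iou(cand, gt) for gt in ground_truth), default=0.0)
        labels.append(1 if best > threshold else 0)
    return labels
```

A candidate is compared against every hand-labeled box and takes the best overlap, so a picture with several pedestrians is handled box by box.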

Step 2 tracks pedestrians in a single-camera environment. Traditional tracking methods such as Kalman filtering are not suited to complex multi-person scenes. Most mainstream algorithms in recent years follow the tracking-by-detection approach, but most of them are offline algorithms that need context from both preceding and following frames, so they cannot be used in practical projects. We propose a real-time tracking algorithm based on the tracking-by-detection approach that achieves good tracking results. The algorithm uses the detection results obtained earlier, extracts and matches features across adjacent frames, and effectively handles abnormal situations such as detection failure.

The algorithm has two parts. The first part is feature extraction: we use traditional feature extraction methods to extract features of the target such as RGB, HSV, and LBP, which together form a feature vector. The second part is pedestrian matching, in which we model the state of each pedestrian; the states include an initialization state, a tracking state, a lost state, and a dead state. After the first few frames initialize the tracked objects, every subsequent frame computes the feature similarity between tracked objects and detected objects, matches them with a greedy algorithm, and finally updates the state of each tracked object according to the matching result. An object that is matched successfully remains in the tracking state. An object that fails to match switches to the lost state. If an object in the lost state finds a correctly matching detection within the next several frames, it returns to the tracking state; otherwise it switches to the dead state. Finally, collecting the final states of all tracked objects gives the trajectory information of all pedestrians in the video.
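The state model and greedy matching described above can be sketched as follows. This is an illustrative simplification, not the patented implementation: cosine similarity, the 0.7 similarity threshold, the 5-frame lost limit, and the rule that a new track leaves the initialization state on its first successful match are all assumptions, and every name in the code is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
import math


class State(Enum):
    INIT = "init"
    TRACKED = "tracked"
    LOST = "lost"
    DEAD = "dead"


@dataclass
class Track:
    track_id: int
    feature: list              # appearance feature vector (e.g. colour or LBP histogram)
    state: State = State.INIT
    lost_frames: int = 0


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def greedy_match(tracks, detections, min_similarity=0.7):
    """Pair tracks with detections, taking the most similar pairs first."""
    pairs = sorted(
        ((cosine_similarity(t.feature, d), ti, di)
         for ti, t in enumerate(tracks) for di, d in enumerate(detections)),
        reverse=True,
    )
    used_tracks, used_dets, matches = set(), set(), {}
    for sim, ti, di in pairs:
        if sim < min_similarity:
            break                      # all remaining pairs are even less similar
        if ti not in used_tracks and di not in used_dets:
            matches[ti] = di
            used_tracks.add(ti)
            used_dets.add(di)
    return matches


def update_tracks(tracks, detections, max_lost_frames=5):
    """Advance every live track by one frame; unmatched detections start new tracks."""
    live = [t for t in tracks if t.state is not State.DEAD]
    matches = greedy_match(live, detections)
    for ti, track in enumerate(live):
        if ti in matches:
            track.feature = detections[matches[ti]]
            track.state = State.TRACKED
            track.lost_frames = 0
        else:
            track.lost_frames += 1
            track.state = State.LOST if track.lost_frames <= max_lost_frames else State.DEAD
    matched_dets = set(matches.values())
    next_id = max((t.track_id for t in tracks), default=-1) + 1
    for di, det in enumerate(detections):
        if di not in matched_dets:
            tracks.append(Track(next_id, det))
            next_id += 1
    return tracks
```

Calling `update_tracks` once per frame with that frame's detection feature vectors keeps the whole procedure online: only the current frame and the stored track features are needed, with no look-ahead.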

To handle the occlusion problem common in pedestrian tracking, we apply a generative adversarial network. Occlusion means that a pedestrian is blocked by an external object while moving, so that the pedestrian detection module cannot detect the pedestrian and the tracking module is therefore not robust. With a generative adversarial network, the picture of the next frame can be generated from the preceding several frames in which the pedestrian is not occluded. The generative adversarial network consists of network G and network D. Network G is the generator network. Its input is $X = (X^1, \ldots, X^m)$, the preceding $m$ frames; after several convolutional layers it outputs $Y_{gen}$, the generated picture of the next frame. Network D is the discriminator network. Its positive sample is $(X^1, \ldots, X^m, Y)$, that is, $m+1$ consecutive frames extracted from the original data, so $Y = X^{m+1}$. Its negative sample is $(X^1, \ldots, X^m, Y_{gen})$, where $Y_{gen}$ is the next frame generated by the generator network from the preceding $m$ frames. The goal of network D is to judge whether the input sequence of consecutive frames is real or produced by the generator network, so its loss function is defined as:

$$L^D(X, Y) = L_{cls}(D(X, Y), 1) + L_{cls}(D(X, Y_{gen}), 0)$$

where $L_{cls}$ denotes the cross-entropy cost function, taken over the samples indexed by $i$. When the parameters of network D are updated by stochastic gradient descent, the parameters of network G are kept fixed.
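For reference, if the standard binary cross-entropy is assumed for $L_{cls}$, it can be written as below. The symbols $p_i$ (the output of network D for the $i$-th sample) and $t_i \in \{0, 1\}$ (its target label) are notation introduced here for illustration and are not taken from the source.

```latex
L_{cls}(p, t) = -\sum_{i} \left[ t_i \log p_i + (1 - t_i) \log (1 - p_i) \right]
```

Under this form, a target of 1 penalizes small $p_i$ and a target of 0 penalizes large $p_i$, which matches how real and generated sequences are labeled in $L^D$.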

The goal of network G is to make the generated picture as realistic as possible, so its loss function is defined as:

$$L^G(X, Y) = L_{cls}(D(X, Y_{gen}), 1)$$

When the parameters of network G are updated by stochastic gradient descent, the parameters of network D are kept fixed.
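The two losses can be evaluated numerically as in the sketch below. It assumes that network D outputs a probability in (0, 1) for each sample and that the cross-entropy is the standard binary form; the function names and the clipping constant `eps` are hypothetical and serve only to keep the logarithm finite.

```python
import math


def cross_entropy(preds, target, eps=1e-12):
    """Binary cross-entropy summed over samples; target is 1 (real) or 0 (generated)."""
    total = 0.0
    for p in preds:
        p = min(max(p, eps), 1.0 - eps)   # avoid log(0)
        total -= target * math.log(p) + (1 - target) * math.log(1.0 - p)
    return total


def discriminator_loss(d_real, d_fake):
    """L^D = L_cls(D(X, Y), 1) + L_cls(D(X, Y_gen), 0)."""
    return cross_entropy(d_real, 1) + cross_entropy(d_fake, 0)


def generator_loss(d_fake):
    """L^G = L_cls(D(X, Y_gen), 1)."""
    return cross_entropy(d_fake, 1)
```

Here `d_real` and `d_fake` stand for the outputs of network D on real and generated sequences. In alternating training, `discriminator_loss` would be minimized with G frozen and `generator_loss` with D frozen, as the text describes.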

When occlusion occurs during tracking, we use the generative adversarial network to generate the picture of the next frame, and pedestrian detection and tracking are carried out on the generated picture.
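The fallback to a generated frame can be sketched as a small piece of control flow. The source does not say how occlusion is detected or how many preceding frames are used, so `is_occluded`, `generate_next` (a stand-in for network G), and the default `m=4` are hypothetical placeholders.

```python
from collections import deque


def frame_for_detection(clean_history, frame, is_occluded, generate_next, m=4):
    """Return the picture on which detection and tracking should run.

    clean_history -- deque of the most recent unoccluded frames
    is_occluded   -- hypothetical callable: frame -> bool
    generate_next -- hypothetical callable standing in for network G:
                     list of m frames -> generated next frame
    """
    if not is_occluded(frame):
        clean_history.append(frame)      # only unoccluded frames feed the generator
        return frame
    if len(clean_history) < m:
        return frame                     # not enough unoccluded context; use the real frame
    return generate_next(list(clean_history)[-m:])
```

Keeping only unoccluded frames in `clean_history` follows the description that the next frame is generated from frames in which the pedestrian is not blocked; a caller would typically create it as `deque(maxlen=m)`.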

Step 3 is the matching and fusion of pedestrian IDs across multiple cameras. We propose an algorithm for pedestrian matching and fusion under multiple cameras that performs well on public datasets. The algorithm maps the planes of the multiple cameras onto a common plane (the ground plane) and then matches the different pedestrians on that common plane. Let the set $C = \{C_1, \ldots, C_i, \ldots, C_n\}$ denote the $n$ cameras. The feature of the $i$-th target observed in a camera's view can be represented by the coordinates of its position. Suppose there are $N$ cameras in total, so that trajectories are available under $N$ camera views. These trajectories are projected onto the common plane, the trajectories belonging to the same person are fused to obtain the trajectories of $M$ people on the common plane, and the trajectories of the $M$ people are then back-projected to the $N$ camera planes. Tracking with multiple cameras effectively addresses the problems that occlusion causes in pedestrian tracking.
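Fusion on the common plane can be sketched as follows, assuming each trajectory has already been projected to ground-plane coordinates and is stored as a mapping from frame index to an (x, y) position. The matching criterion (mean distance over shared frames), the 0.5 distance threshold, and the pairwise averaging are assumptions made for illustration; the source does not specify them.

```python
import math


def mean_distance(track_a, track_b):
    """Mean ground-plane distance over frames present in both tracks (inf if none)."""
    shared = track_a.keys() & track_b.keys()
    if not shared:
        return math.inf
    return sum(math.dist(track_a[f], track_b[f]) for f in shared) / len(shared)


def merge(track_a, track_b):
    """Average the two positions on shared frames; keep the single observation elsewhere."""
    merged = {}
    for f in track_a.keys() | track_b.keys():
        pts = [t[f] for t in (track_a, track_b) if f in t]
        merged[f] = (sum(p[0] for p in pts) / len(pts),
                     sum(p[1] for p in pts) / len(pts))
    return merged


def fuse_tracks(tracks_per_camera, max_distance=0.5):
    """Greedily fuse per-camera ground-plane tracks into one track per person."""
    fused = []
    for camera_tracks in tracks_per_camera:
        for track in camera_tracks:
            best_i, best_d = None, max_distance
            for i, existing in enumerate(fused):
                d = mean_distance(track, existing)
                if d < best_d:
                    best_i, best_d = i, d
            if best_i is None:
                fused.append(dict(track))        # no close track yet: a new person
            else:
                fused[best_i] = merge(fused[best_i], track)
    return fused
```

The number of fused tracks returned corresponds to $M$ in the text. A fuller version would also prevent two tracks from the same camera from being merged into one person.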

To project pedestrian trajectories from a camera plane onto the common ground plane, the projection matrix between the camera plane and the common ground plane must be computed; this matrix is obtained through calibration. The projection is given by

$$\mathbf{x}' = H\mathbf{x}$$

where $\mathbf{x} = (x, y, 1)$ denotes the homogeneous coordinates in the original plane and $\mathbf{x}' = (x', y', 1)$ denotes the homogeneous coordinates in the projected space. The projection matrix has the form

$$H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}$$

Substituting the coordinates of several hand-marked points in the camera plane and in the common ground plane into the formula above yields the values of the projection matrix.
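The calibration step can be sketched as solving a linear system built from the hand-marked point pairs. The sketch below makes two assumptions that are not in the source: exactly four correspondences are used (the minimum for a plane-to-plane projection, with no three points collinear), and the matrix is normalized so that its bottom-right entry is 1. With more than four points a least-squares fit would be used instead. All function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                    # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def estimate_homography(src_pts, dst_pts):
    """Estimate the 3x3 projection matrix from exactly four point pairs."""
    if len(src_pts) != 4 or len(dst_pts) != 4:
        raise ValueError("this sketch expects exactly four correspondences")
    a, b = [], []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        # Two equations per pair, obtained from u = (H x)_1 / (H x)_3 and v = (H x)_2 / (H x)_3.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]                    # bottom-right entry fixed to 1
    return [h[0:3], h[3:6], h[6:9]]


def project(h, point):
    """Map a camera-plane point to the ground plane, dividing by the third coordinate."""
    x, y = point
    u, v, w = (row[0] * x + row[1] * y + row[2] for row in h)
    return (u / w, v / w)
```

The division by the third coordinate in `project` is what brings the result back to the form $(x', y', 1)$. Applying `project` to every point of a camera-plane trajectory gives its ground-plane counterpart.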

Description of the drawings

The accompanying drawings provide a further understanding of the technical solution of the invention and form part of the specification. Together with the embodiments of the invention, they serve to explain the technical solution and do not limit it. The drawings are described as follows:

Fig. 1 is the architecture diagram of the whole system.

Specific embodiment

Embodiments of the invention are described in detail below with reference to the drawings, so that the process by which the invention applies technical means to solve the technical problem and achieve the technical effect can be fully understood and implemented. The steps shown in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the one given here.

The execution of the algorithm is described in detail below.

Step 1: Pedestrian detection. The computer obtains the data captured by the cameras in real time over a wireless connection, sends each corresponding frame to the pedestrian detection neural network, and outputs the bounding-box coordinates of the pedestrians in the image.

Step 2: Pedestrian tracking. Once the pedestrian coordinates of every frame are available, the first few frames initialize the tracked objects. Each subsequent frame is sent to the matching module, which updates the tracked objects, and the trajectory information of multiple pedestrians is finally obtained.

Step 3: Multi-camera matching. Each camera yields pedestrian trajectories; the trajectory results from the multiple cameras are then matched and fused to obtain the trajectory information of pedestrians on the common ground plane.

Those skilled in the art should understand that the system structure and each step of the invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; alternatively, they can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The invention is therefore not limited to any specific combination of hardware and software.

Although the embodiments shown and described are as above, their content is provided only to facilitate understanding of the invention and is not intended to limit it. Any person skilled in the art to which the invention pertains may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the invention, but the scope of patent protection of the invention remains subject to the scope defined by the appended claims.

Claims

1. A multi-camera collaborative tracking system based on deep learning, characterized by comprising the following main steps:

Step 1: Each camera captures video of pedestrians, and the pedestrians in every frame are detected.

Step 2: The pedestrians detected by each single camera are tracked online, yielding the trajectories of multiple pedestrians in a single-camera environment.

Step 3: The pedestrian trajectories from the multiple cameras are mapped onto a common ground plane, and feature matching and fusion of the trajectories are performed.

2. Step 2 of the method according to claim 1, characterized in that the detection results obtained earlier are used to extract and match features across adjacent frames, and abnormal situations such as detection failure are handled effectively. The algorithm has two parts. The first part is feature extraction: we use traditional feature extraction methods to extract features of the target such as RGB, HSV, and LBP, which together form a feature vector. The second part is pedestrian matching, in which we model the state of each pedestrian; the states include an initialization state, a tracking state, a lost state, and a dead state. After the first few frames initialize the tracked objects, every subsequent frame computes the feature similarity between tracked objects and detected objects, matches them with a greedy algorithm, and finally updates the state of each tracked object according to the matching result. An object that is matched successfully remains in the tracking state. An object that fails to match switches to the lost state. If an object in the lost state finds a correctly matching detection within the next several frames, it returns to the tracking state; otherwise it switches to the dead state. Finally, collecting the final states of all tracked objects gives the trajectory information of all pedestrians in the video.

To handle the occlusion problem common in pedestrian tracking, we apply a generative adversarial network. Occlusion means that a pedestrian is blocked by an external object while moving, so that the pedestrian detection module cannot detect the pedestrian and the tracking module is therefore not robust. With a generative adversarial network, the picture of the next frame can be generated from the preceding several frames in which the pedestrian is not occluded. The generative adversarial network consists of network G and network D. Network G is the generator network. Its input is $X = (X^1, \ldots, X^m)$, the preceding $m$ frames; after several convolutional layers it outputs $Y_{gen}$, the generated picture of the next frame. Network D is the discriminator network. Its positive sample is $(X^1, \ldots, X^m, Y)$, that is, $m+1$ consecutive frames extracted from the original data, so $Y = X^{m+1}$. Its negative sample is $(X^1, \ldots, X^m, Y_{gen})$, where $Y_{gen}$ is the next frame generated by the generator network from the preceding $m$ frames. The goal of network D is to judge whether the input sequence of consecutive frames is real or produced by the generator network, so its loss function is defined as:

$$L^D(X, Y) = L_{cls}(D(X, Y), 1) + L_{cls}(D(X, Y_{gen}), 0)$$

where $L_{cls}$ denotes the cross-entropy cost function, taken over the samples indexed by $i$. When the parameters of network D are updated by stochastic gradient descent, the parameters of network G are kept fixed.

$$L^G(X, Y) = L_{cls}(D(X, Y_{gen}), 1)$$

When occlusion occurs during tracking, we use the generative adversarial network to generate an unoccluded picture of the next frame, and pedestrian detection and tracking are carried out on the generated picture.