CN111192297A - Multi-camera target association tracking method based on metric learning - Google Patents

Multi-camera target association tracking method based on metric learning

Info

Publication number
CN111192297A
CN111192297A (application CN201911407164.5A)
Authority
CN
China
Prior art keywords
idi
tracking
img
frame
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911407164.5A
Other languages
Chinese (zh)
Inventor
靖伟
刘文天
邹京伦
李东进
刘庆宝
聂万庆
王景泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Wide Area Technology Co ltd
Original Assignee
Shandong Wide Area Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Wide Area Technology Co ltd filed Critical Shandong Wide Area Technology Co ltd
Priority to CN201911407164.5A
Publication of CN111192297A
Pending legal-status Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion
    • G06T 7/292 - Multi-camera tracking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10016 - Video; Image sequence

Abstract

The invention belongs to the technical field of computer video image processing, and particularly relates to a multi-camera target association tracking method based on metric learning. The method aims to overcome the technical defects that tracking information cannot be shared, and is easily confused, among multiple cameras, and that a single camera loses the target under occlusion; the tracking algorithm can further be embedded into hardware equipment, improving its stability and practicability. The multi-camera target association tracking method comprises the following steps: simultaneously acquiring a video set of a target object in a target place through n cameras; detecting and tracking the target object to obtain a tracking result; extracting detection frames from the tracking result; sending the extracted detection frames into a deep convolutional network and extracting feature vectors; calculating the cosine distance; judging through threshold comparison whether objects in the same frame are similar; and, upon successful association, implementing the tracking.

Description

Multi-camera target association tracking method based on metric learning
Technical Field
The invention belongs to the technical field of computer video image processing, and particularly relates to a multi-camera target association tracking method based on metric learning.
Background
With the development of computer video image processing technology, multi-target tracking has important applications in fields such as intelligent monitoring, action and behavior analysis, and automatic driving. Given an image sequence, multi-target tracking finds the moving objects in it, identifies the moving objects across different frames, and assigns each an accurate target identifier; the moving objects can be pedestrians, vehicles, animals, or any other objects. With the progress of target detection in recent years, tracking-by-detection has become the mainstream approach to multi-target tracking. Earlier work formulated the association of detections as flow-network or probabilistic graphical model problems, but these solve a global optimization over the whole sequence and are therefore unsuitable for online scenarios. Common tracking algorithms applied after detection are the SORT algorithm and the Deep Sort algorithm. The SORT algorithm uses simple Kalman filtering to handle frame-by-frame data association and the Hungarian algorithm for the association metric; this simple scheme performs well at high frame rates, but because it ignores the surface appearance of the detected objects, it is accurate only when the uncertainty of the object state estimate is low. The Deep Sort algorithm replaces that association metric with a more reliable one, training a convolutional neural network on a large-scale pedestrian data set to extract appearance features, which increases robustness to occlusion and target loss. Although combining the Yolo and Deep Sort algorithms for multi-target tracking has become one of the mainstream approaches, identity errors between an object in one frame and the same object in the next frame still occur frequently under person occlusion, crossing, and other situations caused by complex environments. If this algorithm is used for multi-camera tracking, then even though the fields of view of the cameras share a common area, the tracking information of the cameras cannot be associated, so the final multi-target tracking is of low precision and the information of the target object is erroneous.
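For illustration of the association step described above, the following is a minimal sketch, in Python, of how a SORT-style tracker matches predicted track boxes to new detections with the Hungarian algorithm. The (x1, y1, x2, y2) box format, the iou helper, and the 0.3 IoU threshold are assumptions made for this sketch and are not taken from the patent.

```python
# Minimal sketch of SORT-style detection-to-track association (illustrative only).
# Boxes are assumed to be (x1, y1, x2, y2); the 0.3 IoU threshold is an assumption.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, iou_min=0.3):
    """Match predicted track boxes to new detections via the Hungarian algorithm."""
    if not tracks or not detections:
        return [], list(range(len(tracks))), list(range(len(detections)))
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)          # minimize total (1 - IoU)
    matches = [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_min]
    unmatched_tracks = [i for i in range(len(tracks)) if i not in {r for r, _ in matches}]
    unmatched_dets = [j for j in range(len(detections)) if j not in {c for _, c in matches}]
    return matches, unmatched_tracks, unmatched_dets
```

The Deep Sort algorithm augments this purely geometric cost with appearance features produced by a convolutional network, which is the same idea the present method reuses for associating targets across cameras.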
Disclosure of Invention
The invention provides a multi-camera target association tracking method based on metric learning, which aims to overcome the technical defects that tracking information cannot be shared, and is easily confused, among multiple cameras, and that occlusion causes a single camera to lose the target; the tracking algorithm can further be embedded into hardware equipment so as to improve its stability and practicability.
In order to solve the technical problems, the invention adopts the following technical scheme:
a multi-camera target association tracking method based on metric learning comprises the following steps:
step 1, simultaneously acquiring a video set V = {V_1, V_2, …, V_n} of a target object in a target site through n cameras; wherein V_i represents the video captured by the camera with index i, and V_i = {img_1, img_2, …, img_n}, where img_i represents the i-th frame picture in the video V_i;
step 2, using the Yolo algorithm to detect the target object in V_i, and tracking with the Deep Sort algorithm to obtain the detection frame set DF = {df_1, df_2, …, df_n} and the identity set ID = {id_1, id_2, …, id_n} of the objects in V_i; wherein id_i represents the object named i in V_i; then saving the tracking result of each frame, E = {(df_1, id_1), (df_2, id_2), …, (df_n, id_n)}, to the local;
step 3, extracting the tracking results of p cameras from the tracking result E; let p = 2 with the camera numbers j and k, respectively; extracting the same frame of video captured by cameras j and k, i.e. the same frame img_i in V_j and V_k; respectively extracting the detection frames of q objects in img_i, DF_1 = {d_1, d_2, …, d_q} and DF_2 = {f_1, f_2, …, f_q}; wherein d_i represents the detection frame in img_i of V_j whose identity is id_i, and f_i represents the detection frame in img_i of V_k whose identity is id_i;
step 4, sending the extracted detection frames DF_1 and DF_2 into a deep convolution network and extracting the feature vectors of d_i and f_i, recorded respectively as r_Vj = {r_1^{id_i}, r_2^{id_i}, …, r_n^{id_i}} and r_Vk = {r_1^{id_i}, r_2^{id_i}, …, r_n^{id_i}}; wherein r_Vj and r_Vk represent the appearance features of id_i acquired by cameras j and k in frame img_i;
step 5, for r_Vj = {r_1^{id_i}, r_2^{id_i}, …, r_n^{id_i}} and r_Vk = {r_1^{id_i}, r_2^{id_i}, …, r_n^{id_i}}, calculating the cosine distance, wherein the calculation formula is shown as formula (1):
Cos(dis(r_Vj, r_Vk)) = 1 - r_Vj^T · r_Vk        (1)
wherein r_Vj^T is the transpose of r_Vj;
step 6, judging through a threshold α whether the objects in the same frame of V_j and V_k are similar, the threshold α being obtained by training; when Cos(dis(r_Vj, r_Vk)) ≤ α, d_i and f_i are successfully associated, and it is judged that the objects with identity id_i in the two detection frames are the same; otherwise, returning to step 5 to continue the cosine calculation until all comparisons are finished;
step 7, when d_i and f_i are successfully associated, assigning the id_i corresponding to f_i to the id_i corresponding to d_i.
The invention provides a multi-camera target association tracking method based on metric learning, which comprises the following steps: simultaneously acquiring a video set of a target object in a target place through n cameras; detecting and tracking the target object to obtain a tracking result; extracting detection frames from the tracking result; sending the extracted detection frames into a deep convolutional network and extracting feature vectors; calculating the cosine distance; judging through threshold comparison whether objects in the same frame are similar; and, upon successful association, implementing the tracking. A multi-camera target association tracking method with these steps improves the recording effect by recording with multiple cameras simultaneously; it extracts the appearance features of the objects in the detection frames with a deep neural network, which is simple and highly robust; it adopts the cosine distance as the metric learning measure, which gauges the similarity between feature vectors better because it does not depend on their absolute magnitudes; and it adopts an embedded deployment, which improves the stability and practicability of the algorithm. It also handles the person occlusion, crossing, and other situations caused by complex environments in the multi-target tracking problem, and can effectively improve the tracking precision.
Drawings
Fig. 1 is a schematic view of an application scenario of a multi-camera target association tracking method based on metric learning according to the present invention;
fig. 2 is a schematic structural diagram of an embedded device of the multi-camera target association tracking method based on metric learning according to the present invention.
Detailed Description
The invention provides a multi-camera target association tracking method based on metric learning, which aims to overcome the technical defects that tracking information cannot be shared, and is easily confused, among multiple cameras, and that occlusion causes a single camera to lose the target; the tracking algorithm can further be embedded into hardware equipment so as to improve its stability and practicability.
A multi-camera target association tracking method based on metric learning comprises the following steps:
Step 1, simultaneously acquiring a video set V = {V_1, V_2, …, V_n} of a target object in a target site through n cameras; wherein V_i represents the video captured by the camera with index i, and V_i = {img_1, img_2, …, img_n}, where img_i represents the i-th frame picture in the video V_i. It is noted that, as shown in fig. 1, two cameras facing different directions are set up to capture the current scene.
Step 2, using the Yolo algorithm to detect the target object in V_i, and tracking with the Deep Sort algorithm to obtain the detection frame set DF = {df_1, df_2, …, df_n} and the identity set ID = {id_1, id_2, …, id_n} of the objects in V_i; wherein id_i represents the object named i in V_i; then saving the tracking result of each frame, E = {(df_1, id_1), (df_2, id_2), …, (df_n, id_n)}, to the local.
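As a concrete illustration of the per-frame tracking result E of step 2, the sketch below shows one possible way to organize and save it. Here detect_objects and update_tracker are hypothetical placeholders standing in for a YOLO-style detector and a Deep SORT-style tracker, and the JSON layout is an assumption, since the patent does not prescribe a particular implementation or file format.

```python
# Sketch of building the tracking result E = {(df_1, id_1), ..., (df_n, id_n)} per frame.
# detect_objects() and update_tracker() are hypothetical placeholders for the detector/tracker.
import json
from typing import List, Tuple

BBox = Tuple[float, float, float, float]   # (x1, y1, x2, y2), an assumed box format

def track_camera(frames, detect_objects, update_tracker, out_path: str):
    """Run detection + tracking on one camera's frames and save E for every frame."""
    results = []                                        # E for the whole video V_i
    for frame_index, img in enumerate(frames):
        boxes: List[BBox] = detect_objects(img)         # step 2: detection
        tracked = update_tracker(img, boxes)            # step 2: tracking
        # tracked is assumed to be a list of (bbox, identity) pairs, i.e. (df_i, id_i)
        results.append({"frame": frame_index,
                        "objects": [{"bbox": list(df), "id": obj_id}
                                    for df, obj_id in tracked]})
    with open(out_path, "w") as fh:                     # "saving to the local" in step 2
        json.dump(results, fh)
    return results
```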
Step 3, extracting the tracking results of p cameras from the tracking result E; let p = 2 with the camera numbers j and k, respectively; extracting the same frame of video captured by cameras j and k, i.e. the same frame img_i in V_j and V_k; respectively extracting the detection frames of q objects in img_i, DF_1 = {d_1, d_2, …, d_q} and DF_2 = {f_1, f_2, …, f_q}; wherein d_i represents the detection frame in img_i of V_j whose identity is id_i, and f_i represents the detection frame in img_i of V_k whose identity is id_i.
Step 4, sending the extracted detection frames DF_1 and DF_2 into a deep convolution network and extracting the feature vectors of d_i and f_i, recorded respectively as r_Vj = {r_1^{id_i}, r_2^{id_i}, …, r_n^{id_i}} and r_Vk = {r_1^{id_i}, r_2^{id_i}, …, r_n^{id_i}}; wherein r_Vj and r_Vk represent the appearance features of id_i acquired by cameras j and k in frame img_i.
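The patent does not name a specific deep convolution network for step 4. The sketch below assumes a pretrained ResNet-50 trunk from torchvision with its classification head removed, and L2-normalizes the output so that the cosine distance of formula (1) reduces to one minus a dot product; the crop size and the choice of backbone are assumptions made for illustration.

```python
# Sketch of extracting appearance features for the cropped detection frames (step 4).
# ResNet-50 and the 128x64 crop size are assumptions; the patent only requires a deep convolutional network.
import torch
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as T

_backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
_backbone.fc = torch.nn.Identity()          # drop the classifier, keep the 2048-d embedding
_backbone.eval()

_preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((128, 64)),                    # pedestrian-style crop size, an assumption
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def appearance_features(crops):
    """Map a list of HxWx3 uint8 crops (d_i or f_i) to L2-normalized feature vectors r."""
    batch = torch.stack([_preprocess(c) for c in crops])
    feats = _backbone(batch)                # shape (N, 2048)
    return F.normalize(feats, dim=1)        # unit length, so cosine distance = 1 - dot product
```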
Step 5, for r_Vj = {r_1^{id_i}, r_2^{id_i}, …, r_n^{id_i}} and r_Vk = {r_1^{id_i}, r_2^{id_i}, …, r_n^{id_i}}, calculating the cosine distance, wherein the calculation formula is shown as formula (1):
Cos(dis(r_Vj, r_Vk)) = 1 - r_Vj^T · r_Vk        (1)
wherein r_Vj^T is the transpose of r_Vj.
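A small numeric check of formula (1), assuming the feature vectors are (or are normalized to) unit length; the example vectors are purely illustrative.

```python
# Formula (1): Cos(dis(r_Vj, r_Vk)) = 1 - r_Vj^T · r_Vk, for L2-normalized features.
import numpy as np

def cosine_distance(r_vj: np.ndarray, r_vk: np.ndarray) -> float:
    """Cosine distance of two feature vectors; 0 means identical direction, 2 means opposite."""
    r_vj = r_vj / np.linalg.norm(r_vj)      # normalize defensively in case inputs are not unit length
    r_vk = r_vk / np.linalg.norm(r_vk)
    return 1.0 - float(r_vj @ r_vk)

# Illustrative values: nearly parallel vectors give a distance close to 0.
a = np.array([0.6, 0.8, 0.0])
b = np.array([0.6, 0.79, 0.05])
print(round(cosine_distance(a, b), 4))      # ~0.0013, i.e. "similar" under a trained threshold α
```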
Step 6, judging through a threshold α whether the objects in the same frame of V_j and V_k are similar, the threshold α being obtained by training; when Cos(dis(r_Vj, r_Vk)) ≤ α, d_i and f_i are successfully associated, and it is judged that the objects with identity id_i in the two detection frames are the same; otherwise, returning to step 5 to continue the cosine calculation until all comparisons are finished.
Step 7, when d_i and f_i are successfully associated, assigning the id_i corresponding to f_i to the id_i corresponding to d_i.
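Putting steps 5 to 7 together, the sketch below compares the features of camera j's detections against those of camera k for one shared frame img_i and, for each pair whose cosine distance is at or below the threshold, gives d_i the identity of f_i. The nearest-match strategy, the dictionary layout, and the placeholder value α = 0.2 are assumptions for illustration; in the patent, α is obtained by training.

```python
# Sketch of steps 5-7: threshold the cosine distance and propagate identities across cameras.
import numpy as np

ALPHA = 0.2  # illustrative placeholder; in the patent, the threshold α is obtained by training

def associate_across_cameras(cam_j_objs, cam_k_objs, alpha=ALPHA):
    """
    cam_j_objs / cam_k_objs: lists of dicts {"id": ..., "feature": unit-length np.ndarray}
    describing the detections d_i / f_i of the same frame img_i seen by cameras j and k.
    Returns a dict mapping each associated camera-j identity to the camera-k identity it received.
    """
    id_map = {}
    for dj in cam_j_objs:                                    # detection d_i from camera j
        best_fk, best_dist = None, None
        for fk in cam_k_objs:                                # detection f_i from camera k
            dist = 1.0 - float(dj["feature"] @ fk["feature"])    # formula (1) for unit vectors
            if best_dist is None or dist < best_dist:
                best_fk, best_dist = fk, dist
        if best_fk is not None and best_dist <= alpha:       # step 6: threshold comparison
            id_map[dj["id"]] = best_fk["id"]                 # step 7: d_i takes the id of f_i
            dj["id"] = best_fk["id"]
    return id_map

# Illustrative usage with 2-d "features":
j_objs = [{"id": 3, "feature": np.array([1.0, 0.0])}]
k_objs = [{"id": 7, "feature": np.array([0.995, 0.0999])}]   # roughly unit length
print(associate_across_cameras(j_objs, k_objs))              # {3: 7}, distance well below α
```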
Therefore, the multi-camera target association tracking method based on metric learning realizes the association of targets across the multiple cameras.
It should be added that, in order to further broaden the application scenarios of the multi-camera target association tracking method, the contents of steps 1 to 7 may be embedded into a hardware device, for example the embedded terminal device shown in fig. 2. In particular, the embedded terminal device preferably uses PCIe (x8) slots, is designed specifically for edge artificial intelligence and machine vision applications, and adopts an intelligent power supply, so that it has the advantages of low cost, little waste heat, and a more stable system.
The invention provides a multi-camera target association tracking method based on metric learning, which comprises the following steps: simultaneously acquiring a video set of a target object in a target place through n cameras; detecting and tracking the target object to obtain a tracking result; extracting detection frames from the tracking result; sending the extracted detection frames into a deep convolutional network and extracting feature vectors; calculating the cosine distance; judging through threshold comparison whether objects in the same frame are similar; and, upon successful association, implementing the tracking. A multi-camera target association tracking method with these steps improves the recording effect by recording with multiple cameras simultaneously; it extracts the appearance features of the objects in the detection frames with a deep neural network, which is simple and highly robust; it adopts the cosine distance as the metric learning measure, which gauges the similarity between feature vectors better because it does not depend on their absolute magnitudes; and it adopts an embedded deployment, which improves the stability and practicability of the algorithm. It also handles the person occlusion, crossing, and other situations caused by complex environments in the multi-target tracking problem, and can effectively improve the tracking precision.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (1)

1. A multi-camera target association tracking method based on metric learning is characterized by comprising the following steps:
step 1, simultaneously acquiring a video set V = {V_1, V_2, …, V_n} of a target object in a target site through n cameras; wherein V_i represents the video captured by the camera with index i, and V_i = {img_1, img_2, …, img_n}, where img_i represents the i-th frame picture in the video V_i;
step 2, using the Yolo algorithm to detect the target object in V_i, and tracking with the Deep Sort algorithm to obtain the detection frame set DF = {df_1, df_2, …, df_n} and the identity set ID = {id_1, id_2, …, id_n} of the objects in V_i; wherein id_i represents the object named i in V_i; then saving the tracking result of each frame, E = {(df_1, id_1), (df_2, id_2), …, (df_n, id_n)}, to the local;
step 3, extracting the tracking results of p cameras from the tracking result E; let p = 2 with the camera numbers j and k, respectively; extracting the same frame of video captured by cameras j and k, i.e. the same frame img_i in V_j and V_k; respectively extracting the detection frames of q objects in img_i, DF_1 = {d_1, d_2, …, d_q} and DF_2 = {f_1, f_2, …, f_q}; wherein d_i represents the detection frame in img_i of V_j whose identity is id_i, and f_i represents the detection frame in img_i of V_k whose identity is id_i;
step 4, sending the extracted detection frames DF_1 and DF_2 into a deep convolution network and extracting the feature vectors of d_i and f_i, recorded respectively as r_Vj = {r_1^{id_i}, r_2^{id_i}, …, r_n^{id_i}} and r_Vk = {r_1^{id_i}, r_2^{id_i}, …, r_n^{id_i}}; wherein r_Vj and r_Vk represent the appearance features of id_i acquired by cameras j and k in frame img_i;
step 5, for r_Vj = {r_1^{id_i}, r_2^{id_i}, …, r_n^{id_i}} and r_Vk = {r_1^{id_i}, r_2^{id_i}, …, r_n^{id_i}}, calculating the cosine distance, wherein the calculation formula is shown as formula (1):
Cos(dis(r_Vj, r_Vk)) = 1 - r_Vj^T · r_Vk        (1)
wherein r_Vj^T is the transpose of r_Vj;
step 6, judging through a threshold α whether the objects in the same frame of V_j and V_k are similar, the threshold α being obtained by training; when Cos(dis(r_Vj, r_Vk)) ≤ α, d_i and f_i are successfully associated, and it is judged that the objects with identity id_i in the two detection frames are the same; otherwise, returning to step 5 to continue the cosine calculation until all comparisons are finished;
step 7, when d_i and f_i are successfully associated, assigning the id_i corresponding to f_i to the id_i corresponding to d_i.
CN201911407164.5A 2019-12-31 2019-12-31 Multi-camera target association tracking method based on metric learning Pending CN111192297A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911407164.5A CN111192297A (en) 2019-12-31 2019-12-31 Multi-camera target association tracking method based on metric learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911407164.5A CN111192297A (en) 2019-12-31 2019-12-31 Multi-camera target association tracking method based on metric learning

Publications (1)

Publication Number Publication Date
CN111192297A true CN111192297A (en) 2020-05-22

Family

ID=70709696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911407164.5A Pending CN111192297A (en) 2019-12-31 2019-12-31 Multi-camera target association tracking method based on metric learning

Country Status (1)

Country Link
CN (1) CN111192297A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344792A (en) * 2018-10-18 2019-02-15 电子科技大学 A kind of Motion parameters tracking
CN109934844A (en) * 2019-01-28 2019-06-25 中国人民解放军战略支援部队信息工程大学 A kind of multi-object tracking method and system merging geospatial information
CN110197502A (en) * 2019-06-06 2019-09-03 山东工商学院 A kind of multi-object tracking method that identity-based identifies again and system
CN110378931A (en) * 2019-07-10 2019-10-25 成都数之联科技有限公司 A kind of pedestrian target motion track acquisition methods and system based on multi-cam
CN110443210A (en) * 2019-08-08 2019-11-12 北京百度网讯科技有限公司 A kind of pedestrian tracting method, device and terminal

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Nicolai Wojke et al., "Simple Online and Realtime Tracking with a Deep Association Metric", arXiv, 21 March 2017 (2017-03-21) *
想自由的MONSTER, "Multi-target tracking DeepSort: code and principle analysis" (in Chinese), retrieved from the Internet <URL: https://blog.csdn.net/weixin_42823393/article/details/100335737> *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111654668A (en) * 2020-05-26 2020-09-11 李绍兵 Monitoring equipment synchronization method and device and computer terminal
CN112200021A (en) * 2020-09-22 2021-01-08 燕山大学 Target crowd tracking and monitoring method based on limited range scene
CN112200021B (en) * 2020-09-22 2022-07-01 燕山大学 Target crowd tracking and monitoring method based on limited range scene
CN112200841A (en) * 2020-09-30 2021-01-08 杭州海宴科技有限公司 Cross-domain multi-camera tracking method and device based on pedestrian posture
CN112381132A (en) * 2020-11-11 2021-02-19 上汽大众汽车有限公司 Target object tracking method and system based on fusion of multiple cameras
CN115278042A (en) * 2021-04-30 2022-11-01 西门子股份公司 Method and apparatus for setting frame rate in image processing, and computer readable medium
CN115278042B (en) * 2021-04-30 2023-11-28 西门子股份公司 Method, apparatus and computer readable medium for setting frame rate in image processing

Similar Documents

Publication Publication Date Title
CN111192297A (en) Multi-camera target association tracking method based on metric learning
CN111462200B (en) Cross-video pedestrian positioning and tracking method, system and equipment
CN110349250B (en) RGBD camera-based three-dimensional reconstruction method for indoor dynamic scene
CN103246896B (en) A kind of real-time detection and tracking method of robustness vehicle
CN108537829B (en) Monitoring video personnel state identification method
CN110458025B (en) Target identification and positioning method based on binocular camera
CN102982537B (en) A kind of method and system detecting scene change
CN103164858A (en) Adhered crowd segmenting and tracking methods based on superpixel and graph model
CN102243765A (en) Multi-camera-based multi-objective positioning tracking method and system
CN105160649A (en) Multi-target tracking method and system based on kernel function unsupervised clustering
CN110827321B (en) Multi-camera collaborative active target tracking method based on three-dimensional information
CN104301712A (en) Monitoring camera shaking detection method based on video analysis
CN101324958A (en) Method and apparatus for tracking object
CN114022910A (en) Swimming pool drowning prevention supervision method and device, computer equipment and storage medium
Landabaso et al. Foreground regions extraction and characterization towards real-time object tracking
CN112634368A (en) Method and device for generating space and OR graph model of scene target and electronic equipment
CN114529583B (en) Power equipment tracking method and tracking system based on residual regression network
CN101877135B (en) Moving target detecting method based on background reconstruction
CN106023252A (en) Multi-camera human body tracking method based on OAB algorithm
CN116342645A (en) Multi-target tracking method for natatorium scene
Min et al. COEB-SLAM: A Robust VSLAM in Dynamic Environments Combined Object Detection, Epipolar Geometry Constraint, and Blur Filtering
Biswas et al. Short local trajectory based moving anomaly detection
CN112163502B (en) Visual positioning method under indoor dynamic scene
Lee et al. Fast people counting using sampled motion statistics
CN112001252A (en) Multi-target tracking method based on heteromorphic graph network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination