CN111898438A - Multi-target tracking method and system for monitoring scene


Info

Publication number
CN111898438A
Authority
CN
China
Prior art keywords
target
frame
target object
video stream
determining
Prior art date
Legal status
Pending
Application number
CN202010604786.3A
Other languages
Chinese (zh)
Inventor
王苫社
Current Assignee
Peking University
Original Assignee
Peking University
Application filed by Peking University
Priority to CN202010604786.3A
Publication of CN111898438A

Classifications

    • G06V20/13: Scenes; Scene-specific elements; Terrestrial scenes; Satellite images
    • G06V20/42: Higher-level, semantic clustering, classification or understanding of video scenes (e.g. detection, labelling or Markovian modelling of sport events or news items) of sport video content
    • G06V2201/07: Indexing scheme relating to image or video recognition or understanding; Target detection

Abstract

The application provides a multi-target tracking method, device, and system for a monitored scene. The method comprises the following steps: acquiring a video stream obtained by an aerial photography drone shooting a target site from overhead, wherein the drone hovers above the target site and its shooting field of view covers the target site; identifying target objects in each frame of the video stream based on color features; taking each target object as a tracking target and determining its motion track in the video stream with a target tracking algorithm; determining motion track data of the target object in a plane coordinate system from its motion track in the video stream; and determining the movement distance of the target object from its motion track data. The method and device are simple to implement, low in implementation cost, fast in data processing, and high in accuracy.

Description

Multi-target tracking method and system for monitoring scene
Technical Field
The application relates to the technical field of image processing, in particular to a multi-target tracking method, a multi-target tracking device and a multi-target tracking system for a monitored scene.
Background
With the rapid progress of computer technology and video monitoring technology, video monitoring is applied more and more widely in daily life.
For example, in some applications it is desirable to track and analyze the random movement of a fixed number of people within a fixed site. The current mainstream approach installs 8 or even 16 thermal imaging cameras around the target site to acquire thermal imaging information of the people on the field, and then derives person tracking from that information through several data processing stages, such as affine transformation between coordinate systems, manually assisted person identification, and re-identification of occluded persons, finally obtaining each person's movement distance. However, this approach requires many thermal imaging cameras and background computing devices, and suffers from a complex data processing pipeline, a large processing load, complicated implementation, high implementation cost, and low data processing efficiency.
Disclosure of Invention
The application aims to provide a multi-target tracking method, device and system for a monitored scene.
A first aspect of the application provides a multi-target tracking method for a monitored scene, which comprises the following steps:
acquiring a video stream obtained by an aerial photography drone shooting a target site from overhead, wherein the drone hovers above the target site and its shooting field of view covers the target site;
identifying a target object in each frame of the video stream based on color characteristics;
taking the target object as a tracking target, and determining the motion track of the target object in the video stream by adopting a target tracking algorithm;
determining motion track data of the target object in a plane coordinate system according to the motion track of the target object in the video stream;
and determining the movement distance of the target object according to the motion track data of the target object.
The second aspect of the present application provides a monitoring scene multi-target tracking apparatus, including:
the video stream acquisition module is used for acquiring a video stream obtained by an aerial photography drone shooting a target site from overhead, wherein the drone hovers above the target site and its shooting field of view covers the target site;
a target identification module for identifying a target object in each frame of the video stream based on color characteristics;
the target tracking module is used for determining the motion track of the target object in the video stream by taking the target object as a tracking target and adopting a target tracking algorithm;
the motion track determining module is used for determining motion track data of the target object in a plane coordinate system according to the motion track of the target object in the video stream;
and the movement distance determining module is used for determining the movement distance of the target object according to the movement track data of the target object.
A third aspect of the application provides a multi-target tracking system for a monitored scene, comprising an aerial photography drone and a background data processing device connected with the drone, wherein:
the aerial photography drone hovers above a target site, shoots the target site from directly overhead, and sends the captured video stream to the background data processing device;
the background data processing device is configured to determine, from the video stream, the movement distance of each target object in the target site by using the method provided in the first aspect of the application, and to output the movement distance of each target object.
Compared with the prior art, the multi-target tracking method for monitored scenes provided by the application obtains the video stream by shooting the target site from directly overhead with an aerial photography drone, so each picture is an undistorted two-dimensional plan view. Target objects can therefore be identified and mapped into the plane coordinate system of the target site directly, without any affine transformation between coordinate systems; and because target objects do not frequently occlude one another, as they do in video shot by field-side cameras, frequent target re-identification is unnecessary. This effectively simplifies the data processing pipeline and improves processing efficiency. Since no affine transformation is applied, image fidelity is higher, so the motion track data detected from this video stream, and the movement distance calculated from it, are more accurate than in the prior art. In addition, target objects can be accurately identified and tracked from color features alone, relying only on the color difference between the target objects and the target site; compared with prior-art identification based on thermal imaging information and manual assistance, the algorithm is simpler, data processing is more efficient, and the results are more accurate. Finally, because the data processing load is small and the algorithm is simple and efficient, the requirements on the background data processing device are low: a single aerial photography drone and one background data processing device suffice, so the method is also simple and inexpensive to implement.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 illustrates a flow diagram of a method for monitoring scene multi-target tracking, provided by some embodiments of the present application;
FIG. 2 illustrates a schematic diagram of a monitored scene multi-target tracking apparatus, provided by some embodiments of the present application;
fig. 3 illustrates a schematic diagram of a background data processing device provided in some embodiments of the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which this application belongs.
In addition, the terms "first" and "second", etc. are used to distinguish different objects rather than to describe a particular order. Furthermore, the terms "include" and "have", as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the steps or elements listed, but may include other steps or elements not listed or inherent to such a process, method, article, or apparatus.
The embodiment of the application provides a multi-target tracking method, a multi-target tracking device and a multi-target tracking system for a monitored scene, and the following description is given by combining the embodiment and the accompanying drawings.
Referring to fig. 1, which is a flowchart of a multi-target tracking method for a monitored scene according to some embodiments of the present application, the method may include the following steps:
step S101: acquiring a video stream obtained by an aerial photography drone shooting a target site from overhead, wherein the drone hovers above the target site and its shooting field of view covers the target site;
step S102: identifying a target object in each frame of the video stream based on color characteristics;
step S103: taking the target object as a tracking target, and determining the motion track of the target object in the video stream by adopting a target tracking algorithm;
step S104: determining motion track data of the target object in a plane coordinate system according to the motion track of the target object in the video stream;
step S105: and determining the movement distance of the target object according to the motion track data of the target object.
Compared with the prior art, the multi-target tracking method for monitored scenes provided by the embodiments of the application obtains the video stream by shooting the target site from directly overhead with an aerial photography drone, so each picture is an undistorted two-dimensional plan view. Target objects can therefore be identified and mapped into the plane coordinate system of the target site directly, without any affine transformation between coordinate systems; and because target objects do not frequently occlude one another, as they do in video shot by field-side cameras, frequent target re-identification is unnecessary. This effectively simplifies the data processing pipeline and improves processing efficiency. Since no affine transformation is applied, image fidelity is higher, so the motion track data detected from this video stream, and the movement distance calculated from it, are more accurate than in the prior art. In addition, target objects can be accurately identified and tracked from color features alone, relying only on the color difference between the target objects and the target site; compared with prior-art identification based on thermal imaging information and manual assistance, the algorithm is simpler, data processing is more efficient, and the results are more accurate. Finally, because the data processing load is small and the algorithm is simple and efficient, the requirements on the background data processing device are low: a single aerial photography drone and one background data processing device suffice, so the method is also simple and inexpensive to implement.
In some variations of embodiments of the present application, the identifying a target object in each frame of the video stream based on color features includes:
and identifying the target object in each frame of the video stream by adopting a background difference method.
In some modified embodiments of the present application, determining the motion track of the target object in the video stream by taking the target object as a tracking target and adopting a target tracking algorithm includes:
taking the target object as a tracking target, generating a target frame surrounding the tracking target in the current frame, and determining a central point of the target frame as a track point of the target object in the current frame;
generating a plurality of candidate frames in the vicinity of the corresponding position of the target frame in the next frame;
calculating the image similarity of each candidate frame and the target frame;
determining the central point of the candidate frame with the highest image similarity as a track point of the target object in the next frame;
and after determining the track points of the target object in each frame, connecting the track points of the target object in each frame to form a motion track of the target object in the video stream.
In some variations of the embodiments of the present application, the generating a plurality of candidate frames in the vicinity of the corresponding position of the target frame in the next frame includes:
predicting the position of the target object in the next frame by a motion track prediction method, according to the historical motion track data of the target object;
determining a connecting line between the track point of the target object in the current frame and the predicted position;
and generating a plurality of candidate frames along the connecting line.
In some variations of the embodiments of the present application, before the identifying the target object in each frame of the video stream based on the color feature, the method further includes:
and aligning each frame in the video stream according to the target site mark.
By this embodiment, the frames of the video stream are kept aligned with one another, ensuring the accuracy of the finally detected motion tracks and movement distances.
In some embodiments, the multi-target tracking method is used to detect movement distances of persons based on a video stream, and can detect the movement distances of multiple persons. In this case the method may include the following steps:
step S201: acquiring a real-time video stream obtained by an aerial photography drone shooting a target site from overhead, wherein the drone hovers above the center of the target site and its shooting field of view covers the target site.
The aerial photography drone is a drone equipped with a camera; it can be implemented with a multi-rotor aircraft so that it can hover at a fixed position above the target site. Hovering keeps the frames of the captured real-time video stream registered with one another and avoids picture shake and misalignment, which makes subsequent person identification and motion track tracking straightforward and, in turn, makes the movement distance of each person quick and convenient to determine.
In some embodiments, the aerial photography drone may hover above the center of the target site, so that the captured real-time video stream is centered on the center of the target site. This ensures, to the greatest extent, that the positions of the person images identified from the real-time video stream are highly accurate, and thus that the calculated movement distances are highly accurate.
In addition, because the shooting field of view of the aerial photography drone covers the target site, all persons in the entire target site can be identified and tracked using the pictures shot by a single drone.
After step S201, the motion track of each person image in the real-time video stream is determined by a color-feature-based target detection algorithm and a target tracking algorithm, using the color difference between the person images in the real-time video stream and the ground color of the target site:
Step S202: identifying the person image in each frame of the real-time video stream based on color features, according to the color difference between the persons and the ground color of the target site.
Step S203: determining the motion track of each person image in the real-time video stream by taking the person image as a tracking target and adopting a target tracking algorithm.
Consider the colors involved. The ground color of the target site is generally green, or alternating light green and dark green, so the background color is simple. In the overhead frames of the real-time video stream, only a person's head and part of the torso are visible, so a person's appearance is dominated by hair color and clothing color. Hair color is generally black, yellow, or another color clearly distinguishable from the ground color, and green clothing is rare, so persons can be conveniently distinguished from the background by color features alone. A color-feature-based target detection algorithm can therefore identify the person image corresponding to each person in the real-time video stream, after which target tracking is performed and the movement distance of the person is calculated.
Compared with the prior art, where person identification in video shot by field-side cameras requires multi-dimensional features such as face, height, and body shape, far fewer features are used here, so target identification and tracking require less computation, run more efficiently, and can achieve higher accuracy.
It should be noted that any color-feature-based target detection algorithm provided by the prior art may be used to identify person images in the real-time video stream, and likewise any color-feature-based target tracking algorithm provided by the prior art may be used to track them.
Step S204: determining the motion track data of the person corresponding to each person image from the motion track of the person image in the real-time video stream, based on the mapping relationship between the pixel coordinate system of the real-time video stream and the plane coordinate system of the target site.
The picture of the real-time video stream is measured in pixels, whereas the actual position of a person in the target site is measured in the plane coordinate system of the site. To obtain a person's motion track data and movement distance, a mapping must therefore be established between the pixel coordinate system of the real-time video stream and the plane coordinate system of the target site. Using this mapping and the pixel coordinates of the person image in the real-time video stream, the physical coordinates of the corresponding person in the target site are determined; the physical coordinates of the person's track points, arranged in time order, then form the motion track data from which the movement distance is calculated.
For example, the pixel coordinates (m, n) of a person image in the real-time video stream are converted through the coordinate system mapping into position coordinates (x, y) in the plane coordinate system of the target site. The mapping relationship can be determined from the target site mark: if the center line of the target site is 1000 pixels long in the real-time video stream and 50 meters long in the plane coordinate system, then one pixel represents 0.05 meters.
The motion track data can include the time information corresponding to each frame, the physical coordinates of the person, and other information. From these, the speed, acceleration, and movement direction at each track point (physical coordinate) can be calculated, enabling a comprehensive analysis and evaluation of the person's movement.
Step S205: determining the movement distance of each person according to the person's motion track data.
After the motion track data of a person is determined, the straight-line distances between each pair of adjacent track points can be calculated in track order (time order, frame order) from the position coordinates of the track points, and all the calculated straight-line distances are then summed to obtain the person's movement distance.
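As an illustration only (not part of the claimed method), the following Python sketch maps pixel track points to plane coordinates and sums the segment lengths, assuming the uniform 0.05 m/pixel scale from the example above; the function names and sample track are illustrative:

```python
import math

# Illustrative scale from the example above: a 1000-pixel center line
# corresponds to 50 m, so one pixel represents 0.05 m (assumes a uniform,
# distortion-free mapping across the whole picture).
METERS_PER_PIXEL = 0.05

def to_plane_coords(pixel_track):
    """Map track points (m, n) in the pixel coordinate system to (x, y) in meters."""
    return [(m * METERS_PER_PIXEL, n * METERS_PER_PIXEL) for m, n in pixel_track]

def movement_distance(plane_track):
    """Sum the straight-line distances between adjacent track points, in track order."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(plane_track, plane_track[1:]))

# A person's track points in pixel coordinates, one per frame (illustrative).
track_px = [(100, 200), (110, 205), (125, 215)]
print(movement_distance(to_plane_coords(track_px)))  # total movement distance in meters
```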
Compared with the prior art, the multi-target tracking method for monitored scenes provided by the embodiments of the application obtains the real-time video stream by shooting the target site from directly overhead with an aerial photography drone, so each picture is an undistorted two-dimensional plan view. Person identification and mapping into the plane coordinate system of the target site can therefore be performed directly, without any affine transformation between coordinate systems; and because people do not frequently occlude one another, as they do in video shot by field-side cameras, frequent target re-identification is unnecessary. This effectively simplifies the data processing pipeline and improves processing efficiency. Since no affine transformation is applied, image fidelity is higher, so the motion track data measured from the real-time video stream, and the movement distance calculated from it, are more accurate than in the prior art. In addition, person images can be accurately identified and tracked from color features alone, relying only on the color difference between the person images and the ground color of the target site; compared with prior-art identification based on thermal imaging information and manual assistance, the algorithm is simpler, data processing is more efficient, and the results are more accurate. Finally, because the data processing load is small and the algorithm is simple and efficient, the requirements on the background data processing device are low: a single aerial photography drone and one background data processing device suffice, so the method is also simple and inexpensive to implement.
Due to the optical structure of the camera, the captured picture is often distorted, and the distortion is generally located in the edge region of the picture. To measure persons' motion tracks and movement distances accurately, the real-time video stream must be free of distortion at least within the range of the target site, so the hovering height of the aerial photography drone relative to the target site needs to satisfy the following condition:
h = a · L / (2 · tan(θ / 2))
where h represents the hovering height of the aerial photography drone, L represents the length of the target site, θ represents the field-of-view angle of the drone's camera, and a represents a correction coefficient.
When a = 1, the field of view of the aerial photography drone just covers the range of the target site. If edge distortion is detected, a can be set to a value larger than 1, for example 1.1, 1.2, or 1.5; the embodiments of the application are not limited in this respect, and a person skilled in the art can flexibly set the correction coefficient a provided the target site range remains free of distortion. Note that the correction coefficient a is proportional to the hovering height h: the larger a is, the higher the drone flies and, correspondingly, the smaller the person images in the picture become. Since person images that are too small may impair tracking, the value of a should not be too large; in practice it is generally kept below 1.5. This balances distortion against person identification and tracking quality, yielding more accurate detection results.
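As a rough numerical illustration, assuming the reconstructed form of the condition above (h = a·L/(2·tan(θ/2))) and an illustrative 90° field of view:

```python
import math

def hover_height(site_length_m, fov_deg, a=1.0):
    """Hovering height under the assumed condition h = a * L / (2 * tan(theta / 2)).

    a = 1 means the field of view just covers the target site; a > 1 adds margin.
    """
    return a * site_length_m / (2 * math.tan(math.radians(fov_deg) / 2))

# A 50 m target site, a 90-degree field of view, correction coefficient a = 1.2:
print(hover_height(50, 90, a=1.2))  # 30.0 meters
```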
On the basis of the above embodiment, in some modified embodiments, before acquiring the real-time video stream shot from overhead by the aerial photography drone, the method further includes:
before starting, acquiring a debugging picture of the target site shot from overhead by the aerial photography drone;
determining the distortion rate of the debugging picture according to the target site mark in the debugging picture;
and adjusting the correction coefficient according to the distortion rate, and controlling the aerial photography drone to adjust its hovering height according to the adjusted correction coefficient, until the distortion rate of the debugging picture is smaller than a preset distortion rate threshold.
Through this embodiment, the height of the aerial photography drone can be adjusted before starting, ensuring that the subsequently recorded real-time video stream is free of distortion within the range of the target site, thereby improving the accuracy of the detected motion track data and movement distances.
For the same purpose, in other embodiments, before acquiring the real-time video stream shot from overhead by the aerial photography drone, the method further includes:
before starting, acquiring a debugging picture of the target site shot from overhead by the aerial photography drone;
determining the distortion rate of the debugging picture according to the target site mark in the debugging picture;
and performing distortion correction on the camera of the aerial photography drone according to the distortion rate, until the distortion rate of the debugging picture is smaller than a preset distortion rate threshold.
This embodiment can be applied to aerial photography drones that have a distortion correction function: distortion is corrected by adjusting the correction parameters of the drone's camera. The specific adjustment can follow the product manual of the particular drone model and is not repeated here; such implementations also fall within the protection scope of the application. This embodiment allows the drone to shoot from a lower position, yielding clearer pictures in which the target site occupies a larger proportion, which helps improve the accuracy and efficiency of person image detection and tracking.
On the basis of any of the above embodiments, in some modified embodiments, determining the distortion rate of the debugging picture according to the target site mark in the debugging picture includes:
identifying the target site mark in the debugging picture according to the color difference between the ground color of the target site and the target site mark;
detecting the length of the target site end lines and the length of the target site center line in the target site mark;
and determining the distortion rate of the debugging picture according to the difference between the lengths of the target site end lines and the length of the target site center line.
The ground color of the target site is generally green, and a target site mark may be drawn on it. The target site mark may include a rectangle surrounding the periphery of the target site, where the longer sides of the rectangle are called side lines and the shorter sides are called end lines; a line segment parallel to the end lines and located at the center of the target site is called the center line. The target site mark is generally white, so it can be identified in the debugging picture according to this color feature.
Then, the lengths of the target site end lines and of the target site center line are measured in the target site mark. These may be pixel lengths (lengths in the pixel coordinate system, in pixels) or physical lengths (lengths in the plane coordinate system, in meters); the application is not limited in this respect, and either can be used to calculate the distortion rate.
In this way, the distortion rate of the debugging picture can be measured simply by identifying the target site mark in the debugging picture and then using the lengths of the end lines and the center line, which makes the measurement simple and efficient.
Specifically, in some embodiments, determining the distortion rate of the debugging picture according to the difference between the lengths of the target site end lines and the length of the target site center line includes:
determining the distortion rate of the debugging picture from the length of the target site end lines and the length of the target site center line, using the following formula:
d = (x1 + x2 - 2y) / (2y)
where d represents the distortion rate, x1 represents the length of one end line of the target site, x2 represents the length of the other end line of the target site, and y represents the length of the target site center line.
The smaller the absolute value of d, the smaller the distortion: d = 0 indicates no distortion, and d greater than 0 or less than 0 indicates distortion.
The distortion rate measured in this way fully accounts for the differences between the end lines on both sides and the center line, and the final value is relative (a ratio), so it genuinely and effectively characterizes the distortion of the debugging picture.
Further, those skilled in the art can flexibly vary the above embodiment, for example:

d = (x1 + x2) / (2y)

where d represents the distortion rate, x1 represents the length of one end line of the target site, x2 represents the length of the other end line of the target site, and y represents the length of the target site center line.
Here, d = 1 indicates no distortion; d greater than 1 or less than 1 indicates distortion, and the larger the absolute value of the difference between d and 1, the more serious the distortion.
The above are merely exemplary illustrations; such variations are all within the protection scope of the present application.
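For illustration, a minimal Python sketch of both distortion rate formulas above, assuming the reconstructed forms; the lengths passed in are illustrative pixel measurements:

```python
def distortion_rate(x1, x2, y):
    """d = (x1 + x2 - 2y) / (2y), as reconstructed above: d == 0 means no distortion."""
    return (x1 + x2 - 2 * y) / (2 * y)

def distortion_rate_ratio(x1, x2, y):
    """Variant d = (x1 + x2) / (2y), as reconstructed above: d == 1 means no distortion."""
    return (x1 + x2) / (2 * y)

# End lines measured at 980 px and 1005 px, center line at 1000 px (illustrative):
print(distortion_rate(980, 1005, 1000))        # -0.0075, slight distortion
print(distortion_rate_ratio(980, 1005, 1000))  # 0.9925
```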
When the aerial photography drone is directly above the center of the target site, the captured pictures are centered, which helps minimize picture distortion and avoids the asymmetry, and hence distortion within the target site range, that arises when the drone drifts off center. Therefore, in some modified embodiments, the method further includes: controlling the aerial photography drone to move horizontally according to the offset information of the target site center point relative to the picture center in the real-time video stream, until the deviation of the target site center point from the picture center is smaller than a preset offset threshold.
The offset information may be the difference between the coordinates of the target site center point and the coordinates of the picture center. For example, if the pixel coordinates of the target site center point are (1050, 550) and the pixel coordinates of the picture center are (1000, 500), the offset information may be represented as (1050 - 1000, 550 - 500) = (50, 50), and the deviation amount may be taken as the Euclidean distance:

√(50² + 50²) ≈ 70.7 pixels

If the preset offset threshold is 20 pixels, the aerial photography drone needs to be controlled to move horizontally until the deviation of the target site center point from the picture center is smaller than the 20-pixel preset offset threshold.
Through this embodiment, the pictures shot by the aerial photography drone can be kept centered, which helps minimize picture distortion and avoids the distortion within the target site range caused by asymmetric pictures when the drone drifts off center. This ensures, to the greatest extent, that the positions of the person images identified from the real-time video stream are highly accurate, which in turn improves the accuracy of the finally detected movement distances.
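A minimal Python sketch of the centering check, using the illustrative coordinates above:

```python
import math

OFFSET_THRESHOLD_PX = 20  # the preset offset threshold from the example above

def center_offset(site_center_px, frame_center_px):
    """Return the offset vector and its magnitude between site center and picture center."""
    dx = site_center_px[0] - frame_center_px[0]
    dy = site_center_px[1] - frame_center_px[1]
    return (dx, dy), math.hypot(dx, dy)

offset, amount = center_offset((1050, 550), (1000, 500))
print(offset, round(amount, 1))  # (50, 50) 70.7
if amount >= OFFSET_THRESHOLD_PX:
    # Keep moving the drone horizontally until the deviation drops below the threshold.
    print("re-center the aerial photography drone")
```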
In other embodiments, before determining the motion tracks of the person images in the real-time video stream with the color-feature-based target detection algorithm and target tracking algorithm according to the color difference between the person images and the ground color of the target site, the method further includes:
aligning each frame in the real-time video stream according to the target site mark.
By this embodiment, the frames of the real-time video stream are kept aligned with one another, ensuring the accuracy of the finally detected motion tracks and movement distances.
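As an illustration only: one possible way to keep frames aligned is phase correlation between each frame and a reference frame. This concrete technique is an assumption, since the text itself only requires alignment according to the target site mark:

```python
import cv2
import numpy as np

def align_to_reference(frame, reference):
    """Undo the translation of a frame relative to a reference frame.

    A minimal sketch assuming the frames differ mainly by a small horizontal
    drift of the hovering drone, so a pure translation model suffices.
    """
    g_ref = np.float32(cv2.cvtColor(reference, cv2.COLOR_BGR2GRAY))
    g_cur = np.float32(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    (dx, dy), _response = cv2.phaseCorrelate(g_ref, g_cur)
    h, w = frame.shape[:2]
    shift_back = np.float32([[1, 0, -dx], [0, 1, -dy]])  # translate by (-dx, -dy)
    return cv2.warpAffine(frame, shift_back, (w, h))
```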
In some variations of the embodiments of the present application, identifying the person image in each frame of the real-time video stream based on color features according to the color difference between the persons and the ground color of the target site may include:
identifying the person image in each frame of the real-time video stream by a background difference method, according to the color difference between the person images and the ground color of the target site.
Background subtraction detects motion regions from the difference between the current image and a background image, and is currently a mainstream method for detecting moving objects. The algorithm is simple to implement; the subtraction result directly gives the position, size, shape, and other information of the target, providing a complete description of the moving target region with high accuracy, high sensitivity, and good performance. In a specific implementation, a person skilled in the art may adapt any background subtraction method provided by the prior art; all such implementations achieve the purpose of the embodiments and fall within the protection scope of the application.
Because the ground color of the target site is simple, the background difference method can accurately identify the person images in each frame of the real-time video stream, with the advantages of high accuracy, simple operation, and high efficiency.
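For illustration, a minimal background-difference sketch in Python with OpenCV; the difference threshold and minimum blob area are illustrative values, not taken from the patent:

```python
import cv2

def detect_person_boxes(frame, background, diff_threshold=40, min_area=50):
    """Detect person blobs by differencing a frame against a background image."""
    diff = cv2.absdiff(frame, background)                 # per-pixel color difference
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, diff_threshold, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Screen out small blobs (balls, line fragments) by area; the rest are persons.
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
```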
In further variations, identifying the person image in each frame of the real-time video stream based on color features according to the color difference between the persons and the ground color of the target site may include:
for each frame in the real-time video stream, performing binarization processing on the frame according to the color of the ground color of the target site, and separating the person images from the ground color of the target site according to the binarization result.
Since the ground color of the target site is relatively uniform, each frame can be binarized based on the color interval of the ground color: for example, pixels within the color interval (green) are assigned white (RGB: 255, 255, 255) and pixels outside it are assigned black (RGB: 0, 0, 0). After binarization, the black regions are the person images (the black regions also include images of other objects; since these objects have relatively small areas, they can be screened out by the size of each black region, and the remainder are person images).
Because the ground color of the target site is simple, this method can accurately identify the person images in each frame of the real-time video stream; and since binarization is efficient and accurate, the method likewise has the advantages of high accuracy, simple implementation, and high efficiency.
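A minimal Python sketch of this binarization, assuming an illustrative HSV interval for the green ground color:

```python
import cv2
import numpy as np

def binarize_against_green(frame_bgr, lower_hsv=(35, 40, 40), upper_hsv=(85, 255, 255)):
    """Binarize a frame against the green ground color interval.

    Pixels inside the (assumed) green interval become white (255) and all other
    pixels become black (0), matching the description above; the black regions
    then contain the person images.
    """
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    green_mask = cv2.inRange(hsv, np.array(lower_hsv), np.array(upper_hsv))
    return green_mask  # 255 where ground color, 0 where persons and other objects
```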
In some modified embodiments of the present application, determining the motion track of the person image in the real-time video stream by taking the person image as a tracking target and adopting a target tracking algorithm includes:
taking the person image as a tracking target, generating a target frame surrounding the tracking target in the current frame, and determining the central point of the target frame as the track point of the person image in the current frame;
generating a plurality of candidate frames in the vicinity of the corresponding position of the target frame in the next frame;
calculating the image similarity of each candidate frame to the target frame;
determining the central point of the candidate frame with the highest image similarity as the track point of the person image in the next frame;
and after the track points of the person image in every frame are determined, connecting them to form the motion track of the person image in the real-time video stream.
Because the real-time video stream is a plan-view image, occlusion between persons hardly ever occurs; with this embodiment, each person can therefore be tracked accurately, the motion track of each person image in the real-time video stream can be formed accurately, and the movement distance can be calculated.
On the basis of the above embodiments, in some variations, generating a plurality of candidate frames in the vicinity of the corresponding position of the target frame in the next frame may include:
predicting the position of the person image in the next frame by a motion track prediction method, according to the historical motion track data of the person image;
determining a connecting line between the track point of the person image in the current frame and the predicted position;
and generating a plurality of candidate frames along the connecting line.
There may be one predicted position or several: for example, a single position straight ahead, or positions in three directions (straight ahead, front-left, and front-right); the embodiments of the application are not limited in this respect.
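As an illustration of the candidate-frame loop described above, a minimal Python sketch; the linear-extrapolation predictor and the normalized cross-correlation similarity are assumptions standing in for whatever concrete predictor and similarity measure an implementation chooses:

```python
import cv2
import numpy as np

def predict_next(track):
    """Linearly extrapolate the next position from the last two track points."""
    (x0, y0), (x1, y1) = track[-2], track[-1]
    return (2 * x1 - x0, 2 * y1 - y0)

def track_step(prev_frame, next_frame, box_size, track, n_candidates=5):
    """One tracking step: place candidate frames along the line from the current
    track point to the predicted position, and keep the most similar candidate."""
    w, h = box_size
    cx0, cy0 = track[-1]                       # current track point (box center)
    template = prev_frame[cy0 - h // 2:cy0 + (h + 1) // 2,
                          cx0 - w // 2:cx0 + (w + 1) // 2]
    px, py = predict_next(track) if len(track) >= 2 else (cx0, cy0)
    best_score, best_center = -2.0, (cx0, cy0)
    for t in np.linspace(0.0, 1.0, n_candidates):
        cx = int(round(cx0 + t * (px - cx0)))  # candidate center along the line
        cy = int(round(cy0 + t * (py - cy0)))
        cand = next_frame[cy - h // 2:cy + (h + 1) // 2,
                          cx - w // 2:cx + (w + 1) // 2]
        if cand.shape != template.shape:
            continue                           # candidate fell outside the picture
        score = float(cv2.matchTemplate(cand, template, cv2.TM_CCOEFF_NORMED)[0, 0])
        if score > best_score:
            best_score, best_center = score, (cx, cy)
    track.append(best_center)                  # track point in the next frame
    return best_center
```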
The foregoing embodiments provide a multi-target tracking method for monitored scenes; correspondingly, the application also provides a multi-target tracking apparatus for monitored scenes. The apparatus provided by the embodiments of the application can implement the above method and can be realized in software, in hardware, or in a combination of both. For example, the apparatus may comprise integrated or separate functional modules or units to perform the corresponding steps of the above methods. Referring to fig. 2, a schematic diagram of a multi-target tracking apparatus for monitored scenes according to some embodiments of the present application is shown. Since the apparatus embodiments are substantially similar to the method embodiments, they are described relatively simply; for relevant details, refer to the descriptions of the method embodiments. The apparatus embodiments described below are merely illustrative.
As shown in fig. 2, the monitoring scene multi-target tracking apparatus 10 may include:
the video stream acquisition module 101 is used for acquiring a video stream obtained by a target place of aerial photography of an unmanned aerial vehicle, wherein the aerial photography unmanned aerial vehicle is suspended above the target place, and a shooting field of view of the aerial photography unmanned aerial vehicle covers the target place;
a target identification module 102, configured to identify a target object in each frame of the video stream based on color characteristics;
the target tracking module 103 is configured to determine a motion trajectory of the target object in the video stream by using a target tracking algorithm with the target object as a tracking target;
a motion trajectory determining module 104, configured to determine, according to a motion trajectory of the target object in the video stream, motion trajectory data of the target object in a planar coordinate system;
and the movement distance determining module 105 is configured to determine a movement distance of the target object according to the movement trajectory data of the target object.
In some variations of the embodiments of the present application, the object recognition module 102 includes:
and the target identification unit is used for identifying the target object in each frame of the video stream by adopting a background difference method.
In some variations of the embodiments of the present application, the target tracking module 103 includes:
the target frame determining unit is used for taking the target object as a tracking target, generating a target frame surrounding the tracking target in the current frame, and determining the central point of the target frame as a track point of the target object in the current frame;
a candidate frame generating unit configured to generate a plurality of candidate frames in the vicinity of the corresponding position of the target frame in a next frame;
a similarity calculation unit for calculating the image similarity of each of the candidate frames to the target frame;
the track point determining unit is used for determining the central point of the candidate frame with the highest image similarity as the track point of the target object in the next frame;
and the motion track determining unit is used for connecting the track points of the target object in each frame after determining the track points of the target object in each frame to form a motion track of the target object in the video stream.
In some variations of the embodiments of the present application, the candidate frame generating unit includes:
the position prediction subunit is used for predicting the position of the target object in the next frame by means of motion track prediction, according to the historical motion track data of the target object;
a connecting line determining subunit, configured to determine a connecting line between the trajectory point of the target object in the current frame and the predicted position;
and the connecting line candidate frame determining subunit is used for generating a plurality of candidate frames along the connecting line.
The following describes the embodiments with reference to a specific application scenario, in which the multi-target tracking apparatus is configured to detect movement distances of persons based on a video stream and can detect the movement distances of a plurality of persons. The apparatus may include:
the video stream acquisition module is used for acquiring a real-time video stream obtained by an aerial photography drone shooting a target site from overhead, wherein the drone hovers above the center of the target site and its shooting field of view covers the target site;
the target identification module is used for identifying the person image in each frame of the real-time video stream based on color features, according to the color difference between the persons and the ground color of the target site;
the target tracking module is used for determining the motion track of each person image in the real-time video stream by taking the person image as a tracking target and adopting a target tracking algorithm;
the motion track determining module is used for determining the motion track data of the person corresponding to each person image, according to the motion track of the person image in the real-time video stream, based on the mapping relationship between the pixel coordinate system of the real-time video stream and the plane coordinate system of the target site;
and the movement distance determining module is used for determining the movement distance of each person according to the person's motion track data.
In some variations of the embodiments of the present application, the hovering height of the aerial photography drone relative to the target site satisfies the following condition:
h = a · L / (2 · tan(θ / 2))
where h represents the hovering height of the aerial photography drone, L represents the length of the target site, θ represents the field-of-view angle of the drone's camera, and a represents a correction coefficient.
In some variations of the embodiments of the present application, the apparatus further comprises:
the debugging picture acquisition module is used for acquiring, before starting, a debugging picture of the target site shot from overhead by the aerial photography drone;
the distortion rate determining module is used for determining the distortion rate of the debugging picture according to the target site mark in the debugging picture;
and the hovering height adjusting module is used for adjusting the correction coefficient according to the distortion rate, and controlling the aerial photography drone to adjust its hovering height according to the adjusted correction coefficient, until the distortion rate of the debugging picture is smaller than a preset distortion rate threshold.
In some variations of the embodiments of the present application, the apparatus further comprises:
the debugging picture acquisition module is used for acquiring, before starting, a debugging picture of the target site shot from overhead by the aerial photography drone;
the distortion rate determining module is used for determining the distortion rate of the debugging picture according to the target site mark in the debugging picture;
and the distortion correction module is used for performing distortion correction on the camera of the aerial photography drone according to the distortion rate, until the distortion rate of the debugging picture is smaller than a preset distortion rate threshold.
In some variations of embodiments of the application, the distortion rate determination module includes:
the target site mark identification unit is used for identifying the target site mark in the debugging picture according to the color difference between the ground color of the target site and the target site mark;
the mark length detection unit is used for detecting the length of the target site end lines and the length of the target site center line in the target site mark;
and the distortion rate calculation unit is used for determining the distortion rate of the debugging picture according to the difference between the lengths of the target site end lines and the length of the target site center line.
In some variations of embodiments of the application, the distortion rate calculating unit includes:
a distortion rate calculating subunit, configured to determine the distortion rate of the debugging picture from the length of the target site end lines and the length of the target site center line, using the following formula:

d = (x1 + x2 - 2y) / (2y)

where d represents the distortion rate, x1 represents the length of one end line of the target site, x2 represents the length of the other end line of the target site, and y represents the length of the target site center line.
In some variations of embodiments of the present application, the apparatus further comprises:
and the offset adjusting module is used for controlling the aerial photography drone to move horizontally according to the offset information of the target site center point relative to the picture center in the real-time video stream, until the deviation of the target site center point from the picture center is smaller than a preset offset threshold.
In some variations of embodiments of the present application, the apparatus further comprises:
and the alignment processing module is used for aligning each frame in the real-time video stream according to the target site mark.
In some variations of embodiments of the present application, the object identifying module includes:
and the first target identification unit is used for identifying the person image in each frame of the real-time video stream by a background difference method, according to the color difference between the person images and the ground color of the target site.
In some variations of embodiments of the present application, the object identifying module includes:
and the second target identification unit is used for performing binarization processing on each frame in the real-time video stream according to the color of the ground color of the target site, and separating the person images from the ground color of the target site according to the binarization result.
In some variations of embodiments of the present application, the target tracking module includes:
the target frame determining unit is used for taking the person image as a tracking target, generating a target frame surrounding the tracking target in the current frame, and determining the central point of the target frame as the track point of the person image in the current frame;
the candidate frame generating unit is used for generating a plurality of candidate frames in the vicinity of the corresponding position of the target frame in the next frame;
the similarity calculation unit is used for calculating the image similarity of each candidate frame to the target frame;
the track point determining unit is used for determining the central point of the candidate frame with the highest image similarity as the track point of the person image in the next frame;
and the motion track determining unit is used for connecting the track points of the person image in each frame, once determined, to form the motion track of the person image in the real-time video stream.
In some variations of the embodiments of the present application, the candidate frame generating unit includes:
the position prediction subunit is used for predicting the predicted position of the personnel image in the next frame by adopting a motion trail prediction device according to the historical motion trail data of the personnel image;
a connecting line determining subunit, configured to determine a connecting line between the trajectory point of the person image in the current frame and the predicted position;
and the connecting line candidate frame determining subunit is used for generating a plurality of candidate frames along the connecting line, as sketched below.
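The following Python sketch illustrates these subunits together. Linear extrapolation of the last two track points stands in for the motion trail prediction device, whose concrete form the embodiments leave open; the number of candidate frames is likewise an assumed parameter.

import numpy as np

def predict_next_position(track_points):
    # Linear extrapolation: continue the last observed displacement.
    # Stands in for the motion trail prediction device (form left open).
    p_prev = np.asarray(track_points[-2], dtype=float)
    p_curr = np.asarray(track_points[-1], dtype=float)
    return p_curr + (p_curr - p_prev)

def candidate_boxes_along_line(curr_point, predicted_point, box_size, n=10):
    # Centers of the n candidate frames are evenly spaced on the connecting
    # line between the current track point and the predicted position.
    w, h = box_size
    curr = np.asarray(curr_point, dtype=float)
    pred = np.asarray(predicted_point, dtype=float)
    boxes = []
    for t in np.linspace(0.0, 1.0, n):
        cx, cy = curr + t * (pred - curr)
        boxes.append((int(cx - w / 2), int(cy - h / 2), w, h))
    return boxes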
The monitored scene multi-target tracking device provided by the embodiments of the application has the same beneficial effects as the monitored scene multi-target tracking method provided by the embodiments of the application, based on the same inventive concept.
The embodiment of the present application further provides a monitored scene multi-target tracking system corresponding to the monitored scene multi-target tracking method provided by the foregoing embodiments, where the monitored scene multi-target tracking system includes: the aerial photography unmanned aerial vehicle 1 and the background data processing equipment 2 connected with the aerial photography unmanned aerial vehicle; wherein:
the aerial photography unmanned aerial vehicle 1 is suspended above a target place, is used for shooting the target place from above in a top-down manner, and sends a video stream obtained by shooting to the background data processing equipment 2;
the background data processing device 2 is configured to determine a movement distance of a target object in the target location according to the video stream by using the monitored scene multi-target tracking method provided in any of the above embodiments of the present application, and output the movement distance of each target object.
Specifically, the target place may include a target site, the video stream may include a real-time video stream, and the target object may include a person; the monitored scene multi-target tracking system may be configured to detect the moving distance of a person based on the video stream, and may detect the moving distances of a plurality of persons. Accordingly,
the aerial photography unmanned aerial vehicle 1 is suspended above the target site, is used for shooting the target site from above in a top-down manner, and sends a real-time video stream obtained through shooting to the background data processing equipment 2;
the background data processing device 2 is configured to generate a movement distance of each person in the target site by using the monitored scene multi-target tracking method provided in any of the embodiments of the present application according to the real-time video stream, and output the movement distance of each person.
Please refer to fig. 3, which illustrates a schematic diagram of a background data processing apparatus according to some embodiments of the present application. As shown in fig. 3, the background data processing apparatus 2 includes: the system comprises a processor 200, a memory 201, a bus 202 and a communication interface 203, wherein the processor 200, the communication interface 203 and the memory 201 are connected through the bus 202; the memory 201 stores a computer program that can be executed on the processor 200, and the processor 200 executes the monitoring scene multi-target tracking method provided by any one of the foregoing embodiments when executing the computer program.
The memory 201 may include a high-speed random access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. The communication connection between a network element of the system and at least one other network element is realized through at least one communication interface 203 (which may be wired or wireless), and the internet, a wide area network, a local area network, a metropolitan area network, and the like can be used.
Bus 202 can be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory 201 is used for storing a program, and the processor 200 executes the program after receiving an execution instruction, and the monitoring scene multi-target tracking method disclosed by any embodiment of the foregoing application may be applied to the processor 200, or implemented by the processor 200.
The processor 200 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 200. The processor 200 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable read-only memory, or a register. The storage medium is located in the memory 201, and the processor 200 reads the information in the memory 201 and completes the steps of the method in combination with its hardware.
The monitored scene multi-target tracking system provided by the embodiment of the application has the same beneficial effects as the monitored scene multi-target tracking method provided by the embodiment of the application based on the same inventive concept.
It should be noted that the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall be covered by the scope of the claims and the specification of the present application.

Claims (10)

1. A multi-target tracking method for a monitored scene is characterized by comprising the following steps:
acquiring a video stream obtained by shooting a target place with an aerial photography unmanned aerial vehicle, wherein the aerial photography unmanned aerial vehicle is suspended above the target place, and a shooting field of view of the aerial photography unmanned aerial vehicle covers the target place;
identifying a target object in each frame of the video stream based on color characteristics;
taking the target object as a tracking target, and determining the motion track of the target object in the video stream by adopting a target tracking algorithm;
determining motion trail data of the target object in a plane coordinate system according to the motion trail of the target object in the video stream;
and determining the movement distance of the target object according to the movement track data of the target object.
2. The method of claim 1, wherein the identifying a target object in each frame of the video stream based on color features comprises:
and identifying the target object in each frame of the video stream by adopting a background difference method.
3. The method according to claim 1, wherein the determining the motion trajectory of the target object in the video stream by using a target tracking algorithm with the target object as a tracking target comprises:
taking the target object as a tracking target, generating a target frame surrounding the tracking target in the current frame, and determining a central point of the target frame as a track point of the target object in the current frame;
generating a plurality of candidate frames in the vicinity of the corresponding position of the target frame in the next frame;
calculating the image similarity of each candidate frame and the target frame;
determining the central point of the candidate frame with the highest image similarity as a track point of the target object in the next frame;
and after determining the track points of the target object in each frame, connecting the track points of the target object in each frame to form a motion track of the target object in the video stream.
4. The method according to claim 3, wherein the generating a plurality of candidate frames in the vicinity of the corresponding position of the target frame in the next frame comprises:
predicting the predicted position of the target object in the next frame by adopting a motion trail prediction method according to the historical motion trail data of the target object;
determining a connecting line between the track point of the target object in the current frame and the predicted position;
a plurality of candidate boxes are generated along the connecting line.
5. The method of claim 1, wherein prior to identifying the target object in each frame of the video stream based on the color feature, further comprising:
and carrying out alignment processing on each frame in the video stream according to the target place mark.
6. A monitored scene multi-target tracking apparatus, the apparatus comprising:
the video stream acquisition module is used for acquiring a video stream obtained by shooting a target place with an aerial photography unmanned aerial vehicle, wherein the aerial photography unmanned aerial vehicle is suspended above the target place, and a shooting field of view of the aerial photography unmanned aerial vehicle covers the target place;
a target identification module for identifying a target object in each frame of the video stream based on color characteristics;
the target tracking module is used for determining the motion track of the target object in the video stream by taking the target object as a tracking target and adopting a target tracking algorithm;
the motion track determining module is used for determining motion track data of the target object in a plane coordinate system according to the motion track of the target object in the video stream;
and the movement distance determining module is used for determining the movement distance of the target object according to the movement track data of the target object.
7. The apparatus of claim 6, wherein the object recognition module comprises:
and the target identification unit is used for identifying the target object in each frame of the video stream by adopting a background difference method.
8. The apparatus of claim 6, wherein the target tracking module comprises:
the target frame determining unit is used for taking the target object as a tracking target, generating a target frame surrounding the tracking target in the current frame, and determining the central point of the target frame as a track point of the target object in the current frame;
a candidate frame generating unit configured to generate a plurality of candidate frames in the vicinity of the corresponding position of the target frame in a next frame;
a similarity calculation unit for calculating the image similarity of each of the candidate frames to the target frame;
the track point determining unit is used for determining the central point of the candidate frame with the highest image similarity as the track point of the target object in the next frame;
and the motion track determining unit is used for connecting the track points of the target object in each frame after determining the track points of the target object in each frame to form a motion track of the target object in the video stream.
9. The apparatus of claim 8, wherein the candidate frame generating unit comprises:
the position prediction subunit is used for predicting the predicted position of the target object in the next frame by adopting a motion trail prediction device according to the historical motion trail data of the target object;
a connecting line determining subunit, configured to determine a connecting line between the trajectory point of the target object in the current frame and the predicted position;
and the connecting line candidate frame determining subunit is used for generating a plurality of candidate frames along the connecting line.
10. A monitored scene multi-target tracking system, characterized by comprising: an aerial photography unmanned aerial vehicle and background data processing equipment connected with the aerial photography unmanned aerial vehicle; wherein:
the aerial photography unmanned aerial vehicle is suspended above a target place, is used for shooting the target place from above in a top-down manner, and sends a video stream obtained by shooting to the background data processing equipment;
the background data processing device is used for determining the movement distance of the target object in the target place by adopting the method of any one of claims 1 to 5 according to the video stream and outputting the movement distance of each target object.
CN202010604786.3A 2020-06-29 2020-06-29 Multi-target tracking method and system for monitoring scene Pending CN111898438A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010604786.3A CN111898438A (en) 2020-06-29 2020-06-29 Multi-target tracking method and system for monitoring scene

Publications (1)

Publication Number Publication Date
CN111898438A true CN111898438A (en) 2020-11-06

Family

ID=73207154

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010604786.3A Pending CN111898438A (en) 2020-06-29 2020-06-29 Multi-target tracking method and system for monitoring scene

Country Status (1)

Country Link
CN (1) CN111898438A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982559A (en) * 2012-11-28 2013-03-20 大唐移动通信设备有限公司 Vehicle tracking method and system
CN105049716A (en) * 2015-06-30 2015-11-11 广东欧珀移动通信有限公司 Preview image processing method and user terminal
CN108320510A (en) * 2018-04-03 2018-07-24 深圳市智绘科技有限公司 One kind being based on unmanned plane video traffic information statistical method and system
CN109859250A (en) * 2018-11-20 2019-06-07 北京悦图遥感科技发展有限公司 A kind of outer video multi-target detection of aviation red and tracking and device
CN110084829A (en) * 2019-03-12 2019-08-02 上海阅面网络科技有限公司 Method for tracking target, device, electronic equipment and computer readable storage medium
CN110321841A (en) * 2019-07-03 2019-10-11 成都汇纳智能科技有限公司 A kind of method for detecting human face and system
CN110688987A (en) * 2019-10-16 2020-01-14 山东建筑大学 Pedestrian position detection and tracking method and system
CN110991272A (en) * 2019-11-18 2020-04-10 东北大学 Multi-target vehicle track identification method based on video tracking

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG SHANWEN: "Image Pattern Recognition" (图像模式识别), 29 February 2020, Xi'an: Xidian University Press, pages 165-167 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221819A (en) * 2021-05-28 2021-08-06 中邮信息科技(北京)有限公司 Detection method and device for package violent sorting, computer equipment and storage medium
CN113505769A (en) * 2021-09-10 2021-10-15 城云科技(中国)有限公司 Target detection method and vehicle throwing and dripping identification method applying same
CN113505769B (en) * 2021-09-10 2021-12-14 城云科技(中国)有限公司 Target detection method and vehicle throwing and dripping identification method applying same
CN114501115A (en) * 2022-02-12 2022-05-13 北京蜂巢世纪科技有限公司 Cutting and reprocessing method, device, equipment and medium for court video
CN114501115B (en) * 2022-02-12 2023-07-28 北京蜂巢世纪科技有限公司 Cutting and reprocessing method, device, equipment and medium for court video
CN116866719A (en) * 2023-07-12 2023-10-10 山东恒辉软件有限公司 Intelligent analysis processing method for high-definition video content based on image recognition
CN116866719B (en) * 2023-07-12 2024-02-02 山东恒辉软件有限公司 Intelligent analysis processing method for high-definition video content based on image recognition

Similar Documents

Publication Publication Date Title
CN111898438A (en) Multi-target tracking method and system for monitoring scene
CN108012083B (en) Face acquisition method and device and computer readable storage medium
CN110334635B (en) Subject tracking method, apparatus, electronic device and computer-readable storage medium
CN109522854B (en) Pedestrian traffic statistical method based on deep learning and multi-target tracking
US8417059B2 (en) Image processing device, image processing method, and program
US8184859B2 (en) Road marking recognition apparatus and method
CN103517041B (en) Based on real time panoramic method for supervising and the device of polyphaser rotation sweep
WO2019233264A1 (en) Image processing method, computer readable storage medium, and electronic device
CN104978390B (en) Context aware target detection using travel path metadata
US10127456B2 (en) Information processing apparatus that corrects image distortion to set a passage detection line, information processing method, and medium
CN104392416B (en) Video stitching method for sports scene
US10867166B2 (en) Image processing apparatus, image processing system, and image processing method
US8953900B2 (en) Increased quality of image objects based on depth in scene
EP3425590B1 (en) Image processing apparatus, image processing method, and storage medium
JP5909147B2 (en) IMAGING DEVICE, IMAGING DEVICE CONTROL METHOD, AND PROGRAM
CN111080526A (en) Method, device, equipment and medium for measuring and calculating farmland area of aerial image
US20190384969A1 (en) Image processing apparatus, image processing system, image processing method, and program
CN110415269B (en) Target tracking algorithm under dynamic and static background
CN111932587A (en) Image processing method and device, electronic equipment and computer readable storage medium
CN113223050B (en) Robot motion track real-time acquisition method based on Aruco code
CN111383204A (en) Video image fusion method, fusion device, panoramic monitoring system and storage medium
KR101026778B1 (en) Vehicle image detection apparatus
CN109255797B (en) Image processing device and method, and electronic device
CN111598097B (en) Instrument position and reading identification method and system based on robot vision
CN112102378A (en) Image registration method and device, terminal equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination