CN116342642A - Target tracking method, device, electronic device and readable storage medium


Info

Publication number
CN116342642A
Authority
CN
China
Prior art keywords
foreground
target
video frame
tracked
track
Prior art date
Legal status
Pending
Application number
CN202310310226.0A
Other languages
Chinese (zh)
Inventor
陈旭东 (Chen Xudong)
牛永岭 (Niu Yongling)
谢思敏 (Xie Simin)
张洪光 (Zhang Hongguang)
Current Assignee
TP Link Technologies Co Ltd
Original Assignee
TP Link Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by TP Link Technologies Co Ltd filed Critical TP Link Technologies Co Ltd
Priority to CN202310310226.0A
Publication of CN116342642A


Classifications

    • G06T 7/207 - Image analysis; analysis of motion for motion estimation over a hierarchy of resolutions
    • G06T 7/11 - Image analysis; region-based segmentation
    • G06T 7/194 - Image analysis; segmentation involving foreground-background segmentation
    • H04N 7/18 - Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • G06T 2207/10016 - Image acquisition modality: video; image sequence
    • G06T 2207/30232 - Subject of image: surveillance
    • G06T 2207/30241 - Subject of image: trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present application is applicable to the technical field of surveillance and provides a target tracking method, a device, an electronic device, and a readable storage medium. The method includes: performing foreground detection on acquired video frames to obtain the foreground region of each video frame; generating a foreground track for the same foreground region based on the foreground regions of the video frames; performing local matting detection on the foreground regions in the foreground track to obtain a position detection result of the target to be tracked in the foreground region; and tracking the target to be tracked in the video frames based on the foreground track and the position detection result. Because the target is detected based on the foreground track and local matting, tracking remains stable even for small, distant targets, the limitation on tracking distance is reduced, and the probability of missing distant targets is lowered.

Description

Target tracking method, device, electronic device and readable storage medium
Technical Field
The present application belongs to the technical field of surveillance, and in particular relates to a target tracking method, a target tracking device, an electronic device, and a readable storage medium.
Background
With the development of surveillance technology, gun-ball linkage surveillance devices have been widely deployed. A gun-ball linkage device generally consists of a panoramic gun camera and a rotatable pan-tilt (PTZ) dome camera, combining the monitoring advantages of a panoramic view with coverage of local details.
In some application scenarios, to cover a larger monitoring range, the gun-camera picture is often stitched from two or more views, and the devices in these scenarios are mounted high, so a distant target occupies only a small part of the picture.
At present, common gun-ball linkage tracking schemes rely mainly on a panoramic target detection algorithm: the target is identified and located in the gun camera's panoramic picture, the gun-camera coordinates are converted into the corresponding dome-camera coordinates, and the dome camera is rotated to track the target. For small, distant targets, however, this easily leads to unstable tracking, a limited tracking distance, and missed distant targets.
Disclosure of Invention
The embodiments of the present application provide a target tracking method, a target tracking device, an electronic device, and a readable storage medium, which can mitigate the problems of unstable tracking, limited tracking distance, and missed distant targets.
In a first aspect, the present application provides a target tracking method, which may include:
performing foreground detection on the acquired video frames to obtain a foreground region of each video frame; generating a foreground track of the same foreground region based on the foreground region of each video frame; performing local matting detection on a foreground region in the foreground track to obtain a position detection result of a target to be tracked in the foreground region; and tracking the target to be tracked in the video frame based on the foreground track and the position detection result.
In a possible implementation manner of the first aspect, before performing foreground detection on the acquired video frames to obtain a foreground area of each video frame, the method further includes:
acquiring a panoramic image of a target scene; and performing size transformation processing on the panoramic image to obtain a video frame.
In a possible implementation manner of the first aspect, performing foreground detection on the acquired video frames to obtain a foreground area of each video frame includes:
comparing the pixel brightness of the acquired video frame with that of the historical video frame to obtain a foreground region of the video frame; the historical video frames are video frames acquired before the video frames.
In a possible implementation manner of the first aspect, generating a foreground track of a same foreground region based on the foreground region of each video frame includes:
connecting center points of the same foreground areas in a plurality of continuous video frames to obtain a foreground track; the foreground track comprises track identification and foreground region information.
In a possible implementation manner of the first aspect, the performing local matting detection on the foreground area in the foreground track to obtain a position detection result of the target to be tracked in the foreground area includes:
preprocessing a foreground region, and performing local matting processing on the preprocessed foreground region to obtain a first partial image containing a target to be tracked; and carrying out target detection on the first partial image to obtain a position detection result of the target to be tracked.
In a possible implementation manner of the first aspect, after performing object detection on the first local image to obtain a position detection result of the object to be tracked, the method further includes:
comparing the position detection result corresponding to the current video frame with the position detection result corresponding to the historical video frame to obtain a detection frame change value; and if the change value of the detection frame is within the preset threshold range, tracking the target to be tracked based on the position detection result of the current video frame.
In a possible implementation manner of the first aspect, the method further includes:
after a first position detection result of a target to be tracked in a first video frame is obtained, track prediction is carried out based on the first position detection result, and a position prediction result of the target to be tracked at the next moment is obtained; based on the position prediction result, carrying out partial matting processing on the second video frame to obtain a second partial image of the second video frame; performing target detection on the second partial image to obtain a second position detection result of the target to be tracked; tracking a target to be tracked in a second video frame based on a second position detection result; the first video frame and the second video frame are two video frames adjacent to each other in front and back in the foreground track.
In a second aspect, embodiments of the present application provide an object tracking device, which may include:
the first detection unit is used for carrying out foreground detection on the acquired video frames to obtain a foreground region of each video frame;
a track unit, configured to generate a foreground track of the same foreground region based on the foreground region of each video frame;
the second detection unit is used for carrying out partial matting detection on the foreground region in the foreground track to obtain a position detection result of the target to be tracked in the foreground region;
and the tracking unit is used for tracking the target to be tracked in the video frame based on the foreground track and the position detection result.
In a third aspect, the present application provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the method of the first aspect when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the method of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product for causing a terminal device to perform the method of the first aspect described above when the computer program product is run on the terminal device.
It can be understood that, for the advantageous effects of the second to fifth aspects, reference may be made to the relevant description of the first aspect; details are not repeated here.
Compared with the related art, the present application has the following beneficial effects. According to the embodiments of the present application, foreground detection is performed on the acquired video frames to obtain the foreground region of each video frame; a foreground track of the same foreground region is generated based on the foreground region of each video frame; local matting detection is performed on the foreground region in the foreground track to obtain a position detection result of the target to be tracked in the foreground region; and the target to be tracked is tracked in the video frames based on the foreground track and the position detection result. Because the targets in the video frames are detected based on the foreground track and local matting, tracking remains stable even for small, distant targets, the limitation on tracking distance is reduced, and the probability of missing distant targets is lowered; the method therefore has strong usability and practicality.
Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings required for the embodiments or for the description of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and a person skilled in the art may derive other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of an application scenario of a monitoring device according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a target tracking method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a foreground track according to an embodiment of the present application;
Fig. 4 is a schematic diagram of matting detection according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a detection-frame change according to an embodiment of the present application;
Fig. 6 is a schematic diagram of trajectory prediction according to an embodiment of the present application;
Fig. 7 is a schematic diagram of the processing flow of each module according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a target tracking device according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrases "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Currently, in the surveillance field, a monitoring device performs active tracking by identifying and locating a target in the panoramic picture and then tracking it. In the common gun-ball linkage tracking scenario in particular, the gun camera acquires the panoramic picture and identifies and locates the target; the gun-camera coordinates are then converted into the corresponding dome-camera coordinates, and the target is tracked by rotating the dome camera.
However, if the panoramic picture is processed with a conventional full-image target detection algorithm, the recognition rate for small, distant targets is poor, so tracking events are triggered only at short range and targets can be tracked stably over only a short distance. If a background-modeling (foreground detection) method is applied instead, it has the advantage of triggering at a longer distance, but it treats any object moving in the picture as foreground and cannot perform fine-grained recognition, so a specific desired target cannot be tracked selectively.
Therefore, in the related art, target detection algorithms that rely on the full panoramic picture detect small, distant targets unstably, so distant targets are easily missed and the tracking distance is limited, constraining the usage scenarios of gun-ball linkage; meanwhile, conventional foreground detection on the panoramic picture cannot finely distinguish the types of tracked targets, which may cause false triggering or incorrect tracking.
In view of the above problems, the embodiments of the present application provide a target tracking method, and a specific implementation procedure of the target tracking method is described below through embodiments.
Referring to Fig. 1, Fig. 1 is a schematic diagram of an application scenario of a monitoring device according to an embodiment of the present application. As shown in Fig. 1, the monitoring device may be of the gun-ball linkage type, which generally includes a panoramic gun camera and a rotatable PTZ dome camera: the panoramic gun camera obtains the panoramic image and identifies and locates the target, and the dome camera rotates to track it. This combines the monitoring advantages of a panoramic view and coverage of local details.
Linkage tracking is an important function of the gun-ball linkage type and may include manual tracking and event-driven active tracking. In manual tracking, after the gun-ball system is calibrated, any position or area in the gun camera's panoramic picture can be selected; the dome camera rotates to the corresponding position and zooms in appropriately to present local details. In event-driven active tracking, after calibration, certain areas of the gun camera's picture are set as alert areas and an intelligent detection event is enabled; when a target enters such an area and triggers the event, the dome camera rotates to the tracked target's position and keeps rotating with the target's direction of travel until the target disappears or exceeds the farthest tracking distance.
The gun camera's picture has the characteristics of a wide viewing angle and a large monitoring range. In certain special application scenarios, such as viaducts and open squares, the gun-camera picture is often stitched from two or more views to obtain an even larger monitoring range, and the devices in these scenarios are mounted high, so a distant target occupies only a small part of the picture.
It should be noted that Fig. 1 only schematically illustrates one type of monitoring device; the monitoring device is not limited to that shown in Fig. 1, and other types of monitoring device are also possible.
Based on the application scenario, the embodiment of the application provides a target tracking method. The following describes a specific procedure implemented by the method through embodiments of the present application.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of a target tracking method according to an embodiment of the present application. The method can be applied to the above-mentioned monitoring device and can mitigate the problems in the related art of a short tracking distance, unstable tracking, and easily missed small targets. As shown in Fig. 2, with the monitoring device as the executing entity, the method includes the following steps:
s201, performing foreground detection on the acquired video frames to obtain a foreground region of each video frame.
In some embodiments, the monitoring device may collect continuous video frames through the camera, and perform foreground detection on the continuous video frames to obtain a foreground region of each video frame.
The foreground region of each video frame can be obtained by a foreground detection algorithm. The information of a foreground region may include the timestamp of the video frame and the foreground region's position information; for example, the position information may be represented by coordinates (x, y, w, h), where x and y are the coordinates of one vertex of the foreground region (for example, its top-left corner), and w and h are the region's extents along the x-axis and y-axis from that vertex.
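As an illustrative sketch only (Python and the field names are assumptions for illustration, not part of the patent), the per-frame foreground-region record described above could be held in a small structure:

```python
from dataclasses import dataclass

@dataclass
class ForegroundRegion:
    """One detected foreground region in a video frame.

    (x, y) is one vertex of the region (assumed here to be the
    top-left corner) and w, h are its extents along the x- and
    y-axes; `timestamp` is the video frame's timestamp.
    """
    timestamp: float
    x: int
    y: int
    w: int
    h: int

    @property
    def center(self):
        # Center point, used later when linking regions into a foreground track.
        return (self.x + self.w / 2, self.y + self.h / 2)
```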
In some embodiments, before performing foreground detection on the acquired video frames to obtain a foreground region of each video frame, the method further includes:
acquiring a panoramic image of a target scene; and performing size transformation processing on the panoramic image to obtain a video frame.
For example, to reduce the amount of computation, the size (in pixels) of the video frame used for foreground detection may be smaller than that of the original image captured by the gun camera. Therefore, after continuous panoramic images of the target scene are obtained by the gun camera, each panoramic frame undergoes a size transformation (for example, scaling or cropping) to obtain a video frame suitable for foreground detection.
In some embodiments, performing foreground detection on the acquired video frames to obtain a foreground region of each video frame includes:
comparing the pixel brightness of the acquired video frame with that of the historical video frame to obtain a foreground region of the video frame; the historical video frame is a video frame acquired before the video frame for foreground detection.
For example, the foreground region of each video frame is detected by a background-modeling algorithm: the brightness values of corresponding pixels in the current video frame and a previously acquired historical video frame are compared to obtain the brightness difference at each pixel, and based on this difference and a preset brightness-difference threshold, it is determined whether each compared pixel in the current video frame belongs to a foreground region. The historical video frame may be one or more frames adjacent to the current video frame.
In the above manner, all foreground regions in the current video frame can be detected, as shown in Fig. 3 for video frames 1 to 5. Note that Fig. 3 is only illustrative: each video frame may contain multiple detected foreground regions, for example one for each moving object in the frame, so the foreground (moving) regions of the current video frame can be determined from the historical video frames. This kind of foreground detection has the characteristics of a long detection distance and high sensitivity.
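A minimal sketch of the brightness-difference foreground detection described above, using OpenCV; the difference threshold, the dilation step, and the minimum region area are illustrative assumptions, since the text does not fix concrete values:

```python
import cv2

def detect_foreground_regions(frame, prev_frame, diff_thresh=25, min_area=64):
    """Compare per-pixel brightness of the current frame against a
    historical frame and return bounding boxes (x, y, w, h) of the
    changed (foreground) regions."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    # Per-pixel brightness difference against the historical frame.
    diff = cv2.absdiff(gray, prev_gray)
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    # Merge neighbouring changed pixels into connected regions.
    mask = cv2.dilate(mask, None, iterations=2)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]
```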
S202, generating a foreground track of the same foreground region based on the foreground region of each video frame.
In some embodiments, the monitoring device (e.g., the gun camera) maintains the foreground regions in successive video frames to generate foreground tracks. Each video frame may contain multiple foreground regions; for each of them, a foreground track of the same region across consecutive frames can be generated, so multiple foreground tracks corresponding to the multiple foreground regions may be generated.
Illustratively, Fig. 3 shows the foreground track generated for the same foreground region across video frames 1 to 5. Fig. 3 is only an example: each video frame may contain multiple foreground regions, in which case multiple foreground tracks are generated over the consecutive frames.
In some embodiments, generating a foreground trajectory for a same foreground region based on a foreground region of each video frame includes:
connecting center points of the same foreground areas in a plurality of continuous video frames to obtain a foreground track; the foreground track comprises track identification and foreground region information.
Illustratively, the foreground track is a queue of the same foreground region across consecutive video frames. As the object in the picture moves, the foreground region's position or size changes from frame to frame, so the center points of the same foreground region in each video frame are connected to obtain the foreground track.
The same foreground region may be the region formed by pixels with the same brightness values in the picture's foreground. For a moving object, the position of the region corresponding to those pixels changes correspondingly with the acquisition time. Each foreground track can therefore correspond to a unique track identifier and its foreground region information.
By way of example, the track identifier may include a track ID and a track-generation timestamp, such as "foreground track 1" and the specific time at which foreground track 1 was created. The foreground region information may represent the region's extent by a rectangular box; each foreground region in a video frame has its corresponding area, whose extent can be identified by the left, top, right, and bottom coordinates of the rectangular box.
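A sketch of how the foreground-track queue described above might be maintained; associating a new region with the nearest existing track center is an assumption about the matching rule, which the text leaves open:

```python
import itertools
import math

_track_ids = itertools.count(1)

class ForegroundTrack:
    def __init__(self, box, timestamp):
        self.track_id = next(_track_ids)   # track ID
        self.created_at = timestamp        # track-generation timestamp
        self.boxes = [box]                 # queue of (x, y, w, h) regions

    def center(self):
        x, y, w, h = self.boxes[-1]
        return (x + w / 2, y + h / 2)

def update_tracks(tracks, regions, timestamp, max_dist=50.0):
    """Append each new foreground region to the nearest existing track
    (extending the polyline of connected center points), or start a
    new track if no existing track is close enough."""
    for box in regions:
        c = (box[0] + box[2] / 2, box[1] + box[3] / 2)
        best = min(tracks, key=lambda t: math.dist(t.center(), c), default=None)
        if best is not None and math.dist(best.center(), c) <= max_dist:
            best.boxes.append(box)
        else:
            tracks.append(ForegroundTrack(box, timestamp))
    return tracks
```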
Forming a foreground track from the foreground regions of consecutive video frames makes it convenient to maintain and track the region where the target to be tracked is located; when there are multiple targets to be tracked, the foreground tracks corresponding to the different targets can be distinguished and tracked, reducing the probability of tracking errors.
S203, carrying out partial matting detection on the foreground region in the foreground track to obtain a position detection result of the target to be tracked in the foreground region.
In some embodiments, local matting detection is performed on a foreground region corresponding to a video frame in a foreground track, and a position detection result of an object to be tracked in the video frame is identified.
Illustratively, the monitoring device (gun camera) may analyze the foreground track (the foreground region queue), expand or scale each foreground region box, and adjust its size to the input size of the intelligent detection algorithm. As shown in Fig. 4, when a foreground region box is smaller than the detection algorithm's input size, it may be enlarged or expanded; when it is larger than that input size, it may be shrunk.
In some embodiments, performing local matting detection on a foreground region in a foreground track to obtain a position detection result of an object to be tracked in the foreground region, where the method includes:
preprocessing a foreground region, and performing partial matting processing on the preprocessed foreground region to obtain a first partial image containing a target to be tracked; and carrying out target detection on the first partial image to obtain a position detection result of the target to be tracked.
For example, the monitoring device may traverse the foreground region queue of each foreground track and perform local matting. The foreground regions are first preprocessed to filter out abnormal region boxes and to screen the foreground tracks by region size: foreground regions smaller than a foreground-area threshold are eliminated, and tracks consisting of such regions are treated as invalid and are not traversed for matting detection. For a foreground region of a valid track (one whose region box meets the area threshold), the region is scaled or expanded as shown in Fig. 4, the area containing the target to be tracked (the foreground target) is cropped out to obtain a first local image, and the first local image is then rescaled to complete the local matting detection and obtain the position detection result of the target to be tracked. Correspondingly, once the gun camera has obtained the position detection result, it can send it to the rotatable PTZ dome camera to track the target in real time.
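A sketch of the expand-or-shrink matting step: the foreground box is padded, clipped to the frame boundaries, cropped, and rescaled to the detector's input size. The 0.5 padding ratio and 416x416 input size are illustrative assumptions, not values from the patent:

```python
import cv2

def local_matting(frame, box, net_size=(416, 416), pad_ratio=0.5):
    """Crop a local image around a foreground box and resize it to the
    detector input; the same resize enlarges small distant boxes and
    shrinks oversized ones."""
    x, y, w, h = box
    ih, iw = frame.shape[:2]
    # Expand the region so the target keeps some surrounding context.
    px, py = int(w * pad_ratio), int(h * pad_ratio)
    x0, y0 = max(0, x - px), max(0, y - py)
    x1, y1 = min(iw, x + w + px), min(ih, y + h + py)
    crop = frame[y0:y1, x0:x1]
    patch = cv2.resize(crop, net_size, interpolation=cv2.INTER_LINEAR)
    # The crop origin is returned so detections on the patch can be
    # mapped back into panoramic-image coordinates.
    return patch, (x0, y0)
```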
For example, before determining the position detection result of the target to be tracked, the monitoring device performs forward inference on the input first local image, identifies targets such as people or vehicles that may exist in the foreground region, and records the target detection result for the corresponding foreground track. In the subsequent recognition process, the target's direction of travel and other information are predicted from the target detection result based on the foreground track, so that continuous and stable tracking of the target is achieved.
Illustratively, forward inference detects whether the foreground region contains target types that need to be tracked, such as people, vehicles, and non-motor vehicles. A lightweight target detection network is used: the original yolov4-tiny model architecture is improved by adding a CSP (Cross Stage Partial) structure between the backbone and the detection head, deepening the original network and improving the model's learning and feature-extraction capabilities. The target detection result output by the forward inference may include position information (such as a position detection result represented by a rectangular detection box), the detected target's class, and a confidence score.
Meanwhile, when multiple targets are present, the same detected target must be tracked stably; when more than one target appears in the field of view (for example, a crowd, or two people walking toward each other so that their paths cross), the targets must be distinguished through target detection. When detecting the target's position, the detection box of the target tracked in the previous frame is retained; the current frame's detections are compared with the previous frame's box one by one and the intersection over union (IoU, Intersection over Union) is computed; the detection box with the largest IoU and the same target class is selected, and detection boxes with lower IoU or a different target class are filtered out. This realizes detection of a single (the same) target and reduces the probability of track cross-connection during tracking.
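As a sketch of the single-target association just described (keep the previous frame's box, compute the IoU against each current detection, and keep the best same-class match); representing a detection as a ((x, y, w, h), class_id) pair is an assumed format:

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax1, ay1 = a[0] + a[2], a[1] + a[3]
    bx1, by1 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax1, bx1) - max(a[0], b[0]))
    ih = max(0, min(ay1, by1) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def match_single_target(prev_box, prev_cls, detections, iou_thresh=0.3):
    """Select the detection with the largest IoU against the previous
    frame's box and the same target class; other boxes are filtered
    out to reduce track cross-connection."""
    same_cls = [box for box, cls in detections if cls == prev_cls]
    if not same_cls:
        return None
    best = max(same_cls, key=lambda box: iou(prev_box, box))
    return best if iou(prev_box, best) >= iou_thresh else None
```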
In some embodiments, after performing object detection on the first partial image to obtain a position detection result of the object to be tracked, the method further includes:
comparing the position detection result corresponding to the current video frame with the position detection result corresponding to the historical video frame to obtain a detection frame change value; and if the change value of the detection frame is within the preset threshold range, tracking the target to be tracked based on the position detection result of the current video frame.
For example, to reduce the probability of track cross-connection or mistaken target switching, the monitoring device may filter the position detection results of the current video frame. A position detection result can be represented by the size and position of an output detection box; the box size roughly corresponds to the target's distance, and the same target appears at different positions or distances in different video frames. The size of the detection box in the current video frame is therefore compared with the detection box sizes from historical frames, and if the size change exceeds a preset threshold, the current frame's detection is not used for tracking. For example, the current detection box is compared with the detection boxes of the previous five frames: if its size deviates from the average size of those five boxes by more than 30%, the target in the current video frame is not tracked; if the deviation is less than 30%, tracking continues based on the current frame's position detection result. This reduces the probability of target tracking errors.
As shown in Fig. 5, when detection box 1 changes into detection box 2 and the size-change check determines that detection box 2 is within the preset threshold, the target is tracked based on the position detection result corresponding to detection box 2; when detection box 1 changes into detection box 3 and the size change exceeds the preset threshold, tracking does not continue based on the position detection result corresponding to detection box 3.
It should be noted that Fig. 5 only illustrates the change of the detection box; different detection boxes correspond to different video frames, and the video frames in Fig. 5 are only schematic. Track cross-connection or mistaken target switching can also be judged from the change in the detection box's position: if the box position deviates from the track-predicted position by more than a threshold range, tracking is not performed based on the current detection box.
Since the size of the tracked target may change, i.e., the detection box is resized in real time with the detected target, the box size can change noticeably between adjacent frames if the target moves away quickly. The sliding-average check on the detection box reduces the probability that, when the tracks of a near target and a far target cross, tracking mistakenly jumps to the far target and becomes abnormal.
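The sliding-average size check from the example above can be sketched as follows; the five-frame window and the 30% bound follow the example in the text, while everything else (Python, the class shape) is illustrative:

```python
from collections import deque

class BoxSizeFilter:
    """Reject detections whose box area jumps too far from the recent
    moving average, e.g. a sudden jump onto a distant target when two
    tracks cross."""

    def __init__(self, window=5, max_change=0.30):
        self.areas = deque(maxlen=window)
        self.max_change = max_change

    def accept(self, box):
        area = box[2] * box[3]
        if self.areas:
            mean = sum(self.areas) / len(self.areas)
            if abs(area - mean) > self.max_change * mean:
                return False   # size change exceeds threshold: do not track
        self.areas.append(area)
        return True            # within threshold: track on this detection
```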
S204, tracking the target to be tracked in the video frame based on the foreground track and the position detection result.
Illustratively, target tracking is a continuous process that requires stable tracking of the target until it disappears. The position detection result of the target is updated in real time: based on the target's position detected in the current video frame and a prediction of its direction of travel derived from the foreground track, the target's likely position in the next video frame is detected, achieving stable and continuous tracking of the target to be tracked.
In a possible implementation manner, the method further includes: after a first position detection result of a target to be tracked in a first video frame is obtained, track prediction is carried out based on the first position detection result, and a position prediction result of the target to be tracked at the next moment is obtained; based on the position prediction result, carrying out partial matting processing on the second video frame to obtain a second partial image of the second video frame; performing target detection on the second partial image to obtain a second position detection result of the target to be tracked; tracking a target to be tracked in a second video frame based on a second position detection result; the first video frame and the second video frame are two video frames adjacent to each other in front and back in the foreground track.
For example, as shown in Fig. 6, the first video frame and the second video frame may be two consecutive video frames. After the target to be tracked is detected in the first video frame, its likely position at the next moment (in the next video frame) is predicted from the first position detection result and the foreground track, yielding a position prediction result; local matting is performed on the second video frame based on this prediction to obtain a second local image; matting detection on the second local image then yields the target detection result for the second video frame, i.e., the second position detection result of the target to be tracked; and the target is tracked in the second video frame based on the second position detection result. Continuous and stable tracking of the target to be tracked is thereby achieved.
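The text does not specify the prediction model; as a minimal sketch, a constant-velocity extrapolation from the last two track centers could supply the predicted position (a Kalman filter would be a common alternative, also an assumption). The predicted center can then seed the local matting of the second video frame, as in the matting sketch above:

```python
def predict_next_center(track_centers):
    """Extrapolate the target's center in the next video frame from the
    last two center points of its foreground track, assuming roughly
    constant velocity between adjacent frames."""
    if len(track_centers) < 2:
        return track_centers[-1]
    (x0, y0), (x1, y1) = track_centers[-2], track_centers[-1]
    return (2 * x1 - x0, 2 * y1 - y0)
```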
In some embodiments, while the target to be tracked moves, the monitoring device's gun camera can synchronously send the position detection results to the dome camera; the dome camera obtains the target's updated coordinates and, after coordinate-system conversion, rotates to the target position for tracking.
For example, coordinate conversion is needed because the gun camera and the dome camera form a combined system: a fixed panoramic gun camera and a movable PTZ (Pan/Tilt/Zoom) dome camera. When a moving target is tracked in the gun camera, the dome camera must be rotated so that the tracked target is imaged at the center of the dome camera's picture and kept at a certain size. Accordingly, each pixel coordinate in the gun camera must be mapped to the dome camera's picture (a mapping value is needed), and based on this mapping the gun-camera coordinate is converted into the PTZ values that image the target at the center of the dome camera's picture. Gun-ball calibration is the process of establishing this correspondence between the gun camera's and dome camera's pictures; generally, several pairs of calibration points are selected, and after calibration a coordinate transformation matrix is obtained that maps pixel positions in the gun camera to pixel positions in the dome camera's picture.
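Treating the gun-to-dome pixel mapping as a planar homography fitted from the calibration point pairs is one plausible reading of the coordinate transformation matrix described above; it is sketched here as an assumption, and the final conversion of the mapped pixel to device PTZ values is vendor-specific and omitted:

```python
import numpy as np
import cv2

def fit_gun_to_dome_homography(gun_pts, dome_pts):
    """Fit a coordinate transformation matrix from four or more
    calibration point pairs (gun-camera pixel -> dome-camera pixel)."""
    src = np.asarray(gun_pts, dtype=np.float32).reshape(-1, 1, 2)
    dst = np.asarray(dome_pts, dtype=np.float32).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, method=cv2.RANSAC)
    return H

def gun_to_dome_pixel(H, point):
    """Map one gun-camera pixel into dome-camera picture coordinates;
    a further device-specific step converts this into the PTZ values
    that center the target in the dome camera's picture."""
    p = np.asarray([[point]], dtype=np.float32)
    q = cv2.perspectiveTransform(p, H)
    return tuple(q[0, 0])
```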
As shown in Fig. 7, an embodiment of the present application provides a schematic diagram of the processing flow of each module. In the target tracking process implemented by these modules, the foreground detection module detects the video frames to obtain foreground region information; the foreground region information is input to the foreground track module to generate a foreground track queue; the foreground track queue is input to the tracking module, which performs local matting based on the foreground track to obtain a local image and inputs it to the target detection module; the target detection module detects the target to be tracked and obtains a target detection result (including the target's type and position information); the target detection result is input to the track prediction module, which predicts the target's direction of travel to obtain the target's trajectory; and the trajectory is input back to the tracking module for target tracking. The tracking module can also perform local matting on the video frame based on the track prediction result to obtain a local image, which is input to the target detection module for detection.
With the embodiments of the present application, tracking events are triggered with higher sensitivity: the foreground track is generated quickly and triggers sensitively, and the foreground detection algorithm detects small targets better than a full-image intelligent detection algorithm, so target detection is more sensitive and the trigger distance is longer. Meanwhile, the linkage tracking distance is farther: the gun-ball linkage scheme based on local matting uses a local area of the panoramic image as the detection input; with the detection model's input size fixed, a local area undergoes a smaller scale change than the full image, which preserves the small target's share of the rescaled image, so distant small targets can be detected effectively and tracked at a longer distance. Finally, the tracking process is more stable, with fewer track cross-connections: the abnormal-frame filtering and single-target detection strategies adopted in this locally-detected gun-ball linkage tracking process avoid cross-connecting to other targets during tracking, ensuring the stability of the tracking process compared with other gun-ball linkage strategies.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present application in any way.
Based on the above target tracking method, an embodiment of the present application further provides a target tracking device. As shown in Fig. 8, the device may include:
a first detecting unit 81, configured to perform foreground detection on acquired video frames, so as to obtain a foreground area of each video frame;
a track unit 82, configured to generate a foreground track of the same foreground region based on the foreground region of each video frame;
a second detecting unit 83, configured to perform partial matting detection on the foreground region in the foreground track, so as to obtain a position detection result of the target to be tracked in the foreground region;
and a tracking unit 84, configured to track the target to be tracked in the video frame based on the foreground track and the position detection result.
In a possible implementation manner, the apparatus further includes an image processing unit, configured to acquire a panoramic image of the target scene; and performing size transformation processing on the panoramic image to obtain a video frame.
In a possible implementation manner, the first detection unit is further configured to compare pixel brightness of the acquired video frame with that of the historical video frame to obtain a foreground area of the video frame; the historical video frames are video frames acquired before the video frames.
In a possible implementation manner, the track unit is further configured to connect center points of the same foreground regions in a plurality of continuous video frames to obtain a foreground track; the foreground track comprises track identification and foreground region information.
In a possible implementation manner, the second detection unit is further configured to perform preprocessing on the foreground area, and perform local matting processing on the preprocessed foreground area to obtain a first local image containing the target to be tracked; and carrying out target detection on the first partial image to obtain a position detection result of the target to be tracked.
In one possible implementation manner, the second detecting unit is further configured to compare a position detection result corresponding to the current video frame with a position detection result corresponding to the historical video frame to obtain a detection frame change value; and if the change value of the detection frame is within the preset threshold range, tracking the target to be tracked based on the position detection result of the current video frame.
In one possible implementation manner, the device further includes a track prediction unit, configured to perform track prediction based on the first position detection result after obtaining the first position detection result of the target to be tracked in the first video frame, so as to obtain a position prediction result of the target to be tracked at the next moment.
In a possible implementation manner, the second detection unit is further configured to perform local matting processing on the second video frame based on the position prediction result, so as to obtain a second local image of the second video frame; and performing target detection on the second partial image to obtain a second position detection result of the target to be tracked.
In a possible implementation manner, the tracking unit is further configured to track the target to be tracked in the second video frame based on the second position detection result; the first video frame and the second video frame are two video frames adjacent to each other in front and back in the foreground track.
According to the embodiments of the present application, foreground detection is performed on the acquired video frames to obtain the foreground region of each video frame; a foreground track of the same foreground region is generated based on the foreground region of each video frame; local matting detection is performed on the foreground region in the foreground track to obtain a position detection result of the target to be tracked in the foreground region; and the target to be tracked is tracked in the video frames based on the foreground track and the position detection result. Because the targets in the video frames are detected based on the foreground track and local matting, tracking remains stable even for small, distant targets, the limitation on tracking distance is reduced, and the probability of missing distant targets is lowered.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps that may implement the various method embodiments described above.
Embodiments of the present application provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to perform steps that may be performed in the various method embodiments described above.
Fig. 9 is a schematic structural diagram of an electronic device 9 according to an embodiment of the present application. As shown in fig. 9, the electronic apparatus 9 of this embodiment includes: at least one processor 90 (only one is shown in fig. 9), a memory 91 and a computer program 92 stored in the memory 91 and executable on the at least one processor 90, the steps of the above embodiments being implemented when the processor 90 executes the computer program 92.
The electronic device 9 is a computing device capable of monitoring and tracking. The electronic device 9 may include, but is not limited to, a processor 90 and a memory 91. It will be appreciated by those skilled in the art that Fig. 9 is merely an example of the electronic device 9 and does not constitute a limitation of the electronic device 9; it may include more or fewer components than shown, combine certain components, or use different components, and may for example also include input-output devices, network access devices, etc.
The processor 90 may be a central processing unit (Central Processing Unit, CPU), the processor 90 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 91 may in some embodiments be an internal storage unit of the electronic device 9, such as a hard disk or a memory of the electronic device 9. The memory 91 may in other embodiments also be an external storage device of the electronic device 9, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 9. Further, the memory 91 may also include both an internal storage unit and an external storage device of the electronic device 9. The memory 91 is used for storing an operating system, application programs, boot loader (BootLoader), data, other programs, etc., such as program codes of the computer program. The memory 91 may also be used for temporarily storing data that has been output or is to be output.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the flows of the above method embodiments by instructing related hardware through a computer program; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing device/terminal apparatus, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, or a magnetic or optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunications signals.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts that are not described or detailed in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method of target tracking, the method comprising:
performing foreground detection on the acquired video frames to obtain a foreground region of each video frame;
generating a foreground track of the same foreground region based on the foreground region of each video frame;
performing local matting detection on the foreground region in the foreground track to obtain a position detection result of a target to be tracked in the foreground region;
and tracking the target to be tracked in the video frame based on the foreground track and the position detection result.
2. The method of claim 1, wherein prior to said foreground detecting of the acquired video frames to obtain a foreground region for each of the video frames, the method further comprises:
acquiring a panoramic image of a target scene;
and performing size transformation processing on the panoramic image to obtain the video frame.
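Claim 2 only requires a size transformation of the panoramic image, without fixing the method or the target resolution. A minimal sketch assuming OpenCV and an arbitrary working resolution:

import cv2

def panorama_to_frame(panorama, target_size=(1280, 720)):
    # Resize the panoramic image to a working video-frame size.
    # target_size is an assumed processing resolution; the claim only
    # requires a size transformation, not a specific value.
    return cv2.resize(panorama, target_size, interpolation=cv2.INTER_AREA)

Downscaling before foreground detection keeps the per-frame cost low, which matters when the source is a high-resolution panorama.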
3. The method of claim 1, wherein performing foreground detection on the acquired video frames to obtain a foreground region for each of the video frames comprises:
comparing the pixel brightness of the obtained video frame with that of a historical video frame to obtain the foreground region of the video frame;
the historical video frames are video frames acquired before the video frames.
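Claim 3's brightness comparison against a historical frame amounts to frame differencing. The sketch below is one plausible reading using OpenCV; the difference threshold, dilation step, and minimum contour area are assumptions rather than values taken from the claim.

import cv2

def detect_foreground(frame, history_frame, diff_thresh=25, min_area=100):
    # Compare pixel brightness against a historical frame and return
    # foreground regions as (x, y, w, h) bounding boxes.
    gray_cur = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray_old = cv2.cvtColor(history_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray_cur, gray_old)            # brightness change
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)       # close small gaps
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]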
4. The method of claim 1, wherein the generating a foreground trajectory for the same foreground region based on a foreground region of each of the video frames comprises:
connecting center points of the same foreground region in a plurality of consecutive video frames to obtain the foreground track;
the foreground track comprises track identification and foreground region information.
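Claim 4 links center points of the same foreground region across consecutive frames and keeps a track identifier plus region information. A minimal nearest-neighbor association sketch; the greedy matching rule and the distance gate are assumptions, since the claim only requires connecting the centers of the same region.

import math
from itertools import count

_track_ids = count(1)

class ForegroundTrack:
    # A foreground track: a track identifier plus per-frame region info.
    def __init__(self, region):
        self.track_id = next(_track_ids)
        self.regions = [region]                 # (x, y, w, h) boxes
        self.centers = [self._center(region)]

    @staticmethod
    def _center(region):
        x, y, w, h = region
        return (x + w / 2.0, y + h / 2.0)

    def extend(self, region):
        self.regions.append(region)
        self.centers.append(self._center(region))

def update_tracks(tracks, regions, max_dist=50.0):
    # Greedily attach each new region to the nearest track center;
    # start a new track when nothing is within max_dist pixels.
    for region in regions:
        cx, cy = ForegroundTrack._center(region)
        best, best_d = None, max_dist
        for t in tracks:
            px, py = t.centers[-1]
            d = math.hypot(cx - px, cy - py)
            if d < best_d:
                best, best_d = t, d
        if best is not None:
            best.extend(region)
        else:
            tracks.append(ForegroundTrack(region))
    return tracks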
5. The method of claim 1, wherein said performing local matting detection on the foreground region in the foreground track to obtain a position detection result of a target to be tracked in the foreground region comprises:
preprocessing the foreground region, and performing local matting processing on the preprocessed foreground region to obtain a first partial image containing the target to be tracked;
and performing target detection on the first partial image to obtain the position detection result of the target to be tracked.
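Claim 5 runs the detector only on a cropped local image around the foreground region instead of the full frame, which is what keeps detection cheap. A minimal sketch; the padding value and the detector callable (assumed to return boxes in patch coordinates) are not specified by the claim.

def detect_in_region(frame, region, detector, pad=16):
    # Crop a padded local image around the foreground region, run the
    # detector on the patch, and map results back to frame coordinates.
    x, y, w, h = region
    H, W = frame.shape[:2]
    x0, y0 = max(x - pad, 0), max(y - pad, 0)
    x1, y1 = min(x + w + pad, W), min(y + h + pad, H)
    patch = frame[y0:y1, x0:x1]                 # the "first partial image"
    boxes = detector(patch)                     # assumed (x, y, w, h) output
    return [(bx + x0, by + y0, bw, bh) for (bx, by, bw, bh) in boxes]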
6. The method of claim 5, wherein after said performing target detection on said first partial image to obtain said position detection result of said target to be tracked, the method further comprises:
comparing the position detection result corresponding to the current video frame with the position detection result corresponding to the historical video frame to obtain a detection frame change value;
and if the change value of the detection frame is within a preset threshold range, tracking the target to be tracked based on the position detection result of the current video frame.
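Claim 6 accepts the new position only when the change between the current and historical detection frames stays inside a preset threshold range, filtering implausible jumps caused by false detections. The metric below (center shift plus area ratio) and the threshold values are assumptions; the claim does not define the change value precisely.

import math

def box_change(cur, prev):
    # Change value between two (x, y, w, h) detection frames:
    # center displacement and relative area change.
    cx0, cy0 = prev[0] + prev[2] / 2, prev[1] + prev[3] / 2
    cx1, cy1 = cur[0] + cur[2] / 2, cur[1] + cur[3] / 2
    shift = math.hypot(cx1 - cx0, cy1 - cy0)
    area_ratio = (cur[2] * cur[3]) / max(prev[2] * prev[3], 1e-6)
    return shift, area_ratio

def accept_detection(cur, prev, max_shift=40.0, ratio_range=(0.5, 2.0)):
    # True when the change value lies within the preset threshold range.
    shift, ratio = box_change(cur, prev)
    return shift <= max_shift and ratio_range[0] <= ratio <= ratio_range[1]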
7. The method of any one of claims 1 to 6, further comprising:
after obtaining a first position detection result of the target to be tracked in a first video frame, performing track prediction based on the first position detection result to obtain a position prediction result of the target to be tracked at a next moment;
performing local matting processing on a second video frame based on the position prediction result to obtain a second partial image of the second video frame;
performing target detection on the second partial image to obtain a second position detection result of the target to be tracked;
tracking the target to be tracked in the second video frame based on the second position detection result;
the first video frame and the second video frame are two consecutive video frames in the foreground track.
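Claim 7 predicts where the target will be at the next moment from the first detection, then crops the second video frame around the predicted position before detecting again, so the detector searches a small, likely area. A constant-velocity sketch; a Kalman filter would fit equally well, and the claim does not name a prediction model.

def predict_next_box(track_centers, last_box):
    # Constant-velocity prediction of the next (x, y, w, h) box from the
    # last two track centers; falls back to the last box when the track
    # holds a single point.
    if len(track_centers) < 2:
        return last_box
    (px, py), (cx, cy) = track_centers[-2], track_centers[-1]
    vx, vy = cx - px, cy - py                   # per-frame velocity
    x, y, w, h = last_box
    return (x + vx, y + vy, w, h)

The predicted box can then be passed as the region argument of the detect_in_region sketch under claim 5 to obtain the second partial image of the second video frame.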
8. An object tracking device, the device comprising:
the first detection unit is used for carrying out foreground detection on the acquired video frames to obtain a foreground region of each video frame;
a track unit, configured to generate a foreground track of the same foreground region based on the foreground region of each video frame;
the second detection unit is used for carrying out local matting detection on the foreground region in the foreground track to obtain a position detection result of the target to be tracked in the foreground region;
and the tracking unit is used for tracking the target to be tracked in the video frame based on the foreground track and the position detection result.
9. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202310310226.0A 2023-03-21 2023-03-21 Target tracking method, device, electronic equipment and readable storage medium Pending CN116342642A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310310226.0A CN116342642A (en) 2023-03-21 2023-03-21 Target tracking method, device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310310226.0A CN116342642A (en) 2023-03-21 2023-03-21 Target tracking method, device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN116342642A true CN116342642A (en) 2023-06-27

Family

ID=86894546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310310226.0A Pending CN116342642A (en) 2023-03-21 2023-03-21 Target tracking method, device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN116342642A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116824549A (en) * 2023-08-29 2023-09-29 所托(山东)大数据服务有限责任公司 Target detection method and device based on multi-detection network fusion and vehicle
CN116824549B (en) * 2023-08-29 2023-12-08 所托(山东)大数据服务有限责任公司 Target detection method and device based on multi-detection network fusion and vehicle

Similar Documents

Publication Publication Date Title
CN106791710B (en) Target detection method and device and electronic equipment
US10339386B2 (en) Unusual event detection in wide-angle video (based on moving object trajectories)
CN107016367B (en) Tracking control method and tracking control system
KR101647370B1 (en) road traffic information management system for g using camera and radar
US8098290B2 (en) Multiple camera system for obtaining high resolution images of objects
US20040141633A1 (en) Intruding object detection device using background difference method
US20050052533A1 (en) Object tracking method and object tracking apparatus
US7528881B2 (en) Multiple object processing in wide-angle video camera
JP2012185540A (en) Image processing device, image processing method, and image processing program
CN111914592B (en) Multi-camera combined evidence obtaining method, device and system
CN112785628A (en) Track prediction method and system based on panoramic view angle detection and tracking
CN109685062B (en) Target detection method, device, equipment and storage medium
JP2005173787A (en) Image processor detecting/recognizing moving body
CN116342642A (en) Target tracking method, device, electronic equipment and readable storage medium
KR101977635B1 (en) Multi-camera based aerial-view 360-degree video stitching and object detection method and device
CN110458017B (en) Target tracking scale estimation method and related device
CN116912517B (en) Method and device for detecting camera view field boundary
EP3905116B1 (en) Image processing system for identifying and tracking objects
Cutler et al. Monitoring human and vehicle activities using airborne video
CN112738387A (en) Target snapshot method, device and storage medium
KR102610397B1 (en) Object tracking method using multi-camera based drone and object tracking system using the same
WO2023105598A1 (en) Image processing device, image processing system, and image processing method
CN117670939B (en) Multi-camera multi-target tracking method and device, storage medium and electronic equipment
Abidi et al. Automatic target acquisition and tracking with cooperative fixed and PTZ video cameras
KR20120088367A (en) An object tracking system based on a ptz(pan-tilt-zoom) camera using mean-shift algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination