CN113989696A - Target tracking method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113989696A
Authority
CN
China
Prior art keywords
image frame
target
size parameter
area
region
Prior art date
Legal status
Granted
Application number
CN202111104029.0A
Other languages
Chinese (zh)
Other versions
CN113989696B (en)
Inventor
崔书刚
林凡雨
Current Assignee
Beijing Yuandu Internet Technology Co ltd
Original Assignee
Beijing Yuandu Internet Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Yuandu Internet Technology Co ltd
Priority to CN202111104029.0A
Publication of CN113989696A
Application granted
Publication of CN113989696B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/61 Control of cameras or camera modules based on recognised objects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/67 Focus control based on electronic image sensor signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Abstract

The application provides a target tracking method and apparatus, an electronic device, and a storage medium. The method includes: determining a first size parameter of a target and the region where the target is located in a first image frame based on the position of the target in the first image frame acquired by a video acquisition device; acquiring a first zoom multiple of the video acquisition device when acquiring the first image frame and a second zoom multiple of the video acquisition device when acquiring a second image frame, wherein the second image frame is an image frame acquired by the video acquisition device after the first image frame; generating a second size parameter of the target based on the first zoom multiple, the second zoom multiple, and the first size parameter; and determining the region where the target is located in the second image frame based on the second size parameter. According to the embodiments of the application, the efficiency of tracking the target during zooming can be improved.

Description

Target tracking method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing, and in particular, to a target tracking method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of image processing technology, applications in many fields need to track a target in an image. For example, in the industrial field, in order to accurately control the actions of a manipulator, the manipulator in the surveillance video and the object it operates need to be tracked.
In practical applications, the zoom multiple of the video capture device often changes partway through capture. In such a case, the prior art usually adjusts the size of the target by trying, one by one, a plurality of sizes preset for the target. This approach takes a lot of time, resulting in inefficient target tracking.
Disclosure of Invention
An object of the present application is to provide a method, an apparatus, an electronic device, and a storage medium for tracking a target, which can improve tracking efficiency of the target when zooming.
Based on an aspect of the embodiments of the present application, a target tracking method is disclosed, the method including the following steps:
determining a first size parameter of a target and a region where the target is located in a first image frame based on the position of the target in the first image frame acquired by video acquisition equipment;
acquiring a first zooming multiple of the video acquisition equipment when acquiring the first image frame and a second zooming multiple of the video acquisition equipment when acquiring a second image frame, wherein the second image frame is an image frame acquired by the video acquisition equipment after the first image frame;
generating a second size parameter of the target based on the first zoom factor, the second zoom factor, and the first size parameter;
and determining the area of the target in the second image frame based on the second size parameter.
Based on an aspect of the embodiments of the present application, a target tracking apparatus is disclosed, the apparatus including:
a first determination module configured to determine a first size parameter of a target and a region where the target is located in a first image frame based on the position of the target in the first image frame acquired by a video acquisition device;
the acquisition module is configured to acquire a first zoom multiple of the video acquisition device when acquiring the first image frame and a second zoom multiple of the video acquisition device when acquiring a second image frame, wherein the second image frame is an image frame acquired by the video acquisition device after the first image frame;
a generating module configured to generate a second size parameter of the target based on the first zoom factor, the second zoom factor, and the first size parameter;
a second determination module configured to determine a region in which an object in the second image frame is located based on the second size parameter.
In an exemplary embodiment of the present application, the generation module is configured to:
when the difference between the first zoom multiple and the second zoom multiple is greater than or equal to a preset difference threshold value, generating a second size parameter based on the first zoom multiple, the second zoom multiple and the first size parameter;
when the difference is less than the difference threshold, the first size parameter is taken as the second size parameter.
In an exemplary embodiment of the present application, the generation module is configured to:
calculating a ratio obtained by dividing the second zoom multiple by the first zoom multiple;
and calculating each contour line in the second size parameter based on the product of each contour line in the first size parameter and the ratio.
In an exemplary embodiment of the present application, the apparatus further comprises a training module configured to:
when the difference is larger than or equal to the difference threshold value, after the area where the target is located in the second image frame is determined, the area where the target is located in the second image frame is used as a new sample to train a machine learning model, and the area where the target is located is detected in the image frame behind the second image frame through the machine learning model.
In an exemplary embodiment of the present application, the training module is further configured to:
when the number of times that the difference between the adjacent image frames is smaller than the difference threshold value reaches a preset number of times, taking the last image frame in the adjacent image frames as the second image frame, calculating a first similarity between the area where the target in the second image frame is located and the area where the target in the previous image frame is located, and calculating an average similarity between the area where the target in at least two historical image frames is located and the area where the target in the previous image frame respectively corresponds to the area where the target is located, wherein the historical image frames are the image frames collected by the video collecting device and located before the second image frame;
if the first similarity is larger than or equal to k times of the average similarity, after detecting a region where a target is located in the second image frame through the machine learning model, training the machine learning model by taking the region where the target is located in the second image frame as a new sample, and detecting the region where the target is located in the image frame after the second image frame through the machine learning model, wherein k is larger than 0 and smaller than or equal to 1.
In an exemplary embodiment of the present application, the training module is configured to:
after detecting the region where the target is located in the second image frame through the machine learning model in the first thread, training the machine learning model by taking the region where the target is located in the second image frame as a new sample in the second thread;
and after the machine learning model is obtained through training in the second thread, the machine learning model is transmitted to the first thread so as to detect the region where the target is located in the image frame behind the second image frame in the first thread through the machine learning model.
In an exemplary embodiment of the application, if the first image frame is an initial frame acquired by the video acquisition device, the first determination module is configured to:
acquiring the target position selected by the user in the first image frame and acquired by the video acquisition equipment;
detecting a first range which takes the target position as a center in the first image frame based on a detection algorithm to obtain a detection frame;
and determining the first size parameter and the area where the target in the first image frame is located based on the detection frame with the minimum distance to the target position.
In an exemplary embodiment of the application, if the first image frame is an image frame other than an initial frame acquired by the video acquisition device, the first determining module is configured to:
determining the position of a target in a first image frame based on the area of the target in the previous image frame of the first image frame;
extracting a candidate region with the same size as a region where the target in the previous image frame is located in a second range which takes the target position as the center in the first image frame;
calculating the similarity of each candidate region and the region where the target in the previous image frame is located;
and taking the size parameter of the candidate region which exceeds the threshold and has the maximum similarity as the first size parameter, and taking the candidate region which exceeds the threshold and has the maximum similarity as the region where the target in the first image frame is located.
In an exemplary embodiment of the present application, the second determination module is configured to:
determining the position of a target in the second image frame based on the area of the target in the first image frame;
and determining the area of the target in the second image frame based on the position of the target in the second image frame and the second size parameter.
In an exemplary embodiment of the present application, the second image frame is a subsequent image frame of the first image frame acquired by the video acquisition device.
Based on an aspect of the embodiments of the present application, an electronic device is disclosed, which includes: a memory storing computer readable instructions; a processor reading computer readable instructions stored by the memory to perform the method of any of the preceding claims.
In accordance with an aspect of embodiments of the present application, a computer program medium is disclosed, having computer readable instructions stored thereon, which, when executed by a processor of a computer, cause the computer to perform the method of any of the preceding claims.
According to an aspect of embodiments of the present application, there is provided a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations described above.
In the embodiment of the application, the size parameter of the target of the subsequent image frame is generated by taking the size parameter of the target of the previous image frame as a reference based on the zoom multiple between the previous image frame and the subsequent image frame, and then the target tracking is performed on the subsequent image frame on the basis. By the method, the size of the target can be quickly adjusted when the video acquisition equipment zooms in the target tracking process, so that the target tracking efficiency in zooming is improved.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIG. 1 shows a flow chart of a target tracking method according to an embodiment of the present application.
FIG. 2 shows a schematic flow diagram for training a tracker according to an embodiment of the present application.
FIG. 3 illustrates a flow diagram of target tracking according to an embodiment of the present application.
FIG. 4 shows a block diagram of a target tracking device according to an embodiment of the present application.
Fig. 5 shows a hardware diagram of an electronic device according to an embodiment of the application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these example embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The drawings are merely schematic illustrations of the present application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more example embodiments. In the following description, numerous specific details are provided to give a thorough understanding of example embodiments of the present application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, steps, and so forth. In other instances, well-known structures, methods, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The application provides a target tracking method which is mainly used for tracking a target in a monitoring video with a zoom multiple dynamically changing. Specifically, the target is tracked by determining the area where the target is located in the image frame of the monitoring video.
In one embodiment, a target, either moving or stationary, is tracked by a drone with a camera. In the flight process of the unmanned aerial vehicle, the monitoring video of the target is shot through the camera, target tracking is carried out based on the target tracking method provided by the application, and the area where the target is located in the image frame of the monitoring video is determined. Even if the zoom factor is changed to a large extent, the target tracking can be efficiently realized.
Fig. 1 shows a flowchart of a target tracking method according to an embodiment of the present application. The method comprises the following steps:
step S10, determining a first size parameter of a target and an area where the target is located in a first image frame based on the position of the target in the first image frame acquired by a video acquisition device;
step S11, acquiring a first zoom multiple of the video acquisition device when acquiring a first image frame and a second zoom multiple of the video acquisition device when acquiring a second image frame, wherein the second image frame is an image frame acquired by the video acquisition device after the first image frame;
step S12, generating a second size parameter of the target based on the first zoom multiple, the second zoom multiple and the first size parameter;
step S13, determining the area in the second image frame where the object is located based on the second size parameter.
In the embodiment of the application, for a first image frame acquired by a video acquisition device, a first size parameter of a target is determined based on the position of the target in the first image frame, and the area where the target is located in the first image frame is determined.
The size parameter of the target refers to a parameter describing the specific size of the outline of the region where the target is located. In general, the region where the target is located is enclosed by a rectangular frame, in which case the size parameter typically describes the width of its horizontal contour line and the height of its vertical contour line.
For a second image frame acquired by the video acquisition device after the first image frame, in order to detect the area where the target is located in the second image frame, a second size parameter of the target when the target is tracked in the second image frame needs to be determined.
In order to determine a second size parameter of the target when the target is tracked in the second image frame, a first zooming multiple of the video acquisition device when the first image frame is acquired and a second zooming multiple of the video acquisition device when the second image frame is acquired are obtained. And generating a second size parameter on the basis of the first zoom factor, the second zoom factor and the first size parameter. And further determining the area of the target in the second image frame based on the second size parameter.
Therefore, in the embodiment of the application, the size parameter of the target in the subsequent image frame is generated by taking the size parameter of the target in the previous image frame as a reference based on the zoom multiple between the previous image frame and the subsequent image frame, and then the target tracking is performed on the subsequent image frame on the basis. By the method, the size of the target can be quickly adjusted when the video acquisition equipment zooms in the target tracking process, so that the target tracking efficiency in zooming is improved.
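As an illustration only (not part of the disclosed embodiments), the following Python sketch shows this per-frame flow under the assumption that the target region is an axis-aligned rectangle described by a width and a height in pixels; the function and parameter names are hypothetical.

```python
def scale_size_by_zoom(first_size, z1, z2):
    """Derive the second size parameter from the first one in proportion to the
    change in zoom multiple (w2 = w1 * z2 / z1, h2 = h1 * z2 / z1)."""
    w1, h1 = first_size
    ratio = z2 / z1
    return w1 * ratio, h1 * ratio

def track_in_second_frame(second_frame, target_position, first_size, z1, z2, locate_region):
    """Determine the region where the target is located in the second image frame.

    `locate_region` stands in for the tracking step that searches a neighborhood
    of `target_position` in `second_frame` for a region of the given size.
    """
    second_size = scale_size_by_zoom(first_size, z1, z2)
    return locate_region(second_frame, target_position, second_size)
```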
In one embodiment, the target position is selected by the user for the first image frame, which is the initial frame.
In this embodiment, if the first image frame is an initial frame acquired by the video acquisition device, a target position selected by a user in the first image frame acquired by the video acquisition device is acquired. And detecting a first range which takes the target position as the center in the first image frame based on a detection algorithm to obtain a detection frame. And further determining a first size parameter and an area where the target in the first image frame is located based on the detection frame with the minimum distance to the target position.
Specifically, if the first image frame is an initial frame, the user may manually select a target position in the first image frame through a display terminal for monitoring the video. The target position may be a point or a region.
To ensure the accuracy of target tracking, target tracking is performed within a first range centered on the target position in the first image frame. The range of the circular area with the target position as the geometric center and the preset radius can be used as a first range; the range of the rectangular region having the target position as the geometric center and a predetermined length and width may be set as the first range.
Detection is then performed within the first range based on the detection algorithm to obtain detection frames. The detection result typically includes a plurality of detection frames, each of which frames an object.
If the distance between a detection frame and the target position is the smallest, indicating that this detection frame is closest to the target, this detection frame can be used as the target frame in the first image frame, its size parameter can be used as the first size parameter, and the area it encloses can be used as the region where the target is located in the first image frame.
In one embodiment, a YOLO (You Only Look Once) series detection algorithm is adopted to perform detection in the first image frame to obtain the detection frames. The YOLO series includes YOLOv1, YOLOv2, YOLOv3, and the like.
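As a hedged illustration of this initial-frame step, the sketch below assumes the detector (e.g. a YOLO-family model, not included here) has already returned axis-aligned boxes as (x, y, w, h) tuples within the first range; `select_initial_box` and `click_xy` are illustrative names.

```python
import math

def select_initial_box(detections, click_xy):
    """Pick the detection box whose center is closest to the user-selected point,
    and return its size as the first size parameter together with the box itself
    as the region where the target is located in the first image frame."""
    cx, cy = click_xy
    def center_distance(box):
        x, y, w, h = box
        return math.hypot(x + w / 2 - cx, y + h / 2 - cy)
    best = min(detections, key=center_distance)
    _, _, w, h = best
    return (w, h), best
```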
In one embodiment, for the first image frame, which is not the initial frame, the target location is determined based on the area in the previous image frame where the target is located.
In this embodiment, if the first image frame is an image frame other than the initial frame acquired by the video acquisition device, the region where the target is located in the first image frame is determined according to the tracking algorithm. Specifically, the position of the target in the first image frame is determined based on the region where the target is located in the previous image frame of the first image frame. Candidate regions are then extracted within a second range centered on the target position in the first image frame, and the similarity between each candidate region and the region where the target is located in the previous image frame is calculated. The size parameter of the candidate region whose similarity exceeds the threshold and is the largest is taken as the first size parameter, and that candidate region is taken as the region where the target is located in the first image frame.
In one embodiment, the tracking algorithm may be the ECO (Efficient Convolution Operators for Tracking) algorithm. Specifically, if the first image frame is a non-initial frame, the region where the target is located in the previous image frame, or the center of that region, may be directly used as the target position in the first image frame.
It should be noted that, when determining the position of the target in the first image frame based on the area where the target in the previous image frame of the first image frame is located, in addition to the method provided in the tracking algorithm, the area where the target in the previous image frame is located may also be translated by a preset distance, and then the obtained translated area, or the center of the translated area, may be used as the position of the target in the first image frame.
And then extracting a candidate region with the same size as the region where the target in the previous image frame is located in a second range which takes the target position as the center in the first image frame. And then calculating the similarity between each candidate region and the region where the target in the previous image frame is located. Here, the similarity between regions may also be referred to as a response value between regions.
And if the similarity corresponding to a certain candidate region is maximum, taking the candidate region as a region where the target in the first image frame is located, and taking the size parameter of the candidate region as a first size parameter.
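A minimal sketch of this candidate-region search, assuming grayscale image arrays and using normalized cross-correlation as an illustrative similarity measure; the embodiments may instead use tracker response values, and all names below are hypothetical.

```python
import numpy as np

def find_target_region(frame, target_position, prev_patch, search_radius, sim_threshold):
    """Slide a window of the previous target's size over the second range around
    `target_position`, score each candidate against the previous target patch,
    and keep the best-scoring candidate whose similarity exceeds the threshold."""
    h, w = prev_patch.shape[:2]
    cx, cy = target_position
    template = prev_patch.astype(np.float32).ravel() - float(prev_patch.mean())
    best_score, best_box = -1.0, None
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            x, y = int(cx + dx - w / 2), int(cy + dy - h / 2)
            if x < 0 or y < 0 or y + h > frame.shape[0] or x + w > frame.shape[1]:
                continue  # candidate falls outside the image
            cand = frame[y:y + h, x:x + w].astype(np.float32).ravel()
            cand -= cand.mean()
            score = float(cand @ template /
                          (np.linalg.norm(cand) * np.linalg.norm(template) + 1e-8))
            if score > best_score:
                best_score, best_box = score, (x, y, w, h)
    if best_box is not None and best_score >= sim_threshold:
        return best_box, (w, h)  # region where the target is located, and its size parameter
    return None, None
```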
The second range is determined in the same way as the first range described above, and the details are not repeated here.
It should be noted that the first image frame is a non-initial frame, but if the difference between the zoom multiples of the first image frame and the previous image frame also reaches the difference threshold, the method for determining the size parameter and the area where the target is located in the first image frame may refer to the method for determining the size parameter and the area where the target is located in the second image frame.
In an embodiment, the second image frame is a subsequent image frame of the first image frame captured by the video capture device. Namely, the first image frame and the second image frame are two adjacent image frames.
In this embodiment, the size parameter corresponding to the current image frame is generated based on the zoom multiple of the previous image frame, the zoom multiple of the current image frame, and the size parameter corresponding to the previous image frame. And then determining the area of the target in the current image frame based on the size parameter corresponding to the current image frame.
In an embodiment, the second image frame is the image frame on which target tracking is performed, and the first image frame precedes the second image frame by (L-1) frames, where L is greater than or equal to 2 and is a constant within the same embodiment. That is, the serial numbers of the first image frame and the second image frame differ by (L-1).
In one embodiment, the serial numbers of N consecutive image frames in the surveillance video are sequentially denoted as F1, F2, F3, ..., FN. The second image frame is the image frame on which target tracking is performed, and the first image frame is the image frame whose serial number is (L-1) less than that of the second image frame.
For example, if L is 3, when target tracking is performed on image frame F3, F3 is the second image frame and F1 is the first image frame. Based on the zoom factor of F1, the zoom factor of F3, and the size parameter corresponding to F1, the size parameter corresponding to F3 is generated. The region where the target is located in F3 is then determined based on the size parameter corresponding to F3.
For another example, if L is 3, when target tracking is performed on image frame F4, F4 is the second image frame and F2 is the first image frame. Based on the zoom factor of F2, the zoom factor of F4, and the size parameter corresponding to F2, the size parameter corresponding to F4 is generated. The region where the target is located in F4 is then determined based on the size parameter corresponding to F4.
Similarly, the implementation process of tracking the target in other image frames is not described again.
In an embodiment, the first image frame is fixed. The second image frame is continuously updated and the number of frames spaced between the second image frame and the first image frame is continuously updated as the video capture device continuously captures new image frames.
In one embodiment, the first image frame is a fixed initial frame and the second image frame is a current image frame.
The serial numbers of N consecutive image frames in the surveillance video are sequentially denoted as F1, F2, F3, ..., FN. The first image frame is fixed as F1.
When the current image frame is F2, that is, when target tracking is performed with F2 as the second image frame, the size parameter corresponding to F2 is generated based on the zoom factor of F1, the zoom factor of F2, and the size parameter corresponding to F1. The region where the target is located in F2 is then determined based on the size parameter corresponding to F2.
When the current image frame is F3, that is, when target tracking is performed with F3 as the second image frame, the size parameter corresponding to F3 is generated based on the zoom factor of F1, the zoom factor of F3, and the size parameter corresponding to F1. The region where the target is located in F3 is then determined based on the size parameter corresponding to F3.
Similarly, the implementation process of tracking the target in other image frames is not described again.
In an embodiment, when the second zoom factor is changed compared to the first zoom factor, the second size parameter is also changed compared to the first size parameter.
In this embodiment, a ratio obtained by dividing the second zoom factor by the first zoom factor is calculated, and the product of each contour line in the first size parameter and the ratio is calculated. Each contour line in the second size parameter is then obtained from the product.
In one embodiment, the area where the target is located is a rectangular area, and the size parameters include the width size of the horizontal contour line and the height size of the vertical contour line.
In the first image frame, the first zoom factor is z1, the focal length of the optical lens is f1 (in centimeters), the field angle is α (in degrees), the width of the horizontal contour line of the region where the target is located is w1 (in pixels), and the height of the vertical contour line is h1 (in pixels). In the second image frame, the second zoom factor is z2, the focal length of the optical lens is f2, the field angle is β, the width of the horizontal contour line to be determined is w2, and the height of the vertical contour line to be determined is h2. The dimension of the target surface along the field of view is denoted as L (in millimeters).
For a fixed target distance, the pixel extent of the target in the image is proportional to the focal length of the optical lens, and the focal length is proportional to the zoom factor, so that:
w1 ∝ f1*L, w2 ∝ f2*L, f2/f1 = z2/z1
This makes it possible to obtain:
w2=w1*z2/z1
h2=h1*z2/z1
For example, if z1 = 2, z2 = 4 and w1 = 100 pixels, then w2 = 200 pixels.
in one embodiment, the change of the second size parameter compared with the first size parameter is triggered when the change of the second zoom multiple compared with the first zoom multiple exceeds a certain magnitude.
In this embodiment, when the difference between the first zoom multiple and the second zoom multiple is greater than or equal to a preset difference threshold, the second size parameter is generated based on the first zoom multiple, the second zoom multiple, and the first size parameter. When the difference between the first zoom multiple and the second zoom multiple is smaller than the difference threshold value, the first size parameter is directly used as the second size parameter.
The difference between the first zoom magnification and the second zoom magnification may be a distance difference between the first zoom magnification and the second zoom magnification, or a relative proportion between the first zoom magnification and the second zoom magnification.
The advantage of this embodiment is that when the difference between the zoom factors does not exceed the preset difference threshold, it indicates that the zoom factors have not changed to a large extent, so the previous size parameters can be used in this case, and frequent changes to the size parameters of the target can be avoided.
In an embodiment, a distance difference between the first zoom factor and the second zoom factor is calculated.
When the distance difference is greater than or equal to the preset distance threshold, a ratio obtained by dividing the second zoom multiple by the first zoom multiple is calculated. Each contour line in the second size parameter is then calculated based on the product of each contour line in the first size parameter and the ratio.
And when the distance difference is smaller than the distance threshold, directly taking the first size parameter as the second size parameter.
In one embodiment, the target area is a rectangular area, and the size parameters thereof include the width size of the horizontal contour and the height size of the vertical contour.
In the first image frame, the first zoom factor is z1, the width dimension of the horizontal contour is w1, and the height dimension of the vertical contour is h 1. In the second image frame, the second zoom factor is z2, the width dimension of the horizontal contour line to be determined is w2, and the height dimension of the vertical contour line to be determined is h 2.
The preset distance threshold is N. Calculate |z1-z2| and compare it to N.
When |z1-z2| ≥ N, w2 and h2 are obtained by the following formulas, giving the second size parameter corresponding to the second image frame.
w2=w1*z2/z1
h2=h1*z2/z1
When |z1-z2| < N, w2 and h2 are obtained by the following formulas, giving the second size parameter corresponding to the second image frame.
w2=w1
h2=h1
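A small sketch of this distance-threshold variant (illustrative names; N is the preset distance threshold from the text above):

```python
def second_size_by_difference(w1, h1, z1, z2, n_threshold):
    """Rescale the size parameter only when the absolute difference between the
    two zoom multiples reaches the preset threshold N; otherwise reuse the first
    size parameter unchanged."""
    if abs(z1 - z2) >= n_threshold:
        ratio = z2 / z1
        return w1 * ratio, h1 * ratio
    return w1, h1
```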
In one embodiment, a proportional value between the first zoom multiple and the second zoom multiple is calculated, wherein the proportional value is greater than or equal to 1. When the first zooming multiple is larger than the second zooming multiple, the proportion value is obtained by dividing the first zooming multiple by the second zooming multiple; when the second zoom multiple is larger than the first zoom multiple, the ratio value is obtained by dividing the second zoom multiple by the first zoom multiple.
When the proportion value is greater than or equal to the preset proportion threshold, a ratio obtained by dividing the second zoom multiple by the first zoom multiple is calculated. Each contour line in the second size parameter is then calculated based on the product of each contour line in the first size parameter and the ratio.
And when the proportion value is smaller than the proportion threshold value, directly taking the first size parameter as the second size parameter.
In one embodiment, the target area is a rectangular area, and the size parameters thereof include the width size of the horizontal contour and the height size of the vertical contour.
In the first image frame, the first zoom factor is z1, the width dimension of the horizontal contour is w1, and the height dimension of the vertical contour is h 1. In the second image frame, the second zoom factor is z2, the width dimension of the horizontal contour line to be determined is w2, and the height dimension of the vertical contour line to be determined is h 2.
The preset proportional threshold is M, and M is larger than 1. Calculate z1/z2 and z2/z1 and compare z1/z2 and z2/z1, respectively, to M.
When z1/z2 is larger than or equal to M or z2/z1 is larger than or equal to M, w2 and h2 are obtained through the following formulas, and therefore the second size parameter corresponding to the second image frame is obtained.
w2=w1*z2/z1
h2=h1*z2/z1
When z1/z2 < M and z2/z1 < M, w2 and h2 are calculated through the following formulas, and therefore the second size parameter corresponding to the second image frame is obtained.
w2=w1
h2=h1
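And a corresponding sketch of the proportional-threshold variant (M > 1 is the preset proportional threshold; names are illustrative):

```python
def second_size_by_ratio(w1, h1, z1, z2, m_threshold):
    """Rescale the size parameter only when the larger of z1/z2 and z2/z1 reaches
    the preset proportional threshold M; otherwise reuse the first size parameter."""
    if max(z1 / z2, z2 / z1) >= m_threshold:
        ratio = z2 / z1
        return w1 * ratio, h1 * ratio
    return w1, h1
```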
In an embodiment, the position of the object in the second image frame is determined based on the area in which the object in the first image frame is located. And determining the area of the target in the second image frame based on the position of the target in the second image frame and the second size parameter.
Specifically, the region where the target is located in the first image frame, or the center of that region, may be used directly as the target position in the second image frame; alternatively, the region where the target is located in the first image frame may be translated by a preset distance, and the translated region, or its center, may then be used as the target position in the second image frame; the target position may also be determined in the second image frame according to a tracking algorithm. It should be noted that, after zooming, the size of the target region changes between the two frames, so the new size of the target region cannot be determined directly by the tracking algorithm. However, the center position of the target region obtained by the tracking algorithm does not change after zooming, so the tracking algorithm can still be used to determine the center position of the target region in the second image frame, that is, the target position.
A region of the size given by the second size parameter is then detected within a third range centered on the position of the target in the second image frame, thereby determining the region where the target is located in the second image frame. The third range is determined in the same way as the first range, and the details are not repeated here.
In one embodiment, the area where the target is located is detected by a tracking algorithm.
Specifically, a tracking algorithm is trained according to samples, so that the tracking algorithm automatically detects the area where the target in the image frame is located.
In one embodiment, the machine learning model used for target tracking is referred to as a tracker.
FIG. 2 shows a schematic flow chart of training a tracker according to an embodiment of the present application.
In this embodiment, the position coordinates of the object in the sample image frame and the size of the object are determined in advance.
And S20, acquiring the position of the target and the size of the area where the target is located.
And S21, extracting the characteristics of the area where the target is located from the sample image frame based on the position of the target and the size of the area where the target is located.
And S22, performing dimension reduction processing on the characteristics of the region where the target is located by initializing the projection matrix, thereby improving the training efficiency of the tracker.
And S23, performing preprocessing operations such as cosine window processing, Fourier transform and the like on the characteristics of the region where the target is located.
And S24, adding the features after the preprocessing operation as new samples into a training set of the tracker.
And S25, training the tracker by adopting the training set added with the new sample to obtain the trained tracker.
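The following sketch mirrors steps S20 to S25 under simplifying assumptions: the features are an (H, W, C) array, dimensionality reduction is a plain matrix projection, and the cosine window and Fourier transform come from NumPy; the feature extractor and the tracker optimizer are placeholders rather than the ECO implementation.

```python
import numpy as np

def preprocess_features(features, projection):
    """S22-S23: reduce feature dimensionality with a projection matrix, apply a
    cosine (Hann) window, and transform to the frequency domain."""
    reduced = features @ projection                                  # S22: dimensionality reduction
    h, w = reduced.shape[:2]
    window = np.outer(np.hanning(h), np.hanning(w))[..., None]       # S23: cosine window
    return np.fft.fft2(reduced * window, axes=(0, 1))                # S23: Fourier transform

def add_training_sample(training_set, frame, target_box, extract_features, projection):
    """S20-S24: crop the target region, extract its features (e.g. HOG + Color
    Names), preprocess them, and append the result to the tracker's training set."""
    x, y, w, h = target_box                                          # S20: position and size
    patch = frame[y:y + h, x:x + w]
    features = extract_features(patch)                               # S21: feature extraction
    training_set.append(preprocess_features(features, projection))   # S24: add new sample
    return training_set
# S25: the tracker is then re-optimized on the updated training set.
```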
In one embodiment, the region where the target is located is characterized by a HOG (Histogram of Oriented Gradient) feature and a CN (Color Name) feature.
In one embodiment, the trained tracker automatically tracks the target in the surveillance video according to the ECO (Efficient Convolution Operators for Tracking) algorithm.
In an embodiment, when the difference between the first zoom multiple and the second zoom multiple is greater than or equal to a preset difference threshold, after the area where the target in the second image frame is located is determined, the area where the target in the second image frame is located is taken as a new sample to train the machine learning model, so that the area where the target is located is detected in the image frame after the second image frame through the machine learning model.
Specifically, the difference between the first zoom factor and the second zoom factor is greater than or equal to the preset difference threshold, which indicates that the second zoom factor is changed to a greater extent than the first zoom factor. And before the second image frame, the machine learning model is mainly used for detecting the target at the first zoom multiple. In order to improve the detection accuracy of the machine learning model on the target under the second zoom multiple, the region where the target is located in the second image frame is taken as a new sample to train the machine learning model again, and then the region where the target is located is detected in the image frame behind the second image frame through the machine learning model.
The embodiment has the advantage that when the zoom multiple is changed to a large extent, the machine learning model is trained again by taking the image of the area where the target under the changed zoom multiple is located as a new sample, so that the obtained updated machine learning model can more accurately detect the target under the changed zoom multiple.
In an embodiment, when the difference between the first zoom multiple and the second zoom multiple is greater than or equal to a preset difference threshold, after the area where the target in the second image frame is located is determined, the area where the target in the second image frame is located is used as a new sample to train the machine learning model.
And when the number of times that the difference between the adjacent image frames is smaller than the difference threshold value reaches a preset number of times, taking the last image frame in the adjacent image frames as a second image frame, calculating the first similarity between the image of the region where the target is located in the second image frame and the region where the target is located in the previous image frame, and calculating the average similarity between the region where the target is located in at least two historical image frames and the region where the target is located in the previous image frame corresponding to the region where the target is located, wherein the historical image frames are the image frames which are collected by the video collecting device and located before the second image frame.
If the first similarity is larger than or equal to k times of the average similarity, after the region where the target is located is detected in the second image frame through the machine learning model, the region where the target is located in the second image frame is taken as a new sample to train the machine learning model, and the region where the target is located is detected in the image frame behind the second image frame through the machine learning model, wherein k is larger than 0 and smaller than or equal to 1.
Specifically, the greater the similarity of the regions where the targets are located between the adjacent image frames, the more favorable the characteristics contained in the region where the targets are located in the current image frame are for detecting the targets, so when the method is used for training the machine learning model, the region where the targets are located in the current image frame belongs to a sample with higher quality.
A zoom factor difference between adjacent image frames that is smaller than the preset difference threshold indicates that the zoom factor has not changed to a large extent between those frames. For a single such pair of adjacent image frames, it is acceptable not to update the machine learning model.
However, when this situation occurs continuously, the machine learning model is never updated, and going without updates for a long time may reduce its detection accuracy.
Therefore, when the continuous occurrence frequency reaches a preset frequency threshold (for example, T times), and the first similarity corresponding to the second image frame is greater than or equal to the average similarity corresponding to the historical image frame, it indicates that the area where the target in the second image frame is located belongs to a sample with better quality than the area where the target in the historical image frame is located. Therefore, the region where the target in the second image frame is located is used as a new sample to train the machine learning model, and the machine learning model can detect the target more accurately.
In one embodiment, the number of consecutive adjacent image frames for which the zoom factor difference is smaller than the preset difference threshold is recorded as f, the preset count threshold as T, the first similarity corresponding to the second image frame as pv, the average similarity corresponding to the historical image frames as apv, and the preset scale factor as k.
flag_model = 1 if f ≥ T and pv ≥ k*apv; otherwise flag_model = 0
When flag_model is 1, the tracker is updated by taking the region where the target is located in the second image frame as a new sample. When flag_model is 0, the tracker is not updated.
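A compact sketch of this update decision (illustrative names; f, T, pv, apv and k follow the definitions above):

```python
def should_update_tracker(f, t_threshold, pv, apv, k):
    """Return True (flag_model = 1) only when the count of consecutive small zoom
    changes has reached T and the current similarity pv is at least k times the
    average similarity apv of the historical image frames."""
    flag_model = 1 if (f >= t_threshold and pv >= k * apv) else 0
    return flag_model == 1
```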
In one embodiment, the first image frame and the second image frame are two adjacent image frames.
The serial numbers of N consecutive image frames in the surveillance video are denoted as F1, F2, F3, ..., FN. The difference between the zoom factors of any two adjacent image frames is smaller than the preset difference threshold, that is, the zoom factor does not change to a large extent between any two adjacent image frames.
With F2 as the second image frame and F1 as the first image frame, since the zoom factor does not change to a large extent between F1 and F2, the tracker may not be updated when only F1 and F2 are considered.
With F3 as the second image frame and F2 as the first image frame, since the zoom factor does not change to a large extent between F2 and F3 either, the tracker may likewise not be updated when only F2 and F3 are considered.
Similarly, the tracker may not be updated when only FN-1 and FN are considered.
However, considering these N image frames as a whole, it can be seen that the tracker may never be updated during the whole process from F2 to FN. To avoid this, the count threshold may be set to T, where T is a positive integer greater than or equal to 1.
If T is set to 3: when F3 is the first image frame and the tracker tracks the target in the second image frame F4, although the zoom factor does not change to a large extent between F3 and F4, such small changes have now occurred 3 consecutive times. Therefore, after the region where the target is located in F4 is determined, the first similarity pv between the region where the target is located in F4 and the region where the target is located in F3 is calculated. Taking F1 to F3 as historical image frames, the similarity pv1 between the region where the target is located in F2 and that in F1, and the similarity pv2 between the region where the target is located in F3 and that in F2, are calculated, and the average similarity corresponding to the historical image frames is obtained as apv = (pv1 + pv2)/2.
If pv is greater than or equal to k × apv, the tracker is updated with the region where the target is located in F4 as a new sample. If pv is less than k × apv, the tracker is still not updated. The image frames after F4 are processed with the same logic. Here k is a preset scale factor, with k greater than 0 and less than or equal to 1.
In an embodiment, after the region where the target is located is detected in the second image frame by the machine learning model in the first thread, the machine learning model is trained in the second thread by taking the region where the target is located in the second image frame as a new sample.
And after the machine learning model is obtained through training in the second thread, the machine learning model is transmitted to the first thread so as to detect the area where the target is located in the image frame after the second image frame in the first thread through the machine learning model.
Specifically, the first thread and the second thread work in parallel. The first thread is mainly responsible for the use of the machine learning model, i.e., the target is tracked by the machine learning model in the first thread. The second thread is primarily responsible for the training of machine learning. That is, the machine learning model is trained in the second thread.
The embodiment has the advantages that the machine learning model can be used and trained at the same time by respectively executing the use of the machine learning model and the training of the machine learning model through the two threads, and the uninterrupted online updating of the target tracking is realized.
In an embodiment, the first thread is a main thread of a process for implementing the target tracking method provided by the present application, and the second thread is a sub-thread of the process.
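An illustrative Python sketch of this two-thread arrangement using standard-library queues; the tracker object and the `detect_region`, `should_update` and `train_tracker` callables are placeholders, not part of the disclosed implementation.

```python
import queue
import threading

sample_queue = queue.Queue()  # new samples handed from the tracking (first) thread
model_queue = queue.Queue()   # retrained trackers handed back from the training (second) thread

def tracking_thread(frames, tracker, detect_region, should_update):
    """First (main) thread: detect the target region with the current tracker,
    enqueue a new sample when an update is warranted, and swap in a retrained
    tracker as soon as one becomes available."""
    for frame in frames:
        region = detect_region(tracker, frame)
        if should_update(frame, region):
            sample_queue.put((frame, region))
        try:
            tracker = model_queue.get_nowait()  # use the updated model from here on
        except queue.Empty:
            pass                                # no retrained model yet

def training_thread(train_tracker):
    """Second thread: retrain the tracker on each new sample and publish it."""
    while True:
        frame, region = sample_queue.get()
        model_queue.put(train_tracker(frame, region))

# usage sketch: threading.Thread(target=training_thread, args=(train_tracker,), daemon=True).start()
```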
FIG. 3 illustrates a flow diagram of target tracking according to an embodiment of the present application.
And S30, acquiring the current image frame.
S31, extracting features near the target position of the current image frame based on the target position of the previous image frame.
And S32, performing feature dimension reduction processing on the extracted features.
And S33, performing preprocessing operations such as cosine window processing and Fourier transform on the features subjected to the dimension reduction processing.
And S34, calculating to obtain a region with the maximum similarity with the region of the target in the previous image frame in the current image frame based on the characteristics obtained by the preprocessing operation. And determining the area with the maximum similarity as the area where the target in the current image frame is located, and recording the maximum similarity as the first similarity between the area where the target in the current image frame is located and the area where the target in the previous image frame is located.
And S35, determining whether the difference between the zoom multiple of the current image frame and the zoom multiple of the previous image frame is larger than or equal to a preset difference threshold value.
If the difference is greater than or equal to the preset difference threshold, S361 is executed, otherwise S362 is executed.
And S361, updating the size of the area where the target is located in the current image frame based on the zoom multiples of the two.
Proceeding from S361 to S381, the region where the target is located in the current image frame is obtained by combining the updated size with the target position in the current image frame. The region where the target is located in the current image frame is added to the training set of the tracker as a new sample, and the tracker is updated; proceeding to S382, the target position and size in the current image frame are returned.
And S362, taking the size of the area where the target in the previous image frame is located as the size of the area where the target in the current image frame is located, and calculating the average similarity between the area where the target in the historical image frame is located and the target area in the previous image frame corresponding to the area.
Moving from S362 to S37, it is determined whether to update the tracker based on the average similarity corresponding to the historical image frames.
If the number of times the difference is smaller than the preset difference threshold has continuously reached the preset number of times, and the first similarity between the region where the target is located in the current image frame and the region where the target is located in the previous image frame is greater than or equal to k times the average similarity corresponding to the historical image frames, the flow proceeds from S37 to S381: the region where the target is located in the current image frame is added to the training set of the tracker as a new sample, and the tracker is updated; the flow then proceeds to S382, and the target position and size in the current image frame are returned. Here k is greater than 0 and less than or equal to 1.
If the number of consecutive times the difference is smaller than the preset difference threshold has not reached the preset number of times T, or the similarity condition is not met, the flow proceeds from S37 to S382: the tracker is not updated, and the target position and size in the current image frame are returned.
FIG. 4 shows a target tracking device according to an embodiment of the present application, the device comprising:
the first determining module 40 is configured to determine a first size parameter of a target and a region where the target is located in a first image frame based on a position of the target in the first image frame acquired by a video acquisition device;
an obtaining module 41 configured to obtain a first zoom multiple of the video capturing device when capturing the first image frame and a second zoom multiple of the video capturing device when capturing a second image frame, wherein the second image frame is an image frame captured by the video capturing device and located after the first image frame;
a generating module 42 configured to generate a second size parameter of the target based on the first zoom factor, the second zoom factor, and the first size parameter;
a second determining module 43 configured to determine a region in which the object is located in the second image frame based on the second size parameter.
In an exemplary embodiment of the present application, the generation module is configured to:
when the difference between the first zoom multiple and the second zoom multiple is greater than or equal to a preset difference threshold value, generating a second size parameter based on the first zoom multiple, the second zoom multiple and the first size parameter;
when the difference is less than the difference threshold, the first size parameter is taken as the second size parameter.
In an exemplary embodiment of the present application, the generation module is configured to:
calculating a ratio obtained by dividing the second zoom multiple by the first zoom multiple;
and calculating each contour line in the second size parameter based on the product of each contour line in the first size parameter and the ratio.
In an exemplary embodiment of the present application, the apparatus further comprises a training module configured to:
when the difference is larger than or equal to the difference threshold value, after the area where the target is located in the second image frame is determined, the area where the target is located in the second image frame is used as a new sample to train a machine learning model, and the area where the target is located is detected in the image frame behind the second image frame through the machine learning model.
In an exemplary embodiment of the present application, the training module is further configured to:
when the number of consecutive image frames for which the difference between adjacent image frames is smaller than the difference threshold reaches a preset number, taking the last of these adjacent image frames as the second image frame, calculating a first similarity between the region where the target is located in the second image frame and the region where the target is located in the previous image frame, and calculating an average similarity between the region where the target is located in each of at least two historical image frames and the region where the target is located in the respective previous image frame, wherein the historical image frames are image frames acquired by the video acquisition equipment before the second image frame;
if the first similarity is greater than or equal to k times the average similarity, after the region where the target is located in the second image frame is detected through the machine learning model, training the machine learning model by taking the region where the target is located in the second image frame as a new sample, and detecting the region where the target is located in the image frame after the second image frame through the machine learning model, wherein k is greater than 0 and less than or equal to 1.
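The present application does not fix a particular similarity measure; purely for illustration, the following sketch uses normalized cross-correlation as a stand-in for the first similarity and the average similarity over historical frames. The function names and the assumption that regions are brought to a common shape beforehand are illustrative.

```python
import numpy as np

def ncc_similarity(region_a, region_b):
    """Normalized cross-correlation of two equally sized image patches, in [-1, 1]."""
    a = region_a.astype(np.float64).ravel()
    b = region_b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def average_historical_similarity(history_regions):
    """history_regions: target regions from consecutive historical frames,
    each resized to a common shape beforehand."""
    pairs = zip(history_regions[:-1], history_regions[1:])
    sims = [ncc_similarity(prev, cur) for prev, cur in pairs]
    return sum(sims) / len(sims)
```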
In an exemplary embodiment of the present application, the training module is configured to:
after detecting the region where the target is located in the second image frame through the machine learning model in the first thread, training the machine learning model by taking the region where the target is located in the second image frame as a new sample in the second thread;
and after the machine learning model is obtained through training in the second thread, the machine learning model is transmitted to the first thread so as to detect the region where the target is located in the image frame behind the second image frame in the first thread through the machine learning model.
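An illustrative sketch of the two-thread arrangement is given below; the queue-based hand-off and the model interface (detect/train_on) are assumptions for the sketch, not the specific implementation of the present application.

```python
import queue
import threading

sample_queue = queue.Queue()   # regions produced by the detection thread
model_queue = queue.Queue()    # updated models produced by the training thread

def detection_thread(model, frames):
    for frame in frames:
        try:
            model = model_queue.get_nowait()   # adopt a newer model if one is ready
        except queue.Empty:
            pass                               # otherwise keep the current model
        region = model.detect(frame)           # detect the target in this frame
        sample_queue.put(region)               # offer the region as a new sample

def training_thread(initial_model):
    model = initial_model
    while True:
        region = sample_queue.get()
        if region is None:                     # sentinel used to stop the thread
            break
        model.train_on(region)                 # retrain with the new sample
        model_queue.put(model)                 # hand the updated model to thread one

# Usage sketch: threading.Thread(target=detection_thread, args=(model, frames)).start()
```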
In an exemplary embodiment of the application, if the first image frame is an initial frame acquired by the video acquisition device, the first determination module is configured to:
acquiring a target position selected by the user in the first image frame acquired by the video acquisition equipment;
detecting, based on a detection algorithm, within a first range centered on the target position in the first image frame to obtain detection boxes;
and determining the first size parameter and the area where the target is located in the first image frame based on the detection box closest to the target position.
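For illustration, selecting the detection box closest to the user-selected position may be sketched as follows; the (center x, center y, width, height) box layout and the function names are assumptions, and the detection algorithm itself is not shown.

```python
import math

def pick_initial_region(detection_boxes, selected_xy):
    """Pick the detection box whose center is closest to the user-selected point."""
    sx, sy = selected_xy

    def center_distance(box):
        cx, cy, _, _ = box
        return math.hypot(cx - sx, cy - sy)

    best = min(detection_boxes, key=center_distance)
    first_size_parameter = best[2:]            # (width, height) of the chosen box
    return best, first_size_parameter
```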
In an exemplary embodiment of the application, if the first image frame is an image frame other than an initial frame acquired by the video acquisition device, the first determining module is configured to:
determining the position of the target in the first image frame based on the region where the target is located in the previous image frame of the first image frame;
extracting, within a second range centered on the target position in the first image frame, candidate regions with the same size as the region where the target is located in the previous image frame;
calculating the similarity between each candidate region and the region where the target is located in the previous image frame;
and taking the size parameter of the candidate region whose similarity exceeds the threshold and is the largest as the first size parameter, and taking that candidate region as the region where the target is located in the first image frame.
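A sketch of the candidate-region search described above follows; the search radius, stride, similarity threshold, and the idea of passing in an arbitrary patch-similarity function are assumptions made for illustration.

```python
# Sliding search for the target around the position predicted from the previous frame.
def find_target_region(frame, prev_region_img, predicted_xy, similarity,
                       search_radius=32, stride=4, sim_threshold=0.5):
    h, w = prev_region_img.shape[:2]
    px, py = predicted_xy
    best_sim, best_box = -1.0, None
    for dy in range(-search_radius, search_radius + 1, stride):
        for dx in range(-search_radius, search_radius + 1, stride):
            x = int(px + dx - w / 2)
            y = int(py + dy - h / 2)
            if x < 0 or y < 0:
                continue                        # candidate falls outside the frame
            candidate = frame[y:y + h, x:x + w]
            if candidate.shape[:2] != (h, w):
                continue                        # candidate falls outside the frame
            sim = similarity(candidate, prev_region_img)
            if sim > best_sim:
                best_sim, best_box = sim, (x, y, w, h)
    # Only accept the best candidate if its similarity clears the threshold.
    return best_box if best_sim >= sim_threshold else None
```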
In an exemplary embodiment of the present application, the second determination module is configured to:
determining the position of a target in the second image frame based on the area of the target in the first image frame;
and determining the area of the target in the second image frame based on the position of the target in the second image frame and the second size parameter.
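For illustration, combining the target position (taken here as the region center) with the second size parameter may be sketched as follows; the (x, y, width, height) layout of the resulting region is an assumption.

```python
def region_from_position_and_size(center_xy, size_wh):
    """Form an axis-aligned region from a predicted center and a size parameter."""
    (cx, cy), (w, h) = center_xy, size_wh
    x0 = cx - w / 2.0                          # top-left corner x
    y0 = cy - h / 2.0                          # top-left corner y
    return (x0, y0, w, h)
```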
An electronic device 50 according to an embodiment of the present application is described below with reference to FIG. 5. The electronic device 50 shown in FIG. 5 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in FIG. 5, the electronic device 50 takes the form of a general-purpose computing device. The components of the electronic device 50 may include, but are not limited to: at least one processing unit 510, at least one storage unit 520, and a bus 530 that couples the various system components including the storage unit 520 and the processing unit 510.
The storage unit stores program code executable by the processing unit 510, so that the processing unit 510 performs the steps according to the various exemplary embodiments of the present application described in the exemplary method sections of this specification. For example, the processing unit 510 may perform the steps shown in FIG. 1.
The storage unit 520 may include a readable medium in the form of a volatile memory unit, such as a random access memory unit (RAM) 5201 and/or a cache memory unit 5202, and may further include a read-only memory unit (ROM) 5203.
The storage unit 520 may also include a program/utility 5204 having a set (at least one) of program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
The bus 530 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 50 may also communicate with one or more external devices 600 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 50, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 50 to communicate with one or more other computing devices. Such communication may occur via the input/output (I/O) interface 550, which is also connected to the display unit 540. The electronic device 50 may further communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 560. As shown, the network adapter 560 communicates with the other modules of the electronic device 50 over the bus 530. It should be appreciated that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 50, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present application.
In an exemplary embodiment of the present application, there is also provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the method described in the above method embodiment section.
Based on an embodiment of the present application, there is also provided a program product for implementing the method in the above method embodiments, which may employ a portable compact disc read-only memory (CD-ROM), include program code, and be run on a terminal device such as a personal computer. However, the program product of the present application is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, based on the embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Moreover, although the steps of the methods herein are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

Claims (13)

1. A method of target tracking, the method comprising:
determining a first size parameter of a target and a region where the target is located in a first image frame based on the position of the target in the first image frame acquired by video acquisition equipment;
acquiring a first zooming multiple of the video acquisition equipment when acquiring the first image frame and a second zooming multiple of the video acquisition equipment when acquiring a second image frame, wherein the second image frame is an image frame acquired by the video acquisition equipment and positioned behind the first image frame;
generating a second size parameter of the target based on the first zoom factor, the second zoom factor, and the first size parameter;
and determining the area of the target in the second image frame based on the second size parameter.
2. The method of claim 1, wherein generating a second size parameter for the target based on the first zoom factor, the second zoom factor, and the first size parameter comprises:
when the difference between the first zoom multiple and the second zoom multiple is greater than or equal to a preset difference threshold value, generating a second size parameter based on the first zoom multiple, the second zoom multiple and the first size parameter;
when the difference is less than the difference threshold, the first size parameter is taken as the second size parameter.
3. The method of claim 2, wherein generating the second size parameter based on the first zoom factor, the second zoom factor, and the first size parameter comprises:
calculating a ratio obtained by dividing the second zoom multiple by the first zoom multiple;
and calculating each contour line of the second size parameter as the product of the corresponding contour line of the first size parameter and the ratio.
4. The method of claim 2, further comprising:
when the difference is larger than or equal to the difference threshold value, after the area where the target is located in the second image frame is determined, the area where the target is located in the second image frame is used as a new sample to train a machine learning model, and the area where the target is located is detected in the image frame behind the second image frame through the machine learning model.
5. The method of claim 4, further comprising:
when the number of consecutive image frames for which the difference between adjacent image frames is smaller than the difference threshold reaches a preset number, taking the last of these adjacent image frames as the second image frame, calculating a first similarity between the region where the target is located in the second image frame and the region where the target is located in the previous image frame, and calculating an average similarity between the region where the target is located in each of at least two historical image frames and the region where the target is located in the respective previous image frame, wherein the historical image frames are image frames acquired by the video acquisition equipment before the second image frame;
if the first similarity is greater than or equal to k times the average similarity, after the region where the target is located in the second image frame is detected through the machine learning model, training the machine learning model by taking the region where the target is located in the second image frame as a new sample, and detecting the region where the target is located in the image frame after the second image frame through the machine learning model, wherein k is greater than 0 and less than or equal to 1.
6. The method of claim 4, wherein training the machine learning model with the region of the object in the second image frame as a new sample to detect the region of the object in the image frame after the second image frame by the machine learning model comprises:
after detecting the region where the target is located in the second image frame through the machine learning model in the first thread, training the machine learning model by taking the region where the target is located in the second image frame as a new sample in the second thread;
and after the machine learning model is obtained through training in the second thread, the machine learning model is transmitted to the first thread so as to detect the region where the target is located in the image frame behind the second image frame in the first thread through the machine learning model.
7. The method of claim 1, wherein determining a first size parameter of an object and an area in which the object is located in the first image frame based on a position of the object in the first image frame captured by a video capture device if the first image frame is an initial frame captured by the video capture device comprises:
acquiring a target position selected by a user in the first image frame acquired by the video acquisition equipment;
detecting, based on a detection algorithm, within a first range centered on the target position in the first image frame to obtain detection boxes;
and determining the first size parameter and the area where the target is located in the first image frame based on the detection box closest to the target position.
8. The method of claim 1, wherein determining a first size parameter of the object and an area where the object is located based on a position of the object in the first image frame captured by the video capture device if the first image frame is an image frame other than an initial frame captured by the video capture device comprises:
determining the position of the target in the first image frame based on the region where the target is located in the previous image frame of the first image frame;
extracting, within a second range centered on the target position in the first image frame, candidate regions with the same size as the region where the target is located in the previous image frame;
calculating the similarity between each candidate region and the region where the target is located in the previous image frame;
and taking the size parameter of the candidate region whose similarity exceeds the threshold and is the largest as the first size parameter, and taking that candidate region as the region where the target is located in the first image frame.
9. The method of claim 1, wherein determining a region in which an object in the second image frame is located based on the second size parameter comprises:
determining the position of a target in the second image frame based on the area of the target in the first image frame;
and determining the area of the target in the second image frame based on the position of the target in the second image frame and the second size parameter.
10. The method of claim 1, wherein the second image frame is a subsequent image frame to the first image frame captured by the video capture device.
11. An object tracking apparatus, characterized in that the apparatus comprises:
the device comprises a first determination module, a second determination module and a third determination module, wherein the first determination module is configured to determine a first size parameter of a target and a region where the target is located in a first image frame based on the position of the target in the first image frame acquired by a video acquisition device;
the acquisition module is configured to acquire a first zoom multiple of the video acquisition device when acquiring the first image frame and a second zoom multiple of the video acquisition device when acquiring a second image frame, wherein the second image frame is an image frame acquired by the video acquisition device and positioned behind the first image frame;
a generating module configured to generate a second size parameter of the target based on the first zoom factor, the second zoom factor, and the first size parameter;
a second determination module configured to determine a region in which an object in the second image frame is located based on the second size parameter.
12. An electronic device, comprising:
a memory storing computer readable instructions;
a processor reading computer readable instructions stored by the memory to perform the method of any of claims 1-10.
13. A computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the method of any one of claims 1-10.
CN202111104029.0A 2021-09-18 2021-09-18 Target tracking method and device, electronic equipment and storage medium Active CN113989696B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111104029.0A CN113989696B (en) 2021-09-18 2021-09-18 Target tracking method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111104029.0A CN113989696B (en) 2021-09-18 2021-09-18 Target tracking method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113989696A true CN113989696A (en) 2022-01-28
CN113989696B CN113989696B (en) 2022-11-25

Family

ID=79736188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111104029.0A Active CN113989696B (en) 2021-09-18 2021-09-18 Target tracking method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113989696B (en)

Patent Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101389004A (en) * 2007-09-13 2009-03-18 中国科学院自动化研究所 Moving target classification method based on on-line study
KR101033242B1 (en) * 2010-11-17 2011-05-06 엘아이지넥스원 주식회사 Object tracking method and apparatus for considering zoom environment
CN112217998A (en) * 2014-01-27 2021-01-12 佳能株式会社 Imaging device, information processing device, control method thereof, and storage medium
CN104851111A (en) * 2015-04-23 2015-08-19 北京环境特性研究所 Object tracking method by using continuous zooming detector
US20180137630A1 (en) * 2016-11-15 2018-05-17 Samsung Electronics Co., Ltd. Image processing apparatus and method
US20200021747A1 (en) * 2017-02-21 2020-01-16 Suzhou Keda Technology Co., Ltd. Automatic Focusing Method and Apparatus Based on Region of Interest
CN107066990A (en) * 2017-05-04 2017-08-18 厦门美图之家科技有限公司 A kind of method for tracking target and mobile device
CN109981972A (en) * 2017-12-27 2019-07-05 深圳市优必选科技有限公司 A kind of method for tracking target of robot, robot and storage medium
CN109741365A (en) * 2018-11-27 2019-05-10 上海歌尔泰克机器人有限公司 Method for tracking target, unmanned plane, terminal device and storage medium
US20200193618A1 (en) * 2018-12-18 2020-06-18 Samsung Electronics Co., Ltd. Electronic circuit and electronic device performing motion estimation based on decreased number of candidate blocks
CN111460854A (en) * 2019-01-18 2020-07-28 杭州海康威视数字技术股份有限公司 Remote target detection method, device and system
CN111325790A (en) * 2019-07-09 2020-06-23 杭州海康威视系统技术有限公司 Target tracking method, device and system
CN112001946A (en) * 2020-07-14 2020-11-27 浙江大华技术股份有限公司 Target object tracking method, computer equipment and device
CN112001289A (en) * 2020-08-17 2020-11-27 海尔优家智能科技(北京)有限公司 Article detection method and apparatus, storage medium, and electronic apparatus
CN112184609A (en) * 2020-10-10 2021-01-05 展讯通信(上海)有限公司 Image fusion method and device, storage medium and terminal
CN113284168A (en) * 2020-12-17 2021-08-20 深圳云天励飞技术股份有限公司 Target tracking method and device, electronic equipment and storage medium
CN112634333A (en) * 2020-12-30 2021-04-09 武汉卓目科技有限公司 Tracking device method and device based on ECO algorithm and Kalman filtering
CN112650298A (en) * 2020-12-30 2021-04-13 广东工业大学 Unmanned aerial vehicle tracking landing method and system
CN113037996A (en) * 2021-01-28 2021-06-25 维沃移动通信有限公司 Image processing method and device and electronic equipment
CN113014824A (en) * 2021-05-11 2021-06-22 北京远度互联科技有限公司 Video picture processing method and device and electronic equipment
CN113095445A (en) * 2021-06-08 2021-07-09 中国铁塔股份有限公司湖北省分公司 Target identification method and device
CN113313633A (en) * 2021-06-25 2021-08-27 西安紫光展锐科技有限公司 Training method and device of hyper-division network model and electronic equipment

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CUI SHUGANG et al.: "Combined correlation filters with Siamese region proposal network for visual tracking", International Conference on Neural Information Processing 2019 *
ERSHEN WANG et al.: "Siamese Attentional Cascade Keypoints Network for Visual Object Tracking", IEEE Access *
SHEN GUIQIANG: "Research on forest blind-area inspection technology for quadruped robots assisted by a YOLOv4 network", China Excellent Master's and Doctoral Theses Full-text Database (Master), Agricultural Science and Technology Series *
YAN XIAOMIAO: "Research and implementation of a CNN-based pedestrian detection algorithm for variable-focal-length scene video", China Excellent Master's and Doctoral Theses Full-text Database (Master), Information Science and Technology Series *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663462A (en) * 2022-04-07 2022-06-24 北京远度互联科技有限公司 Target tracking method and device, electronic equipment and storage medium
CN114845055B (en) * 2022-04-27 2024-03-22 北京市商汤科技开发有限公司 Shooting parameter determining method and device of image acquisition equipment and electronic equipment
CN115147458A (en) * 2022-07-21 2022-10-04 北京远度互联科技有限公司 Target tracking method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113989696B (en) 2022-11-25

Similar Documents

Publication Publication Date Title
CN113989696B (en) Target tracking method and device, electronic equipment and storage medium
CN114902294B (en) Fine-grained visual recognition in mobile augmented reality
CN108985259B (en) Human body action recognition method and device
Işık et al. SWCD: a sliding window and self-regulated learning-based background updating method for change detection in videos
KR102519085B1 (en) Method and apparatus of multi-frame super resolution robust to local and global motion
CN111145213A (en) Target tracking method, device and system and computer readable storage medium
CN108805917B (en) Method, medium, apparatus and computing device for spatial localization
AU2016352215A1 (en) Method and device for tracking location of human face, and electronic equipment
CN111462185A (en) Tracker assisted image capture
EP2660753B1 (en) Image processing method and apparatus
CN111046956A (en) Occlusion image detection method and device, electronic equipment and storage medium
CN114898416A (en) Face recognition method and device, electronic equipment and readable storage medium
US20110304730A1 (en) Pan, tilt, and zoom camera and method for aiming ptz camera
CN113281780B (en) Method and device for marking image data and electronic equipment
CN110956131B (en) Single-target tracking method, device and system
Ni et al. An improved kernelized correlation filter based visual tracking method
CN113989695B (en) Target tracking method and device, electronic equipment and storage medium
CN113033397A (en) Target tracking method, device, equipment, medium and program product
Wei et al. Graph-theoretic spatiotemporal context modeling for video saliency detection
CN113869163B (en) Target tracking method and device, electronic equipment and storage medium
CN114663462A (en) Target tracking method and device, electronic equipment and storage medium
CN114612976A (en) Key point detection method and device, computer readable medium and electronic equipment
Hao et al. Infrared polarization detection method for weak target in sky background
CN111191593A (en) Image target detection method and device, storage medium and sewage pipeline detection device
CN113129332A (en) Method and apparatus for performing target object tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant