WO2022194158A1

WO2022194158A1 - Target tracking method and apparatus, device, and medium

Info

Publication number: WO2022194158A1
Application number: PCT/CN2022/080985
Authority: WO
Inventors: Guo Hengkai (郭亨凯); Du Sicong (杜思聪)
Original assignee: Beijing Zitiao Network Technology Co., Ltd. (北京字跳网络技术有限公司)
Priority date: 2021-03-15
Filing date: 2022-03-15
Publication date: 2022-09-22
Also published as: CN115082516A

Abstract

Embodiments of the present disclosure relate to a target tracking method and apparatus, a device, and a medium. The method comprises: obtaining a target video frame; determining position information of a target region in the target video frame by performing feature detection on the target video frame; if the position information of the target region in the target video frame cannot be determined, determining target photographing position information; and determining the position information of the target region in the target video frame again according to the target photographing position information and a camera projection algorithm.

Description

Target tracking method and apparatus, device, and medium

This application claims priority to Chinese patent application No. 202110276360.4, titled "A Target Tracking Method, Apparatus, Equipment and Medium" and filed with the China Patent Office on March 15, 2021, the entire contents of which are incorporated by reference in this application.

Technical Field

The present disclosure relates to the technical field of video processing, and in particular, to a target tracking method, apparatus, device and medium.

Background

With the continuous development of intelligent terminal technology, the demand for video content identification and tracking is increasing.

Currently, the target in each frame of a video can be tracked using edge features. However, this approach fails to detect the target when the camera moves quickly, resulting in poor robustness.

SUMMARY OF THE INVENTION

To solve, or at least partially solve, the above technical problem, the present disclosure provides a target tracking method and apparatus, an electronic device, a storage medium, a computer program product, and a computer program.

An embodiment of the present disclosure provides a target tracking method, including:

acquiring a target video frame;

determining position information of a target area in the target video frame by performing feature detection on the target video frame;

if determining the position information of the target area in the target video frame fails, determining target shooting position information; and

determining the position information of the target area in the target video frame again according to the target shooting position information and a camera projection algorithm.

An embodiment of the present disclosure further provides a target tracking apparatus, including:

a video frame module, configured to acquire a target video frame;

a first position module, configured to determine position information of a target area in the target video frame by performing feature detection on the target video frame;

a shooting position module, configured to determine target shooting position information if determining the position information of the target area in the target video frame fails; and

a second position module, configured to re-determine the position information of the target area in the target video frame according to the target shooting position information and a camera projection algorithm.

An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory configured to store instructions executable by the processor, where the processor is configured to read the executable instructions from the memory and execute them to implement the target tracking method provided by the embodiments of the present disclosure.

An embodiment of the present disclosure further provides a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is used to execute the target tracking method provided by the embodiment of the present disclosure.

Embodiments of the present disclosure also provide a computer program product, including a computer program, and the computer program is used to execute the target tracking method provided by the embodiments of the present disclosure.

The embodiment of the present disclosure also provides a computer program, and the computer program is used to execute the target tracking method provided by the embodiment of the present disclosure.

Compared with the prior art, the technical solution provided by the embodiments of the present disclosure has the following advantages. The target tracking solution acquires a target video frame; determines position information of a target area in the target video frame by performing feature detection on the target video frame; if determining that position information fails, determines target shooting position information; and determines the position information of the target area in the target video frame again according to the target shooting position information and a camera projection algorithm. With this solution, after tracking of the target area in a video frame fails, the position of the target area in the video frame can be re-determined according to the shooting position determined by a simultaneous localization and mapping (SLAM) algorithm, thereby recovering tracking. Target tracking can therefore be achieved even when the camera moves quickly, which improves tracking robustness.

Description of drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent when taken in conjunction with the accompanying drawings and with reference to the following detailed description. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.

FIG. 1 is a schematic flowchart of a target tracking method according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a normal vector projection provided by an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of another normal vector projection provided by an embodiment of the present disclosure;

FIG. 4 is a schematic flowchart of another target tracking method provided by an embodiment of the present disclosure;

FIG. 5 is a schematic flowchart of still another target tracking method provided by an embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of a target tracking apparatus according to an embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are only for exemplary purposes, and are not intended to limit the protection scope of the present disclosure.

It should be understood that the various steps described in the method embodiments of the present disclosure may be performed in different orders and/or in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this regard.

As used herein, the term "including" and variations thereof are open-ended inclusions, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.

It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different devices, modules or units, and are not used to limit the order of, or the interdependence between, the functions performed by these devices, modules or units.

It should be noted that the modifiers "a/an" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".

The names of messages or information exchanged between multiple devices in the embodiments of the present disclosure are only for illustrative purposes, and are not intended to limit the scope of these messages or information.

FIG. 1 is a schematic flowchart of a target tracking method according to an embodiment of the present disclosure. The method may be executed by a target tracking apparatus, where the apparatus may be implemented by software and/or hardware, and may generally be integrated in an electronic device. As shown in Figure 1, the method includes:

Step 101: Acquire a target video frame.

A video frame, also called an image frame, may be the smallest unit composing a video, or it may be a single image. The target video frame may be any image frame on which detection and tracking are to be performed: a video frame captured by a device having a video capture function, a video frame obtained from the Internet, or an image captured in real time; this is not specifically limited.

Step 102: Determine the position information of the target area in the target video frame by performing feature detection on the target video frame.

The target area refers to the area in the target video frame where an object having a target shape is located. The target shape is not limited; for example, it may be an ellipse, a circle, or a rectangle, and an elliptical area is used as the example in the embodiments of the present disclosure. The position information of the target area may be any information that represents the location of the target area in the video frame, and may specifically include the vertex coordinates and the center point coordinates of the target area in the video frame.

In the embodiments of the present disclosure, after the target video frame is acquired, a preset detection algorithm may be used to determine the position information of the target area in the target video frame. The preset detection algorithm may be a deep-learning-based detection algorithm, a contour detection algorithm, or the like, chosen according to the actual situation. For example, when the target area is an elliptical area, the preset detection algorithm may be any ellipse detection algorithm, and determining the position information of the target area in the video frame may include: performing contour detection on the first video frame by using the ellipse detection algorithm, and then fitting the elliptical contour obtained by the contour detection to obtain the position information of the elliptical area in the video frame.
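As an illustration of the fitting step only, the sketch below fits an ellipse to contour points that are assumed to have already been extracted by some contour detector (not shown). It uses a plain algebraic least-squares conic fit with numpy, which is one common choice and not necessarily the ellipse detection algorithm of the embodiment; the function name `fit_ellipse` is hypothetical.

```python
import numpy as np


def fit_ellipse(points):
    """Fit x^2 + B*x*y + C*y^2 + D*x + E*y + F = 0 to contour points by least squares.

    Returns (center_x, center_y, semi_major, semi_minor). Assumes the points
    lie roughly on an ellipse (for which the x^2 coefficient is never zero).
    """
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    # Fix the x^2 coefficient at 1 and solve for B, C, D, E, F.
    design = np.column_stack([x * y, y * y, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(design, -x * x, rcond=None)
    b, c, d, e, f = coeffs
    a = 1.0
    # The center is where the gradient of the conic vanishes.
    x0, y0 = np.linalg.solve([[2 * a, b], [b, 2 * c]], [-d, -e])
    # Value of the conic at the center, then semi-axes from the quadratic form.
    f_center = f + (d * x0 + e * y0) / 2.0
    eig = np.linalg.eigvalsh([[a, b / 2.0], [b / 2.0, c]])
    axes = np.sqrt(-f_center / eig)
    return float(x0), float(y0), float(axes.max()), float(axes.min())
```

On noise-free points the fit recovers the ellipse exactly; on real contours a constrained or robust fit may be preferable.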

Optionally, when the target video frame is a video frame in a target video, determining the position information of the target area in the target video frame may also include: extracting a first image from the target video and determining first position information of the target area in the first image; performing optical flow tracking on the target video frame based on initial feature points determined according to the first position information, to obtain target feature points, where the target video frame is the video frame adjacent to the first image in the target video; and fitting the target feature points to obtain the position information of the target area in the target video frame.

The target video frame may be any image frame in the target video, and the first image may be the image frame immediately preceding the target video frame in the target video. In the embodiments of the present disclosure, the preset detection algorithm described above may be used to detect the target area in the first image and determine the first position information of the target area in the first image. Determining the initial feature points according to the first position information includes: sampling the edge contour of the target area in the first image according to the first position information to determine the initial feature points. Optionally, when the target area is an elliptical area, this sampling includes: representing the target area in polar coordinates according to the first position information to obtain an elliptical contour, where the first position information includes the vertex coordinates and/or the center point coordinates of the target area in the first image; and sampling the elliptical contour at preset polar-angle intervals to obtain the initial feature points. An optical flow tracking algorithm is then used to track the sampled initial feature points: feature points that are tracked successfully are retained as target feature points, and those that fail to be tracked are discarded. Finally, the target feature points are fitted to obtain the position information of the target area in the target video frame.
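A minimal sketch of the polar-angle sampling, assuming the first position information has already been reduced to an ellipse center, two semi-axes, and a rotation angle (a parameterization the text does not specify). The hypothetical `sample_ellipse_polar` returns the initial feature points; the subsequent optical flow step, for which a routine such as OpenCV's `calcOpticalFlowPyrLK` could be used, is not shown.

```python
import math


def sample_ellipse_polar(cx, cy, a, b, angle, step_deg=10.0):
    """Sample an ellipse at fixed polar-angle intervals about its center.

    a, b: semi-axes; angle: rotation of the a-axis in radians (assumed inputs).
    """
    points = []
    n = int(round(360.0 / step_deg))
    for k in range(n):
        phi = math.radians(k * step_deg)  # polar angle in the image frame
        rel = phi - angle                 # same angle in the ellipse's own frame
        # Polar form of an ellipse centered at the origin:
        # r = a*b / sqrt((b*cos(rel))^2 + (a*sin(rel))^2)
        r = a * b / math.hypot(b * math.cos(rel), a * math.sin(rel))
        points.append((cx + r * math.cos(phi), cy + r * math.sin(phi)))
    return points
```

Sampling by polar angle (rather than by the ellipse's parametric angle) spaces the points evenly in direction around the center, which matches the "preset polar angle intervals" wording.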

In some embodiments, fitting the target feature points to obtain the position information of the target area in the target video frame includes: if the coverage of the target feature points on the edge contour of the target area is greater than or equal to a preset range, fitting the target feature points to obtain the position information of the target area in the target video frame. The preset range is a preset range that satisfies the shape requirement of the target area and may be set according to actual conditions; for example, it may be 3/4 of the entire edge contour. Specifically, after the target feature points are determined, it can be judged whether their coverage on the edge contour of the target area is greater than or equal to the preset range; if so, a fitting algorithm is used to fit the target feature points to obtain the position information of the target area in the target video frame. If the coverage is smaller than the preset range, the preset detection algorithm can be applied directly to the target video frame to determine the position information of the target area in the target video frame.
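The text does not say how coverage is measured. The sketch below assumes one simple option: split the contour into equal angular bins around the ellipse center and count the fraction of bins that still contain at least one tracked point. `contour_coverage`, `should_fit`, and the bin count are hypothetical; 0.75 mirrors the 3/4 example above.

```python
def contour_coverage(angles_deg, num_bins=36):
    """Fraction of equal angular bins that contain at least one tracked point.

    angles_deg: polar angles (degrees, about the ellipse center) of the
    target feature points that survived optical flow tracking.
    """
    bin_width = 360.0 / num_bins
    occupied = {int((a % 360.0) // bin_width) for a in angles_deg}
    return len(occupied) / num_bins


def should_fit(angles_deg, preset_range=0.75, num_bins=36):
    """Fit the points if coverage reaches the preset range; otherwise re-detect."""
    return contour_coverage(angles_deg, num_bins) >= preset_range
```

For example, points surviving over 0 to 260 degrees in 10-degree steps occupy 27 of 36 bins (0.75), so fitting proceeds; points over only half the contour would trigger re-detection.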

In some embodiments, after the first position information of the target area in the first image is determined, the method further includes: determining a change parameter of the target video frame relative to the first image. In this case, performing optical flow tracking on the target video frame based on the initial feature points determined according to the first position information to obtain the target feature points includes: if it is determined, based on the change parameter, that the target video frame does not satisfy a reuse condition, performing optical flow tracking on the target video frame based on the initial feature points determined according to the first position information to obtain the target feature points.

The change parameter is a parameter representing the change of the target video frame relative to the first image. Optionally, determining the change parameter of the target video frame relative to the first image may include: extracting first feature points in the first image; performing optical flow tracking on the target video frame according to the first feature points to determine second feature points; and determining the moving distance between the second feature points and the first feature points as the change parameter. The first feature points may be corner points detected in the first image by using the Features from Accelerated Segment Test (FAST) corner detection algorithm. The reuse condition is a judgment condition for determining whether the target video frame can reuse the position of the target area in the first image. The change threshold is a preset threshold that can be set according to the actual situation. For example, when the change parameter is represented by the movement of feature points in the target video frame relative to the corresponding feature points in the first image, the change threshold may be a distance threshold, for example 0.8.

Specifically, after the change parameter of the target video frame relative to the first image is determined, it can be compared with the change threshold. If the change parameter is greater than the change threshold, the target video frame does not satisfy the reuse condition and needs to be tracked again: optical flow tracking is performed on the target video frame based on the initial feature points determined according to the first position information to obtain the target feature points. Otherwise, the target video frame satisfies the reuse condition, and the first position information is determined as the position information of the target area in the target video frame.
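The reuse decision can be sketched as below, assuming the first and second feature points have already been matched one-to-one by optical flow and that "moving distance" is summarized as the mean Euclidean distance (the text does not fix the aggregation). Both function names are hypothetical.

```python
import math


def change_parameter(first_pts, second_pts):
    """Mean distance moved by corresponding feature points between two frames."""
    dists = [math.hypot(x2 - x1, y2 - y1)
             for (x1, y1), (x2, y2) in zip(first_pts, second_pts)]
    if not dists:
        raise ValueError("no tracked feature points")
    return sum(dists) / len(dists)


def meets_reuse_condition(first_pts, second_pts, change_threshold=0.8):
    """True: reuse the first image's target position. False: re-track the frame."""
    return change_parameter(first_pts, second_pts) <= change_threshold
```

A change parameter strictly greater than the threshold sends the frame back through feature point tracking and fitting, matching the rule above.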

In the above solution, on the basis of detecting the target area in one image frame of the video, the position of the target area in the target video frame can be determined through feature point tracking and fitting, which improves both the accuracy and the computational efficiency of locating the target area in the target video frame. Moreover, by applying a reuse condition to two adjacent video frames, the feature point tracking and fitting described above are used to determine the position of the target area when the two adjacent frames differ greatly. When the change or difference between them is small, the two frames are highly similar, so the later frame can directly reuse the position information of the target area from the earlier frame without re-detection, which saves work and improves computational efficiency.

Step 103: If the determination of the position information of the target area in the target video frame fails, determine the target shooting position information.

In the embodiments of the present disclosure, when the camera moves quickly, the position information of the target area may fail to be determined in the target video frame; in this case, the target shooting position information is determined. The target shooting position information refers to the coordinates of the shooting position in the world coordinate system.

In the embodiments of the present disclosure, determining the target shooting position information may include: determining initial shooting position information in a first coordinate system by using a simultaneous localization and mapping (SLAM) algorithm; determining a target transformation relationship between the first coordinate system and a second coordinate system corresponding to the camera projection algorithm; and determining the target shooting position information according to the initial shooting position information and the target transformation relationship.

The Simultaneous Localization And Mapping (SLAM) algorithm is mainly used to solve the problems of localization, navigation, and map construction when a device equipped with a specific sensor operates in an unknown environment. If the sensor is a camera, the position of the camera, that is, the shooting position, can be determined. In the embodiments of the present disclosure, the SLAM algorithm is used to determine the initial shooting position information of the shooting position in the first coordinate system. The camera projection algorithm may be an algorithm based on the pinhole projection model of the camera.

Optionally, determining the target transformation relationship between the first coordinate system and the second coordinate system corresponding to the camera projection algorithm may include: determining first shooting position information and second shooting position information of a detected video frame by using the SLAM algorithm and the camera projection algorithm, respectively, where a detected video frame is a video frame in which the target area is successfully tracked; and determining the target transformation relationship according to the first shooting position information and the second shooting position information of the detected video frame.

A detected video frame is a video frame that is tracked successfully, that is, a video frame in which the position of the target area is successfully determined. There may be one or more detected video frames. The target transformation relationship is the transformation between the first coordinate system corresponding to the SLAM algorithm and the second coordinate system corresponding to the camera projection algorithm. In general it could include rotation, scale, and displacement between the two coordinate systems; however, because the two coordinate systems have the same rotation, no rotation transformation is required, and the target transformation relationship includes only a transformation scale and a transformation displacement. When there are multiple detected video frames, a transformation relationship can be calculated for each detected video frame, and the target transformation relationship is the average of these per-frame transformation relationships.
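A minimal sketch of the averaging rule, assuming each detected video frame has already yielded its own (scale, translation) pair by some per-frame estimate not shown here. The name `average_transform` and the tuple layout are hypothetical.

```python
def average_transform(per_frame):
    """Average per-frame transformation relationships.

    per_frame: list of (scale, (tx, ty, tz)) pairs, one per detected video frame.
    Returns the averaged (scale, (tx, ty, tz)).
    """
    if not per_frame:
        raise ValueError("need at least one detected video frame")
    n = len(per_frame)
    scale = sum(s for s, _ in per_frame) / n
    translation = tuple(sum(t[i] for _, t in per_frame) / n for i in range(3))
    return scale, translation
```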

Specifically, for a detected video frame, the first shooting position information can be determined through the SLAM algorithm, and the second shooting position information can be determined through the camera projection algorithm; these are the positions of the shooting position in the first coordinate system and in the second coordinate system, respectively. The least squares method can then be used to determine the target transformation relationship between the first coordinate system and the second coordinate system from the first and second shooting position information. Exemplarily, if the first shooting position information is denoted slam_w_T_c and the second shooting position information is denoted ellipse_w_T_c, the target transformation relationship is expressed as slam_w_T_c = ellipse_w_T_c * align_scale + align_translation, where align_scale is the transformation scale and align_translation is the transformation displacement between the two coordinate systems.
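One way to apply least squares to slam_w_T_c = ellipse_w_T_c * align_scale + align_translation is sketched below. It assumes only the camera positions (the translation parts of the poses) are used and fits a single scale and translation jointly over all detected frames; this joint fit is a swap-in for illustration, since a single position alone cannot pin down both a scale and a 3-D translation. `fit_scale_translation` is hypothetical.

```python
import numpy as np


def fit_scale_translation(ellipse_positions, slam_positions):
    """Least-squares fit of slam = ellipse * align_scale + align_translation (no rotation).

    Both inputs are N x 3 arrays of camera positions for the same detected frames.
    """
    e = np.asarray(ellipse_positions, dtype=float)
    s = np.asarray(slam_positions, dtype=float)
    e_mean, s_mean = e.mean(axis=0), s.mean(axis=0)
    e_c, s_c = e - e_mean, s - s_mean
    # Minimizing sum ||s_i - a*e_i - t||^2: t = s_mean - a*e_mean,
    # a = sum(e_c . s_c) / sum(e_c . e_c).
    align_scale = float(np.sum(e_c * s_c) / np.sum(e_c * e_c))
    align_translation = s_mean - align_scale * e_mean
    return align_scale, align_translation
```

At least two frames with distinct camera positions are needed; otherwise the denominator is zero.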

It can be understood that the first coordinate system may be a first world coordinate system corresponding to the SLAM algorithm, and the second coordinate system may be a second world coordinate system corresponding to the camera projection algorithm, which takes the target shape object as its origin. Because the origins and coordinate axes of the first and second world coordinate systems differ, a transformation between them is required.

Optionally, determining the target shooting position information according to the initial shooting position information and the target transformation relationship includes: transforming the initial shooting position information from the first coordinate system into the second coordinate system based on the target transformation relationship to obtain the target shooting position information. That is, once the target transformation relationship is determined, it is used to convert the initial shooting position information from the first coordinate system corresponding to the SLAM algorithm into the second coordinate system corresponding to the camera projection algorithm, and the resulting target shooting position information is used in the subsequent step of determining the position information of the target area in the target video frame.
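Given align_scale and align_translation from the relationship slam_w_T_c = ellipse_w_T_c * align_scale + align_translation, the conversion into the second coordinate system is just its inverse. The sketch assumes only a 3-D camera position is converted; `slam_to_projection_frame` is a hypothetical name.

```python
import numpy as np


def slam_to_projection_frame(slam_position, align_scale, align_translation):
    """Invert slam = ellipse * align_scale + align_translation for one camera position."""
    p = np.asarray(slam_position, dtype=float)
    t = np.asarray(align_translation, dtype=float)
    return (p - t) / align_scale
```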

Step 104: Determine the position information of the target area in the target video frame again according to the target shooting position information and the camera projection algorithm.

In the embodiments of the present disclosure, re-determining the position information of the target area in the target video frame according to the target shooting position information and the camera projection algorithm may include: solving, according to the target shooting position information and the position information of the target shape object corresponding to the target area, for the displacement information from the shooting position to the target shape object, where the target area is the area in which the target shape object is located in the target video frame; and inputting this displacement information into the projection equation of the camera projection algorithm to determine the position information of the target area in the target video frame.

The position information of the target shape object may be a preset fixed value, i.e., a known quantity. For example, the origin of the world coordinate system may be placed at the target shape object, so that its position coordinates are (0, 0, 0). The displacement from the target shooting position to the target shape object can then be determined from the position of the target shape object, the target shooting position, and the transformation equation $W_{10} = W_{20} + W_{12}$, where $W_{10}$ is the shooting position in the world coordinate system, $W_{20}$ is the position of the target shape object in the world coordinate system, and $W_{12}$ is the displacement from the target shooting position to the target area in the world coordinate system. $W_{10}$, $W_{20}$, and $W_{12}$ are all vectors in the world coordinate system, each with a direction and a magnitude. In this step, the target shooting position information $W_{10}$ (after the coordinate system transformation) and the position information $W_{20}$ of the target shape object are substituted into the transformation equation to obtain the displacement information $W_{12}$ from the target shooting position to the target shape object.
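Solving the transformation equation for the displacement is a single vector subtraction, W12 = W10 - W20. The sketch follows the equation as written and defaults the target shape object to the world origin, as in the example above; `displacement_w12` is a hypothetical name.

```python
import numpy as np


def displacement_w12(target_shooting_position, target_object_position=(0.0, 0.0, 0.0)):
    """Solve W10 = W20 + W12 for W12 (all vectors in the second world coordinate system)."""
    w10 = np.asarray(target_shooting_position, dtype=float)
    w20 = np.asarray(target_object_position, dtype=float)
    return w10 - w20
```

With the object at the origin, W12 simply equals the target shooting position W10.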

In the projection equation, the position information of the target area is related to the intrinsic parameters of the capture device, the rotation matrix from the coordinate system of the shooting position to the world coordinate system, and the position of the world coordinate origin in the coordinate system of the shooting position. The projection equation can be expressed as $p = \pi[K(R_{12} W_{20} + T)]$, where $\pi$ is the normalization coefficient that divides the three-dimensional result by its last component; $K$ denotes the intrinsic parameters of the capture device, which may specifically include internal parameters such as the focal length and distortion parameters; $R_{12}$ denotes the rotation matrix from the coordinate system of the shooting position to the world coordinate system; $p$ denotes the position information of the target area; and $T$ denotes the position of the world coordinate origin in the coordinate system of the shooting position. Rearranging the projection equation, with the world origin placed at the target shape object ($W_{20} = 0$), gives $W_{12} = -\mathrm{ratio} \cdot R_{21} K^{-1} p$, where $R_{21}$ is the inverse of $R_{12}$ and $\mathrm{ratio} = 1/\pi$. Here $T$ is a three-dimensional coordinate, the right-hand side of the projection equation before normalization is also three-dimensional, and $p$ is a two-dimensional coordinate (taken in homogeneous form when multiplied by $K^{-1}$); ratio may be taken as the value of the last dimension of the three-dimensional coordinate. Substituting the displacement $W_{12}$ from the target shooting position to the target shape object in the world coordinate system into $W_{12} = -\mathrm{ratio} \cdot R_{21} K^{-1} p$ determines the position information $p$ of the target area in the target video frame.
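The rearranged equation W12 = -ratio * R21 * K^-1 * p can be solved for the pixel position p as sketched below. Assumptions: K is a 3x3 pinhole intrinsic matrix without distortion, R12 is used exactly as it appears in p = π[K(R12·W20 + T)] (so R21 is its transpose), and the target shape object sits at the world origin. `project_target` is a hypothetical name.

```python
import numpy as np


def project_target(K, R12, W12):
    """Solve W12 = -ratio * R21 @ inv(K) @ [p, 1] for the pixel position p.

    Returns (p, ratio), where ratio is the last component of the unnormalized
    projection, i.e. ratio = 1/pi in the document's notation.
    """
    K = np.asarray(K, dtype=float)
    R12 = np.asarray(R12, dtype=float)
    # Multiply both sides by -R12 (R21 is the inverse of R12):
    # ratio * inv(K) @ [p, 1] = -R12 @ W12.
    cam_point = -R12 @ np.asarray(W12, dtype=float)
    p_h = K @ cam_point      # equals ratio * [u, v, 1]
    ratio = p_h[2]
    return p_h[:2] / ratio, ratio
```

The usage check below builds a camera from an assumed R12 and T, derives W12 for an object at the origin, and confirms that the result coincides with projecting the world origin directly, π[K·T].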

The target tracking solution provided by the embodiments of the present disclosure acquires a target video frame; determines position information of a target area in the target video frame by performing feature detection on the target video frame; if determining that position information fails, determines target shooting position information; and determines the position information of the target area in the target video frame again according to the target shooting position information and a camera projection algorithm. With this solution, after tracking of the target area in a video frame fails, the position of the target area in the video frame can be re-determined according to the shooting position determined by the simultaneous localization and mapping (SLAM) algorithm, thereby recovering tracking. Target tracking can therefore be achieved even when the camera moves quickly, which improves tracking robustness.
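The overall control flow of steps 101 to 104 can be sketched as a detect-then-recover function. Every callable passed in (`detect`, `slam_camera_position`, `to_projection_frame`, `project_from_position`) is a hypothetical placeholder for the components described above, not an API defined by the disclosure.

```python
from typing import Callable, Optional, Sequence, Tuple

Pixel = Tuple[float, float]


def track_target(
    frame,
    detect: Callable[[object], Optional[Pixel]],
    slam_camera_position: Callable[[object], Sequence[float]],
    to_projection_frame: Callable[[Sequence[float]], Sequence[float]],
    project_from_position: Callable[[Sequence[float]], Pixel],
) -> Tuple[Pixel, str]:
    """Detect the target; if detection fails, recover it from the SLAM camera position."""
    position = detect(frame)                          # step 102: feature detection
    if position is not None:
        return position, "detected"
    initial = slam_camera_position(frame)             # step 103: SLAM, first coordinate system
    target_shooting = to_projection_frame(initial)    # step 103: map into second coordinate system
    return project_from_position(target_shooting), "recovered"  # step 104: re-project
```

Keeping the components as injected callables makes the fallback path testable without a real detector or SLAM system.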

In some embodiments, the target tracking method may further include: determining the initial normal vector of the plane where the target shape object corresponding to the target area is located, and determining the projection of the initial normal vector on the horizontal or vertical plane as the target normal vector.

The initial normal vector is the normal vector of the plane in which the target shape object lies in space, and the target normal vector is the optimized normal vector. Specifically, after the position information of the target area in the target video frame and in an adjacent video frame is determined, the initial normal vector can be obtained by decomposing the homography matrix between the two video frames; the homography matrix is calculated from points sampled on the target area, and the decomposition may use singular value decomposition (SVD). After the initial normal vector is determined, its projection onto the horizontal or vertical plane can be determined as the target normal vector. Exemplarily, if the initial normal vector is (0.05, 0.03, 0.994), the target normal vector may be (0, 0, 1).
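A minimal sketch of snapping the initial normal vector, assuming the z axis is vertical and that the plane is treated as horizontal when the vertical component dominates (the text does not state the decision rule). A horizontal plane gets the normal (0, 0, ±1); a vertical plane gets the initial normal with its vertical component dropped and renormalized. `snap_normal` is hypothetical.

```python
import math


def snap_normal(n, up_axis=2):
    """Snap an estimated plane normal to a horizontal-plane or vertical-plane normal."""
    comps = [float(v) for v in n]
    vertical = comps[up_axis]
    horizontal = [0.0 if i == up_axis else c for i, c in enumerate(comps)]
    h_norm = math.sqrt(sum(c * c for c in horizontal))
    if abs(vertical) >= h_norm:
        # Treated as a horizontal plane: the normal points straight up or down.
        out = [0.0, 0.0, 0.0]
        out[up_axis] = math.copysign(1.0, vertical)
        return tuple(out)
    # Treated as a vertical plane: keep only the horizontal direction.
    return tuple(c / h_norm for c in horizontal)
```

With this rule, the example initial normal (0.05, 0.03, 0.994) snaps to (0, 0, 1).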

Exemplarily, FIG. 2 is a schematic diagram of a normal vector projection provided by an embodiment of the present disclosure, and FIG. 3 is a schematic diagram of another normal vector projection provided by an embodiment of the present disclosure. FIG. 2 and FIG. 3 respectively show the target normal vector obtained after projecting the initial normal vector vertically and horizontally.

In the above scheme, assuming that the object lies only on a horizontal or vertical plane, the optimized target normal vector is obtained by projecting the initial normal vector. This avoids the error introduced by assuming an orthographic camera projection model when calculating the normal vector, improves the accuracy of the normal direction of the plane on which the object lies in space, and thereby improves the accuracy of special effects displayed based on that normal direction.

FIG. 4 is a schematic flowchart of another target tracking method provided by an embodiment of the present disclosure. On the basis of the foregoing embodiment, this embodiment further optimizes the foregoing target tracking method. As shown in Figure 4, the method includes:

Step 201: Acquire a target video frame.

Step 202: Determine the position information of the target area in the target video frame by performing feature detection on the target video frame.

Step 203: If determining the position information of the target area in the target video frame fails, determine the initial shooting position information in the first coordinate system by using a simultaneous localization and mapping (SLAM) algorithm.

Step 204: Determine the target transformation relationship between the first coordinate system and the second coordinate system corresponding to the camera projection algorithm.

Optionally, determining the target transformation relationship between the first coordinate system and the second coordinate system corresponding to the camera projection algorithm includes: determining first shooting position information and second shooting position information of a detected video frame by the simultaneous localization and mapping algorithm and the camera projection algorithm, respectively, where a detected video frame is a video frame in which the target area was successfully tracked; and determining the target transformation relationship according to the first shooting position information and the second shooting position information of the detected video frame.

There may be one or more detected video frames. When there are multiple detected video frames, the target transformation relationship is the average of the transformation relationships corresponding to the individual detected video frames. The target transformation relationship includes a transformation scale and a transformation displacement.

Step 205: Determine the target shooting position information according to the initial shooting position information and the target transformation relationship.

Optionally, determining the target shooting position information according to the initial shooting position information and the target transformation relationship includes: transforming the initial shooting position information from the first coordinate system into the second coordinate system based on the target transformation relationship, so as to obtain the target shooting position information.
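A minimal sketch of this transformation, assuming the target transformation relationship is a scale s and a displacement t that map camera positions in the second coordinate system into the first, p_first = s * p_second + t (the same form as the relationship slam_w_T_c = ellipse_w_T_c * align_scale + align_translation used in the FIG. 5 example). Because the rotation is taken to be identical in both systems, only the position is converted here. The `Alignment` class and `slam_to_second` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Alignment:
    """Hypothetical container for the target transformation relationship."""
    scale: float        # transformation scale between the two coordinate systems
    translation: Vec3   # transformation displacement between the two systems


def slam_to_second(p_first: Vec3, align: Alignment) -> Vec3:
    """Map a camera position from the first (SLAM) coordinate system into the
    second coordinate system by inverting p_first = scale * p_second + translation."""
    return tuple((a - b) / align.scale for a, b in zip(p_first, align.translation))
```

For example, with scale 2 and displacement (1, 1, 1), the SLAM position (3, 5, 7) maps to (1, 2, 3) in the second coordinate system.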

Step 206: Determine the position information of the target area in the target video frame again according to the target shooting position information and the camera projection algorithm.

Optionally, determining the position information of the target area in the target video frame again according to the target shooting position information and the camera projection algorithm includes: performing a position solution according to the target shooting position information and the position information of the target shape object corresponding to the target area, so as to determine the displacement information from the shooting position to the target shape object, where the target area is the area occupied by the target shape object in the target video frame; and inputting the displacement information from the shooting position to the target shape object into the projection equation of the camera projection algorithm to determine the position information of the target area in the target video frame.
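The two sub-steps can be sketched as follows, assuming a standard pinhole camera with intrinsics (fx, fy, cx, cy) and a world-to-camera rotation `r_cw`; the text does not specify the projection equation, so this model and all function names are illustrative assumptions. For a circular target object, projecting points sampled on the circle outlines the elliptical target area.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def camera_displacement(point_w: Vec3, cam_center_w: Vec3, r_cw: Mat3) -> Vec3:
    """Displacement from the shooting position to a world point, expressed in
    camera coordinates: d = R_cw (X_w - C_w)."""
    diff = [p - c for p, c in zip(point_w, cam_center_w)]
    return tuple(sum(r_cw[i][j] * diff[j] for j in range(3)) for i in range(3))


def project(d_cam: Vec3, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point onto the image plane."""
    x, y, z = d_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def project_circle(center_w: Vec3, radius: float, axis_u: Vec3, axis_v: Vec3,
                   cam_center_w: Vec3, r_cw: Mat3,
                   intrinsics: Tuple[float, float, float, float],
                   samples: int = 32) -> List[Tuple[float, float]]:
    """Project points sampled on a 3-D circle spanned by the orthonormal
    in-plane axes axis_u and axis_v; the image points trace the ellipse."""
    pts = []
    for k in range(samples):
        a = 2.0 * math.pi * k / samples
        p = tuple(center_w[i] + radius * (math.cos(a) * axis_u[i] + math.sin(a) * axis_v[i])
                  for i in range(3))
        pts.append(project(camera_displacement(p, cam_center_w, r_cw), *intrinsics))
    return pts
```

For instance, a unit circle at the origin viewed from (0, 0, -5) with an identity rotation projects its center to the principal point (cx, cy).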

Next, the target tracking method in the embodiments of the present disclosure is further described with a specific example. Exemplarily, FIG. 5 is a schematic flowchart of still another target tracking method provided by an embodiment of the present disclosure. Taking an elliptical target area and a circular target shape object as an example, the specific process may include the following:

1. Initialization: In the first several video frames of the target video, the SLAM algorithm is used to calculate the shooting position in the first coordinate system, denoted slam_w_T_c, and the camera projection algorithm is used to calculate the shooting position in the second coordinate system from the known position of the elliptical area, denoted ellipse_w_T_c. The first coordinate system is a first world coordinate system, and the second coordinate system is a second world coordinate system or the coordinate system of the circular object.

2. Coordinate system alignment: The least squares method is used to calculate the similarity transformation between the first coordinate system and the second coordinate system, with the rotation between them known. When the camera is an image acquisition module of a smart terminal, the rotation can be obtained from the inertial measurement unit (IMU) of the smart terminal, so the rotation is the same in both coordinate systems. The similarity transformation therefore only needs to include the displacement and the scale, and solving for them completes the alignment of the two coordinate systems.

3. Tracking: When the camera moves quickly, the current frame of the target video may be blurred so that the elliptical area cannot be detected. In that case, the shooting position determined by the SLAM algorithm and the target transformation relationship between the two coordinate systems are used to determine the corresponding shooting position in the second coordinate system of the camera projection algorithm. The shooting position determined by the SLAM algorithm is denoted slam_w_T_c, the shooting position corresponding to the camera projection algorithm is denoted ellipse_w_T_c, and the target transformation relationship is expressed as slam_w_T_c = ellipse_w_T_c * align_scale + align_translation, where align_scale is the transformation scale between the two coordinate systems and align_translation is the transformation displacement between them. The position of the elliptical area in the current frame is then determined from the shooting position in the second coordinate system together with the projection equation and transformation equation of the camera projection algorithm.
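The alignment in step 2 can be sketched as a closed-form least-squares fit. This minimal version assumes that only camera positions (the translation part of slam_w_T_c and ellipse_w_T_c) are used, and that, because the rotation is shared, the model is p_slam ≈ align_scale * p_ellipse + align_translation. Centering both point sets removes the displacement, the scale follows from one ratio, and the displacement follows from the means. The function name is hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def fit_scale_translation(slam_pos: Sequence[Vec3],
                          ellipse_pos: Sequence[Vec3]) -> Tuple[float, Vec3]:
    """Least-squares fit of slam ≈ scale * ellipse + translation (rotation known and shared)."""
    n = len(slam_pos)
    if n != len(ellipse_pos) or n < 2:
        raise ValueError("need at least two matched camera positions")
    ms = [sum(p[i] for p in slam_pos) / n for i in range(3)]     # mean SLAM position
    me = [sum(p[i] for p in ellipse_pos) / n for i in range(3)]  # mean position, second system
    num = 0.0
    den = 0.0
    for ps, pe in zip(slam_pos, ellipse_pos):
        ds = [ps[i] - ms[i] for i in range(3)]
        de = [pe[i] - me[i] for i in range(3)]
        num += sum(a * b for a, b in zip(ds, de))
        den += sum(b * b for b in de)
    if den == 0.0:
        raise ValueError("camera positions did not move; scale is unobservable")
    scale = num / den
    translation = tuple(ms[i] - scale * me[i] for i in range(3))
    return scale, translation
```

At least two distinct camera positions are needed; otherwise the scale cannot be recovered.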

Further, the shooting position in the second coordinate system determined by the camera projection algorithm may be inaccurate in some cases, which leads to a large error in the transformation relationship between the first coordinate system and the second coordinate system. To address this, two queues can be maintained throughout the tracking process to store a set number of the most recent slam_w_T_c and ellipse_w_T_c values, respectively; the set number may be 10. When the current frame is blurred and the ellipse cannot be detected because of fast camera movement, multiple transformation relationships between the two coordinate systems are determined from the stored slam_w_T_c and ellipse_w_T_c values, and their average is taken as the final target transformation relationship. The shooting position in the second coordinate system corresponding to the camera projection algorithm is then determined from the shooting position given by the SLAM algorithm and the target transformation relationship, and the position of the elliptical area in the current frame is determined from the shooting position in the second coordinate system.
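The queue-and-average scheme can be sketched with two bounded deques of length 10. The text does not say how each individual transformation relationship is derived from the stored values, so this sketch makes a labeled assumption: each estimate comes from two consecutive stored frames, with the scale taken as the ratio of camera travel distances and the displacement solved from the later frame. Only camera positions are used, and the class name is hypothetical.

```python
import math
from collections import deque
from typing import Deque, Optional, Tuple

Vec3 = Tuple[float, float, float]


class PoseHistory:
    """Hypothetical store of the latest camera positions from SLAM (first system)
    and from the camera projection algorithm (second system)."""

    def __init__(self, max_len: int = 10):
        self.slam: Deque[Vec3] = deque(maxlen=max_len)     # oldest entries drop out
        self.ellipse: Deque[Vec3] = deque(maxlen=max_len)

    def push(self, slam_pos: Vec3, ellipse_pos: Vec3) -> None:
        # Intended to be called only on frames where the ellipse was detected.
        self.slam.append(slam_pos)
        self.ellipse.append(ellipse_pos)

    def averaged_alignment(self) -> Optional[Tuple[float, Vec3]]:
        """Average (scale, displacement) over per-pair estimates (assumed scheme)."""
        scales, translations = [], []
        pairs = list(zip(self.slam, self.ellipse))
        for (s0, e0), (s1, e1) in zip(pairs, pairs[1:]):
            d_e = math.dist(e0, e1)
            if d_e < 1e-9:
                continue  # camera did not move between these frames: skip
            scale = math.dist(s0, s1) / d_e
            scales.append(scale)
            translations.append(tuple(a - scale * b for a, b in zip(s1, e1)))
        if not scales:
            return None
        n = len(scales)
        mean_t = tuple(sum(t[i] for t in translations) / n for i in range(3))
        return sum(scales) / n, mean_t
```

Using `deque(maxlen=...)` keeps the most recent entries automatically, which matches storing "the latest set number" of positions without explicit eviction code.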

FIG. 6 is a schematic structural diagram of a target tracking apparatus provided by an embodiment of the present disclosure. The apparatus may be implemented by software and/or hardware, and may generally be integrated into an electronic device. As shown in Figure 6, the device includes:

A video frame module 301, configured to obtain a target video frame;

The first position module 302 is configured to determine the position information of the target area in the target video frame by performing feature detection on the target video frame;

A shooting position module 303, configured to determine the target shooting position information if the position information of the target area in the target video frame fails to be determined;

The second position module 304 is configured to re-determine the position information of the target area in the target video frame according to the target shooting position information and the camera projection algorithm.
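How the modules cooperate can be sketched as a small class whose fields stand in for the modules. Every callable here is a hypothetical placeholder (the text does not define concrete interfaces), and the frame passed to `track` plays the role of the video frame module's output. The sketch only shows the control flow: use feature detection when it succeeds, otherwise fall back to the shooting position.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Frame = object                              # hypothetical placeholder for a video frame
Vec3 = Tuple[float, float, float]
Region = Tuple[float, float, float, float]  # hypothetical target-area parameters


@dataclass
class TargetTracker:
    detect: Callable[[Frame], Optional[Region]]  # first position module (feature detection)
    slam_position: Callable[[Frame], Vec3]       # shooting position module: SLAM position
    to_second_system: Callable[[Vec3], Vec3]     # shooting position module: alignment
    reproject: Callable[[Vec3], Region]          # second position module (projection)

    def track(self, frame: Frame) -> Optional[Region]:
        region = self.detect(frame)
        if region is not None:
            return region  # feature detection succeeded
        # Detection failed (e.g. motion blur): recover from the shooting position.
        cam_pos = self.to_second_system(self.slam_position(frame))
        return self.reproject(cam_pos)
```

In practice `to_second_system` and `reproject` would wrap the coordinate alignment and camera projection steps described above.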

Optionally, the shooting position module 303 is used for:

Determine the initial shooting position information in the first coordinate system by using a simultaneous localization and mapping (SLAM) algorithm;

determining the target transformation relationship between the first coordinate system and the second coordinate system corresponding to the camera projection algorithm;

The target shooting position information is determined according to the initial shooting position information and the target transformation relationship.

Optionally, the shooting position module 303 is used for:

Determine the first shooting position information and the second shooting position information of a detected video frame by the simultaneous localization and mapping algorithm and the camera projection algorithm, respectively, where the detected video frame is a video frame in which the target area was successfully tracked;

The target transformation relationship is determined according to the first shooting position information and the second shooting position information of the detected video frame.

Optionally, there may be one or more detected video frames; when there are multiple detected video frames, the target transformation relationship is the average of the transformation relationships corresponding to the individual detected video frames, and the target transformation relationship includes a transformation scale and a transformation displacement.

Optionally, the shooting position module 303 is used for:

The initial shooting position information is transformed from the information of the first coordinate system to the information of the second coordinate system based on the target transformation relationship, so as to obtain the target shooting position information.

Optionally, the second location module 304 is used for:

Perform a position solution according to the target shooting position information and the position information of the target shape object corresponding to the target area, and determine the displacement information from the shooting position to the target shape object, where the target area is the area occupied by the target shape object in the target video frame;

The displacement information from the shooting position to the target-shaped object is input into the projection equation of the camera projection algorithm, and the position information of the target area in the target video frame is determined.

Optionally, the device further includes a normal module for:

The initial normal vector of the plane where the target shape object corresponding to the target area is located is determined, and the projection of the initial normal vector on the horizontal plane or the vertical plane is determined as the target normal vector.

The target tracking apparatus provided by the embodiments of the present disclosure can execute the target tracking method provided by any embodiment of the present disclosure, and has the functional modules and beneficial effects corresponding to the executed method.

An embodiment of the present disclosure also provides a computer program product, including a computer program/instruction, which implements the target tracking method provided by any embodiment of the present disclosure when the computer program/instruction is executed by a processor.

FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure, showing an electronic device 400 suitable for implementing the embodiments of the present disclosure. The electronic device 400 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (PADs), portable multimedia players (PMPs), and in-vehicle terminals (for example, in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 7 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.

As shown in FIG. 7, the electronic device 400 may include a processing device 401 (for example, a central processing unit or a graphics processor), which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the electronic device 400. The processing device 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.

Typically, the following devices can be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 407 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage devices 408 including, for example, a magnetic tape and a hard disk; and a communication device 409. The communication device 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 7 shows the electronic device 400 having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from the network via the communication device 409, or from the storage device 408, or from the ROM 402. When the computer program is executed by the processing device 401, the above-mentioned functions defined in the target tracking method of the embodiment of the present disclosure are executed.

It should be noted that the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium can be transmitted by any suitable medium, including but not limited to electric wire, optical fiber cable, radio frequency (RF), or any suitable combination of the above.

In some embodiments, clients and servers can communicate using any currently known or future developed network protocol, such as HyperText Transfer Protocol (HTTP), and can be interconnected by digital data communication in any form or medium (for example, a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), internetworks (for example, the Internet), and peer-to-peer networks (for example, ad hoc peer-to-peer networks), as well as any currently known or future developed networks.

The above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.

The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a target video frame; determine position information of a target area in the target video frame by performing feature detection on the target video frame; if determining the position information of the target area in the target video frame fails, determine target shooting position information; and determine the position information of the target area in the target video frame again according to the target shooting position information and a camera projection algorithm.

Computer program code for performing the operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should further be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by dedicated hardware-based systems that perform the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units involved in the embodiments of the present disclosure may be implemented in a software manner, and may also be implemented in a hardware manner. Among them, the name of the unit does not constitute a limitation of the unit itself under certain circumstances.

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and so on.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

According to one or more embodiments of the present disclosure, the present disclosure provides a target tracking method, including:

Acquiring a target video frame;

According to one or more embodiments of the present disclosure, in the target tracking method provided by the present disclosure, the determining the target shooting position information includes:

According to one or more embodiments of the present disclosure, in the target tracking method provided by the present disclosure, determining the target transformation relationship between the first coordinate system and the second coordinate system corresponding to the camera projection algorithm includes:

According to one or more embodiments of the present disclosure, in the target tracking method provided by the present disclosure, the number of the detected video frames is one or more, and when the number of the detected video frames is multiple, the The target transformation relationship is the average value of the transformation relationships corresponding to the detected video frames, and the target transformation relationship includes transformation scale and transformation displacement.

According to one or more embodiments of the present disclosure, in the target tracking method provided by the present disclosure, determining the target shooting position information according to the initial shooting position information and the target transformation relationship includes:

According to one or more embodiments of the present disclosure, in the target tracking method provided by the present disclosure, determining the position information of the target area in the target video frame again according to the target shooting position information and the camera projection algorithm includes:

According to one or more embodiments of the present disclosure, the target tracking method provided by the present disclosure further includes:

According to one or more embodiments of the present disclosure, the present disclosure provides a target tracking device, including:

The video frame module is used to obtain the target video frame;

According to one or more embodiments of the present disclosure, in the target tracking device provided by the present disclosure, the shooting position module is used for:

According to one or more embodiments of the present disclosure, in the target tracking device provided by the present disclosure, the number of the detected video frames is one or more, and when the number of the detected video frames is multiple, the The target transformation relationship is the average value of the transformation relationships corresponding to the detected video frames, and the target transformation relationship includes transformation scale and transformation displacement.

According to one or more embodiments of the present disclosure, in the target tracking device provided by the present disclosure, the second location module is used for:

According to one or more embodiments of the present disclosure, in the target tracking device provided by the present disclosure, the device further includes a normal module for:

According to one or more embodiments of the present disclosure, the present disclosure provides an electronic device, comprising:

processor;

a memory for storing the processor-executable instructions;

The processor is configured to read the executable instructions from the memory, and execute the instructions to implement any one of the target tracking methods provided in the present disclosure.

According to one or more embodiments of the present disclosure, the present disclosure provides a computer-readable storage medium storing a computer program, the computer program being used to execute any target tracking method provided by the present disclosure.

According to one or more embodiments of the present disclosure, the embodiments of the present disclosure further provide a computer program product, including a computer program, for executing any one of the target tracking methods provided in the present disclosure.

According to one or more embodiments of the present disclosure, an embodiment of the present disclosure further provides a computer program for executing the target tracking method as provided in any one of the present disclosure.

The above description is merely a preferred embodiment of the present disclosure and an illustration of the technical principles employed. Those skilled in the art should understand that the scope of the disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept; for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Additionally, although operations are depicted in a particular order, this should not be construed as requiring that the operations be performed in the particular order shown or in a sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains several implementation-specific details, these should not be construed as limitations on the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or logical acts of method, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims

A target tracking method, comprising:

Acquiring a target video frame;

Determining the position information of the target area in the target video frame by performing feature detection on the target video frame;

If the determination of the position information of the target area in the target video frame fails, determining the target shooting position information;

According to the target shooting position information and the camera projection algorithm, the position information of the target area in the target video frame is determined again.
The method according to claim 1, wherein the determining the target shooting position information comprises:

Determining the initial shooting position information in the first coordinate system by using a simultaneous localization and mapping (SLAM) algorithm;

determining the target transformation relationship between the first coordinate system and the second coordinate system corresponding to the camera projection algorithm;

The target shooting position information is determined according to the initial shooting position information and the target transformation relationship.
The method according to claim 2, wherein determining the target transformation relationship between the first coordinate system and the second coordinate system corresponding to the camera projection algorithm comprises:

Determining the first shooting position information and the second shooting position information of a detected video frame by the simultaneous localization and mapping algorithm and the camera projection algorithm, respectively, wherein the detected video frame is a video frame in which the target area was successfully tracked;

The target transformation relationship is determined according to the first shooting position information and the second shooting position information of the detected video frame.
The method according to claim 3, wherein there are one or more detected video frames; when there are multiple detected video frames, the target transformation relationship is the average of the transformation relationships corresponding to the individual detected video frames; and the target transformation relationship includes a transformation scale and a transformation displacement.
The method according to claim 2, wherein determining the target shooting position information according to the initial shooting position information and the target transformation relationship comprises:

The initial shooting position information is transformed from the information of the first coordinate system to the information of the second coordinate system based on the target transformation relationship, so as to obtain the target shooting position information.
The method according to claim 1, wherein re-determining the position information of the target area in the target video frame according to the target shooting position information and the camera projection algorithm comprises:

performing position solving according to the target shooting position information and position information of a target-shaped object corresponding to the target area, to determine displacement information from the shooting position to the target-shaped object, wherein the target area is the area of the target video frame in which the target-shaped object is located; and

inputting the displacement information from the shooting position to the target-shaped object into the projection equation of the camera projection algorithm, to determine the position information of the target area in the target video frame.
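The projection step of claim 6 can be illustrated with a standard pinhole camera model. This is a sketch only, assuming the position solving (for example a PnP-style solver, not shown) has already produced the camera position and a world-to-camera rotation `R`, that the intrinsics `fx`, `fy`, `cx`, `cy` are known, and that the target-shaped object is described by its corner points. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def displacement(camera_pos: Vec3, point: Vec3) -> Vec3:
    """Displacement from the shooting position to a point on the object (world frame)."""
    return tuple(p - c for p, c in zip(point, camera_pos))


def project(d: Vec3, R: Mat3, fx: float, fy: float,
            cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection u = fx*X/Z + cx, v = fy*Y/Z + cy.

    R is an assumed world-to-camera rotation, so (X, Y, Z) = R @ d.
    """
    X, Y, Z = (sum(R[i][j] * d[j] for j in range(3)) for i in range(3))
    if Z <= 0:
        raise ValueError("point lies behind the camera")
    return fx * X / Z + cx, fy * Y / Z + cy


def project_region(camera_pos: Vec3, corners: Sequence[Vec3], R: Mat3,
                   fx: float, fy: float, cx: float, cy: float) -> List[Tuple[float, float]]:
    """Project each corner of the target-shaped object; the result outlines the target area."""
    return [project(displacement(camera_pos, c), R, fx, fy, cx, cy) for c in corners]
```

As a check, take an identity rotation, a camera at (0, 0, -2), and a 2 × 2 square centred at the origin in the z = 0 plane, with fx = fy = 100 and principal point (320, 240). The corners then project to (270, 190), (370, 190), (370, 290) and (270, 290).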
The method according to any one of claims 1, 3 or 6, further comprising:

determining an initial normal vector of the plane in which the target-shaped object corresponding to the target area is located, and determining the projection of the initial normal vector onto the horizontal plane or the vertical plane as a target normal vector.
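Claim 7 constrains the estimated normal by projecting it onto a reference plane. The sketch below is minimal and assumes a y-up world, so the horizontal plane has normal (0, 1, 0). It also assumes that a vertical plane is specified by a caller-chosen horizontal normal. `project_onto_plane` is a hypothetical helper. It renormalizes the result to unit length, which the claim itself does not state.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def project_onto_plane(v: Vec3, plane_normal: Vec3) -> Vec3:
    """Project v onto the plane with normal m, i.e. v - (v.m / m.m) m, then normalize."""
    k = _dot(v, plane_normal) / _dot(plane_normal, plane_normal)
    w = tuple(x - k * m for x, m in zip(v, plane_normal))
    norm = math.sqrt(_dot(w, w))
    if norm < 1e-9:
        # v is parallel to the plane normal, so its projection is (near) zero.
        raise ValueError("projection is degenerate")
    return tuple(x / norm for x in w)


UP = (0.0, 1.0, 0.0)  # assumed y-up convention: normal of the horizontal plane
```

For instance, projecting (3, 4, 0) onto the horizontal plane (normal `UP`) gives (1, 0, 0). Projecting it onto the vertical plane with normal (1, 0, 0) gives (0, 1, 0).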
A target tracking apparatus, comprising:

a video frame module, configured to obtain a target video frame;

a first position module, configured to determine position information of a target area in the target video frame by performing feature detection on the target video frame;

a shooting position module, configured to determine target shooting position information if determining the position information of the target area in the target video frame fails; and

a second position module, configured to re-determine the position information of the target area in the target video frame according to the target shooting position information and a camera projection algorithm.
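The four modules of the apparatus map onto a detect-then-fall-back control flow. The following is a minimal sketch in which each module is a function supplied by the caller. `TargetTracker`, its parameters, and the (x, y, width, height) region type are all hypothetical. The fallback path runs only when feature detection returns `None`.

```python
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple

# Hypothetical region type: (x, y, width, height) in pixels.
Region = Tuple[float, float, float, float]


class TargetTracker:
    """Sketch of the apparatus; every module is a stand-in callable."""

    def __init__(
        self,
        detect: Callable[[Any], Optional[Region]],      # first position module
        shooting_position: Callable[[Any], Any],        # shooting position module
        project_region: Callable[[Any, Any], Region],   # second position module
    ) -> None:
        self.detect = detect
        self.shooting_position = shooting_position
        self.project_region = project_region

    def track_frame(self, frame: Any) -> Region:
        region = self.detect(frame)
        if region is not None:
            return region  # feature detection succeeded
        # Detection failed: fall back to shooting position + camera projection.
        pose = self.shooting_position(frame)
        return self.project_region(frame, pose)

    def track(self, frames: Iterable[Any]) -> Iterator[Region]:
        # Video frame module: obtain and process frames one at a time.
        for frame in frames:
            yield self.track_frame(frame)
```

Injecting the modules as callables keeps the fallback logic independent of any particular detector, SLAM system, or projection model.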
An electronic device, characterized in that the electronic device comprises:

a processor; and

a memory for storing instructions executable by the processor,

wherein the processor is configured to read the executable instructions from the memory and execute them to implement the target tracking method according to any one of the preceding claims 1-7.
A computer-readable storage medium, characterized in that the storage medium stores a computer program, and the computer program is used to execute the target tracking method according to any one of the preceding claims 1-7.
A computer program product, characterized in that it includes a computer program, and the computer program is used to execute the target tracking method according to any one of the preceding claims 1-7.
A computer program, characterized in that the computer program is used to execute the target tracking method according to any one of the preceding claims 1-7.