CN116309719A - Target tracking method, device, computer equipment and storage medium

Info

Publication number: CN116309719A
Application number: CN202310282040.9A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 朱开元, 周智慧, 程超, 朱世强, 顾建军
Assignee (original and current): Zhejiang Lab
Application filed by Zhejiang Lab; priority to CN202310282040.9A; published as CN116309719A
Legal status: Pending
Prior art keywords: image, time, target, target object, detection frame

Classifications

    • G06T 7/246: Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/73: Image analysis; determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V 40/10: Recognition of human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V 40/161: Human faces; detection, localisation, normalisation
    • G06V 40/172: Human faces; classification, e.g. identification
    • G06V 2201/07: Indexing scheme for image or video recognition; target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a target tracking method and apparatus, a computer device, and a storage medium. When loss of a target object is detected, history images of a preset number of frames are acquired, the history images containing the target object; the moving speed of the target object is determined based on the history images; the coordinate position of the target object is predicted based on the moving speed and the moving time of the target object, where the moving time comprises the loss duration and the delay time of the image acquisition device, and the delay time comprises the start-up delay of the image acquisition device; and the pose of the image acquisition device is adjusted based on the coordinate position. With this method, target-object position prediction based on history images can be realized; introducing the delay time into the prediction calculation improves the accuracy of the position prediction and the robustness of the prediction algorithm, thereby improving tracking accuracy after the target is lost.

Description

Target tracking method, device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of robot following technologies, and in particular to a target tracking method and apparatus, a computer device, and a storage medium.
Background
Service robots are often used in home companion scenarios, where they actively recognize the identity of an elderly person or child and follow that person continuously so that physiological and emotional needs can be addressed in time.
In practical applications, the target is often lost: for example, the average or instantaneous moving speed of the target exceeds that of the robot, or the robot cannot reach a matching speed in a short time, so the target leaves the robot's image acquisition field of view. When the target is lost, existing countermeasures include stopping pose adjustment and resuming tracking only when the target object re-enters the field of view; tracking a certain distance along the target object's moving direction and, if the target object still has not entered the field of view, stopping pose adjustment; or restoring the pose to its state before the loss. However, such countermeasures do little to re-establish target tracking.
Therefore, current service robots still suffer from a low tracking success rate after the target is lost.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a target tracking method, apparatus, computer device, and storage medium that can improve tracking accuracy after target loss.
In a first aspect, the present embodiment provides a target tracking method, applied to an image acquisition device, the method including:
when loss of the target object is detected, acquiring history images of a preset number of frames, wherein the history images comprise the target object;
determining a moving speed of the target object based on the history image;
predicting a coordinate position of the target object based on the moving speed and the moving time of the target object; the moving time comprises a losing time and a delay time of the image acquisition equipment, and the delay time comprises a starting delay of the image acquisition equipment;
and adjusting the pose of the image acquisition equipment based on the coordinate position.
In one embodiment, the determining the moving speed of the target object based on the history image includes:
acquiring a historical target human body detection frame of a target object in the historical image;
and calculating the moving speed of the target object based on the position of the historical target human body detection frame in the historical image and the historical horizontal moving speed of the image acquisition equipment.
In one embodiment, the calculating the moving speed of the target object based on the position of the historical target human body detection frame in the historical image and the historical horizontal moving speed of the image acquisition device includes:
acquiring a first historical image and a second historical image of adjacent frames, and a time difference corresponding to the adjacent frames;
determining a horizontal movement difference value of the historical target human body detection frame relative to the image acquisition device based on the positions of the historical target human body detection frames in the first historical image and the second historical image;
and determining the moving speed of the target object based on the horizontal movement difference value, the time difference and the historical horizontal movement speed of the image acquisition device at the moment of the first historical image.
In one embodiment, the calculating the moving speed of the target object based on the position of the historical target human body detection frame in the historical image and the historical horizontal moving speed of the image acquisition device further includes:
calculating an average value based on the moving speeds of the target object at the time points of the plurality of history images;
and determining the average value as the moving speed of the target object.
In one embodiment, predicting the coordinate position of the target object based on the movement speed and the movement time of the target object includes:
acquiring a steering engine response time and an algorithm running time of the image acquisition device, wherein the steering engine (i.e. servo) response time is the time for the steering engine to accelerate to a preset horizontal movement speed, and the algorithm running time is the computation time from acquiring the history images to adjusting the pose of the image acquisition device;
and determining the start-up delay based on the steering engine response time and the algorithm running time.
In one embodiment, the adjusting the pose of the image capturing device based on the coordinate position includes:
acquiring a current real-time image;
determining a real-time target human body detection frame based on the real-time image;
and adjusting the pose of the image acquisition equipment based on the difference value between the coordinate position of the real-time target human body detection frame in the real-time image and the center point of the real-time image.
In one embodiment, the determining a real-time target human detection frame based on the real-time image includes:
determining whether a target face of the target object exists or not based on the real-time image;
if the target face exists in the real-time image, determining a target face detection frame based on the target face, and determining a real-time target human body detection frame in the real-time image based on the target face detection frame;
if the target face does not exist in the real-time image, determining a human body detection frame based on the real-time image, and determining the real-time target human body detection frame based on the matching between the human body detection frame and a reference human body detection frame.
In one embodiment, the determining whether there is a target face of the target object based on the real-time image includes:
determining a face detection frame based on the real-time image;
determining a first feature vector based on the face detection frame;
and determining whether the target face exists or not based on the first feature vector and a reference face feature vector in a database.
In a second aspect, the present embodiment provides an object tracking apparatus applied to an image capturing device, the apparatus including:
the acquisition module is used for acquiring history images of a preset number of frames when loss of the target object is detected, wherein the history images comprise the target object;
a calculation module for determining a moving speed of the target object based on the history image;
a prediction module for predicting a coordinate position of the target object based on a moving speed and a moving time of the target object; the moving time comprises a losing time and a delay time of the image acquisition equipment, and the delay time comprises a starting delay of the image acquisition equipment;
and the adjusting module is used for adjusting the pose of the image acquisition equipment based on the coordinate position.
In a third aspect, the present embodiment provides a computer device comprising a memory storing a computer program and a processor that implements the steps of the above method when executing the computer program.
In a fourth aspect, the present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method described above.
In the above target tracking method, apparatus, computer device, and storage medium, the moving speed of the target object is determined from history images acquired when loss of the target object is detected, and the coordinate position of the target object is predicted from the moving speed and the moving time, the moving time comprising the loss duration and the delay time of the image acquisition device (including its start-up delay); the pose of the image acquisition device is then adjusted based on the predicted coordinate position. Determining the moving speed from the history images and predicting the coordinate position from it realizes position prediction based on history images; introducing the delay-time parameter of the image acquisition device into the prediction improves the accuracy of the position prediction and the robustness of the prediction algorithm, thereby improving tracking accuracy after the target is lost.
Drawings
FIG. 1 is an application environment diagram of a target tracking method in one embodiment;
FIG. 2 is a flow chart of a target tracking method in one embodiment;
FIG. 3 is a schematic diagram of a target face detection box in one embodiment;
FIG. 4 is a schematic diagram of a human body detection block in one embodiment;
FIG. 5 is a schematic diagram of a real-time target human detection frame in one embodiment;
FIG. 6 is a schematic diagram of a face detection box in one embodiment;
FIG. 7 is a flow chart of a target tracking method according to another embodiment;
FIG. 8 is a block diagram of a target tracking device in one embodiment;
FIG. 9 is a block diagram of an image acquisition device in one embodiment;
FIG. 10 is a block diagram of an image capturing device according to another embodiment;
FIG. 11 is an internal block diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The target tracking method provided by the embodiments of the present application can be applied to a standalone terminal comprising an image acquisition device, a memory, and a processor, where the image acquisition device captures images within its field of view and the processor stores them in the memory. When loss of the target object is detected, the processor acquires the history images of the preset number of frames stored in the memory, determines the moving speed of the target object based on the history images, predicts the coordinate position of the target object based on the moving speed, and adjusts the pose of the image acquisition device based on the coordinate position.
Depending on practical needs, the target tracking method provided by the embodiments of the present application may also be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104 or located on a cloud or other network server. When loss of the target object is detected, the terminal 102 may acquire the history images of the preset number of frames from the server 104, determine the moving speed of the target object based on the history images, predict the coordinate position of the target object based on the moving speed, and adjust the pose of the image acquisition device based on the coordinate position. The server 104 may be implemented as a stand-alone server or as a server cluster comprising a plurality of servers.
In one embodiment, as shown in fig. 2, which is a schematic flow chart of the target tracking method provided in this embodiment, taking an example that the method is applied to the terminal 102 in fig. 1 as an illustration, the terminal 102 may be an image capturing device, and the method includes the following steps:
step S100, when the loss of the target object is detected, a history image of a preset frame number is acquired, wherein the history image comprises the target object.
The target object is a target to be tracked, and can be a human or animal or a moving object. The determination of the target object may be based on a preset, or may be an object that is closer to the image capturing device or has a larger display area in the captured image, or may be a determination method of other target objects, which is not limited herein.
The determination of whether the target object is lost may be based on visual recognition, or may be based on detecting a state record of the target object by an upper computer or other devices, or may be based on a received tracking instruction, or may be based on other methods, which are not limited herein.
The history images of the preset number of frames are collected by the image acquisition device. After acquisition, they may be stored in a memory of the image acquisition device or on a server; correspondingly, the history images may be acquired from the memory of the image acquisition device, from the server, or from other devices. The preset number of frames may be set according to actual needs and may be any positive integer not less than 1.
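A fixed-length history of this kind can be kept in a ring buffer; the sketch below is a minimal Python illustration (the dictionary fields and the 5-frame capacity are assumptions, not specified by the application):

```python
from collections import deque

PRESET_FRAMES = 5  # illustrative value; the application only requires >= 1

# Ring buffer holding the most recent frames plus per-frame device state, so
# the last PRESET_FRAMES history images are at hand the moment the target is lost.
history = deque(maxlen=PRESET_FRAMES)

def on_new_frame(image, timestamp, cam_yaw_rate):
    # Store each frame with the acquisition-time state of the device, mirroring
    # the per-frame (timestamp, yaw steering-engine speed) record used later on.
    history.append({"image": image, "t": timestamp, "cam_yaw_rate": cam_yaw_rate})

# Example: ten dummy frames at ~30 FPS; only the last five remain buffered.
for k in range(10):
    on_new_frame(image=None, timestamp=k / 30.0, cam_yaw_rate=0.0)
print(len(history), round(history[0]["t"], 3))  # -> 5 0.167
```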
Step S200, determining a moving speed of the target object based on the history image.
The moving speed may be the moving speed of the target object under the reference frame of the image acquisition device, or the moving speed of the target object under the reference frame of the plane where the image acquisition device is located.
The moving speed of the target object may be determined from a single-frame history image: for example, the moving distance of the target object within the single-frame exposure time may be determined from the motion blur (residual shadow) it produces in the history image, and the moving speed derived from that. The moving speed may also be determined from multiple history images, for example from the difference in the target object's position and the difference in capture time between different history images; other methods of determining the moving speed from history images are not limited herein. It can be understood that when multiple frames of history images are acquired, they may be adjacent frames, non-adjacent frames, or a mixture of both, as set according to actual requirements.
Step S300, predicting the coordinate position of the target object based on the moving speed and the moving time of the target object; the moving time comprises a losing time and a delay time of the image acquisition equipment, and the delay time comprises a starting delay of the image acquisition equipment.
It can be understood that the image acquisition device reacts with a certain lag when the target object is lost: limited by hardware and other factors, it cannot instantly complete detecting the loss, calculating the moving speed, and controlling the pose adjustment. Therefore, when predicting the coordinate position of the target object, the delay time of the image acquisition device needs to be incorporated into the prediction algorithm to improve the chance of re-acquiring the target.
The start-up delay may be preset through a priori knowledge, or may be determined based on the current moving speed of the image capturing device, for example, the time required for reaching the preset moving speed may be determined according to the current moving speed of the image capturing device, which is not limited herein.
And step S400, adjusting the pose of the image acquisition equipment based on the coordinate position.
The pose adjustment of the image acquisition device can be realized by controlling the motion execution unit to adjust the yaw angle or pitch angle of the image acquisition device, or by controlling the motion execution unit to adjust the position of the image acquisition device under the plane reference system.
According to the target tracking method provided by the embodiment, the moving speed of the target object is determined through the historical image, and the coordinate position of the target object is predicted according to the moving speed of the target object, so that the target object position prediction based on the historical image can be realized; the delay time parameter of the image acquisition equipment is introduced in the prediction process, so that the accuracy of target object position prediction can be improved, the robustness of a prediction algorithm is improved, and the effect of improving the tracking accuracy after the target is lost is achieved.
In one embodiment, the determining the moving speed of the target object based on the history image includes:
acquiring a historical target human body detection frame of a target object in the historical image;
and calculating the moving speed of the target object based on the position of the historical target human body detection frame in the historical image and the historical horizontal moving speed of the image acquisition equipment.
The moving speed in this embodiment is the moving speed of the target object in the reference frame of the plane in which the image capturing apparatus is located. The horizontal movement may be a yaw rotation of the image capturing device itself, and the historical horizontal movement speed of the image capturing device may be a yaw rotation speed of the image capturing device at a historical time. It will be appreciated that when the image capturing device is also in a horizontal movement state, the movement speed of the target object is calculated by the history image, and calculation is also required in combination with the history movement state of the image capturing device.
The historical target human body detection frame may be marked in the history image as a pixel frame, or stored as coordinates in the record information corresponding to the history image. The record information may include the acquired data of one or more frames of history images; image information such as acquisition time, exposure time, acquisition device information, aperture size, and resolution; and state information of the image acquisition device at acquisition time, such as its yaw movement speed, pitch movement speed, and displacement speed, which is not limited herein.
The historical horizontal movement speed of the image acquisition device may be recorded in the record information corresponding to the historical image, or may be obtained by obtaining a motion log file of the image acquisition device, and based on the acquisition time of the historical image, obtaining a corresponding motion parameter in the motion log file, thereby determining the historical horizontal movement speed of the image acquisition device, or may be obtained by other methods, which is not limited herein.
When calculating the moving speed of the target object from the position of the historical target human body detection frame in the history image and the historical horizontal movement speed of the image acquisition device, a first moving speed of the target object in the reference frame of the image acquisition device can be determined from the moving distance of the historical target human body detection frame in the history image; the moving speed of the target object in the reference frame of the plane where the image acquisition device is located is then determined from this first moving speed and the historical horizontal movement speed of the image acquisition device.
In addition, the horizontal movement may further include positional movement of the image acquisition device on the plane, and the horizontal movement speed may further include the speed of that positional movement. It can be understood that at some historical times the image acquisition device may, while tracking the target object in real time, move toward or away from it. Calculating the moving speed of the target object from the position of the historical target human body detection frame in the history image and the historical horizontal movement speed of the image acquisition device may therefore further include calculating the moving speed of the target object from the positional movement speed of the image acquisition device at the time corresponding to each history image and the position of the historical target human body detection frame in that history image.
According to the target tracking method provided by the embodiment, the moving speed of the target object is determined through the position of the historical target human body detection frame in the historical image and the horizontal moving speed of the image acquisition equipment, so that the moving speed calculation of the target object under the reference system of the plane can be realized, and the effect of improving the target tracking accuracy is achieved.
In one embodiment, the calculating the moving speed of the target object based on the position of the historical target human body detection frame in the historical image and the historical horizontal moving speed of the image acquisition device includes:
acquiring a first historical image and a second historical image of adjacent frames and a time difference corresponding to the adjacent frames;
determining a horizontal movement difference value of the historical target human body detection frame relative to the image acquisition device based on the positions of the historical target human body detection frames in the first historical image and the second historical image;
and determining the moving speed of the target object based on the horizontal movement difference value, the time difference and the historical horizontal movement speed of the image acquisition device at the moment of the first historical image.
The movement speed may be an angular speed, i.e. an angular speed at which the target object moves in the reference frame of the image acquisition device.
The determining a horizontal movement difference value of the historical target human body detection frame relative to the image acquisition device based on the positions of the historical target human body detection frames in the first historical image and the second historical image may be determining a horizontal movement angle of the historical target human body detection frame relative to the image acquisition device.
On the premise that the field-of-view angle of the image acquisition device is known, the angle of the target object in the history image can be determined from its position in the history image and the device's field-of-view angle. The movement angle of the target object is obtained from the angle difference between adjacent-frame history images. Further, the angular velocity of the target object relative to the image acquisition device may be determined from the movement angle and the time difference of the history images. From this angular velocity and the historical horizontal movement speed of the image acquisition device at the time of the first history image, the relative movement speed of the target object and the image acquisition device in the reference frame of the plane can be determined.
In a specific embodiment, an i-th frame history image and an (i-1)-th frame history image are acquired, the i-th frame being the last history image in which a target human body detection frame is available. The yaw angles of the target object in the i-th and (i-1)-th frame images are determined from the field-of-view angle of the image acquisition device and the positions of the historical target human body detection frame in those images, the acquisition time difference between the two frames is obtained, and the moving speed of the target object is determined from the yaw angle difference, the acquisition time difference, and the historical horizontal movement speed of the image acquisition device at the i-th frame. The calculation formula is:

$$\omega_i = \frac{\theta_i - \theta_{i-1}}{T_i - T_{i-1}} + \omega_i^{cam}$$

where $\omega_i$ is the yaw rate of the target object at the time of the i-th frame history image, $\theta_i$ is the yaw angle of the target object in the i-th frame history image, $T_i$ is the acquisition time of the i-th frame history image, and $\omega_i^{cam}$ is the historical horizontal movement speed of the image acquisition device at the i-th frame.
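A sketch of this computation in Python follows; the linear pixel-to-angle mapping and all numeric values are illustrative assumptions (the application only states that the yaw angle is derived from the detection-frame position and the device's field-of-view angle):

```python
import math

def target_yaw_from_pixels(x_center: float, image_width: float, fov_rad: float) -> float:
    # Yaw of a detection-frame center relative to the optical axis, assuming a
    # simple linear pixel-to-angle mapping across the horizontal field of view.
    return (x_center / image_width - 0.5) * fov_rad

def target_yaw_rate(theta_i: float, theta_prev: float,
                    t_i: float, t_prev: float, cam_yaw_rate_i: float) -> float:
    # omega_i = (theta_i - theta_{i-1}) / (T_i - T_{i-1}) + omega_i^cam
    return (theta_i - theta_prev) / (t_i - t_prev) + cam_yaw_rate_i

# Example: 640-px-wide image, 60-degree horizontal FOV, frames 50 ms apart,
# device itself yawing at 0.1 rad/s while the target's box drifts rightward.
fov = math.radians(60.0)
theta_prev = target_yaw_from_pixels(300, 640, fov)  # box center at frame i-1
theta_i = target_yaw_from_pixels(340, 640, fov)     # box center at frame i
omega_i = target_yaw_rate(theta_i, theta_prev, t_i=0.10, t_prev=0.05,
                          cam_yaw_rate_i=0.1)
print(f"target yaw rate: {omega_i:.3f} rad/s")
```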
According to the target tracking method provided by the embodiment, the moving speed of the target object is determined based on the horizontal movement difference value, the time difference and the historical horizontal movement speed of the image acquisition device at the time of the first historical image, so that the calculation of the relative movement speed of the target object and the image acquisition device under the reference system of the plane can be realized, and compared with the calculation of the actual displacement of the target object by establishing a space model, the calculation data amount and calculation time consumption can be reduced, and the effect of improving the target tracking speed is achieved.
In one embodiment, the calculating the moving speed of the target object based on the position of the historical target human body detection frame in the historical image and the historical horizontal moving speed of the image acquisition device further includes:
calculating an average value based on the moving speeds of the target object at the time points of the plurality of history images;
And determining the average value as the moving speed of the target object.
The times of the plurality of history images may be those of consecutive frames, or the times of a subset of frames selected according to a preset rule.
In a specific embodiment, taking the moving speed as the yaw rate of the target object as an example, the average is calculated as:

$$\bar{\omega} = \frac{1}{N}\sum_{i} \omega_i$$

where $\bar{\omega}$ is the average yaw rate, $\omega_i$ is the yaw rate of the target object at the time of the i-th frame history image, and the sum runs over the N history frames used.
According to the target tracking method provided by the embodiment, the average value is calculated based on the moving speeds of the target object at the moments of a plurality of historical images, and the average value is used as the moving speed of the target object, so that the calculated data quantity of the coordinate position of the target object can be reduced, and the effect of improving the tracking speed can be achieved.
In one embodiment, predicting the coordinate position of the target object based on the movement speed and the movement time of the target object includes:
acquiring a steering engine response time and an algorithm running time of the image acquisition device, wherein the steering engine response time is the time for the steering engine to accelerate to a preset horizontal movement speed, and the algorithm running time is the computation time from acquiring the history images to adjusting the pose of the image acquisition device;
and determining the start-up delay based on the steering engine response time and the algorithm running time.
It can be understood that after the target is determined to be lost, the image acquisition device needs time to acquire the relevant information (such as the history images) and to compute the coordinate position of the target object; the steering engine needs time to accelerate to the preset horizontal movement speed after receiving the control instruction; and after reaching the predicted coordinate position, the algorithm needs time to recapture the target object and resume real-time tracking. The delays of algorithm execution and steering engine response therefore affect the accuracy of the predicted coordinate position to a certain extent.
The steering engine response time may be the delayed response time from receiving the re-tracking instruction to recapturing the target object for real-time tracking. It may include the time for the steering engine to accelerate to a preset horizontal movement speed; the algorithm running time is the computation time from acquiring the history images to adjusting the pose of the image acquisition device. Both may be obtained from prior knowledge or set according to actual requirements. The preset horizontal movement speed may be the maximum horizontal movement speed of the steering engine, or may be set according to actual requirements. Further, the steering engine response time may be the time to accelerate from rest to the preset horizontal movement speed, or from the current running speed to the preset horizontal movement speed. For practical purposes, if the steering engine is configured to decelerate near the predicted coordinate position, the response time may also include the time required for that deceleration, as well as the response time required to complete recapture of the target object after approaching or reaching the predicted coordinate position.
The start-up delay may be determined based on the steering engine response time and the algorithm running time in units of time, and may be determined according to the addition of the steering engine response time and the algorithm running time. The starting delay can also be determined by taking the frame number as a unit based on the steering engine response time and the algorithm running time, and can be determined according to the product of the steering engine response time and the algorithm running time added and the sampling frequency of the image acquisition equipment. In one embodiment, the calculation formula for the start-up delay is as follows:
$$T_d = (T_w + Ts_i) \cdot FPS$$

wherein $T_d$ is the start-up delay, $T_w$ is the steering engine response time, $Ts_i$ is the algorithm running time, and $FPS$ is the sampling frequency of the image acquisition device.
Further, if the steering engine response time is the time to accelerate from the current moving speed to the preset horizontal movement speed, it can be determined according to the ratio of the time needed to reach the preset horizontal movement speed from the current speed to the time needed to reach it from rest; alternatively, steering engine response times at different initial moving speeds can be measured in advance, and the response time determined from those statistics.
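As a quick numeric sketch of this formula in Python (all timing values are assumptions, not from the application):

```python
def startup_delay_frames(t_w: float, t_s: float, fps: float) -> float:
    # T_d = (T_w + Ts_i) * FPS: start-up delay expressed in frames.
    return (t_w + t_s) * fps

# Assumed timings: 120 ms steering-engine response, 70 ms algorithm time,
# 30 Hz sampling -> (0.12 + 0.07) * 30 = 5.7 frames of start-up delay.
print(startup_delay_frames(t_w=0.12, t_s=0.07, fps=30.0))
```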
According to the target tracking method, the starting delay is determined based on the response time of the steering engine and the operation time of the algorithm, so that the calculation of the starting delay is realized, and the effect of improving the accuracy of position prediction after target loss can be achieved.
In one embodiment, the adjusting the pose of the image capturing device based on the coordinate position includes:
acquiring a current real-time image;
determining a real-time target human body detection frame based on the real-time image;
and adjusting the pose of the image acquisition equipment based on the difference value between the coordinate position of the real-time target human body detection frame in the real-time image and the center point of the real-time image.
Wherein the real-time image may be acquired by an image acquisition device. The real-time target human body detection frame is determined based on the real-time image, and the real-time target human body detection frame where the target object is located can be determined based on image recognition.
Based on the difference value between the coordinate position of the real-time target human body detection frame in the real-time image and the center point of the real-time image, the pose of the image acquisition device can be adjusted, and one or more of the yaw angle, the pitch angle, the distance from the target object and the like of the image acquisition device can be adjusted.
The yaw angle of the image acquisition device may be adjusted by calculating the average abscissa of the real-time target human body detection frame, determining the yaw-angle difference between that average abscissa and the abscissa of the center of the device's field of view, and adjusting the pose based on this difference.
The pitch angle of the image acquisition device may be adjusted by calculating the average ordinate of the real-time target human body detection frame, the target face, or another body part, determining the pitch-angle difference between that average ordinate and the ordinate of the center of the field of view, and adjusting the pose based on this difference.
The distance between the image acquisition device and the target object may be adjusted by calculating the pixel width of the real-time target human body detection frame, converting the difference between this pixel width and a reference pixel width into the required radial displacement, and adjusting the position of the device based on that displacement, i.e. moving forward or backward to approach or move away from the target. Alternatively, the real-time distance between the device and the target object may be converted from the pixel width of the detection frame, the required radial displacement determined from the difference between the real-time distance and a reference distance, and the position of the device adjusted accordingly.
In addition, adjusting the pose of the image acquisition device may also realize parallel movement with the target object by adjusting the device's position, i.e. moving in the same direction as the target object. The distance to the target may be determined from the pixel width of the real-time target human body detection frame, the real-time displacement of the target from the movement of the detection frame in the real-time image, and the position of the device adjusted based on the target's moving direction and real-time displacement, thereby realizing parallel motion with the target object.
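A sketch of how these corrections might be computed from a single detection frame (the linear pixel-to-angle mapping and the width-based distance cue are simplifying assumptions, not prescribed by the application):

```python
def pose_corrections(box, image_w, image_h, fov_h, fov_v, ref_box_width):
    # box: (xmin, ymin, xmax, ymax) of the real-time target human detection frame.
    x_avg = (box[0] + box[2]) / 2.0
    y_avg = (box[1] + box[3]) / 2.0
    # Yaw: offset of the frame's average abscissa from the image center column.
    yaw_error = (x_avg / image_w - 0.5) * fov_h
    # Pitch: offset of the frame's average ordinate from the image center row.
    pitch_error = (y_avg / image_h - 0.5) * fov_v
    # Distance cue: positive when the frame looks narrower than the reference
    # width, i.e. the device should move closer to the target.
    radial_cue = ref_box_width - (box[2] - box[0])
    return yaw_error, pitch_error, radial_cue

# Example: 640x480 image, ~60/45 degree FOVs (in radians), 100-px reference width.
print(pose_corrections((300, 100, 380, 420), 640, 480, 1.05, 0.79, 100))
```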
According to the target tracking method provided by the embodiment, when the target object is in the field of view of the image acquisition device, the real-time target human body detection frame is determined based on the real-time image, and the pose of the image acquisition device is adjusted based on the real-time target human body detection frame, so that the effect of continuously tracking the target object can be achieved.
In one embodiment, the determining a real-time target human detection frame based on the real-time image includes:
determining whether a target face of the target object exists or not based on the real-time image;
if the target face exists in the real-time image, determining a target face detection frame based on the target face; determining the real-time target human body detection frame based on the target human face detection frame;
If the target human face does not exist in the real-time image, a human body detection frame is determined based on the real-time image, and a real-time target human body detection frame is determined based on the matching condition of the human body detection frame and a reference human body detection frame.
It can be understood that facial features are richer than body features, so whether the target object exists in the real-time image can be judged more accurately through detection of the target face. Determining whether the target face of the target object exists in the real-time image may be based on image recognition: for example, the face recognition model provided by the Rock-X SDK of Rockchip may be adopted, whose recognition result can be computed within 40 ms under NPU acceleration on an RK3399PRO chip; alternatively, face recognition may be performed with other face recognition models or algorithm libraries such as ArcFace, MobileFaceNet, DeepFace, OpenFace, or Dlib, which is not limited herein.
And determining a target face detection frame based on the target face, wherein the target face detection frame can be marked by a pixel frame with a preset color on the target face in the real-time image, or can be determined the coordinate position of the target face detection frame in the image and recorded in the recording information corresponding to the real-time image. As shown in fig. 3, in a specific embodiment, after the target face is determined, the coordinate point of the upper left corner and the coordinate point of the lower right corner of the rectangular frame where the target face is located may be used as the coordinate position of the target face detection frame.
The real-time target human body detection frame is determined based on the target human face detection frame, and the position of the target human face detection frame can be matched with the coordinate position of the detected human body detection frame in the real-time image.
The human body detection frame may be obtained based on image recognition; for example, the human body detection model provided by the Rock-X SDK may be used, with the recognition result computed within 70 ms under NPU acceleration on an RK3399PRO chip; a high-performance object detection model such as Faster R-CNN, SSD, or the YOLO series may be used; or a human posture detection model or algorithm library such as MediaPipe Pose, OpenPose, or AlphaPose may be used. After the human body detection frames are determined, they may also be numbered; as shown in fig. 4, in a specific embodiment three human body detection frames are determined in the current image based on a human body detection model and numbered No.1, No.2, and No.3.
The human body detection frame may mark only the body of a person, or the whole body including the head. As shown in fig. 5, the human body detection frame matching the target face detection frame may be determined from the relative positions of the human body detection frames and the target face detection frame, and that human body detection frame is determined as the real-time target human body detection frame.
Further, when the target object is too close to the image acquisition device or is blocked by an obstacle, the target face may fall outside the acquisition range of the image acquisition device or the line of sight may be obstructed, in which case the target object cannot be identified from facial features.
If the target face does not exist in the real-time image, determining a human body detection frame based on the real-time image, wherein the determination method of the human body detection frame can be realized based on image recognition, and the description is omitted herein.
The reference human body detection frame may be obtained by predicting the coordinate position of the target object in the current image from the displacement and pixel-size change of the real-time target human body detection frame across the history images, and determining the reference frame from that coordinate position; the real-time target human body detection frame may then be determined from the coordinate distance between each human body detection frame and the reference human body detection frame. In one embodiment, when the real-time target human body detection frame is determined based on the target face detection frame, a face-target tracking key-value pair is created from the target face detection frame and the real-time target human body detection frame; if the target face does not exist in the current image, the real-time target human body detection frame is determined from the tracked human body detection frame.
The reference human body detection frame may also be derived from a reference image determined from the real-time target human body detection frame in a history image containing the target face; the reference image may be generated when the target face is lost, or stored in a database. The real-time target human body detection frame may then be determined by matching the human body detection frames against the reference human body detection frame through comparison of human body features with the reference image features. In one embodiment, the reference human body detection frame in the database is updated based on the image of the real-time target human body detection frame. By updating the reference human body detection frame, the real-time target human body detection frame can still be determined from recently recorded human body images of the target object when no target face is present in the real-time image, which improves the recognition accuracy of the target human body and thus the target tracking accuracy.
According to the target tracking method provided by the embodiment, whether the target face exists or not is judged, and then the real-time target human body detection frame is determined according to the target face, so that accurate judgment of the real-time target human body detection frame through the face characteristics can be realized, and the effect of improving the target tracking accuracy can be achieved; by matching the human body detection frame in the real-time image with the reference human body detection frame determined based on the historical image, the real-time target human body detection frame is determined, the human body detection frame of the target object can be still identified under the condition that the target human face is lost, and the effect of improving the target tracking accuracy can be achieved.
In one embodiment, the determining whether there is a target face of the target object based on the real-time image includes:
determining a face detection frame based on the real-time image;
determining a first feature vector based on the face detection frame;
and determining whether the target face exists or not based on the first feature vector and a reference face feature vector in a database.
The face detection frame may be determined by image recognition on the real-time image; for example, the face detection model provided by the Rock-X SDK of Rockchip may be used, or other face detection models or algorithm libraries such as MTCNN, RetinaFace, YOLO-Face, or SCRFD may be used to detect faces in the image, which is not limited herein. After the face detection frames are determined, they may also be numbered; as shown in fig. 6, in a specific embodiment three face detection frames are determined in the current image based on the face detection model and numbered No.1, No.2, and No.3.
After the first feature vector is determined from the face detection frame, whether the target face exists may be determined by acquiring the reference face features in the database, finding the face feature in the real-time image with the shortest Euclidean distance to a reference face feature, and, if that distance is smaller than a preset threshold, determining the corresponding face detection frame as the target face detection frame. In a specific embodiment, the preset Euclidean-distance threshold may be 1.0. As shown in fig. 3, after the target face detection frame is determined, the number of the face detection frame may also be updated.
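A minimal sketch of this nearest-neighbor matching with NumPy; only the 1.0 threshold comes from the text, and the toy 4-D vectors stand in for real face embeddings:

```python
import numpy as np

D_FACE_TH = 1.0  # Euclidean-distance threshold given in the embodiment

def find_target_face(face_vectors: np.ndarray, reference: np.ndarray):
    # Index of the detected face whose feature vector is nearest to the
    # reference vector, or None if even the nearest exceeds the threshold.
    dists = np.linalg.norm(face_vectors - reference, axis=1)
    best = int(np.argmin(dists))
    return best if dists[best] < D_FACE_TH else None

faces = np.array([[0.9, 0.1, 0.0, 0.2],   # face No.1
                  [0.1, 0.8, 0.3, 0.1],   # face No.2
                  [0.4, 0.4, 0.4, 0.4]])  # face No.3
ref = np.array([0.85, 0.15, 0.05, 0.25])  # reference vector from the database
print(find_target_face(faces, ref))       # -> 0 (face No.1 is the target)
```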
According to the target tracking method provided by the embodiment, the feature vector of the face detection frame is compared with the reference feature vector, so that whether the target face exists or not is determined, and the recognition of the target face can be realized.
To illustrate the technical solution of the present application more clearly, a detailed embodiment is also provided. As shown in fig. 6 and 7, the target tracking method of this embodiment is applied to an image acquisition device comprising a base, a chassis, a driving wheel, and a universal wheel. Steering engine 2, a controller, a processor, and a memory are disposed in the base; the base extends upward to a mounting platform on which steering engine 1 is installed, and steering engine 1 is connected to a monocular camera through a connecting rod. Steering engine 1 controls the pitch angle of the monocular camera, steering engine 2 controls its yaw angle, and the driving wheel and universal wheel control the movement of the device on its plane. The target tracking method of this embodiment comprises the following steps:
In the image acquisition process, the time stamp of each image captured by the monocular camera and the rotation speed of the yaw steering engine at that moment are recorded; the time stamp of the i-th frame image is $T_i$, and the rotation speed of the yaw steering engine is $\omega_i^{cam}$.
The response speed of the steering engine is measured from experimental data: the time from the controller issuing a command until the steering engine speed reaches 73% of the target speed is recorded as $T_w$. The algorithm running time, i.e. the time from reading the i-th frame image to sending the instruction to the steering engine, is recorded as $Ts_i$.
When tracking of the target object is lost, i.e. the i-th frame image yields a target human body detection frame but the (i+1)-th frame does not, the yaw rate of the target object over the N frames before the target disappeared is estimated by:

$$\omega_i = \frac{\theta_i - \theta_{i-1}}{T_i - T_{i-1}} + \omega_i^{cam}$$

where $\theta_i$ is the yaw angle of the target motion in the i-th frame image and $\omega_i$ is the yaw rate of the target motion in the i-th frame image.
The average yaw rate $\bar{\omega}$ of the target over the N frames before the target object disappeared is calculated as:

$$\bar{\omega} = \frac{1}{N}\sum_{i} \omega_i$$
According to the sampling frequency $FPS$ of the monocular camera, the start-up delay generated by algorithm execution and steering engine response is $T_d = (T_w + Ts_i) \cdot FPS$.
Based on the start-up delay, the yaw angle of the target object relative to the image acquisition device at frame $i+T_d$ is predicted as

$$\theta_{yaw\_est} = \theta_i + \bar{\omega} \cdot \frac{T_d}{FPS}$$

and the yaw angle of steering engine 2 is adjusted to $\theta_{yaw\_est}$.
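Putting this embodiment's pieces together, the post-loss yaw prediction might be sketched as below (it assumes the last observed yaw is advanced by the average yaw rate over the start-up delay, as in the formula above; all numbers are illustrative):

```python
def predict_yaw_after_loss(thetas, timestamps, cam_yaw_rates, t_w, t_s, fps):
    # thetas/timestamps: target yaw and capture time of the last N frames that
    # still contained the target; cam_yaw_rates: yaw steering-engine speed
    # recorded with each frame.
    omegas = [(thetas[i] - thetas[i - 1]) / (timestamps[i] - timestamps[i - 1])
              + cam_yaw_rates[i] for i in range(1, len(thetas))]
    omega_avg = sum(omegas) / len(omegas)      # average yaw rate over N frames
    t_d = (t_w + t_s) * fps                    # start-up delay, in frames
    return thetas[-1] + omega_avg * (t_d / fps)

# Target drifting at ~0.4 rad/s, camera still, 30 Hz, 190 ms total delay.
print(predict_yaw_after_loss(
    thetas=[0.00, 0.02, 0.04, 0.06], timestamps=[0.00, 0.05, 0.10, 0.15],
    cam_yaw_rates=[0.0, 0.0, 0.0, 0.0], t_w=0.12, t_s=0.07, fps=30.0))
```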
Before the target is lost, or after the target is lost and the target object has been recaptured, the controller further performs visual-servo following of the target person based on lightweight model inference, as shown in fig. 7, comprising the following steps:
Step one: capturing a current image by a monocular camera, and running a face recognition algorithm to obtain a target face detection frame;
Step two: running a human body detection algorithm and a multi-target tracking algorithm to obtain human body tracking detection frames; the human body detection algorithm takes the currently acquired image as input and runs human body detection model inference based on a lightweight deep neural network to obtain rectangular human body detection frames, each given by the coordinates of its four corners (upper-left, lower-left, upper-right, lower-right); the multi-target tracking algorithm takes the human body detection frames obtained by the human body detection algorithm as initial tracking targets and runs over consecutive multi-frame images to obtain human body tracking detection frames and target tracking numbers;
Step three: performing position matching on the target face detection frame and the human body tracking detection frames to obtain the target human body detection frame;
step four: inputting the position information of the target human face detection frame and the target human body detection frame into a PID controller to obtain a motion control signal for adjusting the position of the monocular camera;
step five: the motion control unit adjusts the angle and the position of the monocular camera according to the control signal.
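Taken together, steps one to five form one control-loop iteration per captured frame, detailed below. The following Python skeleton is only an orienting sketch of the data flow; every callable passed into it is a hypothetical stand-in for the corresponding step.

```python
def servo_iteration(frame, recognize_face, track_bodies, match, pid_control, apply_motion):
    """One pass of the visual servo loop (steps one to five)."""
    face_box = recognize_face(frame)            # step one: face detection + recognition
    body_tracks = track_bodies(frame)           # step two: body detection + multi-target tracking
    target_body = match(face_box, body_tracks)  # step three: position matching
    s_pitch, s_yaw, s_d = pid_control(face_box, target_body)  # step four: control signals
    apply_motion(s_pitch, s_yaw, s_d)           # step five: adjust camera angle and position
```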
The face recognition algorithm in the first step comprises the following steps:
Step 1.1: the currently acquired image is taken as input and face detection model inference based on a lightweight deep neural network is run, obtaining a rectangular face detection frame containing the coordinates of the four points upper-left, lower-left, upper-right and lower-right;
step 1.2: the image is cropped according to the face detection frame, and the cropped image is taken as input to run face recognition model inference based on a lightweight deep neural network, obtaining a feature vector of the face;
step 1.3: the face feature vector with the shortest Euclidean distance to the face feature vector output in step 1.2 is searched for in a database; if the distance is smaller than a threshold d_face_th, the target face is successfully identified, the label corresponding to the vector is acquired from the database, and the label is combined with the face detection frame into a "face label-face detection frame" key-value pair, thereby obtaining the target face detection frame.
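Step 1.3 is a nearest-neighbour search over the stored face feature vectors. A minimal sketch, assuming the database is held as a NumPy array of feature vectors with a parallel list of labels (both names are illustrative):

```python
import numpy as np

def identify_face(feature: np.ndarray,
                  db_features: np.ndarray,  # shape (M, D): reference face feature vectors
                  db_labels: list,          # M labels, parallel to db_features
                  d_face_th: float):
    """Return the face label if the closest database vector is within d_face_th, else None."""
    dists = np.linalg.norm(db_features - feature, axis=1)  # Euclidean distance to every entry
    j = int(np.argmin(dists))
    if dists[j] < d_face_th:
        return db_labels[j]  # the caller pairs this label with the face detection frame
    return None              # recognition failed for this frame
```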
In step three, the target face detection frame and the human body tracking detection frames are position-matched to obtain the target human body detection frame, comprising the following steps:
step 3.1: all human body tracking detection frames obtained from the image are compared with the target face detection frame from step one; a pair of frames matches if it satisfies the following inequality group:
bbox_face_xmin > bbox_body_xmin
bbox_face_xmax < bbox_body_xmax
bbox_face_ymin > bbox_body_ymin
bbox_face_ymax < bbox_body_ymin + bbox_body_height / 2
wherein bbox_face_xmin and bbox_face_xmax are the minimum and maximum abscissa of the target face detection frame; bbox_face_ymin and bbox_face_ymax are its minimum and maximum ordinate; bbox_body_xmin and bbox_body_xmax are the minimum and maximum abscissa of the human body tracking detection frame; bbox_body_ymin and bbox_body_ymax are its minimum and maximum ordinate; and bbox_body_height is the height of the human body tracking detection frame. If a pair of frames satisfying the inequality group exists, the target face detection frame is successfully matched with a human body tracking detection frame, and the target human body detection frame is obtained.
Step 3.2: if the target face detection frame and a human body tracking detection frame were successfully matched in step 3.1, the face label of the target face detection frame is combined with the target tracking number obtained by the multi-target tracking algorithm into a "face label-target tracking number" key-value pair. Meanwhile, the "face label-face detection frame" key-value pair from step 1.3 is deleted.
Step 3.3: for image frames in which the human body can be detected but the face cannot, the "face label-target tracking number" key-value pair obtained in step 3.2 is used to combine the face label and the human body tracking detection frame into a new data structure, obtaining the target human body detection frame.
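Steps 3.1-3.3 reduce to checking the inequality group for each tracked body and caching the label-to-track-number association for frames where only the body is visible. The sketch below is an assumed realisation: boxes are plain dicts with xmin/xmax/ymin/ymax keys, and the coefficient 1/2 in the fourth inequality is an assumption, since the exact form is given only by the formula image.

```python
def match_face_to_body(face: dict, bodies: list, track_ids: list):
    """Step 3.1: return (body_box, track_id) for the body frame containing the face, or None."""
    for body, tid in zip(bodies, track_ids):
        body_height = body["ymax"] - body["ymin"]
        if (face["xmin"] > body["xmin"] and
                face["xmax"] < body["xmax"] and
                face["ymin"] > body["ymin"] and
                face["ymax"] < body["ymin"] + 0.5 * body_height):  # assumed coefficient
            return body, tid
    return None

# Steps 3.2-3.3: remember face label -> track number, so frames where only the
# body is detected can still be associated with the target person.
label_to_track = {}

def resolve_target_body(face_label, match, bodies, track_ids):
    if match is not None:                 # step 3.2: cache the association
        label_to_track[face_label] = match[1]
        return match[0]
    tid = label_to_track.get(face_label)  # step 3.3: fall back to the cached track
    for body, t in zip(bodies, track_ids):
        if t == tid:
            return body
    return None
```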
In step four, the position information of the target face detection frame and the target human body detection frame is input into a PID controller to obtain motion control signals for adjusting the position of the monocular camera, comprising the following steps:
Step 4.1: the average ordinate bbox_face_yave of the target face detection frame is calculated by:

bbox_face_yave = (bbox_face_ymin + bbox_face_ymax) / 2
The pitch angle difference θ_pitch by which the vision sensor must be adjusted to track the target is calculated by:

θ_pitch = (bbox_face_yave − frame_height / 2) / frame_height × FOV_V

wherein frame_height is the image height in pixels and FOV_V is the vertical field-of-view angle of the monocular camera.
Step 4.2: if the pitch angle difference θ_pitch required to track the target is greater than the threshold θ_pitch_th, θ_pitch is taken as input and a PID control algorithm is run to obtain the control signal S_pitch needed to control the pitch angle of the monocular camera; otherwise S_pitch = 0.
Step 4.3: the width bbox_body_width and the average abscissa bbox_body_xave of the target human body detection frame are calculated by:

bbox_body_width = bbox_body_xmax − bbox_body_xmin
bbox_body_xave = (bbox_body_xmin + bbox_body_xmax) / 2
Step 4.4: the yaw angle difference θ_yaw by which the vision sensor must be adjusted to track the target is calculated by:

θ_yaw = (bbox_body_xave − frame_width / 2) / frame_width × FOV_H

wherein frame_width is the image width in pixels and FOV_H is the horizontal field-of-view angle of the monocular camera.
Step 4.5: if the yaw angle difference θ_yaw required to track the target is greater than the threshold θ_yaw_th, θ_yaw is taken as input and a PID control algorithm is run to obtain the control signal S_yaw needed to control the yaw angle of the vision sensor; otherwise S_yaw = 0.
Step 4.6: the human body width adjustment value w required to track the target is calculated by:

w = bbox_body_width_ref − bbox_body_width

wherein bbox_body_width_ref is the reference value of the target human body detection frame width, representing the reasonable distance to be maintained between the vision sensor and the target.
Step 4.7: if the absolute value of the human body width adjustment value w is greater than the threshold w_th, w is taken as input and a PID control algorithm is run to obtain the control signal S_d for controlling the distance between the vision sensor and the target; otherwise S_d = 0.
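Steps 4.1-4.7 compute three scalar errors and gate each one through a PID controller. The sketch below follows the formulas above; the PID gains and the dead-band helper are illustrative assumptions, since the patent fixes neither.

```python
def control_errors(face, body, frame_w, frame_h, fov_h, fov_v, body_width_ref):
    """Steps 4.1, 4.3, 4.4 and 4.6: pitch/yaw angle differences and the width error."""
    face_yave = (face["ymin"] + face["ymax"]) / 2
    theta_pitch = (face_yave - frame_h / 2) / frame_h * fov_v  # vertical offset -> pitch error

    body_width = body["xmax"] - body["xmin"]
    body_xave = (body["xmin"] + body["xmax"]) / 2
    theta_yaw = (body_xave - frame_w / 2) / frame_w * fov_h    # horizontal offset -> yaw error

    w = body_width_ref - body_width                            # width error -> distance keeping
    return theta_pitch, theta_yaw, w

class PID:
    """Textbook PID controller; the gains are illustrative, the patent does not specify them."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral, self.prev = 0.0, 0.0

    def step(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev) / dt
        self.prev = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def gated(pid: PID, error: float, threshold: float, dt: float) -> float:
    """Steps 4.2 / 4.5 / 4.7: run the PID only when the error magnitude exceeds its threshold."""
    return pid.step(error, dt) if abs(error) > threshold else 0.0
```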
In step five, the motion control unit adjusts the angle and position of the monocular camera according to the control signals, as follows:
step 5.1: if the control signal S_pitch for controlling the pitch angle of the vision sensor is greater than 0, the pitch direction of the vision sensor is set to move upward with rotation speed S_pitch; if S_pitch is less than 0, the pitch direction is set to move downward with rotation speed |S_pitch|; if S_pitch = 0, the pitch motion of the vision sensor is stopped.
Step 5.2: if the control signal S_yaw for controlling the yaw angle of the vision sensor is greater than 0, the yaw direction of the vision sensor is set to rotate right with rotation speed S_yaw; if S_yaw is less than 0, the yaw direction is set to rotate left with rotation speed |S_yaw|; if S_yaw = 0, the yaw motion of the vision sensor is stopped.
Step 5.3: if the control signal S_d for controlling the distance between the vision sensor and the target is greater than 0, the vision sensor platform is set to move forward with speed S_d; if S_d is less than 0, the platform is set to move backward with speed |S_d|; if S_d = 0, movement of the vision sensor platform in the front-rear direction is stopped.
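Steps 5.1-5.3 amount to a sign test on each control signal. A sketch of that mapping, where `gimbal` and `platform` are hypothetical driver objects; the patent fixes only the sign logic, not any API:

```python
def apply_motion(s_pitch, s_yaw, s_d, gimbal, platform):
    """Translate signed control signals into motion commands (steps 5.1-5.3)."""
    if s_pitch > 0:
        gimbal.set_pitch(direction="up", speed=abs(s_pitch))
    elif s_pitch < 0:
        gimbal.set_pitch(direction="down", speed=abs(s_pitch))
    else:
        gimbal.stop_pitch()       # S_pitch == 0: stop pitch motion

    if s_yaw > 0:
        gimbal.set_yaw(direction="right", speed=abs(s_yaw))
    elif s_yaw < 0:
        gimbal.set_yaw(direction="left", speed=abs(s_yaw))
    else:
        gimbal.stop_yaw()         # S_yaw == 0: stop yaw motion

    if s_d > 0:
        platform.move(direction="forward", speed=abs(s_d))
    elif s_d < 0:
        platform.move(direction="backward", speed=abs(s_d))
    else:
        platform.stop()           # S_d == 0: stop front-rear movement
```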
According to the above target tracking method, the moving speed of the target object is determined from historical images, and the coordinate position of the target object is predicted from that moving speed, realizing target position prediction based on historical images. The steering engine response time and the algorithm running time are introduced into the prediction as delay-time parameters of the image acquisition device, which improves the accuracy of the target position prediction and the robustness of the prediction algorithm, thereby improving tracking accuracy after the target is lost. The moving speed of the target object is determined from the horizontal movement difference, the time difference and the historical horizontal moving speed of the image acquisition device at the moment of the first historical image, so that the speed of the target object relative to the image acquisition device can be calculated in the reference frame of the plane where the target is located; compared with calculating the actual displacement of the target by building a spatial model from the yaw angle, this reduces the amount of computation and the computation time, improving the target tracking speed. By averaging the moving speeds of the target object at the moments of several historical images and taking the average as the moving speed of the target object, the amount of computation for the target coordinate position is reduced, further improving tracking speed. The real-time target human body detection frame is determined through the target face or the reference human body detection frame, enabling the target object to be determined during real-time tracking and improving tracking accuracy.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in those flowcharts may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and whose order of execution is not necessarily sequential; they may be carried out in turn or alternately with other steps or with sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the present application further provides an object tracking device for implementing the above object tracking method. The implementation of the solution provided by the device is similar to that described for the method above, so for the specific limitations in the one or more embodiments of the object tracking device provided below, reference may be made to the limitations of the object tracking method above, which are not repeated here.
In one embodiment, as shown in fig. 8, there is provided an object tracking apparatus including: the system comprises an acquisition module, a calculation module, a prediction module and an adjustment module, wherein:
the acquisition module is used for acquiring a history image of a preset frame number when the loss of the target object is detected, wherein the history image comprises the target object;
a calculation module for determining a moving speed of the target object based on the history image;
a prediction module for predicting a coordinate position of the target object based on a moving speed and a moving time of the target object; the moving time comprises a loss time and a delay time of the image acquisition equipment, and the delay time comprises a starting delay of the image acquisition equipment;
and the adjusting module is used for adjusting the pose of the image acquisition equipment based on the coordinate position.
In one embodiment, the computing module is further configured to: acquiring a historical target human body detection frame of a target object in the historical image; and calculating the moving speed of the target object based on the position of the historical target human body detection frame in the historical image and the historical horizontal moving speed of the image acquisition equipment.
In one embodiment, the computing module is further configured to: acquiring a first historical image and a second historical image of adjacent frames and a time difference corresponding to the adjacent frames; determining a horizontal movement difference value of the historical target human body detection frame relative to the image acquisition device based on the positions of the historical target human body detection frames in the first historical image and the second historical image; and determining the moving speed of the target object based on the horizontal movement difference value, the time difference and the historical horizontal movement speed of the image acquisition device at the moment of the first historical image.
In one embodiment, the computing module is further configured to: calculating an average value based on the moving speeds of the target object at the time points of the plurality of history images; and determining the average value as the moving speed of the target object.
In one embodiment, before predicting the coordinate position of the target object based on the moving speed and the moving time of the target object, the prediction module is further configured to perform the following:
acquiring the steering engine response time and the algorithm running time of the image acquisition equipment; the steering engine response time is the time taken by the steering engine to accelerate to a preset horizontal movement speed, and the algorithm running time is the computation time from acquiring the historical image to adjusting the pose of the image acquisition equipment;
And determining the start delay based on the steering engine response time and algorithm running time.
In one embodiment, the apparatus further comprises:
the real-time tracking module is used for acquiring a current real-time image; determining a real-time target human body detection frame based on the real-time image; and adjusting the pose of the image acquisition equipment based on the difference value between the coordinate position of the real-time target human body detection frame in the real-time image and the center point of the real-time image.
In one embodiment, the real-time tracking module is further configured to determine whether a target face of the target object exists based on the real-time image; if the target face exists in the real-time image, determining a target face detection frame based on the target face; determining a real-time target human body detection frame under the real-time image based on the target human face detection frame; if the target human face does not exist in the real-time image, a human body detection frame is determined based on the real-time image, and a real-time target human body detection frame is determined based on the matching condition of the human body detection frame and a reference human body detection frame.
In one embodiment, the real-time tracking module is further configured to: determine a face detection frame based on the real-time image; determine a first feature vector based on the face detection frame; and determine whether the target face exists based on the first feature vector and a reference face feature vector in a database.
The various modules in the above object tracking device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded, in hardware form, in or be independent of a processor in the computer device, or may be stored, in software form, in a memory of the computer device, so that the processor can call and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure thereof may be as shown in fig. 11. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a target tracking method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 11 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the computer device to which the present application applies, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
when the loss of the target object is detected, acquiring a history image with a preset frame number, wherein the history image comprises the target object;
determining a moving speed of the target object based on the history image;
predicting a coordinate position of the target object based on the moving speed and the moving time of the target object; the moving time comprises a loss time and a delay time of the image acquisition equipment, and the delay time comprises a starting delay of the image acquisition equipment;
and adjusting the pose of the image acquisition equipment based on the coordinate position.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
when the loss of the target object is detected, acquiring a history image with a preset frame number, wherein the history image comprises the target object;
determining a moving speed of the target object based on the history image;
predicting a coordinate position of the target object based on the moving speed and the moving time of the target object; the moving time comprises a loss time and a delay time of the image acquisition equipment, and the delay time comprises a starting delay of the image acquisition equipment;
and adjusting the pose of the image acquisition equipment based on the coordinate position.
It should be noted that, user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this description.
The above examples represent only a few embodiments of the present application, which are described specifically and in detail but are not therefore to be construed as limiting the scope of the application. It should be noted that several modifications and improvements can be made by those of ordinary skill in the art without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (11)

1. A target tracking method applied to an image acquisition device, the method comprising:
when the loss of the target object is detected, acquiring a history image with a preset frame number, wherein the history image comprises the target object;
determining a moving speed of the target object based on the history image;
predicting a coordinate position of the target object based on the moving speed and the moving time of the target object; the moving time comprises a loss time and a delay time of the image acquisition equipment, and the delay time comprises a starting delay of the image acquisition equipment;
And adjusting the pose of the image acquisition equipment based on the coordinate position.
2. The target tracking method according to claim 1, wherein the determining the moving speed of the target object based on the history image includes:
acquiring a historical target human body detection frame of a target object in the historical image;
and calculating the moving speed of the target object based on the position of the historical target human body detection frame in the historical image and the historical horizontal moving speed of the image acquisition equipment.
3. The target tracking method according to claim 2, wherein the calculating the moving speed of the target object based on the position of the historical target human body detection frame in the historical image and the historical horizontal moving speed of the image acquisition equipment comprises:
acquiring a first historical image and a second historical image of adjacent frames and a time difference corresponding to the adjacent frames;
determining a horizontal movement difference value of the historical target human body detection frame relative to the image acquisition device based on the positions of the historical target human body detection frames in the first historical image and the second historical image;
and determining the moving speed of the target object based on the horizontal movement difference value, the time difference and the historical horizontal movement speed of the image acquisition device at the moment of the first historical image.
4. The target tracking method according to claim 2, wherein the calculating the moving speed of the target object based on the position of the historical target human body detection frame in the historical image and the historical horizontal moving speed of the image acquisition equipment further comprises:
calculating an average value based on the moving speeds of the target object at the time points of the plurality of history images;
and determining the average value as the moving speed of the target object.
5. The target tracking method according to claim 1, characterized in that, before the predicting the coordinate position of the target object based on the moving speed and the moving time of the target object, the method comprises:
acquiring the steering engine response time and the algorithm running time of the image acquisition equipment; the steering engine response time is the time taken by the steering engine to accelerate to a preset horizontal movement speed, and the algorithm running time is the computation time from acquiring the historical image to adjusting the pose of the image acquisition equipment;
and determining the start delay based on the steering engine response time and algorithm running time.
6. The object tracking method according to claim 1, wherein the adjusting the pose of the image capturing apparatus based on the coordinate position includes:
Acquiring a current real-time image;
determining a real-time target human body detection frame based on the real-time image;
and adjusting the pose of the image acquisition equipment based on the difference value between the coordinate position of the real-time target human body detection frame in the real-time image and the center point of the real-time image.
7. The target tracking method of claim 6, wherein the determining a real-time target human detection frame based on the real-time image comprises:
determining whether a target face of the target object exists or not based on the real-time image;
if the target face exists in the real-time image, determining a target face detection frame based on the target face; determining a real-time target human body detection frame under the real-time image based on the target human face detection frame;
if the target human face does not exist in the real-time image, a human body detection frame is determined based on the real-time image, and a real-time target human body detection frame is determined based on the matching condition of the human body detection frame and a reference human body detection frame.
8. The target tracking method of claim 7, wherein the determining whether a target face of the target object exists based on the real-time image comprises:
Determining a face detection frame based on the real-time image;
determining a first feature vector based on the face detection frame;
and determining whether the target face exists or not based on the first feature vector and a reference face feature vector in a database.
9. An object tracking device, the device comprising:
the acquisition module is used for acquiring a history image of a preset frame number when the loss of the target object is detected, wherein the history image comprises the target object;
a calculation module for determining a moving speed of the target object based on the history image;
a prediction module for predicting a coordinate position of the target object based on a moving speed and a moving time of the target object; the moving time comprises a loss time and a delay time of the image acquisition equipment, and the delay time comprises a starting delay of the image acquisition equipment;
and the adjusting module is used for adjusting the pose of the image acquisition equipment based on the coordinate position.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 when the computer program is executed.
11. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 8.
CN202310282040.9A 2023-03-16 2023-03-16 Target tracking method, device, computer equipment and storage medium Pending CN116309719A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310282040.9A CN116309719A (en) 2023-03-16 2023-03-16 Target tracking method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116309719A true CN116309719A (en) 2023-06-23

Family

ID=86830277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310282040.9A Pending CN116309719A (en) 2023-03-16 2023-03-16 Target tracking method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116309719A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036407A (en) * 2023-08-11 2023-11-10 浙江深象智能科技有限公司 Multi-target tracking method, device and equipment
CN117036407B (en) * 2023-08-11 2024-04-02 浙江深象智能科技有限公司 Multi-target tracking method, device and equipment
CN117409044A (en) * 2023-12-14 2024-01-16 深圳卡思科电子有限公司 Intelligent object dynamic following method and device based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination