CN112330721B - Three-dimensional coordinate recovery method and device, electronic equipment and storage medium - Google Patents

Three-dimensional coordinate recovery method and device, electronic equipment and storage medium

Info

Publication number
CN112330721B
CN112330721B
Authority
CN
China
Prior art keywords
target object
dimensional coordinates
camera
determining
moment
Prior art date
Legal status
Active
Application number
CN202011255018.8A
Other languages
Chinese (zh)
Other versions
CN112330721A (en)
Inventor
关英妲
周杨
刘文韬
钱晨
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202011255018.8A priority Critical patent/CN112330721B/en
Publication of CN112330721A publication Critical patent/CN112330721A/en
Application granted granted Critical
Publication of CN112330721B publication Critical patent/CN112330721B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/292 Multi-camera tracking
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30241 Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The disclosure relates to a three-dimensional coordinate recovery method and apparatus, an electronic device and a storage medium. The method comprises the following steps: acquiring three-dimensional coordinates of a target object at the moment previous to the current moment; determining two-dimensional coordinates of the target object corresponding to each of a plurality of cameras at the previous moment according to the three-dimensional coordinates of the target object at the previous moment; for any camera among the plurality of cameras, determining a candidate detection frame corresponding to the target object at the current moment according to the distances between the two-dimensional coordinates of the target object corresponding to the camera at the previous moment and the detection frames in the video frame acquired by the camera at the current moment; and determining the three-dimensional coordinates of the target object at the current moment according to the candidate detection frame corresponding to the target object at the current moment.

Description

Three-dimensional coordinate recovery method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a three-dimensional coordinate recovery method and apparatus, an electronic device, and a storage medium.
Background
Three-dimensional reconstruction is a key technology in computer vision and is widely applied in areas such as film animation, autonomous driving, and medical image reconstruction. The development of three-dimensional reconstruction techniques has long relied on hardware, for example lidar devices for autonomous driving and the RGB-D (Red, Green, Blue plus Depth) cameras often used in the consumer market for video entertainment, but such hardware devices have limited popularity in everyday life. Recovering three-dimensional scenes from the ordinary two-dimensional images produced by digital cameras or smartphones has therefore become a research hotspot and has driven the development of software algorithms. Three-dimensional motion trajectory recovery refers to recovering the motion trajectory of a target object in a three-dimensional scene from a sequence of two-dimensional video frames. It requires recovering the three-dimensional coordinates of the target object at different moments, and how to improve the accuracy of this three-dimensional coordinate recovery is a technical problem to be solved urgently.
Disclosure of Invention
The present disclosure provides a technical solution for recovering three-dimensional coordinates.
According to an aspect of the present disclosure, there is provided a method of restoring three-dimensional coordinates, including:
acquiring three-dimensional coordinates of a target object at the moment previous to the current moment;
determining two-dimensional coordinates of the target object corresponding to each of a plurality of cameras at the previous moment according to the three-dimensional coordinates of the target object at the previous moment;
for any camera among the plurality of cameras, determining a candidate detection frame corresponding to the target object at the current moment according to the distances between the two-dimensional coordinates of the target object corresponding to the camera at the previous moment and the detection frames in the video frame acquired by the camera at the current moment;
and determining the three-dimensional coordinates of the target object at the current moment according to the candidate detection frame corresponding to the target object at the current moment.
In the embodiment of the disclosure, the three-dimensional coordinates of a target object at the moment previous to the current moment are acquired, and the two-dimensional coordinates of the target object corresponding to each of a plurality of cameras at the previous moment are determined from those three-dimensional coordinates. For any camera among the plurality of cameras, a candidate detection frame corresponding to the target object at the current moment is determined according to the distances between the two-dimensional coordinates of the target object corresponding to the camera at the previous moment and the detection frames in the video frame acquired by the camera at the current moment, and the three-dimensional coordinates of the target object at the current moment are then determined from the candidate detection frames. In this way, the three-dimensional coordinates of the target object at the current moment are obtained by using the three-dimensional coordinates of the target object at the previous moment in combination with the multiple viewing angles of the plurality of cameras, so that an accurate three-dimensional motion trajectory can be recovered.
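Purely as an illustration (the patent itself contains no program code), the following minimal Python sketch shows one way the above steps could fit together, assuming each camera is modeled by a 3x4 projection matrix, each detection frame is a box [x1, y1, x2, y2], and the bottom-edge midpoint of a frame stands in for the target object's two-dimensional coordinates; all helper names are hypothetical.

```python
import numpy as np
from itertools import combinations

def project(P, X):
    """Reproject a 3D point X into pixel coordinates with projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def select_candidate(q, boxes):
    """Pick the detection frame whose bottom-edge midpoint is nearest to q
    (the thresholded refinement is described further below)."""
    if not boxes:
        return None
    pts = np.array([[(b[0] + b[2]) / 2.0, b[3]] for b in boxes])
    return boxes[int(np.argmin(np.linalg.norm(pts - q, axis=1)))]

def triangulate(P1, P2, q1, q2):
    """Linear (DLT) triangulation of one point from two views."""
    A = np.stack([q1[0] * P1[2] - P1[0], q1[1] * P1[2] - P1[1],
                  q2[0] * P2[2] - P2[0], q2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]

def recover_step(X_prev, cams, dets_t):
    """One recovery step: cams maps camera id -> P, dets_t maps camera id ->
    detection frames at the current moment; returns fused 3D coordinates."""
    q2d = {}
    for cid, P in cams.items():
        box = select_candidate(project(P, X_prev), dets_t.get(cid, []))
        if box is not None:
            q2d[cid] = np.array([(box[0] + box[2]) / 2.0, box[3]])
    cands = [triangulate(cams[a], cams[b], q2d[a], q2d[b])
             for a, b in combinations(sorted(q2d), 2)]
    return np.median(np.asarray(cands), axis=0) if cands else None
```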
In a possible implementation manner, the determining, according to the three-dimensional coordinates of the target object at the previous moment, the two-dimensional coordinates of the target object corresponding to the plurality of cameras at the previous moment includes:
and for any camera in the plurality of cameras, determining the two-dimensional coordinates of the target object corresponding to the camera at the previous moment according to the three-dimensional coordinates of the target object at the previous moment and the conversion relation between the three-dimensional coordinates and the two-dimensional coordinates corresponding to the camera.
In this implementation, for any camera among the plurality of cameras, the two-dimensional coordinates of the target object corresponding to the camera at the previous moment are determined according to the three-dimensional coordinates of the target object at the previous moment and the conversion relationship between three-dimensional and two-dimensional coordinates corresponding to the camera, so that the two-dimensional coordinates of the target object corresponding to the plurality of cameras at the previous moment are accurately determined from the three-dimensional coordinates at the previous moment based on reprojection.
In a possible implementation manner, for any camera among the plurality of cameras, the determining a candidate detection frame corresponding to the target object at the current moment according to the distances between the two-dimensional coordinates of the target object corresponding to the camera at the previous moment and the detection frames in the video frame acquired by the camera at the current moment includes:
for any camera among the plurality of cameras, determining the detection frame, in the video frame acquired by the camera at the current moment, that is closest to the two-dimensional coordinates of the target object corresponding to the camera at the previous moment;
and in response to the distance between that closest detection frame and the two-dimensional coordinates of the target object corresponding to the camera at the previous moment being less than or equal to a first distance threshold, determining the closest detection frame as a candidate detection frame corresponding to the target object at the current moment.
In this implementation, the candidate detection frames corresponding to the target object at the current moment are filtered by the first distance threshold, which further increases the probability that a determined candidate detection frame is a true detection frame of the target object, that is, the probability that it contains the target object. Determining the three-dimensional coordinates of the target object at the current moment based on these candidate detection frames therefore helps to further improve the accuracy of the determined three-dimensional coordinates of the target object at the current moment.
In one possible implementation, the method further includes:
determining the first distance threshold according to the movement speed of the target object, wherein the first distance threshold is positively correlated with the movement speed of the target object.
In this implementation, determining the first distance threshold according to the motion speed of the target object allows an appropriate threshold to be chosen flexibly based on the target object's motion state: a smaller first distance threshold is used when the motion speed is small, and a larger one when the motion speed is large. This reduces the probability of wrongly taking a distant detection frame as a candidate when the target object moves slowly, and reduces the probability of missing a distant detection frame when it moves fast, thereby improving the accuracy of the determined candidate detection frames and, in turn, the accuracy of the determined three-dimensional coordinates of the target object at the current moment.
In a possible implementation manner, the determining the three-dimensional coordinates of the target object at the current moment according to the candidate detection frame corresponding to the target object at the current moment includes:
determining candidate three-dimensional coordinates of the target object at the current moment according to the candidate detection frame corresponding to the target object at the current moment;
and determining the three-dimensional coordinates of the target object at the current moment according to the candidate three-dimensional coordinates of the target object at the current moment.
In this implementation manner, the candidate three-dimensional coordinates of the target object at the current time are recovered according to the position of the candidate detection frame corresponding to the target object at the current time, and the three-dimensional coordinates of the target object at the current time are determined according to the candidate three-dimensional coordinates of the target object at the current time, so that the three-dimensional coordinates of the target object at the current time can be accurately determined, and an accurate three-dimensional motion trajectory can be recovered.
In a possible implementation manner, the determining the candidate three-dimensional coordinates of the target object at the current moment according to the candidate detection frame corresponding to the target object at the current moment includes:
for any camera in any group of cameras among the plurality of cameras, determining candidate two-dimensional coordinates of the target object corresponding to the camera at the current moment according to the candidate detection frame corresponding to the target object at the current moment determined from the video frame acquired by the camera at the current moment, wherein each group of cameras comprises at least two cameras;
and determining the candidate three-dimensional coordinates of the target object corresponding to the group of cameras at the current moment according to the candidate two-dimensional coordinates of the target object corresponding to the cameras in the group of cameras at the current moment and the conversion relation between the two-dimensional coordinates and the three-dimensional coordinates corresponding to the cameras in the group of cameras.
In this implementation, for any camera in a group of cameras among the plurality of cameras, candidate two-dimensional coordinates of the target object corresponding to the camera at the current moment are determined from the candidate detection frame determined from the video frame acquired by the camera at the current moment, and candidate three-dimensional coordinates of the target object corresponding to the group of cameras at the current moment are then determined according to those candidate two-dimensional coordinates and the conversion relationship between two-dimensional and three-dimensional coordinates corresponding to the cameras in the group, so that the candidate three-dimensional coordinates can be recovered more accurately based on multi-view geometry.
In a possible implementation manner, the determining the three-dimensional coordinates of the target object at the current moment according to the candidate three-dimensional coordinates of the target object at the current moment includes:
determining the three-dimensional coordinates of the target object at the current moment according to those candidate three-dimensional coordinates of the target object at the current moment whose distance from the three-dimensional coordinates of the target object at the previous moment is smaller than a second distance threshold.
In this implementation, the candidate three-dimensional coordinates of the target object at the current time are filtered through the second distance threshold, so that the accuracy of the determined three-dimensional coordinates of the target object at the current time can be further improved.
In one possible implementation, the method further includes:
determining the second distance threshold according to the motion speed of the target object, wherein the second distance threshold is positively correlated to the motion speed of the target object.
In this implementation, determining the second distance threshold according to the motion speed of the target object allows an appropriate threshold to be chosen flexibly based on the target object's motion state: a smaller second distance threshold is used when the motion speed is small, and a larger one when it is large. The three-dimensional coordinates of the target object at the current moment are thus determined from candidate three-dimensional coordinates within a smaller range when the target object moves slowly and within a larger range when it moves fast, which improves the accuracy of the determined three-dimensional coordinates of the target object at the current moment.
In one possible implementation manner, before the obtaining the three-dimensional coordinates of the target object at the time point previous to the current time point, the method further includes:
for any camera in the multiple cameras, determining a candidate detection frame corresponding to a target object at a first moment according to a detection frame with the highest confidence level in a video frame acquired by the camera at the first moment, wherein the first moment represents the starting moment of three-dimensional coordinate recovery;
and determining the three-dimensional coordinates of the target object at the first moment according to the candidate detection frame corresponding to the target object at the first moment.
In this implementation, for any camera among the plurality of cameras, a candidate detection frame corresponding to the target object at the first moment is determined according to the detection frame with the highest confidence in the video frame acquired by the camera at the first moment, and the three-dimensional coordinates of the target object at the first moment are determined from those candidate detection frames, so that the three-dimensional coordinates at the first moment can be determined accurately. Recovering the three-dimensional coordinates at any moment after the first moment depends on the three-dimensional coordinates of the target object at the previous moment, so performing the subsequent recovery based on the three-dimensional coordinates of the target object at the first moment determined in this way improves the accuracy of the three-dimensional coordinate recovery and, in turn, the accuracy of the subsequently recovered three-dimensional motion trajectory of the target object.
According to an aspect of the present disclosure, there is provided an apparatus for restoring three-dimensional coordinates, including:
the acquisition module is used for acquiring a three-dimensional coordinate of the target object at the previous moment of the current moment;
the first determining module is used for determining two-dimensional coordinates of the target object corresponding to the plurality of cameras at the previous moment according to the three-dimensional coordinates of the target object at the previous moment;
a second determining module, configured to determine, for any camera among the plurality of cameras, a candidate detection frame corresponding to the target object at the current moment according to the distances between the two-dimensional coordinates of the target object corresponding to the camera at the previous moment and the detection frames in the video frame acquired by the camera at the current moment;
and the third determining module is used for determining the three-dimensional coordinates of the target object at the current moment according to the candidate detection frame corresponding to the target object at the current moment.
In one possible implementation manner, the first determining module is configured to:
and for any camera in the plurality of cameras, determining the two-dimensional coordinates of the target object corresponding to the camera at the previous moment according to the three-dimensional coordinates of the target object at the previous moment and the conversion relation between the three-dimensional coordinates and the two-dimensional coordinates corresponding to the camera.
In one possible implementation manner, the second determining module is configured to:
for any camera among the plurality of cameras, determine the detection frame, in the video frame acquired by the camera at the current moment, that is closest to the two-dimensional coordinates of the target object corresponding to the camera at the previous moment;
and in response to the distance between that closest detection frame and the two-dimensional coordinates of the target object corresponding to the camera at the previous moment being less than or equal to a first distance threshold, determine the closest detection frame as a candidate detection frame corresponding to the target object at the current moment.
In one possible implementation, the apparatus further includes:
a fourth determining module, configured to determine the first distance threshold according to the motion speed of the target object, where the first distance threshold is positively correlated to the motion speed of the target object.
In one possible implementation manner, the third determining module is configured to:
determining candidate three-dimensional coordinates of the target object at the current moment according to the candidate detection frame corresponding to the target object at the current moment;
and determining the three-dimensional coordinates of the target object at the current moment according to the candidate three-dimensional coordinates of the target object at the current moment.
In one possible implementation manner, the third determining module is configured to:
for any one camera in any group of cameras in the plurality of cameras, determining a candidate two-dimensional coordinate of the target object corresponding to the camera at the current moment according to a candidate detection frame, corresponding to the target object at the current moment, determined by a video frame acquired by the camera at the current moment, wherein the any group of cameras comprises at least two cameras;
and determining the candidate three-dimensional coordinates of the target object corresponding to the group of cameras at the current moment according to the candidate two-dimensional coordinates of the target object corresponding to the cameras in the group of cameras at the current moment and the conversion relationship between the two-dimensional coordinates and the three-dimensional coordinates corresponding to the cameras in the group of cameras.
In one possible implementation manner, the third determining module is configured to:
and determining the three-dimensional coordinates of the target object at the current moment according to those candidate three-dimensional coordinates of the target object at the current moment whose distance from the three-dimensional coordinates of the target object at the previous moment is smaller than a second distance threshold.
In one possible implementation, the apparatus further includes:
a fifth determining module, configured to determine the second distance threshold according to the motion speed of the target object, where the second distance threshold is positively correlated to the motion speed of the target object.
In one possible implementation, the apparatus further includes:
a sixth determining module, configured to determine, for any one of the multiple cameras, a candidate detection frame corresponding to a target object at a first time according to a detection frame with a highest confidence level in a video frame acquired by the camera at the first time, where the first time indicates a start time of three-dimensional coordinate recovery;
a seventh determining module, configured to determine, according to the candidate detection frame corresponding to the target object at the first time, a three-dimensional coordinate of the target object at the first time.
According to an aspect of the present disclosure, there is provided an electronic device including: one or more processors; a memory for storing executable instructions; wherein the one or more processors are configured to invoke the memory-stored executable instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In the embodiment of the disclosure, the three-dimensional coordinates of a target object at the moment previous to the current moment are acquired, and the two-dimensional coordinates of the target object corresponding to each of a plurality of cameras at the previous moment are determined from those three-dimensional coordinates. For any camera among the plurality of cameras, a candidate detection frame corresponding to the target object at the current moment is determined according to the distances between the two-dimensional coordinates of the target object corresponding to the camera at the previous moment and the detection frames in the video frame acquired by the camera at the current moment, and the three-dimensional coordinates of the target object at the current moment are then determined from the candidate detection frames. In this way, the three-dimensional coordinates of the target object at the current moment are obtained by using the three-dimensional coordinates of the target object at the previous moment in combination with the multiple viewing angles of the plurality of cameras, so that an accurate three-dimensional motion trajectory can be recovered.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flowchart of a method for recovering three-dimensional coordinates provided by an embodiment of the present disclosure.
Fig. 2 is a schematic diagram illustrating a method for recovering three-dimensional coordinates according to an embodiment of the present disclosure.
Fig. 3 shows a block diagram of a three-dimensional coordinate restoration apparatus provided by an embodiment of the present disclosure.
Fig. 4 shows a block diagram of an electronic device 800 provided by an embodiment of the disclosure.
Fig. 5 shows a block diagram of an electronic device 1900 provided by an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association relationship describing an associated object, and means that there may be three relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of a, B, and C, and may mean including any one or more elements selected from the group consisting of a, B, and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
In related-art three-dimensional reconstruction methods, a relatively mainstream algorithm is SFM (Structure From Motion). The algorithm matches feature points across multi-view images and, based on the principles of multi-view geometry, estimates the relative motion of the camera during shooting, and then computes the point-cloud distribution of the target object's structure. The SFM algorithm can be summarized as three steps: feature extraction, feature matching, and subsequent engineering optimization. SFM relies on the feature structure of the scene, i.e., it requires enough corners in the scene to extract sufficient matching feature points for the subsequent optimization. It is not universal for some special application scenarios, such as recovering a football or a player on a football pitch, because a football pitch contains few corner points and the target objects of interest (the football or the players) are small. Therefore, with the SFM algorithm it is difficult to recover the three-dimensional position of a target object in a scene with few corners and/or a small target object. Moreover, SFM relies on camera motion to recover a static scene, whereas many practical applications require recovering a dynamic scene with fixed camera positions, that is, recovering the three-dimensional motion trajectory of a target object.
To solve technical problems similar to the above, embodiments of the present disclosure provide a three-dimensional coordinate recovery method and apparatus, an electronic device, and a storage medium. The three-dimensional coordinates of a target object at the moment previous to the current moment are acquired, and the two-dimensional coordinates of the target object corresponding to each of a plurality of cameras at the previous moment are determined from those three-dimensional coordinates. For any camera among the plurality of cameras, a candidate detection frame corresponding to the target object at the current moment is determined according to the distances between the two-dimensional coordinates of the target object corresponding to the camera at the previous moment and the detection frames in the video frame acquired by the camera at the current moment, and the three-dimensional coordinates of the target object at the current moment are determined from the candidate detection frames. The three-dimensional coordinates at the current moment are thus obtained by using the three-dimensional coordinates at the previous moment in combination with the multiple viewing angles of the plurality of cameras, so that an accurate three-dimensional motion trajectory can be recovered. The method can recover an accurate three-dimensional motion trajectory in scenes with few corner points and is suitable for recovering the three-dimensional motion trajectories of small-sized target objects that are difficult to detect.
Fig. 1 shows a flowchart of a three-dimensional coordinate restoration method provided by an embodiment of the present disclosure. The execution subject of the three-dimensional coordinate restoration method may be a three-dimensional coordinate restoration apparatus. For example, the three-dimensional coordinate restoration method may be executed by a terminal device or a server or other processing device. The terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, or a wearable device. In some possible implementations, the method for recovering three-dimensional coordinates may be implemented by a processor calling computer readable instructions stored in a memory. As shown in fig. 1, the method for restoring three-dimensional coordinates includes steps S11 to S14.
In step S11, the three-dimensional coordinates of the target object at the time immediately preceding the current time are acquired.
The embodiment of the disclosure can be applied to application scenarios such as intelligent scene analysis, intelligent education, intelligent cities, security protection, target detection, target tracking and the like. In the embodiment of the present disclosure, the area where the three-dimensional coordinate recovery and/or the three-dimensional motion trajectory recovery are/is required may be a relatively closed area, or may be a relatively open area. For example, the field requiring three-dimensional coordinate recovery and/or three-dimensional motion trajectory recovery may be a court, a mall, a classroom, or the like. The target object represents an object which needs to be subjected to three-dimensional coordinate restoration and/or three-dimensional motion trajectory restoration. For example, the target object may be a pedestrian, a vehicle, a player in a sports field (e.g., a player in a football stadium), a football in a football stadium, or any other object requiring three-dimensional coordinate recovery and/or three-dimensional motion trajectory recovery.
In the embodiment of the present disclosure, the three-dimensional coordinates of the target object may be determined separately for each video frame acquired by the camera. Of course, the video frames captured by the camera may not be analyzed frame by frame, for example, the three-dimensional coordinates of the target object may be determined every several video frames.
In the disclosed embodiment, the three-dimensional coordinates may be coordinates in a first coordinate system. The first coordinate system is a three-dimensional coordinate system. For example, the first coordinate system may be a world coordinate system or other virtual three-dimensional coordinate system. In a possible implementation manner, the first coordinate system may take a field plane requiring three-dimensional coordinate recovery and/or three-dimensional motion trajectory recovery as an x-y plane, and a z-axis of the first coordinate system is perpendicular to the x-y plane.
In one possible implementation, the time interval between two adjacent moments may be equal to the inverse of the frame rate at which the cameras capture video frames. In another possible implementation, the time interval between two adjacent moments may be greater than the inverse of the frame rate, for example equal to H times the inverse of the frame rate, where H is an integer greater than 1. The current moment may represent the current moment at which three-dimensional coordinate recovery is performed, and the previous moment represents the previous moment at which three-dimensional coordinate recovery was performed.
In this disclosure, if the current time is not the start time of three-dimensional coordinate restoration, the three-dimensional coordinate of the target object at the previous time of the current time may be acquired. Here, the start time of three-dimensional coordinate restoration may refer to a time at which three-dimensional coordinate restoration is started, that is, may refer to a time at which the first three-dimensional coordinate of the target object is restored. The start time of the three-dimensional coordinate restoration may be equal to the start time of the three-dimensional motion trajectory restoration. In one possible implementation manner, before the obtaining the three-dimensional coordinates of the target object at the time point previous to the current time point, the method further includes: for any camera in the multiple cameras, determining a candidate detection frame corresponding to a target object at a first moment according to a detection frame with the highest confidence level in a video frame acquired by the camera at the first moment, wherein the first moment represents the starting moment of three-dimensional coordinate recovery; and determining the three-dimensional coordinates of the target object at the first moment according to the candidate detection frame corresponding to the target object at the first moment.
In this implementation, the confidence of any detection box may represent the probability that the detection box belongs to the target object class. For example, the higher the probability that the detection box belongs to the target object class, the higher the confidence of the detection box; the lower the probability that the detection box belongs to the target object class, the lower the confidence of the detection box. For example, if the target object is a pedestrian, the target object category is a pedestrian category, and the confidence of any detection box may indicate the probability that the detection box belongs to the pedestrian category; for another example, if the target object is a football, the target object category is a football category, and the confidence of any detection box may indicate the probability that the detection box belongs to the football category. In this implementation manner, for any one of the multiple cameras, a detection frame with the highest confidence level in a video frame acquired by the camera at a first time may be determined as a candidate detection frame corresponding to a target object at the first time; or, for any one of the plurality of cameras, if the confidence of the detection frame with the highest confidence in the video frames acquired by the camera at the first moment is greater than or equal to a preset confidence threshold, determining the detection frame with the highest confidence in the video frames acquired by the camera at the first moment as a candidate detection frame corresponding to the target object at the first moment. Of course, in other possible implementation manners, for any one of the multiple cameras, a candidate detection frame corresponding to the target object at the first time may also be determined according to any detection frame whose confidence level is higher than a preset confidence level threshold in a video frame acquired by the camera at the first time, which is not limited herein.
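As a hedged sketch of this initialization for one camera (the function name and the concrete threshold value are assumptions, not taken from the patent):

```python
def init_candidate_frame(detections, conf_threshold=0.5):
    """detections: list of (box, confidence) pairs for one camera at the first
    moment; returns the highest-confidence detection frame if it clears the
    (optional) confidence threshold, otherwise no candidate for this camera."""
    if not detections:
        return None
    box, conf = max(detections, key=lambda d: d[1])
    return box if conf >= conf_threshold else None
```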
In this implementation manner, the candidate three-dimensional coordinates of the target object at the first moment may be determined according to the candidate detection frames corresponding to the target object at the first moment, and the three-dimensional coordinates of the target object at the first moment may then be determined according to those candidate three-dimensional coordinates.
As an example of this implementation, for any one of a group of cameras in the plurality of cameras, determining candidate two-dimensional coordinates of the target object corresponding to the camera at the first time according to a candidate detection frame, corresponding to the target object at the first time, determined by a video frame acquired by the camera at the first time, where the group of cameras includes at least two cameras; and determining the candidate three-dimensional coordinates of the target object corresponding to the group of cameras at the first moment according to the candidate two-dimensional coordinates of the target object corresponding to the cameras in the group of cameras at the first moment and the conversion relationship between the two-dimensional coordinates and the three-dimensional coordinates corresponding to the cameras in the group of cameras.
In this example, for any one of the plurality of cameras, the candidate two-dimensional coordinates of the target object corresponding to the camera at the first time may be determined according to the position of the candidate detection frame corresponding to the target object at the first time, which is determined by the video frame acquired by the camera at the first time. For example, if the candidate detection frame corresponding to the target object at the first time, which is determined by the video frame acquired by any camera at the first time, is the candidate detection frame B, any point on the candidate detection frame B or any point inside the candidate detection frame B may be used as the candidate two-dimensional coordinates of the target object corresponding to the camera at the first time. For example, the midpoint of the bottom edge of the candidate detection frame B may be used as a candidate two-dimensional coordinate of the target object corresponding to the camera at the first time.
As an example of this implementation, a median of the candidate three-dimensional coordinates of the target object at the first time may be determined as the three-dimensional coordinates of the target object at the first time. As another example of this implementation, an average value of the candidate three-dimensional coordinates of the target object at the first time may be determined as the three-dimensional coordinates of the target object at the first time.
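For instance (illustrative numbers only), the two aggregation choices behave as follows; the median is more robust when one camera group yields an outlier candidate:

```python
import numpy as np

cands = np.array([[1.0, 2.0, 0.1],   # candidate 3D coordinates from
                  [1.1, 2.1, 0.0],   # different camera groups
                  [5.0, 9.0, 3.0]])  # an outlier triangulation
print(np.median(cands, axis=0))  # [1.1 2.1 0.1] -> robust to the outlier
print(cands.mean(axis=0))        # approx. [2.37 4.37 1.03] -> pulled by it
```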
In this implementation, for any camera among the plurality of cameras, a candidate detection frame corresponding to the target object at the first moment is determined according to the detection frame with the highest confidence in the video frame acquired by the camera at the first moment, and the three-dimensional coordinates of the target object at the first moment are determined from those candidate detection frames, so that the three-dimensional coordinates at the first moment can be determined accurately. Recovering the three-dimensional coordinates at any moment after the first moment depends on the three-dimensional coordinates of the target object at the previous moment, so performing the subsequent recovery based on the three-dimensional coordinates of the target object at the first moment determined in this way improves the accuracy of the three-dimensional coordinate recovery and, in turn, the accuracy of the subsequently recovered three-dimensional motion trajectory of the target object.
In another possible implementation manner, before the obtaining the three-dimensional coordinates of the target object at the time immediately before the current time, the method further includes: determining a first candidate detection frame corresponding to the target object at a first moment according to a detection frame with the highest confidence level in a video frame acquired by a first camera of the multiple cameras at the first moment, wherein the first camera is any one of the multiple cameras; determining a second candidate detection frame corresponding to the target object at the first moment according to a detection frame with the highest confidence level in a video frame acquired by a second camera of the multiple cameras at the first moment, wherein the second camera is any one of the multiple cameras except the first camera; determining a first candidate two-dimensional coordinate of the target object corresponding to the first camera at the first moment according to the first candidate detection frame; determining a second candidate two-dimensional coordinate of the target object corresponding to the second camera at the first moment according to the second candidate detection frame; and determining the three-dimensional coordinate of the target object at the first moment according to the first candidate two-dimensional coordinate, the second candidate two-dimensional coordinate and the conversion relation between the two-dimensional coordinates and the three-dimensional coordinates corresponding to the first camera and the second camera. According to the implementation mode, the three-dimensional coordinates of the target object at the first moment can be determined only according to the video frames acquired by the two cameras at the first moment.
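A minimal sketch of this two-camera variant, using OpenCV's triangulatePoints as one concrete realization of the "conversion relation between two-dimensional and three-dimensional coordinates" (modeling each camera by a 3x4 projection matrix is an assumption; the patent does not prescribe a specific library or parameterization):

```python
import numpy as np
import cv2

def triangulate_two_view(P1, P2, q1, q2):
    """P1, P2: 3x4 projection matrices of the first and second camera;
    q1, q2: the first and second candidate 2D coordinates at the first moment.
    Returns the recovered 3D coordinates of the target object."""
    Xh = cv2.triangulatePoints(P1.astype(np.float64), P2.astype(np.float64),
                               np.asarray(q1, float).reshape(2, 1),
                               np.asarray(q2, float).reshape(2, 1))
    Xh = Xh[:, 0]
    return Xh[:3] / Xh[3]   # de-homogenize
```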
In a possible implementation manner, a target detection model may be used to detect the detection frame of the target object in a video frame acquired by a camera, that is, the bounding box of the target object. For example, the target detection model may employ Faster R-CNN (Faster Region-based Convolutional Neural Network) or Fast R-CNN (Fast Region-based Convolutional Neural Network), etc. The backbone network of the target detection model may adopt a structure such as ResNet-18. To increase the processing speed of the target detection model, the model can be compressed by reducing the number of modules in the backbone network, by channel pruning, and by similar methods, so that the model is accelerated without reducing its precision. In one example, to improve detection accuracy for small-sized target objects, the anchor size may be adjusted, for example reduced, or several different anchor sizes may be set, so as to improve the recall of small-sized target objects and reduce missed detections; a feature pyramid may also be adopted to fuse multi-layer features so as to detect multi-scale target objects. This example is suited to recovering the three-dimensional coordinates and/or three-dimensional motion trajectories of small-sized target objects (e.g., players or the football in a football stadium) and/or multi-scale target objects.
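As a hedged sketch of the anchor adjustment, using torchvision's Faster R-CNN implementation as a stand-in detector (the patent names Faster R-CNN with a ResNet-18-style backbone but gives no configuration; the concrete sizes below are illustrative, not from the patent):

```python
import torchvision
from torchvision.models.detection.anchor_utils import AnchorGenerator

# Smaller anchors per feature-pyramid level than the torchvision defaults,
# to raise recall on small targets such as a football.
anchor_gen = AnchorGenerator(
    sizes=((8,), (16,), (32,), (64,), (128,)),   # one tuple per FPN level
    aspect_ratios=((0.5, 1.0, 2.0),) * 5,
)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None, rpn_anchor_generator=anchor_gen,
)
```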
In step S12, two-dimensional coordinates of the target object corresponding to the plurality of cameras at the previous time are determined according to the three-dimensional coordinates of the target object at the previous time.
In the embodiment of the present disclosure, the plurality of cameras may be placed around a field where three-dimensional coordinate recovery and/or three-dimensional motion trajectory recovery are/is required. The two-dimensional coordinates may be coordinates in a second coordinate system. The second coordinate system is a two-dimensional coordinate system. For example, the second coordinate system may be a pixel coordinate system or an image coordinate system, etc. The two-dimensional coordinates of the target object corresponding to the plurality of cameras at the previous time may include two-dimensional coordinates of the target object corresponding to each of the plurality of cameras at the previous time. Of course, if any one of the cameras does not include the target object in the video frame acquired at the previous time, that is, the target object is not within the field of view of the camera at the previous time, the two-dimensional coordinates of the target object corresponding to the camera at the previous time may not be included in the two-dimensional coordinates of the target object corresponding to the cameras at the previous time, or the two-dimensional coordinates of the determined target object corresponding to the camera at the previous time may be outside the field of view of the camera.
In a possible implementation manner, the determining, according to the three-dimensional coordinates of the target object at the previous moment, the two-dimensional coordinates of the target object corresponding to the plurality of cameras at the previous moment includes: for any camera among the plurality of cameras, determining the two-dimensional coordinates of the target object corresponding to the camera at the previous moment according to the three-dimensional coordinates of the target object at the previous moment and the conversion relationship between three-dimensional and two-dimensional coordinates corresponding to the camera. In this implementation, for any camera among the plurality of cameras, the two-dimensional coordinates of the target object corresponding to the camera at the previous moment are determined from the three-dimensional coordinates at the previous moment and the camera's conversion relationship, so that the two-dimensional coordinates of the target object corresponding to the plurality of cameras at the previous moment are accurately determined based on reprojection. Of course, in other possible implementation manners, a method of minimizing the reprojection error may also be combined in this process to further improve the accuracy of the determined two-dimensional coordinates of the target object corresponding to the plurality of cameras at the previous moment.
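A minimal reprojection sketch, assuming the conversion relationship is a pinhole model with intrinsics K and extrinsics (R, t); lens distortion is ignored and the numbers in the usage example are made up:

```python
import numpy as np

def reproject(X_world, K, R, t):
    """Map 3D world coordinates to pixel coordinates: x ~ K (R X + t)."""
    x = K @ (R @ X_world + t)
    return x[:2] / x[2]

# Usage: a camera 10 m above the pitch plane z = 0, looking straight down.
K = np.array([[1000.0, 0.0, 960.0], [0.0, 1000.0, 540.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 10.0])
print(reproject(np.array([1.0, 2.0, 0.0]), K, R, t))  # -> [1060.  740.]
```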
In step S13, for any camera among the plurality of cameras, a candidate detection frame corresponding to the target object at the current moment is determined according to the distances between the two-dimensional coordinates of the target object corresponding to the camera at the previous moment and the detection frames in the video frame acquired by the camera at the current moment.
For example, suppose the plurality of cameras includes a camera ch1, the current moment is denoted t, and the previous moment is denoted t-1. If the two-dimensional coordinates of the target object corresponding to camera ch1 at the previous moment t-1 are q1, and the position of a certain detection frame in the video frame acquired by camera ch1 at the current moment t is q2, then whether that detection frame is determined as a candidate detection frame corresponding to the target object at the current moment t can be decided according to the distance between q1 and q2.
In the embodiment of the present disclosure, the candidate detection frame corresponding to the target object at the current moment is determined in combination with the two-dimensional coordinates of the target object corresponding to each camera at the previous moment, so the determined candidate detection frame is more likely to be a true detection frame of the target object, that is, more likely to contain the target object. Determining the three-dimensional coordinates of the target object at the current moment based on the candidate detection frame corresponding to the target object at the current moment therefore improves the accuracy of the determined three-dimensional coordinates of the target object at the current moment.
In a possible implementation manner, for any camera among the plurality of cameras, the determining a candidate detection frame corresponding to the target object at the current moment according to the distances between the two-dimensional coordinates of the target object corresponding to the camera at the previous moment and the detection frames in the video frame acquired by the camera at the current moment includes: for any camera among the plurality of cameras, determining the detection frame, in the video frame acquired by the camera at the current moment, that is closest to the two-dimensional coordinates of the target object corresponding to the camera at the previous moment; and in response to the distance between that closest detection frame and the two-dimensional coordinates of the target object corresponding to the camera at the previous moment being less than or equal to a first distance threshold, determining the closest detection frame as a candidate detection frame corresponding to the target object at the current moment.
In this implementation, for any one of the multiple cameras, there may be a detection frame in the video frame acquired by the camera at the current time, or there may not be a detection frame in the video frame. In a case that the camera has a detection frame in the video frame acquired at the current time, a detection frame closest to the two-dimensional coordinate of the target object corresponding to the camera at the previous time in the video frame acquired at the current time by the camera may be determined. For any of the plurality of cameras, if a distance between the detection frame with the closest distance and the two-dimensional coordinate of the target object corresponding to the camera at the previous time is greater than a first distance threshold, the detection frame with the closest distance may not be determined as the candidate detection frame of the target object corresponding to the current time. That is, for any one of the plurality of cameras, if there is no detection frame whose distance from the two-dimensional coordinates of the target object corresponding to the camera at the previous time is smaller than or equal to the first distance threshold in the video frame captured by the camera at the current time, the detection frame in the video frame captured by the camera at the current time may not be included in the candidate detection frame corresponding to the target object at the current time.
In this implementation, for any camera among the plurality of cameras, if the distance between the closest detection frame and the two-dimensional coordinates of the target object corresponding to the camera at the previous moment is less than or equal to the first distance threshold, the closest detection frame lies near the position of the target object at the previous moment, so it is more likely to be a true detection frame of the target object; if that distance is greater than the first distance threshold, the closest detection frame lies far from the target object's position at the previous moment, so it is less likely to be a true detection frame of the target object. Filtering the candidate detection frames corresponding to the target object at the current moment by the first distance threshold therefore further increases the probability that a determined candidate detection frame is a true detection frame of the target object, that is, the probability that it contains the target object, and determining the three-dimensional coordinates of the target object at the current moment based on these candidate detection frames helps to further improve the accuracy of the determined three-dimensional coordinates.
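A minimal sketch of this thresholded selection for one camera (names are illustrative; detection frames are assumed to be [x1, y1, x2, y2] boxes whose bottom-edge midpoint stands in for the 2D position):

```python
import numpy as np

def candidate_frame(q_prev, boxes, threshold):
    """q_prev: the target object's reprojected 2D coordinates for this camera
    at the previous moment; returns the nearest detection frame if it lies
    within threshold, otherwise this camera contributes no candidate."""
    if not boxes:
        return None
    pts = np.array([[(b[0] + b[2]) / 2.0, b[3]] for b in boxes])
    d = np.linalg.norm(pts - q_prev, axis=1)
    i = int(np.argmin(d))
    return boxes[i] if d[i] <= threshold else None
```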
As an example of this implementation, the method further comprises: determining the first distance threshold according to the movement speed of the target object, wherein the first distance threshold is positively correlated with the movement speed of the target object. In this example, the movement speed of the target object may be determined from its three-dimensional coordinates at multiple times, for example, from the three-dimensional coordinates at the previous time and at least one time before the previous time, together with the time interval between adjacent times. The greater the movement speed, the greater the first distance threshold; conversely, the smaller the movement speed, the smaller the first distance threshold. Determining the first distance threshold from the movement speed allows an appropriate threshold to be chosen flexibly according to the motion state of the target object: a smaller threshold when the target moves slowly reduces the probability of wrongly accepting a distant detection frame as a candidate, and a larger threshold when the target moves fast reduces the probability of missing a distant but genuine detection frame. Both effects improve the accuracy of the determined candidate detection frames and, in turn, the accuracy of the three-dimensional coordinates determined for the current time. The first distance threshold may be adjusted dynamically as the movement speed changes, or it may be left fixed once determined.
Of course, in other examples, the first distance threshold may also be a preset constant to reduce the amount of calculation in the target tracking process.
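For the speed-dependent variant above, a minimal sketch might look as follows; the linear mapping and its constants are illustrative assumptions, since the disclosure only requires that the threshold be positively correlated with the movement speed:

```python
import numpy as np

def first_distance_threshold(coords_3d, dt, base=20.0, gain=0.5):
    """Derive the first distance threshold from the target's movement speed.

    coords_3d: (T, 3) three-dimensional coordinates at the previous time and
               at least one earlier time (T >= 2).
    dt:        time interval between adjacent times.
    base and gain are illustrative constants; any increasing mapping from
    speed to threshold preserves the required positive correlation.
    """
    velocities = np.diff(np.asarray(coords_3d, dtype=float), axis=0) / dt
    speed = float(np.linalg.norm(velocities, axis=1).mean())
    return base + gain * speed      # faster target, larger threshold
```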
In another possible implementation manner, for any one of the multiple cameras, determining the candidate detection frame corresponding to the target object at the current time according to the distance between the two-dimensional coordinates of the target object corresponding to the camera at the previous time and the detection frames in the video frame acquired by the camera at the current time includes: for any one of the multiple cameras, determining the detection frame, in the video frame acquired by the camera at the current time, that is closest to the two-dimensional coordinates of the target object corresponding to the camera at the previous time as a candidate detection frame corresponding to the target object at the current time.
In step S14, according to the candidate detection frame corresponding to the target object at the current time, the three-dimensional coordinates of the target object at the current time are determined.
In the embodiment of the present disclosure, according to the three-dimensional coordinate of the target object at the current time and the three-dimensional coordinate of the target object at least one time before the current time, the three-dimensional motion trajectory of the target object may be obtained. For example, the three-dimensional coordinates of the target object at the current time and at each time before the current time may be connected according to a time sequence, so as to obtain a three-dimensional motion trajectory of the target object.
In a possible implementation manner, after the three-dimensional coordinates of the target object at the current time are determined, a Kalman filter may be used to predict estimated coordinates of the target object at the current time from its three-dimensional coordinates at the previous time, and the three-dimensional coordinates at the current time may then be corrected according to those estimated coordinates. Processing the three-dimensional coordinates at the current time with a Kalman filter makes the recovered three-dimensional motion trajectory of the target object smoother. Of course, other filtering algorithms may be used to smooth the trajectory, or no smoothing may be applied; this is not limited herein.
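A minimal sketch of such a predict-and-correct step, assuming a constant-velocity motion model and illustrative noise parameters (the disclosure fixes neither):

```python
import numpy as np

class ConstantVelocityKalman:
    """Minimal constant-velocity Kalman filter over 3D position."""

    def __init__(self, p0, dt=1.0, q=1e-2, r=1e-1):
        self.x = np.hstack([p0, np.zeros(3)])                # state: [x y z vx vy vz]
        self.P = np.eye(6)                                   # state covariance
        self.F = np.eye(6); self.F[:3, 3:] = dt * np.eye(3)  # transition model
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])    # observe position only
        self.Q = q * np.eye(6)                               # process noise
        self.R = r * np.eye(3)                               # measurement noise

    def smooth(self, p_measured):
        # Predict the estimated coordinates at the current time ...
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # ... then correct the measured 3D coordinates with the estimate.
        K = self.P @ self.H.T @ np.linalg.inv(self.H @ self.P @ self.H.T + self.R)
        self.x = self.x + K @ (p_measured - self.H @ self.x)
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3]                                    # smoothed position
```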
In the embodiment of the present disclosure, the three-dimensional coordinates of the target object at the previous time of the current time are acquired, and from them the two-dimensional coordinates of the target object corresponding to each of the multiple cameras at the previous time are determined. For any one of the multiple cameras, a candidate detection frame corresponding to the target object at the current time is determined according to the distance between the two-dimensional coordinates of the target object corresponding to the camera at the previous time and the detection frames in the video frame acquired by the camera at the current time, and the three-dimensional coordinates of the target object at the current time are determined according to the candidate detection frames corresponding to the target object at the current time. The three-dimensional coordinates at the current time are thus obtained by combining the three-dimensional coordinates at the previous time with the multiple viewing angles of the multiple cameras, so that an accurate three-dimensional motion trajectory can be recovered. Because the embodiment combines the position information (time-sequence information) of the target object at the previous time with a reprojection method, accurate three-dimensional coordinates and/or three-dimensional motion trajectories can be recovered even in scenes with few corner points, and the method is applicable to the three-dimensional coordinate and/or trajectory recovery of target objects that are small and difficult to detect.
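As a sketch of the reprojection step referred to above, assuming the conversion relationship between three-dimensional and two-dimensional coordinates for each camera is represented by a 3×4 projection matrix (the helper name is illustrative):

```python
import numpy as np

def reproject(P, point_3d):
    """Project the target's 3D coordinates at the previous time into one
    camera, giving the 2D (pixel) coordinates corresponding to that camera.

    P:        (3, 4) projection matrix of the camera.
    point_3d: (3,) world coordinates of the target object.
    """
    X = np.append(point_3d, 1.0)        # homogeneous world point
    u, v, w = P @ X
    return np.array([u / w, v / w])     # pixel coordinates
```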
In a possible implementation manner, the determining, according to the candidate detection frame corresponding to the target object at the current time, the three-dimensional coordinates of the target object at the current time includes: determining candidate three-dimensional coordinates of the target object at the current time according to the candidate detection frame corresponding to the target object at the current time; and determining the three-dimensional coordinates of the target object at the current time according to the candidate three-dimensional coordinates of the target object at the current time. In this implementation manner, the candidate three-dimensional coordinates of the target object at the current time are recovered from the positions of the candidate detection frames corresponding to the target object at the current time, and the three-dimensional coordinates at the current time are determined from those candidates, so the three-dimensional coordinates of the target object at the current time can be determined accurately and an accurate three-dimensional motion trajectory can be recovered.
As an example of this implementation manner, the determining, according to the candidate detection frame corresponding to the target object at the current time, the candidate three-dimensional coordinates of the target object at the current time includes: for any one camera in any group of cameras among the multiple cameras, determining candidate two-dimensional coordinates of the target object corresponding to the camera at the current time according to the candidate detection frame corresponding to the target object at the current time determined from the video frame acquired by the camera at the current time, wherein any group of cameras comprises at least two cameras; and determining candidate three-dimensional coordinates of the target object corresponding to the group of cameras at the current time according to the candidate two-dimensional coordinates of the target object corresponding to the cameras in the group at the current time and the conversion relationship between the two-dimensional coordinates and the three-dimensional coordinates corresponding to the cameras in the group.
In this example, the number of cameras in any one group of cameras is two or more. For example, any two cameras may form a group. Any camera may belong to one or more groups at the same time; e.g., camera ch_2 may form a first group with camera ch_1, and camera ch_2 may also form a second group with camera ch_3.
In this example, for any one of the multiple cameras, the candidate two-dimensional coordinates of the target object corresponding to the camera at the current time may be determined according to the position of the candidate detection frame corresponding to the target object at the current time, which is determined from the video frame acquired by the camera at the current time. For example, if the candidate detection frame corresponding to the target object at the current time, determined from the video frame acquired by any camera at the current time, is the candidate detection frame B′, any point on or inside the candidate detection frame B′ may be used as the candidate two-dimensional coordinates of the target object corresponding to the camera at the current time. For example, the middle point of the bottom side of the candidate detection frame B′ may be used as the candidate two-dimensional coordinates of the target object corresponding to the camera at the current time.
In this example, the candidate three-dimensional coordinates of the target object corresponding to the group of cameras at the current time are determined using multi-view geometry, according to the candidate two-dimensional coordinates of the target object corresponding to the cameras in the group at the current time and the conversion relationship between the two-dimensional coordinates and the three-dimensional coordinates corresponding to the cameras in the group.
In this example, for any one camera in any group of cameras, the candidate two-dimensional coordinates of the target object corresponding to the camera at the current time are determined from the candidate detection frame determined from the video frame acquired by the camera at the current time, and the candidate three-dimensional coordinates of the target object corresponding to the group at the current time are determined from these candidate two-dimensional coordinates and the conversion relationship between the two-dimensional and three-dimensional coordinates corresponding to the cameras in the group. In this way, the candidate three-dimensional coordinates corresponding to each group of cameras at the current time can be recovered more accurately based on multi-view geometry.
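A sketch of how one camera group can recover a candidate three-dimensional coordinate, using linear (DLT) triangulation as one concrete instance of the multi-view geometry mentioned above; the 3×4 projection matrices are assumed to encode each camera's conversion relationship:

```python
import numpy as np

def triangulate_pair(P1, P2, xy1, xy2):
    """Linear (DLT) triangulation for one group of two cameras.

    P1, P2:   (3, 4) projection matrices of the two cameras in the group.
    xy1, xy2: candidate 2D coordinates of the target (e.g. the bottom-center
              of its candidate detection frame) in each camera at time t.
    Returns the candidate 3D coordinate p_i for this camera group.
    """
    A = np.vstack([
        xy1[0] * P1[2] - P1[0],   # two rows from camera 1
        xy1[1] * P1[2] - P1[1],
        xy2[0] * P2[2] - P2[0],   # two rows from camera 2
        xy2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)   # null-space solution of A X = 0
    X = vt[-1]
    return X[:3] / X[3]           # dehomogenize
```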
As an example of this implementation, the determining, according to the candidate three-dimensional coordinate of the target object at the current time, the three-dimensional coordinate of the target object at the current time includes: and determining the three-dimensional coordinates of the target object at the current moment according to the candidate three-dimensional coordinates, of which the distance between the candidate three-dimensional coordinates of the target object at the current moment and the three-dimensional coordinates of the target object at the previous moment is smaller than a second distance threshold value, in the candidate three-dimensional coordinates of the target object at the current moment.
In this example, for any candidate three-dimensional coordinate of the target object at the current time, if a distance between the candidate three-dimensional coordinate and the three-dimensional coordinate of the target object at the previous time is smaller than a second distance threshold, the candidate three-dimensional coordinate is closer to the position of the target object at the previous time, and the probability that the candidate three-dimensional coordinate can reflect the true three-dimensional coordinate of the target object at the current time is higher; if the distance between the candidate three-dimensional coordinate and the three-dimensional coordinate of the target object at the previous moment is greater than or equal to a second distance threshold, the candidate three-dimensional coordinate is farther from the position of the target object at the previous moment, and the probability that the candidate three-dimensional coordinate can reflect the real three-dimensional coordinate of the target object at the current moment is lower. Therefore, in this implementation, the candidate three-dimensional coordinates of the target object at the current time are filtered through the second distance threshold, so that the accuracy of the determined three-dimensional coordinates of the target object at the current time can be further improved.
In one example, an average value of candidate three-dimensional coordinates of the target object at the current time, which are less than a second distance threshold from the three-dimensional coordinate of the target object at the previous time, may be determined as the three-dimensional coordinate of the target object at the current time, so as to help make the three-dimensional motion trajectory of the restored target object smoother. In another example, the median of the candidate three-dimensional coordinates of the target object at the current time, the distance between the candidate three-dimensional coordinates of the target object and the three-dimensional coordinates of the target object at the previous time is smaller than a second distance threshold, may be determined as the three-dimensional coordinates of the target object at the current time.
In one example, the method further comprises: determining the second distance threshold according to the movement speed of the target object, wherein the second distance threshold is positively correlated with the movement speed of the target object. The greater the movement speed, the greater the second distance threshold; conversely, the smaller the movement speed, the smaller the second distance threshold. Determining the second distance threshold from the movement speed allows an appropriate threshold to be chosen flexibly according to the motion state of the target object: when the target moves slowly, the three-dimensional coordinates at the current time are determined from candidates within a small range, and when it moves fast, from candidates within a larger range, which improves the accuracy of the determined three-dimensional coordinates at the current time. The second distance threshold may be adjusted dynamically as the movement speed changes, or it may be left fixed once determined.
Of course, in other examples, the second distance threshold may also be a preset constant to reduce the amount of calculation in the target tracking process.
Fig. 2 is a schematic diagram illustrating a method for recovering three-dimensional coordinates according to an embodiment of the present disclosure. As shown in Fig. 2, the three-dimensional coordinates $P_{t-1}$ of the target object at the time t-1 immediately preceding the current time t are reprojected onto the N cameras $ch_1, ch_2, \ldots, ch_N$, yielding the two-dimensional coordinates (i.e., pixel coordinates) of the target object corresponding to the N cameras at the previous time t-1. For any camera among the N cameras, a candidate detection frame corresponding to the target object at the current time t is determined according to the distance between the two-dimensional coordinates of the target object corresponding to the camera at the previous time t-1 and the detection frames in the video frame acquired by the camera at the current time t. Suppose that at the current time t, n cameras $ch_1, ch_2, \ldots, ch_n$ among the N cameras have a candidate detection frame corresponding to the target object. Cameras $ch_1$ and $ch_2$ may form a group, cameras $ch_2$ and $ch_3$ may form a group, and so on, up to cameras $ch_{n-1}$ and $ch_n$. From the candidate two-dimensional coordinates of the target object corresponding to cameras $ch_1$ and $ch_2$ at the current time t and the conversion relationship between the two-dimensional and three-dimensional coordinates corresponding to these cameras, the candidate three-dimensional coordinates $p_1$ of the target object corresponding to this group at the current time t are obtained; from cameras $ch_2$ and $ch_3$, the candidate three-dimensional coordinates $p_2$; and so on, up to the candidate three-dimensional coordinates $p_{n-1}$ obtained from cameras $ch_{n-1}$ and $ch_n$.

The three-dimensional coordinates $P_t$ of the target object at the current time t are then determined from those candidate three-dimensional coordinates $p_i$ whose distance to the three-dimensional coordinates $P_{t-1}$ at the previous time t-1 is smaller than the second distance threshold $L_{th}$, i.e., the $p_i$ satisfying $|p_i - P_{t-1}| < L_{th}$:

$$P_t = \frac{1}{m} \sum_{i:\, |p_i - P_{t-1}| < L_{th}} p_i$$

where m denotes the number of candidate three-dimensional coordinates $p_i$ whose distance to $P_{t-1}$ at the previous time t-1 is smaller than the second distance threshold $L_{th}$.
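A sketch of this filtering-and-averaging step (names and array layout are assumptions for illustration):

```python
import numpy as np

def fuse_candidates(candidates, p_prev, L_th):
    """Average the candidate 3D coordinates that stay within L_th of P_{t-1}.

    candidates: (n-1, 3) candidate coordinates p_i from the camera groups.
    p_prev:     (3,) three-dimensional coordinates P_{t-1} at the previous time.
    Returns P_t, or None when no candidate passes the filter.
    """
    candidates = np.asarray(candidates, dtype=float)
    keep = np.linalg.norm(candidates - p_prev, axis=1) < L_th
    if not keep.any():
        return None
    return candidates[keep].mean(axis=0)   # P_t = (1/m) * sum of kept p_i
```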
In a possible implementation manner, before step S11, the method further includes: acquiring three-dimensional coordinates of markers; for any one of the multiple cameras, determining the internal and external parameters of the camera according to the two-dimensional coordinates of the markers acquired by the camera and the three-dimensional coordinates of the markers; and determining, according to the internal and external parameters of the camera, the conversion relationship between the two-dimensional coordinates and the three-dimensional coordinates corresponding to the camera and the conversion relationship between the three-dimensional coordinates and the two-dimensional coordinates corresponding to the camera.
In this implementation, when calibrating the multiple cameras, markers may be placed at a certain density in the field where three-dimensional coordinate recovery and/or three-dimensional motion trajectory recovery is required; for example, N markers may be placed. The three-dimensional coordinates of any marker may be represented as $(X_w, Y_w, Z_w)$, and for any one of the multiple cameras, the two-dimensional coordinates (e.g., pixel coordinates) of the marker in the second coordinate system (e.g., the pixel coordinate system) of the camera may be represented as $(u, v)$. The distortion coefficients can be obtained by a checkerboard calibration method, and the projection relationship is shown in Equation 1:

$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} R_{3\times3} & T_{3\times1} \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = K_{3\times3} \begin{bmatrix} R_{3\times3} & T_{3\times1} \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} \tag{1}$$

wherein $f_x$, $f_y$, $c_x$ and $c_y$ form the internal parameter matrix of the camera, $c_x$ and $c_y$ represent the coordinates of the principal point of the image of the camera, and $f_x$ and $f_y$ respectively represent the product of the focal length and the pixel scale factor in the x and y directions; $K_{3\times3}$ denotes the internal parameters of the camera, $R_{3\times3}$ and $T_{3\times1}$ denote the external parameters of the camera, and $Z_c$ is the depth of the point in the camera coordinate system.

Simplifying Equation 1 gives Equation 2:

$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} t_1 & t_2 & t_3 & t_4 \\ t_5 & t_6 & t_7 & t_8 \\ t_9 & t_{10} & t_{11} & t_{12} \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} \tag{2}$$

Let $P = [X_w, Y_w, Z_w, 1]^T$, $\mathbf{t}_1 = [t_1, t_2, t_3, t_4]^T$, $\mathbf{t}_2 = [t_5, t_6, t_7, t_8]^T$, and $\mathbf{t}_3 = [t_9, t_{10}, t_{11}, t_{12}]^T$. Then Equation 2 may be converted to Equation 3:

$$\begin{cases} \mathbf{t}_1^T P - u\, \mathbf{t}_3^T P = 0 \\ \mathbf{t}_2^T P - v\, \mathbf{t}_3^T P = 0 \end{cases} \tag{3}$$

According to the three-dimensional coordinates and the two-dimensional coordinates (e.g., pixel coordinates) obtained through calibration, the internal and external parameters of the camera can be obtained through matrix singular value decomposition. With this implementation, internal and external camera parameters of higher precision can be obtained.
In a possible implementation manner, after the three-dimensional coordinates of the target object at the current time are determined, real-time special effects can be added to meet entertainment and other requirements.
The three-dimensional coordinate recovery method provided by the embodiments of the present disclosure requires neither lidar equipment nor an RGB-D camera, and does not require camera motion, so it can lower the hardware requirements and reduce the cost of three-dimensional coordinate recovery and/or three-dimensional motion trajectory recovery.
It is understood that the above method embodiments of the present disclosure can be combined with each other to form combined embodiments without departing from their principles and logic; details are omitted here for brevity.
It will be understood by those skilled in the art that, in the above methods, the order in which the steps are written does not imply a strict order of execution and imposes no limitation on the implementation; the order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure also provides a three-dimensional coordinate recovery apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any of the three-dimensional coordinate recovery methods provided in the present disclosure; for the corresponding technical solutions and descriptions, reference may be made to the corresponding descriptions in the method section, and details are omitted for brevity.
Fig. 3 shows a block diagram of a three-dimensional coordinate restoration apparatus provided by an embodiment of the present disclosure. As shown in fig. 3, the three-dimensional coordinate restoration apparatus includes:
the acquiring module 31 is configured to acquire a three-dimensional coordinate of the target object at a previous time of the current time;
a first determining module 32, configured to determine, according to the three-dimensional coordinate of the target object at the previous time, two-dimensional coordinates of the target object at the previous time, where the two-dimensional coordinates correspond to the multiple cameras respectively;
a second determining module 33, configured to determine, for any one of the multiple cameras, a candidate detection frame corresponding to the target object at the current time according to the distance between the two-dimensional coordinates of the target object corresponding to the camera at the previous time and the detection frames in the video frame acquired by the camera at the current time;
a third determining module 34, configured to determine, according to the candidate detection frame corresponding to the target object at the current time, a three-dimensional coordinate of the target object at the current time.
In one possible implementation, the first determining module 32 is configured to:
and for any camera in the plurality of cameras, determining the two-dimensional coordinates of the target object corresponding to the camera at the previous moment according to the three-dimensional coordinates of the target object at the previous moment and the conversion relation between the three-dimensional coordinates and the two-dimensional coordinates corresponding to the camera.
In one possible implementation, the second determining module 33 is configured to:

for any one of the multiple cameras, determine the detection frame, in the video frame acquired by the camera at the current time, that is closest to the two-dimensional coordinates of the target object corresponding to the camera at the previous time;

and in response to the distance between the closest detection frame and the two-dimensional coordinates of the target object corresponding to the camera at the previous time being less than or equal to a first distance threshold, determine the closest detection frame as a candidate detection frame corresponding to the target object at the current time.
In one possible implementation, the apparatus further includes:
a fourth determining module, configured to determine the first distance threshold according to the motion speed of the target object, where the first distance threshold is positively correlated to the motion speed of the target object.
In a possible implementation manner, the third determining module 34 is configured to:
determining candidate three-dimensional coordinates of the target object at the current moment according to the candidate detection frame corresponding to the target object at the current moment;
and determining the three-dimensional coordinate of the target object at the current moment according to the candidate three-dimensional coordinate of the target object at the current moment.
In a possible implementation manner, the third determining module 34 is configured to:
for any one camera in any group of cameras among the multiple cameras, determining candidate two-dimensional coordinates of the target object corresponding to the camera at the current time according to the candidate detection frame corresponding to the target object at the current time determined from the video frame acquired by the camera at the current time, wherein any group of cameras comprises at least two cameras;
and determining the candidate three-dimensional coordinates of the target object corresponding to the group of cameras at the current moment according to the candidate two-dimensional coordinates of the target object corresponding to the cameras in the group of cameras at the current moment and the conversion relationship between the two-dimensional coordinates and the three-dimensional coordinates corresponding to the cameras in the group of cameras.
In a possible implementation manner, the third determining module 34 is configured to:
and determining the three-dimensional coordinates of the target object at the current moment according to the candidate three-dimensional coordinates, of which the distance between the candidate three-dimensional coordinates of the target object at the current moment and the three-dimensional coordinates of the target object at the previous moment is smaller than a second distance threshold value, in the candidate three-dimensional coordinates of the target object at the current moment.
In one possible implementation, the apparatus further includes:
a fifth determining module, configured to determine the second distance threshold according to the motion speed of the target object, where the second distance threshold is positively correlated to the motion speed of the target object.
In one possible implementation, the apparatus further includes:
a sixth determining module, configured to determine, for any one of the multiple cameras, a candidate detection frame corresponding to a target object at a first time according to a detection frame with a highest confidence level in a video frame acquired by the camera at the first time, where the first time indicates a start time of three-dimensional coordinate recovery;
a seventh determining module, configured to determine, according to the candidate detection frame corresponding to the target object at the first time, a three-dimensional coordinate of the target object at the first time.
In the embodiment of the present disclosure, the apparatus acquires the three-dimensional coordinates of the target object at the previous time of the current time, determines from them the two-dimensional coordinates of the target object corresponding to each of the multiple cameras at the previous time, determines, for any one of the multiple cameras, a candidate detection frame corresponding to the target object at the current time according to the distance between the two-dimensional coordinates of the target object corresponding to the camera at the previous time and the detection frames in the video frame acquired by the camera at the current time, and determines the three-dimensional coordinates of the target object at the current time according to the candidate detection frames. The three-dimensional coordinates at the current time are thus obtained by combining the three-dimensional coordinates at the previous time with the multiple viewing angles of the multiple cameras, so that an accurate three-dimensional motion trajectory can be recovered.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and for specific implementation, reference may be made to the description of the above method embodiments, and for brevity, details are not described here again.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-described method. The computer readable storage medium may be a non-volatile computer readable storage medium, or may be a volatile computer readable storage medium.
Embodiments of the present disclosure also provide a computer program product, which includes computer readable codes, and when the computer readable codes are run on a device, a processor in the device executes instructions for implementing the method for recovering three-dimensional coordinates provided in any one of the above embodiments.
The embodiments of the present disclosure also provide another computer program product for storing computer readable instructions, which when executed cause a computer to perform the operations of the method for restoring three-dimensional coordinates provided in any of the above embodiments.
An embodiment of the present disclosure further provides an electronic device, including: one or more processors; a memory for storing executable instructions; wherein the one or more processors are configured to invoke the memory-stored executable instructions to perform the above-described method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 4 shows a block diagram of an electronic device 800 provided by an embodiment of the disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like terminal.
Referring to fig. 4, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as Wi-Fi, 2G, 3G, 4G/LTE, 5G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 5 shows a block diagram of an electronic device 1900 provided by an embodiment of the disclosure. For example, electronic device 1900 may be provided as a server. Referring to fig. 5, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows, Mac OS, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as a punch card or an in-groove protruding structure with instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction set architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized with state information of the computer-readable program instructions, and this electronic circuitry may execute the computer-readable program instructions to implement various aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK) or the like.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (11)

1. A method for recovering three-dimensional coordinates, comprising:
acquiring a three-dimensional coordinate of a target object at a previous moment of a current moment;
determining two-dimensional coordinates of the target object corresponding to the multiple cameras at the previous moment according to the three-dimensional coordinates of the target object at the previous moment;
for any camera in the multiple cameras, determining a detection frame which is closest to the two-dimensional coordinates of the target object corresponding to the camera at the previous moment in a video frame acquired by the camera at the current moment, and determining the detection frame with the closest distance as a candidate detection frame corresponding to the target object at the current moment in response to the distance between the detection frame with the closest distance and the two-dimensional coordinates of the target object corresponding to the camera at the previous moment being less than or equal to a first distance threshold;
and determining the three-dimensional coordinates of the target object at the current moment according to the candidate detection frame corresponding to the target object at the current moment.
2. The method according to claim 1, wherein the determining two-dimensional coordinates of the target object corresponding to a plurality of cameras respectively at the previous time according to the three-dimensional coordinates of the target object at the previous time comprises:
and for any camera in the plurality of cameras, determining the two-dimensional coordinate of the target object corresponding to the camera at the previous moment according to the three-dimensional coordinate of the target object at the previous moment and the conversion relation between the three-dimensional coordinate and the two-dimensional coordinate corresponding to the camera.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
determining the first distance threshold according to the motion speed of the target object, wherein the first distance threshold is positively correlated with the motion speed of the target object.
4. The method according to any one of claims 1 to 3, wherein the determining the three-dimensional coordinates of the target object at the current time according to the candidate detection box corresponding to the target object at the current time comprises:
determining candidate three-dimensional coordinates of the target object at the current moment according to the candidate detection frame corresponding to the target object at the current moment;
and determining the three-dimensional coordinates of the target object at the current moment according to the candidate three-dimensional coordinates of the target object at the current moment.
5. The method according to claim 4, wherein the determining the candidate three-dimensional coordinates of the target object at the current time according to the candidate detection frame corresponding to the target object at the current time comprises:
for any one camera in any group of cameras in the plurality of cameras, determining a candidate two-dimensional coordinate of the target object corresponding to the camera at the current moment according to a candidate detection frame, corresponding to the target object at the current moment, determined by a video frame acquired by the camera at the current moment, wherein the any group of cameras comprises at least two cameras;
and determining the candidate three-dimensional coordinates of the target object corresponding to the group of cameras at the current moment according to the candidate two-dimensional coordinates of the target object corresponding to the cameras in the group of cameras at the current moment and the conversion relationship between the two-dimensional coordinates and the three-dimensional coordinates corresponding to the cameras in the group of cameras.
6. The method according to claim 4 or 5, wherein the determining the three-dimensional coordinates of the target object at the current time according to the candidate three-dimensional coordinates of the target object at the current time comprises:
and determining the three-dimensional coordinates of the target object at the current moment according to the candidate three-dimensional coordinates of the target object at the current moment, wherein the distance between the candidate three-dimensional coordinates and the three-dimensional coordinates of the target object at the previous moment is smaller than a second distance threshold value.
7. The method of claim 6, further comprising:
determining the second distance threshold according to the motion speed of the target object, wherein the second distance threshold is positively correlated to the motion speed of the target object.
8. The method according to any one of claims 1 to 7, wherein before the acquiring the three-dimensional coordinates of the target object at a time immediately preceding the current time, the method further comprises:
for any camera in the multiple cameras, determining a candidate detection frame corresponding to a target object at a first moment according to a detection frame with the highest confidence level in a video frame acquired by the camera at the first moment, wherein the first moment represents the starting moment of three-dimensional coordinate recovery;
and determining the three-dimensional coordinates of the target object at the first moment according to the candidate detection frame corresponding to the target object at the first moment.
9. An apparatus for restoring three-dimensional coordinates, comprising:
the acquisition module is used for acquiring the three-dimensional coordinates of the target object at the previous moment of the current moment;
the first determining module is used for determining two-dimensional coordinates of the target object at the previous moment and corresponding to the multiple cameras respectively according to the three-dimensional coordinates of the target object at the previous moment;
a second determining module, configured to determine, for any one of the multiple cameras, a detection frame that is closest to a two-dimensional coordinate of the target object at the previous time and corresponds to the camera in a video frame acquired by the camera at the current time, and determine, in response to a distance between the detection frame that is closest to the two-dimensional coordinate of the target object at the previous time and corresponds to the camera being less than or equal to a first distance threshold, the detection frame that is closest to the distance as a candidate detection frame of the target object at the current time;
and the third determining module is used for determining the three-dimensional coordinates of the target object at the current moment according to the candidate detection frame corresponding to the target object at the current moment.
10. An electronic device, comprising:
one or more processors;
a memory for storing executable instructions;
wherein the one or more processors are configured to invoke the memory-stored executable instructions to perform the method of any one of claims 1 to 8.
11. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 8.
CN202011255018.8A 2020-11-11 2020-11-11 Three-dimensional coordinate recovery method and device, electronic equipment and storage medium Active CN112330721B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011255018.8A CN112330721B (en) 2020-11-11 2020-11-11 Three-dimensional coordinate recovery method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112330721A CN112330721A (en) 2021-02-05
CN112330721B true CN112330721B (en) 2023-02-17

Family

ID=74317442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011255018.8A Active CN112330721B (en) 2020-11-11 2020-11-11 Three-dimensional coordinate recovery method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112330721B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113129339B * 2021-04-28 2023-03-10 Beijing Sensetime Technology Development Co., Ltd. Target tracking method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109188457B * 2018-09-07 2021-06-11 Baidu Online Network Technology (Beijing) Co., Ltd. Object detection frame generation method, device, equipment, storage medium and vehicle

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015024361A1 * 2013-08-20 2015-02-26 Huawei Technologies Co., Ltd. Three-dimensional reconstruction method and device, and mobile terminal
WO2017215295A1 * 2016-06-14 2017-12-21 Huawei Technologies Co., Ltd. Camera parameter adjusting method, robotic camera, and system
CN108062763A * 2017-12-29 2018-05-22 Ninebot (Beijing) Technology Co., Ltd. Target tracking method and device, and storage medium
CN110544273A * 2018-05-29 2019-12-06 Hangzhou Hikrobot Technology Co., Ltd. Motion capture method, device and system
WO2020103427A1 * 2018-11-23 2020-05-28 Huawei Technologies Co., Ltd. Object detection method, related device and computer storage medium
CN110706259A * 2019-10-12 2020-01-17 Sichuan Aerospace Shenkun Technology Co., Ltd. Space-constraint-based cross-shot tracking method and device for suspicious persons

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Three-dimensional tracking algorithm for multiple marker points in a video motion capture system; Yan Jianyun et al.; Computer Engineering (计算机工程); 2008-12-20, No. 24; full text *

Similar Documents

Publication Publication Date Title
CN112001321B (en) Network training method, pedestrian re-identification method, device, electronic equipment and storage medium
CN109658352B (en) Image information optimization method and device, electronic equipment and storage medium
CN111983635B (en) Pose determination method and device, electronic equipment and storage medium
CN109889724B (en) Image blurring method and device, electronic equipment and readable storage medium
US20210097715A1 (en) Image generation method and device, electronic device and storage medium
CN109829863B (en) Image processing method and device, electronic equipment and storage medium
TWI767596B (en) Scene depth and camera motion prediction method, electronic equipment and computer readable storage medium
CN109584362B (en) Three-dimensional model construction method and device, electronic equipment and storage medium
CN109948494B (en) Image processing method and device, electronic equipment and storage medium
CN111553864A (en) Image restoration method and device, electronic equipment and storage medium
CN111104920A (en) Video processing method and device, electronic equipment and storage medium
CN112991381B (en) Image processing method and device, electronic equipment and storage medium
CN112991553A (en) Information display method and device, electronic equipment and storage medium
US20220383517A1 (en) Method and device for target tracking, and storage medium
CN109671051B (en) Image quality detection model training method and device, electronic equipment and storage medium
CN108171222B (en) Real-time video classification method and device based on multi-stream neural network
CN113506229B (en) Neural network training and image generating method and device
CN112330717B (en) Target tracking method and device, electronic equipment and storage medium
CN114445753A (en) Face tracking recognition method and device, electronic equipment and storage medium
CN112330721B (en) Three-dimensional coordinate recovery method and device, electronic equipment and storage medium
CN111339964A (en) Image processing method and device, electronic equipment and storage medium
CN111062407A (en) Image processing method and device, electronic equipment and storage medium
CN113506324B (en) Image processing method and device, electronic equipment and storage medium
CN113506320B (en) Image processing method and device, electronic equipment and storage medium
CN111832338A (en) Object detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant