CN114679576A - Method and apparatus for processing video data - Google Patents

Method and apparatus for processing video data

Info

Publication number
CN114679576A
CN114679576A
Authority
CN
China
Prior art keywords
imu
camera
coordinate system
data
video stream
Prior art date
Legal status
Pending
Application number
CN202210302904.4A
Other languages
Chinese (zh)
Inventor
许多
谢榛
高玉涛
Current Assignee
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Cloud Computing Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Cloud Computing Ltd
Priority to CN202210302904.4A
Publication of CN114679576A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 9/00 Details of colour television systems
    • H04N 9/12 Picture reproducers
    • H04N 9/31 Projection devices for colour picture display, e.g. using electronic spatial light modulators [ESLM]
    • H04N 9/3179 Video signal processing therefor
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 Navigation; Navigational instruments not provided for in groups G01C 1/00 - G01C 19/00
    • G01C 21/10 Navigation; Navigational instruments not provided for in groups G01C 1/00 - G01C 19/00 by using measurements of speed or acceleration
    • G01C 21/12 Navigation; Navigational instruments not provided for in groups G01C 1/00 - G01C 19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C 21/16 Navigation; Navigational instruments not provided for in groups G01C 1/00 - G01C 19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Devices (AREA)

Abstract

A method and apparatus for processing video data. The method includes: acquiring video stream data output by a camera and measurement data output by an inertial measurement unit (IMU), wherein the IMU measurement data include IMU triaxial angle data; under the condition that the camera deflects, acquiring the pixel position of a target object according to a target frame image in the video stream data; acquiring a projection matrix H' between the pixel coordinate system and the world coordinate system of the camera under the deflection condition according to the IMU measurement data and the video stream data; and acquiring the positioning information of the target object in the world coordinate system according to the pixel position of the target object and the projection matrix H'. The method corrects the target-positioning projection matrix based on IMU three-axis angle data and applies the positioning-projection correction only to the pixel coordinates of the target object of interest rather than to the whole image, thereby saving computing resources while achieving high-precision positioning.

Description

Method and apparatus for processing video data
Technical Field
The present application relates to the field of information technology, and in particular, to a method and apparatus for processing video data.
Background
In intelligent public transportation scenes, such as expressways, urban roads, and maritime ports, visual intelligence algorithms are often deployed on observation cameras to acquire traffic elements and traffic events. A target positioning algorithm identifies a target in the observed picture through computer vision techniques and constructs a spatial mapping between the camera and the world coordinate system so as to determine the position of the target. However, due to non-ideal environmental factors such as resonance from passing vehicles, ground settlement, and strong wind, the pictures of observation devices erected in public transportation scenes often suffer from problems such as jitter and deflection. When the picture shakes, the imaging quality is poor and the target object is difficult to detect. When the picture deflects, the previously established spatial mapping also changes, and if the original projection relationship is still used, the target positioning will have a large deviation. Therefore, detecting camera shake and deflection and correcting the target positioning are of great significance.
However, most current image processing schemes detect camera shake and deflection directly from image information using computer vision methods. Such schemes consume large amounts of computing resources, run at low frame rates, depend heavily on image quality, and cannot be applied to real traffic scenes with obvious lighting changes and many moving objects.
Therefore, there is a need for a solution for processing and analyzing target locations in video data to save computational resources.
Disclosure of Invention
The application provides a method and a device for processing video data, which are used for solving the problem of computing resource waste in video data analysis.
In a first aspect, a method for processing video data is provided, including: acquiring video stream data output by a camera and IMU measurement data output by an inertial measurement unit (IMU), wherein the IMU and the camera are at the same or similar positions, and the IMU measurement data include IMU three-axis angle data; under the condition that the camera deflects, acquiring the pixel position of a target object according to a target frame image in the video stream data, wherein the target frame image is any frame image in the video stream data; acquiring a projection matrix H' between the pixel coordinate system and the world coordinate system of the camera under the deflection condition according to the IMU measurement data and the video stream data; and acquiring the positioning information of the target object in the world coordinate system according to the pixel position of the target object and the projection matrix H'.
In a second aspect, the present application provides an apparatus for processing video data, including: a communication module, configured to acquire video stream data output by a camera and IMU measurement data output by an inertial measurement unit (IMU), wherein the IMU and the camera are at the same or similar positions, and the IMU measurement data include IMU triaxial angle data; and a processing module, configured to acquire the pixel position of a target object according to a target frame image in the video stream data under the condition that the camera deflects, wherein the target frame image is any frame image in the video stream data; the processing module is further configured to acquire a projection matrix H' between the pixel coordinate system and the world coordinate system of the camera under the deflection condition according to the IMU measurement data and the video stream data; and the processing module is further configured to acquire the positioning information of the target object in the world coordinate system according to the pixel position of the target object and the projection matrix H'.
In a third aspect, a computer device is provided, comprising a processor for invoking a computer program from a memory, the processor being adapted to perform the method of the first aspect when the computer program is executed.
In a fourth aspect, there is provided a computer-readable storage medium for storing a computer program comprising code for performing the method of the first aspect described above.
In a fifth aspect, there is provided a computer program product comprising a computer program comprising code for performing the method of the first aspect.
The embodiments of the present application provide a method and an apparatus for processing video data. The target-positioning projection matrix is corrected based on IMU three-axis angle data, and the positioning-projection correction is applied only to the pixel coordinates of the target object of interest rather than to the whole image, so that high-precision target positioning correction of a real-time video stream can be achieved with very low computing-resource consumption, saving computing resources while achieving high-precision positioning.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic diagram of an application scenario according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an application scenario according to another embodiment of the present application;
FIG. 3 is a schematic diagram of the internal structure of the computing device 100 according to an embodiment of the present application;
FIG. 4 is a flow chart illustrating a method for processing video data according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an apparatus 500 according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an apparatus 600 according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims.
For ease of understanding, the terms referred to in this application are first explained.
World coordinate system: the absolute coordinate system of the system. The world coordinate system may be a geographic coordinate system, such as the World Geodetic System 1984 (WGS84), or any fixedly defined world coordinate system.
IMU coordinate system: the IMU coordinate system takes the center of the IMU as an origin, the X axis points to the left and right directions of the IMU, the Y axis points to the front and back directions of the IMU, and the Z axis points to the up and down directions of the IMU.
Camera coordinate system: the camera coordinate system is a relative coordinate system in which the optical center of the camera is used as an origin, the X axis points to the left and right directions of the camera, the Y axis points to the up and down directions of the camera, and the Z axis points to the direction observed by the camera, and changes with the movement of the camera.
Pixel coordinate system: the image pixel coordinate system is a plane rectangular coordinate system which is fixed on the image and takes pixels as units, the origin of the rectangular coordinate system is positioned at the upper left corner of the image, the X axis and the Y axis are parallel to the X axis and the Y axis of the camera coordinate system, and the rectangular coordinate system is a relative coordinate system depending on the camera coordinate system.
Inertial Measurement Unit (IMU): a sensor for measuring acceleration and rotational movement. The IMU may typically measure acceleration and angular velocity along three axes (X, Y, Z in the IMU coordinate system).
Homography matrix: in computer vision, if two-dimensional planes have a projection mapping relationship, a point on one of the two-dimensional planes can be projected to the other two-dimensional plane through a homography matrix.
Target positioning: for a given image, a computer vision algorithm identifies an object target in the image and obtains its position information; a mapping relation between the camera and the world coordinate system is established, and the pixel coordinates are projected to a position (such as longitude and latitude) in the world coordinate system.
Hand-eye calibration: a calibration method for coordinate system conversion in the robotics field, which can also be widely used for calibrating the transformation between coordinate systems in general. The "hand" is typically a robotic arm, and the "eye" is often a camera mounted on the robotic arm or a camera fixed in the environment. In practical applications, the pose of an object in the external environment observed by the camera usually needs to be converted from the camera coordinate system into the coordinate system of the robotic arm. To obtain the transformation matrix between the two coordinate systems, hand-eye calibration needs to be performed.
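For illustration only, hand-eye calibration of this kind can be performed with OpenCV's calibrateHandEye routine. The following Python sketch is not part of the original disclosure: it generates synthetic poses from a known ground-truth extrinsic and recovers that extrinsic, and all variable names and values are assumptions.

    import cv2
    import numpy as np

    rng = np.random.default_rng(0)

    def rand_pose():
        R, _ = cv2.Rodrigues(rng.normal(size=(3, 1)) * 0.5)  # random rotation
        t = rng.normal(size=(3, 1))                          # random translation
        return R, t

    def to_T(R, t):
        T = np.eye(4)
        T[:3, :3], T[:3, 3:] = R, t
        return T

    # Ground-truth camera-to-gripper extrinsic (the "eye" relative to the "hand").
    R_true, t_true = rand_pose()
    T_cam2grip = to_T(R_true, t_true)
    # Fixed calibration-target pose in the robot base frame.
    T_target2base = to_T(*rand_pose())

    R_grip2base, t_grip2base, R_target2cam, t_target2cam = [], [], [], []
    for _ in range(10):
        R_g, t_g = rand_pose()                      # gripper pose in the base frame
        T_grip2base = to_T(R_g, t_g)
        # Pose of the target as seen by the camera for this gripper pose.
        T_t2c = np.linalg.inv(T_grip2base @ T_cam2grip) @ T_target2base
        R_grip2base.append(R_g)
        t_grip2base.append(t_g)
        R_target2cam.append(T_t2c[:3, :3])
        t_target2cam.append(T_t2c[:3, 3:])

    R_est, t_est = cv2.calibrateHandEye(R_grip2base, t_grip2base,
                                        R_target2cam, t_target2cam)
    print(np.allclose(R_est, R_true, atol=1e-6), np.allclose(t_est, t_true, atol=1e-6))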
Fig. 1 is a schematic diagram of an application scenario according to an embodiment of the present application. As shown in fig. 1, in a public transportation video system, an observation camera, an exchange system, and a supervision center are generally included. The observation cameras are widely arranged at various traffic intersections, ports and other positions to acquire images or videos of vehicles or ships in public traffic scenes, and the acquired images or videos are transmitted to the supervision center through the switch system. A display device, a storage device, and a computing device are typically included in a regulatory center. The storage device may be configured to store the acquired image or video information, the display device may be configured to display the observation image, and the computing device may parse and process the acquired image or video. In some examples, the computing device may include a server or a cloud server.
It should be understood that the description of the application scenario in fig. 1 is only by way of example and not limitation, and in practice, appropriate modifications and additions may be made to the above scenario and still apply to the solution of the embodiments of the present application.
In order to solve the above technical problems in the prior art, the embodiments of the present application provide a scheme for processing video data. The scheme uses the three-axis acceleration data, three-axis angular velocity data, and three-axis angle data acquired by an IMU to detect camera shake and deflection, and at the same time establishes a projection matrix update model for the deflected camera based on the attitude angle, so as to perform high-precision positioning correction on the target objects identified in the camera video stream. High-precision target positioning correction of video data can thus be achieved while consuming relatively few computing resources.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 2 is a schematic structural diagram of an application scenario according to another embodiment of the present application. As shown in fig. 2, in the embodiment of the present application, the IMU 220 may be disposed at the position of the camera 210, and the positions of the IMU 220 and the camera 210 may be fixed to each other. Since the IMU 220 is in the same or similar location as the camera 210, IMU measurement data output by the IMU 220 may be considered IMU measurement data of the camera 210. Alternatively, the IMU 220 may be disposed outside the camera 210, may be disposed inside the camera 210, or may be integrated with a camera module within the camera 210, as long as the position between the camera 210 and the IMU 220 is the same or similar.
For ease of understanding, in the embodiment of the present application, the combination of the camera 210 and the IMU 220 is referred to as the camera system 200, the camera system 200 may send IMU measurement data and video stream data to the computing device 100, and the computing device 100 may parse according to the IMU measurement data and the video stream data to obtain a parsing result, such as target location information.
Alternatively, the computing device 100 may comprise a server, a processor, or any other device with data processing capabilities. In some examples, the computing device 100 described above may include a processor disposed inside the camera 210.
Fig. 3 is a schematic structural diagram of the inside of the computing device 100 according to an embodiment of the present application. As shown in fig. 3, computing device 100 includes a camera shake detection module 110, a camera deflection detection module 120, a timestamp alignment module 130, a spatial coordinate system transformation module 140, a projection matrix rectification module 150, a video recognition module 160, and a target location module 170. Input information to the computing device 100 may include IMU measurement data output by the camera system 200 as well as video stream data. The output information of computing device 100 may include camera shake alert information and target location information.
The IMU measurement data include the three-axis acceleration data, three-axis angular velocity data, and three-axis angle data acquired by the IMU. For brevity, the above data may be referred to as acceleration data, angular velocity data, and angle data, respectively. The angle data may also be referred to as attitude angle data.
The scheme mainly uses the collected triaxial acceleration data, triaxial angular velocity data, and triaxial angle data to detect two abnormal states, camera shake and camera deflection; it then converts the collected triaxial angle data into the camera angle change through spatio-temporal alignment, corrects the projection matrix required for target positioning, and performs high-precision positioning of the target object identified in the camera video stream.
The functions of the modules in the computing device 100 will be described in detail below in conjunction with fig. 3.
A1. Camera shake detection module 110
The camera shake detection module 110 may determine whether the camera shakes using the three-axis acceleration data and the three-axis angular velocity data acquired by the IMU. For example, it may be determined whether the triaxial acceleration or the triaxial angular velocity exceeds a preset shake threshold; if either does, it is determined that the camera shakes, and camera shake warning information is output. The shake threshold may be determined in practice. Optionally, different shake thresholds may be set for the triaxial acceleration and the triaxial angular velocity, respectively.
As an example, the specific process of determining whether the camera is shaken includes: and acquiring a plurality of triaxial accelerations and a plurality of triaxial angular velocities stored in a data queue in the latest time period, respectively calculating the average values of the absolute values of the triaxial accelerations and the triaxial angular velocities, and outputting camera shake alarm information if any one of the average values is greater than the corresponding shake threshold. The length of the time period and the jitter threshold value can be obtained by adjusting according to scene requirements and specific experiments. As an example, the length of the above time period may be 100 milliseconds (ms).
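As an illustration only, the shake check described above can be sketched in Python as follows; the variable names, thresholds, and window length are assumptions and not values from this disclosure.

    import numpy as np

    # Hypothetical thresholds and window length; in practice they are tuned per scene.
    ACC_SHAKE_THRESHOLD = 0.5      # m/s^2, per axis
    GYRO_SHAKE_THRESHOLD = 0.2     # rad/s, per axis
    WINDOW_MS = 100

    def camera_is_shaking(imu_queue, now_ms):
        """imu_queue: list of (timestamp_ms, acc_xyz, gyro_xyz) samples."""
        recent = [s for s in imu_queue if now_ms - s[0] <= WINDOW_MS]
        if not recent:
            return False
        acc_mean = np.abs(np.array([s[1] for s in recent])).mean(axis=0)
        gyro_mean = np.abs(np.array([s[2] for s in recent])).mean(axis=0)
        return bool(np.any(acc_mean > ACC_SHAKE_THRESHOLD) or
                    np.any(gyro_mean > GYRO_SHAKE_THRESHOLD))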
A2. Camera deflection detection module 120
The camera deflection detection module 120 may determine whether the camera has a deflection by using the three-axis angle data collected by the IMU. For example, it may be determined whether the three-axis angle exceeds a preset deflection threshold, and if the three-axis angle exceeds the preset deflection threshold, it is determined that the camera is deflected. Wherein the deflection threshold value may be determined according to practice.
As an example, the specific process of determining whether the camera has deflected includes: after the system is powered on, recording the IMU triaxial angle data (roll0, pitch0, yaw0) of the camera in the non-deflected state as reference values; then, for each subsequently acquired frame of IMU triaxial angle data, calculating the difference from the reference triaxial angle data and judging whether its absolute value exceeds the deflection threshold. If the difference for any one of the three axis angles exceeds the deflection threshold, it is determined that the camera has deflected, and the subsequent modules continue to be executed. The deflection threshold can be tuned according to scene requirements and specific experiments. Optionally, the deflection thresholds corresponding to the three axis angles may be set to be the same or different.
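A corresponding sketch of the deflection check, again with assumed names and an assumed per-axis threshold:

    import numpy as np

    DEFLECTION_THRESHOLD_DEG = 1.0   # hypothetical per-axis threshold

    # Reference attitude (roll0, pitch0, yaw0) recorded while the camera is not deflected.
    reference_angles = np.array([0.3, -12.5, 87.0])

    def camera_has_deflected(imu_angles_deg):
        """imu_angles_deg: (roll, pitch, yaw) of the current IMU frame."""
        diff = np.abs(np.asarray(imu_angles_deg) - reference_angles)
        return bool(np.any(diff > DEFLECTION_THRESHOLD_DEG))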
A3. Timestamp alignment module 130
The timestamp alignment module 130 is used to align the timestamps of the video stream output by the camera and of the measurement data output by the IMU. In order to guarantee the accuracy of the camera target-positioning projection matrix correction, the IMU data and the camera image that are used need to correspond to the same time point as far as possible; therefore, the IMU data and the camera video are required to use the same clock source to record timestamps.
As an example, considering that there is a certain delay in the camera video stream encoding and decoding process, the data queue may be used to store the IMU data in the preset time period before the current time, and after a frame of image in the video data arrives, two frames of IMU measurement data that are closest before and after the frame of image are selected according to the image timestamp as IMU measurement data aligned with the frame of image. For example, the angle of the IMU corresponding to the frame image may be estimated using an interpolation algorithm based on the euler angle, or one frame of IMU measurement data may be arbitrarily selected from the two previous and next frames of IMU measurement data, and used as IMU measurement data corresponding to the frame image.
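As one possible illustration of this alignment step, the attitude for a frame timestamp can be interpolated between the two bracketing IMU samples; the SciPy-based interpolation below is only an example, and the function and variable names are assumptions.

    # Assumes at least two IMU samples are buffered and imu_ts is sorted ascending.
    from bisect import bisect_left
    from scipy.spatial.transform import Rotation, Slerp

    def imu_angle_for_frame(frame_ts, imu_ts, imu_rpy_deg):
        """imu_ts: sorted IMU timestamps; imu_rpy_deg: matching (roll, pitch, yaw) in degrees."""
        i = bisect_left(imu_ts, frame_ts)
        i = min(max(i, 1), len(imu_ts) - 1)            # clamp to a valid bracketing pair
        t0, t1 = imu_ts[i - 1], imu_ts[i]
        rots = Rotation.from_euler('ZYX', [imu_rpy_deg[i - 1][::-1],
                                           imu_rpy_deg[i][::-1]], degrees=True)
        slerp = Slerp([t0, t1], rots)
        ts = min(max(frame_ts, t0), t1)                # clamp to the bracketing interval
        yaw, pitch, roll = slerp([ts]).as_euler('ZYX', degrees=True)[0]
        return roll, pitch, yaw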
Alternatively, in some examples, if the IMU and the camera module within the camera are integrated, timestamp alignment may be implemented within the camera system 200, and the computing device 100 need not perform the timestamp alignment operation.
A4. Spatial coordinate system transformation module 140
The spatial coordinate system transformation module 140 is configured to transform the IMU three-axis angle data from the IMU coordinate system to the camera coordinate system to obtain the real pose of the camera. Alternatively, the spatial coordinate system transformation module 140 may obtain a rotation matrix R by calculation according to the angle data measured by the IMU, where the rotation matrix R is used to convert the IMU three-axis angle data from the IMU coordinate system to the camera coordinate system, and the calculation manner is as follows.
First, assume that the currently processed image frame corresponds to time t and that the IMU angle at time t is represented as (roll, pitch, yaw). A rotation matrix Rcam corresponding to the current image frame can then be obtained according to formula (1):

Rcam = Rext^(-1) * F(roll, pitch, yaw) * Rext    (1)

where the IMU angle is generally expressed as Euler angles in ZYX rotation order, and F(x, y, z) denotes the function that converts an IMU angle into the corresponding rotation matrix RIMU. The rotation matrix Rcam can be understood as the rotation matrix of the camera coordinate system at time t relative to the camera coordinate system in the initial state.
Here Rext denotes the external reference (extrinsic) matrix from the camera coordinate system to the IMU coordinate system. In some examples, the camera system may be installed such that the axes of the IMU coordinate system are parallel to the three axes of the camera coordinate system, in which case the external reference matrix Rext can be obtained by direct calculation. Alternatively, in still other examples, the external reference matrix Rext may be obtained by the hand-eye calibration method commonly used in robotics.
Alternatively, F (x, y, z) may be obtained by formula (2):
Figure BDA0003563531490000061
Similarly, assume that the angle of the IMU in the undeflected state is represented as (roll0, pitch0, yaw0). The rotation matrix Rcam0 of the camera in the undeflected state can then be obtained according to formula (3):

Rcam0 = Rext^(-1) * F(roll0, pitch0, yaw0) * Rext    (3)
The rotation matrix Rcam0 can be understood as the rotation matrix of the camera coordinate system in the undeflected state relative to the camera coordinate system in the initial state. From the rotation matrix Rcam and the rotation matrix Rcam0, the rotation matrix R can be obtained, which conforms to formula (4):

R = Rcam0^(-1) * Rcam    (4)
the rotation matrix R can be understood as a rotation matrix of the camera coordinate system at time t relative to the camera coordinate system in an undeflected state.
A5. Projection matrix rectification module 150
The projection matrix rectification module 150 is used to acquire a projection matrix H' between the pixel coordinate system and the world coordinate system of the camera under the condition of deflection.
First, an initial homography matrix H, i.e., the homography between the pixel coordinate system of the camera without deflection and the world coordinate system, may be acquired. Objects of interest to the vision algorithm, such as vehicles and ships, usually lie in the same plane in the camera image, so the initial homography matrix H can be used to establish the projection relation between the pixel position p1 of the target object when the camera is not deflected and its position X in the world coordinate system, as shown in formula (5):

X = H * p1    (5)
As an example, the initial homography matrix H may be obtained by calibrating the camera image against the world coordinate plane. For example, the initial homography matrix H is solved with a random sample consensus (RANSAC) algorithm from the marker coordinates detected in the video image in the initial state and the corresponding marker positions in the world coordinate system.
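For illustration, such an initial homography can be solved with OpenCV's RANSAC-based findHomography; the marker coordinates below are hypothetical.

    import cv2
    import numpy as np

    # Hypothetical calibration data: pixel coordinates of markers detected in the
    # undeflected camera image and their known coordinates in the world plane.
    pixel_pts = np.array([[100, 200], [850, 230], [120, 600], [900, 640]], dtype=np.float32)
    world_pts = np.array([[0, 0], [30, 0], [0, 20], [30, 20]], dtype=np.float32)

    # Initial homography H (pixel plane -> world plane), solved with RANSAC.
    H, inlier_mask = cv2.findHomography(pixel_pts, world_pts, cv2.RANSAC, 1.0)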
When the camera deflects, the translation amplitude of the camera is far smaller than the rotation, and the camera can be approximated to pure rotation motion, and if an internal reference matrix converted from a camera coordinate system to a pixel coordinate system is K, the pixel coordinate p of the camera image before and after deflection at the moment1And p2The relationship therebetween may satisfy formula (6):
p1=KRK-1p2 (6)
wherein R is a rotation matrix R obtained according to formula (4).
Thus, after camera deflection, the projection matrix H' can be determined according to equation (7):
H'=HKRK-1 (7)
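A minimal sketch of the correction in formula (7), with a hypothetical internal reference matrix K:

    import numpy as np

    def corrected_projection(H, K, R):
        # Formula (7): H' = H * K * R * K^-1, the pixel-to-world projection after deflection.
        return H @ K @ R @ np.linalg.inv(K)

    # Hypothetical internal reference (intrinsic) matrix K of the camera.
    K = np.array([[1200.0, 0.0, 960.0],
                  [0.0, 1200.0, 540.0],
                  [0.0, 0.0, 1.0]])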
A6. Video recognition module 160
The video recognition module 160 is used for intelligent analysis of the camera video stream to identify target objects, such as vehicles and ships observed in public scenes. The input to the video recognition module 160 is each frame of image output by the camera, and its output may include an identification (ID) of each target object, its category, and its pixel position on the image. For example, for a deflected camera, the pixel position of the target object is denoted p2.
As an example, the video recognition module 160 mainly performs target detection and tracking on the various types of target objects of interest based on a deep learning model, and acquires the ID, category, and image position of each target object. Optionally, the deep learning model includes, but is not limited to: convolutional neural networks (CNNs), multi-scale networks, and the like.
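As one possible illustration only (the disclosure does not prescribe a specific model), per-frame detections can be obtained with an off-the-shelf torchvision detector; all names below are assumptions.

    import torch
    import torchvision
    from torchvision.transforms.functional import to_tensor

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

    def detect_targets(frame_rgb, score_thr=0.5):
        """frame_rgb: HxWx3 uint8 image. Returns a list of (box, label, score)."""
        with torch.no_grad():
            out = model([to_tensor(frame_rgb)])[0]
        keep = out["scores"] > score_thr
        return list(zip(out["boxes"][keep].tolist(),
                        out["labels"][keep].tolist(),
                        out["scores"][keep].tolist()))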
A7. Target positioning module 170
The target positioning module 170 is configured to determine and output the position of the target object in the world coordinate system based on the pixel coordinates output by the video recognition module 160 and the projection matrix H' output by the projection matrix rectification module 150; that is, the target positioning module 170 constructs the positional relationship between the camera and the world coordinate system. The target positioning module 170 converts the target position coordinates output by the video recognition module 160 into coordinates in the world coordinate system according to the projection matrix H' output by the projection matrix rectification module, thereby achieving high-precision positioning of the target object.
The projection relation between the pixel position p2 of the target object and its position X in the world coordinate system is shown in formula (8):

X = H' * p2    (8)
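A short sketch of formula (8), projecting a detected pixel position to world coordinates with homogeneous normalization:

    import numpy as np

    def pixel_to_world(H_prime, pixel_xy):
        # Formula (8): project the pixel position p2 of the target to world coordinates X.
        p = np.array([pixel_xy[0], pixel_xy[1], 1.0])
        X = H_prime @ p
        return X[:2] / X[2]          # homogeneous normalization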
In the embodiments of the present application, camera shake and deflection detection is carried out based on IMU angular velocity, acceleration, and angle data, so high-frequency detection of abnormal camera states can be achieved with low computing-resource consumption, and the detection is more stable and more precise than schemes based on image feature points.
The embodiments of the present application correct the target-positioning projection matrix based on IMU three-axis angle data and apply the positioning-projection correction only to the pixel coordinates of the target object of interest rather than to the whole image, so that high-precision target positioning correction of a real-time video stream can be achieved with very low computing-resource consumption, saving computing resources while achieving high-precision positioning.
In addition, in the embodiment of the application, a time synchronization and space coordinate system transformation scheme for IMU and camera measurement values is provided in the conversion process of the IMU coordinate system and the camera coordinate system, so that the accuracy of target positioning correction is ensured, and the accuracy of target positioning is improved.
Fig. 4 is a flowchart illustrating a method for processing video data according to an embodiment of the present application. The method of fig. 4 may be performed by the computing devices of fig. 1-3. As shown in fig. 4, the method includes the following.
S401, video stream data output by the camera and IMU measurement data output by the IMU are obtained, wherein the IMU and the camera are located at the same or similar positions, and the IMU measurement data comprise IMU three-axis angle data.
Wherein the IMU may be provided at a position of the camera and the position of the IMU and the camera may be fixed to each other. Since the IMU is in the same or similar position as the camera, IMU measurement data output by the IMU may be considered IMU measurement data for the camera. Alternatively, the IMU may be located outside the camera, inside the camera, or integrated with a camera module within the camera, so long as the camera and IMU are in the same or similar positions.
In some examples, the IMU measurement data further include IMU triaxial acceleration data and IMU triaxial angular velocity data, and the method further includes: judging whether the camera shakes according to the triaxial acceleration data and the triaxial angular velocity data; and outputting camera shake alarm information when the camera shakes.
For example, it may be determined whether the triaxial acceleration or the triaxial angular velocity exceeds a preset shake threshold; if either does, it is determined that the camera shakes, and camera shake warning information is output. The shake threshold may be determined in practice. Optionally, different shake thresholds may be set for the triaxial acceleration and the triaxial angular velocity, respectively.
As an example, the specific process of determining whether the camera is shaken includes: and acquiring a plurality of triaxial accelerations and a plurality of triaxial angular velocities stored in a data queue in the latest time period, respectively calculating the average values of the absolute values of the triaxial accelerations and the triaxial angular velocities, and outputting camera shake alarm information if any one of the average values is greater than the corresponding shake threshold. The length of the time period and the jitter threshold value can be obtained by adjusting according to scene requirements and specific experiments. As an example, the length of the above time period may be 100 milliseconds (ms).
For the sake of brevity, the above data may be referred to as acceleration data, angular velocity data, and angle data, respectively. The angle data may also be referred to as pose angle data.
S402, under the condition that the camera deflects, acquiring the pixel position of a target object according to a target frame image in the video stream data, wherein the target frame image is any frame image in the video stream data.
Optionally, the triaxial angle data collected by the IMU may be used to determine whether the camera has deflected. For example, it may be determined whether the three-axis angle exceeds a preset deflection threshold; if it does, it is determined that the camera has deflected. The deflection threshold may be determined in practice.
As an example, the specific process of determining whether the camera has deflected includes: after the system is powered on, recording the IMU triaxial angle data (roll0, pitch0, yaw0) of the camera in the non-deflected state as reference values; then, for each subsequently acquired frame of IMU triaxial angle data, calculating the difference from the reference triaxial angle data and judging whether its absolute value exceeds the deflection threshold. If the difference for any one of the three axis angles exceeds the deflection threshold, it is determined that the camera has deflected, and the subsequent steps continue to be executed. The deflection threshold can be tuned according to scene requirements and specific experiments. Optionally, the deflection thresholds corresponding to the three axis angles may be set to be the same or different.
In the embodiments of the present application, camera shake and deflection detection is carried out based on IMU angular velocity, acceleration, and angle data, so high-frequency detection of abnormal camera states can be achieved with low computing-resource consumption, and the detection is more stable and more precise than schemes based on image feature points.
And S403, acquiring a projection matrix H' between a pixel coordinate system and a world coordinate system of the camera under the deflection condition according to the IMU measurement data and the video stream data.
For a specific principle of obtaining the projection matrix H', reference may be made to the description of the spatial coordinate system transformation module 140 in fig. 3, and details are not repeated here for brevity.
Optionally, in S403, acquiring a projection matrix H' between a pixel coordinate system and a world coordinate system of the camera in a case of deflection according to the IMU measurement data and the video stream data, includes: acquiring a rotation matrix R according to the IMU three-axis angle data, wherein the rotation matrix R is a rotation matrix converted from an IMU coordinate system to a camera coordinate system; obtaining a projection matrix H 'according to the rotation matrix R, wherein the projection matrix H' meets the following conditions:
H' = H * K * R * K^(-1);    (7)
where H is a homography matrix between the pixel coordinate system and the world coordinate system in the case where the camera is not deflected, and K denotes an internal reference matrix converted from the camera coordinate system to the pixel coordinate system.
Optionally, in S403, obtaining a rotation matrix R according to the IMU three-axis angle data includes: obtaining a rotation matrix Rcam corresponding to the target frame image according to the IMU three-axis angle data, where Rcam satisfies the following condition:

Rcam = Rext^(-1) * F(roll, pitch, yaw) * Rext;    (1)

obtaining a rotation matrix Rcam0 of the camera in the undeflected state according to the IMU three-axis angle data, where Rcam0 satisfies the following condition:

Rcam0 = Rext^(-1) * F(roll0, pitch0, yaw0) * Rext;    (3)

and obtaining the rotation matrix R according to the rotation matrix Rcam and the rotation matrix Rcam0, where the rotation matrix R satisfies the following condition:

R = Rcam0^(-1) * Rcam;    (4)

where (roll, pitch, yaw) denotes the IMU angle corresponding to the target frame image, (roll0, pitch0, yaw0) denotes the IMU angle corresponding to the camera in the undeflected state, F(x, y, z) denotes the function that converts an IMU angle into the rotation matrix RIMU, and Rext denotes the external reference matrix from the camera coordinate system to the IMU coordinate system.
S404, acquiring the positioning information of the target object in the world coordinate system according to the pixel position of the target object and the projection matrix H'.
Optionally, in S404, obtaining the positioning information of the target object in the world coordinate system according to the pixel position of the target object and the projection matrix H', includes:
acquiring the positioning information of the target object in a world coordinate system according to the following formula:
X = H' * p2    (8)

where X represents the coordinates of the target object in the world coordinate system and p2 represents the coordinates of the target object in the pixel coordinate system.
Optionally, the method of fig. 4 further comprises: determining IMU measurement data aligned with a timestamp of the target frame image; and determining IMU three-axis angle data corresponding to the target frame image according to the aligned IMU measurement data.
In order to guarantee the accuracy of the camera target-positioning projection matrix correction, the IMU data and the camera image that are used need to correspond to the same time point as far as possible; therefore, the IMU data and the camera video are required to use the same clock source to record timestamps. This guarantees the accuracy of the target positioning correction and improves the accuracy of target positioning.
As an example, considering that there is a certain delay in the camera video stream encoding and decoding process, the data queue may be used to store IMU data in a preset time period before the current time, and when a frame of image in the video data arrives, two frames of IMU measurement data that are closest to each other before and after the frame of image arrives may be selected according to the image timestamp as IMU measurement data aligned with the frame of image. For example, the angle of the IMU corresponding to the frame image may be estimated using an interpolation algorithm based on the euler angle, or one frame of IMU measurement data may be arbitrarily selected from the two previous and next frames of IMU measurement data, and used as IMU measurement data corresponding to the frame image.
Alternatively, in practice, the coordinates of the world coordinate system may be a geographical coordinate system, for example, the coordinates in WGS84, or any fixedly defined world coordinate system.
The embodiments of the present application correct the target-positioning projection matrix based on IMU three-axis angle data and apply the positioning-projection correction only to the pixel coordinates of the target object of interest rather than to the whole image, so that high-precision target positioning correction of a real-time video stream can be achieved with very low computing-resource consumption, saving computing resources while achieving high-precision positioning.
In the embodiment of the application, in the process of converting the IMU coordinate system and the camera coordinate system, a scheme of time synchronization and space coordinate system conversion of the IMU and the camera measurement value is provided, so that the accuracy of target positioning correction is ensured, and the accuracy of target positioning is improved.
Fig. 5 is a schematic structural diagram of an apparatus 500 according to an embodiment of the present application. The apparatus 500 is configured to perform the method described above as being performed by the computing device 100.
The apparatus 500 includes a communication module 510 and a processing module 520. The apparatus 500 is used to implement the operations performed by the computing device 100 in the various method embodiments above.
For example, the communication module 510 is configured to obtain video stream data output by a camera and IMU measurement data output by an IMU, where the IMU and the camera are in the same or similar position, and the IMU measurement data includes IMU triaxial angle data; the processing module 520 is configured to, when the camera deflects, obtain a pixel position of a target object according to a target frame image in the video stream data, where the target frame image is any frame image in the video stream data; the processing module 520 is further configured to obtain a projection matrix H' between the pixel coordinate system and the world coordinate system of the camera under the deflection condition according to the IMU measurement data and the video stream data; the processing module 520 is further configured to obtain the positioning information of the target object in the world coordinate system according to the pixel position of the target object and the projection matrix H'.
Fig. 6 is a schematic structural diagram of an apparatus 600 according to an embodiment of the present application. The apparatus 600 is configured to perform the method described above as being performed by the computing device 100.
The apparatus 600 includes a processor 610, the processor 610 is configured to execute the computer program or instructions stored in the memory 620 or read data stored in the memory 620 to perform the methods in the above method embodiments. Optionally, the processor 610 is one or more.
Optionally, as shown in fig. 6, the apparatus 600 further comprises a memory 620, the memory 620 being used for storing computer programs or instructions and/or data. The memory 620 may be integrated with the processor 610 or may be provided separately. Optionally, the memory 620 is one or more.
Optionally, as shown in fig. 6, the apparatus 600 further comprises a communication interface 630, and the communication interface 630 is used for receiving and/or transmitting signals. For example, processor 610 may be used to control the receipt and/or transmission of signals by communications interface 630.
Optionally, the apparatus 600 is configured to implement the operations performed by the computing device 100 in the above method embodiments.
For example, processor 610 is operative to execute computer programs or instructions stored by memory 620 to perform operations associated with the modules in computing device 100 of the various method embodiments described above.
It should be noted that the apparatus 600 in fig. 6 may be the computing device 100 in the foregoing embodiment, or may be a component (e.g., a chip) of the computing device 100, and is not limited herein.
In the embodiment of the present application, the processor is a circuit having a signal processing capability, and in one implementation, the processor may be a circuit having an instruction reading and executing capability, such as a CPU, a microprocessor, a GPU (which may be understood as a kind of microprocessor), or a DSP; in another implementation, the processor may implement certain functions through the logic relationship of hardware circuits, which are fixed or reconfigurable, for example, the processor is a hardware circuit implemented by an ASIC or PLD, such as an FPGA. In the reconfigurable hardware circuit, the process of loading the configuration document by the processor to implement the configuration of the hardware circuit may be understood as a process of loading instructions by the processor to implement the functions of some or all of the above units. Furthermore, it may also be a hardware circuit designed for artificial intelligence, which may be understood as an ASIC, such as an NPU, TPU, DPU, etc.
It is seen that the units in the above apparatus may be one or more processors (or processing circuits) configured to implement the above method, for example: CPU, GPU, NPU, TPU, DPU, microprocessor, DSP, ASIC, FPGA, or a combination of at least two of these processor forms.
In addition, all or part of the units in the above apparatus may be integrated together, or may be implemented independently. In one implementation, these units are integrated together, implemented in the form of a system-on-a-chip (SOC). The SOC may include at least one processor for implementing any one of the above methods or implementing functions of each unit of the apparatus, and the at least one processor may be of different types, for example, including a CPU and an FPGA, a CPU and an artificial intelligence processor, a CPU and a GPU, and so on.
Accordingly, embodiments of the present application also provide a computer readable storage medium storing a computer program, which when executed by a processor causes the processor to implement the steps in the method performed by the computing device 100 in fig. 2 to 4.
Accordingly, embodiments of the present application also provide a computer program product, which includes computer programs/instructions, when executed by a processor, cause the processor to implement the steps in the methods performed by the computing device 100 in fig. 2 to 4.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A method for processing video data, comprising:
acquiring video stream data output by a camera and IMU measurement data output by an inertial measurement unit (IMU), wherein the IMU and the camera are at the same or similar positions, and the IMU measurement data comprise IMU three-axis angle data;
under the condition that a camera deflects, acquiring the pixel position of a target object according to a target frame image in the video stream data, wherein the target frame image is any frame image in the video stream data;
acquiring a projection matrix H' between a pixel coordinate system and a world coordinate system of the camera under the deflection condition according to the IMU measurement data and the video stream data;
and acquiring the positioning information of the target object in a world coordinate system according to the pixel position of the target object and the projection matrix H'.
2. The method of claim 1, wherein said obtaining a projection matrix H' between a pixel coordinate system and a world coordinate system of the camera under yaw from the IMU measurement data and video stream data comprises:
acquiring a rotation matrix R according to the IMU three-axis angle data, wherein the rotation matrix R is a rotation matrix converted from an IMU coordinate system to a camera coordinate system;
obtaining the projection matrix H 'according to the rotation matrix R, wherein the projection matrix H' meets the following conditions:
H' = H * K * R * K^(-1)
where H is a homography between the pixel coordinate system and the world coordinate system of the camera without deflection, and K represents an internal reference matrix converted from the camera coordinate system to the pixel coordinate system.
3. The method of claim 1 or 2, wherein the method further comprises:
determining IMU measurement data aligned with a timestamp of the target frame image;
and determining IMU three-axis angle data corresponding to the target frame image according to the aligned IMU measurement data.
4. The method of claim 1 or 2, wherein the IMU measurement data further comprise IMU triaxial acceleration data and IMU triaxial angular velocity data, and the method further comprises:
judging whether the camera shakes or not according to the triaxial acceleration data and the triaxial angular velocity data;
and outputting camera shake alarm information under the condition that the camera shakes.
5. An apparatus for processing video data, comprising:
the communication module is used for acquiring video stream data output by a camera and IMU measurement data output by an IMU, wherein the IMU and the camera are at the same or similar positions, and the IMU measurement data comprises IMU three-axis angle data;
the processing module is used for acquiring the pixel position of a target object according to a target frame image in the video stream data under the condition that a camera deflects, wherein the target frame image is any frame image in the video stream data;
the processing module is further used for acquiring a projection matrix H' between a pixel coordinate system and a world coordinate system of the camera under the deflection condition according to the IMU measurement data and the video stream data;
the processing module is further configured to obtain positioning information of the target object in a world coordinate system according to the pixel position of the target object and the projection matrix H'.
6. The apparatus of claim 5, wherein in said obtaining a projection matrix H' between a pixel coordinate system and a world coordinate system of the camera under yaw from the IMU measurement data and video stream data, the processing module is specifically configured to: acquire a rotation matrix R according to the IMU three-axis angle data, wherein the rotation matrix R is a rotation matrix converted from an IMU coordinate system to a camera coordinate system; and obtain the projection matrix H' according to the rotation matrix R, wherein the projection matrix H' meets the following condition: H' = H * K * R * K^(-1)
Where H is a homography between the pixel coordinate system and the world coordinate system of the camera without deflection, and K represents an internal reference matrix converted from the camera coordinate system to the pixel coordinate system.
7. The apparatus of claim 5 or 6, wherein the processing module is further to: determining IMU measurement data aligned with a timestamp of the target frame image; and determining IMU three-axis angle data corresponding to the target frame image according to the aligned IMU measurement data.
8. The apparatus of claim 5 or 6, wherein the IMU measurement data further comprise IMU triaxial acceleration data and IMU triaxial angular velocity data, and the processing module is further configured to: judge whether the camera shakes according to the triaxial acceleration data and the triaxial angular velocity data; and output camera shake alarm information when the camera shakes.
9. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement the method of any of claims 1 to 4.
10. A computer-readable storage medium having computer-executable instructions stored thereon, which when executed by a processor, are configured to implement the method of any one of claims 1 to 4.
CN202210302904.4A 2022-03-24 2022-03-24 Method and apparatus for processing video data Pending CN114679576A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210302904.4A CN114679576A (en) 2022-03-24 2022-03-24 Method and apparatus for processing video data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210302904.4A CN114679576A (en) 2022-03-24 2022-03-24 Method and apparatus for processing video data

Publications (1)

Publication Number Publication Date
CN114679576A true CN114679576A (en) 2022-06-28

Family

ID=82075469

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210302904.4A Pending CN114679576A (en) 2022-03-24 2022-03-24 Method and apparatus for processing video data

Country Status (1)

Country Link
CN (1) CN114679576A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117425000A (en) * 2023-10-31 2024-01-19 清研灵智信息咨询(北京)有限公司 Immersive video inspection monitoring system based on panoramic shooting
CN117425000B (en) * 2023-10-31 2024-04-26 清研灵智信息咨询(北京)有限公司 Immersive video inspection monitoring system based on panoramic shooting


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination