CN112668563B - Method and related device for realizing movement track calculation based on video image - Google Patents

Method and related device for realizing movement track calculation based on video image

Info

Publication number
CN112668563B
Authority
CN
China
Prior art keywords
image
pixel
row
translation vector
rotation
Prior art date
Legal status
Active
Application number
CN202110282936.8A
Other languages
Chinese (zh)
Other versions
CN112668563A (en)
Inventor
贾双成
单国航
朱磊
李成军
Current Assignee
Zhidao Network Technology Beijing Co Ltd
Original Assignee
Zhidao Network Technology Beijing Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhidao Network Technology Beijing Co Ltd filed Critical Zhidao Network Technology Beijing Co Ltd
Priority to CN202110282936.8A
Publication of CN112668563A
Application granted
Publication of CN112668563B

Landscapes

  • Image Analysis (AREA)

Abstract

The application relates to a method and a related device for realizing movement track calculation based on video images. The method comprises the following steps: acquiring a first image and a second image collected by a vehicle during driving; acquiring the pixel value of each row of pixel points in the first image and the second image; dividing each row of pixel points in the first image and the second image into two pixel subsets; respectively inputting the pixel values of two pixel subsets with the same row number in the first image and the second image into a preset data model for prediction to obtain a rotation matrix and a translation vector between the first image and the second image; and obtaining the movement track of the vehicle from shooting the first image to shooting the second image according to the pose information when shooting the first image and the rotation matrix and translation vector between the first image and the second image. The scheme provided by this application can improve the accuracy of the vehicle's movement track.

Description

Method and related device for realizing movement track calculation based on video image
Technical Field
The application relates to the technical field of navigation, in particular to a method and a related device for realizing movement track calculation based on video images.
Background
Currently, vehicle positioning and navigation are mostly implemented with satellite positioning systems such as GPS (Global Positioning System) and RTK (Real-Time Kinematic). However, in practical applications it is found that in some situations, for example under bridges, in culverts and tunnels, or among dense buildings, where satellite positioning signals are poor, the positioning errors of such systems are large, and sometimes no positioning result can be provided at all, so the accuracy of the measured movement track of the vehicle is poor. Therefore, it is necessary to find a trajectory estimation method with high accuracy.
Disclosure of Invention
In order to solve or partially solve the problems in the related art, the application provides a method and a related device for calculating a movement track based on a video image, which can improve the precision of the movement track of a vehicle.
A first aspect of the application provides a method for realizing movement track calculation based on a video image, which comprises the following steps:
acquiring a first image and a second image acquired by a vehicle in a driving process;
acquiring the pixel value of each row of pixel points in the first image and the second image;
dividing each row of pixel points in the first image and the second image into two pixel subsets;
inputting pixel values of two pixel subsets with the same row number in the first image and the second image into a preset data model for prediction to obtain a rotation matrix and a translation vector between the first image and the second image, wherein the preset data model is obtained by using multiple groups of data and utilizing a preset neural network algorithm for training, and each group of data in the multiple groups of data comprises: two adjacent images and pixel values of at least one row of pixel points with the same row number in the two adjacent images;
and obtaining the movement track of the vehicle from when the first image is shot to when the second image is shot according to the pose information when the first image is shot and the rotation matrix and the translation vector between the first image and the second image.
Preferably, the step of inputting the pixel values of two pixel subsets with the same row number in the first image and the second image into a preset data model for prediction to obtain a rotation matrix and a translation vector between the first image and the second image includes:
and inputting the pixel values of two pixel subsets with the same line number in the first image and the second image, a previous prediction rotation vector and a previous prediction translation vector into a preset data model for prediction to obtain a rotation matrix and a translation vector between the first image and the second image.
Preferably, when the first image and the second image are initial images, the previous predicted rotation vector is a preset initial rotation vector, and the previous predicted translation vector is a preset initial translation vector.
Preferably, when the first image and the second image are non-initial images, the prior predicted rotation vector is calculated using the formula r12 = r31 × (ts1−ts3)/(ts2−ts1), and the prior predicted translation vector is calculated using the formula t12 = t31 × (ts1−ts3)/(ts2−ts1), wherein r12 is the prior predicted rotation vector; r31 is the rotation vector converted from the rotation matrix between the third image and the first image, which is obtained by predicting on the first image and the third image with the preset data model; t12 is the prior predicted translation vector; t31 is the translation vector between the third image and the first image, obtained by predicting on the first image and the third image with the preset data model; ts1 is the acquisition time of the first image, ts2 is the acquisition time of the second image, and ts3 is the acquisition time of the third image, where ts3 is smaller than ts1 and ts1 is smaller than ts2.
Preferably, the step of inputting the pixel values of two pixel subsets with the same row number in the first image and the second image into a preset data model for prediction to obtain a rotation matrix and a translation vector between the first image and the second image includes:
respectively inputting pixel values of two pixel subsets corresponding to at least two lines with the same line number in the first image and the second image into a preset data model for prediction to obtain a rotation matrix and a translation vector corresponding to the at least two lines;
and weighting the rotation matrixes corresponding to the at least two lines and weighting the translation vectors corresponding to the at least two lines to obtain the rotation matrixes and the translation vectors between the first image and the second image.
Preferably, the weighting the rotation matrix corresponding to the at least two rows and the weighting the translation vector corresponding to the at least two rows to obtain the rotation matrix and the translation vector between the first image and the second image includes:
calculating a matrix distance between any two rotation matrixes in the rotation matrixes corresponding to the at least two rows, and calculating a vector distance between any two translation vectors in the translation vectors corresponding to the at least two rows;
if the matrix distances between the rotation matrix corresponding to one row and the rotation matrices corresponding to other rows are larger than a first preset value, and/or if the vector distances between the translation vector corresponding to one row and the translation vectors corresponding to other rows are larger than a second preset value, deleting the rotation matrix and the translation vector corresponding to the row to obtain the rotation matrix and the translation vector corresponding to the remaining rows;
and performing weighting processing on the rotation matrix corresponding to the residual row and performing weighting processing on the translation vector corresponding to the residual row to obtain a rotation matrix and a translation vector between the first image and the second image.
Preferably, before the obtaining of the pixel values of each row of pixel points in the first image and the second image, the method further includes:
carrying out target identification on the first image and the second image, and carrying out range marking on the identified targets to obtain a target area;
and adjusting the pixel value of the pixel point in the target area to be zero.
Preferably, after obtaining the pixel value of each row of pixel points in the first image and the second image, the method further includes:
if, in the first image and/or the second image, the ratio of the number of pixels in a row whose pixel value is zero to the total number of pixels in that row exceeds a preset ratio, deleting the pixel points of that row in both the first image and the second image to obtain the pixel points of the remaining rows;
wherein the dividing each row of pixel points in the first image and the second image into two pixel subsets comprises:
and dividing each row of pixel points of the rest rows in the first image and the second image into two pixel subsets.
The second aspect of the present application provides an apparatus for calculating a movement trajectory based on a video image, including:
the image acquisition unit is used for acquiring a first image and a second image acquired by a vehicle in the running process;
the pixel acquisition unit is used for acquiring the pixel value of each row of pixel points in the first image and the second image;
the pixel segmentation unit is used for dividing each row of pixel points in the first image and the second image into two pixel subsets;
a parameter prediction unit, configured to input pixel values of two pixel subsets with the same row number in the first image and the second image into a preset data model for prediction, so as to obtain a rotation matrix and a translation vector between the first image and the second image, where the preset data model is obtained by using multiple sets of data and training the multiple sets of data by using a preset neural network algorithm, and each set of data in the multiple sets of data includes: two adjacent images and pixel values of at least one row of pixel points with the same row number in the two adjacent images;
and the track acquisition unit is used for obtaining the movement track of the vehicle from when the first image is shot to when the second image is shot according to the pose information when the first image is shot and the rotation matrix and the translation vector between the first image and the second image.
A third aspect of the present application provides an electronic device comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method as described above.
A fourth aspect of the present application provides a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform a method as described above.
According to the technical scheme, the first image and the second image collected in sequence during the driving of the vehicle can be obtained, the pixel value of each row of pixel points in the two images is obtained, and each row of pixel points is divided into two pixel subsets; further, the pixel values of the two pixel subsets with the same row number in the two images can be input into a preset data model for prediction to obtain a rotation matrix and a translation vector between the first image and the second image, and the movement track of the vehicle between the capture of the first image and the capture of the second image can then be calculated by combining the pose information when the first image was shot. In this technical scheme, given two images, each row of each image is split into two parts according to the driving inertia of the vehicle, and model prediction is performed on the pixel values of the four resulting subsets (two per image) taken from rows with the same row number, which yields the rotation matrix and translation vector between the two images; the movement track of the vehicle between the two shots can then be calculated by combining the pose information of the earlier image. Compared with a track measured by a traditional positioning system, the track calculated in this way has higher precision.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The foregoing and other objects, features and advantages of the application will be apparent from the following more particular descriptions of exemplary embodiments of the application, as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the exemplary embodiments of the application.
Fig. 1 is a schematic flowchart illustrating a method for calculating a movement trajectory based on a video image according to an embodiment of the present application;
FIG. 2 is a diagram illustrating a result of target recognition performed on an image acquired during a vehicle driving process according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an apparatus for performing motion trajectory estimation based on a video image according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While embodiments of the present application are illustrated in the accompanying drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms "first," "second," "third," etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise. The technical solutions of the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Referring to fig. 1, an embodiment of the present application provides a method for calculating a movement trajectory based on a video image. As shown in fig. 1, the method may include the steps of:
110. the method comprises the steps of acquiring a first image and a second image acquired by a vehicle in the driving process.
In the embodiment of the application, video data during driving can be acquired through a camera device, where the camera device may include, but is not limited to, a device with a video-capture function installed on the vehicle, such as a driving recorder, a camera, or the driver's mobile phone. The camera device may be disposed at the head of the vehicle to capture video in front of the vehicle, or at the tail of the vehicle to capture video behind it, which is not limited here. For subsequent image processing, frames need to be extracted from the video data collected by the camera device. Generally, the frame rate of the video is 30 frames per second, and frames may be extracted according to a preset rule, for example at 10, 15, 20, or 30 frames per second, or some other value, to obtain a plurality of captured images, where the acquisition time interval of any two adjacent images is the frame-extraction interval.
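As a rough illustration only (the patent does not prescribe any library or function names; OpenCV and the helper below are assumptions), the frame-extraction step might look like this Python sketch:

```python
import cv2

def extract_frames(video_path, target_fps=10):
    """Yield (timestamp_in_seconds, frame) pairs sampled at roughly target_fps."""
    cap = cv2.VideoCapture(video_path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back to the 30 fps the text assumes
    step = max(1, round(src_fps / target_fps))    # e.g. 30 / 10 -> keep every 3rd frame
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            yield index / src_fps, frame
        index += 1
    cap.release()
```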
The first image and the second image may be any two images of a plurality of images obtained by frame extraction, and the acquisition time of the first image may be earlier than that of the second image. When the acquisition time of the first image is adjacent to that of the second image, the acquisition time interval of the first image and the second image is a frame extraction time interval.
The method provided by the embodiment of the application can be applied to the vehicle machine and can also be applied to other equipment with calculation and processing functions, such as a computer, a mobile phone and the like. Taking the car machine as an example, the camera device may be built in the car machine, or may be disposed outside the car machine, and establishes a communication connection with the car machine.
120. And acquiring the pixel value of each row of pixel points in the first image and the second image.
Wherein the first image and the second image can be regarded as two bitmap images. The first image and the second image have the same size, and can be formed by pixel point matrixes of N rows and M columns, wherein N and M are positive integers.
When the image is a gray image, the pixel value of the pixel point can be represented as the gray value of the pixel point. When the image is a color image, the pixel value of the pixel point may be represented as an RGB value of the pixel point, or the pixel value of the pixel point may also be represented as a gray value converted from the RGB value of the pixel point.
130. And dividing each row of pixel points in the first image and the second image into two pixel subsets.
Due to the driving inertia of the vehicle, when the vehicle drives forwards, objects on the left of the collected video image move leftwards and objects on the right move rightwards; that is, the left and right halves behave differently, so the pixel points of each row in the image can be divided into a left pixel subset and a right pixel subset. Preferably, the pixels of each row may be divided equally: for example, when the size of the image is 1024 × 720 (1024 rows of 720 pixels each), each row is split into left and right subsets of 360 pixels each.
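A minimal Python/NumPy sketch of this splitting step (not part of the patent; RGB channel order and the usual luminance weights are assumptions, and the patent allows using either RGB values or a converted gray value):

```python
import numpy as np

def row_pixel_subsets(image):
    """Split every row of an image into a left and a right pixel subset."""
    if image.ndim == 3:
        # Reduce a color image to gray with the usual luminance weighting.
        image = (0.299 * image[..., 0]
                 + 0.587 * image[..., 1]
                 + 0.114 * image[..., 2]).astype(np.uint8)
    n_rows, n_cols = image.shape
    half = n_cols // 2                     # an odd trailing column, if any, is dropped
    left, right = image[:, :half], image[:, half:2 * half]
    return left, right                     # row i yields subsets left[i] and right[i]
```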
140. And respectively inputting the pixel values of two pixel subsets with the same line number in the first image and the second image into a preset data model for prediction to obtain a rotation matrix and a translation vector between the first image and the second image.
The preset data model is obtained by using a plurality of groups of data and utilizing preset neural network algorithm training, and each group of data in the plurality of groups of data can comprise: two adjacent images and the pixel values of at least one row of pixel points with the same row number in the two adjacent images. The rotation matrix and the translation vector between the first image and the second image can be regarded as the rotation amount and the translation amount of the second image relative to the first image, namely the pose change condition of the second image relative to the first image.
The preset data model can be downloaded from a server, or can be obtained by collecting a number of sample images in advance during vehicle driving and training on them with a preset neural network algorithm. The preset neural network algorithm may include, but is not limited to, RNN (Recurrent Neural Network), Bi-RNN (Bidirectional Recurrent Neural Network), LSTM (Long Short-Term Memory network), and the like.
When the sizes of the first image and the second image are both N × M, the rows of the first image are numbered, from top to bottom, row 0, row 1, …, row N−2, and row N−1, and the rows of the second image are likewise numbered row 0, row 1, …, row N−2, and row N−1 from top to bottom.
In the embodiment of the present application, the pixel values of the two pixel subsets of one, several, or all of the rows with the same row number in the first image and the second image may be used as input. For example, the pixel values of the two pixel subsets of row 0 of each of the first image and the second image, i.e. 4 pixel subsets, are input into the preset data model for prediction. For another example, the pixel values of the two pixel subsets of the middle row (e.g., row N/2) of each of the first image and the second image, again 4 pixel subsets, are input into the preset data model for prediction. For another example, the row with the largest accumulated pixel value in the first image is found, and the pixel values of the two pixel subsets of that row in each of the first image and the second image, 4 pixel subsets in total, are input into the preset data model for prediction.
In an optional implementation manner, the step 140 respectively inputs the pixel values of two pixel subsets with the same row number in the first image and the second image into a preset data model for prediction, and a specific implementation manner of obtaining the rotation matrix and the translation vector between the first image and the second image may include:
14a) respectively inputting pixel values of two pixel subsets corresponding to at least two lines with the same line number in the first image and the second image into a preset data model for prediction to obtain a rotation matrix and a translation vector corresponding to the at least two lines;
14b) and weighting the rotation matrix corresponding to the at least two lines and weighting the translation vector corresponding to the at least two lines to obtain the rotation matrix and the translation vector between the first image and the second image.
Wherein, the weighting coefficients of the rotation matrix and/or the translation vector corresponding to different rows can be set to be the same or different. For example, the weighting coefficients of the rotation matrices of different rows are all set to 0.5, i.e., the rotation matrices of each row are averaged. For example, the weighting coefficients of the rotation matrices of different rows may be set according to the pixel value distribution of the row, and if the accumulated sum of the pixel values of the row is large, the weighting coefficient is large, and if the accumulated sum is small, the weighting coefficient is small.
For example, the pixel values of the 4 pixel subsets formed by the two pixel subsets of row (N−2)/2 in each of the first image and the second image are input into the preset data model for prediction to obtain a first set of rotation matrix and translation vector; the pixel values of the 4 pixel subsets of row N/2 in the two images are input into the preset data model for prediction to obtain a second set of rotation matrix and translation vector; and the two sets of rotation matrices and translation vectors are respectively weighted (e.g., averaged) to obtain the final rotation matrix and translation vector.
For another example, pixel values of two pixel subsets of 0 th row in the first image and the second image, which are 4 pixel subsets, are input into a preset data model for prediction to obtain a first group of rotation matrix and translation vector, pixel values of two pixel subsets of 1 st row in the first image and the second image, which are 4 pixel subsets, are input into the preset data model for prediction to obtain a second group of rotation matrix and translation vector, and then the steps are sequentially carried out until pixel values of 4 pixel subsets of N-1 th row in the first image and the second image are input into the preset data model for prediction to obtain an N-th group of rotation matrix and translation vector. And performing weighting processing (such as averaging) on the N groups of rotation matrixes, and performing weighting processing (such as averaging) on the N groups of translation vectors to obtain final rotation matrixes and translation vectors.
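The weighting step could be sketched as follows (a NumPy illustration under assumed array shapes, not the patent's implementation; the SO(3) re-projection at the end is an added safeguard, since an elementwise average of rotation matrices is not itself guaranteed to be a rotation):

```python
import numpy as np

def fuse_rowwise_predictions(rotations, translations, weights=None):
    """Weight the per-row predictions into one rotation matrix and one
    translation vector; uniform weights give a plain average."""
    R = np.stack(rotations)                      # shape (k, 3, 3)
    t = np.stack(translations)                   # shape (k, 3)
    if weights is None:
        weights = np.full(len(rotations), 1.0 / len(rotations))
    w = np.asarray(weights)
    R_avg = (R * w[:, None, None]).sum(axis=0)
    t_avg = (t * w[:, None]).sum(axis=0)
    # Added safeguard: project the weighted sum back onto SO(3).
    U, _, Vt = np.linalg.svd(R_avg)
    if np.linalg.det(U @ Vt) < 0:                # keep a proper rotation (det = +1)
        U[:, -1] *= -1
    return U @ Vt, t_avg
```

As the text notes, the weights could also be derived from each row's accumulated pixel values instead of being uniform.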
In an optional implementation manner, the step 14 b) of performing weighting processing on the rotation matrix corresponding to the at least two rows and performing weighting processing on the translation vector corresponding to the at least two rows to obtain the rotation matrix and the translation vector between the first image and the second image may include the following steps:
calculating a matrix distance between any two rotation matrixes in the rotation matrixes corresponding to the at least two rows, and calculating a vector distance between any two translation vectors in the translation vectors corresponding to the at least two rows;
if the matrix distances between the rotation matrix corresponding to one row and the rotation matrices corresponding to other rows are larger than a first preset value, and/or if the vector distances between the translation vector corresponding to one row and the translation vectors corresponding to other rows are larger than a second preset value, deleting the rotation matrix and the translation vector corresponding to the row to obtain the rotation matrix and the translation vector corresponding to the remaining rows;
and performing weighting processing on the rotation matrixes corresponding to the residual rows and performing weighting processing on the translation vectors corresponding to the residual rows to obtain the rotation matrixes and the translation vectors between the first image and the second image.
Specifically, there may be noise in the images: for example, when a vehicle suddenly enters the field of view, the pixel values of some rows change greatly between the two successive images, and the rotation matrix and translation vector predicted from those rows differ greatly from those of other rows. Therefore, noisy rows can be screened out by calculating the distances between the matrices, the rotation matrices and translation vectors predicted from the noisy rows are deleted, and the remaining rotation matrices and translation vectors are respectively weighted to obtain the final rotation matrix and translation vector. Removing noisy rows can effectively improve the accuracy of pose prediction.
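A sketch of this screening, assuming the Frobenius norm as the matrix distance and the Euclidean norm as the vector distance (the patent does not name the metrics):

```python
import numpy as np

def drop_noisy_rows(rotations, translations, r_thresh, t_thresh):
    """Delete a row's prediction when it is far from ALL other rows'
    predictions, then return the surviving predictions."""
    keep = []
    for i, (Ri, ti) in enumerate(zip(rotations, translations)):
        far_R = all(np.linalg.norm(Ri - Rj) > r_thresh
                    for j, Rj in enumerate(rotations) if j != i)
        far_t = all(np.linalg.norm(ti - tj) > t_thresh
                    for j, tj in enumerate(translations) if j != i)
        if not (far_R or far_t):           # the text's "and/or" deletion rule
            keep.append(i)
    return ([rotations[i] for i in keep],
            [translations[i] for i in keep])
```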
In a possible implementation manner, the step 140 respectively inputs the pixel values of two pixel subsets with the same row number in the first image and the second image into a preset data model for prediction, and a specific implementation manner of obtaining the rotation matrix and the translation vector between the first image and the second image may include:
14c) and respectively inputting the pixel values of two pixel subsets with the same line number in the first image and the second image, a previous prediction rotation vector and a previous prediction translation vector into a preset data model for prediction to obtain a rotation matrix and a translation vector between the first image and the second image.
Optionally, when the first image and the second image are two initial images acquired at the beginning, a preset initial rotation vector and an initial translation vector may be acquired, that is, a rotation vector is predicted in advance as the preset initial rotation vector, and a translation vector is predicted in advance as the preset initial translation vector. For example, the initial rotation vector and the initial translation vector may both be set to zero vectors.
For example, the pixel values of the two pixel subsets of row N/2 in each of the first image and the second image, together with the initial rotation vector and the initial translation vector (6 input parameters), are input into the preset data model for prediction.
Alternatively, when the first image and the second image are non-initial images, the prior predicted rotation vector may be calculated using the formula r12 = r31 × (ts1−ts3)/(ts2−ts1), and the prior predicted translation vector using the formula t12 = t31 × (ts1−ts3)/(ts2−ts1), where r12 is the prior predicted rotation vector; r31 is the rotation vector converted from the rotation matrix between the third image and the first image, obtained by predicting on the first image and the third image with the preset data model; t12 is the prior predicted translation vector; t31 is the translation vector between the third image and the first image, obtained likewise; ts1 is the acquisition time of the first image, ts2 the acquisition time of the second image, and ts3 the acquisition time of the third image, with ts3 smaller than ts1 and ts1 smaller than ts2.
Specifically, when the first image and the second image are non-initial images, the prior predicted rotation vector and translation vector can be determined by predicting, with the preset data model, on the first image and an earlier third image to obtain the rotation matrix and translation vector between the third image and the first image. The rotation matrix can be converted into the corresponding rotation vector with the Rodrigues formula. When ts1−ts3 equals ts2−ts1, the acquisition time interval between the first and third images equals that between the second and first images; for example, both are one frame-extraction interval. In this case, the prior predicted rotation vector r12 is the rotation vector r31 obtained by converting, via the Rodrigues formula, the rotation matrix predicted by the model from the third and first images, and the prior predicted translation vector t12 is the translation vector t31 predicted by the model from the third and first images. When ts1−ts3 is not equal to ts2−ts1, the prior predicted rotation vector r12 is the rotation vector between the third and first images multiplied by the ratio of the two acquisition time intervals, and the prior predicted translation vector t12 is the translation vector between the third and first images multiplied by the same ratio. The predicted rotation matrix and translation vector between the first image and the second image can in turn be used as input for the pose prediction of the second image and the image after it. Using the result of the previous prediction as the input of the next prediction improves the coherence of the whole system and makes the results more accurate.
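A sketch of computing the prior prediction, using OpenCV's Rodrigues conversion (an assumed implementation choice) and the formulas quoted in the text:

```python
import cv2
import numpy as np

def prior_prediction(R31, t31, ts1, ts2, ts3):
    """Scale the previously predicted motion by the ratio of acquisition
    intervals, following r12 = r31*(ts1-ts3)/(ts2-ts1) and
    t12 = t31*(ts1-ts3)/(ts2-ts1)."""
    r31, _ = cv2.Rodrigues(np.asarray(R31, dtype=np.float64))  # matrix -> rotation vector
    ratio = (ts1 - ts3) / (ts2 - ts1)
    r12 = r31.ravel() * ratio        # prior predicted rotation vector
    t12 = np.asarray(t31) * ratio    # prior predicted translation vector
    return r12, t12
```

Note that when the two intervals are equal, ratio is 1 and the prior prediction reduces to the previous motion unchanged, matching the text.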
In an alternative embodiment, before performing step 120, the method described in fig. 1 may further include the following steps:
carrying out target identification on the first image and the second image, and carrying out range marking on the identified targets to obtain a target area;
and adjusting the pixel value of the pixel point in the target area to be zero.
For example, as shown in fig. 2, a time watermark is often stamped on images during acquisition; part of the vehicle's own body may be captured because of the camera's mounting angle; and reflections, lens artifacts, and other interference may appear in the image due to weather or lighting. These elements appear at the same positions in different images and can be regarded as image noise; using their pixel values for pose prediction reduces the accuracy of the result, so their influence needs to be eliminated before pixel values are extracted. Because the positions of such noise in the image are fixed (e.g., the time watermark at the top of the image, the vehicle body at the bottom), those positions can be directly marked to obtain several target areas. In fig. 2, the sky occupies a large area of the image; since the sky contains little feature information, its influence can also be eliminated.
Furthermore, dynamic objects are often captured while the vehicle is driving, such as other moving vehicles, people walking on sidewalks or along the road, animals, or objects moving through the sky such as planes or kites. Because the pose of a dynamic object may change in real time, using pixel values on such objects for pose prediction would also reduce the accuracy of the result, so their influence must be excluded before pixel values are extracted. Specifically, a target detection algorithm may be used to identify dynamic objects in the image, and the identified objects are marked with a range. As shown in fig. 2, a rectangular box may be used to mark the position and size of a dynamic object, yielding a target area. It is understood that a circle, an ellipse, or another regular or irregular shape may also be used; this is not limited here. Finally, to eliminate the influence of the pixel values in the target areas, the pixel value of every pixel point in each target area may be set to 0.
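A sketch of the masking step, assuming rectangular boxes such as those produced by a generic object detector or taken from fixed watermark and vehicle-body positions (the box format is an assumption):

```python
import numpy as np

def mask_target_regions(image, boxes):
    """Zero out rectangular target areas; boxes are (x0, y0, x1, y1) corners."""
    out = image.copy()
    for x0, y0, x1, y1 in boxes:
        out[y0:y1, x0:x1] = 0          # rows are the y axis, columns the x axis
    return out
```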
In an alternative embodiment, after performing step 120, the method described in fig. 1 may further include:
if the ratio of the number of pixels with zero pixel value in one row in the first image and/or the second image to the total number of the pixels in the row exceeds a preset ratio, deleting the pixels in the row in the first image and the second image to obtain the pixels in the rest rows;
accordingly, the specific implementation of step 130 of dividing each row of pixel points in the first image and the second image into two pixel subsets may include the following steps:
and dividing each row of pixel points of the rest rows in the first image and the second image into two pixel subsets.
Specifically, if the ratio of the number of pixels having a pixel value of 0 in a row to the total number of pixels in the row in an image exceeds a preset ratio, it may be considered that the effective feature information of the row is less, and the influence on the pose prediction is less. Therefore, the pixel points of the row in the two images can be deleted, and the position and pose prediction is not carried out subsequently by using the pixel points of the row, so that the calculation amount of the position and pose prediction can be reduced, and the calculation efficiency is improved. The preset ratio can be a default value, and can also be adaptively adjusted according to actual requirements. For example, the predetermined ratio is 1/3, 2/5, 1/2 or other values.
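A sketch of this row screening for a pair of grayscale images; the 1/2 default ratio below is just one of the example values the text mentions:

```python
import numpy as np

def keep_informative_rows(img_a, img_b, max_zero_ratio=0.5):
    """Drop any row whose fraction of zero-valued pixels exceeds the
    preset ratio in either image, and keep the remaining rows of both."""
    zero_a = (img_a == 0).mean(axis=1)
    zero_b = (img_b == 0).mean(axis=1)
    keep = (zero_a <= max_zero_ratio) & (zero_b <= max_zero_ratio)
    return img_a[keep], img_b[keep]
```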
150. And obtaining the moving track of the vehicle from shooting the first image to shooting the second image according to the pose information when shooting the first image and the rotation matrix and the translation vector between the first image and the second image.
In the embodiment of the present application, the pose information when the first image is shot may include the pose of the camera device or of the vehicle at that moment. It may be acquired directly from a positioning device (e.g., GPS, RTK), or derived from the pose information when the initial image was shot. Once the pose information at the shooting of the first image is available, the pose information at the shooting of the second image can be obtained by applying the rotation matrix and translation vector between the first image and the second image. The movement track of the vehicle between the two shots can then be calculated from the pose information at the two shooting times.
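One way to sketch this pose update (the world-to-camera convention below is an assumption; the patent does not fix a convention):

```python
import numpy as np

def next_pose(R1, t1, R12, t12):
    """Chain the pose at the first image with the predicted relative
    motion (R12, t12) to get the pose at the second image."""
    R2 = R12 @ R1
    t2 = R12 @ t1 + t12
    return R2, t2
```

Under this convention the camera position in the world frame is c = −Rᵀt, and evaluating it at successive images traces out the vehicle's movement track.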
In an optional embodiment, after the rotation matrix and the translation vector between the first image and the second image are predicted, the fourth image may be continuously acquired, wherein the acquisition time of the fourth image is later than the acquisition time of the second image, and the pixel value of each row of pixel points in the fourth image is acquired,
and dividing each row of pixel points in the fourth image into two pixel subsets. Further, when the acquisition time interval between the second image and the first image is equal to the acquisition time interval between the fourth image and the second image, the rotation matrix between the first image and the second image can be converted into a rotation vector, and then the pixel values of two pixel subsets with the same row number in the second image and the fourth image, and the rotation vector and the translation vector between the first image and the second image are respectively input into a preset data model for prediction to obtain a rotation matrix and a translation vector between the second image and the fourth image;
or when the acquisition time interval between the second image and the first image is not equal to the acquisition time interval between the fourth image and the second image, calculating the ratio of the two acquisition time intervals, multiplying the rotation vector between the first image and the second image by the ratio to obtain a prior prediction rotation vector, multiplying the translation vector between the first image and the second image by the ratio to obtain a prior prediction translation vector, and inputting the pixel values of two pixel subsets with the same row number in the second image and the fourth image, the prior prediction rotation vector and the prior prediction translation vector into a preset data model for prediction to obtain a rotation matrix and a translation vector between the second image and the fourth image.
And obtaining the moving track of the vehicle when the second image to the fourth image are shot according to the pose information when the second image is shot and the rotation matrix and the translation vector between the second image and the fourth image.
It can be understood that the rotation matrix and the translation vector between the second image and the fourth image obtained by model prediction can be used as input for pose prediction of the subsequent fourth image and the next image thereof. According to the method, the rotation matrix and the translation vector of any two images in the driving process of the vehicle can be obtained, so that the moving track of the vehicle in the whole driving process can be calculated.
According to the embodiment of the application, given two images, each row of each image is split into two parts according to the driving inertia of the vehicle, and model prediction is performed on the pixel values of the four resulting subsets (two per image) taken from rows with the same row number, yielding the rotation matrix and translation vector between the two images; the movement track of the vehicle between the two shots can then be calculated by combining the pose information of the earlier image. Compared with a track measured by a traditional positioning system, the track calculated in this way has higher precision. In addition, using the result of the previous prediction as the input of the next prediction improves the coherence of the whole system and makes the results more accurate.
Furthermore, before model prediction is carried out, noise points of the image can be eliminated, the influence of time watermarks, a vehicle body, a dynamic object and the like is eliminated, the accuracy of a prediction result can be improved, and the moving track precision of the vehicle is higher. In addition, the rows with less effective information can be deleted, the calculation amount can be reduced, and the calculation efficiency can be improved.
Referring to fig. 3, an embodiment of the present application further provides a device for calculating a movement track based on a video image, which can be used to execute the method for calculating a movement track based on a video image provided in the foregoing embodiment. As shown in fig. 3, the apparatus may include:
an image acquisition unit 310, configured to acquire a first image and a second image acquired during a driving process of a vehicle;
a pixel obtaining unit 320, configured to obtain a pixel value of each row of pixel points in the first image and the second image;
the pixel segmentation unit 330 is configured to divide each row of pixel points in the first image and the second image into two pixel subsets;
the parameter prediction unit 340 is configured to input pixel values of two pixel subsets with the same row number in the first image and the second image into a preset data model for prediction, so as to obtain a rotation matrix and a translation vector between the first image and the second image, where the preset data model is obtained by using multiple sets of data and training the multiple sets of data by using a preset neural network algorithm, and each set of data in the multiple sets of data may include: two adjacent images and the pixel values of at least one row of pixel points with the same row number in the two adjacent images;
the trajectory acquiring unit 350 is configured to obtain a moving trajectory of the vehicle when the first image is captured to the second image according to the pose information when the first image is captured and the rotation matrix and the translation vector between the first image and the second image.
Optionally, the specific implementation manner of the parameter prediction unit 340 that respectively inputs the pixel values of two pixel subsets with the same row number in the first image and the second image into the preset data model for prediction to obtain the rotation matrix and the translation vector between the first image and the second image may include:
the parameter prediction unit 340 inputs the pixel values of two pixel subsets with the same row number in the first image and the second image, and the prior predicted rotation vector and the prior predicted translation vector into a preset data model for prediction, so as to obtain a rotation matrix and a translation vector between the first image and the second image.
Optionally, when the first image and the second image are initial images, the previous predicted rotation vector may be a preset initial rotation vector, and the previous predicted translation vector may be a preset initial translation vector.
Alternatively, when the first image and the second image are non-initial images, the prior predicted rotation vector may be calculated using the formula r12 = r31 × (ts1−ts3)/(ts2−ts1), and the prior predicted translation vector using the formula t12 = t31 × (ts1−ts3)/(ts2−ts1), where r12 is the prior predicted rotation vector; r31 is the rotation vector converted from the rotation matrix between the third image and the first image, obtained by predicting on the first image and the third image with the preset data model; t12 is the prior predicted translation vector; t31 is the translation vector between the third image and the first image, obtained likewise; ts1 is the acquisition time of the first image, ts2 the acquisition time of the second image, and ts3 the acquisition time of the third image, with ts3 smaller than ts1 and ts1 smaller than ts2.
Optionally, the specific implementation manner of the parameter prediction unit 340 that respectively inputs the pixel values of two pixel subsets with the same row number in the first image and the second image into the preset data model for prediction to obtain the rotation matrix and the translation vector between the first image and the second image may include:
the parameter prediction unit 340 inputs the pixel values of two pixel subsets corresponding to at least two lines with the same line number in the first image and the second image into a preset data model for prediction to obtain a rotation matrix and a translation vector corresponding to the at least two lines, performs weighting processing on the rotation matrix corresponding to the at least two lines and performs weighting processing on the translation vector corresponding to the at least two lines to obtain a rotation matrix and a translation vector between the first image and the second image.
Optionally, the specific implementation manner of the parameter prediction unit 340 performing weighting processing on the rotation matrix corresponding to the at least two lines and performing weighting processing on the translation vector corresponding to the at least two lines to obtain the rotation matrix and the translation vector between the first image and the second image may include:
the parameter prediction unit 340 calculates a matrix distance between any two rotation matrices in the rotation matrices corresponding to the at least two rows and calculates a vector distance between any two translation vectors in the translation vectors corresponding to the at least two rows; if the matrix distances between the rotation matrix corresponding to one row and the rotation matrices corresponding to other rows are larger than a first preset value, and/or if the vector distances between the translation vector corresponding to one row and the translation vectors corresponding to other rows are larger than a second preset value, deleting the rotation matrix and the translation vector corresponding to the row to obtain the rotation matrix and the translation vector corresponding to the remaining rows; and performing weighting processing on the rotation matrixes corresponding to the residual rows and performing weighting processing on the translation vectors corresponding to the residual rows to obtain the rotation matrixes and the translation vectors between the first image and the second image.
Optionally, the apparatus shown in fig. 3 may further include an object recognition unit and a pixel adjustment unit (not shown in the figure), wherein:
the target identification unit is used for performing target identification on the first image and the second image before the pixel value of each row of pixel points in the first image and the second image is acquired by the pixel acquisition unit 320, and performing range marking on the identified targets to obtain target areas;
and the pixel adjusting unit is used for adjusting the pixel value of the pixel point in the target area to be zero.
Optionally, the apparatus shown in fig. 3 may further comprise an image processing unit (not shown in the figure), wherein:
the image processing unit is configured to delete the pixel points of the line in the first image and the second image to obtain the pixel points of the remaining lines if the ratio of the number of the pixel points with the pixel value of one line being zero to the total number of the pixel points of the line in the first image and/or the second image exceeds a preset ratio after the pixel value of each line of the pixel points in the first image and the second image is obtained by the pixel obtaining unit 320;
accordingly, the specific implementation of the pixel segmentation unit 330 dividing each row of pixel points in the first image and the second image into two pixel subsets may include:
the pixel segmentation unit 330 divides each row of pixel points of the remaining rows in the first image and the second image into two pixel subsets.
With regard to the apparatus in the above-described embodiment, the specific manner in which each unit module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated herein.
By implementing the device shown in fig. 3, for any two images acquired while the vehicle is driving, each row of each image is divided into two parts according to the driving inertia of the vehicle, model prediction is performed on the pixel values of the four resulting subsets (two per image) taken from rows with the same row number to obtain the rotation matrix and translation vector between the two images, and the movement track of the vehicle between the two shots can then be calculated by combining the pose information of the earlier image. In addition, using the result of the previous prediction as the input of the next prediction improves the coherence of the whole system and makes the results more accurate.
Furthermore, before model prediction is carried out, noise points of the image can be eliminated, the influence of time watermarks, a vehicle body, a dynamic object and the like is eliminated, the accuracy of a prediction result can be improved, and the moving track precision of the vehicle is higher. In addition, the rows with less effective information can be deleted, the calculation amount can be reduced, and the calculation efficiency can be improved.
Referring to fig. 4, an embodiment of the present application provides an electronic device, which can be used to execute the method for calculating a movement track based on a video image according to the foregoing embodiment. As shown in fig. 4, the electronic device 400 may include: a processor 410 and a memory 420. Wherein processor 410 and memory 420 are communicatively coupled. It is understood that the structure of the electronic device 400 shown in fig. 4 does not limit the embodiments of the present application, and may include more components than those shown in the drawings, such as a communication interface (e.g., a bluetooth module, a WIFI module, etc.), an input/output interface (e.g., a key, a touch screen, a speaker, a microphone, etc.), a sensor, and so on. Wherein:
the Processor 410 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 420 may include various types of storage units, such as system memory, read-only memory (ROM), and permanent storage. The ROM may store static data or instructions needed by the processor 410 or other modules of the computer. The permanent storage may be a readable and writable storage device, and may be non-volatile, so that stored instructions and data are not lost even after the computer is powered off. In some embodiments, the permanent storage is a mass storage device (e.g., a magnetic or optical disk, or flash memory). In other embodiments, the permanent storage may be a removable storage device (e.g., a floppy disk or an optical drive). The system memory may be a readable and writable volatile memory device, such as dynamic random access memory, and may store instructions and data that some or all of the processors need at runtime. In addition, the memory 420 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory) and magnetic and/or optical disks. In some embodiments, the memory 420 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card), a magnetic floppy disk, or the like. Computer-readable storage media do not contain carrier waves or transitory electronic signals transmitted wirelessly or by wire.
The memory 420 has stored thereon executable code, which when processed by the processor 410, may cause the processor 410 to perform some or all of the steps of the methods described above.
Furthermore, the method according to the present application may also be implemented as a computer program or computer program product comprising computer program code instructions for performing some or all of the steps of the above-described method of the present application.
Alternatively, the present application may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or electronic device, server, etc.), causes the processor to perform some or all of the various steps of the above-described methods in accordance with the present application.
Having described embodiments of the present application, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (7)

1. A method for realizing movement track calculation based on video images is characterized by comprising the following steps:
acquiring a first image and a second image acquired by a vehicle in a driving process;
acquiring the pixel value of each row of pixel points in the first image and the second image;
dividing each row of pixel points in the first image and the second image into two pixel subsets;
inputting pixel values of two pixel subsets with the same row number in the first image and the second image into a preset data model for prediction to obtain a rotation matrix and a translation vector between the first image and the second image, wherein the preset data model is trained from multiple groups of data using a preset neural network algorithm, and each group of data in the multiple groups of data comprises: two adjacent images and the pixel values of at least one row of pixel points with the same row number in the two adjacent images;
obtaining a movement track of the vehicle from when the first image is shot to when the second image is shot, according to the pose information when the first image is shot and the rotation matrix and the translation vector between the first image and the second image;
wherein the step of inputting the pixel values of two pixel subsets with the same row number in the first image and the second image into a preset data model for prediction to obtain a rotation matrix and a translation vector between the first image and the second image comprises:
inputting the pixel values of two pixel subsets with the same row number in the first image and the second image, a prior predicted rotation vector and a prior predicted translation vector into the preset data model for prediction to obtain the rotation matrix and the translation vector between the first image and the second image; or,
respectively inputting the pixel values of the two pixel subsets corresponding to each of at least two rows with the same row number in the first image and the second image into the preset data model for prediction to obtain rotation matrices and translation vectors corresponding to the at least two rows;
calculating a matrix distance between any two of the rotation matrices corresponding to the at least two rows, and calculating a vector distance between any two of the translation vectors corresponding to the at least two rows;
if the matrix distances between the rotation matrix corresponding to one row and the rotation matrices corresponding to the other rows are all larger than a first preset value, and/or the vector distances between the translation vector corresponding to one row and the translation vectors corresponding to the other rows are all larger than a second preset value, deleting the rotation matrix and the translation vector corresponding to that row to obtain the rotation matrices and the translation vectors corresponding to the remaining rows;
and performing weighting processing on the rotation matrices corresponding to the remaining rows and on the translation vectors corresponding to the remaining rows to obtain the rotation matrix and the translation vector between the first image and the second image.
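By way of illustration only, the row-wise filtering and weighting recited above might look as follows. This is a minimal numpy sketch, not the claimed implementation: the Frobenius norm as the "matrix distance", the Euclidean norm as the "vector distance", the uniform weights, and the re-projection onto SO(3) are all assumptions the claim leaves open.

    import numpy as np

    def fuse_row_predictions(rotations, translations, thr_r, thr_t):
        """Fuse per-row (rotation matrix, translation vector) predictions into
        a single pair, deleting rows whose prediction is far from every other
        row, then weighting the remaining rows."""
        n = len(rotations)
        keep = []
        for i in range(n):
            # Frobenius norm as the "matrix distance" and Euclidean norm as
            # the "vector distance" are assumptions; the claim only requires
            # the distances to exceed the first/second preset values.
            r_far = all(np.linalg.norm(rotations[i] - rotations[j]) > thr_r
                        for j in range(n) if j != i)
            t_far = all(np.linalg.norm(translations[i] - translations[j]) > thr_t
                        for j in range(n) if j != i)
            if not (r_far or t_far):
                keep.append(i)
        if not keep:                      # fall back if every row was rejected
            keep = list(range(n))
        # Weighting of the remaining rows; uniform weights are assumed here.
        w = 1.0 / len(keep)
        R = sum(w * rotations[i] for i in keep)
        t = sum(w * translations[i] for i in keep)
        # A weighted sum of rotation matrices is generally not itself a
        # rotation matrix, so re-project onto SO(3) via SVD.
        U, _, Vt = np.linalg.svd(R)
        return U @ Vt, t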
2. The method according to claim 1, wherein, when the first image and the second image are initial images, the prior predicted rotation vector is a preset initial rotation vector and the prior predicted translation vector is a preset initial translation vector.
3. The method according to claim 1, wherein, when the first image and the second image are non-initial images, the prior predicted rotation vector is calculated using the formula r12 = r31 × (ts1 - ts3)/(ts2 - ts1), and the prior predicted translation vector is calculated using the formula t12 = t31 × (ts1 - ts3)/(ts2 - ts1), wherein r12 is the prior predicted rotation vector, r31 is the rotation vector obtained by converting the rotation matrix between a third image and the first image that is predicted from the first image and the third image using the preset data model, t12 is the prior predicted translation vector, t31 is the translation vector between the third image and the first image predicted from the first image and the third image using the preset data model, ts1 is the acquisition time of the first image, ts2 is the acquisition time of the second image, ts3 is the acquisition time of the third image, ts3 is smaller than ts1, and ts1 is smaller than ts2.
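The extrapolation in claim 3 is a direct time-ratio scaling; a minimal sketch follows, treating r31 and t31 as 3-vectors and implementing the ratio exactly as recited:

    import numpy as np

    def prior_prediction(r31, t31, ts1, ts2, ts3):
        """Prior predicted rotation/translation vectors for the (first, second)
        image pair, extrapolated from the (third, first) prediction per claim 3:
            r12 = r31 * (ts1 - ts3) / (ts2 - ts1)
            t12 = t31 * (ts1 - ts3) / (ts2 - ts1)
        with ts3 < ts1 < ts2, so the scale factor is positive."""
        scale = (ts1 - ts3) / (ts2 - ts1)
        return np.asarray(r31) * scale, np.asarray(t31) * scale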
4. The method according to any one of claims 1-3, wherein before obtaining the pixel values of each row of pixels in the first image and the second image, the method further comprises:
performing target identification on the first image and the second image, and marking the range of each identified target to obtain a target area;
and setting the pixel values of the pixel points in the target area to zero.
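In code, the pre-processing of claim 4 is a masking pass; a rough sketch, assuming an external object detector supplies the marked ranges as bounding boxes (the claim specifies neither the detector nor the shape of the target area):

    import numpy as np

    def mask_targets(image, boxes):
        """Set the pixel values inside each marked target area to zero.
        `image` is an H x W (or H x W x C) array; `boxes` is a list of
        (x_min, y_min, x_max, y_max) rectangles from any object detector."""
        masked = image.copy()
        for x0, y0, x1, y1 in boxes:
            masked[y0:y1, x0:x1] = 0  # zero out the target area
        return masked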
5. The method of claim 4, wherein after acquiring the pixel values of each row of pixel points in the first image and the second image, the method further comprises:
if the ratio of the number of pixel points with a pixel value of zero in a row to the total number of pixel points in that row exceeds a preset ratio in the first image and/or the second image, deleting that row of pixel points from the first image and the second image to obtain the pixel points of the remaining rows;
wherein the dividing of each row of pixel points in the first image and the second image into two pixel subsets comprises:
and dividing each row of pixel points of the remaining rows in the first image and the second image into two pixel subsets.
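A minimal sketch of the row filtering in claim 5, dropping any row dominated by zeroed pixels in either image so that the row numbers of the two images stay aligned (the preset ratio of 0.5 is an assumed free parameter):

    import numpy as np

    def filter_rows(img1, img2, preset_ratio=0.5):
        """Delete every row whose fraction of zero-valued pixel points exceeds
        the preset ratio in either image; the same rows are removed from both
        images so that equal row numbers still correspond."""
        def zero_ratio(img):
            rows = img.reshape(img.shape[0], -1)
            return (rows == 0).sum(axis=1) / rows.shape[1]
        keep = (zero_ratio(img1) <= preset_ratio) & (zero_ratio(img2) <= preset_ratio)
        return img1[keep], img2[keep]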
6. An apparatus for realizing movement track calculation based on video images, characterized by comprising:
an image acquisition unit, configured to acquire a first image and a second image acquired by a vehicle in the driving process;
a pixel acquisition unit, configured to acquire the pixel value of each row of pixel points in the first image and the second image;
a pixel segmentation unit, configured to divide each row of pixel points in the first image and the second image into two pixel subsets;
a parameter prediction unit, configured to input pixel values of two pixel subsets with the same row number in the first image and the second image into a preset data model for prediction to obtain a rotation matrix and a translation vector between the first image and the second image, wherein the preset data model is trained from multiple groups of data using a preset neural network algorithm, and each group of data in the multiple groups of data comprises: two adjacent images and the pixel values of at least one row of pixel points with the same row number in the two adjacent images;
a track acquisition unit, configured to obtain a movement track of the vehicle from when the first image is shot to when the second image is shot, according to the pose information when the first image is shot and the rotation matrix and the translation vector between the first image and the second image;
wherein the step of inputting the pixel values of two pixel subsets with the same row number in the first image and the second image into a preset data model for prediction to obtain a rotation matrix and a translation vector between the first image and the second image comprises:
inputting the pixel values of two pixel subsets with the same row number in the first image and the second image, a prior predicted rotation vector and a prior predicted translation vector into the preset data model for prediction to obtain the rotation matrix and the translation vector between the first image and the second image; or,
respectively inputting the pixel values of the two pixel subsets corresponding to each of at least two rows with the same row number in the first image and the second image into the preset data model for prediction to obtain rotation matrices and translation vectors corresponding to the at least two rows;
calculating a matrix distance between any two of the rotation matrices corresponding to the at least two rows, and calculating a vector distance between any two of the translation vectors corresponding to the at least two rows;
if the matrix distances between the rotation matrix corresponding to one row and the rotation matrices corresponding to the other rows are all larger than a first preset value, and/or the vector distances between the translation vector corresponding to one row and the translation vectors corresponding to the other rows are all larger than a second preset value, deleting the rotation matrix and the translation vector corresponding to that row to obtain the rotation matrices and the translation vectors corresponding to the remaining rows;
and performing weighting processing on the rotation matrices corresponding to the remaining rows and on the translation vectors corresponding to the remaining rows to obtain the rotation matrix and the translation vector between the first image and the second image.
7. An electronic device, comprising:
a processor; and
a memory having executable code stored thereon which, when executed by the processor, causes the processor to perform the method of any one of claims 1-5.
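Finally, the track-acquisition step shared by claims 1 and 6 chains the predicted relative motion onto the pose held when the first image was shot. A sketch under the common convention that (R12, t12) maps first-frame point coordinates into the second frame; the claims do not fix this convention, so it is an assumption:

    import numpy as np

    def chain_pose(R1_cw, C1, R12, t12):
        """Pose at the second image from the pose at the first image.
        R1_cw: camera-to-world rotation at the first image; C1: camera centre
        in world coordinates; (R12, t12): predicted relative motion with
        x2 = R12 @ x1 + t12 (assumed convention).  The segment from C1 to the
        returned C2 is one step of the vehicle's movement track."""
        R2_cw = R1_cw @ R12.T
        C2 = C1 - R2_cw @ t12
        return R2_cw, C2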
CN202110282936.8A 2021-03-16 2021-03-16 Method and related device for realizing movement track calculation based on video image Active CN112668563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110282936.8A CN112668563B (en) 2021-03-16 2021-03-16 Method and related device for realizing movement track calculation based on video image

Publications (2)

Publication Number Publication Date
CN112668563A (en) 2021-04-16
CN112668563B (en) 2021-06-04

Family

ID=75399599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110282936.8A Active CN112668563B (en) 2021-03-16 2021-03-16 Method and related device for realizing movement track calculation based on video image

Country Status (1)

Country Link
CN (1) CN112668563B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416321A * 2018-03-23 2018-08-17 北京市商汤科技开发有限公司 Method for predicting the movement direction of a target object, vehicle control method, and device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109341694A * 2018-11-12 2019-02-15 哈尔滨理工大学 An autonomous positioning and navigation method for a mobile detection robot
CN112118537A (en) * 2020-11-19 2020-12-22 蘑菇车联信息科技有限公司 Method and related device for estimating movement track by using picture

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Review of the Development of Deep-Learning-Based Visual Odometry for Intelligent Vehicles; Chen Tao et al.; 《汽车技术》 (Automobile Technology); 2020-12-09; full text *
Research on Visual Pose Estimation Methods Based on Deep Neural Networks; Qiao Mingqi; 《中国优秀硕士学位论文全文数据库》 (China Masters' Theses Full-text Database); 2019-01-15; full text *

Also Published As

Publication number Publication date
CN112668563A (en) 2021-04-16

Similar Documents

Publication Publication Date Title
CN110692083B (en) Block-matched optical flow and stereoscopic vision for dynamic vision sensor
WO2020097840A1 (en) Systems and methods for correcting a high-definition map based on detection of obstructing objects
CN111436216A (en) Method and system for color point cloud generation
US11501452B2 (en) Machine learning and vision-based approach to zero velocity update object detection
US20150294152A1 (en) Method of detection of points of interest in a digital image
CN109214254B (en) Method and device for determining displacement of robot
CN111079669A (en) Image processing method, device and storage medium
CN115063454B (en) Multi-target tracking matching method, device, terminal and storage medium
CN110617802A (en) Satellite-borne moving target detection and speed estimation method
CN111738071B (en) Inverse perspective transformation method based on motion change of monocular camera
CN111738114A (en) Vehicle target detection method based on anchor-free accurate sampling remote sensing image
CN116486288A (en) Aerial target counting and detecting method based on lightweight density estimation network
CN107506753B (en) Multi-vehicle tracking method for dynamic video monitoring
CN116258940A (en) Small target detection method for multi-scale features and self-adaptive weights
CN111445388A (en) Image super-resolution reconstruction model training method, ship tracking method and ship tracking device
CN117437523B (en) Weak trace detection method combining SAR CCD and global information capture
CN112598743B (en) Pose estimation method and related device for monocular vision image
CN112668563B (en) Method and related device for realizing movement track calculation based on video image
CN116311004B (en) Video moving target detection method based on sparse optical flow extraction
CN110992393B (en) Target motion tracking method based on vision
CN111489398B (en) Imaging equipment calibration method and device
JP5465001B2 (en) Target estimation device
CN110969875B (en) Method and system for road intersection traffic management
CN114265035A (en) Laser radar data processing method and system
CN113009533A (en) Vehicle positioning method and device based on visual SLAM and cloud server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant