CN115761558A - Method and device for determining key frame in visual positioning - Google Patents

Method and device for determining key frame in visual positioning

Info

Publication number
CN115761558A
Authority
CN
China
Prior art keywords
target image
frame
image frame
key frame
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111021732.5A
Other languages
Chinese (zh)
Inventor
刘世蔷
谢铭诗
张义
张慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SAIC Motor Corp Ltd
Shanghai Automotive Industry Corp Group
Original Assignee
SAIC Motor Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SAIC Motor Corp Ltd filed Critical SAIC Motor Corp Ltd
Priority to CN202111021732.5A priority Critical patent/CN115761558A/en
Publication of CN115761558A publication Critical patent/CN115761558A/en
Pending legal-status Critical Current

Abstract

The embodiment of the application discloses a method and a device for determining key frames in visual positioning. A video stream has a plurality of image frames arranged in time sequence and contains a first key frame. A first target image frame located after the first key frame in the video stream can be obtained, and the actual parallax between the first target image frame and the first key frame is calculated according to the distance between the matched feature points in the two frames. If the actual parallax is greater than or equal to the preset parallax, the number of matched feature points is less than or equal to a first preset value, or the frame number difference between the two frames is greater than or equal to a second preset value, then the first target image frame differs sufficiently from the first key frame and carries little redundant information, so the first target image frame can be determined to be a second key frame. Each key frame so determined carries little redundant information while ensuring an accurate pose, the amount of stored data is reduced, and computing resources are saved.

Description

Method and device for determining key frame in visual positioning
Technical Field
The invention relates to the field of computers, in particular to a method and a device for determining a key frame in visual positioning.
Background
The visual positioning technology is a technology for positioning an object by using a video stream acquired by a camera, and specifically, the video stream can be acquired by using the camera on a vehicle, and the moving track of the vehicle is determined by using the positions of key points in the video stream in a multi-frame image, wherein the key points are usually static points in a scene. However, in the video stream, if each frame is stored, the amount of global data is huge, and if the moving speed of the vehicle is slow or stops, a large amount of repeated information exists in the video stream, and how to reduce the amount of data and save the computing resources is an important problem for those skilled in the art.
Disclosure of Invention
In order to solve the above technical problems, embodiments of the present application provide a method and an apparatus for determining a key frame in visual positioning, so that redundant information is reduced, data amount is reduced, and computational resources are saved.
The embodiment of the application provides a method for determining key frames in visual positioning, where a video stream has a plurality of image frames arranged in time sequence and the video stream contains a first key frame. The method comprises the following steps:
determining a first target image frame in the video stream that is located after the first keyframe;
determining the first target image frame as a second key frame if the first target image frame meets at least one of the following conditions:
the actual parallax of the first target image frame and the first key frame is greater than or equal to a first preset parallax, the number of matched feature points in the first key frame and the first target image frame is less than or equal to a first preset value, and the frame number difference between the first key frame and the first target image frame is greater than or equal to a second preset value;
and calculating the actual parallax of the first target image frame and the first key frame according to the distance between the first key frame and the matched feature points in the first target image frame.
Optionally, the method further includes:
acquiring a second target image frame positioned after the second key frame in the video stream;
if the second target image frame meets at least one of the following conditions, determining that the second target image frame is a third key frame:
the number of matched feature points in the second key frame and the second target image frame is less than or equal to a third preset value, the frame number difference between the second key frame and the second target image frame is greater than or equal to a fourth preset value, the actual parallax between the second target image frame and the second key frame is greater than or equal to a second preset parallax, and the actual parallax between the second target image frame and the first key frame is greater than or equal to a third preset parallax;
and calculating the actual parallax between the second target image frame and the second key frame according to the distance between the second key frame and the matched feature points in the second target image frame.
Optionally, the feature points in the first keyframe and the first target image frame are detected by an ORB algorithm.
Optionally, the video stream is acquired by a camera on a vehicle, the preset parallax corresponds to the first target image frame, and the preset parallax c' is represented by the following formula:
c′ = w1|Δx| + w2|Δy| + w3|Δz|,
wherein Δx is a translation distance of the vehicle in a first direction, Δy is a translation distance of the vehicle in a second direction, and Δz is a rotation angle of the vehicle around a third direction; the first direction, the second direction and the third direction are the three coordinate axis directions of a three-dimensional rectangular coordinate system, the third direction being the vertical direction; Δx, Δy and Δz are determined according to the poses of the vehicle in the first key frame and the first target image frame, and those poses are determined according to the relative positions of the matched feature points; w1, w2 and w3 are the weights of Δx, Δy and Δz, respectively, and each is a number from 0 to 1.
Optionally, the first key frame is an image frame of the first frame.
The embodiment of the application provides a device for determining key frames in visual positioning, where a video stream has a plurality of image frames arranged in time sequence and the video stream contains a first key frame. The device comprises:
a key frame determination unit, configured to determine the first target image frame as a second key frame if the first target image frame satisfies at least one of the following conditions:
the actual parallax of the first target image frame and the first key frame is greater than or equal to a first preset parallax, the number of matched feature points in the first key frame and the first target image frame is less than or equal to a first preset value, and the frame number difference between the first key frame and the first target image frame is greater than or equal to a second preset value;
and calculating the actual parallax of the first target image frame and the first key frame according to the distance between the first key frame and the matched feature points in the first target image frame.
Optionally, the frame determining unit is further configured to: acquiring a second target image frame positioned after the second key frame in the video stream;
the key frame determination unit is further configured to:
if the second target image frame meets at least one of the following conditions, determining that the second target image frame is a third key frame:
the number of matched feature points in the second key frame and the second target image frame is less than or equal to a third preset value, the frame number difference between the second key frame and the second target image frame is greater than or equal to a fourth preset value, the actual parallax between the second target image frame and the second key frame is greater than or equal to a second preset parallax, and the actual parallax between the second target image frame and the first key frame is greater than or equal to a third preset parallax;
and calculating the actual parallax between the second target image frame and the second key frame according to the distance between the second key frame and the matched feature points in the second target image frame.
Optionally, the feature points in the first keyframe and the first target image frame are detected by an ORB algorithm.
Optionally, the video stream is acquired by a camera on a vehicle, the preset parallax corresponds to the first target image frame, and the preset parallax c' is expressed by the following formula:
c′ = w1|Δx| + w2|Δy| + w3|Δz|,
wherein Δx is a translation distance of the vehicle in a first direction, Δy is a translation distance of the vehicle in a second direction, and Δz is a rotation angle of the vehicle around a third direction; the first direction, the second direction and the third direction are the three coordinate axis directions of a three-dimensional rectangular coordinate system, the third direction being the vertical direction; Δx, Δy and Δz are determined according to the poses of the vehicle in the first key frame and the first target image frame, and those poses are determined according to the relative positions of the matched feature points; w1, w2 and w3 are the weights of Δx, Δy and Δz, respectively, and each is a number from 0 to 1.
Optionally, the first key frame is an image frame of the first frame.
The embodiment of the application provides a method and a device for determining key frames in visual positioning. A video stream has a plurality of image frames arranged in time sequence and contains a first key frame. A first target image frame located after the first key frame in the video stream can be obtained, and the actual parallax between the first target image frame and the first key frame is calculated according to the distance between the matched feature points in the first key frame and the first target image frame. If the actual parallax is greater than or equal to the preset parallax, the number of matched feature points is less than or equal to a first preset value, or the frame number difference is greater than or equal to a second preset value, the difference between the first target image frame and the first key frame is large and the redundant information is small, so the first target image frame can be determined to be a second key frame. Each determined key frame carries little redundant information while ensuring an accurate pose, the amount of stored data is reduced, and computing resources are thereby saved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art according to the drawings.
Fig. 1 is a flowchart of a method for determining a key frame in visual positioning according to an embodiment of the present disclosure;
fig. 2 is a block diagram of a device for determining a keyframe in visual positioning according to an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
At present, an object may be located by using a video stream acquired by a camera, specifically, a camera on a vehicle may be used to acquire a video stream, and a moving track of the vehicle is determined by using positions of key points in a multi-frame image in the video stream, where the key points are usually static points in a scene. However, in the video stream, if the moving speed of the vehicle is slow, the position difference of the key points in the multi-frame images is small, and storing each frame results in huge global data volume, and meanwhile, the multi-frame images with small position difference of the key points have repeated information, so how to reduce the data volume and save the computing resources is an important problem for those skilled in the art.
Based on this, an embodiment of the present application provides a method and an apparatus for determining a key frame in visual positioning. A video stream has a plurality of image frames arranged in time sequence and contains a first key frame. A first target image frame located after the first key frame in the video stream may be obtained, and the actual parallax between the first target image frame and the first key frame is calculated according to the distance between the matched feature points in the two frames. If the actual parallax is greater than or equal to a preset parallax, the number of matched feature points is less than or equal to a first preset value, or the frame number difference is greater than or equal to a second preset value, the difference between the first target image frame and the first key frame is large and there is little redundant information, so the first target image frame may be determined to be a second key frame. Each determined key frame carries little redundant information while ensuring an accurate pose, and the amount of stored data is reduced, thereby saving computing resources.
The following describes in detail a specific implementation of a method and an apparatus for determining a keyframe in visual positioning according to an embodiment of the present application with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of a method for determining a key frame in visual positioning according to an embodiment of the present application is provided, where the method may include the following steps.
S101, a first target image frame located after a first key frame in the video stream is determined.
In the embodiment of the application, the image frames in the video stream can be analyzed, so that the moving track of the vehicle in the image frames is obtained. The video stream comprises a plurality of image frames which are arranged according to a time sequence, the image frames are acquired by a camera, and the time of the image frames is the time of acquiring the image frames by the camera. Specifically, the video stream may be captured by a camera mounted on the vehicle, and each image frame represents a position of an object around the vehicle relative to the camera, so that a movement track of the vehicle may be obtained based on the position of the surrounding object relative to the camera by using the video stream. The camera that captures the video stream may be a binocular vision system.
Because the video stream is used for realizing visual positioning, only one or a few image frames with smaller parallax in the video stream can be reserved, and the image frames needing to be reserved are used as key frames, so that redundant information is reduced, and the amount of stored data is reduced.
In this embodiment, a first key frame may be determined in a video stream, where the first key frame may be an image frame of the first frame or an nth image frame, and then a second key frame, a third key frame, and the like may be determined according to the first key frame. That is, whether a subsequent image frame is a key frame may be determined from a previous key frame, and for convenience of description, the previous key frame is taken as a first key frame and a key frame determined from the first key frame is taken as a second key frame. The previous key frame may be a first key frame in the video stream, or may be a key frame determined according to other key frames in the video stream, and after determining a second key frame according to the first key frame, the second key frame may be used as a new first key frame for determining a new second key frame.
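The sliding-reference scheme described above — each new key frame replaces the previous one as the reference for the frames that follow — can be sketched as a simple loop. This is an illustrative sketch, not the patent's implementation; `qualifies` is a hypothetical stand-in for the parallax, match-count and frame-gap tests described later.

```python
def select_keyframes(frames, qualifies):
    """Return indices of key frames; frame 0 is taken as the first key frame.

    qualifies(key_frame, candidate_frame, frame_gap) -> bool decides whether
    the candidate becomes the next key frame.
    """
    key_indices = [0]
    for i in range(1, len(frames)):
        if qualifies(frames[key_indices[-1]], frames[i], i - key_indices[-1]):
            key_indices.append(i)  # the new key frame becomes the reference
    return key_indices

# Toy example: treat each "frame" as a 1-D position and key on a gap >= 3 units.
frames = [0, 1, 2, 4, 5, 8, 9]
keys = select_keyframes(frames, lambda k, f, gap: abs(f - k) >= 3)
print(keys)  # → [0, 3, 5]
```

The point of the loop is that the test is always made against the most recent key frame, so slow motion (small differences between neighbouring frames) produces few key frames.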
After determining that the video stream has the first key frame, some image frame after the first key frame may be analyzed as the first target image frame to determine whether the first target image frame is the second key frame. The first target image frame may be an image frame after the first key frame, that is, the acquisition time of the first target image frame is later than that of the first key frame, the first target image frame and the first key frame are adjacent image frames, or other image frames may be separated from the first key frame. Specifically, each image frame after the first key frame may be sequentially used as the first target image frame.
S102, calculating the actual parallax between the first target image frame and the first key frame according to the distance between the first key frame and the matched feature points in the first target image frame, and determining that the first target image frame is the second key frame if the actual parallax between the first key frame and the first target image frame is larger than or equal to a first preset parallax.
In the embodiment of the application, the first keyframe and the first target image frame may have matched feature points, the matched feature points are at least one pair of feature points in the first keyframe and the first target image frame, two feature points in the pair of feature points have substantially the same features, and represent points at the same position corresponding to the same object collected by the camera, and the feature points are stationary and easily recognized points in the field of view of the camera, and may be feature points on a stationary vehicle ahead, points on a lane line on a road surface, points of surrounding trees, and the like. The feature points are usually corner points where the change of the gray value of pixels in the image is obvious, and the feature points can be matched according to the description similarity of each feature point.
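Matching by descriptor similarity, as described above, can be illustrated with a brute-force nearest-neighbour search under Hamming distance, which is how binary descriptors such as ORB/BRIEF are typically compared. This is a sketch under assumptions: the function name, the 256-bit descriptor layout and the distance threshold are all chosen for the example, not taken from the patent.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, max_dist=30):
    """desc_a, desc_b: uint8 arrays of shape (n, 32), i.e. 256-bit descriptors.

    Returns a list of (i, j) index pairs: descriptor i in frame A matched
    with its nearest neighbour j in frame B, if close enough in Hamming distance.
    """
    bits_a = np.unpackbits(desc_a, axis=1)  # (n_a, 256) arrays of 0/1 bits
    bits_b = np.unpackbits(desc_b, axis=1)  # (n_b, 256)
    # Hamming distance between every pair of descriptors via broadcasting.
    dist = (bits_a[:, None, :] != bits_b[None, :, :]).sum(axis=2)
    matches = []
    for i in range(dist.shape[0]):
        j = int(dist[i].argmin())
        if dist[i, j] <= max_dist:
            matches.append((i, j))  # a pair of matched feature points
    return matches

# An all-zero descriptor matches the identical descriptor in the other frame.
a = np.zeros((1, 32), dtype=np.uint8)
b = np.vstack([np.full((1, 32), 255, np.uint8), np.zeros((1, 32), np.uint8)])
print(match_descriptors(a, b))  # → [(0, 1)]
```

In practice a library routine (e.g. a brute-force matcher with cross-checking) would be used instead of this explicit loop, but the pairing principle is the same.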
The feature points can be detected by the ORB (Oriented FAST and Rotated BRIEF) algorithm. The ORB algorithm has two parts: Oriented FAST feature point extraction and Rotated BRIEF feature point description. The feature point extraction is developed from the FAST (Features from Accelerated Segment Test) algorithm; Oriented FAST is an improved FAST corner detector that also computes the main direction of each FAST feature point. The feature point description is improved from the BRIEF (Binary Robust Independent Elementary Features) descriptor algorithm; by using the direction information of the key points, the descriptor of the rotated feature can be computed, which gives the feature point rotation invariance. The ORB feature thus combines the FAST feature point detection method with the BRIEF feature descriptor and improves and optimizes them on their original basis.
For example, the first key frame has a first feature point that is the center point of the left wheel of the stationary vehicle ahead, with a first coordinate (x1, y1) in the first key frame; the first target image frame has a second feature point that is the center point of the same left wheel, with a second coordinate (x2, y2) in the first target image frame. If the first feature point and the second feature point have similar descriptors, they can be taken as matched feature points and form a pair of feature points. The first coordinate of the first feature point and the second coordinate of the second feature point represent the movement of the center point of the left wheel of the vehicle ahead relative to the camera, and because the vehicle on which the two feature points lie is a stationary vehicle ahead, the two coordinates represent the moving track of the vehicle carrying the camera.
According to the distance between the matched feature points in the first key frame and the first target image frame, the actual parallax between the first target image frame and the first key frame can be calculated; the actual parallax represents the position difference of the matched feature points between the two frames. The distance between matched feature points may be a Euclidean distance; for example, the Euclidean distance between the first feature point and the second feature point may be expressed as:
d = √((x1 − x2)² + (y1 − y2)²)
the distance between the matched feature points can be calculated, and the average value of the distances between the matched feature points can be used as the actual disparity c of the first target image frame and the first key frame.
In the embodiment of the application, the actual parallax between the first key frame and the first target image frame represents the position difference between their matched feature points. If the actual parallax is large, the position difference between the matched feature points in the two frames is large, so the first target image frame can be taken as the second key frame; the first key frame and the second key frame then store different positions of the matched feature points, which ensures the accuracy of the pose.
The first preset parallax may be a preset value for evaluating the size of the position difference between the first key frame and the matched feature point in the first target image frame, and the first preset parallax may be a uniform value.
When the video stream is captured by a camera on the vehicle, the vehicle moves only within a plane, so the parallax between the first key frame and the first target image frame may arise from translation of the vehicle in the horizontal direction and rotation within the horizontal plane. For convenience of description, a three-dimensional rectangular coordinate system may be established for the vehicle, with the first direction, the second direction and the third direction as its three coordinate axis directions and the third direction vertical. The movement of the vehicle then has three degrees of freedom: the translation distance Δx of the vehicle in the first direction, the translation distance Δy in the second direction, and the rotation angle Δz around the third direction. Δx, Δy and Δz may be determined according to the poses of the vehicle in the first key frame and the first target image frame, and those poses are determined according to the relative positions of the matched feature points.
Therefore, in the embodiment of the application, a first preset parallax matched with the first target image frame can be determined according to the pose of the vehicle in the first key frame and the first target image frame, and when the actual parallax between the first key frame and the first target image frame is greater than or equal to the first preset parallax corresponding to the first target image frame, the first target image frame can be determined to be a second key frame. Specifically, the first preset parallax c' is represented by the following formula:
c′ = w1|Δx| + w2|Δy| + w3|Δz|,
wherein w1, w2 and w3 are the weights of Δx, Δy and Δz, respectively, and each is a number from 0 to 1; their values can be set according to actual conditions. For example, w1, w2 and w3 may each be 0.5. Δx, Δy and Δz may be determined according to the distance between the matched feature points.
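The adaptive threshold c′ = w1|Δx| + w2|Δy| + w3|Δz| from the formula above can be sketched as follows. The 0.5 default weights are the example values given in the text; deriving Δx, Δy, Δz from the two vehicle poses is outside this snippet, and the function name is chosen for the illustration.

```python
def preset_parallax(dx, dy, dz, w1=0.5, w2=0.5, w3=0.5):
    """dx, dy: translation distances of the vehicle in the two horizontal
    directions; dz: rotation angle about the vertical axis.
    Returns the preset parallax c' for the current target image frame."""
    return w1 * abs(dx) + w2 * abs(dy) + w3 * abs(dz)

# Example motion: 2 m forward, 1 m sideways, 0.5 rad of yaw.
print(preset_parallax(2.0, -1.0, 0.5))  # → 1.75
```

Because c′ grows with the vehicle's motion between the two frames, a fast-moving vehicle must accumulate more parallax before a frame is promoted to a key frame, which matches the adaptive intent described above.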
S103, if the number of the matched feature points in the first key frame and the first target image frame is smaller than or equal to a first preset value, determining that the first target image frame is a second key frame.
In the embodiment of the present application, the next key frame may be determined based on the previous key frame according to the number of matched feature points. Specifically, when the number of matched feature points in the first key frame and the first target image frame is less than or equal to a first preset value, many newly extracted feature points are present and the first target image frame needs to be retained, so it may be determined to be the second key frame. When the number of matched feature points is greater than the first preset value, the feature points of the two frames are close to each other, so the first target image frame need not be retained and is not taken as a key frame. The first preset value can be determined according to actual conditions and may be denoted by k.
And S104, if the frame number difference between the first key frame and the first target image frame is greater than or equal to a second preset value, determining that the first target image frame is a second key frame.
In this embodiment, the next key frame may be determined based on the previous key frame according to the frame number difference from the previous key frame. Specifically, when the frame number difference between the first key frame and the first target image frame is greater than or equal to a second preset value, the first target image frame is determined to be a second key frame, where the second preset value may be determined according to an actual situation and may be denoted by T.
In addition, a third key frame may be determined according to the first key frame, or a third key frame may be determined according to the second key frame, where the third key frame is located after the second key frame, and specifically, an actual disparity between the third key frame and the first key frame is greater than an actual disparity between the second key frame and the first key frame. In a specific implementation, a second target image frame located after the second key frame in the video stream may be obtained first.
In this embodiment of the application, when the actual parallax between the second target image frame and the second key frame is greater than or equal to the second preset parallax, it may be determined that the second target image frame is a third key frame; alternatively, when the actual parallax between the second target image frame and the first key frame is greater than or equal to the third preset parallax, the second target image frame is determined to be a third key frame, where the third preset parallax is greater than the first preset parallax. The actual parallax between the second target image frame and the second key frame can be calculated according to the distance between the matched feature points in the second key frame and the second target image frame. The second preset parallax may be determined in a manner that refers to the first preset parallax.
In the embodiment of the present application, the next key frame may also be determined based on the previous key frame according to the number of matched feature points. Specifically, when the number of matched feature points in the second key frame and the second target image frame is less than or equal to a third preset value, many newly extracted feature points are present and the second target image frame needs to be retained, so it may be determined to be the third key frame. When the number of matched feature points is greater than the third preset value, the feature points of the two frames are close to each other, so the second target image frame need not be retained and is not taken as a key frame. The third preset value may be determined according to actual conditions; it may be the same as the first preset value and may be denoted by k.
In the embodiment of the present application, a next key frame may be determined based on a previous key frame according to a frame number difference from the previous key frame. Specifically, when the frame number difference between the second keyframe and the second target image frame is greater than or equal to a fourth preset value, the second target image frame is determined as a third keyframe, where the fourth preset value may be determined according to an actual situation, and the fourth preset value may be the same as the second preset value and may be represented by T.
That is, for the second target image frame, at least one of the following conditions is satisfied, and it may be regarded as the third key frame: 1) The actual parallax between the second target image frame and the second key frame is larger than or equal to a second preset parallax; 2) The actual parallax between the second target image frame and the first key frame is larger than or equal to a third preset parallax; 3) The number of the matched feature points in the second key frame and the second target image frame is less than or equal to a third preset value; 4) The difference in frame number between the second keyframe and the second target image frame is greater than or equal to a fourth preset value. The determination of whether the second target image frame satisfies the four conditions may be performed in any order, which is not limited herein.
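The four conditions listed above can be combined into a single predicate: the target frame becomes a key frame when any one of them holds. This is a hedged sketch, not the patent's implementation; the symbols k (match-count threshold) and T (frame-gap threshold) follow the text, while the parameter names are chosen for the example.

```python
def is_new_keyframe(parallax_to_prev, parallax_to_first, n_matches,
                    frame_gap, preset_prev, preset_first, k, T):
    """True if the candidate frame should become the next key frame."""
    return (parallax_to_prev >= preset_prev        # 1) parallax vs. previous key frame
            or parallax_to_first >= preset_first   # 2) parallax vs. earlier key frame
            or n_matches <= k                      # 3) too few matched feature points
            or frame_gap >= T)                     # 4) too many frames since last key frame

# Example: only the frame-gap condition (4) fires.
print(is_new_keyframe(1.0, 1.5, 120, 30, 2.0, 3.0, k=50, T=25))  # → True
```

As the text notes, the four tests may be evaluated in any order; short-circuit `or` simply stops at the first condition that holds.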
In this embodiment of the present application, the third key frame determined above may in turn be used as a new first key frame to determine a new second key frame; the repeated details are not described again here.
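The rolling update above, where each newly selected key frame becomes the reference for the next decision, can be sketched as a greedy sweep over the stream. Here `is_key_frame` stands in for whatever combination of the parallax, match-count, and frame-gap tests is used; the names are illustrative:

```python
def select_key_frames(frames, is_key_frame):
    """Greedy key-frame sweep: the most recent key frame is the
    reference against which each later frame is tested."""
    key_frames = [0]  # the first image frame is taken as the first key frame
    ref = 0
    for i in range(1, len(frames)):
        if is_key_frame(frames[ref], frames[i]):
            key_frames.append(i)
            ref = i  # the new key frame becomes the new reference
    return key_frames
```

With a toy predicate that fires every three frames, a ten-frame stream yields key frames at indices 0, 3, 6, and 9.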
The embodiment of the application provides a method for determining a key frame in visual positioning. A video stream has a plurality of image frames arranged in time sequence and contains a first key frame. A first target image frame located after the first key frame in the video stream is obtained, and the actual parallax between the first target image frame and the first key frame is calculated according to the distances between the matched feature points in the first key frame and the first target image frame. If the actual parallax is greater than or equal to a preset parallax, the number of matched feature points is less than or equal to a first preset value, or the frame number difference is greater than or equal to a second preset value, the first target image frame differs substantially from the first key frame and carries little redundant information, so the first target image frame may be determined to be a second key frame. Each key frame so determined carries low redundant information while ensuring an accurate pose, which reduces the amount of stored data and saves computing resources.
Based on the foregoing method for determining a key frame in visual positioning, an embodiment of the present application further provides a device for determining a key frame in visual positioning, and referring to fig. 2, a block diagram of a device for determining a key frame in visual positioning provided in an embodiment of the present application is shown, where a video stream has a plurality of image frames arranged in a time sequence, the video stream has a first key frame, and the device includes:
a frame determination unit 110 for determining a first target image frame located after the first key frame in the video stream;
a key frame determining unit 120, configured to determine that the first target image frame is a second key frame if at least one of the following conditions is met:
the actual parallax of the first target image frame and the first key frame is greater than or equal to a first preset parallax, the number of matched feature points in the first key frame and the first target image frame is less than or equal to a first preset value, and the frame number difference between the first key frame and the first target image frame is greater than or equal to a second preset value;
and calculating the actual parallax of the first target image frame and the first key frame according to the distance between the first key frame and the matched feature points in the first target image frame.
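The patent computes the actual parallax from the distances between matched feature points. One plausible reading, shown here as an assumption rather than the patent's exact definition, is the mean pixel displacement of the matched pairs:

```python
import math

def actual_parallax(pts_key, pts_target):
    """Mean Euclidean distance between matched feature point pairs.

    pts_key and pts_target are equal-length sequences of (x, y) pixel
    coordinates, where pts_key[i] matches pts_target[i].
    """
    if len(pts_key) != len(pts_target) or not pts_key:
        raise ValueError("need equally sized, non-empty point lists")
    total = sum(math.hypot(xt - xk, yt - yk)
                for (xk, yk), (xt, yt) in zip(pts_key, pts_target))
    return total / len(pts_key)
```

A large mean displacement indicates significant camera motion between the key frame and the candidate frame, which is exactly when the candidate is worth keeping.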
Optionally, the frame determining unit is further configured to: acquiring a second target image frame positioned after the second key frame in the video stream;
the key frame determination unit is further configured to:
if the second target image frame meets at least one of the following conditions, determining that the second target image frame is a third key frame:
the number of matched feature points in the second key frame and the second target image frame is less than or equal to a third preset value, the frame number difference between the second key frame and the second target image frame is greater than or equal to a fourth preset value, the actual parallax between the second target image frame and the second key frame is greater than or equal to a second preset parallax, and the actual parallax between the second target image frame and the first key frame is greater than or equal to a third preset parallax;
and calculating the actual parallax between the second target image frame and the second key frame according to the distance between the second key frame and the matched feature points in the second target image frame.
Optionally, the feature points in the first keyframe and the first target image frame are detected by an ORB algorithm.
Optionally, the video stream is acquired by a camera on a vehicle, the preset parallax corresponds to the first target image frame, and the preset parallax c′ is expressed by the following formula:

c′ = w₁|Δx| + w₂|Δy| + w₃|Δz|,

where Δx is the translation distance of the vehicle along a first direction, Δy is the translation distance of the vehicle along a second direction, and Δz is the rotation angle of the vehicle around a third direction; the first direction, the second direction, and the third direction are the three coordinate axis directions of a three-dimensional rectangular coordinate system, the third direction being the vertical direction. Δx, Δy, and Δz are determined according to the poses of the vehicle in the first key frame and the first target image frame, and those poses are determined according to the relative positions of the matched feature points. w₁, w₂, and w₃ are the weights of Δx, Δy, and Δz, respectively, each being a number between 0 and 1.
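As a worked check of the threshold formula, assuming illustrative weights and pose deltas (none of these numbers come from the patent):

```python
def preset_parallax(dx, dy, dz, w1=0.5, w2=0.3, w3=0.2):
    """c' = w1*|dx| + w2*|dy| + w3*|dz|: a weighted measure of the pose
    change between the first key frame and the first target image frame.
    The weights here are placeholder values, not from the patent."""
    return w1 * abs(dx) + w2 * abs(dy) + w3 * abs(dz)

# e.g. 2 m forward translation, 1 m lateral translation, 0.5 rad of yaw:
c_prime = preset_parallax(2.0, -1.0, 0.5)  # 0.5*2 + 0.3*1 + 0.2*0.5 = 1.4
```

Because the threshold grows with the vehicle's own motion, a fast-moving vehicle demands a proportionally larger actual parallax before a frame qualifies as a key frame.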
Optionally, the first key frame is the first image frame of the video stream.
The embodiment of the application provides a device for determining key frames in visual positioning. A video stream has a plurality of image frames arranged in time sequence and contains a first key frame. A first target image frame located after the first key frame in the video stream is obtained, and the actual disparity between the first target image frame and the first key frame is calculated according to the distances between the matched feature points in the first key frame and the first target image frame. If the actual disparity is greater than or equal to a preset disparity, the number of matched feature points is less than or equal to a first preset value, or the frame number difference is greater than or equal to a second preset value, the first target image frame differs substantially from the first key frame and carries little redundant information, so the first target image frame may be determined to be a second key frame. Each key frame so determined carries low redundant information while ensuring an accurate pose, which reduces the amount of stored data and thus saves computing resources.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the above embodiment methods can be implemented by software plus a general hardware platform. Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a read-only memory (ROM)/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network communication device such as a router) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the method embodiment for relevant points. The above-described embodiments of the apparatus and system are merely illustrative, wherein modules described as separate parts may or may not be physically separate, and parts shown as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only a preferred embodiment of the present application and is not intended to limit the scope of the present application. It should be noted that, for a person skilled in the art, several modifications and refinements can be made without departing from the application, and these modifications and refinements should also be regarded as the protection scope of the application.

Claims (10)

1. A method for determining a key frame in visual positioning, wherein a video stream has a plurality of image frames arranged in time sequence and the video stream has a first key frame, the method comprising:
determining a first target image frame in the video stream that is located after the first keyframe;
determining the first target image frame as a second key frame if the first target image frame meets at least one of the following conditions:
the actual parallax of the first target image frame and the first key frame is greater than or equal to a first preset parallax, the number of matched feature points in the first key frame and the first target image frame is less than or equal to a first preset value, and the frame number difference between the first key frame and the first target image frame is greater than or equal to a second preset value;
and calculating the actual parallax of the first target image frame and the first key frame according to the distance between the first key frame and the matched feature points in the first target image frame.
2. The method of claim 1, further comprising:
acquiring a second target image frame positioned after the second key frame in the video stream;
if the second target image frame meets at least one of the following conditions, determining that the second target image frame is a third key frame:
the number of the matched feature points in the second key frame and the second target image frame is less than or equal to a third preset value, the frame number difference between the second key frame and the second target image frame is greater than or equal to a fourth preset value, the actual parallax between the second target image frame and the second key frame is greater than or equal to a second preset parallax, and the actual parallax between the second target image frame and the first key frame is greater than or equal to a third preset parallax;
and calculating the actual parallax between the second target image frame and the second key frame according to the distance between the second key frame and the matched feature points in the second target image frame.
3. The method of claim 1, wherein the feature points in the first keyframe and the first target image frame are detected using an ORB algorithm.
4. The method according to any one of claims 1 to 3, wherein the video stream is captured by a camera on a vehicle, the preset disparity corresponds to the first target image frame, and the preset disparity c′ is expressed by the following formula:

c′ = w₁|Δx| + w₂|Δy| + w₃|Δz|,

wherein Δx is a translation distance of the vehicle along a first direction, Δy is a translation distance of the vehicle along a second direction, and Δz is a rotation angle of the vehicle around a third direction; the first direction, the second direction and the third direction are three coordinate axis directions of a three-dimensional rectangular coordinate system, and the third direction is a vertical direction; the Δx, the Δy and the Δz are determined according to poses of the vehicle in the first key frame and the first target image frame, and the poses of the vehicle in the first key frame and the first target image frame are determined according to relative positions of the matched feature points; and the w₁, the w₂ and the w₃ are weights of Δx, Δy and Δz, respectively, each being a number between 0 and 1.
5. The method of any of claims 1-3, wherein the first key frame is the first image frame of the video stream.
6. An apparatus for determining a key frame in visual positioning, wherein a video stream has a plurality of image frames arranged in time sequence and the video stream has a first key frame, the apparatus comprising:
a frame determination unit for determining a first target image frame located after the first key frame in the video stream;
a key frame determining unit, configured to determine that the first target image frame is a second key frame if the first target image frame satisfies at least one of the following conditions:
the actual parallax of the first target image frame and the first key frame is greater than or equal to a first preset parallax, the number of matched feature points in the first key frame and the first target image frame is less than or equal to a first preset value, and the frame number difference between the first key frame and the first target image frame is greater than or equal to a second preset value;
and calculating the actual parallax of the first target image frame and the first key frame according to the distance between the first key frame and the matched feature points in the first target image frame.
7. The apparatus of claim 6,
the frame determination unit is further configured to: acquiring a second target image frame positioned after the second key frame in the video stream;
the key frame determination unit is further configured to: if the second target image frame meets at least one of the following conditions, determining that the second target image frame is a third key frame:
the number of the matched feature points in the second key frame and the second target image frame is less than or equal to a third preset value, the frame number difference between the second key frame and the second target image frame is greater than or equal to a fourth preset value, the actual parallax between the second target image frame and the second key frame is greater than or equal to a second preset parallax, and the actual parallax between the second target image frame and the first key frame is greater than or equal to a third preset parallax;
and calculating the actual parallax between the second target image frame and the second key frame according to the distance between the second key frame and the matched feature points in the second target image frame.
8. The apparatus of claim 6, wherein the feature points in the first keyframe and the first target image frame are detected using an ORB algorithm.
9. The apparatus according to any one of claims 6-8, wherein the video stream is captured by a camera on a vehicle, the preset disparity corresponds to the first target image frame, and the preset disparity c′ is expressed by the following formula:

c′ = w₁|Δx| + w₂|Δy| + w₃|Δz|,

wherein Δx is a translation distance of the vehicle along a first direction, Δy is a translation distance of the vehicle along a second direction, and Δz is a rotation angle of the vehicle around a third direction; the first direction, the second direction and the third direction are three coordinate axis directions of a three-dimensional rectangular coordinate system, and the third direction is a vertical direction; the Δx, the Δy and the Δz are determined according to poses of the vehicle in the first key frame and the first target image frame, and the poses of the vehicle in the first key frame and the first target image frame are determined according to relative positions of the matched feature points; and the w₁, the w₂ and the w₃ are weights of Δx, Δy and Δz, respectively, each being a number between 0 and 1.
10. The apparatus of any of claims 6-8, wherein the first key frame is the first image frame of the video stream.
CN202111021732.5A 2021-09-01 2021-09-01 Method and device for determining key frame in visual positioning Pending CN115761558A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111021732.5A CN115761558A (en) 2021-09-01 2021-09-01 Method and device for determining key frame in visual positioning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111021732.5A CN115761558A (en) 2021-09-01 2021-09-01 Method and device for determining key frame in visual positioning

Publications (1)

Publication Number Publication Date
CN115761558A true CN115761558A (en) 2023-03-07

Family

ID=85332192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111021732.5A Pending CN115761558A (en) 2021-09-01 2021-09-01 Method and device for determining key frame in visual positioning

Country Status (1)

Country Link
CN (1) CN115761558A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116704403A (en) * 2023-05-11 2023-09-05 杭州晶彩数字科技有限公司 Building image vision identification method and device, electronic equipment and medium

Similar Documents

Publication Publication Date Title
CN112435325B (en) VI-SLAM and depth estimation network-based unmanned aerial vehicle scene density reconstruction method
CN110322500B (en) Optimization method and device for instant positioning and map construction, medium and electronic equipment
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
Lim et al. Real-time image-based 6-dof localization in large-scale environments
CN107329962B (en) Image retrieval database generation method, and method and device for enhancing reality
CN108519102B (en) Binocular vision mileage calculation method based on secondary projection
CN110853075A (en) Visual tracking positioning method based on dense point cloud and synthetic view
EP3293700B1 (en) 3d reconstruction for vehicle
CN111127524A (en) Method, system and device for tracking trajectory and reconstructing three-dimensional image
CN110349212B (en) Optimization method and device for instant positioning and map construction, medium and electronic equipment
WO2023016271A1 (en) Attitude determining method, electronic device, and readable storage medium
CN111340922A (en) Positioning and mapping method and electronic equipment
CN112115980A (en) Binocular vision odometer design method based on optical flow tracking and point line feature matching
WO2019057197A1 (en) Visual tracking method and apparatus for moving target, electronic device and storage medium
CN112785705B (en) Pose acquisition method and device and mobile equipment
CN110070578B (en) Loop detection method
WO2019175532A1 (en) Urban environment labelling
Jung et al. Object detection and tracking-based camera calibration for normalized human height estimation
CN112991441A (en) Camera positioning method and device, electronic equipment and storage medium
CN115761558A (en) Method and device for determining key frame in visual positioning
CN113592706A (en) Method and device for adjusting homography matrix parameters
CN112085842B (en) Depth value determining method and device, electronic equipment and storage medium
Long et al. Detail preserving residual feature pyramid modules for optical flow
Wong et al. Single camera vehicle localization using feature scale tracklets
JP2023065296A (en) Planar surface detection apparatus and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230627

Address after: No. 203, Shanghai Songjiang Road, No. 201563, Pudong New Area

Applicant after: SAIC Motor Corp.,Ltd.

Applicant after: Shanghai automotive industry (Group) Co.,Ltd.

Address before: Room 509, building 1, 563 Songtao Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai, 201203

Applicant before: SAIC Motor Corp.,Ltd.
