CN116883966A - Vehicle-mounted camera pose calculation method and device and computer readable storage medium - Google Patents

Vehicle-mounted camera pose calculation method and device and computer readable storage medium

Info

Publication number
CN116883966A
CN116883966A
Authority
CN
China
Prior art keywords
vehicle
actual
mounted camera
key
matched
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310904859.4A
Other languages
Chinese (zh)
Inventor
童文超 (Tong Wenchao)
罗小平 (Luo Xiaoping)
曾峰 (Zeng Feng)
张轩宇 (Zhang Xuanyu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Longhorn Automotive Electronic Equipment Co Ltd
Original Assignee
Shenzhen Longhorn Automotive Electronic Equipment Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Longhorn Automotive Electronic Equipment Co Ltd filed Critical Shenzhen Longhorn Automotive Electronic Equipment Co Ltd
Priority application: CN202310904859.4A
Publication: CN116883966A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

An embodiment of the application provides a vehicle-mounted camera pose calculation method and device and a computer-readable storage medium. The method comprises: obtaining a current image frame and a global descriptor of the current image frame; screening a pre-stored scene database to select a reference scene image matched with the global descriptor; up-sampling the current image frame with each feature extraction layer of a feature extraction network model to correspondingly obtain features to be fused, adding and fusing the features to be fused, and processing the result with a nonlinear activation function to obtain an actual dense high-dimensional feature map of the current image frame; determining a region of interest in the reference scene image, and calculating matching feature points in the actual dense high-dimensional feature map that correspond one-to-one with the key feature points in the region of interest; and calculating a current rotation-translation matrix of the vehicle-mounted camera based on a local odometry map construction principle and a P3P pose estimation algorithm. The embodiment can effectively improve calculation precision.

Description

Vehicle-mounted camera pose calculation method and device and computer readable storage medium
Technical Field
The embodiments of the application relate to the technical field of vehicle-mounted camera image processing, and in particular to a vehicle-mounted camera pose calculation method, a vehicle-mounted camera pose calculation device, and a computer-readable storage medium.
Background
At present, visual positioning (also called pose estimation or pose calculation) of a vehicle-mounted camera based on image feature matching is a common processing approach. Image feature matching is the key step: it effectively solves the data association problem in visual positioning, and its accuracy largely determines the quality of the visual positioning result.
The existing image feature matching methods are mainly the Scale-Invariant Feature Transform (SIFT) algorithm and the ORB (Oriented FAST and Rotated BRIEF) algorithm, whose main steps comprise feature extraction, feature description and feature matching. However, the inventors have found that these methods exhibit several drawbacks in practice. First, the reconstruction results obtained by such feature-point-based methods are sparse, and it is difficult to detect a sufficient number of feature points in real environments. Second, when applied to pose calculation for a vehicle-mounted camera, the calculation accuracy is relatively poor, which hinders image-based visual localization of obstacles.
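For context, the conventional sparse pipeline can be sketched in a few lines of OpenCV. This is an illustrative baseline only; the file names and parameter values are placeholders rather than settings taken from any particular prior-art system.

```python
import cv2

# Classic sparse matching: ORB feature extraction, description and
# brute-force matching between two frames (placeholder file names).
img_prev = cv2.imread("frame_prev.png", cv2.IMREAD_GRAYSCALE)
img_curr = cv2.imread("frame_curr.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img_prev, None)
kp2, des2 = orb.detectAndCompute(img_curr, None)

# Binary ORB descriptors are compared with the Hamming norm; crossCheck
# keeps only mutually best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# In weakly textured road scenes the surviving match set is often too
# sparse for reliable pose estimation, which is the drawback noted above.
print(f"{len(matches)} tentative ORB matches")
```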
Disclosure of Invention
The technical problem to be solved by the embodiments of the application is to provide a vehicle-mounted camera pose calculation method that can effectively improve calculation precision.
A further technical problem to be solved by the embodiments of the application is to provide a vehicle-mounted camera pose calculation device that can effectively improve calculation precision.
A further technical problem to be solved by the embodiments of the application is to provide a computer-readable storage medium storing a computer program that can effectively improve the pose calculation precision of a vehicle-mounted camera.
To solve the above technical problems, the embodiments of the application provide the following technical scheme: a vehicle-mounted camera pose calculation method, comprising the following steps:
extracting a current image frame from an original image of the motor vehicle's surroundings, acquired and transmitted in real time by a vehicle-mounted camera, and sequentially processing the current image frame through a pre-stored feature extraction network model and a NetVLAD network model to obtain a global descriptor of the current image frame, wherein the feature extraction network model is formed by sequentially connecting a plurality of feature extraction layers;
screening a reference scene image matched with the global descriptor from a pre-stored scene database, wherein the scene database is pre-constructed scene map data of the motor vehicle's actual driving environment;
up-sampling the current image frame with each feature extraction layer of the feature extraction network model to correspondingly obtain features to be fused, adding and fusing the features to be fused, and then processing with a nonlinear activation function to obtain an actual dense high-dimensional feature map of the current image frame;
determining a region of interest in the reference scene image, and calculating matching feature points in the actual dense high-dimensional feature map that correspond one-to-one with the key feature points in the region of interest, wherein each key feature point's actual dot product with its correspondingly matched feature point is larger than its actual dot product with every other feature point of the actual dense high-dimensional feature map, and its actual distance to the correspondingly matched feature point is smaller than its actual distance to every other feature point of the actual dense high-dimensional feature map; and
calculating the current rotation-translation matrix of the vehicle-mounted camera from the mutually matched key feature points and matching feature points, based on a local odometry map construction principle and a P3P pose estimation algorithm.
Further, screening the reference scene image matched with the global descriptor from the pre-stored scene database specifically comprises:
calculating the actual Hamming distance between the global descriptor and the VLAD vector of each key frame in the scene database, and screening, based on a decision tree algorithm model, key frames meeting a preset preliminary screening condition from the scene database, wherein the preset preliminary screening condition is that the key frame's rank, when the key frames are sorted by actual Hamming distance from smallest (most similar) to largest, is smaller than or equal to a preset rank; and
calculating the actual similarity between the current image frame and each image frame adjacent to the key frames meeting the preset preliminary screening condition, and determining the key frame with the maximum actual similarity among those meeting the condition as the reference scene image.
Further, the global descriptor is first subjected to dimension reduction based on a principal component analysis algorithm, and the actual Hamming distance is then calculated between the dimension-reduced global descriptor and the VLAD vector of each key frame in the scene database.
Further, calculating the current rotation-translation matrix of the vehicle-mounted camera from the mutually matched key feature points and matching feature points, based on the local odometry map construction principle and the P3P pose estimation algorithm, specifically comprises:
constructing an actual three-dimensional space of the driving scene of the motor vehicle based on the local odometry map construction principle;
projecting the mutually matched key feature points and matching feature points into the actual three-dimensional space to correspondingly generate key three-dimensional points and matching three-dimensional points, respectively; and
calculating the current rotation-translation matrix of the vehicle-mounted camera from three non-collinear, mutually matched pairs of key three-dimensional points and matching three-dimensional points, based on the P3P pose estimation algorithm.
Further, the method further comprises:
calculating, based on a random sampling algorithm model, the actual projection error of the rotation-translation matrix over the remaining mutually matched key three-dimensional points and matching three-dimensional points; and
calculating the minimum value of the actual projection error based on an error optimization algorithm model, and recalculating and updating the rotation-translation matrix from the mutually matched key three-dimensional points and matching three-dimensional points that satisfy the minimum value.
Further, the region of interest is determined by judging whether the actual number of corner points in each region of the reference scene image is greater than a predetermined number.
On the other hand, in order to solve the above technical problems, the embodiment of the present application provides the following technical solutions: a vehicle-mounted camera pose calculation device connected with a vehicle-mounted camera of a motor vehicle, comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, wherein the processor implements the vehicle-mounted camera pose calculation method according to any one of the above when executing the computer program.
On the other hand, in order to solve the above technical problems, the embodiment of the present application provides the following technical solutions: a computer readable storage medium comprising a stored computer program, wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform the method for calculating the pose of the vehicle-mounted camera according to any one of the above.
After the above technical scheme is adopted, the embodiments of the application have at least the following beneficial effects. First, the current image frame acquired by the vehicle-mounted camera is processed through the feature extraction network model and the NetVLAD network model to obtain the global descriptor, and the reference scene image matched with the global descriptor is determined by screening the scene database; since the scene database is pre-constructed scene map data of the motor vehicle's actual driving environment, the global descriptor is matched to the actual driving environment of the motor vehicle as closely as possible. Second, each feature extraction layer of the feature extraction network model up-samples the current image frame to correspondingly obtain the features to be fused, which are added, fused and then processed with a nonlinear activation function to obtain the actual dense high-dimensional feature map; this map not only fuses features of different scales, but the nonlinear activation also improves the nonlinear description of key points by the high-dimensional features, improving the generalization capability of the actual dense high-dimensional feature map. Third, the matching feature points in the actual dense high-dimensional feature map that correspond one-to-one with the key feature points in the region of interest are calculated, with screening and judgment by dot product and actual distance, so that the cross-correlation between each key feature point and the feature points of the actual dense high-dimensional feature map is determined and the matching feature point is screened out on the maximum-correlation principle. Finally, the current rotation-translation matrix of the vehicle-mounted camera is calculated from the mutually matched key feature points and matching feature points based on the local odometry map construction principle and the P3P pose estimation algorithm, realizing pose calculation of the vehicle-mounted camera with high calculation precision.
Drawings
Fig. 1 is a flowchart illustrating steps of an alternative embodiment of the method for calculating the pose of a vehicle-mounted camera according to the present application.
Fig. 2 is a specific flowchart of step S2 of an alternative embodiment of the method for calculating the pose of the vehicle-mounted camera according to the present application.
Fig. 3 is a flowchart showing an alternative embodiment of the method for calculating the pose of the vehicle-mounted camera according to the present application in step S5.
Fig. 4 is a schematic block diagram of an alternative embodiment of the vehicle-mounted camera pose calculation device of the present application.
Fig. 5 is a functional block diagram of an alternative embodiment of the vehicle-mounted camera pose calculation device of the present application.
Detailed Description
The application will be described in further detail with reference to the drawings and the specific examples. It should be understood that the following exemplary embodiments and descriptions are only for the purpose of illustrating the application and are not to be construed as limiting the application, and that the embodiments and features of the embodiments of the application may be combined with one another without conflict.
As shown in fig. 1, an alternative embodiment of the present application provides a vehicle-mounted camera pose calculating method, which includes the following steps:
S1: extracting a current image frame from an original image of the motor vehicle's surroundings, acquired and transmitted in real time by the vehicle-mounted camera 1, and sequentially processing the current image frame through a pre-stored feature extraction network model and a NetVLAD network model to obtain a global descriptor of the current image frame, wherein the feature extraction network model is formed by sequentially connecting a plurality of feature extraction layers;
S2: screening a reference scene image matched with the global descriptor from a pre-stored scene database, wherein the scene database is pre-constructed scene map data of the motor vehicle's actual driving environment;
S3: up-sampling the current image frame with each feature extraction layer of the feature extraction network model to correspondingly obtain features to be fused, adding and fusing the features to be fused, and then processing with a nonlinear activation function to obtain an actual dense high-dimensional feature map of the current image frame;
S4: determining a region of interest in the reference scene image, and calculating matching feature points in the actual dense high-dimensional feature map that correspond one-to-one with the key feature points in the region of interest, wherein each key feature point's actual dot product with its correspondingly matched feature point is larger than its actual dot product with every other feature point of the actual dense high-dimensional feature map, and its actual distance to the correspondingly matched feature point is smaller than its actual distance to every other feature point of the actual dense high-dimensional feature map; and
S5: calculating the current rotation-translation matrix of the vehicle-mounted camera 1 from the mutually matched key feature points and matching feature points, based on a local odometry map construction principle and a P3P pose estimation algorithm.
In the embodiment of the application, the current image frame acquired by the vehicle-mounted camera 1 is first processed through the feature extraction network model and the NetVLAD network model to obtain the global descriptor, and the reference scene image matched with the global descriptor is then determined by screening the scene database; since the scene database is pre-constructed scene map data of the motor vehicle's actual driving environment, the global descriptor is matched to the actual driving environment of the motor vehicle as closely as possible. Further, each feature extraction layer of the feature extraction network model up-samples the current image frame to correspondingly obtain the features to be fused, which are added, fused and then processed with a nonlinear activation function to obtain the actual dense high-dimensional feature map; this map not only fuses features of different scales, but the nonlinear activation also improves the nonlinear description of key points by the high-dimensional features, improving the generalization capability of the actual dense high-dimensional feature map. Further, the matching feature points in the actual dense high-dimensional feature map that correspond one-to-one with the key feature points in the region of interest are calculated, with screening and judgment by dot product and actual distance, so that the cross-correlation between each key feature point and the feature points of the actual dense high-dimensional feature map is determined and the matching feature point is screened out on the maximum-correlation principle. Finally, the current rotation-translation matrix of the vehicle-mounted camera 1 is calculated from the mutually matched key feature points and matching feature points based on the local odometry map construction principle and the P3P pose estimation algorithm, realizing pose calculation of the vehicle-mounted camera 1 with high calculation precision.
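By way of a non-limiting sketch, the multi-scale fusion of step S3 might look as follows in PyTorch; the three-layer backbone, the channel width of 128 and the choice of ReLU as the nonlinear activation function are assumptions for illustration, not details fixed by the application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseFeatureFusion(nn.Module):
    """Toy feature extraction network of three sequentially connected layers:
    each layer's output is up-sampled back to the input resolution, the
    features to be fused are added together, and a nonlinear activation
    yields the actual dense high-dimensional feature map."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.layer1 = nn.Conv2d(3, dim, 3, stride=2, padding=1)
        self.layer2 = nn.Conv2d(dim, dim, 3, stride=2, padding=1)
        self.layer3 = nn.Conv2d(dim, dim, 3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        feats = []
        for layer in (self.layer1, self.layer2, self.layer3):
            x = layer(x)
            feats.append(F.interpolate(x, size=(h, w), mode="bilinear",
                                       align_corners=False))
        return F.relu(torch.stack(feats).sum(dim=0))  # add-fuse, then activate

# One 128-dimensional descriptor per pixel of a 480x640 frame; this tensor
# plays the role of dhc_q in formula 1 below.
dhc_q = DenseFeatureFusion()(torch.randn(1, 3, 480, 640))
```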
In implementation, let the actual dense high-dimensional feature map be dhc_q and the high-dimensional features of the region of interest in the reference scene image be shc_r; the dot product of the two can then be expressed as:

HC_cross = shc_r · dhc_q (formula 1)

where HC_cross represents the dot product of the high-dimensional features of the region of interest of the reference scene image and the actual dense high-dimensional feature map.
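A minimal NumPy sketch of this maximum-dot-product search follows; the array shapes are assumptions, and a complete implementation would also enforce the distance criterion of step S4.

```python
import numpy as np

def match_feature_points(shc_r: np.ndarray, dhc_q: np.ndarray) -> np.ndarray:
    """shc_r: (N, C) descriptors of the N key feature points in the region of
    interest; dhc_q: (C, H, W) actual dense high-dimensional feature map.
    For each key feature point, return the (row, col) location whose dot
    product HC_cross (formula 1) is maximal."""
    C, H, W = dhc_q.shape
    hc_cross = shc_r @ dhc_q.reshape(C, H * W)  # (N, H*W) dot products
    best = hc_cross.argmax(axis=1)              # maximum-correlation principle
    return np.stack(np.unravel_index(best, (H, W)), axis=1)

# Toy usage with random data.
rng = np.random.default_rng(0)
pts = match_feature_points(rng.standard_normal((5, 128)),
                           rng.standard_normal((128, 48, 64)))
```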
In an alternative embodiment of the present application, as shown in fig. 2, the step S2 specifically includes:
S21: calculating the actual Hamming distance between the global descriptor and the VLAD vector of each key frame in the scene database, and screening, based on a decision tree algorithm model, key frames meeting a preset preliminary screening condition from the scene database, wherein the preset preliminary screening condition is that the key frame's rank, when the key frames are sorted by actual Hamming distance from smallest (most similar) to largest, is smaller than or equal to a preset rank; and
S22: calculating the actual similarity between the current image frame and each image frame adjacent to the key frames meeting the preset preliminary screening condition, and determining the key frame with the maximum actual similarity among those meeting the condition as the reference scene image.
In this embodiment, the actual Hamming distance between the global descriptor and the VLAD vector of each key frame in the scene database is calculated first; the smaller the actual Hamming distance, the higher the similarity between the global descriptor and the key frame, so the key frames are ranked by their actual Hamming distances and the most similar key frames are screened out. Then, based on the principle that the adjacent frames (i.e., the previous and next frames) of the true reference scene image should be more similar to the current image frame than those of a false reference scene image, the actual similarity between the current image frame and the image frames adjacent to each key frame meeting the preset preliminary screening condition is evaluated, so that the true reference scene image is accurately determined in the further screening.
In a specific implementation, the preset rank can be set to 10, for example: the key frames are ranked by actual Hamming distance, and the key frames ranked 1 to 10 (the ten most similar) are screened out.
In an alternative embodiment of the present application, the global descriptor is first subjected to dimension reduction based on a principal component analysis algorithm (PCA, Principal Component Analysis), and the actual Hamming distance is then calculated between the dimension-reduced global descriptor and the VLAD vectors of the key frames in the scene database. In this embodiment, performing dimension reduction on the global descriptor based on the principal component analysis algorithm effectively reduces the amount of calculation on the global descriptor and improves calculation efficiency.
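A short scikit-learn sketch of this dimension-reduction step follows; the descriptor sizes, the 256-dimensional target and the sign-based binarization are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data: 1000 keyframe descriptors of dimension 4096 (NetVLAD-sized).
rng = np.random.default_rng(0)
database_desc = rng.standard_normal((1000, 4096)).astype(np.float32)
query_desc = rng.standard_normal((1, 4096)).astype(np.float32)

pca = PCA(n_components=256)  # principal component analysis dimension reduction
keyframe_bin = (pca.fit_transform(database_desc) > 0).astype(np.uint8)
v_pca = (pca.transform(query_desc)[0] > 0).astype(np.uint8)  # k-dim binary vector
```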
In a specific implementation, the actual Hamming distance is calculated as:

D_hamming = Σ_{j=1}^{k} ( V_pca(j) ⊕ M_i(j) ) (formula 2)

where D_hamming represents the actual Hamming distance, V_pca the k-dimensional binary vector of the dimension-reduced global descriptor, M_i the k-dimensional binary global descriptor corresponding to the i-th key frame of the scene database, and ⊕ the bitwise exclusive-or.
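The following NumPy sketch implements formula 2 with a bitwise exclusive-or and applies the preliminary screening by rank; the demo vectors are random, and the preset rank of 10 matches the example given above.

```python
import numpy as np

def hamming_distance(v_pca: np.ndarray, m_i: np.ndarray) -> int:
    """Formula 2: the number of differing bits (bitwise XOR) between two
    k-dimensional binary vectors."""
    return int(np.count_nonzero(v_pca ^ m_i))

def prescreen(v_pca: np.ndarray, keyframe_bin: np.ndarray, top_k: int = 10):
    """Rank all keyframes by actual Hamming distance (most similar first)
    and keep those whose rank is within the preset rank top_k."""
    d = np.count_nonzero(keyframe_bin ^ v_pca, axis=1)
    return np.argsort(d)[:top_k]

# Toy usage: 100 random 256-bit keyframe descriptors and one query.
rng = np.random.default_rng(1)
demo_db = rng.integers(0, 2, size=(100, 256), dtype=np.uint8)
demo_q = rng.integers(0, 2, size=256, dtype=np.uint8)
print(prescreen(demo_q, demo_db))
```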
In an alternative embodiment of the present application, as shown in fig. 3, the step S5 specifically includes:
S51: constructing an actual three-dimensional space of the driving scene of the motor vehicle based on the local odometry map construction principle;
S52: projecting the mutually matched key feature points and matching feature points into the actual three-dimensional space to correspondingly generate key three-dimensional points and matching three-dimensional points, respectively; and
S53: calculating the current rotation-translation matrix of the vehicle-mounted camera 1 from three non-collinear, mutually matched pairs of key three-dimensional points and matching three-dimensional points, based on the P3P pose estimation algorithm.
In this embodiment, an actual three-dimensional space of the driving scene of the motor vehicle is first constructed based on the local odometry map construction principle, the mutually matched key feature points and matching feature points are then projected into this space in sequence, and the current rotation-translation matrix of the vehicle-mounted camera 1 is finally calculated rapidly from the projected key three-dimensional points and matching three-dimensional points with the P3P pose estimation algorithm, realizing pose calculation with a relatively simple procedure.
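An illustrative OpenCV sketch of this P3P step follows. The three matched pairs and the intrinsic matrix K are fabricated for demonstration: the image points are exact projections of the key three-dimensional points under an identity pose, so one recovered candidate should be R = I, t = 0.

```python
import cv2
import numpy as np

# Three non-collinear key 3D points from the local odometry map (assumed).
object_pts = np.array([[0.0, 0.0, 5.0],
                       [1.0, 0.0, 6.0],
                       [0.0, 1.5, 7.0]])
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
proj = (K @ object_pts.T).T
image_pts = proj[:, :2] / proj[:, 2:]           # matching feature point pixels

# P3P yields up to four rotation/translation candidates; a fourth matched
# pair (or the projection-error check described below) disambiguates them.
n, rvecs, tvecs = cv2.solveP3P(object_pts, image_pts, K, np.zeros(4),
                               flags=cv2.SOLVEPNP_P3P)
for rvec, tvec in zip(rvecs, tvecs):
    R, _ = cv2.Rodrigues(rvec)                  # candidate rotation matrix
    print(R.round(3), tvec.ravel().round(3))
```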
In an alternative embodiment of the application, the method further comprises:
calculating, based on a random sampling algorithm model, the actual projection error of the rotation-translation matrix over the remaining mutually matched key three-dimensional points and matching three-dimensional points; and
calculating the minimum value of the actual projection error based on an error optimization algorithm model, and recalculating and updating the rotation-translation matrix from the mutually matched key three-dimensional points and matching three-dimensional points that satisfy the minimum value.
In this embodiment, a random sampling algorithm model is used to calculate the actual projection error of the rotation-translation matrix over the remaining mutually matched key three-dimensional points and matching three-dimensional points; the minimum value of the actual projection error is then calculated with an error optimization algorithm model, and the rotation-translation matrix is recalculated and updated from the mutually matched key three-dimensional points and matching three-dimensional points that satisfy the minimum value, thereby optimizing the rotation-translation matrix and improving calculation precision.
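As a sketch under stated assumptions, OpenCV's solvePnPRansac and solvePnPRefineLM can stand in for the random sampling algorithm model and the error optimization algorithm model described here; the reprojection threshold is an assumed value, and the application's own models may differ.

```python
import cv2
import numpy as np

def estimate_pose(object_pts: np.ndarray, image_pts: np.ndarray, K: np.ndarray):
    """object_pts: (N, 3) key 3D points; image_pts: (N, 2) matching points,
    N >= 4, both float64; K: 3x3 intrinsic matrix."""
    # Random sampling: minimal P3P sets are drawn repeatedly and each
    # candidate rotation-translation matrix is scored by the actual
    # projection error of the remaining matched pairs.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        object_pts, image_pts, K, None,
        reprojectionError=2.0, flags=cv2.SOLVEPNP_P3P)
    if not ok:
        return None
    # Error optimization: Levenberg-Marquardt refinement recomputes the pose
    # from the inlier pairs by minimizing the projection error.
    idx = inliers.ravel()
    rvec, tvec = cv2.solvePnPRefineLM(object_pts[idx], image_pts[idx],
                                      K, None, rvec, tvec)
    return cv2.Rodrigues(rvec)[0], tvec
```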
In an alternative embodiment of the application, the region of interest is determined by judging whether the actual number of corner points in each region of the reference scene image is greater than a predetermined number. The more corner points a region contains, the more features it offers for image matching; determining the region of interest from the corner count of each region of the reference scene image therefore rests on a simple criterion.
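An illustrative sketch of such a corner-count test, using Shi-Tomasi corners in OpenCV, is given below; the grid layout, the corner-detector parameters and the predetermined number are assumptions.

```python
import cv2
import numpy as np

def find_regions_of_interest(ref_gray: np.ndarray,
                             grid=(4, 4), predetermined_number=30):
    """Split the reference scene image into grid regions and keep those whose
    actual corner count exceeds the predetermined number."""
    h, w = ref_gray.shape
    rois = []
    for gy in range(grid[0]):
        for gx in range(grid[1]):
            y0, y1 = gy * h // grid[0], (gy + 1) * h // grid[0]
            x0, x1 = gx * w // grid[1], (gx + 1) * w // grid[1]
            corners = cv2.goodFeaturesToTrack(ref_gray[y0:y1, x0:x1],
                                              maxCorners=500,
                                              qualityLevel=0.01,
                                              minDistance=5)
            if corners is not None and len(corners) > predetermined_number:
                rois.append((x0, y0, x1, y1))  # region retained as ROI
    return rois
```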
On the other hand, as shown in fig. 4, an embodiment of the present application provides a vehicle-mounted camera pose calculation device 3 connected to a vehicle-mounted camera 1 of a motor vehicle, including a processor 30, a memory 32, and a computer program stored in the memory and configured to be executed by the processor 30, wherein the processor 30 implements the vehicle-mounted camera pose calculation method according to the above embodiment when executing the computer program.
Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory 32 and executed by the processor 30 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions for describing the execution of the computer program in the in-vehicle camera pose calculation device 3. For example, the computer program may be divided into functional modules in the in-vehicle camera pose calculation apparatus 3 as illustrated in fig. 5, wherein the image acquisition and descriptor extraction module 41, the scene graph screening module 42, the dense feature calculation module 43, the feature point matching module 44, and the pose calculation module 45 respectively perform the above steps S1 to S5 correspondingly.
The vehicle-mounted camera pose calculation device 3 can be a desktop computer, a notebook computer, a palmtop computer, a cloud server or other computing equipment. The vehicle-mounted camera pose calculation device 3 may include, but is not limited to, a processor 30 and a memory 32. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the vehicle-mounted camera pose calculation device 3 and does not constitute a limitation of it; the device may include more or fewer components than illustrated, combine certain components, or use different components; for example, the vehicle-mounted camera pose calculation device 3 may further include input and output devices, network access devices, buses, etc.
The processor 30 may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor 30 is the control center of the vehicle-mounted camera pose calculation device 3 and connects the various parts of the entire device using various interfaces and lines.
The memory 32 may be used to store the computer program and/or modules, and the processor 30 implements the various functions of the vehicle-mounted camera pose calculation device 3 by running or executing the computer program and/or modules stored in the memory 32 and invoking the data stored in the memory 32. The memory 32 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required for at least one function (such as a pattern recognition function, a pattern layering function, etc.), and the data storage area may store data created according to the use of the vehicle-mounted camera pose calculation device 3 (such as graphic data). In addition, the memory 32 may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The functionality of the embodiments of the present application, if implemented in the form of software functional modules or units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method of the foregoing embodiments may also be implemented by a computer program instructing the related hardware; the computer program may be stored in a computer-readable storage medium, and when executed by the processor 30 it implements the steps of each of the foregoing method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer-readable medium may be adjusted according to the requirements of legislation and patent practice in each jurisdiction; for example, in certain jurisdictions, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
In another aspect, an embodiment of the present application provides a computer readable storage medium, including a stored computer program, where when the computer program runs, a device where the computer readable storage medium is controlled to execute the method for calculating the pose of the vehicle-mounted camera according to the above embodiment.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are all within the scope of the present application.

Claims (8)

1. A vehicle-mounted camera pose calculation method, characterized by comprising the following steps:
extracting a current image frame from an original image of the motor vehicle's surroundings, acquired and transmitted in real time by a vehicle-mounted camera, and sequentially processing the current image frame through a pre-stored feature extraction network model and a NetVLAD network model to obtain a global descriptor of the current image frame, wherein the feature extraction network model is formed by sequentially connecting a plurality of feature extraction layers;
screening a reference scene image matched with the global descriptor from a pre-stored scene database, wherein the scene database is pre-constructed scene map data of the motor vehicle's actual driving environment;
up-sampling the current image frame with each feature extraction layer of the feature extraction network model to correspondingly obtain features to be fused, adding and fusing the features to be fused, and then processing with a nonlinear activation function to obtain an actual dense high-dimensional feature map of the current image frame;
determining a region of interest in the reference scene image, and calculating matching feature points in the actual dense high-dimensional feature map that correspond one-to-one with the key feature points in the region of interest, wherein each key feature point's actual dot product with its correspondingly matched feature point is larger than its actual dot product with every other feature point of the actual dense high-dimensional feature map, and its actual distance to the correspondingly matched feature point is smaller than its actual distance to every other feature point of the actual dense high-dimensional feature map; and
calculating the current rotation-translation matrix of the vehicle-mounted camera from the mutually matched key feature points and matching feature points, based on a local odometry map construction principle and a P3P pose estimation algorithm.
2. The method for calculating the pose of the vehicle-mounted camera according to claim 1, wherein the step of screening the reference scene image matched with the global descriptor from the pre-stored scene database specifically comprises:
calculating the actual Hamming distance between the global descriptor and the VLAD vector of each key frame in the scene database, and screening, based on a decision tree algorithm model, key frames meeting a preset preliminary screening condition from the scene database, wherein the preset preliminary screening condition is that the key frame's rank, when the key frames are sorted by actual Hamming distance from smallest (most similar) to largest, is smaller than or equal to a preset rank; and
calculating the actual similarity between the current image frame and each image frame adjacent to the key frames meeting the preset preliminary screening condition, and determining the key frame with the maximum actual similarity among those meeting the condition as the reference scene image.
3. The method for calculating the pose of the vehicle-mounted camera according to claim 2, wherein the global descriptor is first subjected to dimension reduction based on a principal component analysis algorithm, and the actual Hamming distance is then calculated between the dimension-reduced global descriptor and the VLAD vector of each key frame in the scene database.
4. The method for calculating the pose of the vehicle-mounted camera according to claim 1, wherein calculating the current rotation-translation matrix of the vehicle-mounted camera from the mutually matched key feature points and matching feature points, based on the local odometry map construction principle and the P3P pose estimation algorithm, specifically comprises:
constructing an actual three-dimensional space of the driving scene of the motor vehicle based on the local odometry map construction principle;
projecting the mutually matched key feature points and matching feature points into the actual three-dimensional space to correspondingly generate key three-dimensional points and matching three-dimensional points, respectively; and
calculating the current rotation-translation matrix of the vehicle-mounted camera from three non-collinear, mutually matched pairs of key three-dimensional points and matching three-dimensional points, based on the P3P pose estimation algorithm.
5. The vehicle-mounted camera pose calculation method according to claim 4, wherein the method further comprises:
calculating, based on a random sampling algorithm model, the actual projection error of the rotation-translation matrix over the remaining mutually matched key three-dimensional points and matching three-dimensional points; and
calculating the minimum value of the actual projection error based on an error optimization algorithm model, and recalculating and updating the rotation-translation matrix from the mutually matched key three-dimensional points and matching three-dimensional points that satisfy the minimum value.
6. The vehicle-mounted camera pose calculation method according to claim 1, wherein the region of interest is determined by judging whether an actual number of corner points of each region in the reference scene image is greater than a predetermined number.
7. A vehicle-mounted camera pose calculation device connected to a vehicle-mounted camera of a motor vehicle, characterized in that it comprises a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the vehicle-mounted camera pose calculation method according to any of claims 1 to 6 when executing the computer program.
8. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program when run controls a device in which the computer readable storage medium is located to perform the vehicle-mounted camera pose calculation method according to any one of claims 1 to 6.
CN202310904859.4A (priority and filing date 2023-07-21): Vehicle-mounted camera pose calculation method and device and computer readable storage medium. Published as CN116883966A (en); status: Pending.

Priority Applications (1)

Application Number: CN202310904859.4A
Priority Date: 2023-07-21
Filing Date: 2023-07-21
Title: Vehicle-mounted camera pose calculation method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number: CN202310904859.4A
Priority Date: 2023-07-21
Filing Date: 2023-07-21
Title: Vehicle-mounted camera pose calculation method and device and computer readable storage medium

Publications (1)

Publication Number: CN116883966A
Publication Date: 2023-10-13

Family

Family ID: 88256570

Family Applications (1)

Application Number: CN202310904859.4A
Title: Vehicle-mounted camera pose calculation method and device and computer readable storage medium
Priority Date: 2023-07-21
Filing Date: 2023-07-21

Country Status (1)

Country: CN
Link: CN116883966A (en)


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination