CN114066987B - Camera pose estimation method, device, equipment and storage medium - Google Patents

Camera pose estimation method, device, equipment and storage medium

Info

Publication number
CN114066987B
CN114066987B CN202210029300.7A
Authority
CN
China
Prior art keywords
image block
matching
images
frames
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210029300.7A
Other languages
Chinese (zh)
Other versions
CN114066987A (en)
Inventor
程飞洋
刘国清
杨广
王启程
郑伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Youjia Innovation Technology Co.,Ltd.
Original Assignee
Shenzhen Minieye Innovation Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Minieye Innovation Technology Co Ltd filed Critical Shenzhen Minieye Innovation Technology Co Ltd
Priority to CN202210029300.7A priority Critical patent/CN114066987B/en
Publication of CN114066987A publication Critical patent/CN114066987A/en
Application granted granted Critical
Publication of CN114066987B publication Critical patent/CN114066987B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology

Abstract

The invention discloses a camera pose estimation method, apparatus, device and storage medium. Two adjacent frames of images are acquired and the feature points in the two frames are extracted, so that the two frames are divided into image blocks and the image block sequence corresponding to each frame is generated and extracted; the image block sequences are input into a trained neural network model so that the neural network model outputs an estimated basic matrix, and the basic matrix is decomposed to obtain the camera pose estimation result. Compared with the prior art, pose solving is carried out by a trained neural network model, the camera pose solving problem is unified into a single joint optimization problem, the complex calculation flow of traditional methods is avoided, the computational cost of the model is reduced, the efficiency of camera pose acquisition is improved, and the method has strong practicability.

Description

Camera pose estimation method, device, equipment and storage medium
Technical Field
The present invention relates to the field of computer vision technologies, and in particular, to a method, an apparatus, a device, and a storage medium for estimating a camera pose.
Background
Camera pose estimation refers to calculating the motion parameters of a camera. Existing approaches include vision-based methods that rely only on sequences of images, fusion methods that rely on lidar point clouds, and direct measurement methods that rely on high-precision pose sensors, among others. Traditional vision-based methods suffer from problems such as feature points being difficult to define, feature point detection lacking repeatability, feature point matching requiring strongly distinctive features, and a tendency to produce many mismatches.
In the prior art, deep-learning-based methods have made some progress in feature point detection, feature point descriptor learning, feature point screening and pose optimization, but these methods still follow the traditional camera pose estimation pipeline: they only improve the accuracy of one or a few individual steps and fail to treat camera pose solving as a single, joint optimization problem.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a camera pose estimation method, apparatus, device and storage medium in which pose solving is carried out by a trained neural network model, the camera pose solving problem is unified into a joint optimization problem, the complex calculation flow of traditional methods is avoided, the computational cost of the model is reduced, the efficiency of camera pose acquisition is improved, and strong practicability is achieved.
In order to solve the above technical problem, the present invention provides a camera pose estimation method, including:
acquiring two adjacent frames of images, and performing feature point detection on the two frames of images to generate feature point sets corresponding to the two frames of images;
respectively carrying out image block division on the two frames of images by taking each feature point in the feature point set as a center to generate image block sequences corresponding to the two frames of images;
inputting the image block sequence into a trained neural network model so that the neural network model outputs an estimated basic matrix, and decomposing the basic matrix to obtain a camera pose estimation result;
the neural network model acquires a feature vector of each image block in the image block sequence through a neural network encoder, acquires a position code of each image block according to a preset position coding rule, and acquires a coding feature of each image block by combining the feature vector and the position code;
inputting the coding features into a global coding module to obtain global coding features of each image block;
and inputting the global coding features into a matching classifier, acquiring and obtaining a matching point pair and a matching probability between the two frames of images according to a matching image block index corresponding to each image block, and performing optimization solution on a basic matrix according to the matching point pair and the matching probability.
Further, the optimization solution is performed on the basis matrix according to the matching point pairs and the matching probability, and specifically includes:
constructing a linear equation according to the matching point pairs to generate a linear equation set;
and obtaining and solving the optimal solution of the linear equation according to the matching probability of the matching point pair, and taking the optimal solution of the linear equation as the optimal solution of the basic matrix so as to optimally solve the basic matrix.
Further, the coding features are input into a global coding module to obtain global coding features of each image block; and finally, inputting the global coding characteristics into a matching classifier, acquiring a matching image block index corresponding to each image block, and obtaining a matching point pair and a matching probability between the two frames of images, wherein the specific steps are as follows:
acquiring coding features of image block sequences corresponding to the two frames of images, and inputting the coding features of each image block into the global coding module so that the coding features of each image block traverse the coding features of all the image blocks to acquire the global coding features corresponding to each image block;
and inputting the global coding features corresponding to each image block into a matching classifier, and acquiring a matching image block index corresponding to each image block to obtain a matching point pair and a matching probability between the two frames of images.
Further, the present invention provides a camera pose estimation device, including: the system comprises an extraction module, a camera pose acquisition module and a neural network training module;
the extraction module is used for acquiring two adjacent frames of images, detecting feature points of the two frames of images, generating feature point sets corresponding to the two frames of images, and respectively dividing image blocks of the two frames of images by taking each feature point in the feature point sets as a center to generate image block sequences corresponding to the two frames of images;
the camera pose acquisition module is used for inputting the image block sequence into a trained neural network model so that the neural network model outputs an estimated basic matrix, and decomposing the basic matrix to obtain a camera pose estimation result;
the neural network training module is used for pre-training the neural network model, acquiring a feature vector of each image block in the image block sequence through a neural network encoder, acquiring a position code of each image block according to a preset position coding rule, and acquiring a coding feature of each image block by combining the feature vector and the position code;
inputting the coding features into a global coding module to obtain global coding features of each image block;
and inputting the global coding features into a matching classifier, acquiring and obtaining a matching point pair and a matching probability between the two frames of images according to a matching image block index corresponding to each image block, and performing optimization solution on a basic matrix according to the matching point pair and the matching probability.
Further, the neural network training module is configured to perform optimization solution on a basis matrix according to the matching point pairs and the matching probability, specifically:
constructing a linear equation according to the matching point pairs to generate a linear equation set;
and obtaining and solving the optimal solution of the linear equation according to the matching probability of the matching point pair, and taking the optimal solution of the linear equation as the optimal solution of the basic matrix so as to optimally solve the basic matrix.
Further, the neural network training module is used for inputting the coding features into a global coding module to obtain global coding features of each image block; and finally, inputting the global coding characteristics into a matching classifier, acquiring a matching image block index corresponding to each image block, and obtaining a matching point pair and a matching probability between the two frames of images, wherein the specific steps are as follows: acquiring coding features of image block sequences corresponding to the two frames of images, and inputting the coding features of each image block into the global coding module so that the coding features of each image block traverse the coding features of all the image blocks to acquire the global coding features corresponding to each image block;
and inputting the global coding features corresponding to each image block into a matching classifier, and acquiring a matching image block index corresponding to each image block to obtain a matching point pair and a matching probability between the two frames of images.
Further, the present invention also provides a terminal device including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the camera pose estimation method as described in any one of the above when executing the computer program.
Further, the present invention also provides a computer-readable storage medium including a stored computer program, wherein when the computer program runs, the apparatus on which the computer-readable storage medium is located is controlled to execute the camera pose estimation method according to any one of the above.
Compared with the prior art, the camera pose estimation method, the camera pose estimation device, the camera pose estimation equipment and the storage medium have the following beneficial effects:
the image blocks of the two acquired frames of images are divided, the image block sequence is used as the input of the pre-trained neural network model, the problem that the calculated amount of the whole image is too large when the whole image is coded is avoided, the lightweight of the model can be ensured, the pose solving is directly carried out on the basis of the estimated basic matrix output by the trained neural network model, the camera pose solving problem is unified into a joint optimization problem, and the complex calculating process in the traditional method is avoided. And the neural network model is pre-trained, based on the integrated neural network encoder and the matching classifier, a large amount of image data can be used for training the neural network model, the convergence of model training is ensured, an optimized basic matrix is output, and the accuracy of subsequent camera pose solving is improved. Compared with the prior art, the method has the advantages that the position and pose solution is carried out by training a neural network model, the camera position and pose solution problems are unified into a joint optimization problem, the complex calculation process in the traditional method is avoided, the calculation amount of the model is reduced, the acquisition efficiency of the camera position and pose is improved, and the method has strong practicability.
Drawings
Fig. 1 is a schematic flowchart of an embodiment of a camera pose estimation method provided in the present invention;
FIG. 2 is a schematic structural diagram of a neural network model according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an embodiment of a camera pose estimation apparatus provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Referring to fig. 1, fig. 1 is a schematic flowchart of an embodiment of a camera pose estimation method provided by the present invention, and as shown in fig. 1, the method includes steps 101 to 102, specifically as follows:
step 101: acquiring two adjacent frames of images, and extracting feature points in the two frames of images to divide the two frames of images into image blocks, and generating and extracting an image block sequence corresponding to each frame of image.
In this embodiment, two adjacent frames of images captured by a camera are obtained, and feature point detection is performed on the two frames. Specifically, any existing feature point detection method can be adopted to detect feature points in the two acquired frames, including but not limited to traditional FAST and SIFT feature point detection and deep-learning-based feature point detection; in this embodiment, several of these feature point detection methods may also be applied to the same frame image to obtain more feature points, thereby increasing the probability that a detected feature point has a matching feature point in the other frame.
In this embodiment, after feature point detection is performed, the feature point sets corresponding to the two frames of images are generated. Specifically, the two frames comprise a first frame image and a second frame image. After the feature points of the two frames are detected by the above feature point detection method, the feature points detected in the first frame image are collected and recorded as the feature point set P = (p1, p2, ..., pN1) of the first frame image, containing N1 feature points in total; similarly, the feature points detected in the second frame image are collected and recorded as the feature point set Q = (q1, q2, ..., qN2) of the second frame image, containing N2 feature points in total.
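As an illustration of this step, the following minimal sketch detects feature points in the two frames with OpenCV's FAST and SIFT detectors and collects their pixel coordinates into the sets P and Q; the file names and the particular combination of detectors are assumptions for illustration, not fixed by this embodiment.

```python
import cv2

def detect_keypoints(img_gray):
    """Detect feature points with FAST and SIFT and merge the results."""
    fast = cv2.FastFeatureDetector_create()
    sift = cv2.SIFT_create()
    keypoints = list(fast.detect(img_gray, None)) + list(sift.detect(img_gray, None))
    # Only the pixel coordinates are needed; the image blocks are cropped around them later.
    return [kp.pt for kp in keypoints]

frame1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file names
frame2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)
P = detect_keypoints(frame1)  # feature point set of the first frame
Q = detect_keypoints(frame2)  # feature point set of the second frame
```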
In this embodiment, image block division is performed on the two frames of images with each feature point in the feature point sets as a center, thereby generating the image block sequences corresponding to the two frames, as sketched below. Specifically, taking each feature point in the feature point set P of the first frame image and in the feature point set Q of the second frame image as a center, image blocks of the same size are cropped from the corresponding frame image; as an example, this embodiment crops 16 × 16 image blocks. Sorting and aggregating the image blocks cropped from each frame yields the image block sequence of the first frame image and the image block sequence of the second frame image, where each sequence contains all the image blocks partitioned from that frame.
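The 16 × 16 block division around each feature point can be sketched as follows; crop_patches is a hypothetical helper, and the clamping of points near the image border is an assumption rather than something specified by this embodiment.

```python
import numpy as np

def crop_patches(img_gray, points, size=16):
    """Crop a size x size image block centered on each feature point."""
    h, w = img_gray.shape
    half = size // 2
    patches = []
    for (x, y) in points:
        x, y = int(round(x)), int(round(y))
        # Clamp the center so the block stays fully inside the image (assumed border handling).
        x = min(max(x, half), w - half)
        y = min(max(y, half), h - half)
        patches.append(img_gray[y - half:y + half, x - half:x + half])
    return np.stack(patches)  # shape: (number of feature points, 16, 16)

patches1 = crop_patches(np.asarray(frame1), P)  # image block sequence of the first frame
patches2 = crop_patches(np.asarray(frame2), Q)  # image block sequence of the second frame
```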
In this embodiment, all image blocks in the image block sequence corresponding to the acquired first frame image and second frame image are used as input data of a neural network model for subsequent pre-training.
Step 102: and inputting the image block sequence into a trained neural network model so that the neural network model outputs an estimated basic matrix, and decomposing the basic matrix to obtain a camera pose estimation result.
In this embodiment, the image block sequences corresponding to the first frame image and the second frame image acquired in step 101 are input into the trained neural network model, and the neural network model outputs an estimated basic matrix (i.e., the fundamental matrix); this matrix encodes the camera pose between the two acquired frames, namely the camera rotation matrix and the displacement. The output basic matrix is then decomposed to obtain the camera pose estimation result.
In this embodiment, using the image block sequences as the input of the neural network model avoids encoding the whole image, which greatly reduces the computational cost of the model, allows the model to run on an embedded platform, and therefore gives the method high practicability.
In this embodiment, a relatively mature method may be directly adopted for decomposing the basic matrix; for example, the essential matrix can be computed from the basic matrix and the camera intrinsics, and the essential matrix can then be decomposed with the decomposeEssentialMat function of OpenCV to solve the pose parameters of the camera.
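A minimal sketch of this decomposition with OpenCV is given below; it assumes the basic matrix F and the camera intrinsic matrix K are already available, and the final disambiguation of the returned pose candidates is only indicated in a comment.

```python
import cv2
import numpy as np

def pose_from_fundamental(F, K):
    """Recover rotation/translation candidates from a 3x3 basic (fundamental) matrix."""
    E = K.T @ F @ K                      # essential matrix from the fundamental matrix
    R1, R2, t = cv2.decomposeEssentialMat(E)
    # decomposeEssentialMat returns two rotation candidates and a translation direction;
    # the physically valid (R, t) is normally selected by a cheirality check on point pairs.
    return R1, R2, t
```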
As a preferred scheme in this embodiment, referring to fig. 2, which is a schematic structural diagram of the neural network model: a neural network encoder, a Transformer global encoding module, a matching classifier and a basic matrix calculation module are integrated in the pre-trained neural network model, so that the trained neural network model performs feature point matching on the two frames of images, calculates the feature point matching probabilities and solves the basic matrix. Direct, end-to-end calculation of the camera pose between two frames is thus realized, and the complex calculation flow of conventional methods is avoided.
In this embodiment, the neural network encoder, the Transformer global encoding module and the matching classifier designed in the neural network model need to be pre-trained. First, the feature vector of each image block in the image block sequence is acquired through the neural network encoder. Specifically, each image block in the image block sequences of the two frames, extracted according to the method in step 101, is taken as an input of the neural network model, and each image block is encoded into a d-dimensional feature vector by the same neural network encoder; the two frames therefore produce N = N1 + N2 feature vectors in total, where d is a preset number of dimensions, for example d = 128, i.e., each image block is encoded into a 128-dimensional feature vector.
In this embodiment, the position code of each image block is further obtained according to a preset position coding rule, and the coding feature of each image block is obtained by combining the feature vector and the position code. Specifically, the preset position coding rule generates the corresponding N position codes for the one-dimensional coordinates [0, 1, ..., N-1]; the position code is defined as:

PE(t, 2i) = sin(t / 10000^(2i/d)),  PE(t, 2i+1) = cos(t / 10000^(2i/d)),

where d is the input feature dimension, PE(t, ·) is the position code of the t-th input, and i is the index of the feature dimension.
The N position codes are added to the feature vectors of the N input image blocks respectively to serve as the coding feature of each image block, so that each coding feature contains the index information of its image block, and together the coding features constitute the input of the subsequent Transformer global coding module.
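A minimal sketch of this position coding and feature combination, assuming the standard sinusoidal form reproduced above:

```python
import numpy as np

def positional_encoding(num_positions, d):
    """Sinusoidal position codes for inputs t = 0 .. num_positions-1 (assumed standard form)."""
    pe = np.zeros((num_positions, d))
    t = np.arange(num_positions)[:, None]     # index of the input image block
    i = np.arange(d // 2)[None, :]             # index of the feature dimension
    angle = t / np.power(10000.0, 2.0 * i / d)
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

d = 128
block_features = np.random.randn(300, d)       # stand-in for encoder outputs of N = 300 blocks
coding_features = block_features + positional_encoding(300, d)  # coding feature of each block
```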
In this embodiment, after the coding features of all image blocks in the two image block sequences are input into the Transformer global coding module, the module propagates global information between the image block sequences of the two frames through the multi-head attention mechanism, so that the global coding feature finally output for each image block's input contains the required global information. The number of input coding features and the number of output global coding features of the Transformer global coding module are both equal to the number N of image blocks divided from the two frames, i.e., each image block corresponds to one global coding feature.
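A minimal sketch of this global coding step, assuming a standard PyTorch Transformer encoder stands in for the global coding module (the layer count and head count are illustrative, not taken from this embodiment):

```python
import torch
import torch.nn as nn

d = 128
global_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True),
    num_layers=4,
)

# Coding features of all N = N1 + N2 image blocks from both frames, as one sequence.
coding_features = torch.randn(1, 300, d)                 # (batch, N, d)
global_features = global_encoder(coding_features)        # one global coding feature per block
```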
In this embodiment, the global coding features output by the Transformer global coding module are input into the matching classifier to obtain the matching image block index corresponding to each image block, and thus the matching point pairs and matching probabilities between the two frames of images. Specifically, the global coding feature of each image block output by the Transformer global coding module is taken as the input of the matching classifier, and the classifier computes a matching probability vector from it; this vector expresses the degree of matching between the feature point represented by that global coding feature and the candidate feature points of the other frame, the feature point with the largest matching degree is taken as the best match, and the two matched feature points form a matching point pair between the two frames. As an example in this embodiment, for the first frame image, each feature point corresponds to an (N2 + 1)-dimensional classification probability, representing the probabilities that this feature point matches each of the N2 feature points in the second frame image together with the probability that it has no matching point. Similarly, for the second frame image, each feature point corresponds to an (N1 + 1)-dimensional classification probability, representing the probabilities that this feature point matches each of the N1 feature points in the first frame image together with the probability that it has no matching point.
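A minimal sketch of this matching step is given below; this embodiment does not spell out the internal form of the matching classifier, so the (N2 + 1)-way classification for each first-frame block is illustrated here, as an assumption, by a softmax over similarities to the N2 second-frame blocks plus a learned "no match" score.

```python
import torch
import torch.nn as nn

d, N1, N2 = 128, 150, 170
g1 = torch.randn(N1, d)                  # global coding features of the first-frame blocks
g2 = torch.randn(N2, d)                  # global coding features of the second-frame blocks
no_match = nn.Parameter(torch.zeros(d))  # learned embedding representing "no matching point"

# (N1, N2+1): similarity to every second-frame block, plus the "no match" score.
scores = torch.cat([g1 @ g2.T, (g1 * no_match).sum(-1, keepdim=True)], dim=1)
probs = scores.softmax(dim=-1)                       # matching probability vectors
match_prob, match_idx = probs[:, :N2].max(dim=-1)    # matching image block index + probability
```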
As a preferred scheme in this embodiment, the matching point pairs required for pre-training can be obtained by directly matching SIFT feature descriptors; this helps the parameters of the pre-trained neural network encoder, Transformer module and matching classifier converge to the vicinity of their optimal values and reduces the time and difficulty of the subsequent supervised training of the neural network model. In this pre-training process no data labeling of the images or of the divided image blocks is needed, so a large amount of image data can be used for training directly and there is no data bottleneck, which improves the output accuracy of the neural network model and ensures the convergence speed of the subsequent joint optimization training of the model.
In this embodiment, the basic matrix is optimized and solved according to the matching point pairs. Specifically, based on the obtained matching point pairs of the first frame image and the second frame image, the basic matrix F is related to the camera pose parameters through the epipolar constraint and satisfies:

q_i^T F p_i = 0,
F = K^(-T) [t]_x R K^(-1),

where K is the known camera intrinsic matrix, R is the camera rotation matrix in the camera pose parameters, t is the displacement in the camera pose parameters ([t]_x denotes its skew-symmetric cross-product matrix), p_i is a matching point in the first frame image, q_i is a matching point in the second frame image, and (p_i, q_i) is a matching point pair of the two frames. Since these equations describe the relationship between the basic matrix F and the camera pose parameters R and t, estimating the camera pose thus reduces to the problem of solving the basic matrix F from the matched feature point pairs, as sketched below.
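The relationship F = K^(-T) [t]_x R K^(-1) can be sketched numerically as follows; the intrinsic matrix, rotation and displacement below are illustrative stand-in values only.

```python
import numpy as np

def skew(t):
    """Skew-symmetric cross-product matrix [t]_x of a 3-vector t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

K = np.array([[800.0, 0.0, 320.0],      # stand-in camera intrinsic matrix
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                            # stand-in rotation between the two frames
t = np.array([0.1, 0.0, 1.0])            # stand-in displacement

F = np.linalg.inv(K).T @ skew(t) @ R @ np.linalg.inv(K)
# A true matching pair (p, q) in homogeneous pixel coordinates then satisfies q.T @ F @ p ≈ 0.
```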
In this embodiment, F is a 3 × 3 matrix with 8 degrees of freedom (it is only defined up to scale), so at least 8 matching point pairs are needed to solve for F; however, because the basic matrix F is solved from the matched feature point pairs, possible errors in the matching point pairs mean that the solved basic matrix F is not accurate enough. Therefore, in this embodiment an end-to-end neural network model is constructed through the basic matrix calculation module, and the matching point pairs, the matching probabilities and the solved basic matrix F are treated as one joint optimization process for joint optimization training. The training ground truth of the basic matrix F can be obtained through a high-precision pose sensor or through globally optimized structure-from-motion software, which is relatively easy to acquire and avoids the difficulty of obtaining ground-truth feature point matches.
In this embodiment, based on the obtained matching point pairs and their matching probabilities, the optimal solution of F can be obtained through the basic matrix calculation module. Specifically, the parameters of the 3 × 3 basic matrix F are expanded into a parameter vector θ = (f11, f12, f13, f21, f22, f23, f31, f32, f33)^T, and a matching point pair p = (u', v') in the first frame and q = (u, v) in the second frame satisfies the linear equation

(u·u', u·v', u, v·u', v·v', v, u', v', 1) · θ = 0.

The N matching point pairs therefore give N such linear equations, which can be assembled into the weighted linear system

W A θ = 0,

where A is the N × 9 coefficient matrix whose rows are the vectors above and W is a diagonal matrix in which each diagonal element is the matching probability of the corresponding matching point pair. The optimal solution of this linear system is taken as the optimal solution of the basic matrix F, and it can be obtained through a singular value decomposition algorithm (the right singular vector of W·A associated with the smallest singular value, under the constraint ||θ|| = 1), as sketched below.
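A minimal sketch of this weighted linear solve, assuming pts1 and pts2 hold the matched pixel coordinates of the first and second frame and w holds the corresponding matching probabilities:

```python
import numpy as np

def solve_basic_matrix(pts1, pts2, w):
    """Weighted least-squares solve of the basic matrix from matched points."""
    u_p, v_p = pts1[:, 0], pts1[:, 1]                  # (u', v') in the first frame
    u, v = pts2[:, 0], pts2[:, 1]                      # (u, v) in the second frame
    A = np.stack([u * u_p, u * v_p, u,
                  v * u_p, v * v_p, v,
                  u_p, v_p, np.ones_like(u)], axis=1)  # N x 9 coefficient matrix
    WA = w[:, None] * A                                # weight each equation by its matching probability
    _, _, Vt = np.linalg.svd(WA)
    theta = Vt[-1]                                     # right singular vector of the smallest singular value
    return theta.reshape(3, 3)
```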
As a preferred scheme in this embodiment, since the neural network encoder, the Transformer global coding module, the matching classifier and the basic matrix calculation module in the entire neural network model are all differentiable, all parameters can be jointly optimized with a gradient-descent based optimization algorithm, i.e., supervised training is performed with the real pose. Specifically, the loss function used to train and evaluate the accuracy of the basic matrix F is the symmetric epipolar distance, defined as

L(F) = Σ_(p,q) (q^T F p)^2 · ( 1 / ((F p)_1^2 + (F p)_2^2) + 1 / ((F^T q)_1^2 + (F^T q)_2^2) ),

where p and q are virtual matching point pairs generated from the real basic matrix F_gt, i.e., pairs that satisfy the real epipolar geometric constraint

q^T F_gt p = 0.
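A minimal sketch of this loss, assuming p and q are homogeneous coordinate tensors of shape (N, 3) and F is the predicted basic matrix (the small eps term is an assumption added for numerical stability):

```python
import torch

def symmetric_epipolar_loss(F, p, q, eps=1e-8):
    """Mean symmetric epipolar distance of the pairs (p, q) with respect to F."""
    Fp = p @ F.T                # rows are (F p_i)^T: epipolar lines in the second image
    Ftq = q @ F                 # rows are (F^T q_i)^T: epipolar lines in the first image
    algebraic = (q * Fp).sum(dim=1)                  # q_i^T F p_i for each pair
    d = algebraic ** 2 * (1.0 / (Fp[:, 0] ** 2 + Fp[:, 1] ** 2 + eps)
                          + 1.0 / (Ftq[:, 0] ** 2 + Ftq[:, 1] ** 2 + eps))
    return d.mean()
```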
referring to fig. 3, fig. 3 is a schematic structural diagram of an embodiment of the camera pose estimation apparatus provided by the present invention, as shown in fig. 3, the apparatus includes an extraction module 301, a camera pose acquisition module 302, and a neural network training module 3021, which are as follows:
an extraction module 301, configured to acquire two adjacent frames of images, and extract feature points in the two frames of images, so as to perform image block division on the two frames of images, generate and extract an image block sequence corresponding to each frame of image;
a camera pose acquisition module 302, configured to input the image block sequence into a trained neural network model, so that the neural network model outputs an estimated basic matrix, and decompose the basic matrix to obtain a camera pose estimation result;
the neural network training module 3021 is configured to pre-train the neural network model, obtain a feature vector of each image block in the image block sequence through a neural network encoder, obtain a position code of each image block according to a preset position coding rule, and obtain a coding feature of each image block by combining the feature vector and the position code; inputting the coding features into a global coding module to obtain global coding features of each image block; and inputting the global coding features into a matching classifier, acquiring and obtaining a matching point pair and a matching probability between the two frames of images according to a matching image block index corresponding to each image block, and performing optimization solution on a basic matrix according to the matching point pair and the matching probability.
In this embodiment, the extraction module 301 is configured to obtain two adjacent frames of images, and extract feature points in the two frames of images, so as to perform image block division on the two frames of images, generate and extract an image block sequence corresponding to each frame of image; specifically, two adjacent frames of images are obtained, and feature point detection is performed on the two frames of images, so that feature point sets corresponding to the two frames of images are generated; and respectively carrying out image block division on the two frames of images by taking each feature point in the feature point set as a center to generate image block sequences corresponding to the two frames of images.
In this embodiment, the neural network training module 3021 is configured to perform optimization solution on the basis matrix according to the matching point pairs and the matching probabilities; specifically, a linear equation is constructed according to the matching point pairs, and a linear equation set is generated; and obtaining and solving the optimal solution of the linear equation according to the matching probability of the matching point pair, and taking the optimal solution of the linear equation as the optimal solution of the basic matrix so as to optimally solve the basic matrix.
In this embodiment, the neural network training module 3021 is configured to input the coding features into the global coding module to obtain global coding features of each image block; finally, inputting the global coding characteristics into a matching classifier, and acquiring a matching image block index corresponding to each image block to obtain a matching point pair and a matching probability between the two frames of images; specifically, the encoding features of the image block sequences corresponding to the two frames of images are obtained, and the encoding features of each image block are input into the global encoding module, so that the encoding features of each image block traverse the encoding features of all the image blocks, and the global encoding features corresponding to each image block are obtained; and inputting the global coding features corresponding to each image block into a matching classifier, and acquiring a matching image block index corresponding to each image block to obtain a matching point pair and a matching probability between the two frames of images.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
It should be noted that the above-mentioned embodiments of the camera pose estimation apparatus are merely illustrative, where the modules described as separate components may or may not be physically separate, and the components displayed as modules may or may not be physical units, that is, may be located in one place, or may also be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
On the basis of the above-described embodiment of the camera pose estimation method, another embodiment of the present invention provides a camera pose estimation terminal device including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the camera pose estimation method according to any one of the embodiments of the present invention when executing the computer program.
Illustratively, the computer program may be partitioned in this embodiment into one or more modules that are stored in the memory and executed by the processor to implement the invention. The one or more modules may be a series of instruction segments of a computer program capable of performing a specific function, the instruction segments describing an execution process of the computer program in the camera pose estimation terminal device.
The camera pose estimation terminal equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer and a cloud server. The camera pose estimation terminal apparatus can include, but is not limited to, a processor, a memory.
The Processor may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, and the like. The general-purpose processor may be a microprocessor or any conventional processor; the processor is the control center of the camera pose estimation terminal device and connects the various parts of the entire device through various interfaces and lines.
The memory may be configured to store the computer program and/or the modules, and the processor implements the various functions of the camera pose estimation terminal device by running or executing the computer program and/or modules stored in the memory and calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required for at least one function, and the like, and the data storage area may store data created according to the use of the device, and the like. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash memory card, at least one magnetic disk storage device, a Flash memory device, or another non-volatile solid-state storage device.
On the basis of the above-described embodiment of the camera pose estimation method, another embodiment of the present invention provides a storage medium including a stored computer program, wherein when the computer program runs, an apparatus on which the storage medium is controlled performs the camera pose estimation method according to any one of the embodiments of the present invention.
In this embodiment, the storage medium is a computer-readable storage medium, and the computer program includes computer program code, which may be in source code form, object code form, executable file or some intermediate form, and so on. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
In summary, according to the camera pose estimation method, apparatus, device and storage medium provided by the present invention, the two adjacent frames of images are obtained, and the feature points in the two frames of images are extracted, so that the two frames of images are divided into image blocks, and an image block sequence corresponding to each frame of image is generated and extracted; inputting the image block sequence into a trained neural network model so that the neural network model outputs an estimated basic matrix, and decomposing the basic matrix to obtain a camera pose estimation result; the neural network model acquires a feature vector of each image block in the image block sequence through a neural network encoder, acquires a position code of each image block according to a preset position coding rule, and acquires a coding feature of each image block by combining the feature vector and the position code; inputting the coding features into a global coding module to obtain global coding features of each image block; and inputting the global coding features into a matching classifier, acquiring and obtaining a matching point pair and a matching probability between the two frames of images according to a matching image block index corresponding to each image block, and performing optimization solution on a basic matrix according to the matching point pair and the matching probability. Compared with the prior art, the pose solving is carried out by training a neural network model, the camera pose solving problems are unified into a joint optimization problem, the complex calculation flow in the traditional method is avoided, the calculation amount of the model is reduced, the acquisition efficiency of the camera pose is improved, and the method has strong practicability.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and substitutions can be made without departing from the technical principle of the present invention, and these modifications and substitutions should also be regarded as the protection scope of the present invention.

Claims (8)

1. A camera pose estimation method is characterized by comprising the following steps:
acquiring two adjacent frames of images, and performing feature point detection on the two frames of images to generate feature point sets corresponding to the two frames of images;
taking each feature point in the feature point set as a center, respectively carrying out image block division on the two frames of images, and generating and extracting image block sequences corresponding to the two frames of images;
inputting the image block sequence into a trained neural network model so that the neural network model outputs an estimated basic matrix, and decomposing the basic matrix to obtain a camera pose estimation result;
the neural network model acquires a feature vector of each image block in the image block sequence through a neural network encoder, acquires a position code of each image block according to a preset position coding rule, and acquires a coding feature of each image block by combining the feature vector and the position code;
inputting the coding features into a global coding module to obtain global coding features of each image block;
and inputting the global coding features into a matching classifier, acquiring and obtaining a matching point pair and a matching probability between the two frames of images according to a matching image block index corresponding to each image block, and performing optimization solution on a basic matrix according to the matching point pair and the matching probability.
2. The method according to claim 1, wherein the optimal solution is performed on the basis matrix according to the matching point pairs and the matching probabilities, and specifically comprises:
constructing a linear equation according to the matching point pairs to generate a linear equation set;
and obtaining and solving the optimal solution of the linear equation according to the matching probability of the matching point pair, and taking the optimal solution of the linear equation as the optimal solution of the basic matrix so as to optimally solve the basic matrix.
3. The method according to claim 1, wherein the coding features are input into a global coding module to obtain global coding features of each image block; and finally, inputting the global coding characteristics into a matching classifier, acquiring a matching image block index corresponding to each image block, and obtaining a matching point pair and a matching probability between the two frames of images, wherein the specific steps are as follows:
acquiring coding features of image block sequences corresponding to the two frames of images, and inputting the coding features of each image block into the global coding module so that the coding features of each image block traverse the coding features of all the image blocks to acquire the global coding features corresponding to each image block;
and inputting the global coding features corresponding to each image block into a matching classifier, and acquiring a matching image block index corresponding to each image block to obtain a matching point pair and a matching probability between the two frames of images.
4. A camera pose estimation device, comprising: the system comprises an extraction module, a camera pose acquisition module and a neural network training module;
the extraction module is used for acquiring two adjacent frames of images, detecting feature points of the two frames of images, generating feature point sets corresponding to the two frames of images, and respectively dividing image blocks of the two frames of images by taking each feature point in the feature point sets as a center to generate image block sequences corresponding to the two frames of images;
the camera pose acquisition module is used for inputting the image block sequence into a trained neural network model so that the neural network model outputs an estimated basic matrix, and decomposing the basic matrix to obtain a camera pose estimation result;
the neural network training module is used for pre-training the neural network model, acquiring a feature vector of each image block in the image block sequence through a neural network encoder, acquiring a position code of each image block according to a preset position coding rule, and acquiring a coding feature of each image block by combining the feature vector and the position code; inputting the coding features into a global coding module to obtain global coding features of each image block;
and inputting the global coding features into a matching classifier, acquiring and obtaining a matching point pair and a matching probability between the two frames of images according to a matching image block index corresponding to each image block, and performing optimization solution on a basic matrix according to the matching point pair and the matching probability.
5. The camera pose estimation apparatus according to claim 4, wherein the neural network training module is configured to perform an optimal solution on a basis matrix according to the matching point pairs and the matching probability, and specifically:
constructing a linear equation according to the matching point pairs to generate a linear equation set;
and obtaining and solving the optimal solution of the linear equation according to the matching probability of the matching point pair, and taking the optimal solution of the linear equation as the optimal solution of the basic matrix so as to optimally solve the basic matrix.
6. The camera pose estimation apparatus according to claim 4, wherein the neural network training module is configured to input the coding features into a global coding module to obtain global coding features of each image block; and finally, inputting the global coding characteristics into a matching classifier, acquiring a matching image block index corresponding to each image block, and obtaining a matching point pair and a matching probability between the two frames of images, wherein the specific steps are as follows:
acquiring coding features of image block sequences corresponding to the two frames of images, and inputting the coding features of each image block into the global coding module so that the coding features of each image block traverse the coding features of all the image blocks to acquire the global coding features corresponding to each image block;
and inputting the global coding features corresponding to each image block into a matching classifier, and acquiring a matching image block index corresponding to each image block to obtain a matching point pair and a matching probability between the two frames of images.
7. A terminal device characterized by comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the camera pose estimation method according to any one of claims 1 to 3 when executing the computer program.
8. A computer-readable storage medium characterized by comprising a stored computer program, wherein the computer program, when executed, controls an apparatus on which the computer-readable storage medium is located to perform the camera pose estimation method according to any one of claims 1 to 3.
CN202210029300.7A 2022-01-12 2022-01-12 Camera pose estimation method, device, equipment and storage medium Active CN114066987B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210029300.7A CN114066987B (en) 2022-01-12 2022-01-12 Camera pose estimation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210029300.7A CN114066987B (en) 2022-01-12 2022-01-12 Camera pose estimation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114066987A (en) 2022-02-18
CN114066987B (en) 2022-04-26

Family

ID=80230808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210029300.7A Active CN114066987B (en) 2022-01-12 2022-01-12 Camera pose estimation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114066987B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115359192B (en) * 2022-10-14 2023-03-28 阿里巴巴(中国)有限公司 Three-dimensional reconstruction and commodity information processing method, device, equipment and storage medium
CN115641559B (en) * 2022-12-23 2023-06-02 深圳佑驾创新科技有限公司 Target matching method, device and storage medium for looking-around camera group
CN115661780A (en) * 2022-12-23 2023-01-31 深圳佑驾创新科技有限公司 Camera target matching method and device under cross view angle and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110473254A (en) * 2019-08-20 2019-11-19 北京邮电大学 A kind of position and orientation estimation method and device based on deep neural network
CN113160375A (en) * 2021-05-26 2021-07-23 郑健青 Three-dimensional reconstruction and camera pose estimation method based on multi-task learning algorithm
CN113643365A (en) * 2021-07-07 2021-11-12 紫东信息科技(苏州)有限公司 Camera pose estimation method, device, equipment and readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298884B (en) * 2019-05-27 2023-05-30 重庆高开清芯科技产业发展有限公司 Pose estimation method suitable for monocular vision camera in dynamic environment
CN110490928B (en) * 2019-07-05 2023-08-15 天津大学 Camera attitude estimation method based on deep neural network
CN111783582A (en) * 2020-06-22 2020-10-16 东南大学 Unsupervised monocular depth estimation algorithm based on deep learning
CN113378973B (en) * 2021-06-29 2023-08-08 沈阳雅译网络技术有限公司 Image classification method based on self-attention mechanism

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110473254A (en) * 2019-08-20 2019-11-19 北京邮电大学 A kind of position and orientation estimation method and device based on deep neural network
CN113160375A (en) * 2021-05-26 2021-07-23 郑健青 Three-dimensional reconstruction and camera pose estimation method based on multi-task learning algorithm
CN113643365A (en) * 2021-07-07 2021-11-12 紫东信息科技(苏州)有限公司 Camera pose estimation method, device, equipment and readable storage medium

Also Published As

Publication number Publication date
CN114066987A (en) 2022-02-18

Similar Documents

Publication Publication Date Title
CN114066987B (en) Camera pose estimation method, device, equipment and storage medium
Shao et al. Real-time and accurate UAV pedestrian detection for social distancing monitoring in COVID-19 pandemic
Zhao et al. Fusion of 3D LIDAR and camera data for object detection in autonomous vehicle applications
Lu et al. Monocular semantic occupancy grid mapping with convolutional variational encoder–decoder networks
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN108549873B (en) Three-dimensional face recognition method and three-dimensional face recognition system
Paisitkriangkrai et al. Pedestrian detection with spatially pooled features and structured ensemble learning
CN109753885B (en) Target detection method and device and pedestrian detection method and system
CN110852182B (en) Depth video human body behavior recognition method based on three-dimensional space time sequence modeling
EP3907660A1 (en) Method, apparatus, electronic device, and storage medium for recognizing license plate
CN112766229B (en) Human face point cloud image intelligent identification system and method based on attention mechanism
CN104574401A (en) Image registration method based on parallel line matching
CN114219855A (en) Point cloud normal vector estimation method and device, computer equipment and storage medium
Shao et al. ClusterNet: 3D instance segmentation in RGB-D images
CN112200056A (en) Face living body detection method and device, electronic equipment and storage medium
CN111353429A (en) Interest degree method and system based on eyeball turning
CN111488810A (en) Face recognition method and device, terminal equipment and computer readable medium
Qin et al. PointSkelCNN: Deep Learning‐Based 3D Human Skeleton Extraction from Point Clouds
CN113435432B (en) Video anomaly detection model training method, video anomaly detection method and device
CN112507992B (en) Method, device, equipment and medium for determining shooting distance between road images
CN116246119A (en) 3D target detection method, electronic device and storage medium
Cao et al. Stable image matching for 3D reconstruction in outdoor
CN112084874B (en) Object detection method and device and terminal equipment
CN104268531A (en) Face feature data obtaining system
Zhang et al. RETRACTED: Cross-camera multi-person tracking by leveraging fast graph mining algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Floor 25, Block A, Zhongzhou Binhai Commercial Center Phase II, No. 9285, Binhe Boulevard, Shangsha Community, Shatou Street, Futian District, Shenzhen, Guangdong 518000

Patentee after: Shenzhen Youjia Innovation Technology Co.,Ltd.

Address before: 518051 401, building 1, Shenzhen new generation industrial park, No. 136, Zhongkang Road, Meidu community, Meilin street, Futian District, Shenzhen, Guangdong Province

Patentee before: SHENZHEN MINIEYE INNOVATION TECHNOLOGY Co.,Ltd.