CN113724330B - Monocular camera object pose estimation method, system, equipment and storage medium - Google Patents


Info

Publication number
CN113724330B
CN113724330B (application CN202111025418.4A)
Authority
CN
China
Prior art keywords: image, specific, data, actual, dimensional
Prior art date
Legal status
Active
Application number
CN202111025418.4A
Other languages: Chinese (zh)
Other versions: CN113724330A (en)
Inventor
陈忠伟
石岩
王益亮
邓辉
李正昊
李华伟
赵越
Current Assignee
Shanghai Xiangong Intelligent Technology Co ltd
Original Assignee
Shanghai Xiangong Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Xiangong Intelligent Technology Co ltd filed Critical Shanghai Xiangong Intelligent Technology Co ltd
Priority to CN202111025418.4A priority Critical patent/CN113724330B/en
Publication of CN113724330A publication Critical patent/CN113724330A/en
Application granted granted Critical
Publication of CN113724330B publication Critical patent/CN113724330B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Abstract

The application relates to a keypoint-based monocular camera object pose estimation method, system, equipment and storage medium, comprising the steps of: obtaining actual size information of an actual object and acquiring an actual object image of the actual object with a monocular camera; importing the actual object image into a preset specific object detection model and generating two-dimensional image coordinate data for a specific number of key points; generating the specific number of three-dimensional coordinate data from the actual size information based on a virtual camera coordinate system of the monocular camera; and generating the current object pose information from the three-dimensional coordinate data, the camera internal reference data and the two-dimensional image coordinate data based on the PNP principle. Because the specific object detection model is trained in advance on standard object image data, no manual sample collection is needed, which solves the problems of insufficient sample collection and difficult image annotation; computing the object pose with the PNP principle achieves three-dimensional localization of the object and improves the efficiency of pose information acquisition.

Description

Monocular camera object pose estimation method, system, equipment and storage medium
Technical Field
The application relates to the technical field of visual positioning, in particular to a monocular camera object pose estimation method, system, equipment and storage medium based on key points.
Background
Pose estimation is a very important link in the field of computer vision. It has wide application in robot control and navigation based on poses estimated from vision sensors, in augmented reality, and so on.
The basis of pose estimation is to find corresponding point pairs between the real world and the image projection, and then to adopt a pose estimation method suited to the type of the point pairs. Of course, the same type of point pairs can also be handled by either algebraic or nonlinear optimization methods, such as the Direct Linear Transform (DLT) and Bundle Adjustment (BA). The prior art generally refers to the process of estimating pose from known point pairs as solving PnP (Perspective-n-Point).
In the prior art, line-laser or point-cloud equipment is mostly used for environment detection, which suffers from the defects of high equipment cost and greatly reduced effectiveness under occlusion.
For this reason, the prior art proposes a deep-learning-based method and device for estimating the pose of an object with a monocular camera (patent application publication number: CN 109816725A), wherein the method comprises: 1) generating a training set and a verification set from the projection of the obtained three-dimensional object image into two-dimensional space, the object coordinates corresponding to the projection, and the object's label file; 2) learning the training set with a cascade convolutional neural network model and iterating the hyper-parameters; 3) testing the trained cascade convolutional neural network model with a test set, and estimating the object pose with the trained model when its accuracy is not less than a first preset threshold.
However, a disadvantage of this prior art is that the sample preparation for learning is monotonous and differs too much from the actual environment, and the pose estimated by the deep learning network used in this method is only rough, requiring further optimization by the ICP (Iterative Closest Point) algorithm.
Disclosure of Invention
The main object of the present invention is to provide a method, a system, a device and a storage medium for keypoint-based monocular camera object pose estimation, so as to remedy the drawbacks of the prior art described in the background art.
In order to achieve the above object, according to a first aspect of the present invention, there is provided a monocular camera object pose estimation method based on key points, the method comprising:
step S100: acquiring actual size information of an actual object and acquiring an actual object image of the actual object based on a monocular camera, wherein camera internal reference data are obtained after calibration based on the actual size information;
Step S200: the actual object image is imported into a preset specific object detection model, and two-dimensional image coordinate data of a specific number of key points are generated, wherein the specific object detection model is generated by training according to standard object image data in advance;
step S300: generating a specific number of three-dimensional coordinate data according to the actual size information based on a virtual camera coordinate system of the monocular camera;
step S400: and generating current object pose information according to the three-dimensional coordinate data, the camera internal reference data and the two-dimensional image coordinate data based on a PNP principle.
Specifically, step S200: the actual object image is imported into a preset specific object detection model, and two-dimensional image coordinate data of a specific number of key points are generated, wherein the specific object detection model is generated in advance according to standard object image data training, and the method further comprises the following steps:
step S201: obtaining object model data of a preset standard object in a specific preset environment, wherein the specific preset environment comprises a plurality of refinement model environments, and each refinement model environment is an environment formed by combining a plurality of environment backgrounds, environment illumination and camera view angles;
Step S202: rendering the object model data image and generating a standard two-dimensional sample image;
Step S203: scaling the standard two-dimensional sample image to a specific scale size, and setting a training data set and a test data set according to a specific number scale;
step S204: training a preset initial detection model based on the training data set, testing the trained initial detection model according to the testing data set after training, and generating a specific object detection model after testing.
Specifically, step S200: importing the actual object image into a preset specific object detection model, and generating two-dimensional image coordinate data of a specific number of key points specifically comprises:
step S210: importing the actual object image into a preset specific object detection model, and scaling the actual object image to a size matched with the standard two-dimensional sample image;
step S220: and generating a specific number of two-dimensional image coordinate data according to the scaled actual object image.
Specifically, step S202: rendering the object model data image and generating a standard two-dimensional sample image, further comprising:
and presetting a specific number and key points at specific positions according to the object model data.
Specifically, the specific positions include the corner points and the center point of the object model, and the specific number is the sum of the numbers of corner points and center points of the object model.
In order to achieve the above object, according to a second aspect of the present invention, there is also provided a monocular camera object pose estimation system based on key points, the system comprising:
the information acquisition module is used for acquiring the actual size information of the actual object and acquiring an actual object image of the actual object based on the monocular camera, wherein camera internal reference data are obtained after calibration based on the actual size information;
The image importing module is used for importing the actual object image into a preset specific object detection model and generating two-dimensional image coordinate data of a specific number of key points, wherein the specific object detection model is generated by training according to standard object image data in advance;
The virtual camera module is used for generating a specific number of three-dimensional coordinate data according to the actual size information based on a virtual camera coordinate system of the monocular camera;
The pose generation module is used for generating the pose information of the current object according to the three-dimensional coordinate data, the camera internal reference data and the two-dimensional image coordinate data based on the PNP principle.
Specifically, the system further comprises:
The system comprises a refinement model module, a camera view angle module and a camera view angle module, wherein the refinement model module is used for acquiring object model data of a preset standard object in a specific preset environment, wherein the specific preset environment comprises a plurality of refinement model environments, and each refinement model environment is formed by combining a plurality of environment backgrounds, environment illumination and camera view angles;
the image rendering module is used for rendering the object model data image and generating a standard two-dimensional sample image;
the image scaling module is used for scaling the standard two-dimensional sample image to a specific scale and setting a training data set and a test data set according to a specific quantity scale;
The model training module is used for training a preset initial detection model based on the training data set, testing the trained initial detection model according to the testing data set after training is completed, and generating a specific object detection model after testing is completed.
Specifically, the system further comprises:
the real object module is used for importing the real object image into a preset specific object detection model and scaling the real object image to a size matched with the standard two-dimensional sample image;
And the specific number module is used for generating specific number of two-dimensional image coordinate data according to the scaled actual object image.
In order to achieve the above object, according to a third aspect of the present invention, there is also provided a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the above-mentioned method for estimating object pose of a monocular camera based on key points when the processor executes the computer program.
In order to achieve the above object, according to a fourth aspect of the present invention, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps described in the above-described keypoint-based monocular camera object pose estimation method.
The invention has the technical effects that:
According to the keypoint-based monocular camera object pose estimation method of the invention, the actual size information of the actual object and an actual object image of the actual object acquired by the monocular camera are obtained in sequence, wherein camera internal reference data are obtained after calibration based on the actual size information; the actual object image is imported into a preset specific object detection model to generate two-dimensional image coordinate data for a specific number of key points, wherein the specific object detection model is trained in advance on standard object image data; a specific number of three-dimensional coordinate data are generated from the actual size information based on a virtual camera coordinate system of the monocular camera; and the current object pose information is generated from the three-dimensional coordinate data, the camera internal reference data and the two-dimensional image coordinate data based on the PNP principle. Because the specific object detection model is trained in advance on standard object image data, a large number of training samples are preset and no manual sample collection is needed, which solves the problems of insufficient sample collection and difficult picture annotation; computing the object pose with the PNP principle achieves three-dimensional localization of the object and improves the efficiency of pose information acquisition.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a flow chart of a method for estimating object pose of a monocular camera based on key points in one embodiment;
FIG. 2 is a block diagram of a system for estimating object pose of a monocular camera based on keypoints in one embodiment;
FIG. 3 is an internal block diagram of a computer device in one embodiment;
FIG. 4 is a diagram of the transformation relationship between coordinate systems in a camera imaging system, in one embodiment.
FIG. 5 is an example of rendering an image of the object model data and generating a standard two-dimensional sample image in one embodiment;
FIG. 6 is an example of rendering the object model data image and generating a standard two-dimensional sample image in one embodiment.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, based on the embodiments of the invention, which are obtained without inventive effort by a person of ordinary skill in the art, shall fall within the scope of the invention.
It should be noted that, in the description and claims of the present invention and the above figures, the terms "step S100", "step S200", and the like are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion.
As shown in fig. 1, the method for estimating the object pose of the monocular camera based on the key points according to the present invention, in a preferred embodiment, includes the steps of:
step S100: acquiring actual size information of an actual object and acquiring an actual object image of the actual object based on a monocular camera, wherein camera internal reference data are obtained after calibration based on the actual size information;
specifically, the actual object image is an image acquired by a common RGB camera.
Further, after the actual size information of the actual object is obtained, camera internal reference data of the monocular camera can be obtained by a person skilled in the art according to calibration of the monocular camera.
In addition, the camera internal parameter data depends on the internal parameters of the monocular camera, and after the monocular camera is selected, the internal parameters corresponding to the monocular camera can be obtained.
Step S200: the actual object image is imported into a preset specific object detection model, and two-dimensional image coordinate data of a specific number of key points are generated, wherein the specific object detection model is generated by training according to standard object image data in advance;
Specifically, the specific object detection model is the object detection model CenterNet, generated after training in advance. The specific number is nine.
Further, the CenterNet framework is an anchor-free object detection model; CenterNet requires no NMS post-processing, which simplifies training.
CenterNet is compatible with a variety of backbone models, including ResNet, Hourglass and DLA. For an input image $I \in \mathbb{R}^{W \times H \times 3}$, the network generates a keypoint heatmap $\hat{Y} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times C}$,
wherein W, H are the width and height of the image, and R is the output stride;
C represents the number of keypoint types, and in the object detection task it represents the number of target categories.
To accomplish the object detection task, the CenterNet network model optimizes several partial objectives, including the heatmap focal loss, the local offset loss at the center point, and the size loss of the target bounding box.
Further, the pixel-wise logistic-regression focal loss on the heatmap is as follows:
$$L_k = \frac{-1}{N} \sum_{xyc} \begin{cases} (1-\hat{Y}_{xyc})^{\alpha} \log(\hat{Y}_{xyc}), & \text{if } Y_{xyc} = 1 \\ (1-Y_{xyc})^{\beta} (\hat{Y}_{xyc})^{\alpha} \log(1-\hat{Y}_{xyc}), & \text{otherwise} \end{cases}$$
wherein $N$ is the number of keypoints in the image, $\hat{Y}_{xyc}$ is the target output through the activation function, and $Y_{xyc}$ is the ground truth obtained by splatting each keypoint with a Gaussian distribution, $Y_{xyc} = \exp\left(-\frac{(x-\tilde{p}_x)^2 + (y-\tilde{p}_y)^2}{2\sigma_p^2}\right)$.
For a true keypoint $p$ of class $c$, its down-sampled position is $\tilde{p} = \lfloor p/R \rfloor$, and $\alpha$, $\beta$ are hyper-parameters of the loss function.
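For illustration only, the Gaussian ground-truth heatmap and the focal loss above can be sketched in numpy as follows (a minimal sketch with hypothetical function names, not the patent's implementation; α and β default to the commonly used values 2 and 4):

```python
import numpy as np

def gaussian_heatmap(shape, center, sigma):
    """Ground-truth heatmap: splat one keypoint with a Gaussian kernel."""
    h, w = shape
    y, x = np.mgrid[0:h, 0:w]
    cx, cy = center
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))

def focal_loss(y_hat, y, alpha=2.0, beta=4.0, eps=1e-12):
    """Pixel-wise logistic-regression focal loss, normalized by the
    number of keypoints N (pixels where y == 1)."""
    pos = (y == 1.0)
    n = max(pos.sum(), 1)
    pos_term = ((1 - y_hat[pos]) ** alpha) * np.log(y_hat[pos] + eps)
    neg = ~pos
    neg_term = ((1 - y[neg]) ** beta) * (y_hat[neg] ** alpha) * np.log(1 - y_hat[neg] + eps)
    return -(pos_term.sum() + neg_term.sum()) / n
```

The heatmap equals 1 exactly at the keypoint and decays with distance, which is what the case split in $L_k$ relies on.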
The loss function for the center-point offset introduced by image down-sampling is as follows:
$$L_{off} = \frac{1}{N} \sum_{p} \left| \hat{O}_{\tilde{p}} - \left(\frac{p}{R} - \tilde{p}\right) \right|$$
wherein $\hat{O}_{\tilde{p}}$ is the predicted local offset.
The total network training target loss is:
$$L_{det} = L_k + \lambda_{size} L_{size} + \lambda_{off} L_{off}$$
wherein $L_{size}$ is the L1 loss on the predicted box size, and the adjustment coefficients $\lambda_{size}$, $\lambda_{off}$ are set by default to 0.1 and 1.
Object detection aims at detecting the object category and the bounding-box position in the image; the CenterNet network outputs a heatmap for each category, and the peak points must be extracted from it to obtain the center positions of the bounding boxes.
Each response point of the heatmap is compared with its 8 surrounding neighbors; a point is kept only if it is greater than or equal to all of them, and finally the first N peak points meeting this requirement are retained.
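The 8-neighbor peak extraction just described can be sketched as follows (a hypothetical helper for a single-channel heatmap; a point is a peak when it equals the maximum of its 3×3 neighborhood):

```python
import numpy as np

def extract_peaks(heatmap, top_n):
    """Keep points >= all 8 surrounding neighbors; return top_n by response."""
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Maximum over the 3x3 neighborhood of each pixel (center included).
    neigh = np.max(np.stack([padded[dy:dy + h, dx:dx + w]
                             for dy in range(3) for dx in range(3)]), axis=0)
    is_peak = heatmap >= neigh  # equality holds exactly at a local maximum
    ys, xs = np.nonzero(is_peak)
    order = np.argsort(heatmap[ys, xs])[::-1][:top_n]
    return list(zip(ys[order], xs[order]))
```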
The resulting bounding box is:
$$\left(\hat{x}_i + \delta\hat{x}_i - \frac{\hat{w}_i}{2},\ \hat{y}_i + \delta\hat{y}_i - \frac{\hat{h}_i}{2},\ \hat{x}_i + \delta\hat{x}_i + \frac{\hat{w}_i}{2},\ \hat{y}_i + \delta\hat{y}_i + \frac{\hat{h}_i}{2}\right)$$
wherein $(\hat{x}_i, \hat{y}_i)$ is the $i$-th detected keypoint, $(\delta\hat{x}_i, \delta\hat{y}_i)$ is the local offset predicted at that point, and $\hat{w}_i$, $\hat{h}_i$ are the width and height of the bounding box predicted at that point.
It should be noted that CenterNet is a mature technology and the foregoing is merely an example whose principle those skilled in the art are familiar with; after the specific number of key points is set, the corresponding two-dimensional image coordinate data are generated.
Step S300: generating a specific number of three-dimensional coordinate data according to the actual size information based on a virtual camera coordinate system of the monocular camera;
Specifically, the formula converting a world coordinate point $(X_w, Y_w, Z_w)$ to image coordinates $(u, v)$ is as follows:
$$s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \left[ R \mid t \right] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$
wherein $s$ is a scale coefficient, $K$ is the camera intrinsic matrix, and $[R \mid t]$ comprises the camera extrinsic parameters.
Under the condition that the parameters inside and outside the camera are known, the three-dimensional coordinate point corresponds to the unique two-dimensional image coordinate.
That is, conversion of three-dimensional coordinate data with two-dimensional image coordinate data and the actual size information can be achieved by the above formula.
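As a sketch of this conversion (assuming a known intrinsic matrix K and extrinsics R, t; the function name is hypothetical):

```python
import numpy as np

def project_points(K, R, t, pts_world):
    """Project Nx3 world points to Nx2 pixel coordinates
    via s * [u, v, 1]^T = K [R|t] [Xw, Yw, Zw, 1]^T."""
    pts = np.asarray(pts_world, dtype=float)
    cam = pts @ R.T + t              # world -> camera coordinates
    uvw = cam @ K.T                  # apply the intrinsic matrix
    return uvw[:, :2] / uvw[:, 2:3]  # divide out the scale coefficient s
```

For instance, with the identity rotation, t = (0, 0, 5) and a principal point of (320, 240), the world origin projects exactly onto the principal point.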
Step S400: and generating current object pose information according to the three-dimensional coordinate data, the camera internal reference data and the two-dimensional image coordinate data based on a PNP principle.
Specifically, the camera intrinsic data can be obtained once the monocular camera to be used has been determined and calibrated.
Further, in the camera imaging system, four coordinate systems are included in total: world coordinate system, camera coordinate system, image coordinate system and pixel coordinate system.
The world coordinate system, the camera coordinate system, the image coordinate system, and the pixel coordinate system are mutually convertible. The specific conversion relationship is shown in fig. 4.
In addition, the conversion between the above coordinate systems is prior art, and the present application is not specifically described.
On the other hand, the current object pose information can be calculated by a person skilled in the art using a solver function of the OpenCV algorithm library; the present application does not describe this in detail.
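As an illustration of the same principle where OpenCV is not used, the DLT method mentioned in the background can recover the 3×4 projection matrix (up to scale) from at least six non-coplanar 3D-2D correspondences. This is a hedged sketch with hypothetical names, not the patent's solver:

```python
import numpy as np

def dlt_projection(pts3d, pts2d):
    """Estimate the 3x4 projection matrix P (up to scale) from >= 6
    3D-2D correspondences by direct linear transformation (DLT)."""
    rows = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 4)  # null-space vector reshaped to P

def reproject(P, pts3d):
    """Apply P to homogeneous 3D points and dehomogenize."""
    homo = np.hstack([np.asarray(pts3d, dtype=float), np.ones((len(pts3d), 1))])
    uvw = homo @ P.T
    return uvw[:, :2] / uvw[:, 2:3]
```

A practical system would instead call a PnP solver directly, but the sketch shows why known 3D-2D point pairs plus intrinsics suffice to determine the pose.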
In one embodiment, step S200: the actual object image is imported into a preset specific object detection model, and two-dimensional image coordinate data of a specific number of key points are generated, wherein the specific object detection model is generated in advance according to standard object image data training, and the method further comprises the following steps:
step S201: obtaining object model data of a preset standard object in a specific preset environment, wherein the specific preset environment comprises a plurality of refinement model environments, and each refinement model environment is an environment formed by combining a plurality of environment backgrounds, environment illumination and camera view angles;
Specifically, this step is carried out in Blender. To guarantee the diversity of the sample pictures, a number of standard objects are set up and combined with a variety of constructed environment scenes, and a suitable texture map and background picture are set for each standard object.
Step S202: rendering the object model data image and generating a standard two-dimensional sample image;
Step S203: scaling the standard two-dimensional sample image to a specific scale size, and setting a training data set and a test data set according to a specific number scale;
Specifically, the specific scale size is 640×384. The specific number scale is 8:2, i.e. the samples are divided into a training data set and a test data set at an 8:2 ratio.
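The 8:2 division can be sketched as follows (helper name hypothetical; the fixed seed is only for reproducibility of the shuffle):

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle the samples and divide them into training and
    test sets at the given ratio (8:2 by default)."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]
```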
Step S204: training a preset initial detection model based on the training data set, testing the trained initial detection model according to the testing data set after training, and generating a specific object detection model after testing.
In one embodiment, step S200: importing the actual object image into a preset specific object detection model, and generating two-dimensional image coordinate data of a specific number of key points specifically comprises:
step S210: importing the actual object image into a preset specific object detection model, and scaling the actual object image to a size matched with the standard two-dimensional sample image;
Specifically, scaling the actual object image to a size matching the standard two-dimensional sample image enables fast matching and data processing in the subsequent steps.
Step S220: and generating a specific number of two-dimensional image coordinate data according to the scaled actual object image.
In another embodiment, step S202: rendering the object model data image and generating a standard two-dimensional sample image, further comprising:
A specific number of key points at specific positions are preset according to the object model data. The specific number and the specific positions can be set according to the corner points of the actual object model and the unique center point of those corners. The specific positions include the corner points and the center point of the object model, and the specific number is the sum of their counts.
Specifically, as shown in fig. 5 to 6, the cuboid shelf in the example has a specific number of nine, that is, eight dispersed corner points and the center point of those eight corner points. This arrangement reduces the deviation of the PNP-computed pose as much as possible.
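The nine key points of such a cuboid can be generated from its actual size information as follows (a sketch assuming an object-centered coordinate frame; the function name is hypothetical):

```python
import numpy as np

def cuboid_keypoints(l, w, h):
    """Nine 3D key points of an l x w x h cuboid: the eight corner
    points plus their center, in an object-centered frame."""
    corners = np.array([[sx * l / 2, sy * w / 2, sz * h / 2]
                        for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
                       dtype=float)
    center = corners.mean(axis=0, keepdims=True)  # the origin (0, 0, 0)
    return np.vstack([corners, center])
```

These nine 3D points paired with the nine detected 2D keypoints form exactly the correspondences the PNP step consumes.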
In summary, the invention obtains in sequence the actual size information of the actual object and an actual object image of the actual object acquired by the monocular camera, wherein camera internal reference data are obtained after calibration based on the actual size information; imports the actual object image into a preset specific object detection model to generate two-dimensional image coordinate data for a specific number of key points, wherein the specific object detection model is trained in advance on standard object image data; generates a specific number of three-dimensional coordinate data from the actual size information based on a virtual camera coordinate system of the monocular camera; and generates the current object pose information from the three-dimensional coordinate data, the camera internal reference data and the two-dimensional image coordinate data based on the PNP principle. Because the specific object detection model is trained in advance on standard object image data, a large number of training samples are preset and no manual sample collection is needed, which solves the problems of insufficient sample collection and difficult picture annotation; computing the object pose with the PNP principle achieves three-dimensional localization of the object and improves the efficiency of pose information acquisition.
In one embodiment, as shown in fig. 2, a system for estimating the pose of an object by a monocular camera based on key points, the system comprising:
the information acquisition module is used for acquiring the actual size information of the actual object and acquiring an actual object image of the actual object based on the monocular camera, wherein camera internal reference data are obtained after calibration based on the actual size information;
The image importing module is used for importing the actual object image into a preset specific object detection model and generating two-dimensional image coordinate data of a specific number of key points, wherein the specific object detection model is generated by training according to standard object image data in advance;
The virtual camera module is used for generating a specific number of three-dimensional coordinate data according to the actual size information based on a virtual camera coordinate system of the monocular camera;
The pose generation module is used for generating the pose information of the current object according to the three-dimensional coordinate data, the camera internal reference data and the two-dimensional image coordinate data based on the PNP principle.
In one embodiment, the system further comprises:
The system comprises a refinement model module, a camera view angle module and a camera view angle module, wherein the refinement model module is used for acquiring object model data of a preset standard object in a specific preset environment, wherein the specific preset environment comprises a plurality of refinement model environments, and each refinement model environment is formed by combining a plurality of environment backgrounds, environment illumination and camera view angles;
the image rendering module is used for rendering the object model data image and generating a standard two-dimensional sample image;
the image scaling module is used for scaling the standard two-dimensional sample image to a specific scale and setting a training data set and a test data set according to a specific quantity scale;
The model training module is used for training a preset initial detection model based on the training data set, testing the trained initial detection model according to the testing data set after training is completed, and generating a specific object detection model after testing is completed.
In one embodiment, the system further comprises:
the real object module is used for importing the real object image into a preset specific object detection model and scaling the real object image to a size matched with the standard two-dimensional sample image;
And the specific number module is used for generating specific number of two-dimensional image coordinate data according to the scaled actual object image.
In one embodiment, as shown in fig. 3, a computer device includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the above keypoint-based monocular camera object pose estimation method when executing the computer program.
In one embodiment, a computer readable storage medium has a computer program stored thereon which, when executed by a processor, implements the steps of the above keypoint-based monocular camera object pose estimation method.
It should be further noted that those skilled in the art will understand that all or part of the procedures of the methods in the embodiments described above may be implemented by a computer program, which may be stored in a non-volatile computer readable storage medium and, when executed, may include the procedures of the embodiments of the methods described above.
Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any such combination should be considered within the scope of this description as long as it contains no contradiction.
The above examples illustrate only a few embodiments of the application; their description is specific and detailed but is not therefore to be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the application, all of which fall within its scope of protection. Accordingly, the scope of protection of the present application shall be determined by the appended claims.

Claims (8)

1. A keypoint-based monocular camera object pose estimation method, characterized in that the method comprises:
step S100: acquiring actual size information of an actual object and acquiring an actual object image of the actual object based on a monocular camera, wherein camera internal reference data are obtained after calibration based on the actual size information;
step S201: obtaining object model data of a preset standard object in a specific preset environment, wherein the specific preset environment comprises a plurality of refinement model environments, and each refinement model environment is an environment formed by combining a plurality of environment backgrounds, environment illumination and camera view angles;
Step S202: rendering the object model data image and generating a standard two-dimensional sample image;
Step S203: scaling the standard two-dimensional sample image to a specific scale size, and setting a training data set and a test data set according to a specific quantity ratio;
step S204: training a preset initial detection model based on the training data set, testing the trained initial detection model according to the testing data set after training is completed, and generating a specific object detection model after testing is completed;
Step S200: the actual object image is imported into a preset specific object detection model, and two-dimensional image coordinate data of a specific number of key points are generated, wherein the specific object detection model is generated by training according to standard object image data in advance;
step S300: generating a specific number of three-dimensional coordinate data according to the actual size information based on a virtual camera coordinate system of the monocular camera;
step S400: and generating current object pose information according to the three-dimensional coordinate data, the camera internal reference data and the two-dimensional image coordinate data based on a PNP principle.
2. The keypoint-based monocular camera object pose estimation method according to claim 1, wherein step S200, importing the actual object image into a preset specific object detection model and generating two-dimensional image coordinate data of a specific number of key points, specifically comprises:
step S210: importing the actual object image into a preset specific object detection model, and scaling the actual object image to a size matched with the standard two-dimensional sample image;
step S220: and generating a specific number of two-dimensional image coordinate data according to the scaled actual object image.
3. The keypoint-based monocular camera object pose estimation method according to claim 1, wherein step S202, rendering the object model data image and generating a standard two-dimensional sample image, further comprises: presetting a specific number of key points at specific positions according to the object model data.
4. The keypoint-based monocular camera object pose estimation method of claim 3, wherein the specific positions comprise the corner points and the center points of the object model, and the specific number is the sum of the numbers of the corner points and the center points of the object model.
5. A system for estimating object pose of a monocular camera based on key points, the system comprising:
the information acquisition module is used for acquiring the actual size information of the actual object and acquiring an actual object image of the actual object based on the monocular camera, wherein camera internal reference data are obtained after calibration based on the actual size information;
The image importing module is used for importing the actual object image into a preset specific object detection model and generating two-dimensional image coordinate data of a specific number of key points, wherein the specific object detection model is generated by training according to standard object image data in advance;
The virtual camera module is used for generating a specific number of three-dimensional coordinate data according to the actual size information based on a virtual camera coordinate system of the monocular camera;
The pose generation module is used for generating current object pose information according to the three-dimensional coordinate data, the camera internal reference data and the two-dimensional image coordinate data based on a PNP principle;
The refinement model module is used for acquiring object model data of a preset standard object in a specific preset environment, wherein the specific preset environment comprises a plurality of refinement model environments, and each refinement model environment is formed by combining a plurality of environment backgrounds, environment illumination conditions and camera view angles;
the image rendering module is used for rendering the object model data image and generating a standard two-dimensional sample image;
the image scaling module is used for scaling the standard two-dimensional sample image to a specific scale size and setting a training data set and a test data set according to a specific quantity ratio;
The model training module is used for training a preset initial detection model based on the training data set, testing the trained initial detection model according to the testing data set after training is completed, and generating a specific object detection model after testing is completed.
6. The keypoint-based monocular camera object pose estimation system of claim 5, further comprising:
the actual object module is used for importing the actual object image into the preset specific object detection model and scaling the actual object image to a size matched with the standard two-dimensional sample image;
and the specific number module is used for generating a specific number of two-dimensional image coordinate data according to the scaled actual object image.
7. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 4 when the computer program is executed.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 4.
CN202111025418.4A 2021-09-02 2021-09-02 Monocular camera object pose estimation method, system, equipment and storage medium Active CN113724330B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111025418.4A CN113724330B (en) 2021-09-02 2021-09-02 Monocular camera object pose estimation method, system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113724330A CN113724330A (en) 2021-11-30
CN113724330B true CN113724330B (en) 2024-04-30

Family

ID=78680891

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111025418.4A Active CN113724330B (en) 2021-09-02 2021-09-02 Monocular camera object pose estimation method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113724330B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116012422B (en) * 2023-03-23 2023-06-09 西湖大学 Monocular vision-based unmanned aerial vehicle 6D pose estimation tracking method and application thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816725A (en) * 2019-01-17 2019-05-28 哈工大机器人(合肥)国际创新研究院 A kind of monocular camera object pose estimation method and device based on deep learning
US10417781B1 (en) * 2016-12-30 2019-09-17 X Development Llc Automated data capture
CN111968235A (en) * 2020-07-08 2020-11-20 杭州易现先进科技有限公司 Object attitude estimation method, device and system and computer equipment
CN112652016A (en) * 2020-12-30 2021-04-13 北京百度网讯科技有限公司 Point cloud prediction model generation method, pose estimation method and device


Similar Documents

Publication Publication Date Title
CN110568447B (en) Visual positioning method, device and computer readable medium
CN109063301B (en) Single image indoor object attitude estimation method based on thermodynamic diagram
Pandey et al. Automatic targetless extrinsic calibration of a 3d lidar and camera by maximizing mutual information
CN110567441B (en) Particle filter-based positioning method, positioning device, mapping and positioning method
CN114143519B (en) Method and device for automatically matching projection image with curtain area and projector
CN111246098B (en) Robot photographing method and device, computer equipment and storage medium
CN112380926B (en) Weeding path planning system of field weeding robot
CN113689578A (en) Human body data set generation method and device
CN113724330B (en) Monocular camera object pose estimation method, system, equipment and storage medium
Streiff et al. 3D3L: Deep learned 3D keypoint detection and description for lidars
CN116630442B (en) Visual SLAM pose estimation precision evaluation method and device
CN117036756B (en) Remote sensing image matching method and system based on variation automatic encoder
CN116921932A (en) Welding track recognition method, device, equipment and storage medium
CN116128919A (en) Multi-temporal image abnormal target detection method and system based on polar constraint
US11699303B2 (en) System and method of acquiring coordinates of pupil center point
Wan et al. A performance comparison of feature detectors for planetary rover mapping and localization
JP2014178967A (en) Three-dimensional object recognition device and three-dimensional object recognition method
CN113112547A (en) Robot, repositioning method thereof, positioning device and storage medium
AU2017300877B2 (en) Method and device for aiding the navigation of a vehicle
Kaess et al. MCMC-based multiview reconstruction of piecewise smooth subdivision curves with a variable number of control points
CN113657327B (en) Non-living body attack discrimination method, device, equipment and medium suitable for image
CN115965759B (en) Method for building digital modeling by using laser radar
US11734855B2 (en) Rotation equivariant orientation estimation for omnidirectional localization
CN116137039B (en) Visual and laser sensor external parameter correction method and related equipment
CN117495938B (en) Foldable hollow plate production data extraction method based on image processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant