WO2023016182A1 - Pose determination method and apparatus, electronic device, and readable storage medium - Google Patents

Info

Publication number
WO2023016182A1
WO2023016182A1 · PCT/CN2022/105549 · CN2022105549W
Authority
WO
WIPO (PCT)
Prior art keywords
target
error value
model
information
initial
Application number
PCT/CN2022/105549
Other languages
French (fr)
Chinese (zh)
Inventor
Wang Bin (王彬)
Original Assignee
Beijing Megvii Technology Co., Ltd. (北京迈格威科技有限公司)
Application filed by Beijing Megvii Technology Co., Ltd. (北京迈格威科技有限公司)
Publication of WO2023016182A1 (en)

Classifications

    • G06T3/08
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/70: Determining position or orientation of objects or cameras
    • G06T7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration

Definitions

  • the present application relates to the field of information processing, and in particular to a pose determination method and apparatus, an electronic device, and a readable storage medium.
  • the pose parameters of the target object can often be estimated in the following two ways.
  • One is to first render images of the target object in different poses from its 3D model, then use the rendered images as the input of a convolutional neural network and the pose parameters corresponding to the different poses as its expected output, so as to train the network. After the convolutional neural network converges, it can be used for pose estimation.
  • the image to be processed can then be used as the input of the network, and the output is the corresponding pose parameters. Since this image-recognition-based method is not constrained by a strict projection-relationship equation, it is difficult to obtain accurate estimates of the target pose parameters, and its generalization is poor.
  • The second is to use a deep learning method to predict the two-dimensional key points of the target object, and then estimate the pose from the projection relationship between the model points of the three-dimensional model and the two-dimensional key points of the target object in the image, combining the three-dimensional model of the target object with the camera calibration information.
  • This method can help improve the accuracy and robustness of the estimation results, but it requires a pre-generated 3D model of the target object and an accurate model matching algorithm, which makes the estimation process considerably more difficult.
  • the purpose of the embodiments of the present application is to provide a pose determination method and apparatus, an electronic device, and a readable storage medium that obtain the target model and the target pose simultaneously, thereby solving two problems in the pose estimation process of the target object: the target model cannot be pre-generated, and matching the target object to the target model is difficult.
  • An embodiment of the present application provides a pose determination method, which may include: acquiring a set of images to be processed, the set including a plurality of target images corresponding to a target object; for each target image, determining the initial pose information of the target object based on the target key point information of that image; determining an initial model of the target object; and, based on the initial model and the initial pose information corresponding to each of the plurality of target images, determining a target model of the target object and a target pose of the target object.
  • the target model and the target pose can be determined at the same time, so that the object model does not need to be generated in advance and no matching algorithm between the target object and the object model needs to be designed, which simplifies the processing steps of the pose estimation process.
  • the pose determination method may further include: acquiring prior information that matches the target object, the prior information being used to characterize the structure information and/or size information of the target object. Determining the target model and the target pose based on the initial model and the initial pose information corresponding to each of the plurality of target images then includes: determining the target model of the target object and the target pose of the target object based on the prior information, the initial model, and the initial pose information corresponding to each of the plurality of target images.
  • the structure and/or size of the target object can be constrained by adding prior information to improve the optimization accuracy and make the obtained target model and target pose more accurate.
  • determining the target model of the target object and the target pose of the target object based on the prior information, the initial model, and the initial pose information corresponding to each of the plurality of target images may include: calculating a minimum error value based on the prior information, the initial model, and the plurality of initial pose information, wherein the minimum error value is characterized by an object specification error value and a reprojection error value;
  • the object specification error value represents the error between the structure information and/or size information of the obtained object model and the prior information;
  • the reprojection error value represents the reprojection error between the projected points of the obtained object model in the corresponding target image and the corresponding key points; the object model obtained when the minimum error value is reached is determined as the target model, and the pose obtained at that point is determined as the target pose.
  • an implementation manner is provided that can simultaneously determine the target model and the target pose.
  • the minimum error value can be obtained based on the following steps: when it is detected that the error value is smaller than the error threshold, that error value is determined as the minimum error value; or, when the number of iterations reaches the upper limit, the smallest of the accumulated error values is determined as the minimum error value; wherein the iteration-number threshold corresponding to the upper limit matches the number of target images included in the image set to be processed.
  • two implementation manners for determining the minimum error value are provided, and one can be used in an actual application process.
  • calculating the minimum error value based on the prior information, the initial model, and the plurality of initial pose information may include: for each iterative calculation, calculating the current object specification error value based on the prior information and the current model of the target object obtained by the current optimization; calculating the current reprojection error value based on the current model and the currently corresponding initial pose information; and determining the minimum error value based on the current object specification error value and the current reprojection error value.
  • the minimum error value can be determined based on the object specification error value and the reprojection error value, so that the determined target model is closer to the target object and the determined target pose is closer to the actual pose of the target object at the moment the target image was captured.
  • determining the minimum error value based on the current object specification error value and the current reprojection error value may include: determining the product or the sum of the current object specification error value and the current reprojection error value as the error value calculated by the current iteration; and determining the minimum error value based on the error value calculated by the current iteration and the error values calculated by previous iterations.
  • a way is provided in which the minimum error value can be determined.
  • the target key point information includes position information of the target key point; and, for each target image, determining the initial pose information of the target object based on the target key point information of that image may include: determining the initial position information of the target key point in the world coordinate system based on the position information of the target key point and the camera calibration information, where the world coordinate system is a coordinate system that takes the motion plane of the target object as a coordinate plane; determining initial yaw angle information matching the initial position information; and determining the initial pose information of the target object based on the initial position information and the initial yaw angle information. This helps determine more accurate target pose information.
  • determining the initial yaw angle information matching the initial position information may include: within the target angle range, selecting a plurality of angles to combine with the initial position information, and determining the reprojection error values between the target key points and the corresponding model key points at each angle; the angle corresponding to the smallest reprojection error value is determined as the initial yaw angle information. This makes the initial yaw angle information closer to the target yaw angle information.
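The angle-sampling step above can be sketched as a plain grid search over candidate yaw angles. This is an illustrative sketch only, not the application's implementation: `project` is a hypothetical caller-supplied routine that projects a model key point into the image at a given yaw (using the camera calibration and the fixed initial position), and the grid resolution is an arbitrary choice.

```python
import math

def pick_initial_yaw(project, model_points, keypoints, num_angles=36):
    """Grid-search the yaw: project the model key points at each candidate
    angle and keep the angle with the smallest total reprojection error."""
    best_yaw, best_err = 0.0, float("inf")
    for i in range(num_angles):
        yaw = 2.0 * math.pi * i / num_angles  # sample (0, 2*pi) uniformly
        err = sum(
            (pu - ku) ** 2 + (pv - kv) ** 2
            for (pu, pv), (ku, kv) in zip(
                [project(p, yaw) for p in model_points], keypoints
            )
        )
        if err < best_err:
            best_yaw, best_err = yaw, err
    return best_yaw
```

A finer grid (or a subsequent local refinement) trades extra computation for an initial yaw closer to the target yaw.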
  • the initial model may be pre-determined based on the following steps: for each object model in the object model library, determining the projection points in the target image of the model key points that represent the object model; determining the reprojection error values between the projection points and the corresponding key points of the target image; and determining the object model corresponding to the smallest reprojection error value as the initial model.
  • the initial model can be closer to the target model, and the rate of determining the target model can be accelerated.
  • the set of images to be processed may include a plurality of target images determined from the moving path of the target object.
  • initial pose information relatively close to the actual pose can be obtained through multiple target images that represent the motion trajectory of the target object.
  • the embodiment of the present application also provides a pose determination apparatus, which may include: an acquisition module configured to acquire an image set to be processed, where the image set includes a plurality of target images corresponding to a target object; a first determining module configured to determine, for each target image, the initial pose information of the target object based on the target key point information of that image; a second determining module configured to determine an initial model of the target object; and a third determining module configured to determine the target model of the target object and the target pose of the target object based on the initial model and the initial pose information corresponding to each of the multiple target images.
  • the embodiment of the present application also provides an electronic device. The electronic device includes a processor and a memory, the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, the steps in the method provided above are performed.
  • the embodiment of the present application also provides a readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps in the method provided above are performed.
  • the embodiment of the present application also provides a computer program product. The computer program product includes a computer program, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer can perform the steps in the method described above.
  • FIG. 1 is a flow chart of a pose determination method provided in an embodiment of the present application
  • FIG. 2 is a flow chart of another pose determination method provided in the embodiment of the present application.
  • Fig. 3 is a structural block diagram of a pose determination device provided by an embodiment of the present application.
  • Fig. 4 is a schematic structural diagram of an electronic device for performing a pose determination method provided by an embodiment of the present application.
  • Artificial Intelligence is an emerging science and technology that studies and develops theories, methods, technologies and application systems for simulating and extending human intelligence.
  • artificial intelligence is a comprehensive discipline that involves many technologies such as chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning, and neural networks.
  • computer vision is specifically to allow machines to recognize the world.
  • Computer vision technology usually includes face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, target detection, etc.
  • the present application provides a pose determination method and apparatus, an electronic device, and a readable storage medium. By first obtaining an initial model that differs from the target model, and then simultaneously determining the target model and the target pose of the target object based on the initial model and the initial pose information of the target object shown in multiple target images, the above-mentioned problems are solved without pre-generating the target model and without designing a matching algorithm between the target object and the target model. In practical applications, this approach can be applied to the pose estimation of vehicles, drones, and similar objects.
  • the present application takes the pose estimation process applied to a vehicle as an example to illustrate the pose determination method.
  • the aforementioned target object may include a target vehicle, in which case the accuracy of the target pose of the target vehicle needs to be ensured. For example, in the field of intelligent traffic monitoring, tasks such as counting traffic flow and judging whether a driver is driving illegally can be performed based on the determined target pose.
  • FIG. 1 shows a flowchart of a first method for determining a pose provided by an embodiment of the present application.
  • the pose determination method may include the following steps 101 to 104 .
  • Step 101 acquiring a set of images to be processed;
  • the set of images to be processed includes a plurality of target images corresponding to a target object;
  • the aforementioned target image may include, for example, images corresponding to target objects such as a target truck, a target van, and a target car.
  • these images may be sorted to obtain the above image set to be processed.
  • the above-mentioned target image can be obtained through network resources, captured on the spot with a camera, or extracted from a pre-recorded video.
  • when the target image is acquired from a video recording, a frame containing the target object may be extracted and regarded as the target image.
  • the set of images to be processed may include multiple target images determined from a moving path of the target object.
  • multiple images may be captured in the moving path of the target object, and the captured images may be regarded as target images.
  • the multiple captured target images can be used to characterize the movement trajectory of the target object.
  • initial pose information relatively close to the actual pose can be obtained through multiple target images that represent the motion trajectory of the target object.
  • Step 102 for each target image, determine the initial pose information of the target object based on the target key point information of the target image;
  • the initial pose information of the target object can be determined based on the target key point information of each target image in the image set to be processed.
  • the above target key point information can be regarded as information of key points in the target image that can be used to characterize the position of the target object.
  • the key points may include, for example, points in the image corresponding to parts of the target vehicle, such as the front vehicle logo, the left-view mirror, and the right-view mirror.
  • one or more of the above-mentioned key points, such as the front vehicle logo and the left-view and right-view mirrors, can be selected as target key points, and the image coordinate information corresponding to the target key points can be regarded as the target key point information.
  • the image coordinate information here may be (u, v), for example.
  • the coordinate parameters "u" and "v” here can be any value in the image coordinate system.
  • the above initial pose information can be regarded as being able to roughly represent the position information and attitude information of the target object in the actual application scene.
  • the initial pose information may be represented, for example, by the coordinate information (X, Y, θ) of the target object in the world coordinate system.
  • the above-mentioned coordinate parameters "X", "Y", and "θ" may be any corresponding values in the world coordinate system.
  • "θ" can be regarded as the orientation (yaw) information of the target object.
  • Step 103 determining the initial model of the target object
  • the aforementioned initial model may be, for example, a predetermined model similar to the target object.
  • the size of the initial model may differ considerably from the size of the target model, so the target model does not need to be generated in advance.
  • the target object is a car
  • the initial model can be, for example, a van model.
  • a vehicle model library may be prepared in advance, and the vehicle model library may, for example, be designed with reference to various vehicle types in actual applications.
  • the initial model is pre-determined based on the following steps:
  • Step A for each object model in the object model library, determine the projection point of the model key point representing the object model in the target image;
  • the above-mentioned object model library may be, for example, the above-mentioned vehicle model library.
  • the aforementioned key points of the model may include, for example, key points of the vehicle model that can substantially represent the object model, such as the logo of the vehicle model and the left-view mirror.
  • the projection point of the key point of the front vehicle logo representing the vehicle model in the target vehicle image may be determined.
  • the above projection point can be obtained, for example, by a projection formula.
  • the image coordinates corresponding to the projection point can be calculated as (u, v) using the projection formula.
  • the above-mentioned u and v can be any values in the coordinate system to which they belong, and x w , y w , and z w can respectively represent the length, width, and height information of the target vehicle model.
  • the above-mentioned X and Y can be any values in the coordinate system to which they belong, and the above-mentioned θ can be any value within (0, 2π).
  • the projection formula above may include, for example: λ·(u, v, 1)ᵀ = K·P·(x w , y w , z w , 1)ᵀ, where λ is the scale factor, K is the camera intrinsic parameter matrix, and P is the camera extrinsic parameter matrix.
  • Step B determining the reprojection error value between the projection point and the corresponding key point of the target image
  • the reprojection error value between the projection point and the corresponding key point in the target image can be calculated. For example, after it is determined to use the vehicle front logo as the vehicle model key point A, the projection point A' of the model key point A in the target image can be obtained based on the above projection formula. The reprojection error value between projected point A' and its corresponding keypoint a can then be determined. In some application scenarios, for example, the above reprojection error value may be calculated by the least square method.
  • the manner of calculating the reprojection error value by using the least square method is a related technology, which will not be described in detail here.
  • Step C determining the object model corresponding to the smallest reprojection error value as the initial model.
  • the magnitudes of each reprojection error value can be compared, and then the smallest reprojection error value can be determined.
  • the object model corresponding to the minimum reprojection error value may be determined as the above initial model. For example, when it is determined that the reprojection error value corresponding to vehicle model A is the smallest, vehicle model A may be determined as the initial model.
  • the initial model can be determined through the calculated reprojection error value. This makes the initial model closer to the target model and speeds up the determination of the target model.
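Steps A to C above can be sketched as a search over the model library for the smallest reprojection error. This is a schematic only: the hypothetical `project` callback stands in for the projection-formula computation, and the model representation is invented for illustration.

```python
def choose_initial_model(model_library, keypoints, project):
    """For each candidate object model, project its key points into the
    target image and keep the model with the smallest total reprojection
    error. `project(model)` returns the model's projected key points as
    (u, v) pairs, in the same order as `keypoints`."""
    def reprojection_error(model):
        return sum(
            (pu - ku) ** 2 + (pv - kv) ** 2
            for (pu, pv), (ku, kv) in zip(project(model), keypoints)
        )
    return min(model_library, key=reprojection_error)
```

For a vehicle model library, `model_library` would hold the candidate vehicle models and `project` would apply the projection formula above to each model's key points.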
  • Step 104 Determine a target model of the target object and a target pose of the target object based on the initial model and the initial pose information corresponding to each of the plurality of target images.
  • the least square method can be used for continuous iterative optimization to obtain the above-mentioned target model and target pose.
  • the above target model may be applied to measure length information, height information, and/or width information of the vehicle, for example.
  • the target model and the target pose can be determined at the same time, so that there is no need to pre-generate the object model and design a matching algorithm between the target object and the object model, and optimize the processing steps of the pose estimation process.
  • FIG. 2 shows a flowchart of another pose determination method provided by an embodiment of the present application.
  • the pose determination method may include the following steps 201 to 205 .
  • Step 201 acquiring a set of images to be processed;
  • the set of images to be processed includes a plurality of target images corresponding to a target object;
  • The implementation process of step 201 and the technical effect obtained may be the same as or similar to those of step 101 of the embodiment shown in FIG. 1 , and details are not repeated here.
  • Step 202 for each target image, based on the target key point information of the target image, determine the initial pose information of the target object;
  • The implementation process of step 202 and the technical effect obtained may be the same as or similar to those of step 102 of the embodiment shown in FIG. 1 , and details are not repeated here.
  • Step 203 determining the initial model of the target object
  • The implementation process of step 203 and the technical effect obtained may be the same as or similar to those of step 103 of the embodiment shown in FIG. 1 , and details are not repeated here.
  • Step 204 acquiring prior information matching the target object; the prior information is used to characterize the structure information and/or size information of the target object;
  • prior information of the target object can be obtained.
  • the above-mentioned prior information may include, for example, size information of the target object such as length, width, and height, as well as structure information such as the coplanarity of the license plate and the front vehicle logo and the symmetry between the left-view mirror and the right-view mirror.
  • Step 205 Determine the target model of the target object and the target pose of the target object based on the prior information, the initial model, and the initial pose information corresponding to each of the plurality of target images.
  • the target model and target pose can be determined simultaneously.
  • the structure and/or size of the target object can be constrained by adding prior information to improve the optimization accuracy and make the obtained target model and target pose more accurate.
  • the target model and target pose can be determined simultaneously through the following sub-steps:
  • Sub-step 2051 calculate a minimum error value based on the prior information, the initial model, and a plurality of the initial pose information; wherein, the minimum error value is characterized by an object specification error value and a reprojection error value;
  • the object specification error value represents an error between the structure information and/or size information corresponding to the obtained object model and the prior information;
  • the reprojection error value represents the reprojection error between the projected points of the obtained object model in the corresponding target image and the corresponding key points;
  • the minimum error value can be calculated using a non-linear least squares method.
  • the minimum error value can be characterized by object specification error value and reprojection error value.
  • the minimum product obtained by multiplying the object specification error value and the reprojection error value can be used as the minimum error value; or the minimum sum obtained by adding the object specification error value and the reprojection error value can be used as the minimum error value.
  • the sum obtained by adding the object specification error value and the reprojection error value can be the algebraic sum of the two, or the weighted sum of the two, which can be selected according to the actual situation, and is not limited here.
  • the aforementioned weighted sum may, for example, be realized by assigning different weight values to the two.
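The choices just described, product, plain sum, or weighted sum of the two error terms, can be sketched in one small function. This is a minimal illustration; the weight values are free parameters chosen per application, not values given in the text.

```python
def combined_error(spec_err, reproj_err, mode="weighted_sum",
                   w_spec=1.0, w_reproj=1.0):
    """Combine the object specification error and the reprojection error.
    The text allows their product or a (possibly weighted) sum."""
    if mode == "product":
        return spec_err * reproj_err
    if mode == "sum":
        return spec_err + reproj_err
    if mode == "weighted_sum":
        return w_spec * spec_err + w_reproj * reproj_err
    raise ValueError(f"unknown mode: {mode}")
```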
  • the above substep 2051 may include the following steps:
  • Step 1 for each iterative calculation, based on the prior information and the current model of the target object obtained by the current optimization, calculate the current object specification error value;
  • the specification error value of the current object can be determined based on the prior information and the currently obtained current model of the target object.
  • the above current object specification error value can be regarded as representing the error between the currently obtained object model and the prior information.
  • the prior information of the target vehicle is: the height is 3 meters, the width is 2 meters, and the length is 5 meters.
  • the obtained object model represents the target vehicle: the height is 3 meters, the width is 1.8 meters, and the length is 5 meters.
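Using the worked numbers above, one plausible form of the specification error is a sum of squared deviations between the model's dimensions and the prior (the application does not fix the exact metric; this choice is an assumption for illustration):

```python
def spec_error(model_dims, prior_dims):
    """Object specification error: sum of squared deviations between the
    current model's (height, width, length) and the prior information."""
    return sum((m - p) ** 2 for m, p in zip(model_dims, prior_dims))

# Prior: height 3 m, width 2 m, length 5 m.
# Current model: height 3 m, width 1.8 m, length 5 m.
err = spec_error((3.0, 1.8, 5.0), (3.0, 2.0, 5.0))  # only the width deviates: 0.2**2 = 0.04
```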
  • Step 2 based on the current model and the current corresponding initial pose information, calculate the current reprojection error value
  • the current reprojection error value can be determined based on the currently obtained object model and the currently corresponding initial pose information. For example, the target image corresponding to the currently corresponding initial pose information can be determined, the projection points of the model key points that characterize the currently obtained object model can then be determined, and the reprojection error value between the projection points and the corresponding key points of the target image can be calculated. For the calculation, reference can be made to the relevant parts of step B in the foregoing embodiments; details are not repeated here.
  • Step 3 Determine the minimum error value based on the current object specification error value and the current reprojection error value.
  • the above minimum error value can be determined. For example, different weights may be assigned to the current specification error value and the current reprojection error value to determine the aforementioned minimum error value.
  • this embodiment introduces the object specification error value through the above sub-steps 1 to 3, so that the minimum error value can be determined based on both the object specification error value and the reprojection error value. As a result, the determined target model is closer to the target object, and the determined target pose is closer to the actual pose of the target object at the moment the target image was captured.
  • the above step three may include: first, determining the product or the sum of the current object specification error value and the current reprojection error value as the error value calculated by the current iteration;
  • the current specification error value and the current reprojection error value can be accumulated to obtain the sum of the two, and then the error value corresponding to the current iteration can be obtained.
  • the current specification error value can also be multiplied by the current reprojection error value to obtain the product of the two, and then the error value corresponding to the current iteration can be obtained.
  • the minimum error value is determined based on the error value calculated by the current iteration and the error value calculated by historical iterations.
  • Each iterative calculation can correspond to an error value. After obtaining the error value corresponding to the current iteration, the error value can be compared with the error value obtained by historical iterative calculations to determine the current corresponding minimum error value.
  • the object model optimized when the minimum error value is obtained is determined as the target model, and the pose optimized when the minimum error value is obtained is determined as the target pose.
  • the object model corresponding to the minimum error value can be determined as the target model, and the corresponding pose at this time can be determined as the target pose.
  • the minimum error value is obtained based on the following steps: when it is detected that an error value is less than an error threshold, that error value is determined as the minimum error value; or, when the number of iterations reaches the iteration upper limit, the smallest of the plurality of error values is determined as the minimum error value; wherein the iteration-count threshold corresponding to the iteration upper limit matches the number of target images included in the image set to be processed.
  • the iterative calculation may be stopped when it is detected that the error value is smaller than the error threshold.
  • the object model optimized at this time may be determined as the target model, and the pose information obtained at this time may be determined as the target pose information. In this way, the computational cost of the iterative calculation can be reduced while the resulting target model basically matches the target object and the target pose basically matches the actual pose of the target object.
  • the error threshold may be, for example, 0.08 or 0.1, i.e., a value small enough that the object model corresponding to the error value can reasonably be regarded as the target model.
  • the iterative calculation can also be stopped when the maximum number of iterations is reached.
  • once the initial pose information corresponding to every target image has been used and the iterative calculation can no longer continue, the maximum number of iterations can be regarded as reached.
  • the object model obtained at this time can be determined as the target model, and the currently obtained pose information can be determined as the target pose.
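The two stopping criteria above can be sketched together in a single loop. This is an illustrative assumption, not the patent's code: `step` is a hypothetical callable returning `(model, pose, error)` for one optimization iteration, and the cap is tied to the number of target images via an assumed `iters_per_image` factor.

```python
# Hedged sketch of the stopping rule: stop early once the error drops below
# a threshold, or stop when the iteration cap (matched to the number of
# target images) is reached; return the model/pose with the minimum error.

def optimize(step, num_target_images, error_threshold=0.1, iters_per_image=10):
    max_iters = iters_per_image * num_target_images  # cap matches image count
    best_model, best_pose, best_err = None, None, float("inf")
    for _ in range(max_iters):
        model, pose, err = step()
        if err < best_err:                  # keep the running minimum
            best_model, best_pose, best_err = model, pose, err
        if err < error_threshold:           # early stop: error small enough
            break
    return best_model, best_pose, best_err
```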
  • the target key point information includes position information of the target key point; and step 102 in the above embodiment shown in FIG. 1 or step 202 in the embodiment shown in FIG. 2 may include the following sub-steps:
  • Sub-step 1: based on the position information of the target key point and the camera calibration information, determine the initial position information of the target key point in the world coordinate system;
  • the world coordinate system is a coordinate system that takes the motion plane of the target object as a coordinate plane;
  • the above camera calibration information may include an internal reference matrix and an external reference matrix of the camera, so as to correct the captured image and obtain a target image with less distortion.
  • the above-mentioned world coordinate system may include a coordinate system using the motion plane of the target object as a coordinate plane.
  • for example, the road-surface coordinate system that takes the road surface on which the target vehicle moves as its horizontal coordinate plane can be regarded as the world coordinate system.
  • the initial position information of the target key point in the world coordinate system can be determined.
  • the projection formula can be used to determine the initial position information (X, Y, Z) of the target key point in the world coordinate system.
  • the initial position information may be (X, Y, 0).
  • Sub-step 2: determine the initial yaw angle information matching the initial position information;
  • the initial yaw angle information matching the initial position information can then be determined.
  • the above initial yaw angle information can be determined based on the following steps: first, within the target angle range, select multiple angles to match the initial position information, and determine the reprojection error value of the model key point corresponding to the target key point at each angle;
  • the aforementioned target angle range may be (0, 2π), for example.
  • multiple angles can be selected within the target angle range to match the initial position information respectively. For example, 90°, 180°, 270°, 360°, etc. may be selected at equal intervals. After selecting 90° as the angle to match the initial position information, the pose information of the above model key point becomes (X, Y, 90°); the reprojection error between the projection point corresponding to the model key point and the target key point can then be calculated. Similarly, the reprojection errors corresponding to multiple model key points at their respective angles can be calculated.
  • the angle information corresponding to the smallest reprojection error value is determined as the initial yaw angle information.
  • the angle information corresponding to the smallest reprojection error value may be determined as the initial yaw angle information, so that the initial yaw angle information is closer to the target yaw angle information. For example, when the reprojection error value corresponding to the model key point whose pose information is (X, Y, 90°) is the smallest, 90° may be determined as the initial yaw angle information.
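The coarse yaw search above can be sketched as follows, under the assumption of a caller-supplied `reprojection_error(pose)` function (a hypothetical helper, not defined by the patent); poses are (X, Y, yaw) triples and candidate yaws are sampled at equal intervals over [0, 2π).

```python
# Hedged sketch: sample candidate yaw angles, evaluate the reprojection
# error at each, and keep the angle with the smallest error as the initial
# yaw angle information.
import math

def initial_yaw(x, y, reprojection_error, num_samples=8):
    candidates = [2.0 * math.pi * k / num_samples for k in range(num_samples)]
    # pick the yaw whose projected model keypoints best match the detections
    return min(candidates, key=lambda yaw: reprojection_error((x, y, yaw)))
```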
  • Sub-step 3: based on the initial position information and the initial yaw angle information, determine the initial pose information of the target object.
  • the initial pose information of the target object can then be determined. For example, given the initial position information (X, Y, 0) and the initial yaw angle information of 90°, the initial pose information (X, Y, 90°) can be determined.
  • in this way, the initial pose information of the target object can be roughly determined, which helps determine more accurate target pose information.
  • FIG. 3 shows a structural block diagram of an apparatus for determining a pose provided by an embodiment of the present application.
  • the apparatus for determining a pose may be a module, program segment or code on an electronic device.
  • the device corresponds to the above-mentioned method embodiment in FIG. 1 , and can execute various steps involved in the method embodiment in FIG. 1 .
  • the specific functions of the device can refer to the description above. To avoid repetition, detailed descriptions are appropriately omitted here.
  • the above-mentioned device for determining a pose may include an acquisition module 301 , a first determination module 302 , a second determination module 303 and a third determination module 304 .
  • the obtaining module 301 is configured to obtain a set of images to be processed, the set including a plurality of target images corresponding to the target object; the first determination module 302 is configured to, for each target image, determine the initial pose information of the target object based on the target key point information of that target image; the second determination module 303 is configured to determine the initial model of the target object; the third determination module 304 is configured to determine the target model and the target pose of the target object based on the initial model and the initial pose information respectively corresponding to the plurality of target images.
  • the apparatus for determining a pose may further include an information acquisition module configured to acquire prior information matching the target object, the prior information being used to characterize the structural information and/or size information of the target object; the third determination module 304 is further configured to determine the target model and the target pose of the target object based on the prior information, the initial model, and the initial pose information corresponding to each of the plurality of target images.
  • the third determining module 304 may be further configured to: calculate a minimum error value based on the prior information, the initial model, and a plurality of initial pose information, wherein the minimum error value is characterized by an object specification error value and a reprojection error value; the object specification error value represents the error between the structural information and/or size information of the obtained object model and the prior information, and the reprojection error value represents the reprojection error between the projection points of the object model in the corresponding target image and the corresponding key points; determine the object model optimized when the minimum error value is obtained as the target model, and the pose optimized when the minimum error value is obtained as the target pose.
  • the minimum error value can be obtained based on the following steps: when it is detected that an error value is smaller than the error threshold, that error value is determined as the minimum error value; or, when the number of iterations reaches the iteration upper limit, the smallest of the plurality of error values is determined as the minimum error value; wherein the iteration-count threshold corresponding to the iteration upper limit matches the number of target images included in the image set to be processed.
  • the third determination module 304 may be further configured to: for each iterative calculation, calculate the current object specification error value based on the prior information and the current model of the target object obtained by the current optimization; calculate the current reprojection error value based on the current model and the currently corresponding initial pose information; and determine the minimum error value based on the current object specification error value and the current reprojection error value.
  • the third determination module 304 may be further configured to: determine the product or sum of the current object specification error value and the current reprojection error value as the error value obtained in the current iteration; and determine the minimum error value based on the error value calculated in the current iteration and the error values calculated in historical iterations.
  • the first determination module 302 may be further configured to: determine the initial position information of the target key point in the world coordinate system based on the position information of the target key point and the camera calibration information;
  • the world coordinate system includes a coordinate system that takes the motion plane of the target object as a coordinate plane; determine the initial yaw angle information matching the initial position information; and, based on the initial position information and the initial yaw angle information, determine the initial pose information of the target object.
  • the first determining module 302 may be further configured to: within the target angle range, select multiple angles to match the initial position information, determine the reprojection error value of the model key point corresponding to the target key point at each angle, and determine the angle information corresponding to the smallest reprojection error value as the initial yaw angle information.
  • the initial model may be determined in advance based on the following steps: for each object model in the object model library, determine the projection points, in the target image, of the model key points representing that object model; determine the reprojection error values between the projection points and the corresponding key points of the target image; and determine the object model corresponding to the smallest reprojection error value as the initial model.
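The model-library search above can be sketched as follows. This is an assumed illustration: `project` is a hypothetical helper mapping a 3-D model key point to pixel coordinates, and the model dictionaries are an assumed data layout, not a structure defined by the patent.

```python
# Hedged sketch: project every candidate model's key points into the target
# image and pick the model with the smallest total reprojection error as the
# initial model.
import numpy as np

def select_initial_model(model_library, detected_keypoints, project):
    def total_error(model):
        # sum of pixel distances between projected model key points and the
        # corresponding detected key points in the target image
        return sum(np.linalg.norm(np.asarray(project(p)) - np.asarray(q))
                   for p, q in zip(model["keypoints"], detected_keypoints))
    return min(model_library, key=total_error)
```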
  • the set of images to be processed may include a plurality of target images determined from the moving path of the target object.
  • FIG. 4 is a schematic structural diagram of an electronic device for performing a pose determination method provided by an embodiment of the present application.
  • the electronic device may include: at least one processor 401, such as a CPU, and at least one communication interface 402 , at least one memory 403 and at least one communication bus 404 .
  • the communication bus 404 is used to enable direct connection and communication among these components.
  • the communication interface 402 of the device in the embodiment of the present application is used for signaling or data communication with other node devices.
  • the memory 403 may be a high-speed RAM memory, or a non-volatile memory, such as at least one disk memory.
  • the memory 403 may also be at least one storage device located far away from the aforementioned processor.
  • Computer-readable instructions are stored in the memory 403 , and when the computer-readable instructions are executed by the processor 401 , the electronic device may, for example, execute the above-mentioned method process shown in FIG. 1 .
  • FIG. 4 is only for illustration, and the electronic device may also include more or less components than those shown in FIG. 4 , or have a configuration different from that shown in FIG. 4 .
  • Each component shown in FIG. 4 may be implemented by hardware, software or a combination thereof.
  • An embodiment of the present application provides a readable storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, the method process performed by the electronic device in the method embodiment shown in FIG. 1 is executed.
  • the computer program product includes a computer program, and the computer program includes program instructions; when the program instructions are executed by a computer, the computer can execute the methods provided in the above method embodiments.
  • the method may include: acquiring a set of images to be processed, the set including a plurality of target images corresponding to the target object; for each target image, determining the initial pose information of the target object based on the target key point information of that target image; determining the initial model of the target object; and, based on the initial model and the initial pose information corresponding to each of the plurality of target images, determining the target model and the target pose of the target object.
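The overall method summarized above can be sketched structurally as follows. Every helper here is a hypothetical callable passed in by the caller, not an API defined by this application: `detect_keypoints` finds target key points in an image, `estimate_initial_pose` turns them into an initial pose, `select_initial_model` picks a starting model, and `joint_optimize` refines the model and the poses together.

```python
# Hedged structural sketch of the pipeline: per-image pose initialization,
# initial-model selection, then joint refinement of model and poses.

def determine_pose(target_images, model_library, detect_keypoints,
                   estimate_initial_pose, select_initial_model, joint_optimize):
    # one initial pose per target image
    initial_poses = [estimate_initial_pose(detect_keypoints(img))
                     for img in target_images]
    initial_model = select_initial_model(model_library, target_images)
    # jointly determine the target model and the target poses
    return joint_optimize(initial_model, initial_poses)
```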
  • the disclosed devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some communication interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separate, and a component displayed as a unit may or may not be a physical unit; that is, it may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional module in each embodiment of the present application may be integrated to form an independent part, each module may exist independently, or two or more modules may be integrated to form an independent part.
  • the present application provides a pose determining method, device, electronic equipment and readable storage medium.
  • by first obtaining an initial model that differs from the target model, and then simultaneously determining the target model and the target pose of the target object based on the initial model and the initial pose information of the target object shown in multiple target images, the pose determination method requires neither a pre-generated target model nor a matching algorithm between the target object and the target model, thereby solving the problems that the target object's model cannot be pre-generated and that matching the target object to the target model is difficult during pose estimation.
  • the pose determination method, device, electronic device and readable storage medium of the present application are reproducible and can be used in various industrial applications.
  • the pose determination method, device, electronic device, and readable storage medium of the present application can be used in pose estimation processes for objects such as vehicles and drones.


Abstract

Provided by the present application are a pose determination method and apparatus, an electronic device, and a readable storage medium. A specific embodiment of the method comprises: obtaining an image set to be processed, the image set to be processed comprising a plurality of target images; for each of the target images, on the basis of target key point information of the target image, determining initial pose information of a target object; and on the basis of an initial model and the initial pose information corresponding to each of the plurality of target images, determining a target model and a target pose. In the described method, a target model and target pose can be determined at the same time, so that there is no need to generate an object model in advance and no need to design a matching algorithm between the target object and the object model, optimizing the processing steps of a pose estimation process.

Description

Pose determination method, device, electronic device and readable storage medium

Cross-Reference to Related Applications

This application claims the priority of the Chinese patent application No. 202110932980.9, titled "Pose Determination Method, Device, Electronic Equipment, and Readable Storage Medium", filed with the State Intellectual Property Office of China on August 13, 2021, the entire content of which is incorporated in this application by reference.

Technical Field

The present application relates to the field of information processing, and in particular to a pose determination method, device, electronic equipment, and readable storage medium.

Background

With the continuous development of intelligent systems, the requirements for automated processing of surveillance video keep increasing. When determining the pose of a target object in surveillance video or captured images, the depth information of the three-dimensional world is lost during image acquisition, which makes the pose parameters of the target object difficult to recover.

In related technologies, the pose parameters of a target object are commonly estimated in one of two ways. The first is to render images of the target object in different poses from its three-dimensional model, then train a convolutional neural network using the rendered images as input and the pose parameters corresponding to the different poses as expected output. Once the network converges, it can be used for pose estimation: the image to be processed is fed to the network, and the output is the corresponding pose parameters. Because this image-recognition-based approach involves no strict projection-relationship equation, it is difficult to obtain accurate estimates of the target pose parameters, and generalization is poor. The second is to use deep learning to predict the two-dimensional key points of the target object and, combining the target object's three-dimensional model with camera calibration information, estimate the pose from the projection relationship between the model points of the three-dimensional model and the two-dimensional key points of the target object in the image. This method helps improve the accuracy and robustness of the estimation, but it requires a pre-generated three-dimensional model of the target object and an accurate model-matching algorithm, which makes the estimation process considerably more difficult.
Summary of the Invention

The purpose of the embodiments of the present application is to provide a pose determination method, device, electronic equipment, and readable storage medium that obtain the target model and the target pose simultaneously, thereby solving the problems that, during pose estimation, the target model of the target object cannot be pre-generated and matching the target object to the target model is difficult.

An embodiment of the present application provides a pose determination method, which may include: acquiring a set of images to be processed, the set including a plurality of target images corresponding to a target object; for each target image, determining the initial pose information of the target object based on the target key point information of that target image; determining the initial model of the target object; and, based on the initial model and the initial pose information corresponding to each of the plurality of target images, determining the target model and the target pose of the target object. In this way, the target model and the target pose can be determined simultaneously, so that an object model need not be generated in advance and no matching algorithm between the target object and the object model need be designed, which streamlines the processing steps of pose estimation.
Optionally, the pose determination method may further include: acquiring prior information matching the target object, the prior information being used to characterize the structural information and/or size information of the target object; and determining the target model and the target pose of the target object based on the initial model and the initial pose information corresponding to each of the plurality of target images includes: determining the target model and the target pose of the target object based on the prior information, the initial model, and the initial pose information corresponding to each of the plurality of target images. In this way, the structure and/or size of the target object can be constrained by the added prior information, improving optimization accuracy and making the obtained target model and target pose more accurate.

Optionally, determining the target model and the target pose of the target object based on the prior information, the initial model, and the initial pose information corresponding to each of the plurality of target images may include: calculating a minimum error value based on the prior information, the initial model, and the plurality of initial pose information, wherein the minimum error value is characterized by an object specification error value and a reprojection error value; the object specification error value represents the error between the structural information and/or size information of the obtained object model and the prior information, and the reprojection error value represents the reprojection error between the projection points of the obtained object model in the corresponding target image and the corresponding key points; and determining the object model optimized when the minimum error value is obtained as the target model, and the pose optimized when the minimum error value is obtained as the target pose. This provides an implementation that determines the target model and the target pose simultaneously.

Optionally, the minimum error value may be obtained as follows: when an error value is detected to be smaller than an error threshold, that error value is determined as the minimum error value; or, when the number of iterations reaches the iteration upper limit, the smallest of the plurality of error values is determined as the minimum error value, wherein the iteration-count threshold corresponding to the iteration upper limit matches the number of target images included in the image set to be processed. These two ways of determining the minimum error value can be used alternatively in practice.

Optionally, calculating the minimum error value based on the prior information, the initial model, and the plurality of initial pose information may include: for each iterative calculation, calculating the current object specification error value based on the prior information and the current model of the target object obtained by the current optimization; calculating the current reprojection error value based on the current model and the currently corresponding initial pose information; and determining the minimum error value based on the current object specification error value and the current reprojection error value. In this way, the minimum error value can be determined from the object specification error value and the reprojection error value, so that the determined target model is closer to the target object and the determined target pose is closer to the actual pose of the target object at the moment the target image was captured.

Optionally, determining the minimum error value based on the current object specification error value and the current reprojection error value may include: determining the product or sum of the current object specification error value and the current reprojection error value as the error value obtained in the current iteration; and determining the minimum error value based on the error value calculated in the current iteration and the error values calculated in historical iterations. This provides one way of determining the minimum error value.
Optionally, the target key point information includes position information of the target key point, and determining, for each target image, the initial pose information of the target object based on the target key point information of that target image may include: determining the initial position information of the target key point in the world coordinate system based on the position information of the target key point and the camera calibration information, the world coordinate system being a coordinate system that takes the motion plane of the target object as a coordinate plane; determining the initial yaw angle information matching the initial position information; and determining the initial pose information of the target object based on the initial position information and the initial yaw angle information. This helps determine more accurate target pose information.

Optionally, determining the initial yaw angle information matching the initial position information may include: within the target angle range, selecting multiple angles to match the initial position information, and determining the reprojection error value of the model key point corresponding to the target key point at each angle; and determining the angle information corresponding to the smallest reprojection error value as the initial yaw angle information, so that the initial yaw angle information is closer to the target yaw angle information.

Optionally, the initial model may be determined in advance as follows: for each object model in the object model library, determining the projection points, in the target image, of the model key points representing that object model; determining the reprojection error values between the projection points and the corresponding key points of the target image; and determining the object model corresponding to the smallest reprojection error value as the initial model. In this way, the initial model can be closer to the target model, speeding up the determination of the target model.

Optionally, the set of images to be processed may include a plurality of target images determined from the moving path of the target object. In this way, reasonably close initial pose information can be obtained from multiple target images that characterize the motion trajectory of the target object.
本申请实施例还提供了一种位姿确定装置，该装置可以包括：获取模块，被配置成用于获取待处理图像集；所述待处理图像集中包括目标对象对应的多个目标图像；第一确定模块，被配置成用于针对每一个所述目标图像，基于该目标图像的目标关键点信息，确定所述目标对象的初始位姿信息；第二确定模块，被配置成用于确定所述目标对象的初始模型；第三确定模块，被配置成用于基于所述初始模型以及所述多个目标图像各自对应的所述初始位姿信息，确定所述目标对象的目标模型和所述目标对象的目标位姿。The embodiment of the present application also provides a pose determination apparatus, which may include: an acquisition module configured to acquire an image set to be processed, the image set to be processed including a plurality of target images corresponding to a target object; a first determining module configured to determine, for each of the target images, initial pose information of the target object based on target key point information of that target image; a second determining module configured to determine an initial model of the target object; and a third determining module configured to determine a target model of the target object and a target pose of the target object based on the initial model and the initial pose information corresponding to each of the plurality of target images.
本申请实施例还提供了一种电子设备，所述电子设备包括处理器以及存储器，所述存储器存储有计算机可读取指令，当所述计算机可读取指令由所述处理器执行时，运行如上述提供的所述方法中的步骤。The embodiment of the present application also provides an electronic device. The electronic device includes a processor and a memory, the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, the steps in the method provided above are performed.
本申请实施例还提供了一种可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时运行如上述提供的所述方法中的步骤。The embodiment of the present application also provides a readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps in the method provided above are performed.
本申请实施例还提供了一种计算机程序产品，所述计算机程序产品包括计算机程序，所述计算机程序包括程序指令，当所述程序指令被计算机执行时，所述计算机能够执行如上述提供的所述方法中的步骤。The embodiment of the present application also provides a computer program product. The computer program product includes a computer program, and the computer program includes program instructions; when the program instructions are executed by a computer, the computer can perform the steps in the method provided above.
本申请的其他特征和优点将在随后的说明书阐述,并且,部分地从说明书中变得显而易见,或者通过实施本申请实施例了解。本申请的目的和其他优点可通过在所写的说明书、权利要求书、以及附图中所特别指出的结构来实现和获得。Other features and advantages of the present application will be set forth in the ensuing description and, in part, will be apparent from the description, or can be learned by practicing the embodiments of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
附图说明Description of drawings
为了更清楚地说明本申请实施例的技术方案，下面将对本申请实施例中所需要使用的附图作简单地介绍，应当理解，以下附图仅示出了本申请的某些实施例，因此不应被看作是对范围的限定，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他相关的附图。In order to more clearly illustrate the technical solutions of the embodiments of the present application, the accompanying drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and therefore should not be regarded as limiting the scope; those of ordinary skill in the art can also obtain other related drawings from these drawings without creative effort.
图1为本申请实施例提供的一种位姿确定方法的流程图;FIG. 1 is a flow chart of a pose determination method provided in an embodiment of the present application;
图2为本申请实施例提供的另一种位姿确定方法的流程图;FIG. 2 is a flow chart of another pose determination method provided in the embodiment of the present application;
图3为本申请实施例提供的一种位姿确定装置的结构框图;Fig. 3 is a structural block diagram of a pose determination device provided by an embodiment of the present application;
图4为本申请实施例提供的一种用于执行位姿确定方法的电子设备的结构示意图。Fig. 4 is a schematic structural diagram of an electronic device for performing a pose determination method provided by an embodiment of the present application.
具体实施方式Detailed ways
下面将结合本申请实施例中附图,对本申请实施例中的技术方案进行清楚、完整地描述。The following will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the drawings in the embodiments of the present application.
近年来，基于人工智能的计算机视觉、深度学习、机器学习、图像处理、图像识别等技术研究取得了重要进展。人工智能(Artificial Intelligence,AI)是研究、开发用于模拟、延伸人的智能的理论、方法、技术及应用系统的新兴科学技术。人工智能学科是一门综合性学科，涉及芯片、大数据、云计算、物联网、分布式存储、深度学习、机器学习、神经网络等诸多技术种类。计算机视觉作为人工智能的一个重要分支，具体是让机器识别世界，计算机视觉技术通常包括人脸识别、活体检测、指纹识别与防伪验证、生物特征识别、人脸检测、行人检测、目标检测、行人识别、图像处理、图像识别、图像语义理解、图像检索、文字识别、视频处理、视频内容识别、行为识别、三维重建、虚拟现实、增强现实、同步定位与地图构建（SLAM）、计算摄影、机器人导航与定位等技术。随着人工智能技术的研究和进步，该项技术在众多领域展开了应用，例如安防、城市管理、交通管理、楼宇管理、园区管理、人脸通行、人脸考勤、物流管理、仓储管理、机器人、智能营销、计算摄影、手机影像、云服务、智能家居、穿戴设备、无人驾驶、自动驾驶、智能医疗、人脸支付、人脸解锁、指纹解锁、人证核验、智慧屏、智能电视、摄像机、移动互联网、网络直播、美颜、美妆、医疗美容、智能测温等领域。In recent years, research on artificial-intelligence-based computer vision, deep learning, machine learning, image processing, image recognition, and related technologies has made important progress. Artificial Intelligence (AI) is an emerging science and technology that studies and develops theories, methods, technologies, and application systems for simulating and extending human intelligence. Artificial intelligence is a comprehensive discipline involving many technologies such as chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning, and neural networks. Computer vision, an important branch of artificial intelligence, aims to let machines recognize the world. Computer vision technologies usually include face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, target detection, pedestrian recognition, image processing, image recognition, image semantic understanding, image retrieval, text recognition, video processing, video content recognition, behavior recognition, 3D reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, and robot navigation and positioning.
With the research and progress of artificial intelligence technology, this technology has been applied in many fields, such as security, urban management, traffic management, building management, park management, face-based access, face-based attendance, logistics management, warehouse management, robots, smart marketing, computational photography, mobile imaging, cloud services, smart homes, wearable devices, unmanned driving, autonomous driving, smart healthcare, face payment, face unlock, fingerprint unlock, identity verification, smart screens, smart TVs, cameras, the mobile Internet, webcasting, beauty filters, cosmetics, medical aesthetics, intelligent temperature measurement, and other fields.
相关技术中，存在在目标对象的姿态估计过程中对象模型无法预先生成以及目标对象与目标模型之间匹配困难的问题。为了解决上述问题，本申请提供一种位姿确定方法、装置、电子设备和可读存储介质。通过首先获取与目标模型存在差别的初始模型，然后基于该初始模型与多个目标图像中分别示出的目标对象的初始位姿信息，同时确定出目标对象的目标模型和目标位姿，以使得不用预先生成目标模型以及无需设计目标对象与目标模型之间的匹配算法，从而解决了上述问题。在实际应用时，本申请可以应用于诸如车辆、无人机等对应的位姿估计过程。示例性地，本申请以应用于车辆的位姿估计过程为例阐述该位姿确定方法。也即，上述目标对象可以包括目标车辆。而保证目标车辆的目标位姿的准确性是必要的。例如在智能交通监控领域中，可以通过确定出的目标位姿进行诸如统计车流量、判断驾驶员是否违规驾驶等情况。In the related art, in the pose estimation process of a target object, the object model cannot be generated in advance, and matching between the target object and the target model is difficult. To solve these problems, the present application provides a pose determination method, apparatus, electronic device, and readable storage medium. An initial model that differs from the target model is first obtained, and then the target model and the target pose of the target object are determined simultaneously based on the initial model and the initial pose information of the target object shown in the multiple target images, so that the target model does not need to be generated in advance and no matching algorithm between the target object and the target model needs to be designed, thereby solving the above problems. In practical applications, the present application can be applied to pose estimation of objects such as vehicles and drones. Illustratively, the present application describes the pose determination method using vehicle pose estimation as an example; that is, the above target object may include a target vehicle. Ensuring the accuracy of the target pose of the target vehicle is necessary: for example, in the field of intelligent traffic monitoring, the determined target pose can be used to count traffic flow, determine whether a driver is driving illegally, and the like.
以上相关技术中的方案所存在的缺陷，均是发明人在经过实践并仔细研究后得出的结果，因此，上述问题的发现过程以及下文中本申请实施例针对上述问题所提出的解决方案，都应该是发明人在本申请过程中对本申请做出的贡献。The defects in the solutions of the above related art are all results obtained by the inventor after practice and careful study. Therefore, the process of discovering the above problems, and the solutions proposed for them in the embodiments of the present application below, should all be regarded as the inventor's contributions to the present application.
请参考图1,其示出了本申请实施例提供的第一种位姿确定方法的流程图。如图1所示,该位姿确定方法可以包括以下步骤101至步骤104。Please refer to FIG. 1 , which shows a flowchart of a first method for determining a pose provided by an embodiment of the present application. As shown in FIG. 1 , the pose determination method may include the following steps 101 to 104 .
步骤101,获取待处理图像集;所述待处理图像集中包括目标对象对应的多个目标图像; Step 101, acquiring a set of images to be processed; the set of images to be processed includes a plurality of target images corresponding to a target object;
上述目标图像例如可以包括目标货车、目标面包车、目标轿车等目标对象所对应的图像。The aforementioned target image may include, for example, images corresponding to target objects such as a target truck, a target van, and a target car.
进一步的，可以在获取了目标对象的多个图像之后，对这些图像进行整理，以得到上述待处理图像集。在一些应用场景中，上述目标图像可以通过网络资源获得，也可以通过在实地利用相机拍摄得到，还可以通过预先录制的录像获取。这里，在通过录像获取时，可以将截取的包括目标对象的图像视为目标图像。Further, after multiple images of the target object are acquired, these images may be organized to obtain the above image set to be processed. In some application scenarios, the above target images may be obtained from network resources, captured on site with a camera, or obtained from a pre-recorded video. Here, when they are acquired from a recording, a captured frame that includes the target object may be regarded as a target image.
在一些可选的实现方式中,所述待处理图像集可以包括从目标对象的运动路径中确定的多个目标图像。In some optional implementation manners, the set of images to be processed may include multiple target images determined from a moving path of the target object.
在一些应用场景中,可以在目标对象的运动路径中拍摄多个图像,拍摄的图像可以视为目标图像。这样,拍摄得到的多个目标图像可以用于表征目标对象的运动轨迹。通过能够表征目标对象运动轨迹的多个目标图像,可以得到较为接近的初始位姿信息。In some application scenarios, multiple images may be captured in the moving path of the target object, and the captured images may be regarded as target images. In this way, the multiple captured target images can be used to characterize the movement trajectory of the target object. The closer initial pose information can be obtained through multiple target images that can represent the motion trajectory of the target object.
步骤102,针对每一个所述目标图像,基于该目标图像的目标关键点信息,确定所述目标对象的初始位姿信息; Step 102, for each target image, determine the initial pose information of the target object based on the target key point information of the target image;
获取到上述待处理图像集之后,可以针对待处理图像集中的每个目标图像的目标关键点信息确定出目标对象的初始位姿信息。After the above image set to be processed is acquired, the initial pose information of the target object can be determined for the target key point information of each target image in the image set to be processed.
上述目标关键点信息可以视为目标图像中能够用于表征目标对象的位置的关键点的信息。关键点例如可以包括车辆前车标、车辆左视镜、车辆右视镜等在目标车辆图像中对应的点。在一些应用场景中，可以在上述的诸如车辆前车标、车辆左视镜、车辆右视镜等关键点中选择一个或多个作为目标关键点，目标关键点对应的图像坐标信息可以视为上述目标关键点信息。这里的图像坐标信息例如可以为(u,v)。这里的坐标参数“u”、“v”可以为在图像坐标系下的任意值。The above target key point information can be regarded as information on key points in the target image that can be used to characterize the position of the target object. The key points may include, for example, the points in the target vehicle image corresponding to the front vehicle logo, the left-view mirror, and the right-view mirror. In some application scenarios, one or more of these key points may be selected as target key points, and the image coordinate information corresponding to the target key points can be regarded as the above target key point information. The image coordinate information here may be, for example, (u, v), where the coordinate parameters "u" and "v" can be any values in the image coordinate system.
上述初始位姿信息可以视为能够粗糙表征目标对象在实际应用场景中的位置信息和姿态信息。该初始位姿信息例如可以通过目标对象在世界坐标系下的坐标信息(X,Y,θ)表征。可选地,上述坐标参数“X”、“Y”、“θ”可以为在该世界坐标系下对应的任意值。可选地,“θ”可以视为目标对象的姿态信息。The above initial pose information can be regarded as being able to roughly represent the position information and attitude information of the target object in the actual application scene. The initial pose information may be represented by coordinate information (X, Y, θ) of the target object in the world coordinate system, for example. Optionally, the above-mentioned coordinate parameters "X", "Y", and "θ" may be any corresponding values in the world coordinate system. Optionally, "θ" can be regarded as the pose information of the target object.
步骤103,确定所述目标对象的初始模型; Step 103, determining the initial model of the target object;
上述初始模型例如可以是预先确定的与目标对象相类似的模型。该模型可以与目标模型对应的尺寸相差较大,继而无需预先生成目标模型。例如,目标对象为轿车,初始模型例如可以为面包车模型。在一些应用场景中,可以预先准备车辆模型库,该车辆模型库例如可以实际应用中的各种车型为参照设计。The aforementioned initial model may be, for example, a predetermined model similar to the target object. The size of the model can be quite different from the size corresponding to the target model, so that the target model does not need to be generated in advance. For example, the target object is a car, and the initial model can be, for example, a van model. In some application scenarios, a vehicle model library may be prepared in advance, and the vehicle model library may, for example, be designed with reference to various vehicle types in actual applications.
在一些可选的实现方式中,所述初始模型预先基于如下步骤确定:In some optional implementation manners, the initial model is pre-determined based on the following steps:
步骤A,针对对象模型库中的每一个对象模型,确定表征该对象模型的模型关键点在目标图像中的投影点;Step A, for each object model in the object model library, determine the projection point of the model key point representing the object model in the target image;
上述对象模型库例如可以为上述车辆模型库。相应的,上述模型关键点例如可以包括车辆模型的车标、左视镜等实质上能够表征对象模型的关键点。The above-mentioned object model library may be, for example, the above-mentioned vehicle model library. Correspondingly, the aforementioned key points of the model may include, for example, key points of the vehicle model that can substantially represent the object model, such as the logo of the vehicle model and the left-view mirror.
例如,针对车辆模型库中的每一个车辆模型,可以确定表征该车辆模型的前车标关键点在目标车辆图像中的投影点。For example, for each vehicle model in the vehicle model library, the projection point of the key point of the front vehicle logo representing the vehicle model in the target vehicle image may be determined.
进一步的，上述投影点例如可以通过投影公式得到。例如，已知目标车辆模型信息(x_w, y_w, z_w)以及其位姿信息(X, Y, θ)，可以利用投影公式计算出投影点对应的图像坐标为(u, v)。在这些应用场景中，上述u、v可以表征在其所属坐标系下的任意数字，上述x_w、y_w、z_w可以分别表征目标车辆模型的长度信息、宽度信息以及高度信息。上述X、Y可以为在其所属坐标系下的任意值，上述θ可以为(0, 2π)内的任意值。上述投影公式例如可以包括：

λ·[u, v, 1]^T = K·P·[x_w, y_w, z_w, 1]^T

其中，λ为尺度因子，K为相机的内参数，P为相机的外参数。Further, the above projection point can be obtained, for example, through a projection formula. For example, given the target vehicle model information (x_w, y_w, z_w) and its pose information (X, Y, θ), the image coordinates (u, v) of the corresponding projection point can be calculated with the projection formula. In these application scenarios, u and v can be any values in the coordinate system to which they belong, and x_w, y_w, and z_w can respectively represent the length, width, and height of the target vehicle model. X and Y can be any values in the coordinate system to which they belong, and θ can be any value within (0, 2π). The projection formula may include, for example:

λ·[u, v, 1]^T = K·P·[x_w, y_w, z_w, 1]^T

where λ is a scale factor, K is the camera intrinsic parameter matrix, and P is the camera extrinsic parameter matrix.
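The projection formula described above can be sketched in code as follows. The intrinsic matrix K and extrinsic matrix P used here are hypothetical example values for illustration only, not calibration data from the application.

```python
import numpy as np

def project_point(K, P, point_w):
    """Apply the projection formula: lambda * [u, v, 1]^T = K @ P @ [x_w, y_w, z_w, 1]^T.
    K is the 3x3 camera intrinsic matrix; P is the 3x4 extrinsic matrix [R|t]."""
    p_homog = np.append(point_w, 1.0)   # [x_w, y_w, z_w, 1]
    uvw = K @ P @ p_homog               # lambda * [u, v, 1]
    return uvw[:2] / uvw[2]             # divide out the scale factor lambda

# Hypothetical intrinsics and an identity-rotation extrinsic (camera 5 m from origin)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P = np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])])
uv = project_point(K, P, np.array([1.0, 0.5, 0.0]))  # pixel coords (480.0, 320.0)
```

The last division by `uvw[2]` is exactly the elimination of the scale factor λ mentioned in the formula.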
步骤B,确定所述投影点与所述目标图像的对应关键点之间的重投影误差值;Step B, determining the reprojection error value between the projection point and the corresponding key point of the target image;
确定了上述投影点之后,可以计算该投影点与该投影点在目标图像中的对应关键点之间的重投影误差值。例如,确定了利用车辆前车标作为车辆模型关键点A之后,可以基于上述投影公式得到该模型关键点A在目标图像中的投影点A'。然后可以确定投影点A'与其对应关键点a之间的重投影误差值。在一些应用场景中,例如可以通过最小二乘法计算上述重投影误差值。这里,利用最小二乘法计算该重投影误差值的方式为相关技术,此处不赘述。After the above projection point is determined, the reprojection error value between the projection point and the corresponding key point in the target image can be calculated. For example, after it is determined to use the vehicle front logo as the vehicle model key point A, the projection point A' of the model key point A in the target image can be obtained based on the above projection formula. The reprojection error value between projected point A' and its corresponding keypoint a can then be determined. In some application scenarios, for example, the above reprojection error value may be calculated by the least square method. Here, the manner of calculating the reprojection error value by using the least square method is a related technology, which will not be described in detail here.
步骤C,将最小的重投影误差值所对应的对象模型确定为所述初始模型。Step C, determining the object model corresponding to the smallest reprojection error value as the initial model.
确定了各个对象模型分别对应的重投影误差值之后,可以比较各个重投影误差值的大小,继而可以确定出最小的重投影误差值。After the reprojection error values corresponding to each object model are determined, the magnitudes of each reprojection error value can be compared, and then the smallest reprojection error value can be determined.
确定了最小的重投影误差值之后,可以将该最小的重投影误差值对应的对象模型确定为上述初始模型。例如,确定了车辆模型A对应的重投影误差值最小时,可以将车辆模型A确定为初始模型。After the minimum reprojection error value is determined, the object model corresponding to the minimum reprojection error value may be determined as the above initial model. For example, when it is determined that the reprojection error value corresponding to vehicle model A is the smallest, vehicle model A may be determined as the initial model.
通过上述步骤A至步骤C,可以通过计算的重投影误差值确定出初始模型。使得初始模型能够更加贴近于目标模型,加快确定目标模型的速率。Through the above steps A to C, the initial model can be determined through the calculated reprojection error value. This makes the initial model closer to the target model and speeds up the determination of the target model.
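Steps A to C above can be sketched as a minimal Python loop over an object model library. The `project` callback and the model library contents are illustrative assumptions standing in for the projection formula and the vehicle model library described earlier.

```python
import numpy as np

def model_reprojection_error(projected_keypoints, image_keypoints):
    """Sum of squared distances between projected model key points and the
    corresponding key points detected in the target image (step B)."""
    diff = np.asarray(projected_keypoints) - np.asarray(image_keypoints)
    return float(np.sum(diff ** 2))

def select_initial_model(model_library, image_keypoints, project):
    """Steps A-C: project each candidate model's key points, compute the
    reprojection error, and return the model with the smallest error."""
    best_name, best_error = None, float("inf")
    for name, model_keypoints in model_library.items():
        projected = project(model_keypoints)          # step A
        error = model_reprojection_error(projected, image_keypoints)  # step B
        if error < best_error:                        # step C
            best_name, best_error = name, error
    return best_name, best_error
```

In practice `project` would apply the camera projection formula with the estimated initial pose; here it is only a placeholder.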
步骤104,基于所述初始模型以及所述多个目标图像各自对应的所述初始位姿信息,确定所述目标对象的目标模型和所述目标对象的目标位姿。Step 104: Determine a target model of the target object and a target pose of the target object based on the initial model and the initial pose information corresponding to each of the plurality of target images.
确定了初始模型以及多个初始位姿信息之后,例如可以基于不同的初始位姿信息利用最小二乘法不断迭代优化,以得到上述目标模型和目标位姿。在一些应用场景中,上述目标模型例如可以应用于测量车辆的长度信息、高度信息和/或宽度信息等。After the initial model and multiple pieces of initial pose information are determined, for example, based on different initial pose information, the least square method can be used for continuous iterative optimization to obtain the above-mentioned target model and target pose. In some application scenarios, the above target model may be applied to measure length information, height information, and/or width information of the vehicle, for example.
通过上述步骤101至步骤104,可以同时确定出目标模型和目标位姿,使得不用预先生成对象模型以及无需设计目标对象与对象模型之间的匹配算法,优化了位姿估计过程的处理步骤。Through the above steps 101 to 104, the target model and the target pose can be determined at the same time, so that there is no need to pre-generate the object model and design a matching algorithm between the target object and the object model, and optimize the processing steps of the pose estimation process.
请参考图2,其示出了本申请实施例提供的另一种图像处理方法的流程图。如图2所示,该图像处理方法可以包括以下步骤201至步骤205。Please refer to FIG. 2 , which shows a flowchart of another image processing method provided by an embodiment of the present application. As shown in FIG. 2 , the image processing method may include the following steps 201 to 205 .
步骤201,获取待处理图像集;所述待处理图像集中包括目标对象对应的多个目标图像; Step 201, acquiring a set of images to be processed; the set of images to be processed includes a plurality of target images corresponding to a target object;
上述步骤201的实现过程以及取得的技术效果可以与图1所示实施例的步骤101相同或相似,此处不赘述。The implementation process of the above step 201 and the obtained technical effect may be the same as or similar to the step 101 of the embodiment shown in FIG. 1 , and details are not described here.
步骤202,针对每一个所述目标图像,基于该目标图像的目标关键点信息,确定所述目标对象的初始位姿信息; Step 202, for each target image, based on the target key point information of the target image, determine the initial pose information of the target object;
上述步骤202的实现过程以及取得的技术效果可以与图1所示实施例的步骤102相同 或相似,此处不赘述。The implementation process of the above-mentioned step 202 and the technical effect obtained may be the same as or similar to the step 102 of the embodiment shown in FIG. 1 , and will not be repeated here.
步骤203,确定所述目标对象的初始模型; Step 203, determining the initial model of the target object;
上述步骤203的实现过程以及取得的技术效果可以与图1所示实施例的步骤103相同或相似,此处不赘述。The implementation process of the above step 203 and the obtained technical effect may be the same as or similar to the step 103 of the embodiment shown in FIG. 1 , and details are not described here.
步骤204,获取与所述目标对象相匹配的先验信息;所述先验信息用于表征所述目标对象的结构信息和/或尺寸信息; Step 204, acquiring prior information matching the target object; the prior information is used to characterize the structure information and/or size information of the target object;
在一些应用场景中，可以获取目标对象的先验信息。上述先验信息例如可以包括目标对象的诸如长度信息、宽度信息、高度信息等尺寸信息，以及车牌与前车标之间的共面信息、左视镜与右视镜之间的对称信息等结构信息。In some application scenarios, prior information of the target object can be acquired. The above prior information may include, for example, size information of the target object such as length, width, and height, as well as structural information such as coplanarity between the license plate and the front vehicle logo and symmetry between the left-view mirror and the right-view mirror.
步骤205,基于所述先验信息、所述初始模型以及所述多个目标图像各自对应的所述初始位姿信息,确定所述目标对象的目标模型和所述目标对象的目标位姿。Step 205: Determine the target model of the target object and the target pose of the target object based on the prior information, the initial model, and the initial pose information corresponding to each of the plurality of target images.
获取到先验信息、初始模型以及多个初始位姿信息之后,可以同时确定出目标模型和目标位姿。After obtaining the prior information, the initial model, and multiple initial pose information, the target model and target pose can be determined simultaneously.
实践中,在获取目标图像时,常常存在缺乏目标对象在各个角度的图像信息,继而使得图像中的一些关键点由于获取视角的不同导致优化精度有限的问题。通过上述步骤201至步骤205,可以通过添加先验信息约束目标对象的结构和/或尺寸,提高优化精度,使得到的目标模型、目标位姿更加准确。In practice, when the target image is acquired, there is often a lack of image information of the target object at various angles, and then some key points in the image have limited optimization accuracy due to different acquisition angles of view. Through the above steps 201 to 205, the structure and/or size of the target object can be constrained by adding prior information to improve the optimization accuracy and make the obtained target model and target pose more accurate.
在一些可选的实现方式中,可以通过以下子步骤同时确定出目标模型和目标位姿:In some optional implementations, the target model and target pose can be determined simultaneously through the following sub-steps:
子步骤2051,基于所述先验信息、所述初始模型以及多个所述初始位姿信息,计算最小误差值;其中,所述最小误差值通过对象规格误差值和重投影误差值进行表征;所述对象规格误差值表征得到的对象模型所对应的结构信息和/或尺寸信息与所述先验信息之间的误差;所述重投影误差值表征得到的对象模型在对应的目标图像中的投影点与对应关键点之间的重投影误差;Sub-step 2051, calculate a minimum error value based on the prior information, the initial model, and a plurality of the initial pose information; wherein, the minimum error value is characterized by an object specification error value and a reprojection error value; The object specification error value represents an error between the structure information and/or size information corresponding to the obtained object model and the prior information; the reprojection error value represents the difference between the obtained object model in the corresponding target image The reprojection error between the projected point and the corresponding keypoint;
在一些应用场景中，可以基于多个初始位姿信息进行迭代计算，得到最小误差值。例如可以利用非线性最小二乘法计算最小误差值。在这些应用场景中，最小误差值可以通过对象规格误差值和重投影误差值进行表征。例如，可以通过将对象规格误差值与重投影误差值相乘得到的最小乘积作为最小误差值；也可以将对象规格误差值与重投影误差值相加得到的最小和作为最小误差值。这里，对象规格误差值与重投影误差值相加得到的和，可以是两者的代数和，也可以是两者的加权和，具体可以根据实际情况进行选择，此处不限定。上述加权和，例如可以包括为两者分配不同的权重值实现。In some application scenarios, iterative calculation can be performed based on multiple pieces of initial pose information to obtain the minimum error value; for example, the minimum error value can be calculated using a nonlinear least squares method. In these application scenarios, the minimum error value can be characterized by the object specification error value and the reprojection error value. For example, the smallest product obtained by multiplying the object specification error value and the reprojection error value can be used as the minimum error value, or the smallest sum obtained by adding them can be used as the minimum error value. Here, the sum of the object specification error value and the reprojection error value may be an algebraic sum or a weighted sum of the two, which can be selected according to the actual situation and is not limited here. The weighted sum may be realized, for example, by assigning different weight values to the two.
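The two ways of combining the error terms mentioned above (weighted sum or product) can be sketched as a small helper. The default weights and the `mode` parameter name are assumptions for illustration; the application leaves the exact combination open.

```python
def combined_error(spec_error, reproj_error,
                   w_spec=1.0, w_reproj=1.0, mode="weighted_sum"):
    """Combine the object specification error and the reprojection error
    into one value, either as a weighted sum or as a product."""
    if mode == "weighted_sum":
        return w_spec * spec_error + w_reproj * reproj_error
    if mode == "product":
        return spec_error * reproj_error
    raise ValueError(f"unknown mode: {mode}")
```

With `w_spec == w_reproj == 1.0` the weighted sum reduces to the plain algebraic sum also mentioned above.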
在一些可选的实现方式中,上述子步骤2051可以包括以下步骤:In some optional implementation manners, the above substep 2051 may include the following steps:
步骤一,针对每一次迭代计算,基于所述先验信息以及当次优化得到的所述目标对象 的当次模型,计算当次对象规格误差值;Step 1, for each iterative calculation, based on the prior information and the current model of the target object obtained by the current optimization, calculate the current object specification error value;
在一些应用场景中，针对每一次迭代计算，可以基于先验信息以及当前得到的目标对象的当次模型，确定当次对象规格误差值。上述当次对象规格误差值例如可以视为表征当次得到的目标模型与先验信息之间的误差。例如，目标车辆的先验信息为：高度为3米、宽度为2米、长度为5米，此时，若得到的对象模型表征目标车辆：高度为3米、宽度为1.8米、长度为5米，可以计算当次对象规格误差值为：高度误差为0米、宽度误差为0.2米、长度误差为0米。In some application scenarios, for each iterative calculation, the current object specification error value can be determined based on the prior information and the currently obtained model of the target object. The current object specification error value can be regarded as representing the error between the currently obtained target model and the prior information. For example, if the prior information of the target vehicle is a height of 3 meters, a width of 2 meters, and a length of 5 meters, and the obtained object model represents the target vehicle with a height of 3 meters, a width of 1.8 meters, and a length of 5 meters, the current object specification error values can be calculated as: a height error of 0 meters, a width error of 0.2 meters, and a length error of 0 meters.
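The height/width/length example in step one can be reproduced with a small helper computing per-dimension absolute errors between the prior and the current model. The dictionary keys are hypothetical names chosen for illustration.

```python
def spec_error_values(prior_dims, model_dims):
    """Per-dimension absolute errors between the prior information
    (height, width, length) and the current optimized model's dimensions."""
    return {k: abs(prior_dims[k] - model_dims[k]) for k in prior_dims}

# The example above: prior 3 x 2 x 5 m vs. a current model of 3 x 1.8 x 5 m
errors = spec_error_values(
    {"height": 3.0, "width": 2.0, "length": 5.0},
    {"height": 3.0, "width": 1.8, "length": 5.0},
)
# -> height error 0 m, width error ~0.2 m, length error 0 m
```

These per-dimension values could then be reduced (e.g. summed) into the scalar object specification error used by the optimization.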
步骤二,基于所述当次模型以及当次对应的初始位姿信息,计算当次重投影误差值;Step 2, based on the current model and the current corresponding initial pose information, calculate the current reprojection error value;
在一些应用场景中，针对每一次迭代计算，可以基于当前得到的对象模型以及当次对应的初始位姿信息，确定当次对应的重投影误差值。例如，可以确定当次对应的初始位姿信息所对应的目标图像，然后可以确定能够表征当前得到的对象模型的模型关键点对应的投影点，以及计算该投影点与目标图像的对应关键点之间的重投影误差值。这里，可以参照上述实施例中步骤B的相关部分，此处不赘述。In some application scenarios, for each iterative calculation, the current reprojection error value can be determined based on the currently obtained object model and the corresponding initial pose information. For example, the target image corresponding to the current initial pose information can be determined, the projection points of the model key points representing the currently obtained object model can then be determined, and the reprojection error value between the projection points and the corresponding key points of the target image can be calculated. Here, reference may be made to the relevant part of step B in the above embodiment, which will not be repeated here.
步骤三,基于所述当次对象规格误差值和所述当次重投影误差值,确定所述最小误差值。Step 3: Determine the minimum error value based on the current object specification error value and the current reprojection error value.
确定了当次对象规格误差值和当次重投影误差值之后,可以确定上述最小误差值。例如,可以为当次规格误差值以及当前重投影误差值分配不同的权重,确定上述最小误差值。After the object specification error value of the current time and the reprojection error value of the current time are determined, the above minimum error value can be determined. For example, different weights may be assigned to the current specification error value and the current reprojection error value to determine the aforementioned minimum error value.
相关技术中,存在重投影误差值较小,但是得到的位姿却与实际位姿差别较大的情况。因此,本实施例通过上述子步骤一至子步骤三引入了对象规格误差值,继而可以基于对象规格误差值以及重投影误差值,确定出最小误差值,以使得确定出的目标模型与目标对象更加贴近,确定出的目标位姿与目标对象在目标图像拍摄时刻的实际位姿更加贴近。In the related art, there are cases where the reprojection error value is small, but the obtained pose is quite different from the actual pose. Therefore, this embodiment introduces the object specification error value through the above sub-steps 1 to 3, and then the minimum error value can be determined based on the object specification error value and the reprojection error value, so that the determined target model is closer to the target object. Closeness, the determined target pose is closer to the actual pose of the target object at the moment when the target image is captured.
在一些可选的实现方式中，上述步骤三可以包括：首先，将所述当次对象规格误差值与所述当次重投影误差值所对应的乘积或和，确定为当次迭代计算得到的误差值；In some optional implementations, the above step three may include: first, determining the product or sum of the current object specification error value and the current reprojection error value as the error value obtained in the current iteration calculation;
也就是说,在一些应用场景中,可以将当次规格误差值与当次重投影误差值进行累加,得到两者之和,继而可以得到当前迭代时所对应的误差值。在另一些应用场景中,也可以将当次规格误差值与当次重投影误差值相乘,得到两者之积,继而可以得到当前迭代时所对应的误差值。That is to say, in some application scenarios, the current specification error value and the current reprojection error value can be accumulated to obtain the sum of the two, and then the error value corresponding to the current iteration can be obtained. In other application scenarios, the current specification error value can also be multiplied by the current reprojection error value to obtain the product of the two, and then the error value corresponding to the current iteration can be obtained.
然后,基于所述当次迭代计算得到的误差值以及历史迭代计算得到的误差值,确定所述最小误差值。Then, the minimum error value is determined based on the error value calculated by the current iteration and the error value calculated by historical iterations.
每一次迭代计算可以对应有误差值,在得到当次迭代对应的误差值之后,可以将该误差值与历史迭代计算得到的误差值进行比较,确定出当前对应的最小误差值。Each iterative calculation can correspond to an error value. After obtaining the error value corresponding to the current iteration, the error value can be compared with the error value obtained by historical iterative calculations to determine the current corresponding minimum error value.
子步骤2052,将得到所述最小误差值时优化得到的对象模型确定为所述目标模型,以 及将得到所述最小误差值时优化得到的位姿确定为所述目标位姿。In sub-step 2052, the object model optimized when the minimum error value is obtained is determined as the target model, and the pose optimized when the minimum error value is obtained is determined as the target pose.
得到最小误差之后,可以将得到最小误差值时所对应的对象模型确定为目标模型,并可以将此时对应的位姿确定为目标位姿。After the minimum error is obtained, the object model corresponding to the minimum error value can be determined as the target model, and the corresponding pose at this time can be determined as the target pose.
在一些可选的实现方式中，所述最小误差值基于以下步骤得到：在检测到误差值小于误差阈值时，将该误差值确定为所述最小误差值；或者在迭代次数达到迭代上限时，将多个误差值中最小的误差值确定为所述最小误差值；其中，达到迭代上限时所对应的迭代次数阈值与所述待处理图像集中包括所述目标图像的数量匹配。In some optional implementations, the minimum error value is obtained based on the following steps: when an error value is detected to be smaller than an error threshold, determining that error value as the minimum error value; or, when the number of iterations reaches the iteration upper limit, determining the smallest error value among the multiple error values as the minimum error value, where the iteration count threshold corresponding to the iteration upper limit matches the number of target images included in the image set to be processed.
在一些应用场景中,可以在检测到误差值小于误差阈值时停止迭代计算。进一步,可以将此时优化得到的对象模型确定为目标模型,并可以将此时对应得到的位姿信息确定为目标位姿信息。这样,可以使得在当前的目标模型与目标对象基本吻合,目标位姿与目标对象的实际位姿基本吻合的同时,减小迭代计算的计算量。在这些应用场景中,上述误差阈值例如可以包括0.08、0.1等实质上可以使误差值所对应的对象模型视为目标模型的数值。In some application scenarios, the iterative calculation may be stopped when it is detected that an error value is smaller than the error threshold. Further, the object model optimized at this point may be determined as the target model, and the pose information obtained at this point may be determined as the target pose information. In this way, the amount of iterative computation can be reduced while the current target model still basically matches the target object and the target pose basically matches the actual pose of the target object. In these application scenarios, the error threshold may be, for example, 0.08 or 0.1, i.e. a value small enough that the object model corresponding to an error value below it can be regarded as the target model.
在另一些应用场景中,也可以在达到最大迭代次数时停止迭代计算。例如,每一个目标图像所对应的初始位姿信息均被使用,本次迭代计算过程无法再继续时,可以视为达到最大迭代次数。此时得到的对象模型可以被确定为目标模型,并可以将当前得到的位姿信息确定为目标位姿。In other application scenarios, the iterative calculation can also be stopped when the maximum number of iterations is reached. For example, the initial pose information corresponding to each target image is used, and when the iterative calculation process can no longer continue, it can be regarded as reaching the maximum number of iterations. The object model obtained at this time can be determined as the target model, and the currently obtained pose information can be determined as the target pose.
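The two stopping conditions described above (an error falling below the threshold, or the iteration cap being reached) can be sketched as follows; the helper name and the form of the error sequence are assumptions of this illustration:

```python
def minimize_error(errors_per_iteration, error_threshold=0.1, max_iters=None):
    """Track the minimum error over successive iterations, stopping early
    when an error falls below the threshold or the iteration cap is hit.

    `errors_per_iteration` is any iterable yielding one error value per
    iteration (in a real optimizer each value would come from combining
    the specification error and the reprojection error of that iteration).
    Returns (index_of_best_iteration, minimum_error)."""
    best_index, best_error = -1, float("inf")
    for i, err in enumerate(errors_per_iteration):
        if max_iters is not None and i >= max_iters:
            break                      # iteration upper limit reached
        if err < best_error:
            best_index, best_error = i, err
        if err < error_threshold:
            break                      # error small enough: stop early
    return best_index, best_error
```

The model and pose recorded at `best_index` would then serve as the target model and target pose.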
在一些可选的实现方式中,所述目标关键点信息包括目标关键点的位置信息;以及上述图1所示实施例中步骤102或者图2所示实施例中步骤202可以包括以下子步骤:In some optional implementation manners, the target key point information includes position information of the target key point; and step 102 in the above embodiment shown in FIG. 1 or step 202 in the embodiment shown in FIG. 2 may include the following sub-steps:
子步骤1,基于所述目标关键点的位置信息以及相机标定信息,确定该目标关键点在世界坐标系下的初始位置信息;所述世界坐标系包括以所述目标对象的运动平面作为坐标面的坐标系;Sub-step 1: based on the position information of the target key point and the camera calibration information, determine the initial position information of the target key point in the world coordinate system; the world coordinate system includes a coordinate system taking the motion plane of the target object as a coordinate plane;
上述相机标定信息可以包括相机的内参矩阵和外参矩阵,以对拍摄的图像进行矫正,得到畸变较小的目标图像。The above camera calibration information may include an internal reference matrix and an external reference matrix of the camera, so as to correct the captured image and obtain a target image with less distortion.
上述世界坐标系可以包括以目标对象的运动平面作为坐标面的坐标系。例如,可以将目标车辆的运动路面作为横纵坐标面的路面坐标系视为世界坐标系。The above-mentioned world coordinate system may include a coordinate system using the motion plane of the target object as a coordinate plane. For example, the road surface coordinate system in which the moving road surface of the target vehicle is taken as the horizontal and vertical coordinate plane can be regarded as the world coordinate system.
通过目标关键点的位置信息以及相机标定信息,可以确定该目标关键点在世界坐标系下的初始位置信息。例如,将车辆前车标作为目标关键点时,可以基于目标关键点的图像坐标(u,v)以及相机标定信息,利用投影公式确定出该目标关键点在世界坐标系下的初始位置信息(X,Y,Z)。当世界坐标系为上述路面坐标系时,该初始位置信息可以为(X,Y,0)。Through the position information of the target key point and the camera calibration information, the initial position information of the target key point in the world coordinate system can be determined. For example, when the front vehicle logo is used as the target key point, the projection formula can be used, based on the image coordinates (u, v) of the target key point and the camera calibration information, to determine the initial position information (X, Y, Z) of the target key point in the world coordinate system. When the world coordinate system is the aforementioned road surface coordinate system, the initial position information may be (X, Y, 0).
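A minimal sketch of this back-projection, assuming a pinhole camera with intrinsic matrix K and world-to-camera extrinsics (R, t), and a world coordinate system whose Z=0 plane is the motion plane; the function name and interface are hypothetical:

```python
import numpy as np

def pixel_to_ground(u, v, K, R, t):
    """Back-project pixel (u, v) onto the ground plane Z=0 of the world
    coordinate system, given intrinsics K and world-to-camera extrinsics
    R, t. Returns the point (X, Y, 0)."""
    # Direction of the viewing ray, first in camera then in world coordinates
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    ray_world = R.T @ ray_cam
    cam_center = -R.T @ t                 # camera centre in world coordinates
    s = -cam_center[2] / ray_world[2]     # scale at which the ray meets Z=0
    X, Y, _ = cam_center + s * ray_world
    return np.array([X, Y, 0.0])
```

The pixel's viewing ray is intersected with the ground plane, which is why a single image suffices to recover (X, Y, 0) for a key point known to lie on the road surface.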
子步骤2,确定与所述初始位置信息匹配的初始偏航角信息;Sub-step 2, determining the initial yaw angle information matching the initial position information;
确定了目标关键点在世界坐标系下的初始位置信息之后,可以继续确定与该初始位置信息相匹配的初始偏航角信息。After the initial position information of the target key point in the world coordinate system is determined, the initial yaw angle information matching the initial position information can be determined further.
在一些可选的实现方式中,上述初始偏航角信息可以基于如下步骤确定:首先,在目标角度范围内,选取多个角度与所述初始位置信息进行匹配,确定所述目标关键点对应的模型关键点在各个角度下分别对应的重投影误差值;In some optional implementations, the above initial yaw angle information may be determined based on the following steps: first, within a target angle range, multiple angles are selected to be matched with the initial position information, and the reprojection error values of the model key point corresponding to the target key point at each of the angles are determined;
上述目标角度范围例如可以包括(0,2π)。The aforementioned target angle range may include (0, 2π), for example.
在一些应用场景中,可以在目标角度范围内选取多个角度分别与初始位置信息进行匹配。例如,可以等间隔选取90°、180°、270°、360°等分别与初始位置信息匹配。进一步的,在选取了90°作为与初始位置信息进行匹配的角度之后,上述模型关键点的位姿信息可以为(X,Y,90°),此时可以计算该模型关键点对应的投影点与目标关键点之间的重投影误差。相类似地,可以分别计算多个模型关键点在各自角度下分别对应的重投影误差。In some application scenarios, multiple angles can be selected within the target angle range to be matched with the initial position information respectively. For example, 90°, 180°, 270°, 360°, etc. may be selected at equal intervals and matched with the initial position information respectively. Further, after 90° is selected as the angle to be matched with the initial position information, the pose information of the above model key point may be (X, Y, 90°); at this point, the reprojection error between the projection point corresponding to this model key point and the target key point can be calculated. Similarly, the reprojection errors corresponding to the model key points at the respective angles can be calculated.
然后,将最小的重投影误差值对应的角度信息确定为所述初始偏航角信息。Then, the angle information corresponding to the smallest reprojection error value is determined as the initial yaw angle information.
确定了多个重投影误差值之后,可以将最小的重投影误差值所对应的角度信息确定为初始偏航角信息,以使得初始偏航角信息更加接近于目标偏航角信息。例如,确定了位姿信息为(X,Y,90°)的模型关键点所对应的重投影误差值最小时,可以将90°确定为初始偏航角信息。After multiple reprojection error values are determined, angle information corresponding to the smallest reprojection error value may be determined as initial yaw angle information, so that the initial yaw angle information is closer to the target yaw angle information. For example, when it is determined that the reprojection error value corresponding to the key point of the model whose pose information is (X, Y, 90°) is the smallest, 90° may be determined as the initial yaw angle information.
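The angle sweep in the two steps above can be sketched as follows (the `project` callback, the sampling density and the function name are assumptions of this illustration; the embodiment only requires selecting several angles in the target range and keeping the one with the smallest reprojection error):

```python
import numpy as np

def init_yaw(project, model_points, image_points, num_angles=36):
    """Sweep candidate yaw angles over (0, 2*pi) and return the one whose
    projected model key points have the smallest total reprojection error.

    `project(points, yaw)` is assumed to return the 2-D projections of
    `model_points` under the candidate yaw (position held fixed)."""
    best_yaw, best_err = None, float("inf")
    # Equally spaced candidates in (0, 2*pi); index 0 (yaw = 0) is skipped
    for yaw in np.linspace(0.0, 2 * np.pi, num_angles, endpoint=False)[1:]:
        proj = project(model_points, yaw)
        err = np.linalg.norm(proj - image_points, axis=1).sum()
        if err < best_err:
            best_yaw, best_err = yaw, err
    return best_yaw
```

The winning angle is only a coarse initial yaw; it is subsequently refined together with the position in the joint optimization.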
子步骤3,基于所述初始位置信息和所述初始偏航角信息,确定所述目标对象的初始位姿信息。Sub-step 3, based on the initial position information and the initial yaw angle information, determine the initial pose information of the target object.
确定了初始位置信息以及初始偏航角信息之后,可以确定目标对象的初始位姿信息。例如,确定了初始位置信息(X,Y,0)以及初始偏航角信息90°时,可以确定出初始位姿信息(X,Y,90°)。After the initial position information and the initial yaw angle information are determined, the initial pose information of the target object can be determined. For example, when the initial position information (X, Y, 0) and the initial yaw angle information of 90° are determined, the initial pose information (X, Y, 90°) can be determined.
通过上述子步骤1至子步骤3,可以粗糙确定出目标对象的初始位姿信息,利于确定出更加准确的目标位姿信息。Through the above sub-steps 1 to 3, the initial pose information of the target object can be roughly determined, which is beneficial to determine more accurate target pose information.
请参考图3,其示出了本申请实施例提供的一种位姿确定装置的结构框图,该位姿确定装置可以是电子设备上的模块、程序段或代码。应理解,该装置与上述图1方法实施例对应,能够执行图1方法实施例涉及的各个步骤,该装置具体的功能可以参见上文中的描述,为避免重复,此处适当省略详细描述。Please refer to FIG. 3 , which shows a structural block diagram of an apparatus for determining a pose provided by an embodiment of the present application. The apparatus for determining a pose may be a module, program segment or code on an electronic device. It should be understood that the device corresponds to the above-mentioned method embodiment in FIG. 1 , and can execute various steps involved in the method embodiment in FIG. 1 . The specific functions of the device can refer to the description above. To avoid repetition, detailed descriptions are appropriately omitted here.
可选地,上述位姿确定装置可以包括获取模块301,第一确定模块302、第二确定模块303和第三确定模块304。其中,获取模块301,被配置成用于获取待处理图像集;所述待处理图像集中包括目标对象对应的多个目标图像;第一确定模块302,被配置成用于针对每一个所述目标图像,基于该目标图像的目标关键点信息,确定所述目标对象的初始位姿信息;第二确定模块303,被配置成用于确定所述目标对象的初始模型;第三确定模块304,被配置成用于基于所述初始模型以及所述多个目标图像各自对应的所述初始位姿信息,确定所述目标对象的目标模型和所述目标对象的目标位姿。Optionally, the above pose determination apparatus may include an acquisition module 301, a first determination module 302, a second determination module 303 and a third determination module 304. The acquisition module 301 is configured to acquire a set of images to be processed, the set of images to be processed including a plurality of target images corresponding to the target object; the first determination module 302 is configured to, for each of the target images, determine the initial pose information of the target object based on the target key point information of the target image; the second determination module 303 is configured to determine an initial model of the target object; and the third determination module 304 is configured to determine a target model of the target object and a target pose of the target object based on the initial model and the initial pose information respectively corresponding to the plurality of target images.
可选地,所述位姿确定装置还可以包括信息获取模块,上述信息获取模块被配置成用于:获取与所述目标对象相匹配的先验信息;所述先验信息用于表征所述目标对象的结构信息和/或尺寸信息;所述第三确定模块304进一步被配置成用于:基于所述先验信息、所述初始模型以及所述多个目标图像各自对应的所述初始位姿信息,确定所述目标对象的目标模型和所述目标对象的目标位姿。Optionally, the pose determination apparatus may further include an information acquisition module configured to acquire prior information matching the target object, the prior information being used to characterize structure information and/or size information of the target object; the third determination module 304 is further configured to determine the target model of the target object and the target pose of the target object based on the prior information, the initial model, and the initial pose information respectively corresponding to the plurality of target images.
可选地,所述第三确定模块304可以进一步被配置成用于:基于所述先验信息、所述初始模型以及多个所述初始位姿信息,计算最小误差值;其中,所述最小误差值通过对象规格误差值和重投影误差值进行表征;所述对象规格误差值表征得到的对象模型所对应的结构信息和/或尺寸信息与所述先验信息之间的误差;所述重投影误差值表征得到的对象模型在对应的目标图像中的投影点与对应关键点之间的重投影误差;将得到所述最小误差值时优化得到的对象模型确定为所述目标模型,以及将得到所述最小误差值时优化得到的位姿确定为所述目标位姿。Optionally, the third determination module 304 may be further configured to: calculate a minimum error value based on the prior information, the initial model and the plurality of pieces of initial pose information, wherein the minimum error value is characterized by an object specification error value and a reprojection error value, the object specification error value characterizes an error between the structure information and/or size information corresponding to the obtained object model and the prior information, and the reprojection error value characterizes a reprojection error between projection points of the obtained object model in the corresponding target image and the corresponding key points; and determine the object model optimized when the minimum error value is obtained as the target model, and determine the pose optimized when the minimum error value is obtained as the target pose.
可选地,所述最小误差值可以基于以下步骤得到:在检测到误差值小于误差阈值时,将该误差值确定为所述最小误差值;或者在迭代次数达到迭代上限时,将多个误差值中最小的误差值确定为所述最小误差值;其中,达到迭代上限时所对应的迭代次数阈值与所述待处理图像集中包括所述目标图像的数量匹配。Optionally, the minimum error value may be obtained based on the following steps: when an error value is detected to be smaller than an error threshold, determining that error value as the minimum error value; or, when the number of iterations reaches an iteration upper limit, determining the smallest error value among the multiple error values as the minimum error value, wherein the iteration-count threshold corresponding to the iteration upper limit matches the number of target images included in the image set to be processed.
可选地,所述第三确定模块304可以进一步被配置成用于:针对每一次迭代计算,基于所述先验信息以及当次优化得到的所述目标对象的当次模型,计算当次对象规格误差值;以及基于所述当次模型以及当次对应的初始位姿信息,计算当次重投影误差值;以及基于所述当次对象规格误差值和所述当次重投影误差值,确定所述最小误差值。Optionally, the third determination module 304 may be further configured to: for each iterative calculation, calculate a current object specification error value based on the prior information and the current model of the target object obtained by the current optimization; calculate a current reprojection error value based on the current model and the corresponding current initial pose information; and determine the minimum error value based on the current object specification error value and the current reprojection error value.
可选地,所述第三确定模块304可以进一步被配置成用于:将所述当次对象规格误差值与所述当次重投影误差值所对应的乘积或和,确定为当次迭代计算得到的误差值;以及基于所述当次迭代计算得到的误差值以及历史迭代计算得到的误差值,确定所述最小误差值。Optionally, the third determination module 304 may be further configured to: determine the product or sum of the current object specification error value and the current reprojection error value as the error value calculated in the current iteration; and determine the minimum error value based on the error value calculated in the current iteration and the error values calculated in historical iterations.
可选地,所述第一确定模块302可以进一步被配置成用于:基于所述目标关键点的位置信息以及相机标定信息,确定该目标关键点在世界坐标系下的初始位置信息;所述世界坐标系包括以所述目标对象的运动平面作为坐标面的坐标系;确定与所述初始位置信息匹配的初始偏航角信息;基于所述初始位置信息和所述初始偏航角信息,确定所述目标对象的初始位姿信息。Optionally, the first determination module 302 may be further configured to: determine the initial position information of the target key point in the world coordinate system based on the position information of the target key point and the camera calibration information, the world coordinate system including a coordinate system taking the motion plane of the target object as a coordinate plane; determine initial yaw angle information matching the initial position information; and determine the initial pose information of the target object based on the initial position information and the initial yaw angle information.
可选地,所述第一确定模块302可以进一步被配置成用于:在目标角度范围内,选取多个角度与所述初始位置信息进行匹配,确定所述目标关键点对应的模型关键点在各个角度下分别对应的重投影误差值;将最小的重投影误差值对应的角度信息确定为所述初始偏航角信息。Optionally, the first determination module 302 may be further configured to: within a target angle range, select multiple angles to be matched with the initial position information, and determine the reprojection error values of the model key point corresponding to the target key point at each of the angles; and determine the angle information corresponding to the smallest reprojection error value as the initial yaw angle information.
可选地,所述初始模型预先可以基于如下步骤确定:针对对象模型库中的每一个对象模型,确定表征该对象模型的模型关键点在目标图像中的投影点;以及确定所述投影点与所述目标图像的对应关键点之间的重投影误差值;将最小的重投影误差值所对应的对象模型确定为所述初始模型。Optionally, the initial model may be determined in advance based on the following steps: for each object model in an object model library, determining the projection points, in the target image, of the model key points characterizing that object model; determining the reprojection error value between the projection points and the corresponding key points of the target image; and determining the object model corresponding to the smallest reprojection error value as the initial model.
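A sketch of this model-selection step, assuming each candidate model's key points are exposed through a hypothetical `project` callback that returns their 2-D projections in the target image:

```python
import numpy as np

def select_initial_model(model_library, project, image_keypoints):
    """Pick, from a library of candidate object models, the one whose
    projected model key points best match the key points detected in the
    target image (smallest total reprojection error)."""
    best_model, best_err = None, float("inf")
    image_keypoints = np.asarray(image_keypoints)
    for model in model_library:
        proj = np.asarray(project(model))   # model key points projected into image
        err = np.linalg.norm(proj - image_keypoints, axis=1).sum()
        if err < best_err:
            best_model, best_err = model, err
    return best_model
```

The selected model is only a starting point; it is subsequently refined jointly with the pose, so it does not need to match the target object exactly.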
可选地,所述待处理图像集可以包括从所述目标对象的运动路径中确定的多个目标图像。Optionally, the set of images to be processed may include a plurality of target images determined from the moving path of the target object.
需要说明的是,本领域技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统和装置的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再重复描述。It should be noted that those skilled in the art can clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the system and apparatus described above, which are not repeated here.
请参照图4,图4为本申请实施例提供的一种用于执行位姿确定方法的电子设备的结构示意图,所述电子设备可以包括:至少一个处理器401,例如CPU,至少一个通信接口402,至少一个存储器403和至少一个通信总线404。其中,通信总线404用于实现这些组件直接的连接通信。其中,本申请实施例中设备的通信接口402用于与其他节点设备进行信令或数据的通信。存储器403可以是高速RAM存储器,也可以是非易失性的存储器(non-volatile memory),例如至少一个磁盘存储器。存储器403可选的还可以是至少一个位于远离前述处理器的存储装置。存储器403中存储有计算机可读取指令,当所述计算机可读取指令由所述处理器401执行时,电子设备例如可以执行上述图1所示方法过程。Please refer to FIG. 4, which is a schematic structural diagram of an electronic device for performing the pose determination method provided by an embodiment of the present application. The electronic device may include: at least one processor 401, such as a CPU, at least one communication interface 402, at least one memory 403 and at least one communication bus 404, where the communication bus 404 is used to realize direct connection and communication among these components. The communication interface 402 of the device in this embodiment of the application is used for signaling or data communication with other node devices. The memory 403 may be a high-speed RAM memory, or a non-volatile memory, such as at least one disk memory; optionally, the memory 403 may also be at least one storage apparatus located away from the aforementioned processor. The memory 403 stores computer-readable instructions, and when the computer-readable instructions are executed by the processor 401, the electronic device may, for example, execute the method process shown in FIG. 1 above.
可以理解,图4所示的结构仅为示意,所述电子设备还可包括比图4中所示更多或者更少的组件,或者具有与图4所示不同的配置。图4中所示的各组件可以采用硬件、软件或其组合实现。It can be understood that the structure shown in FIG. 4 is only for illustration, and the electronic device may also include more or less components than those shown in FIG. 4 , or have a configuration different from that shown in FIG. 4 . Each component shown in FIG. 4 may be implemented by hardware, software or a combination thereof.
本申请实施例提供一种可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时,执行如图1所示方法实施例中电子设备所执行的方法过程。An embodiment of the present application provides a readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the method process performed by the electronic device in the method embodiment shown in FIG. 1 is executed.
本实施例公开一种计算机程序产品,所述计算机程序产品包括计算机程序,所述计算机程序包括程序指令,当所述程序指令被计算机执行时,计算机能够执行上述各方法实施例所提供的方法,例如,该方法可以包括:获取待处理图像集;所述待处理图像集中包括目标对象对应的多个目标图像;针对每一个所述目标图像,基于该目标图像的目标关键点信息,确定所述目标对象的初始位姿信息;确定所述目标对象的初始模型;基于所述初始模型以及所述多个目标图像各自对应的所述初始位姿信息,确定所述目标对象的目标模型和所述目标对象的目标位姿。This embodiment discloses a computer program product. The computer program product includes a computer program, and the computer program includes program instructions; when the program instructions are executed by a computer, the computer can execute the methods provided in the above method embodiments. For example, the method may include: acquiring a set of images to be processed, the set of images to be processed including a plurality of target images corresponding to a target object; for each target image, determining initial pose information of the target object based on the target key point information of the target image; determining an initial model of the target object; and determining a target model of the target object and a target pose of the target object based on the initial model and the initial pose information respectively corresponding to the plurality of target images.
在本申请所提供的实施例中,应该理解到,所揭露装置和方法,可以通过其它的方式实现。以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,又例如,多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些通信接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。In the embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a logical functional division, and there may be other division manners in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
另外,作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。In addition, a unit described as a separate component may or may not be physically separated, and a component displayed as a unit may or may not be a physical unit, that is, it may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
再者,在本申请各个实施例中的各功能模块可以集成在一起形成一个独立的部分,也可以是各个模块单独存在,也可以两个或两个以上模块集成形成一个独立的部分。Furthermore, each functional module in each embodiment of the present application may be integrated to form an independent part, each module may exist independently, or two or more modules may be integrated to form an independent part.
在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。In this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations.
以上所述仅为本申请的实施例而已,并不用于限制本申请的保护范围,对于本领域的技术人员来说,本申请可以有各种更改和变化。凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。The above descriptions are only examples of the present application, and are not intended to limit the scope of protection of the present application. For those skilled in the art, various modifications and changes may be made to the present application. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of this application shall be included within the protection scope of this application.
工业实用性Industrial Applicability
本申请提供了一种位姿确定方法、装置、电子设备和可读存储介质。通过首先获取与目标模型存在差别的初始模型,然后基于该初始模型与多个目标图像中分别示出的目标对象的初始位姿信息,同时确定出目标对象的目标模型和目标位姿,以使得不用预先生成目标模型以及无需设计目标对象与目标模型之间的匹配算法,从而解决了在目标对象的姿态估计过程中对象模型无法预先生成以及目标对象与目标模型之间匹配困难的问题。The present application provides a pose determination method, apparatus, electronic device and readable storage medium. By first obtaining an initial model that differs from the target model, and then simultaneously determining the target model and the target pose of the target object based on the initial model and the initial pose information of the target object shown in multiple target images, the target model does not need to be generated in advance and no matching algorithm between the target object and the target model needs to be designed, thereby solving the problems that, during pose estimation of the target object, the object model cannot be generated in advance and matching between the target object and the target model is difficult.
此外,可以理解的是,本申请的位姿确定方法、装置、电子设备和可读存储介质是可以重现的,并且可以用在多种工业应用中。例如,本申请的位姿确定方法、装置、电子设备和可读存储介质可以用于诸如车辆、无人机等对应的位姿估计过程。In addition, it can be understood that the pose determining method, device, electronic device and readable storage medium of the present application are reproducible and can be used in various industrial applications. For example, the pose determination method, device, electronic device, and readable storage medium of the present application can be used in corresponding pose estimation processes such as vehicles and drones.

Claims (14)

  1. 一种位姿确定方法,其特征在于,包括:A pose determination method, characterized in that the method comprises:
    获取待处理图像集;所述待处理图像集中包括目标对象对应的多个目标图像;Acquire a set of images to be processed; the set of images to be processed includes a plurality of target images corresponding to the target object;
    针对每一个所述目标图像,基于该目标图像的目标关键点信息,确定所述目标对象的初始位姿信息;For each of the target images, based on the target key point information of the target image, determine the initial pose information of the target object;
    确定所述目标对象的初始模型;determining an initial model of the target object;
    基于所述初始模型以及所述多个目标图像各自对应的所述初始位姿信息,确定所述目标对象的目标模型和所述目标对象的目标位姿。Based on the initial model and the initial pose information corresponding to each of the plurality of target images, determine a target model of the target object and a target pose of the target object.
  2. 根据权利要求1所述的方法,其特征在于,所述方法还包括:The method according to claim 1, further comprising:
    获取与所述目标对象相匹配的先验信息;所述先验信息用于表征所述目标对象的结构信息和/或尺寸信息;Obtain prior information matching the target object; the prior information is used to characterize the structure information and/or size information of the target object;
    所述基于所述初始模型以及所述多个目标图像各自对应的所述初始位姿信息,确定所述目标对象的目标模型和所述目标对象的目标位姿,包括:The determining the target model of the target object and the target pose of the target object based on the initial model and the initial pose information corresponding to each of the plurality of target images includes:
    基于所述先验信息、所述初始模型以及所述多个目标图像各自对应的所述初始位姿信息,确定所述目标对象的目标模型和所述目标对象的目标位姿。Based on the prior information, the initial model, and the initial pose information corresponding to each of the multiple target images, determine a target model of the target object and a target pose of the target object.
  3. 根据权利要求2所述的方法,其特征在于,所述基于所述先验信息、所述初始模型以及所述多个目标图像各自对应的所述初始位姿信息,确定所述目标对象的目标模型和所述目标对象的目标位姿,包括:The method according to claim 2, wherein the target of the target object is determined based on the prior information, the initial model, and the initial pose information corresponding to each of the plurality of target images. Model and target pose of the target object, including:
    基于所述先验信息、所述初始模型以及多个所述初始位姿信息,计算最小误差值;其中,所述最小误差值通过对象规格误差值和重投影误差值进行表征;所述对象规格误差值表征得到的对象模型所对应的结构信息和/或尺寸信息与所述先验信息之间的误差;所述重投影误差值表征得到的对象模型在对应的目标图像中的投影点与对应关键点之间的重投影误差;calculating a minimum error value based on the prior information, the initial model and the plurality of pieces of initial pose information; wherein the minimum error value is characterized by an object specification error value and a reprojection error value; the object specification error value characterizes an error between the structure information and/or size information corresponding to the obtained object model and the prior information; and the reprojection error value characterizes a reprojection error between projection points of the obtained object model in the corresponding target image and the corresponding key points;
    将得到所述最小误差值时优化得到的对象模型确定为所述目标模型,以及将得到所述最小误差值时优化得到的位姿确定为所述目标位姿。The object model optimized when the minimum error value is obtained is determined as the target model, and the pose optimized when the minimum error value is obtained is determined as the target pose.
  4. 根据权利要求3所述的方法,其特征在于,所述最小误差值基于以下步骤得到:The method according to claim 3, wherein the minimum error value is obtained based on the following steps:
    在检测到误差值小于误差阈值时,将该误差值确定为所述最小误差值;或者When it is detected that the error value is less than the error threshold, the error value is determined as the minimum error value; or
    在迭代次数达到迭代上限时,将多个误差值中最小的误差值确定为所述最小误差值;其中,达到迭代上限时所对应的迭代次数阈值与所述待处理图像集中包括所述目标图像的数量匹配。when the number of iterations reaches an iteration upper limit, determining the smallest error value among the multiple error values as the minimum error value; wherein the iteration-count threshold corresponding to the iteration upper limit matches the number of target images included in the image set to be processed.
  5. 根据权利要求3或4所述的方法,其特征在于,所述基于所述先验信息、所述初始模型以及多个所述初始位姿信息,计算最小误差值,包括:The method according to claim 3 or 4, wherein the calculating a minimum error value based on the prior information, the initial model and the plurality of pieces of initial pose information comprises:
    针对每一次迭代计算,基于所述先验信息以及当次优化得到的所述目标对象的当次模型,计算当次对象规格误差值;以及For each iterative calculation, based on the prior information and the current model of the target object obtained by the current optimization, calculate the current object specification error value; and
    基于所述当次模型以及当次对应的初始位姿信息,计算当次重投影误差值;以及Based on the current model and the current corresponding initial pose information, calculate the current reprojection error value; and
    基于所述当次对象规格误差值和所述当次重投影误差值,确定所述最小误差值。The minimum error value is determined based on the current object specification error value and the current reprojection error value.
  6. 根据权利要求5所述的方法,其特征在于,所述基于所述当次对象规格误差值和所述当次重投影误差值,确定所述最小误差值,包括:The method according to claim 5, wherein the determining the minimum error value based on the current object specification error value and the current reprojection error value comprises:
    将所述当次对象规格误差值与所述当次重投影误差值所对应的乘积或和,确定为当次迭代计算得到的误差值;以及determining the product or sum corresponding to the object specification error value of the current time and the reprojection error value of the current time as the error value calculated by the current iteration; and
    基于所述当次迭代计算得到的误差值以及历史迭代计算得到的误差值,确定所述最小误差值。The minimum error value is determined based on the error value calculated by the current iteration and the error value calculated by historical iterations.
  7. 根据权利要求1至6中任一项所述的方法,其特征在于,所述目标关键点信息包括目标关键点的位置信息;以及The method according to any one of claims 1 to 6, wherein the target key point information includes position information of the target key point; and
    所述针对每一个所述目标图像,基于该目标图像的目标关键点信息,确定所述目标对象的初始位姿信息,包括:For each of the target images, determining the initial pose information of the target object based on the target key point information of the target image includes:
    基于所述目标关键点的位置信息以及相机标定信息,确定该目标关键点在世界坐标系下的初始位置信息;所述世界坐标系包括以所述目标对象的运动平面作为坐标面的坐标系;Based on the position information of the target key point and the camera calibration information, determine the initial position information of the target key point in the world coordinate system; the world coordinate system includes a coordinate system with the motion plane of the target object as a coordinate plane;
    确定与所述初始位置信息匹配的初始偏航角信息;determining initial yaw angle information matching the initial position information;
    基于所述初始位置信息和所述初始偏航角信息,确定所述目标对象的初始位姿信息。Based on the initial position information and the initial yaw angle information, determine initial pose information of the target object.
  8. 根据权利要求7所述的方法,其特征在于,所述确定与所述初始位置信息匹配的初始偏航角信息,包括:The method according to claim 7, wherein the determining the initial yaw angle information matching the initial position information comprises:
    在目标角度范围内,选取多个角度与所述初始位置信息进行匹配,确定所述目标关键点对应的模型关键点在各个角度下分别对应的重投影误差值;Within the target angle range, select a plurality of angles to match the initial position information, and determine the reprojection error values corresponding to the model key points corresponding to the target key points at each angle;
    将最小的重投影误差值对应的角度信息确定为所述初始偏航角信息。The angle information corresponding to the smallest reprojection error value is determined as the initial yaw angle information.
  9. 根据权利要求1至8中任一项所述的方法,其特征在于,所述初始模型预先基于以下步骤确定:The method according to any one of claims 1 to 8, wherein the initial model is determined in advance based on the following steps:
    针对对象模型库中的每一个对象模型,确定表征该对象模型的模型关键点在目标图像中的投影点;以及For each object model in the object model library, determine the projection points of the model key points representing the object model in the target image; and
    确定所述投影点与所述目标图像的对应关键点之间的重投影误差值;determining a reprojection error value between the projected point and a corresponding keypoint of the target image;
    将最小的重投影误差值所对应的对象模型确定为所述初始模型。The object model corresponding to the smallest reprojection error value is determined as the initial model.
  10. 根据权利要求1至9中任一项所述的方法,其特征在于,所述待处理图像集包括从所述目标对象的运动路径中确定的多个目标图像。The method according to any one of claims 1 to 9, wherein the set of images to be processed includes a plurality of target images determined from the moving path of the target object.
  11. A pose determination apparatus, comprising:
    an acquisition module, configured to acquire a set of images to be processed, the set of images to be processed comprising a plurality of target images corresponding to a target object;
    a first determination module, configured to determine, for each of the target images, initial pose information of the target object based on target key point information of that target image;
    a second determination module, configured to determine an initial model of the target object; and
    a third determination module, configured to determine a target model of the target object and a target pose of the target object based on the initial model and the initial pose information corresponding to each of the plurality of target images.
  12. An electronic device, comprising a processor and a memory, the memory storing computer-readable instructions which, when executed by the processor, carry out the method according to any one of claims 1 to 10.
  13. A readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, carries out the method according to any one of claims 1 to 10.
  14. A computer program product comprising a computer program, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method according to any one of claims 1 to 10.
PCT/CN2022/105549 2021-08-13 2022-07-13 Pose determination method and apparatus, electronic device, and readable storage medium WO2023016182A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110932980.9A CN113793251A (en) 2021-08-13 2021-08-13 Pose determination method and device, electronic equipment and readable storage medium
CN202110932980.9 2021-08-13

Publications (1)

Publication Number Publication Date
WO2023016182A1 true WO2023016182A1 (en) 2023-02-16

Family

ID=79181869

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/105549 WO2023016182A1 (en) 2021-08-13 2022-07-13 Pose determination method and apparatus, electronic device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN113793251A (en)
WO (1) WO2023016182A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113793251A (en) * 2021-08-13 2021-12-14 北京迈格威科技有限公司 Pose determination method and device, electronic equipment and readable storage medium
CN114332939B (en) * 2021-12-30 2024-02-06 浙江核新同花顺网络信息股份有限公司 Pose sequence generation method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10186049B1 (en) * 2017-03-06 2019-01-22 URC Ventures, Inc. Determining changes in object structure over time using mobile device images
CN109816704A (en) * 2019-01-28 2019-05-28 北京百度网讯科技有限公司 The 3 D information obtaining method and device of object
CN112991436A (en) * 2021-03-25 2021-06-18 中国科学技术大学 Monocular vision SLAM method based on object size prior information
CN113034582A (en) * 2021-03-25 2021-06-25 浙江商汤科技开发有限公司 Pose optimization device and method, electronic device and computer readable storage medium
CN113793251A (en) * 2021-08-13 2021-12-14 北京迈格威科技有限公司 Pose determination method and device, electronic equipment and readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108122256B (en) * 2017-12-25 2018-10-12 北京航空航天大学 A method of it approaches under state and rotates object pose measurement
CN111310574B (en) * 2020-01-17 2022-10-14 清华大学 Vehicle-mounted visual real-time multi-target multi-task joint sensing method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHANG YANG, SUN XIAOLIANG; ZHANG YUEQIANG; LI YOU; YU QIFENG: "Research on 3D Target Pose Tracking and Modeling", ACTA GEODAETICA ET CARTOGRAPHICA SINICA, vol. 47, no. 6, 30 June 2018 (2018-06-30), pages 799 - 808, XP093034918, ISSN: 1001-1595, DOI: 10.11947/j.AGCS.2018.20170626 *

Also Published As

Publication number Publication date
CN113793251A (en) 2021-12-14

Similar Documents

Publication Publication Date Title
Fan et al. Pothole detection based on disparity transformation and road surface modeling
CN108764048B (en) Face key point detection method and device
WO2023016271A1 (en) Attitude determining method, electronic device, and readable storage medium
Bae et al. High-precision vision-based mobile augmented reality system for context-aware architectural, engineering, construction and facility management (AEC/FM) applications
JP6031554B2 (en) Obstacle detection method and apparatus based on monocular camera
CA2826534C (en) Backfilling points in a point cloud
EP3457357A1 (en) Methods and systems for surface fitting based change detection in 3d point-cloud
WO2023016182A1 (en) Pose determination method and apparatus, electronic device, and readable storage medium
Tang et al. ESTHER: Joint camera self-calibration and automatic radial distortion correction from tracking of walking humans
Ji et al. RGB-D SLAM using vanishing point and door plate information in corridor environment
CN113450579B (en) Method, device, equipment and medium for acquiring speed information
WO2023071790A1 (en) Pose detection method and apparatus for target object, device, and storage medium
WO2023284358A1 (en) Camera calibration method and apparatus, electronic device, and storage medium
Dong et al. FSD-SLAM: a fast semi-direct SLAM algorithm
US20220343601A1 (en) Weak multi-view supervision for surface mapping estimation
KR20190060679A (en) Apparatus and method for learning pose of a moving object
CN114972492A (en) Position and pose determination method and device based on aerial view and computer storage medium
Seo et al. An efficient detection of vanishing points using inverted coordinates image space
Jo et al. Mixture density-PoseNet and its application to monocular camera-based global localization
CN114648639B (en) Target vehicle detection method, system and device
WO2022107548A1 (en) Three-dimensional skeleton detection method and three-dimensional skeleton detection device
JP2023065296A (en) Planar surface detection apparatus and method
Saito et al. In-plane rotation-aware monocular depth estimation using slam
JP2023056466A (en) Global positioning device and method for global positioning
Kang et al. 3D urban reconstruction from wide area aerial surveillance video

Legal Events

Date Code Title Description
121  EP: the EPO has been informed by WIPO that EP was designated in this application
     Ref document number: 22855165
     Country of ref document: EP
     Kind code of ref document: A1
NENP Non-entry into the national phase
     Ref country code: DE