CN112800806B - Object pose detection tracking method and device, electronic equipment and storage medium - Google Patents

Object pose detection tracking method and device, electronic equipment and storage medium

Info

Publication number
CN112800806B
CN112800806B
Authority
CN
China
Prior art keywords
pose
target object
feature
image
solving
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911105092.9A
Other languages
Chinese (zh)
Other versions
CN112800806A (en)
Inventor
张惊涛
程骏
胡淑萍
王东
郭渺辰
庞建新
熊友军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youbixuan Intelligent Robot Co ltd
Original Assignee
Ubtech Robotics Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ubtech Robotics Corp
Priority to CN201911105092.9A
Publication of CN112800806A
Application granted
Publication of CN112800806B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/50Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V10/507Summing image-intensity values; Histogram projection analysis
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an object pose detection tracking method, an object pose detection tracking device, electronic equipment and a storage medium, wherein the object pose detection tracking method comprises the following steps: acquiring a bounding box region containing a target object based on the acquired image, so as to obtain a target object image according to the bounding box region; acquiring first feature points of the target object and feature descriptors corresponding to the first feature points from the target object image; inputting the feature descriptors into a pre-established three-dimensional model to perform feature point matching, so as to obtain second feature points matched with the three-dimensional model; performing pose solving on the target object according to the second feature points to obtain a pose estimation result of the target object, wherein the pose solving mode comprises iterative solving; and tracking the pose of the target object according to the pose estimation result. The method effectively distinguishes identical or similar objects in the same scene, avoids pose-calculation errors, and is both time-saving and accurate.

Description

Object pose detection tracking method and device, electronic equipment and storage medium
Technical Field
The application belongs to the technical field of computer vision, and particularly relates to an object pose detection tracking method, an object pose detection tracking device, electronic equipment and a storage medium.
Background
With the continuous development of the field of computer vision, visual detection technology has found wide application in many industrial fields, and visual guidance and positioning technology has become a main means of perceiving the surrounding operating environment in robot grasping.
Currently, robot grasping methods based on monocular vision generally use a camera installed above the robot's workspace, so that the target object and the end of the manipulator appear simultaneously in the camera's field of view, and the positional relationship between the target object and the robot is then established with the camera as the intermediary. In practical applications, however, when several identical objects exist in the same scene, their features are identical or similar, so a one-to-many ambiguity arises during feature matching; this causes errors in the pose calculation, and the pose of the target object cannot be obtained accurately. Moreover, because a large number of identical or similar feature points exist, the number of iterations needed in the iterative pose solving process cannot be estimated, which lowers the algorithm's efficiency, occupies a large amount of computing resources, and makes real-time grasping difficult to achieve.
Disclosure of Invention
In view of the above, embodiments of the present application provide an object pose detection and tracking method, an apparatus, an electronic device and a storage medium, which are intended to solve the following technical defects of the prior art: when several identical objects exist in the same scene, their identical or similar features produce a one-to-many ambiguity during feature matching, which easily causes errors in the pose calculation, so that the pose of the target object cannot be obtained accurately; and because a large number of identical or similar feature points exist, the number of iterations needed in the iterative pose solving process cannot be estimated, which lowers the algorithm's efficiency, occupies a large amount of computing resources, and makes real-time grasping difficult to achieve.
A first aspect of an embodiment of the present application provides an object pose detection and tracking method, where the object pose detection and tracking method includes:
acquiring a bounding box region containing a target object based on the acquired image, so as to obtain a target object image according to the bounding box region;
acquiring first feature points of the target object and feature descriptors corresponding to the first feature points from the target object image;
inputting the feature descriptors into a pre-established three-dimensional model to perform feature point matching, so as to obtain second feature points matched with the three-dimensional model;
performing pose solving on the target object according to the second feature points to obtain a pose estimation result of the target object, wherein the pose solving mode comprises iterative solving;
and performing pose tracking on the target object according to the pose estimation result.
With reference to the first aspect, in a first possible implementation manner of the first aspect, before the step of inputting the feature descriptors into a pre-established three-dimensional model to perform feature point matching to obtain second feature points matched with the three-dimensional model, the method further includes:
and filtering the first feature points obtained from the target object image.
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the step of performing filtering processing on the first feature point obtained from the target object image includes:
eliminating first feature points which do not meet a preset contrast condition; and/or
eliminating first feature points which exhibit an edge response.
With reference to the second possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, the step of eliminating first feature points which do not meet the preset contrast condition includes:
calculating a contrast value of the first feature point;
comparing the contrast value with a preset contrast threshold;
and when the contrast value is smaller than the preset contrast threshold, eliminating the first feature point.
With reference to the first aspect, in a fourth possible implementation manner of the first aspect, before the step of performing pose solving on the target object according to the second feature point to obtain a pose estimation result of the target object, the method includes:
counting the number of the second feature points;
comparing the number of the second feature points with a preset feature point number condition to determine whether to perform pose solving on the target object according to the second feature points, wherein when the number of the second feature points meets the preset feature point number condition, pose solving is performed on the target object according to the second feature points.
With reference to the first aspect, in a fifth possible implementation manner of the first aspect, the step of performing pose tracking on the target object according to the pose estimation result includes:
acquiring a third feature point for determining the pose estimation result;
calculating a convex hull bounding box of the third feature points in the original target object image;
and performing edge expansion processing on the convex hull bounding box according to a preset step length to obtain a bounding box region for acquiring the target object image in the next acquired frame.
With reference to the first aspect, in a sixth possible implementation manner of the first aspect, the step of performing pose solving on the target object according to the second feature points to obtain a pose estimation result of the target object, the pose solving mode comprising iterative solving, includes:
and carrying out iterative solution on the target object through a Levenberg-Marquardt algorithm.
A second aspect of an embodiment of the present application provides an object pose detection tracking device, including:
the first acquisition module is used for acquiring a bounding box region containing the target object based on the acquired image, so as to obtain the target object image according to the bounding box region;
the second acquisition module is used for acquiring first feature points of the target object and feature descriptors corresponding to the first feature points from the target object image;
the matching module is used for inputting the feature descriptors into a pre-established three-dimensional model to perform feature point matching, so as to obtain second feature points matched with the three-dimensional model;
the solving module is used for performing pose solving on the target object according to the second feature points, so as to obtain a pose estimation result of the target object, wherein the pose solving mode comprises iterative solving;
and the tracking module is used for tracking the pose of the target object according to the pose estimation result.
A third aspect of an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the object pose detection tracking method according to any one of the first aspects when executing the computer program.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the object pose detection tracking method according to any of the first aspects.
Compared with the prior art, the embodiment of the application has the beneficial effects that:
the method comprises the following steps: acquiring a bounding box region containing a target object based on the acquired image, so as to obtain a target object image according to the bounding box region; acquiring first feature points of the target object and feature descriptors corresponding to the first feature points from the target object image; inputting the feature descriptors into a pre-established three-dimensional model to perform feature point matching, so as to obtain second feature points matched with the three-dimensional model; performing pose solving on the target object according to the second feature points to obtain a pose estimation result of the target object, wherein the pose solving mode comprises iterative solving; and tracking the pose of the target object according to the pose estimation result. According to the method, the image information corresponding to the target object is locked in advance by acquiring the bounding box region of the target object, so that identical or similar objects in the same scene can be effectively distinguished, the feature point extraction range is narrowed, the time consumed by pose estimation is reduced, pose calculation errors are avoided, and the pose of the target object is obtained accurately. In addition, by using the pose estimation result obtained from the previously acquired frame to track the pose of the target object in the next frame, repeated target detection for the same target object can be avoided, which optimizes the object pose detection tracking process, saves detection time and improves detection efficiency.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed for the description of the embodiments or of the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application, and that a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of the basic flow of an object pose detection tracking method according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of a method for filtering first feature points in the object pose detection tracking method according to an embodiment of the present application;
Fig. 3 is a schematic flow chart of a method for verifying whether pose solving is performed in the object pose detection tracking method according to an embodiment of the present application;
Fig. 4 is a schematic flow chart of a method for tracking the pose of a target object in the object pose detection tracking method according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an object pose detection tracking device according to an embodiment of the present application;
Fig. 6 is a schematic diagram of an electronic device for implementing an object pose detection tracking method according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to illustrate the technical scheme of the application, the following description is made by specific examples.
In some embodiments of the present application, referring to fig. 1, fig. 1 is a schematic flow chart of a basic method of detecting and tracking an object pose according to an embodiment of the present application, which is described in detail below:
in step S101, a bounding box region containing a target object is acquired based on the acquired image to acquire a target object image from the bounding box region.
In this embodiment, images are captured online in real time by a calibrated monocular camera, where a captured image may be a photograph or an image frame of a video. A bounding box region containing the target object is detected in the acquired image by a pre-trained target detection model, and the target object image is then cropped from the acquired image according to the bounding box region. The bounding box region is obtained through a bounding box regression algorithm and is represented by the coordinates of the upper-left and lower-right corners of the bounding box. The bounding box regression algorithm (Bounding Box Regression) fine-tunes the object candidate region (i.e., the candidate box region) to be output, so that the fine-tuned border of the object candidate region is closer to the real border of the object.
The target detection model is obtained by collecting and preparing positive and negative sample sets of the detected object, and then performing offline training on these sample sets based on a MobileNet-SSD model structure. It will be appreciated that the target detection model is not limited to the training method described above; in other embodiments it may be obtained by training offline with a conventional feature-plus-classifier method (e.g., HOG + SVM).
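As a rough, non-authoritative sketch of the detection and cropping described above (the patent does not specify an implementation), the following Python/OpenCV snippet shows how a MobileNet-SSD style detector could produce the bounding box region and the cropped target object image; the model file names, class id and confidence threshold are assumptions for illustration only.

```python
import cv2
import numpy as np

# Illustrative sketch of step S101: run a MobileNet-SSD style detector on the
# captured frame and crop the target object image from the bounding box region.
# Model file names, class id and threshold below are placeholders.
net = cv2.dnn.readNetFromCaffe("mobilenet_ssd.prototxt", "mobilenet_ssd.caffemodel")

def crop_target(frame, target_class_id=1, conf_thresh=0.5):
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007843,
                                 (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()  # shape: (1, 1, N, 7)
    for i in range(detections.shape[2]):
        conf = detections[0, 0, i, 2]
        cls = int(detections[0, 0, i, 1])
        if cls == target_class_id and conf > conf_thresh:
            # Bounding box returned as normalized (x1, y1, x2, y2): upper-left
            # and lower-right corners, matching the representation in the text.
            x1, y1, x2, y2 = (detections[0, 0, i, 3:7] * [w, h, w, h]).astype(int)
            return frame[max(y1, 0):y2, max(x1, 0):x2], (x1, y1, x2, y2)
    return None, None
```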
In step S102, a first feature point of the target object and a feature descriptor corresponding to the first feature point are acquired from the target object image.
In this embodiment, the SIFT algorithm is used to perform feature extraction on the target object image to obtain two-dimensional feature points of the target object, and feature descriptors corresponding to these two-dimensional feature points are then obtained by describing the feature points.
In this embodiment, the SIFT algorithm applies scale transformations to the target object image to construct a Gaussian pyramid spanning several scale spaces, and identifies spatial extremum points in the pyramid using a difference-of-Gaussians function; the points in the target object image corresponding to these spatial extremum points are the first feature points of the target object.
In this embodiment, each feature descriptor is a 128-dimensional feature vector that encodes not only the feature point itself but also the neighboring pixels around it that contribute to the feature point. A gradient orientation histogram may be generated by computing the gradient directions of all pixels in a neighborhood centered on the first feature point; the stable direction of the local structure of the first feature point is then obtained from this histogram and taken as the main direction of the first feature point. For example, gradient directions ranging from 0 to 360° may be quantized into 36 bins, each covering 10°. The histogram is built by accumulating the pixels whose gradients fall into each bin, and the bin with the maximum value, together with any bins exceeding 80% of that maximum, is taken as the main direction of the first feature point. The coordinate axes of the pixels in the neighborhood of the first feature point are then rotated to align with its main direction, which guarantees rotation invariance. Once the main direction is determined, the position, scale and direction of the first feature point are all known, so a 16×16 neighborhood centered on the feature point is taken as a sampling window, the pixels in the window are partitioned into 4×4 = 16 sub-blocks, and the gradient magnitudes in each sub-block are accumulated into a histogram over 8 directions, yielding a 128-dimensional feature descriptor.
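For illustration only (the patent states that a SIFT algorithm is used but does not name a library), the extraction of the first feature points and their 128-dimensional descriptors could be sketched with OpenCV as follows:

```python
import cv2

# Sketch of step S102 using OpenCV's SIFT implementation (an assumption; the
# text only says "a SIFT algorithm"). detectAndCompute returns the first
# feature points (keypoints) and their 128-dimensional descriptors in one call.
sift = cv2.SIFT_create()

def extract_features(target_object_image):
    gray = cv2.cvtColor(target_object_image, cv2.COLOR_BGR2GRAY)
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    # Each keypoint carries position, scale and dominant orientation; each
    # descriptor row is a 128-dimensional vector (4x4 sub-blocks x 8 bins).
    return keypoints, descriptors
```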
The feature descriptor has the following properties:
It remains invariant to rotation, scale changes and brightness changes, and maintains a certain degree of stability against viewing angle changes, affine transformations and noise.
Rotation invariance: matching is still possible after the image has undergone a rotation transformation.
Scale invariance: when the scale space is constructed, images are built at different scales of shrinkage, and feature points are detected across multiple scale spaces.
Robustness to brightness changes: the normalization of the descriptor dimensions reduces the effect of brightness variations.
In step S103, the feature descriptors are input into a pre-established three-dimensional model to perform feature point matching, so as to obtain second feature points matched with the three-dimensional model.
The pre-established three-dimensional model comprises three-dimensional feature points of the measured object in its own coordinate system and feature descriptors corresponding to these three-dimensional feature points. In this application, the feature descriptors corresponding to the first feature points are called first feature descriptors, and the feature descriptors corresponding to the three-dimensional feature points are called second feature descriptors. The three-dimensional model is generated by photographing the measured object in a 360° sweep and modeling it with SfM (Structure from Motion) technology, which can recover three-dimensional information from a time series of two-dimensional images. The measured object is the object whose pose is to be detected. In this embodiment, the first feature descriptors are input into the pre-established three-dimensional model for feature point matching, so that the first feature descriptors are matched against the second feature descriptors in the three-dimensional model. If a first feature descriptor is successfully matched with a second feature descriptor, the first feature point corresponding to that first feature descriptor is matched with the corresponding three-dimensional feature point in the three-dimensional model. At this point, only the first feature points in the target object image that were successfully matched need to be recorded. The second feature points are thus obtained by matching and screening the first feature points through the three-dimensional model.
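A minimal sketch of this matching step, assuming the pre-built three-dimensional model is stored as an array of 3D points with a parallel array of 128-dimensional second feature descriptors (this storage layout and the ratio-test threshold are assumptions, not details given in the patent):

```python
import cv2
import numpy as np

# Sketch of step S103: match the first feature descriptors against the second
# feature descriptors stored in the pre-built 3D model, producing the 2D-3D
# correspondences used later for pose solving. All names are illustrative.
def match_to_model(descriptors_2d, keypoints_2d, model_points_3d, model_descriptors):
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(descriptors_2d, model_descriptors, k=2)
    image_pts, object_pts = [], []
    for pair in knn:
        if len(pair) < 2:
            continue
        m, n = pair
        if m.distance < 0.75 * n.distance:  # Lowe-style ratio test (assumed)
            image_pts.append(keypoints_2d[m.queryIdx].pt)
            object_pts.append(model_points_3d[m.trainIdx])
    # image_pts are the "second feature points"; object_pts are their matched
    # 3D model points.
    return np.float32(image_pts), np.float32(object_pts)
```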
In step S104, pose solving is performed on the target object according to the second feature point, so as to obtain a pose estimation result of the target object, where the pose solving mode includes iterative solving.
In computer vision, three coordinate systems are typically involved: the image coordinate system, the camera coordinate system and the world coordinate system. In this embodiment, the conversion relationship between the image coordinate system and the camera coordinate system, and that between the camera coordinate system and the world coordinate system, can be determined by calibrating the monocular camera, specifically by obtaining its intrinsic and extrinsic parameters. The conversion relationship between the image coordinate system and the camera coordinate system is determined from the intrinsic parameters, which include the pixel focal length and optical axis offset along the horizontal direction of the pixel array, the pixel focal length and optical axis offset along the vertical direction of the pixel array, and the radial and tangential distortion parameters of the monocular camera imaging process. The extrinsic parameters include rotation and translation parameters.
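Purely to illustrate the calibration quantities listed above, the intrinsic parameters are commonly arranged as a 3×3 camera matrix plus a distortion coefficient vector; the numeric values below are placeholders, not calibration results from the patent:

```python
import numpy as np

# fx, fy: pixel focal lengths along the horizontal/vertical pixel directions;
# (cx, cy): optical axis offset (principal point); k1, k2, p1, p2, k3: radial
# and tangential distortion parameters. All values here are placeholders.
fx, fy, cx, cy = 800.0, 800.0, 320.0, 240.0
camera_matrix = np.array([[fx, 0.0, cx],
                          [0.0, fy, cy],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.array([0.1, -0.05, 0.001, 0.001, 0.0])  # k1, k2, p1, p2, k3
# The extrinsic parameters (rotation and translation) relate the camera
# coordinate system to the world coordinate system; pose solving recovers
# such a rotation/translation for the target object.
```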
In this embodiment, the two-dimensional first feature descriptors of the first feature points are matched against the second feature descriptors of the three-dimensional model to obtain the second feature points. Pose solving is then performed on the target object according to the second feature points: the position information of the second feature points in the image coordinate system is converted into position information in the world coordinate system, which yields the pose relationship between the target object and the monocular camera, and the 6D pose estimation result of the target object is obtained from this pose relationship. In this embodiment, pose solving from the second feature points is treated as a PnP problem requiring more than 4 second feature points: up to 4 candidate projection solutions are computed from 3 of the second feature points, the remaining second feature points (from the 4th onward) are substituted into these candidates, and the candidate with the smallest error is taken as the pose relationship between the target object and the monocular camera. When acquiring this pose relationship, the second feature points obtained after matching and screening the first feature points extracted from the acquired image are used as a feature point set; noise feature points in the set are then filtered out by fitting, for example with the random sample consensus (RANSAC) algorithm, so that second feature points that do not represent the pose of the target object are discarded, and the pose of the target object is determined from the second feature points consistent with the largest fitted consensus set.
In this embodiment, for a single acquired frame, the pose estimation result obtained by solving the pose of the target object from that image is only close to the real pose; an error remains between the pose estimation result and the real pose of the target object in the world coordinate system. Therefore, in this embodiment, the pose estimation result of the target object in the acquired image may be obtained by iterative solving. Iterative solving means solving the pose of the target object in a loop, using the pose estimation result of the current solution as the initial value of the next solution, so that the resulting pose estimation becomes closer to the real pose of the target object.
In some embodiments of the present application, the target object pose may be solved iteratively with the Levenberg-Marquardt algorithm, so that later pose estimation results come closer to the true pose. It will be appreciated that the iterative solving method includes, but is not limited to, the numerical nonlinear (local) minimization provided by the Levenberg-Marquardt algorithm. In this embodiment, the iterative solving runs in a loop: for example, the pose estimation result obtained from the previous frame is used as the initial value when solving the pose corresponding to the next frame with Levenberg-Marquardt iterations, which improves the pose estimation accuracy and reduces the number of iterations required.
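A hedged sketch of how step S104 could be realized with OpenCV's PnP solvers, combining RANSAC-based outlier rejection with Levenberg-Marquardt refinement and reusing the previous frame's pose as the initial value; the function and variable names are illustrative assumptions, not the patented implementation, and camera_matrix/dist_coeffs are the calibration quantities shown earlier:

```python
import cv2
import numpy as np

# Sketch of pose solving: RANSAC rejects noisy 2D-3D correspondences, the
# iterative solver and solvePnPRefineLM are Levenberg-Marquardt based, and the
# previous frame's pose (if any) seeds the next solution.
def estimate_pose(object_pts, image_pts, camera_matrix, dist_coeffs,
                  prev_rvec=None, prev_tvec=None):
    if len(object_pts) < 4:
        return None  # not enough second feature points for PnP
    if prev_rvec is not None and prev_tvec is not None:
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(
            object_pts, image_pts, camera_matrix, dist_coeffs,
            rvec=prev_rvec, tvec=prev_tvec, useExtrinsicGuess=True,
            flags=cv2.SOLVEPNP_ITERATIVE)
    else:
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(
            object_pts, image_pts, camera_matrix, dist_coeffs,
            flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok or inliers is None:
        return None
    # Optional Levenberg-Marquardt refinement on the inlier set.
    rvec, tvec = cv2.solvePnPRefineLM(object_pts[inliers[:, 0]],
                                      image_pts[inliers[:, 0]],
                                      camera_matrix, dist_coeffs, rvec, tvec)
    return rvec, tvec, inliers
```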
In step S105, pose tracking is performed on the target object according to the pose estimation result.
In this embodiment, after the pose estimation result of the target object is obtained by solving its pose from the matched feature points extracted from the image acquired by the monocular camera, pose tracking is then performed on the target object according to the second feature points that determined the pose in that estimation result. Specifically, the position of the target object in the next acquired frame is predicted from the feature points that characterize its position in the previous frame, a bounding box region (ROI) for acquiring the target object image in the next frame is generated, and the target object image in the next frame is then obtained directly from this predicted bounding box region. It can be appreciated that in this embodiment, when pose tracking of the target object fails, the process returns to the image acquisition operation to acquire a new image and perform target detection again.
In the object pose detection tracking method described above, the image information corresponding to the target object is locked in advance by acquiring the bounding box region of the target object, so that identical or similar objects in the same scene can be effectively distinguished, the feature point extraction range is narrowed, the time consumed by pose estimation is reduced, pose calculation errors are avoided, and the pose of the target object is obtained accurately. In addition, by using the pose estimation result obtained from the previously acquired frame to track the pose of the target object in the next frame, repeated target detection for the same target object can be avoided, which optimizes the object pose detection tracking process, saves detection time and improves detection efficiency.
In some embodiments of the present application, spatial extremum points in the Gaussian pyramid image are identified from the difference-of-Gaussians function by the SIFT (scale-invariant feature transform) algorithm, and these spatial extremum points are the first feature points obtained from the target object image. Because some of these extremum points are found in the discrete space of the difference-of-Gaussians function, they are not first feature points in the true sense. Therefore, in this embodiment, before feature point matching with the three-dimensional model, the first feature points obtained from the target object image may be filtered to remove meaningless first feature points, which improves the matching speed and saves time in the pose detection process. In this embodiment, the filtering may be performed in two ways: eliminating first feature points that do not meet a preset contrast condition, and/or eliminating first feature points that exhibit an edge response.
In some embodiments of the present application, referring to fig. 2, fig. 2 is a flow chart of a method for filtering a first feature point in an object pose detection tracking method according to an embodiment of the present application. The details are as follows:
In step S201, calculating a contrast value of the first feature point;
in step S202, the contrast value is compared with a preset contrast threshold;
in step S203, when the contrast value is smaller than a preset contrast threshold, the first feature point is rejected.
In this embodiment, when eliminating first feature points that do not meet the preset contrast condition, the precise position of a first feature point may be obtained by taking the derivative of the Taylor expansion of the scale-space function and setting it to zero, and the formula for calculating the contrast value of the feature point is obtained by substituting this precise position back into the Taylor expansion. The contrast value of each first feature point is calculated and compared with a preset contrast threshold to judge whether the first feature point meets the preset contrast condition; when the contrast value is smaller than the preset contrast threshold, the first feature point does not meet the preset contrast condition and is eliminated.
In some embodiments of the present application, when eliminating first feature points that exhibit an edge response, note that a spatial extremum identified from the difference-of-Gaussians function in the Gaussian pyramid image has, when it lies on an edge, a large principal curvature in the direction across the edge and a small principal curvature in the direction along the edge. Based on this property, a 2×2 Hessian matrix can be used to characterize the principal curvatures of a first feature point, which are proportional to the two eigenvalues of the Hessian matrix. Therefore, in this embodiment, the relationship between the principal curvatures of the first feature point and the two eigenvalues of the Hessian matrix is used to check whether the principal curvatures meet a threshold requirement; when they do not, the first feature point is identified as an edge-response point and is eliminated.
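The two filtering tests described above (contrast rejection and edge-response rejection) are part of the standard SIFT keypoint-refinement procedure; the sketch below illustrates them on a difference-of-Gaussians patch, with commonly used threshold values that are assumptions rather than figures from the patent:

```python
import numpy as np

# Sketch of the two filtering tests on a DoG (difference-of-Gaussians) image
# around a candidate first feature point at (y, x). Thresholds (0.03 contrast,
# edge ratio r = 10) are typical SIFT defaults, assumed here for illustration.
def keep_feature(dog, y, x, contrast_thresh=0.03, edge_ratio=10.0):
    # Contrast test: reject low-contrast extrema.
    if abs(dog[y, x]) < contrast_thresh:
        return False
    # Edge-response test: 2x2 Hessian of the DoG image at the point; a large
    # trace^2 / det ratio means one principal curvature dominates, i.e. the
    # point lies on an edge and should be rejected.
    dxx = dog[y, x + 1] + dog[y, x - 1] - 2 * dog[y, x]
    dyy = dog[y + 1, x] + dog[y - 1, x] - 2 * dog[y, x]
    dxy = (dog[y + 1, x + 1] - dog[y + 1, x - 1]
           - dog[y - 1, x + 1] + dog[y - 1, x - 1]) / 4.0
    tr, det = dxx + dyy, dxx * dyy - dxy * dxy
    if det <= 0 or tr * tr / det >= (edge_ratio + 1) ** 2 / edge_ratio:
        return False
    return True
```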
In some embodiments of the present application, since pose solving is based on the PnP problem, before solving the pose relationship between the target object and the camera it may first be verified whether the number of second feature points satisfies the requirements of pose solving, and only then is it decided whether to perform pose solving; this reduces time consumption in the pose estimation process and lightens the computational burden on the device. Referring to fig. 3, fig. 3 is a flow chart of a method for verifying whether pose solving is performed in the object pose detection tracking method according to an embodiment of the present application.
In step S301, the number of the second feature points is counted;
in step S302, the number of the second feature points is compared with a preset feature point number condition to determine whether to perform pose solving on the target object according to the second feature points, where when the number of the second feature points meets the preset feature point number condition, pose solving is performed on the target object according to the second feature points.
Because solving the pose relationship between the target object and the camera involves a PnP problem, a sufficient number of two-dimensional feature points must be obtained as second feature points in order to solve for the closest projection relationship between the 3D feature points in the world coordinate system and their pixel projections in the image.
In this embodiment, based on the PnP problem, the number of feature points required for pose solving is configured to be greater than 4. The number of two-dimensional feature points in the target object image that can serve as second feature points is counted and compared with the preset feature point number condition; only when the number of second feature points meets the preset feature point number condition is pose solving performed on the target object according to the second feature points. Otherwise, the process returns to the initial image acquisition step and re-acquires an image so that a sufficient number of second feature points can be extracted for pose solving. When the counted number of second feature points is less than or equal to 3, image acquisition and feature extraction need to be performed again to obtain enough second feature points for pose solving.
In some embodiments of the present application, referring to fig. 4, fig. 4 is a flowchart of a method for tracking a pose of a target object in the method for detecting and tracking a pose of an object according to the present embodiment. The details are as follows:
In step S401, a third feature point for determining the pose estimation result is acquired;
in step S402, calculating a convex hull bounding box of the third feature points in the original target object image;
in step S403, edge expansion processing is performed on the convex hull bounding box according to a preset step length, so as to obtain a bounding box region for acquiring the target object image in the next acquired frame.
During pose solving of the target object from the second feature points, some second feature points that do not conform to the pose are filtered out by the fitting process, and the pose of the target object is obtained from the second feature points consistent with the best fit, thereby determining the pose estimation result of the target object. In this embodiment, the second feature points that determine the pose of the target object are referred to as third feature points. Therefore, when tracking the pose of the target object, the third feature points used to determine the pose estimation result can be obtained from the pose estimation result detected in the previously acquired frame. The third feature points are a group of two-dimensional feature points characterizing the pose of the target object in the previously acquired frame; the positions of the outermost points of this group are computed, and the region obtained by connecting these outermost points as vertices is the convex hull region of the third feature points in the original target object image. After the convex hull region is obtained, edge expansion is applied to its bounding box along the edge lines of the convex hull region according to a preset step length, so that the convex hull region is enlarged; the enlarged region is used as the predicted region where the target object is likely to appear, and this predicted region serves as the bounding box region for acquiring the target object image in the next acquired frame. In this way, the bounding box region for the next frame is obtained from the convex hull region of the feature points characterizing the target object's pose in the previous frame, which narrows the feature point extraction area and reduces the time spent computing feature points.
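As a non-authoritative sketch of this tracking step, the convex hull of the third feature points can be computed and expanded by a preset step to predict the bounding box region for the next frame; the step size and border clamping below are illustrative choices:

```python
import cv2
import numpy as np

# Sketch of step S105: build the convex hull of the third feature points
# (the 2D points that determined the previous frame's pose), then expand its
# bounding box by a preset step to predict the next frame's ROI.
def next_frame_roi(third_feature_pts, image_shape, step=20):
    hull = cv2.convexHull(np.int32(third_feature_pts))   # convex hull vertices
    x, y, w, h = cv2.boundingRect(hull)                   # convex hull bounding box
    img_h, img_w = image_shape[:2]
    x1, y1 = max(x - step, 0), max(y - step, 0)            # expand edges by step
    x2, y2 = min(x + w + step, img_w), min(y + h + step, img_h)
    return x1, y1, x2, y2  # bounding box region for the next acquired frame
```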
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not limit the implementation of the embodiments of the present application in any way.
In some embodiments of the present application, referring to fig. 5, fig. 5 is a schematic structural diagram of an object pose detection tracking device according to an embodiment of the present application, and the detailed description is as follows:
the object pose detection tracking device comprises: a first acquisition module 501, a second acquisition module 502, a matching module 503, a solving module 504 and a tracking module 505. The first acquisition module 501 is configured to acquire a bounding box region containing the target object based on the acquired image, so as to obtain the target object image according to the bounding box region; the second acquisition module 502 is configured to acquire first feature points of the target object and feature descriptors corresponding to the first feature points from the target object image; the matching module 503 is configured to input the feature descriptors into a pre-established three-dimensional model for feature point matching, so as to obtain second feature points matched with the three-dimensional model; the solving module 504 is configured to perform pose solving on the target object according to the second feature points, so as to obtain a pose estimation result of the target object, where the pose solving mode includes iterative solving; the tracking module 505 is configured to track the pose of the target object according to the pose estimation result.
The object pose detection and tracking device is in one-to-one correspondence with the object pose detection and tracking method.
In some embodiments of the present application, referring to fig. 6, fig. 6 is a schematic diagram of an electronic device for implementing an object pose detection tracking method according to an embodiment of the present application. As shown in fig. 6, the electronic device 6 of this embodiment includes: a processor 61, a memory 62 and a computer program 63, such as an object pose detection tracking program, stored in the memory 62 and executable on the processor 61. The processor 61, when executing the computer program 63, implements the steps of the object pose detection tracking method embodiments described above. Alternatively, the processor 61, when executing the computer program 63, performs the functions of the modules/units in the device embodiments described above.
Illustratively, the computer program 63 may be partitioned into one or more modules/units that are stored in the memory 62 and executed by the processor 61 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing the specified functions for describing the execution of the computer program 63 in the electronic device 6. For example, the computer program 63 may be split into:
The first acquisition module is used for acquiring a bounding box region containing the target object based on the acquired image, so as to obtain the target object image according to the bounding box region;
the second acquisition module is used for acquiring first feature points of the target object and feature descriptors corresponding to the first feature points from the target object image;
the matching module is used for inputting the feature descriptors into a pre-established three-dimensional model to perform feature point matching, so as to obtain second feature points matched with the three-dimensional model;
the solving module is used for performing pose solving on the target object according to the second feature points, so as to obtain a pose estimation result of the target object, wherein the pose solving mode includes iterative solving;
and the tracking module is used for tracking the pose of the target object according to the pose estimation result.
The electronic device may include, but is not limited to, a processor 61, a memory 62. It will be appreciated by those skilled in the art that fig. 6 is merely an example of the electronic device 6 and is not meant to be limiting as the electronic device 6 may include more or fewer components than shown, or may combine certain components, or different components, e.g., the electronic device may further include an input-output device, a network access device, a bus, etc.
The processor 61 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 62 may be an internal storage unit of the electronic device 6, such as a hard disk or a memory of the electronic device 6. The memory 62 may also be an external storage device of the electronic device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 6. Further, the memory 62 may also include both an internal storage unit and an external storage device of the electronic device 6. The memory 62 is used to store the computer program as well as other programs and data required by the electronic device. The memory 62 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and in part, not described or illustrated in any particular embodiment, reference is made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, it implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer-readable medium may be added to or removed from as required by legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (9)

1. The object pose detection and tracking method is characterized by comprising the following steps of:
acquiring a bounding box region containing a target object based on the acquired image, so as to obtain a target object image according to the bounding box region;
acquiring first feature points of the target object and feature descriptors corresponding to the first feature points from the target object image;
inputting the feature descriptors into a pre-established three-dimensional model to perform feature point matching, so as to obtain second feature points matched with the three-dimensional model;
performing pose solving on the target object according to the second feature points to obtain a pose estimation result of the target object, wherein the pose solving mode comprises iterative solving;
performing pose tracking on the target object according to the pose estimation result, wherein the pose tracking process comprises: obtaining third feature points for determining the pose estimation result, the third feature points being a group of two-dimensional feature points characterizing the pose of the target object in the previously acquired frame; calculating the positions of the outermost points in the group of two-dimensional feature points, and determining the region obtained by connecting the outermost points as vertices as the convex hull region of the third feature points in the original target object image; and performing edge expansion processing on the bounding box of the convex hull region according to a preset step length to obtain a bounding box region for acquiring the target object image in the next acquired frame.
2. The object pose detection tracking method according to claim 1, wherein before the step of inputting the feature descriptors into a pre-established three-dimensional model to perform feature point matching to obtain second feature points matched with the three-dimensional model, further comprising:
and filtering the first feature points obtained from the target object image.
3. The object pose detection tracking method according to claim 2, wherein the step of performing filtering processing on the first feature point obtained from the target object image includes:
eliminating first feature points which do not meet a preset contrast condition; and/or
eliminating first feature points which exhibit an edge response.
4. The object pose detection and tracking method according to claim 3, wherein the step of eliminating first feature points that do not satisfy the preset contrast condition comprises:
calculating a contrast value of each first feature point;
comparing the contrast value with a preset contrast threshold; and
eliminating the first feature point when its contrast value is smaller than the preset contrast threshold.
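Claims 3 and 4 describe discarding low-contrast and edge-responding first feature points. The sketch below illustrates one common way to do this, in the style of SIFT's keypoint rejection; using the detector response as the contrast value and a Hessian principal-curvature ratio test for the edge check are assumptions for illustration, not necessarily the criteria used in the embodiment.

```python
import numpy as np
import cv2


def filter_keypoints(gray, keypoints, contrast_thr=0.03, edge_ratio=10.0):
    """Sketch: keep keypoints that pass a contrast check and an
    edge-response check (SIFT-style ratio of principal curvatures)."""
    # Second-order derivatives of the image, used to form the 2x2 Hessian.
    dxx = cv2.Sobel(gray, cv2.CV_32F, 2, 0, ksize=3)
    dyy = cv2.Sobel(gray, cv2.CV_32F, 0, 2, ksize=3)
    dxy = cv2.Sobel(gray, cv2.CV_32F, 1, 1, ksize=3)

    kept = []
    for kp in keypoints:
        # Contrast check: the detector response stands in for the contrast
        # value compared against the preset threshold.
        if kp.response < contrast_thr:
            continue

        x, y = int(round(kp.pt[0])), int(round(kp.pt[1]))
        if not (0 <= y < gray.shape[0] and 0 <= x < gray.shape[1]):
            continue

        # Edge-response check: reject points whose Hessian indicates a
        # strong edge (large ratio of principal curvatures).
        tr = dxx[y, x] + dyy[y, x]
        det = dxx[y, x] * dyy[y, x] - dxy[y, x] ** 2
        if det <= 0 or (tr * tr) / det >= (edge_ratio + 1) ** 2 / edge_ratio:
            continue

        kept.append(kp)
    return kept
```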
5. The object pose detection and tracking method according to claim 1, wherein before the step of performing pose solving on the target object according to the second feature points to obtain a pose estimation result of the target object, the method further comprises:
counting the number of the second feature points; and
comparing the number of the second feature points with a preset feature point number condition to determine whether to perform pose solving on the target object according to the second feature points, wherein pose solving is performed on the target object according to the second feature points when the number of the second feature points satisfies the preset feature point number condition.
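A minimal sketch of the count check in claim 5 follows; the threshold value is an assumption (PnP needs at least four 2D-3D correspondences, and a larger margin is typical in practice).

```python
MIN_MATCHED_POINTS = 10  # assumed preset feature point number condition


def should_solve_pose(second_feature_points):
    # Pose solving proceeds only when enough matched points are available.
    return len(second_feature_points) >= MIN_MATCHED_POINTS
```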
6. The object pose detection and tracking method according to claim 1, wherein the step of performing pose solving on the target object according to the second feature points to obtain a pose estimation result of the target object, in which the pose solving manner comprises iterative solving, comprises:
performing iterative solving on the pose of the target object through a Levenberg-Marquardt algorithm.
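Claim 6 names the Levenberg-Marquardt algorithm for the iterative pose solving. Below is a minimal sketch using OpenCV's PnP solver; pairing solvePnP with solvePnPRefineLM (available in OpenCV 4.1 and later) is one plausible realization, not necessarily the one used in the embodiment.

```python
import numpy as np
import cv2


def solve_pose(object_points_3d, image_points_2d, camera_matrix, dist_coeffs=None):
    """Sketch: estimate the target object's pose from 2D-3D correspondences
    (the matched second feature points and their 3D model points), then
    refine it iteratively with Levenberg-Marquardt."""
    obj = np.asarray(object_points_3d, dtype=np.float32).reshape(-1, 1, 3)
    img = np.asarray(image_points_2d, dtype=np.float32).reshape(-1, 1, 2)
    dist = np.zeros(5) if dist_coeffs is None else dist_coeffs

    # Initial iterative solution.
    ok, rvec, tvec = cv2.solvePnP(obj, img, camera_matrix, dist,
                                  flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        return None

    # Explicit Levenberg-Marquardt refinement of the initial estimate.
    rvec, tvec = cv2.solvePnPRefineLM(obj, img, camera_matrix, dist, rvec, tvec)
    return rvec, tvec  # rotation vector and translation vector
```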
7. An object pose detection and tracking device, characterized in that the object pose detection and tracking device comprises:
a first acquisition module, configured to acquire a bounding box region containing a target object based on a captured image, so as to obtain a target object image according to the bounding box region;
a second acquisition module, configured to acquire first feature points of the target object, and feature descriptors corresponding to the first feature points, from the target object image;
a matching module, configured to input the feature descriptors into a pre-established three-dimensional model for feature point matching, so as to obtain second feature points matched with the three-dimensional model;
a solving module, configured to perform pose solving on the target object according to the second feature points to obtain a pose estimation result of the target object, wherein the pose solving manner comprises iterative solving;
a tracking module, configured to perform pose tracking on the target object according to the pose estimation result, wherein the pose tracking process comprises: obtaining third feature points used to determine the pose estimation result, the third feature points being a group of two-dimensional feature points representing the pose of the target object in the image captured in the previous frame; calculating the positions of the outermost convex points in the group of two-dimensional feature points, and determining the region obtained by connecting the outermost convex points as vertices as the convex hull region of the third feature points in the original target object image; and performing edge expansion processing on the convex hull bounding box according to a preset step size to obtain a bounding box region for acquiring the target object image in the next captured frame.
8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the object pose detection and tracking method according to any one of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the object pose detection and tracking method according to any one of claims 1 to 6.
CN201911105092.9A 2019-11-13 2019-11-13 Object pose detection tracking method and device, electronic equipment and storage medium Active CN112800806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911105092.9A CN112800806B (en) 2019-11-13 2019-11-13 Object pose detection tracking method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911105092.9A CN112800806B (en) 2019-11-13 2019-11-13 Object pose detection tracking method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112800806A CN112800806A (en) 2021-05-14
CN112800806B true CN112800806B (en) 2023-10-13

Family

ID=75803185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911105092.9A Active CN112800806B (en) 2019-11-13 2019-11-13 Object pose detection tracking method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112800806B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663686A (en) * 2022-03-07 2022-06-24 腾讯科技(深圳)有限公司 Object feature point matching method and device, and training method and device
CN114898354A (en) * 2022-03-24 2022-08-12 中德(珠海)人工智能研究院有限公司 Measuring method and device based on three-dimensional model, server and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069809A (en) * 2015-08-31 2015-11-18 中国科学院自动化研究所 Camera positioning method and system based on planar mixed marker
CN107133999A (en) * 2017-05-12 2017-09-05 天津大学 Using the tomographic reconstruction method of match tracing
CN108122256A (en) * 2017-12-25 2018-06-05 北京航空航天大学 It is a kind of to approach under state the method for rotating object pose measurement
CN109712172A (en) * 2018-12-28 2019-05-03 哈尔滨工业大学 A kind of pose measuring method of initial pose measurement combining target tracking
WO2019169540A1 (en) * 2018-03-06 2019-09-12 斯坦德机器人(深圳)有限公司 Method for tightly-coupling visual slam, terminal and computer readable storage medium
CN110363817A (en) * 2019-07-10 2019-10-22 北京悉见科技有限公司 Object pose estimation method, electronic equipment and medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4709723B2 (en) * 2006-10-27 2011-06-22 株式会社東芝 Attitude estimation apparatus and method
US9342886B2 (en) * 2011-04-29 2016-05-17 Qualcomm Incorporated Devices, methods, and apparatuses for homography evaluation involving a mobile device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069809A (en) * 2015-08-31 2015-11-18 中国科学院自动化研究所 Camera positioning method and system based on planar mixed marker
CN107133999A (en) * 2017-05-12 2017-09-05 天津大学 Using the tomographic reconstruction method of match tracing
CN108122256A (en) * 2017-12-25 2018-06-05 北京航空航天大学 It is a kind of to approach under state the method for rotating object pose measurement
WO2019169540A1 (en) * 2018-03-06 2019-09-12 斯坦德机器人(深圳)有限公司 Method for tightly-coupling visual slam, terminal and computer readable storage medium
CN109712172A (en) * 2018-12-28 2019-05-03 哈尔滨工业大学 A kind of pose measuring method of initial pose measurement combining target tracking
CN110363817A (en) * 2019-07-10 2019-10-22 北京悉见科技有限公司 Object pose estimation method, electronic equipment and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Monocular Visual Odometry with Fused Point and Line Features; Yuan Meng; Li Aihua; Zheng Yong; Cui Zhigao; Bao Zhenqiang; Laser & Optoelectronics Progress (02); full text *

Also Published As

Publication number Publication date
CN112800806A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN112950667B (en) Video labeling method, device, equipment and computer readable storage medium
CN107833181B (en) Three-dimensional panoramic image generation method based on zoom stereo vision
WO2022000862A1 (en) Method and apparatus for detecting object in fisheye image, and storage medium
CN102376089B (en) Target correction method and system
JP5538868B2 (en) Image processing apparatus, image processing method and program
CN109461133B (en) Bridge bolt falling detection method and terminal equipment
CN112800806B (en) Object pose detection tracking method and device, electronic equipment and storage medium
CN115376109B (en) Obstacle detection method, obstacle detection device, and storage medium
CN107610097A (en) Instrument localization method, device and terminal device
CN112198878B (en) Instant map construction method and device, robot and storage medium
CN108596032B (en) Detection method, device, equipment and medium for fighting behavior in video
CN113688817A (en) Instrument identification method and system for automatic inspection
CN111161348B (en) Object pose estimation method, device and equipment based on monocular camera
CN112613107A (en) Method and device for determining construction progress of tower project, storage medium and equipment
CN111325828A (en) Three-dimensional face acquisition method and device based on three-eye camera
CN114821274A (en) Method and device for identifying state of split and combined indicator
CN104966283A (en) Imaging layered registering method
CN110673607A (en) Feature point extraction method and device in dynamic scene and terminal equipment
CN116935013B (en) Circuit board point cloud large-scale splicing method and system based on three-dimensional reconstruction
CN115690747B (en) Vehicle blind area detection model test method and device, electronic equipment and storage medium
CN112102378A (en) Image registration method and device, terminal equipment and computer readable storage medium
CN115035168A (en) Multi-constraint-based photovoltaic panel multi-source image registration method, device and system
CN110930344B (en) Target quality determination method, device and system and electronic equipment
CN113255405A (en) Parking space line identification method and system, parking space line identification device and storage medium
WO2022021151A1 (en) Inspection method and device for assembled assemblies of production line

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231204

Address after: Room 601, 6th Floor, Building 13, No. 3 Jinghai Fifth Road, Beijing Economic and Technological Development Zone (Tongzhou), Tongzhou District, Beijing, 100176

Patentee after: Beijing Youbixuan Intelligent Robot Co.,Ltd.

Address before: 518000 16th and 22nd Floors, C1 Building, Nanshan Zhiyuan, 1001 Xueyuan Avenue, Nanshan District, Shenzhen City, Guangdong Province

Patentee before: Shenzhen UBTECH Technology Co.,Ltd.

TR01 Transfer of patent right