CN107403440B - Method and apparatus for determining a pose of an object - Google Patents

Method and apparatus for determining a pose of an object

Info

Publication number
CN107403440B
Authority
CN
China
Prior art keywords
frame
feature
features
model
current
Prior art date
Legal status
Active
Application number
CN201610329835.0A
Other languages
Chinese (zh)
Other versions
CN107403440A (en)
Inventor
刘振华
刘殿超
师忠超
Current Assignee
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date
Filing date
Publication date
Application filed by Ricoh Co Ltd filed Critical Ricoh Co Ltd
Priority to CN201610329835.0A priority Critical patent/CN107403440B/en
Priority to JP2017080192A priority patent/JP6361775B2/en
Publication of CN107403440A publication Critical patent/CN107403440A/en
Application granted granted Critical
Publication of CN107403440B publication Critical patent/CN107403440B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A method and apparatus for determining a pose of an object is provided, the method comprising: acquiring a sequence of image frames acquired by an object during motion; detecting features of a current frame in a sequence of image frames; determining a corresponding relation between the features in the current frame and the features in the specific frame based on each feature in the specific frame before the current frame and the corresponding feature in at least one frame before the specific frame and the motion parameter information between the specific frame and the at least one frame before the specific frame; performing motion estimation based on the corresponding relation, the image of the current frame and the image of the specific frame; and determining the pose of the object based on the result of the motion estimation.

Description

Method and apparatus for determining a pose of an object
Technical Field
The present invention relates to the field of image processing, and more particularly, to a method and apparatus for determining a pose of an object.
Background
In the three-dimensional reconstruction of a scene by image processing, a visual measurement process is usually required. In the vision measurement process, the pose of an object, such as a robot, a vehicle, etc., including the position and orientation of the object, is determined by analyzing the images captured by the associated cameras.
In one vision measuring method, a correspondence between image features is determined based only on information from the current frame and the frame immediately preceding it in the image frame sequence. The correspondence obtained in this way is not reliable enough, so the results of the subsequent motion estimation and visual measurement are not accurate enough, and an obvious accumulated error exists.
Disclosure of Invention
In view of the above, the present invention provides a method and apparatus for determining the pose of an object, which can significantly improve the accuracy of motion estimation, thereby significantly improving the accuracy of determining the pose of an object and reducing the accumulated error.
According to an embodiment of the invention, there is provided a method for determining a pose of an object, comprising: acquiring a sequence of image frames acquired by an object during motion; detecting features of a current frame in a sequence of image frames; determining a corresponding relation between the features in the current frame and the features in the specific frame based on each feature in the specific frame before the current frame and the corresponding feature in at least one frame before the specific frame and the motion parameter information between the specific frame and the at least one frame before the specific frame; performing motion estimation based on the corresponding relation, the image of the current frame and the image of the specific frame; and determining the pose of the object based on the result of the motion estimation.
According to another embodiment of the invention, an apparatus for determining a pose of an object, comprises: an image acquisition unit that acquires a sequence of image frames acquired by an object during motion; a feature detection unit that detects a feature of a current frame in the image frame sequence; the relation determining unit is used for determining the corresponding relation between the features in the current frame and the features in the specific frame based on each feature in the specific frame before the current frame and the corresponding feature in at least one frame before the specific frame and the motion parameter information between the specific frame and the at least one frame before the specific frame; a motion estimation unit for performing motion estimation based on the correspondence, the image of the current frame, and the image of the specific frame; and a posture determination unit that determines a posture of the object based on a result of the motion estimation.
According to another embodiment of the present invention, there is provided an object posture determination apparatus including: an image acquisition module that acquires a sequence of image frames acquired by an object during motion; a processor; a memory; and computer program instructions stored in the memory that, when executed by the processor, perform the steps of: obtaining the sequence of image frames from the image acquisition module; detecting features of a current frame in a sequence of image frames; determining a corresponding relation between the features in the current frame and the features in the specific frame based on each feature in the specific frame before the current frame and the corresponding feature in at least one frame before the specific frame and the motion parameter information between the specific frame and the at least one frame before the specific frame; performing motion estimation based on the corresponding relation, the image of the current frame and the image of the specific frame; and determining the pose of the object based on the result of the motion estimation.
According to another embodiment of the invention, there is provided a computer program product comprising a computer readable storage medium having stored thereon computer program instructions which, when executed by a computer, perform the steps of: acquiring a sequence of image frames acquired by an object during motion; detecting features of a current frame in a sequence of image frames; determining a corresponding relation between the features in the current frame and the features in the specific frame based on each feature in the specific frame before the current frame and the corresponding feature in at least one frame before the specific frame and the motion parameter information between the specific frame and the at least one frame before the specific frame; performing motion estimation based on the corresponding relation, the image of the current frame and the image of the specific frame; and determining the pose of the object based on the result of the motion estimation.
In the method and the device for determining the posture of the object, which are provided by the embodiment of the invention, the corresponding relation between the characteristics is determined by utilizing the image information of at least two frames before the current frame and motion estimation is carried out, so that the precision of the motion estimation can be obviously improved, the precision of determining the posture of the object can be obviously improved, and the accumulated error is reduced.
Drawings
FIG. 1 is a diagram illustrating a scenario to which a method and apparatus for determining a pose of an object according to an embodiment of the present invention is applied;
FIG. 2 is a flow chart illustrating the main steps of a method for determining the pose of an object according to an embodiment of the present invention;
FIG. 3 is a flow chart illustrating the steps of the main process of determining a correspondence between a feature in a current frame and a feature in that particular frame in a method according to an embodiment of the invention;
FIG. 4 is a flow chart illustrating the steps of a detailed process of determining a correspondence between a feature in a current frame and a feature in the particular frame in a method according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating the determination of a correspondence between a feature in a current frame and a feature in the particular frame in a method according to an embodiment of the invention;
fig. 6 is a block diagram illustrating a main configuration of an apparatus for determining a posture of an object according to an embodiment of the present invention;
fig. 7 is a block diagram illustrating a detailed configuration of a relationship determination unit in the apparatus for determining the posture of an object shown in fig. 6;
fig. 8 is a block diagram illustrating a detailed configuration of a correspondence relationship determination unit among the relationship determination units shown in fig. 7; and
fig. 9 is a block diagram illustrating a main hardware configuration of an object posture determining apparatus according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
First, a scenario to which the method and apparatus according to an embodiment of the present invention are applied is described with reference to fig. 1.
Fig. 1 is a schematic diagram illustrating a scene to which a method and apparatus for determining a posture of an object (hereinafter, also referred to as an object posture determination method and an object posture determination apparatus, respectively, as appropriate) according to an embodiment of the present invention are applied.
As shown in fig. 1, the method and apparatus for determining the pose of an object according to an embodiment of the present invention are applied to an object 100. The object 100 may include, for example, but is not limited to, a smart robot, a car, a wearable device, and the like. In an example, the object 100 itself may be moving within the scene in which it is located. In another example, the object 100 may be moved within the scene in which it is located by user manipulation or wearing. Hereinafter, the above two cases are collectively referred to as the motion of the object 100.
The object 100 comprises an imaging unit 110. The imaging unit 110 is configured by various image pickup elements such as a still camera, a motion camera, and the like. As the object 100 moves within the scene in which it is located, the imaging unit 110 may take a photograph of the scene, thereby acquiring a static image or a dynamic image (a sequence of image frames).
Thereby, by analyzing the image acquired by the imaging unit 110, the pose of the object 100 may be determined. The pose may include the position and orientation of the object 100.
Next, the process of the method for determining the posture of the object will be described in detail with reference to fig. 2.
FIG. 2 is a flow chart illustrating the main steps of a method for determining the pose of an object according to an embodiment of the present invention.
As shown in fig. 2, first, at step S1, a sequence of image frames acquired by a subject during motion is acquired. In particular, the method may acquire the sequence of image frames acquired by the object in real time during motion of the object. Alternatively, the method may also acquire the sequence of image frames acquired by the object during motion after the object motion ends.
Next, in step S2, a feature of the current frame in the image frame sequence is detected.
In particular, the image frame sequence may comprise frames 0 through t (t is a natural number), i.e. the image frame sequence may be represented as image frames (0, 1, …, t-1, t). Here, assuming that the t-th frame is the frame currently to be processed (i.e., the current frame), the 0-th frame to the (t-1)-th frame are frames before the current frame and may also be referred to as history frames.
More specifically, the feature may include, for example, at least one of position information and description information of a feature point. As known to those skilled in the art, the description information may be information capable of representing the characteristics of a pixel region, such as gradient histogram information, gray-level histogram information, and the like. Illustratively, the method may employ various feature extraction algorithms such as SIFT (Scale-Invariant Feature Transform), ORB (Oriented FAST and Rotated BRIEF), etc. to detect the features. Of course, the foregoing is merely exemplary. Those skilled in the art may detect various other features in the image of the current frame using various other feature detection methods known in the art and developed in the future.
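By way of illustration only, the following sketch shows how such features (keypoint positions plus description information) might be detected; the use of Python with OpenCV's ORB detector and all names therein are assumptions for illustration, not part of the disclosed embodiment.

```python
# Illustrative sketch: detect features (position + description information) in a frame.
# Assumes OpenCV is available and the frame is a grayscale image.
import cv2

def detect_features(gray_frame, max_features=1000):
    orb = cv2.ORB_create(nfeatures=max_features)
    keypoints, descriptors = orb.detectAndCompute(gray_frame, None)
    # positions[i] is the (x, y) location of feature i; descriptors[i] is its
    # description information, matching the notation used below.
    positions = [kp.pt for kp in keypoints]
    return positions, descriptors
```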
Assume that N features, denoted $f^{t}_{i}$, $i = 1, 2, \ldots, N$ (N is a natural number), are detected in the current frame t by the processing of step S2. Illustratively, each feature includes position information $x^{t}_{i}$ and description information $d^{t}_{i}$. Thereafter, the method proceeds to step S3.
In step S3, a corresponding relationship between the feature in the current frame and the feature in the specific frame is determined based on each feature in the specific frame before the current frame and its corresponding feature in at least one frame before the specific frame, and the motion parameter information between the specific frame and the at least one frame before the specific frame.
Specifically, the specific frame is any one of the historical frames. As an example, to make the result of the pose determination more accurate, the specific frame may be a frame that is temporally closer to the current frame. For example, the specific frame may be a frame previous to the current frame, i.e., the t-1 th frame.
In addition, at least one frame before the specific frame may be one or more frames from the 0 th frame to the t-2 th frame. As an example, to make the result of the pose determination more accurate, the at least one frame preceding the specific frame may be each of the 0 th frame to the t-2 th frame.
Further, for a certain feature in a specific frame, its corresponding features in at least one frame preceding the specific frame represent features determined to correspond to each other by motion estimation processing that has been performed previously. Hereinafter, for convenience of description, features that have been determined to correspond to each other between two frames may sometimes be regarded as the same feature.
Next, a case will be described as an example where the specific frame is the t-1 th frame and at least one frame preceding the specific frame is each of the 0 th frame to the t-2 th frame. However, as described above, those skilled in the art will appreciate that the method of the embodiments of the present invention is not limited thereto.
Thus, in step S3, the correspondence between the feature in the t-th frame and the feature in the t-1-th frame is determined based on the features in the 0-th to t-1-th frames and the motion parameter information between the 0-th to t-1-th frames.
Specifically, fig. 3 shows a specific process of determining the correspondence between the feature in the current frame and the feature in the specific frame in step S3.
As shown in fig. 3, first, in step S31, a feature model of each feature in the specific frame is acquired.
Specifically, a feature model for each feature is formed based on the feature in the particular frame and its corresponding feature in at least one frame preceding the particular frame. Illustratively, a feature model for each feature in the t-1 th frame may be formed based on the feature in the t-1 th frame and its corresponding features in the 0 th to t-2 th frames.
More specifically, as described above, the feature may include at least one of location information and description information. Illustratively, the feature model may be formed by description information of the features.
For example, assume that M features remain after motion estimation of the (t-1)-th frame ends, where M is a natural number. Thus, the feature model of each feature j in the (t-1)-th frame can be expressed as $F(f^{t-1}_{j}, f^{t-2}_{j}, \ldots, f^{0}_{j})$, where $f^{t-1}_{j}, f^{t-2}_{j}, \ldots, f^{0}_{j}$ denote the feature j in the (t-1)-th frame and its corresponding features in the (t-2)-th to 0-th frames, and $F(\cdot)$ represents a feature model function whose concrete form differs depending on the detected scene and is not particularly limited.
It is noted that the feature model of a certain feature in the (t-1)-th frame can also be understood as the description information expected for the feature in the t-th frame that corresponds to it. In other words, the expected description information $\hat{d}^{t}_{j}$ of the feature in the t-th frame corresponding to feature j can be represented by the following expression (1):

$\hat{d}^{t}_{j} = F(d^{t-1}_{j}, d^{t-2}_{j}, \ldots, d^{0}_{j})$    (1)
in addition, it should be noted that, although the feature model is described above by taking the description information as an example, those skilled in the art can understand that the method of the embodiment of the present invention is not limited thereto, but the feature model may be formed by other information (such as position information) included in the feature, and will not be described in detail herein.
On the other hand, in step S32, a historical posture model of the object is acquired.
The historical pose model of the object is formed based on motion parameter information between the particular frame and at least one frame preceding the particular frame.
Illustratively, a historical pose model of the object may be formed based on motion parameter information between frame 0 and frame t-1. The motion parameter information may include a motion parameter of the object in each dimension of a three-dimensional space. For example, the motion parameter information may include translation parameters, rotation parameters, and the like. Illustratively, the historical pose model of the object is formed by the location information of the features.
Thus, the historical pose model of the object after the (t-1)-th frame may be represented as $g(P_{t-1,t-2}, \ldots, P_{2,1}, P_{1,0})$, where $P_{t-1,t-2}$ represents the motion parameter information between the (t-2)-th frame and the (t-1)-th frame, $P_{2,1}$ represents the motion parameter information between frame 1 and frame 2, $P_{1,0}$ represents the motion parameter information between frame 0 and frame 1, and so on. The motion parameter information may be derived based on the position information of the features. $g(\cdot)$ represents a pose model function, whose concrete form differs depending on the detected scene and is not particularly limited.
It should be noted that, although step S32 is shown in fig. 3 as being subsequent to step S31, step S31 and step S32 may be executed in any order (e.g., in parallel or in reverse).
Then, in step S33, a current pose model of the object, denoted $\hat{P}_{t,t-1}$, is predicted based on the historical pose model, where $\hat{P}_{t,t-1}$ represents the prediction of the pose model (motion parameters) between frame t-1 and frame t.
More specifically, in the first example, the current pose model may be predicted by a regression method. In a second example, the current pose model may be predicted by a maximum likelihood approach. In a third example, the current pose model may be predicted by a maximum a posteriori probability method. Of course, the foregoing is merely exemplary. Other suitable methods may be employed by those skilled in the art to predict the current pose model of the object in light of the teachings herein.
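As one hedged illustration of the regression option, the sketch below predicts the motion between frame t-1 and frame t from the history of relative motions with a simple constant-velocity assumption; representing each motion parameter set as a 4x4 homogeneous transform is an assumption made only for this example.

```python
import numpy as np

def predict_current_pose_model(relative_poses):
    """relative_poses: list of 4x4 homogeneous transforms
    [P_{1,0}, P_{2,1}, ..., P_{t-1,t-2}] from the historical pose model.
    Returns a prediction of P_{t,t-1} (the current pose model)."""
    if not relative_poses:
        return np.eye(4)              # no history: predict no motion
    # Constant-velocity assumption: the next relative motion repeats the last one.
    return relative_poses[-1].copy()
```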
Further, those skilled in the art will appreciate that the current pose model of the object as contemplated herein is a model in three-dimensional space. When a calculation of a two-dimensional image plane is involved, a corresponding two-dimensional model can be obtained by means of a transformation function.
Specifically, for example, the position information $\hat{x}^{t}_{j}$ of the feature in the t-th frame corresponding to feature j can be predicted by the following expression (2):

$\hat{x}^{t}_{j} = h(x^{t-1}_{j}, \hat{P}_{t,t-1})$    (2)

where, as described above, $\hat{P}_{t,t-1}$ is the predicted current pose model of the object in three-dimensional space, in other words, the predicted motion parameter information between the (t-1)-th frame and the t-th frame; $x^{t-1}_{j}$ is the position information of the j-th feature in the (t-1)-th frame in two-dimensional space (the image plane); and $h(\cdot)$ is a transformation function that converts the two-dimensional position information $x^{t-1}_{j}$ of the j-th feature in the (t-1)-th frame into the corresponding three-dimensional position information, then, based on that three-dimensional position information and the predicted motion parameter information $\hat{P}_{t,t-1}$ between the (t-1)-th frame and the t-th frame, predicts the three-dimensional position information of the corresponding feature in the t-th frame, and finally inversely converts the predicted three-dimensional position information into the two-dimensional position information $\hat{x}^{t}_{j}$.
. The transformation and inverse transformation of the position coordinates between the two-dimensional space and the three-dimensional space are known to those skilled in the art and will not be described in detail herein.
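A hedged sketch of the transformation in expression (2): the 2D position in frame t-1 is lifted to 3D (this requires a depth value, assumed here to be available, e.g., from a stereo setup), the predicted motion is applied, and the result is re-projected into frame t. The pinhole intrinsic matrix K and all names are illustrative assumptions.

```python
import numpy as np

def predict_feature_position(x_prev, depth, K, P_pred):
    """Sketch of expression (2).
    x_prev : (u, v) position of feature j in frame t-1 (image plane)
    depth  : depth of that feature (assumed known, e.g., from stereo)
    K      : 3x3 camera intrinsic matrix
    P_pred : predicted 4x4 motion between frame t-1 and frame t
    Returns the predicted (u, v) position of the corresponding feature in frame t."""
    u, v = x_prev
    # 2D -> 3D in the camera frame of frame t-1
    p3d = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    # Apply the predicted motion to move the point into the camera frame of frame t
    p3d_t = (P_pred @ np.append(p3d, 1.0))[:3]
    # 3D -> 2D re-projection into frame t
    proj = K @ p3d_t
    return proj[:2] / proj[2]
```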
Furthermore, it should be noted that, although the historical or current posture model is described above by taking the position information as an example, a person skilled in the art can understand that the method of the embodiment of the present invention is not limited thereto, but the historical or current posture model may be formed by other information (such as description information) included in the feature, and will not be described in detail herein.
Next, in step S34, the correspondence relationship is determined based on the feature model and the predicted current posture model.
The process for determining the correspondence relationship will be further described below with reference to fig. 4.
Fig. 4 is a flowchart illustrating the steps of a detailed process of determining a correspondence between a feature in a current frame and a feature in the specific frame in a method according to an embodiment of the present invention.
As shown in fig. 4, first, in step S341, a first matching degree between each feature in the current frame and the feature model of each feature in the specific frame is calculated.
Specifically, a first matching degree $s^{1}_{i,j}$ between feature i in the t-th frame and the feature model of feature j in the (t-1)-th frame can be calculated by the following expression (3):

$s^{1}_{i,j} = w_{1}(d^{t}_{i}, \hat{d}^{t}_{j})$    (3)

where $d^{t}_{i}$ represents the description information obtained by detection for the i-th feature in the t-th frame; $\hat{d}^{t}_{j}$ represents the predicted description information of the feature in the t-th frame corresponding to feature j in the (t-1)-th frame, that is, the feature model described above; and $w_{1}(\cdot)$ is a calculation function of the first matching degree, which can be appropriately designed by those skilled in the art as needed and is not particularly limited here. Illustratively, the closer $d^{t}_{i}$ and $\hat{d}^{t}_{j}$ are, that is, the better a feature in the current frame fits the feature model, the greater the calculated first matching degree; the further apart they are, that is, the worse the fit, the smaller the calculated first matching degree.
Then, in step S342, a second degree of matching between each feature in the current frame and the predicted current pose model is calculated.
In particular, a second matching degree $s^{2}_{i,j}$ between feature i in the t-th frame and the predicted current pose model can be calculated by the following expression (4):

$s^{2}_{i,j} = w_{2}(x^{t}_{i}, \hat{x}^{t}_{j})$    (4)

where $\hat{x}^{t}_{j}$ represents the predicted position information of the feature in the t-th frame corresponding to feature j, which, as described above, represents the predicted current pose model; $x^{t}_{i}$ represents the position information of the i-th feature in the t-th frame; and $w_{2}(\cdot)$ is a calculation function of the second matching degree, which can be appropriately designed by those skilled in the art as needed and is not particularly limited here.
Illustratively, the closer $x^{t}_{i}$ and $\hat{x}^{t}_{j}$ are, that is, the better the position information of a feature in the current frame fits the predicted current pose model, the greater the calculated second matching degree; the further apart they are, that is, the worse the fit, the smaller the calculated second matching degree.
After the first matching degree is obtained through step S341 and the second matching degree is obtained through step S342, the comprehensive matching degree of each feature in the current frame is calculated based on the first matching degree and the second matching degree in step S343.
Illustratively, the larger the first matching degree and/or the second matching degree, the larger the calculated comprehensive matching degree; the smaller the first matching degree and the second matching degree, the smaller the calculated comprehensive matching degree. Specifically, in a first example, the comprehensive matching degree may be calculated by summing the first matching degree and the second matching degree. In a second example, the comprehensive matching degree may be calculated as a weighted sum of the first matching degree and the second matching degree. Of course, the above description is only an example, and those skilled in the art can design various other calculation methods of the comprehensive matching degree based on the above description, which are not limited in detail here.
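A minimal sketch of one possible choice of the functions $w_{1}(\cdot)$ and $w_{2}(\cdot)$ and of the combination step; the Gaussian-like scores and the weighted-sum combination are assumptions for illustration, since the embodiment leaves these functions open.

```python
import numpy as np

def first_matching_degree(d_detected, d_model, sigma_d=0.5):
    """w1: larger when the detected description information is closer to the feature model."""
    diff = np.linalg.norm(np.asarray(d_detected, float) - np.asarray(d_model, float))
    return float(np.exp(-diff ** 2 / (2.0 * sigma_d ** 2)))

def second_matching_degree(x_detected, x_predicted, sigma_x=8.0):
    """w2: larger when the detected position is closer to the position predicted
    from the current pose model (distance in pixels)."""
    diff = np.linalg.norm(np.asarray(x_detected, float) - np.asarray(x_predicted, float))
    return float(np.exp(-diff ** 2 / (2.0 * sigma_x ** 2)))

def comprehensive_matching_degree(s1, s2, lam=0.5):
    """Combine the first and second matching degrees (here a weighted sum)."""
    return lam * s1 + (1.0 - lam) * s2
```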
Then, in step S344, based on the integrated matching degree, the correspondence between the features in the current frame and the features in the specific frame is determined.
An exemplary manner of determining the correspondence relationship will be described below with reference to fig. 5.
Fig. 5 is a schematic diagram illustrating the determination of the correspondence between a feature in the current frame and a feature in the specific frame in the method according to an embodiment of the present invention.
As shown in FIG. 5, $X_1$, $X_2$ and $X_3$ respectively represent features in the (t-1)-th frame, $Y_1$, $Y_2$ and $Y_3$ respectively represent features in the t-th frame, and $w$ represents the comprehensive matching degree between a feature in the (t-1)-th frame and a feature in the t-th frame. For example, $w_{23}$ represents the comprehensive matching degree between $X_2$ and $Y_3$; and so on.
That is, the comprehensive matching degree may be calculated for all combinations between each feature in the t-th frame and each feature in the (t-1)-th frame. Then, an overall comprehensive matching degree can be calculated for all features in the t-th frame under different selected combinations. For example, when $(X_1, Y_1)$, $(X_2, Y_2)$ and $(X_3, Y_3)$ are selected as the first candidate correspondence, the first overall comprehensive matching degree is calculated as $w_{11} + w_{22} + w_{33}$. When $(X_1, Y_1)$, $(X_2, Y_3)$ and $(X_3, Y_2)$ are selected as the second candidate correspondence, the second overall comprehensive matching degree is calculated as $w_{11} + w_{23} + w_{32}$; and so on. After the overall comprehensive matching degrees of all candidate correspondences are calculated, the maximum overall comprehensive matching degree is selected from among them, thereby obtaining the correspondence.
Of course, the above-described manner of determining the correspondence is merely an example. Those skilled in the art can determine the correspondence between the features in the current frame and the features in the specific frame through various other dynamic programming approaches. For example, still taking FIG. 5 as an example, for one of the features in the (t-1)-th frame (e.g., $X_1$) the feature with the maximum comprehensive matching degree (e.g., $Y_3$) may be selected first; then, from the remaining combinations, the feature with the maximum comprehensive matching degree (e.g., $Y_2$) is selected for the next feature (e.g., $X_2$); and so on, thereby obtaining the correspondence $(X_1, Y_3)$, $(X_2, Y_2)$ and $(X_3, Y_1)$.
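Selecting the candidate with the maximum overall comprehensive matching degree can be treated as a linear assignment problem. The sketch below uses SciPy's Hungarian-algorithm solver on the matrix of comprehensive matching degrees; this solver choice is an illustrative assumption, one of the several strategies the text allows.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def determine_correspondence(w):
    """w[j, i] is the comprehensive matching degree between feature j in the
    (t-1)-th frame and feature i in the t-th frame. Returns the index pairs
    (j, i) that maximize the overall comprehensive matching degree."""
    w = np.asarray(w, dtype=float)
    rows, cols = linear_sum_assignment(-w)   # negate to maximize the total degree
    return list(zip(rows.tolist(), cols.tolist()))
```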
The process of determining the correspondence in step S3 of FIG. 2 has been described in detail above with reference to FIGS. 3 to 5.
It should be noted that the processing manner of determining the correspondence relationship by the feature model and the pose model is described above. However, the method of the embodiments of the present invention is not limited thereto. The person skilled in the art can determine the correspondence by other suitable means on the basis of this. For example, the motion parameter information may be obtained directly based on the position information of the features of the 0 th frame to the t-1 th frame, and the corresponding relationship between the features in the t-th frame and the features in the t-1 th frame may be determined based on the description information of the 0 th frame to the t-1 th frame and the motion parameter information.
Further, it is to be noted that, as those skilled in the art can understand, for the image frame sequence of the 0-th frame to the t-th frame as described above, when t = 1 the correspondence is determined based only on the features of the 0-th frame.
Next, returning to FIG. 2, the method for determining the pose of an object according to an embodiment of the present invention will be described. After determining the correspondence, the method proceeds to step S4. In step S4, motion estimation is performed based on the correspondence, the image of the current frame, and the image of the specific frame.
Specifically, those skilled in the art may use various known and future developed motion estimation algorithms, such as a 3D-2D algorithm, etc., to perform motion estimation based on the correspondence, the image of the current frame, and the image of the specific frame, which is not limited in detail herein.
Further, by the determination of the correspondence as described above, the features in the current frame can be classified into a first type of feature corresponding to the features in the specific frame and a second type of feature not corresponding to any of the features in the specific frame. The first type of feature may be colloquially referred to as old features, corresponding, for example, to a situation in which the object captures images with similar framing. The second type of feature may be colloquially referred to as new features, i.e., features newly appearing in the current frame, corresponding, for example, to a situation in which the object captures images with widely different framing, e.g., the object moves from photographing a building to photographing an open space.
Therefore, in the motion estimation in step S4, the second type of feature may not be considered, and the motion estimation result may be obtained by performing motion estimation based on the first type of feature, the image of the current frame, and the image of the specific frame.
Thus, in step S5, based on the result of the motion estimation, the posture of the object is determined. Specifically, those skilled in the art may employ various algorithms, known and developed in the future, to determine the pose of the object based on the result of the motion estimation, which will not be described in detail herein.
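A hedged sketch of a 3D-2D motion estimation on the first-class features and of the subsequent pose update. It assumes the 3D positions of those features are known in the coordinate frame of the specific frame (e.g., from triangulation), assumes a calibrated camera, and uses OpenCV's RANSAC PnP solver; these choices and all names are illustrative, not mandated by the embodiment.

```python
import cv2
import numpy as np

def estimate_motion_and_pose(points3d_prev, points2d_curr, K, pose_prev):
    """points3d_prev: Nx3 3D positions of first-class features in the specific
    (t-1) frame's coordinate system; points2d_curr: Nx2 detected positions of
    the corresponding features in the current frame; K: camera intrinsics;
    pose_prev: 4x4 pose of the object at the specific frame."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points3d_prev, dtype=np.float32),
        np.asarray(points2d_curr, dtype=np.float32),
        K, None)
    if not ok:
        return None, pose_prev, None
    R, _ = cv2.Rodrigues(rvec)
    P = np.eye(4)                      # estimated motion between the two frames
    P[:3, :3], P[:3, 3] = R, tvec.ravel()
    # One common convention: chain the inverse relative motion onto the previous pose.
    pose_curr = pose_prev @ np.linalg.inv(P)
    return P, pose_curr, inliers
```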
Further, optionally, after obtaining the result of the motion estimation, the method of the embodiment of the present invention may further update either or both of the feature model and the pose model to prepare for processing of the next frame.
Specifically, for example, the result of motion estimation includes motion parameter information between the current frame and the specific frame. Thus, in one aspect, with respect to the updating of the pose model, the historical pose model may be updated based on motion parameter information between the current frame and the particular frame. In other words, the current posture model of the object may be established based on the motion parameter information between the current frame and the specific frame, and the historical posture model, and the specific process thereof is similar to the process described above in step S32 and will not be repeated here.
On the other hand, regarding the updating of the feature model, first, the first-class features may be subdivided into first-class sub-features and second-class sub-features based on the motion parameter information between the current frame and the specific frame. A first-class sub-feature is one of the old features described above that is consistent with the motion parameter information between the current frame and the specific frame; it is also referred to as an inlier feature, meaning a feature for which the prediction based on the history frames is consistent with the feature actually detected in the current frame. A second-class sub-feature is one of the old features that is inconsistent with the motion parameter information between the current frame and the specific frame; it is also referred to as an outlier feature, meaning a feature for which the prediction based on the history frames is inconsistent with the feature actually detected in the current frame.
Next, on the one hand, for the inlier features, their feature models may be updated with the motion parameter information between the current frame and the specific frame; the specific process is similar to the process described above in step S31 and will not be repeated here. On the other hand, for the outlier features, their feature models may be discarded.
Further, corresponding to a case where a subject takes images with widely different framing, there may be a feature in a specific frame that fails to correspond to a feature in the current frame, that is, a feature that disappears in the current frame. Therefore, feature models of such features in a particular frame may also be discarded.
Further, for a feature newly appearing in the current frame as described above, a feature model of the feature may be initialized, and the specific processing thereof is similar to that described above in step S31 and will not be repeated here.
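A minimal sketch of the update step just described, assuming the motion estimation returns an inlier mask over the first-class features: inlier feature models are updated, outlier models are discarded, and models for newly appearing features are initialized. The dictionary-based bookkeeping and the update function are assumptions for illustration (see the feature-model sketch earlier).

```python
def update_models(feature_models, matched, inlier_mask, new_features, update_fn):
    """feature_models: dict feature_id -> feature model; matched: list of
    (feature_id, detected_descriptor) for first-class features; inlier_mask:
    booleans aligned with `matched`; new_features: list of
    (new_feature_id, descriptor) for features newly appearing in the current frame."""
    for (fid, desc), is_inlier in zip(matched, inlier_mask):
        if is_inlier:
            feature_models[fid] = update_fn(feature_models.get(fid), desc)  # update inliers
        else:
            feature_models.pop(fid, None)                                   # discard outliers
    for fid, desc in new_features:
        feature_models[fid] = update_fn(None, desc)                         # initialize new features
    return feature_models
```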
The method for determining the pose of an object of an embodiment of the present invention is described above with reference to fig. 1-5. In the object posture determining method according to the embodiment of the present invention, since the correspondence between the features is determined by using the image information of at least two frames (for example, the 0 th frame to the t-1 th frame) before the current frame (for example, the t-th frame) and motion estimation is performed, the precision of motion estimation can be significantly improved, so that the precision of determining the posture of the object can be significantly improved, and the accumulated error can be reduced.
It is to be noted that, although the method for determining the posture of the object of the embodiment of the present invention is described above, the steps in the method may be modified, combined, changed, added, or deleted as appropriate depending on the application. For example, when only motion estimation is required, the method of the embodiment of the present invention may omit step S5. That is, an embodiment of the present invention may provide a method for motion estimation, including: acquiring a sequence of image frames acquired by an object during motion; detecting features of a current frame in a sequence of image frames; determining a corresponding relation between the features in the current frame and the features in the specific frame based on each feature in the specific frame before the current frame and the corresponding feature in at least one frame before the specific frame and the motion parameter information between the specific frame and the at least one frame before the specific frame; and performing motion estimation based on the corresponding relation, the image of the current frame and the image of the specific frame to obtain a result of motion estimation.
Next, an apparatus for determining the pose of an object according to an embodiment of the present invention will be described with reference to fig. 6.
Fig. 6 is a block diagram illustrating a main configuration of an apparatus for determining a posture of an object according to an embodiment of the present invention.
As shown in fig. 6, an object posture determining apparatus 600 of an embodiment of the present invention includes: an image acquisition unit 610, a feature detection unit 620, a relationship determination unit 630, a motion estimation unit 640, and a pose determination unit 650.
The image acquisition unit 610 acquires a sequence of image frames acquired by the subject during motion.
The feature detection unit 620 detects a feature of a current frame in a sequence of image frames.
The relationship determining unit 630 determines the corresponding relationship between the features in the current frame and the features in the specific frame based on each feature in the specific frame before the current frame and its corresponding feature in at least one frame before the specific frame, and the motion parameter information between the specific frame and the at least one frame before the specific frame.
The motion estimation unit 640 performs motion estimation based on the correspondence, the image of the current frame, and the image of the specific frame.
The pose determination unit 650 determines the pose of the object based on the result of the motion estimation.
Next, an exemplary configuration of the relationship determination unit 630 in an embodiment will be described in detail with reference to fig. 7.
Fig. 7 is a block diagram illustrating a detailed configuration of a relationship determination unit in the apparatus for determining the posture of an object shown in fig. 6.
As shown in fig. 7, the relationship determination unit 630 includes: a feature model acquisition unit 6310, an attitude model acquisition unit 6320, an attitude model prediction unit 6330, and a correspondence relationship determination unit 6340.
Specifically, the feature model acquisition unit 6310 acquires a feature model of each feature in the specific frame. A feature model for each feature is formed based on the feature in the particular frame and its corresponding feature in at least one frame preceding the particular frame.
The posture model acquisition unit 6320 acquires a history posture model of the subject. The historical pose model of the object is formed based on motion parameter information between the particular frame and at least one frame preceding the particular frame.
The posture model prediction unit 6330 predicts a current posture model of the object based on the historical posture model.
The correspondence determination unit 6340 determines the correspondence based on the feature model and the historical posture model.
Next, a detailed configuration of the correspondence relation determining unit 6340 in an embodiment will be described with reference to fig. 8.
Fig. 8 is a block diagram illustrating a detailed configuration of a correspondence relationship determination unit in the relationship determination unit illustrated in fig. 7.
As shown in fig. 8, the correspondence determining unit 6340 includes: a first matching degree calculation unit 6340A, a second matching degree calculation unit 6340B, a comprehensive matching degree calculation unit 6340C, and a feature correspondence determination unit 6340D.
The first matching degree calculation unit 6340A calculates a first matching degree between each feature in the current frame and the feature model of each feature in the specific frame. The second matching degree calculation unit 6340B calculates a second matching degree between each feature in the current frame and the predicted current posture model. The integrated matching degree calculation unit 6340C calculates an integrated matching degree of each feature in the current frame based on the first matching degree and the second matching degree. The feature correspondence determination unit 6340D determines the correspondence between the features in the current frame and the features in the specific frame based on the integrated matching degree.
In another embodiment, the correspondence relation determining unit is configured to: the features in the current frame are divided into a first class of features corresponding to the features in the specific frame and a second class of features not corresponding to any feature in the specific frame, and the first class of features and the second class of features are used as the corresponding relation. Accordingly, the motion estimation unit is configured to: and performing motion estimation based on the first class of features, the image of the current frame and the image of the specific frame.
In a further embodiment, the result of the motion estimation comprises motion parameter information between the current frame and the particular frame. The apparatus further comprises at least one of: an attitude model updating unit that establishes a current attitude model of the object based on motion parameter information between the current frame and the specific frame and the historical attitude model; and a feature model updating unit that, based on the motion parameter information between the current frame and the specific frame, subdivides the first-class features into first-class sub-features consistent with the motion parameter information between the current frame and the specific frame and second-class sub-features inconsistent with the motion parameter information between the current frame and the specific frame, updates the feature model of the first-class sub-features with the motion parameter information between the current frame and the specific frame, and discards the feature model of the second-class sub-features.
The detailed configuration and operation of each unit of the apparatus 600 for determining the posture of the object according to the embodiment of the present invention have been described in detail in the object posture determining method with reference to fig. 1 to 5, and will not be repeated here.
It is to be noted that, although the apparatus for determining the posture of the object of the embodiment of the present invention is described above, the units in the apparatus may be modified, combined, changed, added, or deleted as appropriate depending on the application. For example, the method of embodiments of the present invention may omit the pose determination module when only motion estimation is required. That is, an embodiment of the present invention may provide an apparatus for motion estimation, including: an image acquisition unit that acquires a sequence of image frames acquired by an object during motion; a feature detection unit that detects a feature of a current frame in the image frame sequence; the relation determining unit is used for determining the corresponding relation between the features in the current frame and the features in the specific frame based on each feature in the specific frame before the current frame and the corresponding feature in at least one frame before the specific frame and the motion parameter information between the specific frame and the at least one frame before the specific frame; and a motion estimation unit which performs motion estimation based on the correspondence, the image of the current frame and the image of the specific frame to obtain a motion estimation result.
Next, an object posture determining apparatus of an embodiment of the present invention will be described with reference to fig. 9.
Fig. 9 is a block diagram illustrating a main hardware configuration of an object posture determining apparatus according to an embodiment of the present invention.
As shown in fig. 9, the object posture determining apparatus 900 according to the embodiment of the present invention mainly includes: one or more processors 910, memory 920, image acquisition module 940, and output module 950, which are interconnected via a bus system 930 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the object pose determination apparatus 900 shown in fig. 9 are merely exemplary and not limiting, and the object pose determination apparatus 900 may have other components and structures as necessary.
The image acquisition module 940 may acquire a sequence of image frames during motion of the object. Illustratively, the image acquisition module 940 may be constituted by various image pickup elements such as a still camera, a motion camera, and the like. The output module 950 may be used to output the results of the object pose determination. Illustratively, the output module 950 may be an image output module such as a display, or a voice output module such as a speaker.
Processor 910 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in object pose determination apparatus 900 to perform desired functions.
Memory 920 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by processor 910 to implement the functions of the object pose determination methods of embodiments of the present invention and/or other desired functions. Further, for example, as shown in fig. 9, the processor 910 may call the respective units described above with reference to fig. 6, that is, the image acquisition unit 610, the feature detection unit 620, the relationship determination unit 630, the motion estimation unit 640, and the posture determination unit 650, by executing program instructions to implement the corresponding functions.
Illustratively, the processor 910 may execute the program instructions to perform the following processes: acquiring the image frame sequence from an image acquisition module; detecting features of a current frame in a sequence of image frames; determining a corresponding relation between the features in the current frame and the features in the specific frame based on each feature in the specific frame before the current frame and the corresponding feature in at least one frame before the specific frame and the motion parameter information between the specific frame and the at least one frame before the specific frame; performing motion estimation based on the corresponding relation, the image of the current frame and the image of the specific frame; and determining the pose of the object based on the result of the motion estimation.
It should be noted that, although the above describes the apparatus for determining the posture of the object by means of processor call with reference to fig. 9, it can be understood by those skilled in the art that this is only an example. The apparatus for determining the posture of the object according to the embodiment of the present invention may also be implemented by other hardware circuits, such as an embedded system, and will not be described in detail herein.
The method and apparatus for determining the pose of an object according to an embodiment of the present invention are described above with reference to fig. 1 to 9.
In the method and the device for determining the posture of the object, which are provided by the embodiment of the invention, the corresponding relation between the characteristics is determined by utilizing the image information of at least two frames before the current frame and motion estimation is carried out, so that the precision of the motion estimation can be obviously improved, the precision of determining the posture of the object can be obviously improved, and the accumulated error is reduced.
It should be noted that, in the present specification, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Furthermore, it should be noted that in this specification, expressions such as "first … unit" and "second … unit" are used merely to distinguish units from one another and do not limit their implementation. In fact, a unit may be integrally implemented as one unit or may be implemented as a plurality of units, as necessary.
Finally, it should be noted that the series of processes described above includes not only processes performed in time series in the order described herein, but also processes performed in parallel or individually, rather than in time series.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus a necessary hardware platform, and may also be implemented by hardware entirely. With this understanding in mind, all or part of the technical solutions of the present invention that contribute to the background can be embodied in the form of a software product, which can be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes instructions for causing a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments or some parts of the embodiments of the present invention.
In embodiments of the present invention, the units/modules may be implemented in software for execution by various types of processors. An identified module of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions which may, for instance, be constructed as an object, procedure, or function. Nevertheless, the executables of an identified module need not be physically located together, but may comprise disparate instructions stored in different bits which, when joined logically together, comprise the unit/module and achieve the stated purpose for the unit/module.
Where a unit/module can be implemented by software, then, considering the level of existing hardware technology, those skilled in the art could also, cost aside, build corresponding hardware circuits to implement the corresponding functions. Such hardware circuits include conventional Very Large Scale Integration (VLSI) circuits or gate arrays and existing semiconductors such as logic chips, transistors, or other discrete components. A module may also be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices, or the like.
The present invention has been described in detail, and the principle and embodiments of the present invention are explained herein by using specific examples, which are only used to help understand the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A method for determining a pose of an object used to acquire an image, comprising:
acquiring a sequence of image frames of an object acquired by the object during motion;
detecting features of a current frame in a sequence of image frames;
determining a corresponding relation between the features in the current frame and the features in the specific frame based on each feature in the specific frame before the current frame and the corresponding feature in at least one frame before the specific frame and the motion parameter information between the specific frame and the at least one frame before the specific frame;
performing motion estimation based on the corresponding relation, the image of the current frame and the image of the specific frame; and
based on the result of the motion estimation, the pose of the object is determined.
2. The method of claim 1, wherein the step of determining the correspondence between the features in the current frame and the features in the particular frame comprises:
obtaining a feature model of each feature in the specific frame, wherein the feature model of each feature is formed based on the feature in the specific frame and a corresponding feature in at least one frame before the specific frame;
acquiring a historical posture model of an object, wherein the historical posture model of the object is formed on the basis of motion parameter information between the specific frame and at least one frame before the specific frame;
predicting a current attitude model of the object based on the historical attitude model;
and determining the corresponding relation based on the characteristic model and the predicted current attitude model.
3. The method of claim 2, wherein
the step of determining the correspondence between the features in the current frame and the features in the specific frame comprises:
dividing the features in the current frame into a first class of features that correspond to features in the specific frame and a second class of features that do not correspond to any feature in the specific frame, and taking the first class of features and the second class of features as the correspondence; and
the step of performing motion estimation comprises:
performing motion estimation based on the first class of features, the image of the current frame, and the image of the specific frame.
4. The method of claim 3, wherein the result of the motion estimation comprises motion parameter information between the current frame and the specific frame, the method further comprising at least one of:
establishing a current pose model of the object based on the motion parameter information between the current frame and the specific frame and the historical pose model; and
subdividing, based on the motion parameter information between the current frame and the specific frame, the first class of features into first-class sub-features consistent with that motion parameter information and second-class sub-features inconsistent with it, updating the feature models of the first-class sub-features with the motion parameter information between the current frame and the specific frame, and discarding the feature models of the second-class sub-features.
5. The method of claim 2, wherein the step of determining the correspondence between the features in the current frame and the features in the specific frame comprises:
calculating a first matching degree between each feature in the current frame and the feature model of each feature in the specific frame;
calculating a second matching degree between each feature in the current frame and the predicted current pose model;
calculating a comprehensive matching degree of each feature in the current frame based on the first matching degree and the second matching degree; and
determining the correspondence between the features in the current frame and the features in the specific frame based on the comprehensive matching degree.
6. An apparatus for determining a pose of an object used to acquire images, comprising:
an image acquisition unit that acquires a sequence of image frames captured by the object during its motion;
a feature detection unit that detects features of a current frame in the image frame sequence;
a relation determination unit that determines a correspondence between the features in the current frame and features in a specific frame preceding the current frame, based on each feature in the specific frame and its corresponding feature in at least one frame preceding the specific frame, and on motion parameter information between the specific frame and the at least one frame preceding the specific frame;
a motion estimation unit that performs motion estimation based on the correspondence, the image of the current frame, and the image of the specific frame; and
a pose determination unit that determines the pose of the object based on a result of the motion estimation.
7. The apparatus of claim 6, wherein the relation determination unit comprises:
a feature model acquisition unit that acquires a feature model of each feature in the specific frame, wherein the feature model of each feature is formed based on that feature in the specific frame and its corresponding feature in the at least one frame preceding the specific frame;
a pose model acquisition unit that acquires a historical pose model of the object, wherein the historical pose model is formed based on the motion parameter information between the specific frame and the at least one frame preceding the specific frame;
a pose model prediction unit that predicts a current pose model of the object based on the historical pose model; and
a correspondence determination unit that determines the correspondence based on the feature models and the predicted current pose model.
8. The apparatus of claim 7, wherein
the correspondence determination unit is configured to:
divide the features in the current frame into a first class of features that correspond to features in the specific frame and a second class of features that do not correspond to any feature in the specific frame, and take the first class of features and the second class of features as the correspondence; and
the motion estimation unit is configured to:
perform motion estimation based on the first class of features, the image of the current frame, and the image of the specific frame.
9. The apparatus of claim 8, wherein the result of the motion estimation comprises motion parameter information between the current frame and the specific frame, the apparatus further comprising at least one of:
a pose model updating unit that establishes a current pose model of the object based on the motion parameter information between the current frame and the specific frame and the historical pose model; and
a feature model updating unit that subdivides, based on the motion parameter information between the current frame and the specific frame, the first class of features into first-class sub-features consistent with that motion parameter information and second-class sub-features inconsistent with it, updates the feature models of the first-class sub-features with the motion parameter information between the current frame and the specific frame, and discards the feature models of the second-class sub-features.
10. The apparatus of claim 7, wherein the correspondence determination unit comprises:
a first matching degree calculation unit that calculates a first matching degree between each feature in the current frame and the feature model of each feature in the specific frame;
a second matching degree calculation unit that calculates a second matching degree between each feature in the current frame and the predicted current pose model;
a comprehensive matching degree calculation unit that calculates a comprehensive matching degree of each feature in the current frame based on the first matching degree and the second matching degree; and
a feature correspondence determination unit that determines the correspondence between the features in the current frame and the features in the specific frame based on the comprehensive matching degree.
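By way of a non-limiting illustration of the matching-degree combination recited in claims 5 and 10 above, the following hypothetical Python sketch assumes a feature model carrying an averaged descriptor together with a 3-D model point, and a predicted current pose model exposed as a projection function into the current image. The weighting, threshold, and all names are illustrative assumptions rather than values or structures taken from the specification.

# Illustrative sketch only; the weighting scheme and thresholds are assumptions.
import numpy as np


def comprehensive_matching(current_features, feature_models, predicted_pose,
                           weight=0.5, threshold=0.6):
    """Combine a descriptor-based first matching degree and a pose-prediction-based
    second matching degree into a comprehensive matching degree per feature.

    current_features: list of (descriptor, pixel_position) for the current frame.
    feature_models:   list of (model_descriptor, model_point_3d) for the specific frame.
    predicted_pose:   callable projecting a 3-D model point into the current image.
    """
    correspondence = []
    for i, (desc, pos) in enumerate(current_features):
        best_j, best_score = None, -1.0
        for j, (model_desc, model_point) in enumerate(feature_models):
            # First matching degree: similarity of the feature to the feature model.
            d1 = 1.0 / (1.0 + np.linalg.norm(desc - model_desc))
            # Second matching degree: agreement with the predicted current pose model,
            # measured by the reprojection distance of the model point.
            reproj = predicted_pose(model_point)
            d2 = 1.0 / (1.0 + np.linalg.norm(pos - reproj))
            # Comprehensive matching degree: a weighted combination (weights assumed).
            score = weight * d1 + (1.0 - weight) * d2
            if score > best_score:
                best_j, best_score = j, score
        # Features above the threshold form the first class (have a correspondence);
        # the rest form the second class (no corresponding feature in the specific frame).
        correspondence.append((i, best_j) if best_score >= threshold else (i, None))
    return correspondence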
CN201610329835.0A 2016-05-18 2016-05-18 Method and apparatus for determining a pose of an object Active CN107403440B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610329835.0A CN107403440B (en) 2016-05-18 2016-05-18 Method and apparatus for determining a pose of an object
JP2017080192A JP6361775B2 (en) 2016-05-18 2017-04-14 Method and apparatus for identifying target posture

Publications (2)

Publication Number Publication Date
CN107403440A CN107403440A (en) 2017-11-28
CN107403440B true CN107403440B (en) 2020-09-08

Family

ID=60394023

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610329835.0A Active CN107403440B (en) 2016-05-18 2016-05-18 Method and apparatus for determining a pose of an object

Country Status (2)

Country Link
JP (1) JP6361775B2 (en)
CN (1) CN107403440B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664122A (en) * 2018-04-04 2018-10-16 歌尔股份有限公司 A kind of attitude prediction method and apparatus
CN109118523B (en) * 2018-09-20 2022-04-22 电子科技大学 Image target tracking method based on YOLO
CN115235500B (en) * 2022-09-15 2023-04-14 北京智行者科技股份有限公司 Lane line constraint-based pose correction method and device and all-condition static environment modeling method and device
CN115797659B (en) * 2023-01-09 2023-05-02 思看科技(杭州)股份有限公司 Data splicing method, three-dimensional scanning system, electronic device and storage medium
CN116863124B (en) * 2023-09-04 2023-11-21 所托(山东)大数据服务有限责任公司 Vehicle attitude determination method, controller and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101393609A (en) * 2008-09-18 2009-03-25 北京中星微电子有限公司 Target detection tracking method and device
CN102117487A (en) * 2011-02-25 2011-07-06 南京大学 Scale-direction self-adaptive Mean-shift tracking method aiming at video moving object
CN103473757A (en) * 2012-06-08 2013-12-25 株式会社理光 Object tracking method in disparity map and system thereof
CN103646391A (en) * 2013-09-30 2014-03-19 浙江大学 Real-time camera tracking method for dynamically-changed scene
CN104424648A (en) * 2013-08-20 2015-03-18 株式会社理光 Object tracking method and device
CN105184803A (en) * 2015-09-30 2015-12-23 西安电子科技大学 Attitude measurement method and device

Also Published As

Publication number Publication date
JP6361775B2 (en) 2018-07-25
JP2017208080A (en) 2017-11-24
CN107403440A (en) 2017-11-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant