WO2010121568A1 - Training method and device, and method and device for estimating the pose viewing angle of an object in an image - Google Patents
Training method and device, and method and device for estimating the pose viewing angle of an object in an image
- Publication number
- WO2010121568A1 WO2010121568A1 PCT/CN2010/072150 CN2010072150W WO2010121568A1 WO 2010121568 A1 WO2010121568 A1 WO 2010121568A1 CN 2010072150 W CN2010072150 W CN 2010072150W WO 2010121568 A1 WO2010121568 A1 WO 2010121568A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- model
- view
- feature
- object pose
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 64
- 238000012549 training Methods 0.000 title abstract description 14
- 230000000007 visual effect Effects 0.000 title abstract 8
- 238000013507 mapping Methods 0.000 claims description 59
- 230000009466 transformation Effects 0.000 claims description 18
- 238000004364 calculation method Methods 0.000 claims description 14
- 230000009467 reduction Effects 0.000 claims description 11
- 239000000284 extract Substances 0.000 claims description 6
- 238000004458 analytical method Methods 0.000 abstract description 14
- 238000012417 linear regression Methods 0.000 abstract description 10
- 238000010586 diagram Methods 0.000 description 13
- 239000013598 vector Substances 0.000 description 13
- 239000011159 matrix material Substances 0.000 description 8
- 230000006870 function Effects 0.000 description 6
- 238000012821 model calculation Methods 0.000 description 6
- 238000012545 processing Methods 0.000 description 6
- 238000000605 extraction Methods 0.000 description 5
- 238000012847 principal component analysis method Methods 0.000 description 4
- 230000008569 process Effects 0.000 description 4
- 230000001131 transforming effect Effects 0.000 description 4
- 238000004891 communication Methods 0.000 description 3
- 238000000354 decomposition reaction Methods 0.000 description 2
- 238000000556 factor analysis Methods 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 239000004065 semiconductor Substances 0.000 description 2
- 241001465754 Metazoa Species 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000000611 regression analysis Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/77—Determining position or orientation of objects or cameras using statistical methods
Definitions
- the present invention relates to object pose estimation, and more particularly to a training method, apparatus, and method and apparatus for estimating an object pose perspective in an image.
- Methods for estimating the pose of an object (for example a person, an animal, or another object) in a single image can be technically classified as model-based or learning-based.
- a learning-based approach directly infers the three-dimensional pose of an object from image features.
- the image features most commonly used are object contour information.
- the present invention is directed to a method and apparatus for performing training based on input images, and a method and apparatus for estimating the object pose viewing angle in an image, to facilitate distinguishing the object pose viewing angle in object pose estimation.
- An embodiment of the present invention is a method for performing training based on input images, comprising: extracting image features from each input image of a plurality of input images having view-angle categories; for each of the plurality of view-angle categories, estimating, by linear regression analysis, a mapping model that converts the image features extracted from input images belonging to that view-angle category into the three-dimensional object pose information corresponding to those input images; and calculating a joint probability distribution model based on samples obtained by connecting the image features with the corresponding three-dimensional object pose information, wherein the single probability distribution models on which the joint probability distribution model is based correspond to the different view-angle categories, and each single probability distribution model is based on the samples containing image features extracted from input images of the corresponding view-angle category.
- Another embodiment of the present invention is an apparatus for training based on input images, comprising: an extracting unit that extracts image features from each input image of a plurality of input images having view-angle categories; a map estimating unit that, for each of the plurality of view-angle categories, estimates by linear regression analysis a mapping model that converts the image features extracted from input images belonging to that view-angle category into the three-dimensional object pose information corresponding to those input images; and a probability model calculation unit that calculates a joint probability distribution model based on samples obtained by connecting the image features with the corresponding three-dimensional object pose information, wherein the single probability distribution models on which the joint probability distribution model is based correspond to the different view-angle categories, and each single probability distribution model is based on the samples containing image features extracted from input images of the corresponding view-angle category.
- each input image has a respective view category.
- Image features can be extracted from each input image.
- the mapping model can be estimated by linear regression analysis. This mapping model acts as a function that converts the image features of the view-angle category into the corresponding three-dimensional object pose information.
- the image features can be linked to the corresponding three-dimensional object pose information to obtain samples, thereby calculating a joint probability distribution model based on the samples.
- the joint probability distribution model is based on several single probability distribution models, where each perspective category has a single probability distribution model.
- a corresponding single probability distribution model can be obtained based on the samples containing image features of the corresponding view-angle category. Therefore, the models for object pose view-angle estimation, that is, a mapping model for each pose view angle and a joint probability distribution model, can be trained by the embodiments of the present invention.
- a feature transformation model for reducing the dimensionality of the image features can be calculated by a dimensionality reduction method. Accordingly, the image features can be transformed using the feature transformation model for the estimation of the mapping models and the calculation of the joint probability distribution model.
- the transformed image features of the feature transformation model have lower dimensionality, which helps to reduce the amount of processing for subsequent estimation and calculation.
- Another embodiment of the present invention is a method for estimating an object pose viewing angle in an image, comprising: extracting an image feature from an input image; for each of a plurality of view-angle categories, obtaining corresponding three-dimensional object pose information of the image feature based on a mapping model, corresponding to that view-angle category, for mapping image features to three-dimensional object pose information; calculating, according to a joint probability distribution model based on single probability distribution models for the view-angle categories, a joint probability of the joint feature of each view-angle category containing the image feature and the corresponding three-dimensional object pose information; calculating, according to the joint probability, the conditional probability of the image feature under the condition of the corresponding three-dimensional object pose information; and estimating the view-angle category corresponding to the largest of the conditional probabilities as the object pose viewing angle in the input image.
- Another embodiment of the present invention is an apparatus for estimating an object pose viewing angle in an image, comprising: an extracting unit that extracts an image feature from an input image; a mapping unit that, for each of a plurality of view-angle categories, obtains corresponding three-dimensional object pose information of the image feature based on a mapping model, corresponding to that view-angle category, for mapping image features to three-dimensional object pose information; a probability calculation unit that, according to a joint probability distribution model based on single probability distribution models for the view-angle categories, calculates a joint probability of the joint feature of each view-angle category containing the image feature and the corresponding three-dimensional object pose information, and calculates, according to the joint probability, the conditional probability of the image feature under the condition of the corresponding three-dimensional object pose information; and an estimation unit that estimates the view-angle category corresponding to the largest of the conditional probabilities as the object pose viewing angle in the input image.
- image features can be extracted from the input image. Since each view-angle category has a corresponding mapping model for converting the image features of that category into three-dimensional object pose information, the image feature can be assumed in turn to belong to each view-angle category, and the corresponding three-dimensional object pose information can be obtained with the corresponding mapping model. According to the joint probability distribution model, the joint probability that the image feature and the corresponding three-dimensional object pose information occur together under each assumed view-angle category can be calculated. From this joint probability, the conditional probability of the image feature under the condition that the corresponding three-dimensional object pose information occurs can be calculated. The view-angle category hypothesis corresponding to the maximum conditional probability can then be estimated as the object pose viewing angle in the input image. Thus, embodiments of the present invention are capable of estimating the object pose viewing angle.
- the image feature can be transformed by the feature transformation model for dimensionality reduction for obtaining the three-dimensional object posture information.
- the image features transformed by the feature transformation model have a lower dimensionality, which helps to reduce the amount of processing for the subsequent mapping and probability calculation.
- existing object pose estimation methods do not distinguish the viewing angle of the object pose. Owing to the complexity of pose variation, different viewing angles of the object pose introduce great estimation ambiguity, so pose estimation over images of mixed viewing angles is far less accurate than pose estimation within a single viewing angle.
- the purpose of the present invention is to estimate the object angle of view in the image and video, thereby further estimating the object pose in a single perspective.
- experimental results show that the present invention can effectively estimate object poses in images and videos.
- FIG. 1 is a block diagram illustrating the structure of an apparatus for training based on input images, in accordance with one embodiment of the present invention.
- FIG. 2 is a schematic diagram showing the pattern in which blocks are extracted from an input image.
- FIG. 3 shows a flow chart of a method for training based on input images, in accordance with one embodiment of the present invention.
- Figure 4 is a block diagram showing the structure of an apparatus for training based on an input image in accordance with a preferred embodiment of the present invention.
- Figure 5 shows a flow chart of a method for training based on an input image in accordance with a preferred embodiment of the present invention.
- Figure 6 is a block diagram showing the structure of an apparatus for estimating an object pose angle in an image, in accordance with one embodiment of the present invention.
- Figure 7 illustrates a flow chart of a method for estimating an object pose perspective in an image, in accordance with one embodiment of the present invention.
- Figure 8 is a block diagram showing the structure of an apparatus for estimating the angle of view of an object in an image in accordance with a preferred embodiment of the present invention.
- Figure 9 illustrates a flow chart of a method for estimating an object pose perspective in an image in accordance with a preferred embodiment of the present invention.
- FIG. 10 is a block diagram showing an exemplary structure of a computer in which an embodiment of the present invention is implemented.
- Figure 1 is a block diagram showing the structure of an apparatus 100 for training based on an input image, in accordance with one embodiment of the present invention.
- the device 100 includes an extracting unit 101, a map estimating unit 102, and a probability model calculating unit 103.
- An input image is an image that contains objects with various pose view categories.
- Each pose perspective category represents the different perspectives taken by the object.
- the posture view-angle categories may include -80°, -40°, 0°, +40°, and +80°, where -80° indicates that the object is rotated 80 degrees to the right relative to the camera lens, -40° indicates that the object is rotated 40 degrees to the right relative to the camera lens, 0° indicates that the object directly faces the camera lens, and +40° indicates that the object is rotated 40 degrees to the left relative to the camera lens.
- +80° indicates that the object is rotated 80 degrees to the left relative to the camera lens.
- the pose view category can also represent the range of angles of view.
- for example, the 180° range of viewing angles around the object's frontal view, from the left profile to the right profile, is divided into 5 view-angle ranges: [-90°, -54°], [-54°, -18°], [-18°, 18°], [18°, 54°], [54°, 90°], that is, 5 posture view-angle categories.
- both the input image and the corresponding gesture view category are provided to device 100.
- the input image includes an object image of various posture angles of view without background and an object image of various posture angles of view containing the background.
- the extracting unit 101 extracts an image feature from each of the input images of the plurality of input images having the view type.
- the image features can be various features for object pose estimation.
- the image features are statistical features of the edge direction in the input image, such as gradient direction histogram HOG features and scale invariant feature transform SIFT features.
- in one specific example, a gradient direction histogram is assumed as the image feature, and the input images have a uniform width and height (120 pixels × 100 pixels).
- embodiments of the invention are not limited to the specific features and dimensions assumed.
- the extracting unit 101 can calculate the horizontal and vertical gradients of each pixel in the input image, that is,
horizontal gradient: I_x(x, y) = ∂I(x, y)/∂x = I(x + 1, y) − I(x − 1, y)
vertical gradient: I_y(x, y) = ∂I(x, y)/∂y = I(x, y + 1) − I(x, y − 1)
where I(x, y) represents the gray value of the pixel and x, y represent the coordinates of the pixel in the horizontal and vertical directions, respectively.
- the extracting unit 101 can then calculate the gradient direction and gradient magnitude of each pixel from its horizontal and vertical gradients, that is,
gradient direction: θ(x, y) = arctan(|I_y / I_x|)
gradient magnitude: Grad(x, y) = √(I_x² + I_y²)
where the gradient direction θ(x, y) lies in the range [0, π].
- in this example, the extracting unit 101 can take 24 blocks of size 32 × 32 from the input image, from left to right and from top to bottom, with 6 blocks per row in the horizontal direction and 4 blocks per column in the vertical direction. Any two blocks adjacent in the horizontal or vertical direction overlap by half.
- Figure 2 shows the pattern in which the blocks are extracted from the input image. Figure 2 shows three 32 × 32 blocks 201, 202 and 203: block 202 overlaps block 201 by 16 pixels in the vertical direction, and block 203 overlaps block 201 by 16 pixels in the horizontal direction.
- the extracting unit 101 can divide each 32 × 32 block into 16 small blocks of size 8 × 8, with 4 small blocks per row in the horizontal direction and 4 small blocks per column in the vertical direction.
- the small blocks are arranged in order, first horizontally and then vertically.
- for each 8 × 8 small block, the extracting unit 101 calculates a gradient direction histogram of its 64 pixels, where the gradient direction range from 0 to π is divided into 8 direction intervals, each of width π/8. That is, based on the 64 pixels of each 8 × 8 small block, for each of the 8 direction intervals, the sum of the gradient magnitudes of the pixels whose gradient direction falls in that interval is calculated, yielding an 8-dimensional vector. Accordingly, each 32 × 32 block yields a 128-dimensional vector.
- for each input image, the extracting unit 101 concatenates the vectors of all blocks in order to obtain the image feature, so the dimension of the image feature is 128 × 24 = 3072.
- embodiments of the present invention are not limited to the division modes and specific numbers of the blocks and the small blocks in the above examples, and other division modes and specific numbers may be employed.
- Embodiments of the present invention are not limited to the method of extracting features in the above examples, and other methods of extracting image features for object pose estimation may be used.
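To make the block-and-cell feature computation above concrete, the following is a minimal sketch in Python/NumPy of a gradient-orientation-histogram feature following the example layout (120 × 100 input, 24 overlapping 32 × 32 blocks, 16 cells of 8 × 8, 8 direction bins). Function and variable names are illustrative, not from the patent, and the unsigned orientation is folded into [0, π) as described in the text.

```python
import numpy as np

def extract_block_histogram_feature(img):
    """Gradient-orientation histogram over overlapping blocks (illustrative sketch).

    img: 100 x 120 (H x W) grayscale array. Returns a 3072-dimensional vector:
    24 blocks x 16 cells x 8 orientation bins.
    """
    img = img.astype(np.float64)
    ix = np.zeros_like(img)
    iy = np.zeros_like(img)
    ix[:, 1:-1] = img[:, 2:] - img[:, :-2]          # horizontal gradient I(x+1,y) - I(x-1,y)
    iy[1:-1, :] = img[2:, :] - img[:-2, :]          # vertical gradient I(x,y+1) - I(x,y-1)
    mag = np.hypot(ix, iy)                          # gradient magnitude
    ori = np.mod(np.arctan2(iy, ix), np.pi)         # unsigned direction in [0, pi)
    bins = np.minimum((ori / (np.pi / 8)).astype(int), 7)

    feats = []
    for by in range(4):                             # 4 block rows, stride 16 (half overlap)
        for bx in range(6):                         # 6 block columns, stride 16
            y0, x0 = by * 16, bx * 16
            b_bins = bins[y0:y0 + 32, x0:x0 + 32]
            b_mag = mag[y0:y0 + 32, x0:x0 + 32]
            for cy in range(4):                     # 16 cells of 8 x 8 per block
                for cx in range(4):
                    c_bins = b_bins[cy * 8:(cy + 1) * 8, cx * 8:(cx + 1) * 8].ravel()
                    c_mag = b_mag[cy * 8:(cy + 1) * 8, cx * 8:(cx + 1) * 8].ravel()
                    feats.append(np.bincount(c_bins, weights=c_mag, minlength=8))
    return np.concatenate(feats)                    # 24 * 16 * 8 = 3072 dimensions
```

Other feature extractors for object pose estimation (for example SIFT) could be substituted here, as noted above.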
- for each of the plurality of view-angle categories, the map estimation unit 102 estimates, by linear regression analysis, a mapping model that converts the image features extracted from input images belonging to that view-angle category into the three-dimensional object pose information corresponding to those input images.
- that is to say, for each posture view-angle category, there can be considered to exist a certain functional or mapping relationship by which the image features extracted from input images of that posture view-angle category can be converted or mapped into the corresponding three-dimensional object pose information of those input images.
- through linear regression analysis, such a function or mapping relationship, that is, a mapping model, can be estimated from the extracted image features and the corresponding three-dimensional object pose information. For each input image, three-dimensional object pose information corresponding to the pose of the object contained in that input image is prepared in advance.
- an image feature (feature vector) extracted from an input image is represented as X_m, where m is the dimension of the image feature. All image features extracted from the n input images are represented as a matrix X_{m×n}.
- the three-dimensional object pose information (vector) corresponding to an extracted image feature X_m is represented as Y_p, where p is the dimension of the three-dimensional object pose information.
- the corresponding three-dimensional object pose information of all image features extracted from the n input images is represented as a matrix Y_{p×n}. Assuming Y_{p×n} = A_{p×m} · X_{m×n}, linear regression analysis, for example the least-squares method, can be used to compute the A_{p×m} that minimizes (Y_{p×n} − A_{p×m} · X_{m×n})², and this A_{p×m} is the mapping model.
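A minimal sketch of this per-category least-squares fit, assuming NumPy and hypothetical variable names (X_v, Y_v for the feature and pose matrices of one view-angle category):

```python
import numpy as np

def fit_mapping_model(X, Y):
    """Least-squares estimate of A (p x m) such that Y is approximately A @ X.

    X: m x n matrix of image features of one view-angle category.
    Y: p x n matrix of the corresponding three-dimensional pose vectors.
    Illustrative sketch; a regularized solver could equally be used.
    """
    # np.linalg.lstsq solves X.T @ A.T ~ Y.T in the least-squares sense
    A_t, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)
    return A_t.T

# one mapping model per view-angle category (training_data is hypothetical)
# mappings = {v: fit_mapping_model(X_v, Y_v) for v, (X_v, Y_v) in training_data.items()}
```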
- the probability model calculation unit 103 calculates a joint probability distribution model based on the samples obtained by connecting the image features with the corresponding three-dimensional object posture information, wherein the single probability distribution model on which the joint probability distribution model is based corresponds to Different view categories, and each single probability distribution model is based on samples containing image features extracted from input images of respective view categories.
- the above joint probability distribution model is based on a single probability distribution model for different perspective categories.
- through known methods, a corresponding single probability distribution model (that is, its model parameters) can be calculated from the set of samples of each view-angle category,
- and then the joint probability distribution model over the single probability distribution models of all posture view-angle categories (that is, its model parameters) can be calculated.
- Joint probability distribution models suitable for use include, but are not limited to, mixed Gaussian models, hidden Markov models, and conditional random fields.
- a mixed Gaussian model is employed.
- the image feature (vector) X and the three-dimensional object pose information (vector) Y are used to form a joint feature (i.e., a sample) [X, Y]^T.
- it is assumed that the joint feature [X, Y]^T satisfies the probability distribution formula
p([X, Y]^T) = Σ_{i=1}^{M} λ_i · N([X, Y]^T | μ_i, Σ_i)
- where M is the number of posture view-angle categories,
- N(· | μ_i, Σ_i) is the single Gaussian model, i.e., the normal distribution model, for posture view-angle category i, μ_i and Σ_i are the parameters of that normal distribution model, and λ_i represents the weight of the single Gaussian model for posture view-angle category i in the mixed Gaussian model. Based on the joint feature sets of all posture view-angle categories, a known estimation method such as the expectation-maximization (EM) method can compute the optimal λ_i, μ_i, Σ_i, i = 1, ..., M, that is, the model parameters.
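Since each single Gaussian corresponds to a known view-angle category and the training samples are labeled, one simple way to obtain the mixture is to fit each component directly on its category's joint features; the sketch below does this with NumPy (EM over the full mixture, as mentioned above, is an alternative). The dictionary layout and names are illustrative assumptions.

```python
import numpy as np

def fit_joint_mixture(samples_by_view):
    """Fit one Gaussian per view-angle category on joint features [X; Y].

    samples_by_view: dict mapping view label -> d x n_i array whose columns are
    joint features of that category. Returns per-view (weight, mean, covariance).
    """
    total = sum(s.shape[1] for s in samples_by_view.values())
    model = {}
    for view, s in samples_by_view.items():
        lam = s.shape[1] / total                          # mixture weight lambda_i
        mu = s.mean(axis=1)                               # mean of the joint feature
        sigma = np.cov(s) + 1e-6 * np.eye(s.shape[0])     # covariance, lightly regularized
        model[view] = (lam, mu, sigma)
    return model
```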
- FIG. 3 illustrates a flow diagram of a method 300 for training based on an input image, in accordance with one embodiment of the present invention.
- method 300 begins at step 301.
- at step 303, image features are extracted from each input image of a plurality of input images having view-angle categories.
- the input images and posture view-angle categories may be those previously described with reference to the embodiment of FIG. 1.
- Image features can be various features for object pose estimation.
- the image features are statistical features of the edge direction in the input image, such as gradient direction histogram HOG features and scale invariant feature transform SIFT features.
- at step 305, for each of the plurality of view-angle categories, a mapping model that converts the image features extracted from input images belonging to that view-angle category into the three-dimensional object pose information corresponding to those input images is estimated by linear regression analysis.
- that is to say, for each posture view-angle category, there can be considered to exist a certain functional or mapping relationship by which the image features extracted from input images of that posture view-angle category can be converted or mapped into the corresponding three-dimensional object pose information of those input images.
- through linear regression analysis, such a function or mapping relationship, that is, a mapping model, can be estimated from the extracted image features and the corresponding three-dimensional object pose information. For each input image, three-dimensional object pose information corresponding to the pose of the object contained in that input image is prepared in advance.
- an image feature (feature vector) extracted from an input image is represented as X_m, where m is the dimension of the image feature. All image features extracted from the n input images are represented as a matrix X_{m×n}.
- the three-dimensional object pose information (vector) corresponding to an extracted image feature X_m is represented as Y_p, where p is the dimension of the three-dimensional object pose information.
- the corresponding three-dimensional object pose information of all image features extracted from the n input images is represented as a matrix Y_{p×n}. Assuming Y_{p×n} = A_{p×m} · X_{m×n}, linear regression analysis, for example the least-squares method, can be used to compute the A_{p×m} that minimizes (Y_{p×n} − A_{p×m} · X_{m×n})², and this A_{p×m} is the mapping model. If there are Q view-angle categories, Q corresponding mapping models are produced.
- at step 307, a joint probability distribution model is calculated based on the samples obtained by connecting the image features with the corresponding three-dimensional object pose information, wherein the single probability distribution models on which the joint probability distribution model is based correspond to different view-angle categories, and each single probability distribution model is based on the samples containing image features extracted from input images of the corresponding view-angle category.
- the above joint probability distribution model is based on a single probability distribution model for different perspective categories.
- through known methods, a corresponding single probability distribution model (that is, its model parameters) can be calculated from the set of samples of each view-angle category,
- and then the joint probability distribution model over the single probability distribution models of all posture view-angle categories (that is, its model parameters) can be calculated.
- Joint probability distribution models suitable for use include, but are not limited to, mixed Gaussian models, hidden Markov models, and conditional random fields.
- a mixed Gaussian model is employed.
- the image feature (vector) X and the three-dimensional object pose information (vector) Y are used to form a joint feature (i.e., a sample) [X, Y]^T.
- it is assumed that the joint feature [X, Y]^T satisfies the probability distribution formula
p([X, Y]^T) = Σ_{i=1}^{M} λ_i · N([X, Y]^T | μ_i, Σ_i)
- where M is the number of posture view-angle categories,
- N(· | μ_i, Σ_i) is the single Gaussian model, i.e., the normal distribution model, for posture view-angle category i, and μ_i and Σ_i are its parameters.
- λ_i represents the weight of the single Gaussian model for posture view-angle category i in the mixed Gaussian model. Based on the joint feature sets of all posture view-angle categories, a known estimation method such as the expectation-maximization (EM) method can compute the optimal λ_i, μ_i, Σ_i, i = 1, ..., M, that is, the model parameters. The method 300 then ends at step 309.
- FIG. 4 is a block diagram showing the structure of an apparatus 400 for training based on an input image in accordance with a preferred embodiment of the present invention.
- the device 400 includes an extracting unit 401, a map estimating unit 402, a probability model calculating unit 403, a transform model calculating unit 404, and a feature transforming unit 405.
- the functions of the extraction unit 401, the map estimation unit 402, and the probability model calculation unit 403 are the same as those of the extraction unit 101, the map estimation unit 102, and the probability model calculation unit 103 in Fig. 1, and the description thereof will not be repeated.
- the extracting unit 401 is configured to output the extracted image features to the transform model calculating unit 404 and the feature transforming unit 405, and the image features input to the map estimating unit 402 and the probability model calculating unit 403 come from the feature transforming unit 405.
- the transformation model calculation unit 404 uses the dimensionality reduction method to calculate a feature transformation model that reduces dimensionality of image features.
- Dimensionality reduction methods include, but are not limited to, principal component analysis, factor analysis, singular value decomposition, multidimensional scaling, locally linear embedding, isometric mapping (Isomap), linear discriminant analysis, local tangent space alignment, and maximum variance unfolding.
- the resulting feature transformation model can be used to transform the image features extracted by the extraction unit 401 into image features having a smaller dimensionality.
- an image feature (feature vector) extracted from an input image is represented as X_m, where m is the dimension of the image feature. All image features extracted from the n input images are represented as a matrix X_{m×n}.
- the principal component analysis method can be used to calculate, from the image features X_{m×n}, a transformation matrix Map_{d×m}, where d < m.
- the feature transform unit 405 transforms the image features using the feature transform model for the estimation of the map model and the calculation of the joint probability distribution model.
- for example, in the preceding example, the transformed image features can be calculated as X′_{d×n} = Map_{d×m} · X_{m×n}.
- the transformed image features (of dimension d) are provided to the map estimation unit 402 and the probability model calculation unit 403.
- since the image features transformed by the feature transformation model have a lower dimensionality, the amount of processing for the subsequent estimation and calculation is reduced.
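As an illustration of the principal-component-analysis variant of this step, the following sketch computes a d × m projection matrix Map and applies it; names are illustrative, and any of the other dimensionality reduction methods listed above could be substituted.

```python
import numpy as np

def fit_pca_transform(X, d):
    """Compute a d x m projection matrix Map from features X (m x n) via PCA."""
    mean = X.mean(axis=1, keepdims=True)
    cov = np.cov(X - mean)                      # m x m covariance of the features
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    Map = eigvecs[:, ::-1][:, :d].T             # top-d principal directions, d x m
    return Map, mean

def transform_features(Map, mean, X):
    # X' = Map (X - mean): dimension drops from m to d
    return Map @ (X - mean)
```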
- Figure 5 shows a flow chart of a method 500 for training based on input images in accordance with a preferred embodiment of the present invention.
- method 500 begins at step 501.
- at step 502, as in step 303 of method 300, image features are extracted from each input image of the plurality of input images having view-angle categories.
- at step 503, a feature transformation model that reduces the dimensionality of the image features extracted at step 502 is calculated using a dimensionality reduction method.
- Dimensionality reduction methods include, but are not limited to, principal component analysis, factor analysis, singular value decomposition, multidimensional scaling, locally linear embedding, isometric mapping (Isomap), linear discriminant analysis, local tangent space alignment, and maximum variance unfolding.
- the resulting feature transformation model can be used to transform the extracted image features into image features with smaller dimensions.
- an image feature (feature vector) extracted from an input image is represented as X_m, where m is the dimension of the image feature. All image features extracted from the n input images are represented as a matrix X_{m×n}.
- the principal component analysis method can be used to calculate, from the image features X_{m×n}, a transformation matrix Map_{d×m}, where d < m.
- at step 504, the image features are transformed using the feature transformation model, for the estimation of the mapping models and the calculation of the joint probability distribution model.
- for example, in the preceding example, the transformed image features can be calculated as X′_{d×n} = Map_{d×m} · X_{m×n}.
- at step 505, as in step 305 of method 300, for each of the plurality of view-angle categories, a mapping model that converts the (already transformed) image features extracted from input images belonging to that view-angle category into the three-dimensional object pose information corresponding to those input images is estimated by linear regression analysis.
- at step 507, as in step 307 of method 300, a joint probability distribution model is calculated based on samples obtained by connecting the (already transformed) image features with the corresponding three-dimensional object pose information, wherein the single probability distribution models on which the joint probability distribution model is based correspond to different view-angle categories, and each single probability distribution model is based on the samples containing image features extracted from input images of the corresponding view-angle category. The method 500 then ends at step 509.
- Figure 6 is a block diagram showing the structure of an apparatus 600 for estimating an object pose angle in an image, in accordance with one embodiment of the present invention.
- the device 600 includes an extracting unit 601, a mapping unit 602, a probability calculating unit 603, and an estimating unit 604.
- the extracting unit 601 extracts image features from the input image.
- the specifications of the input image are the same as those described earlier with reference to the embodiment of Fig. 1.
- the image features and the method of extracting the image features are the same as the image features on which the mapping model to be employed is based and its extraction method (as previously described with reference to the embodiment of Fig. 1).
- for each of the plurality of view-angle categories, the mapping unit 602 obtains the corresponding three-dimensional object pose information of the image feature based on the mapping model, corresponding to that view-angle category, for mapping image features to three-dimensional object pose information.
- the mapping model is the mapping model previously described with reference to the embodiment of Fig. 1.
- here, for an image feature X_m extracted from the input image, where m is the dimension of the image feature, the mapping unit 602 assumes that every view-angle category is possible for the input image.
- accordingly, for each hypothesized view-angle category, the mapping unit 602 obtains the corresponding three-dimensional object pose information Y_p = A_{p×m} · X_m by using the corresponding mapping model A_{p×m}.
- based on the joint probability distribution model built on the single probability distribution models for the view-angle categories, the probability calculation unit 603 calculates, for each view-angle category, the joint probability of the joint feature containing the image feature and the corresponding three-dimensional object pose information, and calculates, from the joint probability, the conditional probability of the image feature under the condition of the corresponding three-dimensional object pose information.
- the joint probability distribution model is the joint probability distribution model described above with reference to the embodiment of Fig. 1.
- that is to say, for each hypothesized view-angle category, the probability calculation unit 603 forms the joint feature [X, Y]^T from the image feature X and the corresponding three-dimensional object pose information Y, and uses the joint probability distribution model to calculate the joint probability value p([X, Y]^T). From the obtained joint probability value p([X, Y]^T), the probability calculation unit 603 calculates, for example using Bayes' rule, the conditional probability p(Y|X), i.e., p(Y|X) = p([X, Y]^T) / p(X), where p(X) is obtained by marginalizing the joint distribution over Y.
- the estimation unit 604 estimates the view-angle category corresponding to the largest conditional probability among the conditional probabilities calculated for all possible view-angle categories as the object pose viewing angle in the input image.
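Putting the mapping, probability and arg-max steps together, a minimal sketch of the estimation flow might look as follows (SciPy's multivariate normal density is used for the per-category Gaussians; dictionary names are illustrative, and normalizing over the finite set of hypotheses is a simplification of the marginalization mentioned above):

```python
import numpy as np
from scipy.stats import multivariate_normal

def estimate_view_angle(x, mappings, mixture):
    """Estimate the view-angle category of feature vector x.

    mappings: dict view -> A (p x m) mapping model from training.
    mixture:  dict view -> (lam, mu, sigma) single Gaussian on joint features.
    """
    joint_p = {}
    for view, A in mappings.items():
        y = A @ x                                    # hypothesized 3-D pose information
        z = np.concatenate([x, y])                   # joint feature [X, Y]
        lam, mu, sigma = mixture[view]
        joint_p[view] = lam * multivariate_normal.pdf(z, mean=mu, cov=sigma)
    total = sum(joint_p.values())                    # stand-in for p(X) over the hypotheses
    cond_p = {v: p / total for v, p in joint_p.items()}
    return max(cond_p, key=cond_p.get)               # category with the largest conditional probability
```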
- Figure 7 illustrates a flow diagram of a method 700 for estimating an object pose perspective in an image, in accordance with one embodiment of the present invention.
- method 700 begins at step 701.
- image features are extracted from the input image.
- the input image is the same as the input image previously described with reference to the embodiment of Fig. 1.
- the image features and the method of extracting the image features are the same as the image features on which the mapping model to be employed is based and its extraction method (as previously described with reference to the embodiment of Fig. 1).
- at step 705, for each of the plurality of view-angle categories, the corresponding three-dimensional object pose information of the image feature is obtained based on the mapping model, corresponding to that view-angle category, for mapping image features to three-dimensional object pose information.
- the mapping model is the mapping model previously described with reference to the embodiment of FIG. 1.
- here, for an image feature X_m extracted from the input image, where m is the dimension of the image feature, it is assumed at step 705 that every view-angle category is possible for the input image.
- accordingly, at step 705, for each hypothesized view-angle category, the corresponding three-dimensional object pose information Y_p = A_{p×m} · X_m is obtained by using the corresponding mapping model A_{p×m}.
- at step 707, according to the joint probability distribution model based on the single probability distribution models for the view-angle categories, the joint probability of the joint feature of each view-angle category containing the image feature and the corresponding three-dimensional object pose information is calculated, and the conditional probability of the image feature under the condition of the corresponding three-dimensional object pose information is calculated from the joint probability.
- the joint probability distribution model is the joint probability distribution model described above with reference to the embodiment of Fig. 1.
- that is to say, for each hypothesized view-angle category, at step 707 the joint feature [X, Y]^T is formed from the image feature X and the corresponding three-dimensional object pose information Y, the joint probability value p([X, Y]^T) is calculated using the joint probability distribution model, and the conditional probability p(Y|X) is then calculated from it, for example using Bayes' rule.
- at step 708, the view-angle category corresponding to the largest conditional probability among the conditional probabilities calculated for all possible view-angle categories is estimated as the object pose viewing angle in the input image.
- the method 700 ends at step 709.
- Figure 8 is a block diagram showing the structure of an apparatus 800 for estimating the perspective of an object pose in an image in accordance with a preferred embodiment of the present invention.
- the apparatus 800 includes an extracting unit 801, a transforming unit 805, a mapping unit 802, a probability calculating unit 803, and an estimating unit 804.
- the extracting unit 801, the mapping unit 802, the probability calculating unit 803, and the estimating unit 804 have the same functions as the extracting unit 601, the mapping unit 602, the probability calculating unit 603, and the estimating unit 604 of the embodiment of Fig. 6, respectively, and the description thereof will not be repeated. It should be noted, however, that the extracting unit 801 is configured to output the extracted image features to the transform unit 805, and the image features of the mapping unit 802 and the probability calculating unit 803 are derived from the transform unit 805.
- the transform unit 805 transforms the image features by the feature transform model for dimensionality reduction for obtaining the three-dimensional object pose information.
- the feature transformation model may be the feature transformation model described above with reference to the embodiment of Fig. 4.
- Figure 9 illustrates a flow diagram of a method 900 for estimating an object pose perspective in an image, in accordance with a preferred embodiment of the present invention.
- method 900 begins at step 901. At step 903, as in step 703, image features are extracted from the input image.
- the image features are transformed by the feature transformation model for dimensionality reduction for obtaining the three-dimensional object pose information.
- the feature transformation model may be the feature transformation model described above with reference to the embodiment of Fig. 4.
- at step 905, as in step 705, for each of the plurality of view-angle categories, the corresponding three-dimensional object pose information of the image features is obtained based on the mapping model, corresponding to that view-angle category, for mapping image features to three-dimensional object pose information.
- at step 907, as in step 707, according to the joint probability distribution model based on the single probability distribution models for the view-angle categories, the joint probability of the joint feature of each view-angle category containing the image features and the corresponding three-dimensional object pose information is calculated, and the conditional probability of the image features under the condition of the corresponding three-dimensional object pose information is calculated from the joint probability.
- at step 908, as in step 708, the view-angle category corresponding to the largest conditional probability among the conditional probabilities calculated for all possible view-angle categories is estimated as the object pose viewing angle in the input image.
- the method 900 ends at step 909.
- although embodiments of the present invention have been described above with respect to images, they are also applicable to video, in which the video is processed as a sequence of images.
- FIG. 10 is a block diagram showing an exemplary structure of a computer in which an embodiment of the present invention is implemented.
- the central processing unit (CPU) 1001 executes various processes in accordance with a program stored in the read-only memory (ROM) 1002 or a program loaded from the storage portion 1008 into the random access memory (RAM) 1003.
- data required when the CPU 1001 executes the various processes and so on is also stored in the RAM 1003 as needed.
- the CPU 1001, the ROM 1002, and the RAM 1003 are connected to each other via a bus 1004.
- Input/output interface 1005 is also coupled to bus 1004.
- the following components are connected to the input/output interface 1005: an input portion 1006 including a keyboard, a mouse, etc.; an output portion 1007 including a display such as a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker And so on; a storage portion 1008 including a hard disk or the like; and a communication portion 1009 including a network interface card such as a LAN card, a modem, and the like.
- the communication section 1009 performs communication processing via a network such as the Internet.
- the drive 1010 is also connected to the input/output interface 1005 as needed.
- a removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 1010 as needed, so that a computer program read therefrom is installed into the storage portion 1008 as needed.
- in the case where the above steps and processes are implemented by software, a program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 1011.
- such a storage medium is not limited to the removable medium 1011 shown in FIG. 10, in which the program is stored and which is distributed separately from the method in order to provide the program to the user.
- examples of the removable medium 1011 include a magnetic disk, an optical disk (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disk (including a MiniDisc (MD)), and a semiconductor memory.
- alternatively, the storage medium may be the ROM 1002, a hard disk included in the storage portion 1008, or the like, in which the program is stored and which is distributed to the user together with the method containing it.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Probability & Statistics with Applications (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Description
- -
训练方法、 设备和估计图像中对象姿势视角的方法、 设备
技术领域
[01] 本发明涉及对象姿势估计,尤其涉及旨在进行对象姿势视角估计的训 练方法、 设备和估计图像中对象姿势视角的方法、 设备。
背景技术
[02] 在单个图像中估计对象 (例如人、 动物、 物体等)姿势的方法从技术原 理上可以分为基于模型和基于学习的。基于学习的方法直接从图像特征推 断对象的三维姿势。 使用得较多的图像特征是对象轮廓信息。
[03] 现有的对象姿势估计的方法没有区分对象姿势的视角。由于对象姿势 变化的复杂性, 对象姿势的不同视角会带来更大的模糊性。 因此, 不同视 角的图像姿势估计的准确度要远低于单一视角的姿势估计。
发明内容
[04] 鉴于现有技术的上述不足,本发明旨在提供一种基于输入图像的进行 训练的方法、设备和估计图像中对象姿势视角的方法、设备, 以利于在对 象姿势估计中区分对象姿势视角。
[05] 本发明的一个实施例是一种基于输入图像进 1|练的方法, 包括:从 具有视角类别的多个输入图像的每个输入图像中提取图像特征;针对多个 视角类别中的每个视角类别,通过线性回归分析估计将 M于所述视角类 别的输入图像中提取的图像特征转换为与所述输入图像相应的三维对象 姿势信息的映射模型;和基于通过将所述图像特征与相应三维对象姿势信 息连接而得到的样本,计算联合概率分布模型,其中所述联合概率分布模 型所基于的单概率分布模型对应于不同视角类别,并且每个所述单概率分 布模型基于包含从相应视角类别的输入图像提取的图像特征的样本。
[06] 本发明的另一个实施例是一种基于输入图像进行训练的设备, 包括: 提取单元,其从具有视角类别的多个输入图像的每个输入图像中提取图像 特征; 映射估计单元, 其针对多个视角类别中的每个视角类别, 通过线性
- - 回归分析估计将从属于所述视角类别的输入图像中提取的图像特征转换 为与所述输入图像相应的三维对象姿势信息的映射模型;和概率模型计算 单元,其基于通过将所述图像特征与相应三维对象姿势信息连接而得到的 样本,计算联合概率分布模型,其中所述联合概率分布模型所基于的单概 率分布模型对应于不同视角类别,并且每个所述单概率分布模型基于包含 目应视角类别的输入图像提取的图像特征的样本。
[07] 根据本发明的上述实施例,各个输入图像具有各自的视角类别。可从 每个输入图像中提取图像特征。按照视角类别,可通过线性回归分析估计 出映射模型。这种映射模型充当将该视角类别的图像特征转换为相应三维 对象姿势信息的函数的作用。可将图像特征与相应三维对象姿势信息连接 以得到样本,从而基于这些样本计算联合概率分布模型。联合概率分布模 型基于若干单概率分布模型, 其中每个视角类别有一个单概率分布模型。 基于包含相应视角类别的图像特征的样本可得到相应的单概率分布模型。 因此,通过本发明的实施例可训练出用于对象姿势视角估计的模型, 即各 姿势视角的映射模型和联合概率分布模型。
[08] 进一步地,在上述实施例中,可以利用降维方法计算将图像特征降维 的特征变换模型。相应地, 可以利用特征变换模型变换图像特征, 以用于 映射模型的估计和联合概率分布模型的计算。经过特征变换模型的变换的 图像特征具有更低的维数, 利于降低后续估计和计算的处理量。
[09] 本发明的另一个实施例是一种估计图像中对象姿势视角的方法, 包 括: 从输入图像中提取图像特征; 针对多个视角类别中的每个视角类别, 基于与该视角类别对应的、用于将图像特征映射到三维对象姿势信息的映 射模型,获得所述图像特征的相应三维对象姿势信息;根据基于针对所述 视角类别的单概率分布模型的联合概率分布模型,计算每个视角类别的包 含所述图像特征和相应三维对象姿势信息的联合特征的联合概率;根据所 述联合概率计算在所述相应三维对象姿势信息的条件下所述图像特征的 概率;和将所述条件概率中最大的条件概率所对应的视角类别估计为 所述输入图像中的对象姿势视角。
[10] 本发明的另一个实施例是一种估计图像中对象姿势视角的设备, 包 括: 提取单元, 其从输入图像中提取图像特征; 映射单元, 其针对多个视 角类别中的每个视角类别,基于与该视角类别对应的、用于将图像特征映 射到三维对象姿势信息的映射模型,获得所述图像特征的相应三维对象姿 势信息;概率计算单元,其根据基于针对所述视角类别的单概率分布模型
- - 的联合概率分布模型,计算每个视角类别的包含所述图像特征和相应三维 对象姿势信息的联合特征的联合概率,并且根据所述联合概率计算在所述 相应三维对象姿势信息的条件下所述图像特征的条件概率; 和估计单元, 其将所述条件概率中最大的条件概率所对应的视角类别估计为所述输入 图像中的对象姿势视角。
[11] 根据本发明的上述实施例,可从输入图像中提取图像特征。 由于每个 视角类别均有相应的用于将该视角类别的图像特征转换为三维对象姿势 信息的映射模型,可分别假设图像特征具有各个视角类别,从而利用相应 的映射模型,获得图像特征的相应三维对象姿势信息。根据联合概率分布 模型可计算出在假设的各个视角类别下出现该图像特征和相应三维对象 姿势信息的联合概率。根据此联合概率可计算出在出现该相应三维对象姿 势信息的 frfr下出现该图像特征的 H 概率。可以看出,最大 ^概率所 对应的视角类别假设可以被估计为输入图像中的对象姿势视角。因而本发 明的实施例能够估计出对象姿势视角。
[12] 进一步地,在上述实施例中,可以通过用于降维的特征变换模型将图 像特征变换, 以用于获得三维对象姿势信息。经过特征变换模型的变换的 图像特征具有更低的维数, 利于降低后续映射和概率计算的处理量。
[13] 现有的对象姿势估计的方法没有区分对象姿势的视角,而由于对象姿 势变化的复杂性,对象姿势的不同视角会带来很大的估计模糊性, 因此不 同视角的图像姿势估计的准确度要远低于单一视角的姿势估计,本发明的 目的是估计图像和视频中的对象视角,从而进一步估计单一视角中的对象 姿势, 实验结果表明本发明能有效估计图像和视频中的对象姿势。
附图说明
[14] 参照下面结合附图对本发明实施例的说明,会更加容易地理解本发明 的以上和其它目的、特点和优点。 在附图中, 相同的或对应的技术特征或 部件将采用相同或对应的附图标记来表示。
[15] 图 1 的框图示出了根据本发明一个实施例的用于基于输入图像进行 训练的设备的结构。
[16] 图 2的示意图示出了从输入图像中提取方块的模式。
[17] 图 3 示出了根据本发明一个实施例的用于基于输入图 练的 方法的流程图。
- -
[18] 图 4 的框图示出了根据本发明一个优选实施例的用于基于输入图像 进行训练的设备的结构。
[19] 图 5 示出了根据本发明一个优选实施例的用于基于输入图像进行训 练的方法的流程图。
[20] 图 6 的框图示出了根据本发明一个实施例的用于估计图像中对象姿 势视角的设备的结构。
[21] 图 7 示出了根据本发明一个实施例的用于估计图像中对象姿势视角 的方法的流程图。
[22] 图 8 的框图示出了根据本发明一个优选实施例的用于估计图像中对 象姿势视角的设备的结构。
[23] 图 9 示出了根据本发明一个优选实施例的用于估计图像中对象姿势 视角的方法的流程图。
[24] 图 10是示出其中实现本发明实施例的计算机的示例性结构的框图。
具体实施方式
[25] 下面参照附图来说明本发明的实施例。 应当注意, 为了清楚的目的, 附图和说明中省略了与本发明无关的、本领域普通技术人员已知的部件和 处理的表示和描述。
[26] 图 1 的框图示出了根据本发明一个实施例的用于基于输入图像进行 训练的设备 100的结构。
[27] 如图 1所示, 设备 100包括提取单元 101、 映射估计单元 102和概率 模型计算单元 103。
[28] 输入图像是包含具有各种姿势视角类别的对象的图像。各个姿势视角 类别分别表示对象所取的不同视角。 例如, 姿势视角类别可以包括 -80°、 -40°、 0°、 +40°和 +80°, 其中 -80°是表示对 # 目对于摄像机镜头右转 80度 的姿势视角类别、 -40°是表示对象相对于摄像机镜头右转 40度的姿势视 角类别、 0°是表示对象正对摄像机镜头的姿势视角类别、 +40°是表示对象 相对于摄像机镜头左转 40度的姿势视角类别, 而 +80°是表示对象相对于 摄像机镜头左转 80度的姿势视角类别。
[29] 当然, 姿势视角类别也可以代表视角范围。 例如, 将对象的正面视角
- - 从左侧面至右侧面的 180° 范围划分为 5个视角范围: [-90°, -54°】, [-54°, -18°】, [-18°, 18°】, [18°, 54°】, [54°, 90°】, 即 5个姿势视角类别。
[30] 姿势视角类别的数目和所代表的具体姿势视角可以根据需要任意设 定, 并不限于上述例子。
[31] 在本发明的实施例中,输入图像和相应的姿势视角类别均被提供给设 备 100。
[32] 优选地,输入图像包含不含背景的各种姿势视角的对象图像和含有背 景的各种姿势视角的对象图像。
[33] 提取单元 101 从具有视角类别的多个输入图像的每个输入图像中提 取图像特征。 图像特征可以是各种用于对象姿势估计的特征。优选地, 图 像特征是输入图像中边缘方向的统计特征,例如梯度方向直方图 HOG特 征和尺度不变特征变换 SIFT特征。
[34] 在一个具体示例中,假定以梯度方向直方图作为图像特征,并且输入 图像具有统一的宽和高( 120像素 X 100像素)。然而本发明的实施例并不 限于所假定的具体特征和尺寸。
[35] 在这个示例中,提取单元 101可分别计算输入图像中每一个像素在水 平方向和垂直方向的梯度, 即,
水平梯度: ( y) = d(I(x, y))/dx = I(x + l, y) - I(x - 1, y)
垂直梯度: (, = d^ ) 1 dy = (, y + _ y _
其中 /( 表示像素的灰度值, x,y分别表示像素在水平方向和垂直方 向的坐标。
[36] 于是,提取单元 101可根据输入图像中每一个像素的水平和垂直梯度 分别计算该像素的梯度方向和梯度大小, 即,
梯度方向: , y) = arg (| /x|) 梯度大小: Grad(x, y) = ^Ix 2 + Iy 2 其中梯度方向 S 3 的范围为 [0, ]。
[37] 在这个示例中,提取单元 101可在输入图像中从左至右、从上至下依 次取 24个 32 x 32大小的方块,其中水平方向每行 6个方块,垂直方向每 列 4个方块。 在水平方向和垂直方向相邻的任意两个方块之间重叠一半。
[38] 图 2的示意图示出了从输入图像中提取方块的模式。图 2中示出了三
- - 个 32 x 32大小的方块 201、 202和 203。 方块 202在垂直方向与方块 201 重叠 16个像素, 而方块 203在水平方向与方块 201重叠 16个像素。
[39] 提取单元 101可将每一个 32 X 32的方块划分为 16个 8 x 8的小方块, 其中水平方向每行 4个小方块,垂直方向每列 4个小方块。 小方块按照先 水平再垂直的顺序排列。
[40] 对于每一个 8 x 8的小方块,提取单元 101计算小方块中 64个像素的 梯度方向直方图, 其中将梯度方向划分为 8个方向区间, 即从 0到 Γ范围 内每 ϊ 8为一个方向区间。 也就是说, 基于每个 8 x 8的小方块的 64个像 素,针对 8个方向区间中的每个方向区间,计算梯度方向属于该方向区间 的像素的梯度大小的和, 从而得到一个 8维向量。 相应地, 每一个 32 X 32的方块得到一个 128维向量。
[41] 对于每一个输入图像,提取单元 101将每一个方块的向量依次连接得 到图像特征, 因而图像特征的维数为 3072维, 即 128 X 24 = 3072。
[42] 应当注意,本发明的实施例并不限于上述示例中方块和小方块的划分 模式和具体数字,也可以采用其它划分模式和具体数字。本发明的实施例 并不限于上述示例中提取特征的方法,也可以使用其它提取用于对象姿势 估计的图像特征的方法。
[43] 回到图 1, 映射估计单元 102针对多个视角类别中的每个视角类别, 通过线性回归分析估计将从属于该视角类别的输入图像中提取的图像特 征转换为与该输入图像相应的三维对象姿势信息的映射模型。 也就是说, 对于每个姿势视角类别,可以认为存在某种函数关系或映射关系,通过该 关系,能够将从该姿势视角类别的输入图像提取的图像特征转换或映射为 该输入图像的相应三维对象姿势信息。通过线性回归分析,可根据所提取 的图像特征和相应的三维对象姿势信息, 估计出这样的函数或映射关系, 即映射模型。
[44] 对于每个输入图像,预先准备有与该输入图像所包含的对象的姿势相 应的三维对象姿势信息。
[45] 在一个具体示例中, 从输入图像中提取的图像特征 (特征向量)表示为 Xm, 其中 m是图像特征的维数。 从"个输入图像中提取的所有图像特征 表示为矩阵 Xm*„。 另外, 与提取的图像特征 Xm相应的三维对象姿势信息 (向量)表示为 YP, 其中 ρ是三维对象姿势信息的维数。从"个输入图像中 提取的所有图像特征的相应三维对象姿势信息表示为矩阵 Υρ*η。
[46] Y≠n=A≠m^XM 于是采用线性回归分析, 例如最小二乘方法可 计算使得 (1>„- * ^*„)2取最小值的 Α„ 就是映射模型。
[47] 回到图 1,概率模型计算单元 103基于通过将图像特征与相应三维对 象姿势信息连接而得到的样本,计算联合概率分布模型,其中联合概率分 布模型所基于的单概率分布模型对应于不同视角类别,并且每个单概率分 布模型基于包含从相应视角类别的输入图像提取的图像特征的样本。
[48] 也就是说,上述联合概率分布模型基于针对不同视角类别的单概率分 布模型。 通过已知的方法, 能够根据每个视角类别的样本的集合, 能够计 算出相应的单概率分布模型 (即模型 , 进而能够计算出所有姿势视角 类别的单概率分布模型的联合概率分布模型 (即模型参数)。
[49] 适合使用的联合概率分布模型包括但不限于混合高斯模型、隐马尔科 夫模型和 frfr随机场。
[50] 在一个具体示例中, 采用混合高斯模型。在这个示例中, 利用图像特 征 (向量 )X和三维对象姿势信息 (向量) Γ组成联合特征 (即样本) [Χ,7]τ。 假 设联合特征 [Χ,7]τ满足概率分布公式:
其中 Μ为姿势视角类别的数目, V( | i? 为针对姿势视角类别 i的单 高斯模型, 即正态分布模型。 和 ;是正态分布模型的参数, A表示针对 姿势视角类别 的单高斯模型在混合高斯模型中的权重。 根据所有姿势视 角类别的联合特征集,通过已知的估计方法,例如期望最大化方法 EM能 够计算最优的 A, ^∑ i=l, ..., , 即映射模型。
[51] 图 3 示出了根据本发明一个实施例的用于基于输入图像进行训练的 方法 300的流程图。
[52] 如图 3所示, 方法 300从步骤 301开始。 在步骤 303, 从具有视角类 别的多个输入图像的每个输入图像中提取图像特征。输入图像和姿势视角 类别可以是前面参照图 1的实施例描述的输入图像和姿势视角类别。图像 特征可以是各种用于对象姿势估计的特征。优选地, 图像特征是输入图像 中边缘方向的统计特征,例如梯度方向直方图 HOG特征和尺度不变特征 变换 SIFT特征。
[53] 在步骤 305, 针对多个视角类别中的每个视角类别, 通过线性回归分 析估计将从属于该视角类别的输入图像中提取的图像特征转换为与该输
- - 入图像相应的三维对象姿势信息的映射模型。也就是说,对于每个姿势视 角类别, 可以认为存在某种函数关系或映射关系, 通过该关系, 能够将从 该姿势视角类别的输入图像提取的图像特征转换或映射为该输入图像的 相应三维对象姿势信息。通过线性回归分析,可根据所提取的图像特征和 相应的三维对象姿势信息, 估计出这样的函数或映射关系, 即映射模型。
[54] 对于每个输入图像,预先准备有与该输入图像所包含的对象的姿势相 应的三维对象姿势信息。
[55] 在一个具体示例中, 从输入图像中提取的图像特征 (特征向量)表示为 Xm, 其中 m是图像特征的维数。 从"个输入图像中提取的所有图像特征 表示为矩阵 Xm*„。 另外, 与提取的图像特征 Xm相应的三维对象姿势信息 (向量)表示为 YP, 其中 ρ是三维对象姿势信息的维数。从"个输入图像中 提取的所有图像特征的相应三维对象姿势信息表示为矩阵 Υρ*η。
[56] i . Ύ≠η=Αρ^ΧΜ 于是采用线性回归分析, 例如最小二乘方法可 计算使得 (1>„- * ^*„)2取最小值的 Α„ 就是映射模型。如果有 Q 个视角类别, 则会产生 Q个相应的映射模型。
[57] 接着在步骤 307,基于通过将图像特征与相应三维对象姿势信息连接 而得到的样本,计算联合概率分布模型,其中联合概率分布模型所基于的 单概率分布模型对应于不同视角类别,并且每个单概率分布模型基于包含 目应视角类别的输入图像提取的图像特征的样本。
[58] 也就是说,上述联合概率分布模型基于针对不同视角类别的单概率分 布模型。 通过已知的方法, 能够根据每个视角类别的样本的集合, 能够计 算出相应的单概率分布模型 (即模型 , 进而能够计算出所有姿势视角 类别的单概率分布模型的联合概率分布模型 (即模型参数)。
[59] 适合使用的联合概率分布模型包括但不限于混合高斯模型、隐马尔科 夫模型和 ^随机场。
[60] 在一个具体示例中, 采用混合高斯模型。在这个示例中, 利用图像特 征 (向量 )X和三维对象姿势信息 (向量) Γ组成联合特征 (即样本) [Χ,7]τ。 假 设联合特征 [Χ,7]τ满足概率分布公式:
其中 Μ为姿势视角类别的数目, V( | i? ;)为针对姿势视角类别 i的单 高斯模型, 即正态分布模型。 和 ;是正态分布模型的参数, A表示针对
- - 姿势视角类别 的单高斯模型在混合高斯模型中的权重。 根据所有姿势视 角类别的联合特征集,通过已知的估计方法,例如期望最大化方法 EM能 够计算最优的 A, ^∑ i=l, ..., , 即映射模型。
[61] 接着, 方法 300在步骤 309结束。
[62] 图 4 的框图示出了根据本发明一个优选实施例的用于基于输入图像 进行训练的设备 400的结构。
[63] 如图 4所示, 设备 400包括提取单元 401、 映射估计单元 402、 概率 模型计算单元 403、 变换模型计算单元 404和特征变换单元 405。 提取单 元 401、 映射估计单元 402、 概率模型计算单元 403的功能与图 1中的提 取单元 101、 映射估计单元 102、 概率模型计算单元 103相同, 不再重复 说明。 然而应当注意, 提取单元 401 被配置为向变换模型计算单元 404 和特征变换单元 405输出所提取的图像特征,并且输入映射估计单元 402、 概率模型计算单元 403的图像特征来自于特征变换单元 405。
[64] 变换模型计算单元 404 利用降维方法计算将图像特征降维的特征变 换模型。 降维方法包括但不限于主成份分析方法、 因子分析方法、单值分 解、 多维尺度分析、 局部线性嵌入、 等距映射、 线性鉴别分析、 局部切空 间排列和最大方差展开。 所得到的特征变换模型可用来将提取单元 401 提取的图像特征变换为维数更小的图像特征。
[65] 在一个具体示例中, 从输入图像中提取的图像特征 (特征向量)表示为 Xm, 其中 m是图像特征的维数。 从"个输入图像中提取的所有图像特征 表示为矩阵 Xm*n。 可利用主成份分析方法根据图像特征 X 计算矩阵
其中 d<m。
[66] 特征变换单元 405利用特征变换模型变换图像特征,以用于映射模型 的估计和联合概率分布模型的计算。 例如, 在前面的示例中, 可通过下式 来计算变换的图像特征:
变换的图像特征 (维数为 被提供给映射估计单元 402、 概率模型计算单 元 403。
[67] 在上述实施例中,由于经过特征变换模型的变换的图像特征具有更低 的维数, 利于降低后续估计和计算的处理量。
[68] 图 5 示出了根据本发明一个优选实施例的用于基于输入图像进行训
- - 练的方法 500的流程图。
[69] 如图 5所示, 方法 500从步骤 501开始。 在步骤 502, 与方法 300的 步骤 303相同,从具有视角类别的多个输入图像的每个输入图像中提取图 像特征。
[70] 在步骤 503, 利用降维方法计算将在步骤 502提取的图像特征降维的 特征变换模型。 降维方法包括但不限于主成份分析方法、 因子分析方法、 单值分解、 多维尺度分析、 局部线性嵌入、 等距映射、 线性鉴别分析、 局 部切空间排列和最大方差展开。所得到的特征变换模型可用来将提取的图 像特征变换为维数更小的图像特征。
[71] 在一个具体示例中, 从输入图像中提取的图像特征 (特征向量)表示为 Xm, 其中 m是图像特征的维数。 从"个输入图像中提取的所有图像特征 表示为矩阵 Xm*n。 可利用主成份分析方法根据图像特征 X 计算矩阵
MapcPmi 其中 d<m。
[72] 在步骤 504, 利用特征变换模型变换图像特征, 以用于映射模型的估 计和联合概率分布模型的计算。 例如, 在前面的示例中, 可通过下式来计 算变换的图像特征:
[73] 在步骤 505, 与方法 300的步骤 305相同, 针对多个视角类别中的每 个视角类别,通过线性回归分析估计将从属于该视角类别的输入图像中提 取的图像特征 (已经过变换)转换为与该输入图像相应的三维对象姿势信 息的映射模型。
[74] 接着在步骤 507, 与方法 300的步骤步骤 307相同, 基于通过将图像 特征 (已经过变换)与相应三维对象姿势信息连接而得到的样本, 计算联合 概率分布模型,其中联合概率分布模型所基于的单概率分布模型对应于不 同视角类别,并且每个单概率分布模型基于包含 目应视角类别的输入图 像提取的图像特征的样本。
[75] 接着, 方法 500在步骤 509结束。
[76] 图 6 的框图示出了根据本发明一个实施例的用于估计图像中对象姿 势视角的设备 600的结构。
[77] 如图 6所示, 设备 600包括提取单元 601、 映射单元 602、 概率计算 单元 603和估计单元 604。
- -
[78] 提取单元 601从输入图像中提取图像特征。输入图像的规格与前面参 照图 1的实施例描述的输入图 目同。图像特征和提取图像特征的方法与 要采用的映射模型所基于的图像特征及其提取方法 (如前面参照图 1的实 施例所描述的)相同。
[79] 映射单元 602针对多个视角类别中的每个视角类别,基于与该视角类 别对应的、用于将图像特征映射到三维对象姿势信息的映射模型,获得图 像特征的相应三维对象姿势信息。映射模型是前面参照图 1的实施例描述 的映射模型。 这里, 对于从输入图像中提取的图像特征 ^, 其中 m是图 像特征的维数,映射单元 602假设所有的视角类别对于该输入图 是可 能的。相应地, 映射单元 602针对每个假设的视角类别, 用相应的映射模 型 获得相应的三维对象姿势信息 ί^Α * ^
[80] 概率计算单元 603根据基于针对视角类别的单概率分布模型的联合 概率分布模型,计算每个视角类别的包含图像特征和相应三维对象姿势信 息的联合特征的联合概率,并且根据联合概率计算在相应三维对象姿势信 息的 下图像特征的 概率。联合概率分布模型是前面参照图 1的实 施例描述的联合概率分布模型。也就是说, 对于每个假设的视角类别, 概 率计算单元 603用图像特征 X和相应的三维对象姿势信息 Γ组成联合特 征 [X, ητ,利用联合概率分布模型计算联合特征 [X, 7]τ的联合概率值 ρ([Χ, 7]τ)ο根据所得到的联合概率值 ρ([Χ, 7]τ),概率计算单元 603例如使用贝 叶斯法则计算 ^概率 p(y|X), 即 ρ(ί1Χ)= ρ([Χ, Υ]Ύ)/ίρ([Χ, 7]τ)ί Χ。
[81] 估计单元 604将针对所有可能视角类别计算的条件概率 中最大 的 ^概率所对应的视角类别估计为输入图像中的对象姿势视角。
[82] 图 7 示出了根据本发明一个实施例的用于估计图像中对象姿势视角 的方法 700的流程图。
[83] 如图 7所示, 方法 700从步骤 701开始。 在步骤 703, 从输入图像中 提取图像特征。输入图像的 与前面参照图 1的实施例描述的输入图像 相同。图像特征和提取图像特征的方法与要采用的映射模型所基于的图像 特征及其提取方法 (如前面参照图 1的实施例所描述的)相同。
[84] 在步骤 705, 针对多个视角类别中的每个视角类别, 基于与该视角类 别对应的、用于将图像特征映射到三维对象姿势信息的映射模型,获得图 像特征的相应三维对象姿势信息。映射模型是前面参照图 1的实施例描述 的映射模型。 这里, 对于从输入图像中提取的图像特征 ^, 其中 m是图
- - 像特征的维数,在步骤 705假设所有的视角类别对于该输入图像都是可能 的。 相应地, 在步骤 705针对每个假设的视角类别, 用相应的映射模型
A 获得相应的三维对象姿势信息 =A *Xm。
[85] 在步骤 707,根据基于针对视角类别的单概率分布模型的联合概率分 布模型,计算每个视角类别的包含图像特征和相应三维对象姿势信息的联 合特征的联合概率,并且根据联合概率计算在相应三维对象姿势信息的条 件下图像特征的条件概率。联合概率分布模型是前面参照图 1的实施例描 述的联合概率分布模型。 也就是说, 对于每个假设的视角类别, 在步骤 707用图像特征 X和相应的三维对象姿势信息 组成联合特征 [X, 7]τ,利 用联合概率分布模型计算联合特征 [Χ, 7]τ的联合概率值 ρ([Χ, 7]τ)„ 根据 所得到的联合概率值 p([X, Yf) > 例如使用贝叶斯法则计算条件概率 ρ(Υ\Χ), 即 ρ(ηΧ)= ρ([Χ, Υ]Ύ)/ίρ([Χ, ίΊΤ) 。
[86] 在步骤 708, 将针对所有可能视角类别计算的条件概率 中最大 的条件概率所对应的视角类别估计为输入图像中的对象姿势视角。 方法 700在步骤 709结束。
[87] 图 8 的框图示出了根据本发明一个优选实施例的用于估计图像中对 象姿势视角的设备 800的结构。
[88] 如图 8所示, 设备 800包括提取单元 801、 变换单元 805、 映射单元 802、 概率计算单元 803和估计单元 804。 提取单元 801、 映射单元 802、 概率计算单元 803和估计单元 804分别与图 6的实施例的提取单元 601、 映射单元 602、 概率计算单元 603和估计单元 604功能相同, 不再重复说 明。然而应当注意,提取单元 801被配置为向变换单元 805输出所提取的 图像特征, 并且映射单元 802、 概率计算单元 803的图像特征来自于变换 单元 805„
[89] 变换单元 805通过用于降维的特征变换模型将图像特征变换,以用于 获得三维对象姿势信息。特征变换模型可以是前面参照图 4的实施例描述 的特征变换模型。
[90] 在上述实施例中,由于经过特征变换模型的变换的图像特征具有更低 的维数, 利于降低后续映射和计算的处理量。
[91] 图 9 示出了根据本发明一个优选实施例的用于估计图像中对象姿势 视角的方法 900的流程图。
[92] 如图 9所示, 方法 900从步骤 901开始。 在步骤 903, 与步骤 703相
- - 同, 从输入图像中提取图像特征。
[93] 在步骤 904, 通过用于降维的特征变换模型将图像特征变换, 以用于 获得三维对象姿势信息。特征变换模型可以是前面参照图 4的实施例描述 的特征变换模型。
[94] 在步骤 905,与步骤 705相同,针对多个视角类别中的每个视角类别, 基于与该视角类别对应的、用于将图像特征映射到三维对象姿势信息的映 射模型, 获得图像特征的相应三维对象姿势信息。
[95] 在步骤 907, 与步骤 707相同, 根据基于针对视角类别的单概率分布 模型的联合概率分布模型,计算每个视角类别的包含图像特征和相应三维 对象姿势信息的联合特征的联合概率,并且根据联合概率计算在相应三维 对象姿势信息的条件下图像特征的条件概率。
[96] 在步骤 908, 与步骤 708相同, 将针对所有可能视角类别计算的条件 概率中最大的条件概率所对应的视角类别估计为输入图像中的对象姿势 视角。 方法 900在步骤 909结束。
[97] 虽然前面针对图像说明了本发明的实施例,然而本发明的实施例也可 以应用于视频, 其中将视频作为图像的序列来处理。
[98] 图 10是示出其中实现本发明实施例的计算机的示例性结构的框图。
[99] 在图 10中, 中央处理单元 (CPU)lOOl根据只读映射数据 (ROM)1002 中存储的程序或从存储部分 1008加载到随机存取映射数据 (RAM)1003的 程序执行各种处理。在 RAM 1003中, 也根据需要存储当 CPU 1001执行 各种处理等等时所需的数据。
[100] CPU 1001、 ROM 1002和 RAM 1003经由总线 1004彼此连接。 输入 /输出接口 1005也连接到总线 1004。
[101] 下述部件连接到输入 /输出接口 1005: 输入部分 1006, 包括键盘、 鼠 标等等; 输出部分 1007, 包括显示器, 比如阴极射线管 (CRT)、 液晶显示 器 (LCD)等等, 和扬声器等等; 存储部分 1008, 包括硬盘等等; 和通信部 分 1009, 包括网络接口卡比如 LAN卡、调制解调器等等。通信部分 1009 经由网络比如因特网执行通信处理。
[102] 根据需要, 驱动器 1010也连接到输入 /输出接口 1005。 可拆卸介质 1011 比如磁盘、 光盘、 磁光盘、 半导体映射数据等等根据需要被安装在 驱动器 1010上, 使得从中读出的计算才 ^序根据需要被安装到存储部分
- -
1008中。
[103] 在通过软件实现上述步骤和处理的情况下,从网络比如因特网或存储 介质比如可拆卸介质 1011安装构成软件的程序。
[104] 本领域的技术人员应当理解, 这种存储介盾不局限于图 10所示的其 中存储有程序、 与方法相分离地分发以向用户提供程序的可拆卸介质 1011。 可拆卸介质 1011 的例子包含磁盘、 光盘 (包含光盘只读映射数据 (CD-ROM)和数字通用盘 (DVD))、 磁光盘(包含迷你盘 (MD)和半导体映 射数据。 或者, 存储介质可以是 ROM 1002、 存储部分 1008中包含的硬 盘等等, 其中存有程序, 并且与包含它们的方法一起被分发给用户。
[105] 在前面的说明书中参照特定实施例描述了本发明。然而本领域的普通 技术人员理解,在不偏离如权利要求书限定的本发明的范围的前提下可以 进行各种修改和改变。
Claims
1. A method for estimating an object pose viewing angle in an image, comprising:
extracting an image feature from an input image;
for each of a plurality of view-angle categories, obtaining corresponding three-dimensional object pose information of the image feature based on a mapping model, corresponding to that view-angle category, for mapping image features to three-dimensional object pose information;
calculating, according to a joint probability distribution model based on single probability distribution models for the view-angle categories, a joint probability of a joint feature of each view-angle category containing the image feature and the corresponding three-dimensional object pose information;
calculating, according to the joint probability, a conditional probability of the image feature under the condition of the corresponding three-dimensional object pose information; and
estimating the view-angle category corresponding to the largest of the conditional probabilities as the object pose viewing angle in the input image.
2. The method according to claim 1, further comprising:
transforming the image feature by a feature transformation model for dimensionality reduction, for use in obtaining the three-dimensional object pose information.
3. The method according to claim 1 or 2, wherein the image feature is a statistical feature of edge directions in the image.
4. The method according to claim 1 or 2, wherein the joint probability distribution model is based on a mixed Gaussian model, a hidden Markov model, or a conditional random field.
5. A device for estimating an object pose viewing angle in an image, comprising:
an extracting unit that extracts an image feature from an input image;
a mapping unit that, for each of a plurality of view-angle categories, obtains corresponding three-dimensional object pose information of the image feature based on a mapping model, corresponding to that view-angle category, for mapping image features to three-dimensional object pose information;
a probability calculation unit that, according to a joint probability distribution model based on single probability distribution models for the view-angle categories, calculates a joint probability of a joint feature of each view-angle category containing the image feature and the corresponding three-dimensional object pose information, and calculates, according to the joint probability, a conditional probability of the image feature under the condition of the corresponding three-dimensional object pose information; and
an estimation unit that estimates the view-angle category corresponding to the largest of the conditional probabilities as the object pose viewing angle in the input image.
6. The device according to claim 5, further comprising:
a transform unit that transforms the image feature by a feature transformation model for dimensionality reduction, for use in obtaining the three-dimensional object pose information.
7. The device according to claim 5 or 6, wherein the image feature is a statistical feature of edge directions in the image.
8. The device according to claim 5 or 6, wherein the joint probability distribution model is based on a mixed Gaussian model, a hidden Markov model, or a conditional random field.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/266,057 US20120045117A1 (en) | 2009-04-24 | 2010-04-23 | Method and device for training, method and device for estimating posture visual angle of object in image |
JP2012506329A JP5500245B2 (ja) | 2009-04-24 | 2010-04-23 | トレーニング方法及び装置並びに画像における対象の姿勢視角を推定する方法及び装置 |
EP10766658A EP2423878A1 (en) | 2009-04-24 | 2010-04-23 | Method and device for training, method and device for estimating posture visual angle of object in image |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200910137360A CN101872476A (zh) | 2009-04-24 | 2009-04-24 | 估计图像中对象姿势视角的方法、设备 |
CN200910137360.5 | 2009-04-24 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2010121568A1 true WO2010121568A1 (zh) | 2010-10-28 |
Family
ID=42997321
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2010/072150 WO2010121568A1 (zh) | 2009-04-24 | 2010-04-23 | 训练方法、设备和估计图像中对象姿势视角的方法、设备 |
Country Status (5)
Country | Link |
---|---|
US (1) | US20120045117A1 (zh) |
EP (1) | EP2423878A1 (zh) |
JP (1) | JP5500245B2 (zh) |
CN (1) | CN101872476A (zh) |
WO (1) | WO2010121568A1 (zh) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101326230B1 (ko) * | 2010-09-17 | 2013-11-20 | 한국과학기술원 | 사용자 동적 기관 제스처 인식 방법 및 인터페이스와, 이를 사용하는 전기 사용 장치 |
KR101298024B1 (ko) * | 2010-09-17 | 2013-08-26 | 엘지디스플레이 주식회사 | 사용자 동적 기관 제스처 인식 방법 및 인터페이스와, 이를 사용하는 전기 사용 장치 |
KR101904203B1 (ko) * | 2012-06-20 | 2018-10-05 | 삼성전자주식회사 | 시프트 알고리즘을 이용하여 대용량 소스 이미지의 특징점 정보를 추출하는 장치 및 방법 |
CN104050712B (zh) * | 2013-03-15 | 2018-06-05 | 索尼公司 | 三维模型的建立方法和装置 |
US10254758B2 (en) * | 2017-01-18 | 2019-04-09 | Ford Global Technologies, Llc | Object tracking by unsupervised learning |
US12141937B1 (en) * | 2019-05-24 | 2024-11-12 | Apple Inc. | Fitness system |
CN114169393A (zh) * | 2021-11-03 | 2022-03-11 | 华为技术有限公司 | 一种图像分类方法及其相关设备 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040091153A1 (en) * | 2002-11-08 | 2004-05-13 | Minolta Co., Ltd. | Method for detecting object formed of regions from image |
CN101048799A (zh) * | 2004-10-25 | 2007-10-03 | 惠普开发有限公司 | 通过实时视频动作分析理解视频内容 |
CN101093582A (zh) * | 2006-06-19 | 2007-12-26 | 索尼株式会社 | 动作捕捉装置和动作捕捉方法、以及动作捕捉程序 |
US20080031492A1 (en) * | 2006-07-10 | 2008-02-07 | Fondazione Bruno Kessler | Method and apparatus for tracking a number of objects or object parts in image sequences |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003141538A (ja) * | 2001-11-07 | 2003-05-16 | Communication Research Laboratory | テンプレート・マッチング方法 |
JP2003150963A (ja) * | 2001-11-13 | 2003-05-23 | Japan Science & Technology Corp | 顔画像認識方法及び顔画像認識装置 |
JP4070618B2 (ja) * | 2003-01-15 | 2008-04-02 | 日本電信電話株式会社 | 物体追跡方法、物体追跡装置、物体追跡方法のプログラム並びにそのプログラムを記録した記録媒体 |
JP4600128B2 (ja) * | 2005-04-12 | 2010-12-15 | 株式会社デンソー | 演算回路及び画像認識装置 |
CN101271515B (zh) * | 2007-03-21 | 2014-03-19 | 株式会社理光 | 能识别多角度目标的图像检测装置 |
JP4850768B2 (ja) * | 2007-03-27 | 2012-01-11 | 独立行政法人情報通信研究機構 | 3次元の人の顔の表面データを再構築するための装置及びプログラム |
CN100485713C (zh) * | 2007-03-29 | 2009-05-06 | 浙江大学 | 基于集成隐马尔可夫模型学习方法的人体运动数据的识别方法 |
US7844105B2 (en) * | 2007-04-23 | 2010-11-30 | Mitsubishi Electric Research Laboratories, Inc. | Method and system for determining objects poses from range images |
JP5617166B2 (ja) * | 2009-02-09 | 2014-11-05 | 日本電気株式会社 | 回転推定装置、回転推定方法およびプログラム |
-
2009
- 2009-04-24 CN CN200910137360A patent/CN101872476A/zh active Pending
-
2010
- 2010-04-23 US US13/266,057 patent/US20120045117A1/en not_active Abandoned
- 2010-04-23 WO PCT/CN2010/072150 patent/WO2010121568A1/zh active Application Filing
- 2010-04-23 JP JP2012506329A patent/JP5500245B2/ja not_active Expired - Fee Related
- 2010-04-23 EP EP10766658A patent/EP2423878A1/en not_active Withdrawn
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040091153A1 (en) * | 2002-11-08 | 2004-05-13 | Minolta Co., Ltd. | Method for detecting object formed of regions from image |
CN101048799A (zh) * | 2004-10-25 | 2007-10-03 | 惠普开发有限公司 | 通过实时视频动作分析理解视频内容 |
CN101093582A (zh) * | 2006-06-19 | 2007-12-26 | 索尼株式会社 | 动作捕捉装置和动作捕捉方法、以及动作捕捉程序 |
US20080031492A1 (en) * | 2006-07-10 | 2008-02-07 | Fondazione Bruno Kessler | Method and apparatus for tracking a number of objects or object parts in image sequences |
Also Published As
Publication number | Publication date |
---|---|
EP2423878A1 (en) | 2012-02-29 |
CN101872476A (zh) | 2010-10-27 |
JP5500245B2 (ja) | 2014-05-21 |
JP2012524920A (ja) | 2012-10-18 |
US20120045117A1 (en) | 2012-02-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3576017B1 (en) | Method and system for determining pose of object in image, and storage medium | |
WO2010121568A1 (zh) | 训练方法、设备和估计图像中对象姿势视角的方法、设备 | |
CN107730515B (zh) | 基于区域增长和眼动模型的全景图像显著性检测方法 | |
CN104867111B (zh) | 一种基于分块模糊核集的非均匀视频盲去模糊方法 | |
WO2018137623A1 (zh) | 图像处理方法、装置以及电子设备 | |
CN112329663B (zh) | 一种基于人脸图像序列的微表情时刻检测方法及装置 | |
WO2023142602A1 (zh) | 图像处理方法、装置和计算机可读存储介质 | |
CN108510496B (zh) | 基于图像dct域的svd分解的模糊检测方法 | |
CN110570435A (zh) | 用于对车辆损伤图像进行损伤分割的方法及装置 | |
WO2022126674A1 (zh) | 立体全景图像的质量评价方法、系统 | |
CN101739690A (zh) | 多相机协同运动目标检测方法 | |
CN113436251A (zh) | 一种基于改进的yolo6d算法的位姿估计系统及方法 | |
CN113409287B (zh) | 人脸图像质量的评估方法、装置、设备及存储介质 | |
CN115131218A (zh) | 图像处理方法、装置、计算机可读介质及电子设备 | |
US20230394833A1 (en) | Method, system and computer readable media for object detection coverage estimation | |
CN105931189B (zh) | 一种基于改进超分辨率参数化模型的视频超分辨率方法及装置 | |
US8351650B2 (en) | Foreground action estimating apparatus and foreground action estimating method | |
CN114663880A (zh) | 基于多层级跨模态自注意力机制的三维目标检测方法 | |
CN108805139A (zh) | 一种基于频域视觉显著性分析的图像相似性计算方法 | |
JP5848665B2 (ja) | 移動物体上動きベクトル検出装置、移動物体上動きベクトル検出方法、およびプログラム | |
CN113763313A (zh) | 文本图像的质量检测方法、装置、介质及电子设备 | |
JP2018097795A (ja) | 法線推定装置、法線推定方法、及び法線推定プログラム | |
Jiang et al. | Discriminative metric preservation for tracking low-resolution targets | |
Hajder et al. | Weak-perspective structure from motion for strongly contaminated data | |
CN105005965B (zh) | 基于最大期望算法的自然图像超分辨方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 10766658 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 13266057 Country of ref document: US Ref document number: 2012506329 Country of ref document: JP Ref document number: 2010766658 Country of ref document: EP |