JP5503510B2 - Posture estimation apparatus and posture estimation program - Google Patents

Posture estimation apparatus and posture estimation program

Info

Publication number
JP5503510B2
JP5503510B2 (Application JP2010260468A)
Authority
JP
Japan
Prior art keywords
image
cg
frame
means
joint angle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP2010260468A
Other languages
Japanese (ja)
Other versions
JP2012113438A (en)
Inventor
誠喜 井上
周平 秦
Original Assignee
日本放送協会
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本放送協会
Priority to JP2010260468A
Publication of JP2012113438A
Application granted
Publication of JP5503510B2
Application status: Expired - Fee Related
Anticipated expiration


Description

  The present invention relates to a posture estimation apparatus and a posture estimation program for estimating the posture or motion of a target object by image processing from a captured image, that is, a single-viewpoint still image or moving image in which the object to be estimated appears.

  Conventionally, various motion capture methods using monocular images (single-viewpoint still images or moving images) captured by a single camera have been proposed. When the object to be estimated is a person, being able to estimate the person's posture from a single-viewpoint person image is useful for analyzing human movements and for producing character animation by computer graphics (CG).

In order to extract a person region from a photographed image and estimate the posture from the shape and pattern, the following methods and the like have been proposed.
(1) A three-dimensional CG model having a human skeleton structure is prepared, and posture estimation is performed by matching a CG image generated by variously moving the skeleton with a captured image. At this time, for example, a person region is extracted from the photographed image, and the image feature is compared with the image feature of the CG generation video (for example, see Non-Patent Document 1).
(2) A person region is extracted from the photographed image, the positions of the person's limbs, elbows, and knees are estimated from the shape (silhouette), and the internal skeleton is estimated (see, for example, Patent Document 1).
(3) A person region is extracted from the photographed image, and the shape (silhouette) is compared with the silhouette of the CG generation image. In this case, the comparison is performed by XOR (exclusive OR) of the two images.

JP 2004-164480 A

"3D human body posture estimation from monocular images based on HOG features", Image Recognition and Understanding Symposium (MIRU2008), July 2008

  However, if an image feature is used as in the method (1) described above, the estimation result is greatly influenced by the pattern of clothes worn by a person. If the clothes are different, the degree of matching between the captured image and the CG image changes, and accurate posture estimation cannot be performed.

  In addition, method (2) described above uses a silhouette, so the influence of clothing is small. However, it is difficult to recognize parts such as a person's limbs, elbows, and knees, and whether each part can be identified accurately depends strongly on the accuracy of the silhouette shape. That is, an error in extracting the silhouette shape, in other words an error at the region extraction stage, is highly likely to cause a part to be detected incorrectly.

  Method (3) described above is comparatively robust, but if matching between silhouettes is performed simply by XOR, the result is affected by differences in position on the screen and in the thickness of the limbs. In other words, when silhouettes are compared directly, matching fails when, for example, the orientation of the silhouette changes slightly, or when both feet or the arms overlap during walking. Moreover, when a model is created with CG, even if, say, a foot model considered to be standard is prepared, the build and thickness of the person appearing in the captured image differ from individual to individual. Therefore, even if the shape is the same, the desired matching result cannot be obtained if the thickness differs. In short, the current state of the art cannot yet accurately reproduce the characteristics of various motions.

  The present invention has been made in view of the above problems, and an object of the present invention is to provide a posture estimation apparatus and a posture estimation program that can improve the accuracy of matching against the corresponding CG image when estimating a posture or motion from a captured image of the object to be estimated.

  In order to solve the above problem, the posture estimation apparatus according to claim 1 is a posture estimation device that estimates, by image processing, parameters characterizing the posture or motion of an object from a captured image, that is, a single-viewpoint still image or moving image in which the object to be estimated appears, and comprises image input means, specific area extraction means, thinning means, expansion processing means, distance conversion means, gradient feature value extraction means, and collation means.

  According to such a configuration, the posture estimation apparatus inputs the captured image by the image input means, and also inputs a CG image in which the object in the captured image is drawn in a pseudo manner, generated based on a CG character model obtained by modeling the object in the captured image as an articulated object for computer graphics (CG) and on the joint angle parameters used in the CG character model. Here, if the object to be estimated is, for example, a person, the CG character model includes a human body model. The posture estimation apparatus then uses the specific area extraction means to extract a silhouette obtained by binarizing the specific area of the object from the input captured image, and likewise extracts a binarized silhouette of the specific area of the object from the input CG image. The posture estimation apparatus then performs thinning processing on each of the extracted silhouettes by the thinning means, performs expansion processing on each of the thinned silhouettes by the expansion processing means, and generates a grayscale image by applying distance conversion to each of the expanded silhouettes by the distance conversion means.

Here, the thinning processing, the expansion processing, and the distance conversion can be realized using functions provided as a library in general image processing software.
The thinning processing converts the silhouette of a binary image into a line image one pixel wide, and the expansion processing widens that thin line to a uniform thickness. Therefore, if expansion processing is applied to a silhouette that has been extracted from the captured image and then thinned, the silhouette originally extracted from the captured image is not restored; instead, a silhouette whose thin line has been widened to a uniform thickness is obtained. Thus, for example, the silhouette of a person's foot extracted from the captured image can be matched with high accuracy against the silhouette of the foot portion of the CG model, which is created in advance with a uniform width, without being affected by the thickness of the object in the image or by individual differences between feet.

  The distance conversion gives, for each pixel of a binary image having values 0 and 1, the shortest distance to a pixel having value 0. Each pixel inside the silhouette of the binary image is therefore assigned the shortest distance to a pixel on the contour edge of the silhouette. After the distance conversion, a grayscale image is obtained whose shading is graded appropriately according to the original shape of the silhouette of the binary image. When a grayscale image is generated by adding shading to a silhouette in this way, a brightness gradient appears that expresses the directionality of the silhouette. Consequently, the problems that arise in the conventional technique of directly comparing silhouettes obtained by region extraction, namely mismatches caused by slight changes in silhouette orientation or by overlapping silhouettes, are resolved; the directionality of the silhouette can be captured, and the desired matching result can be obtained.

  Then, the posture estimation apparatus calculates a HOG (Histogram of Oriented Gradients) as the feature value of each grayscale image by the gradient feature value extraction means. Here, the HOG is a feature amount obtained by extracting, for each pixel of interest in an image, the brightness differences between horizontally and vertically adjacent pixels as a luminance gradient. The posture estimation apparatus then collates, by the collation means, the HOG calculated based on the silhouette of the object in the captured image with the HOG calculated based on the silhouette of the object in the CG image, and thereby estimates the joint angle parameters of the object in the captured image. As a result of the collation, the smaller the HOG difference, the greater the similarity of the CG image to the captured image. The posture estimation apparatus can then obtain the joint angle parameters used when generating that CG image as the posture estimation result.

If silhouettes or grayscale images were compared directly without calculating the HOG, matching would fail whenever the position of the object is shifted, even if the object in the captured image and the object in the CG image have the same shape. To solve this problem, the posture estimation apparatus compares HOGs computed from the silhouettes of the images by the collation means. Since the feature amount is obtained from the brightness gradient of the object, collation can be performed with high accuracy even if the position of the object in the grayscale image obtained from the silhouette of the captured image deviates from the position of the object in the grayscale image obtained from the silhouette of the CG image, without being affected by differences in the position of the object on the screen.
Furthermore, even if a HOG is applied as in Non-Patent Document 1, matching fails when the silhouette of the image is not extracted first, for example when the object to be estimated is a person wearing different clothes. To address this problem, the posture estimation apparatus extracts the silhouette of the image, generates a grayscale image, and then calculates the HOG from that grayscale image, so the captured image and the CG image can be collated with high accuracy without being affected by clothing patterns and the like, while retaining the robustness of silhouette matching.

  The posture estimation apparatus according to claim 2 preferably further comprises, in the posture estimation apparatus according to claim 1, model sequence storage means and CG image generation means for generating the CG image to be input to the image input means.

  According to this configuration, the posture estimation apparatus stores in the model sequence storage means, as a model sequence, the values of the joint angle parameters created in advance for each frame as a model of the target object performing a series of predetermined actions. Here, if the object is, for example, a person, the model sequence includes models corresponding to individual movements such as walking and running. In the posture estimation apparatus, the CG image generation means generates a CG frame image as the CG image for each frame, based on the CG character model and the joint angle parameter values read out from the model sequence storage means frame by frame in correspondence with the captured frame image, which is the captured image input to the image input means for each frame. In the posture estimation apparatus, the specific area extraction means extracts a silhouette obtained by binarizing the specific area of the object from the captured frame image and a silhouette obtained by binarizing the specific area of the object from the CG frame image, and the thinning means, the expansion processing means, the distance conversion means, and the gradient feature amount extraction means perform image processing on the captured frame image and the CG frame image frame by frame.

  The posture estimation apparatus according to claim 3 is, in the posture estimation apparatus according to claim 2, preferably further provided with parameter changing means, and the collation means preferably comprises difference calculation means and spatial feature determination means.

  According to this configuration, the posture estimation apparatus changes, by the parameter changing means and within a predetermined range, the values of the joint angle parameters read out frame by frame from the model sequence storage means for the captured frame image. In the posture estimation apparatus, the CG image generation means then generates the CG frame image based on the CG character model and either the joint angle parameters read out for each frame or the changed joint angle parameter values. Furthermore, in the posture estimation apparatus, the collation means calculates, by the difference calculation means, difference data between each HOG based on the silhouette of the CG image generated using the joint angle parameters read from the model sequence storage means or the joint angle parameters changed by the parameter changing means, and the HOG based on the silhouette of the captured frame image. Then, in the posture estimation apparatus, the collation means determines, by the spatial feature determination means and based on the HOG difference data calculated for the captured frame image while the frame number of the model sequence is fixed, the joint angle parameter values for which the difference data is minimum, and outputs the frame number and the joint angle parameter values as the estimation result.

  In the posture estimation apparatus according to claim 4, in the posture estimation apparatus according to claim 2, the collation means preferably comprises difference calculation means and temporal feature extraction means.

  According to such a configuration, in the posture estimation apparatus, the collation means calculates, by the difference calculation means, difference data between each HOG based on the silhouette of the CG image generated using the joint angle parameter values read from the model sequence storage means for the captured frame image, and the HOG based on the silhouette of the captured frame image. Then, in the posture estimation apparatus, the collation means extracts, by the temporal feature extraction means and based on the HOG difference data for the CG images generated using the joint angle parameter values read from the model sequence storage means while the frame number of the model sequence is varied for the captured frame image, the frame number of the model sequence for which the difference data is minimum, and outputs that frame number and the joint angle parameter values.

  The posture estimation apparatus according to claim 5 is, in the posture estimation apparatus according to claim 2, preferably further provided with parameter changing means, and the collation means preferably comprises difference calculation means, temporal feature extraction means, and spatial feature determination means.

  According to this configuration, the posture estimation apparatus changes, by the parameter changing means and within a predetermined range, the values of the joint angle parameters read out frame by frame from the model sequence storage means for the captured frame image. In the posture estimation apparatus, the CG image generation means then generates the CG frame image based on the CG character model and either the joint angle parameters read out for each frame or the changed joint angle parameter values. Furthermore, in the posture estimation apparatus, the collation means calculates, by the difference calculation means, difference data between each HOG based on the silhouette of the CG image generated using the joint angle parameters read from the model sequence storage means or the joint angle parameters changed by the parameter changing means, and the HOG based on the silhouette of the captured frame image. In the posture estimation apparatus, as a first stage, the collation means extracts, by the temporal feature extraction means and based on the HOG difference data for the CG images generated using the joint angle parameter values read from the model sequence storage means while the frame number of the model sequence is varied for the captured frame image, the frame number of the model sequence for which the difference data is minimum. Thereby, the timing of each frame of the model sequence created in advance can be matched to each frame of the captured image. In the posture estimation apparatus, as a second stage, the collation means fixes the extracted frame number and, based on the HOG difference data calculated for the captured frame image while the parameter changing means changes the joint angle parameter values, specifies by the spatial feature determination means the joint angle parameter values for which the difference data is minimum, and outputs the frame number and the joint angle parameter values as the estimation result.

  The posture estimation program according to claim 6 is a program for causing a computer to function as image input means, specific area extraction means, thinning means, expansion processing means, distance conversion means, gradient feature value extraction means, and collation means, in order to estimate, by image processing, parameters characterizing the posture or motion of an object from a captured image, that is, a single-viewpoint still image or moving image in which the object to be estimated appears.

  According to this configuration, the posture estimation program inputs the captured image by the image input means, and also inputs a CG image in which the object in the captured image is drawn in a pseudo manner, generated based on a CG character model obtained by modeling the object in the captured image as an articulated object for computer graphics (CG) and on the joint angle parameters used in the CG character model. The posture estimation program then uses the specific area extraction means to extract a silhouette obtained by binarizing the specific area of the object from the input captured image, and likewise extracts a binarized silhouette of the specific area of the object from the input CG image. The posture estimation program then performs thinning processing on each of the extracted silhouettes by the thinning means, performs expansion processing on each of the thinned silhouettes by the expansion processing means, generates a grayscale image by applying distance conversion to each of the expanded silhouettes by the distance conversion means, and calculates a HOG as the feature amount of each grayscale image by the gradient feature amount extraction means. Then, the posture estimation program collates, by the collation means, the HOG calculated based on the silhouette of the object in the captured image with the HOG calculated based on the silhouette of the object in the CG image, and thereby estimates the joint angle parameters of the object in the captured image.

According to the present invention, the following excellent effects can be achieved.
According to the invention described in claim 1, the posture estimation apparatus can perform highly accurate collation using the distance conversion and the HOG feature, without being affected by the clothing pattern of the object in the captured image or by the object's position on the screen, while retaining the robustness of silhouette matching. In addition, because the thinning and expansion processing reduce the object in the captured image to a constant thickness, similar to that of the object in the CG image, the posture estimation apparatus can perform highly accurate collation without being affected by the thickness of the object.
According to the invention described in claim 6, the posture estimation program can achieve the same effects as the posture estimation apparatus described in claim 1.

  According to the invention described in claim 2, since the posture estimation apparatus stores a model sequence corresponding to a series of predetermined actions, posture estimation by matching against CG images that already take approximate postures similar to the posture of the object in the captured image can be performed quickly.

  According to the invention described in claim 3, since the posture estimation apparatus can change the values of the joint angle parameters of the model sequence created in advance, the CG frame image can be finely adjusted to match the captured frame image.

  According to the invention described in claim 4, the posture estimation apparatus can synchronize the timing of each frame of the model sequence created in advance with each frame of the captured image. Therefore, for example, even when the motion of a CG character is played in slow motion or at high speed, natural motion comparable to live action can be produced.

  According to the invention described in claim 5, the posture estimation apparatus can finely adjust the CG frame image, whose motion timing has been matched to the captured frame image, so that it fits the captured frame image. Therefore, the captured moving image can be compared with the CG moving image with high accuracy in both time and space, and motion data with high temporal and spatial accuracy can be obtained robustly from the captured moving image.

FIG. 1 is a block diagram showing the configuration of a posture estimation apparatus according to a first embodiment of the present invention.
FIG. 2 is an explanatory diagram of the image processing of the posture estimation apparatus shown in FIG. 1, in which (a) is a captured image, (b) is an image in which the person region has been extracted from the captured image, (c) is an image in which the lower-body region has been extracted from the person region, (d) is an image obtained by thinning the lower-body image, (e) is the image after expansion processing, (f) is the image after distance conversion, (g) is a CG image generated corresponding to the captured image, and (h) is the image after distance conversion generated from the CG image by the same processing as for the captured image.
FIG. 3 is a flowchart showing the operation of the posture estimation apparatus shown in FIG. 1.
FIG. 4 is a flowchart showing an outline of the HOG calculation process shown in FIG. 3.
FIG. 5 is an explanatory diagram of S21 shown in FIG. 4, showing the original image.
FIG. 6 is an explanatory diagram of S22, in which (a) shows the cell regions obtained from FIG. 5 and (b) shows a gradient histogram obtained from (a).
FIG. 7 is an explanatory diagram of S23 shown in FIG. 4, showing how a block is moved.
FIG. 8 is a block diagram showing the configuration of a posture estimation apparatus according to a second embodiment of the present invention.
FIG. 9 is an explanatory diagram of the model sequence storage means shown in FIG. 8.
FIG. 10 is an explanatory diagram of the temporal feature extraction means shown in FIG. 8.

  DESCRIPTION OF EMBODIMENTS Hereinafter, a mode (hereinafter referred to as “embodiment”) for implementing a posture estimation apparatus according to the present invention will be described in detail with reference to the drawings.

(First embodiment)
The posture estimation apparatus 1 shown in FIG. 1 estimates parameters characterizing the posture or motion of a target object by image processing from a captured image, that is, a single-viewpoint still image or moving image in which the object to be estimated appears.
In the following description, it is assumed that a captured image obtained by taking, with a single camera, a moving image of a person performing a predetermined action such as "walking" or "kicking" as the target object is input to the posture estimation apparatus 1, and that the motion of the person as the object in each captured frame image, which is the captured image frame by frame, is estimated. Here, a frame is one frame image, and the sampling frequency in the time direction is not particularly limited; for example, a non-interlaced method (for example, 29.97 fps (frames per second)) may be used, or an interlaced method (for example, 59.94 fps) in which one frame image is read as two fields may be used.

As shown in FIG. 1, the posture estimation apparatus 1 includes a CG generation unit 2, a frame data processing unit 3, a collation unit 4, and an image input unit 5.
The image input means 5 inputs a captured image and also inputs a CG image generated by pseudo-drawing the object in the captured image. The captured image and the CG image are subjected to image processing by the frame data processing means 3. The frame number of the captured image input to the image input means 5 (the captured image frame number) is input to the CG generation unit 2 and is used by the CG image generation means 24 of the CG generation unit 2 as information for generating a CG image that corresponds to the captured image. Note that the image input means 5 may input an externally captured image, supplied on a storage medium or online, to the frame data processing means 3, or may read an image stored in advance in a storage device inside the posture estimation apparatus 1 and input it to the frame data processing means 3.

  The CG generation unit 2 generates a CG image in which the object in the captured image input to the image input means 5 is drawn, and comprises a CG character model 21, parameter changing means 23, CG image generation means 24, and model sequence storage means 25 that stores CG data 22. The storage means for storing the CG character model 21 may be different from the storage means for storing the CG data 22, or the model sequence storage means 25 may be shared for both.

The CG character model 21 is obtained by modeling the object to be estimated as an articulated object for computer graphics (CG). In this embodiment, since the object to be estimated is a person, the CG character model 21 includes a human body structure model having the angle information of the joints of the human body as parameters, together with CG parts of the human body created in advance.
Here, the human body structure model is not particularly limited, and a joint or the like may be appropriately set according to the motion to be estimated and the required accuracy. For example, when estimating movements such as “walking” and “kicking”, the finger joints are ignored and the joints are classified as shoulder joints, elbow joints, hip joints, knee joints, ankle joints, and the like. A model that can bend each joint within a predetermined angle range with a degree of freedom of 1 to 3 axes according to the part can be used. Here, for example, if a motion such as “walking” is estimated, a 24-dimensional joint angle parameter can be used as described in Non-Patent Document 1.
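For illustration only, such a human body structure model could be represented as a table of joints, rotation axes, and allowed angle ranges; the joint set, axes, and ranges below are hypothetical examples, not values taken from this patent.

```python
# Hypothetical sketch of a human body structure model: each joint has 1 to 3
# rotational degrees of freedom and an allowed angle range per axis (degrees).
HUMAN_BODY_MODEL = {
    "left_hip":    {"axes": ("x", "y", "z"), "range_deg": (-90, 90)},
    "left_knee":   {"axes": ("x",),          "range_deg": (0, 150)},
    "left_ankle":  {"axes": ("x", "y"),      "range_deg": (-45, 45)},
    "right_hip":   {"axes": ("x", "y", "z"), "range_deg": (-90, 90)},
    "right_knee":  {"axes": ("x",),          "range_deg": (0, 150)},
    "right_ankle": {"axes": ("x", "y"),      "range_deg": (-45, 45)},
    # shoulder and elbow joints would be added for whole-body estimation
}

def count_joint_angle_parameters(model):
    """Total number of joint angle parameters (one per joint axis)."""
    return sum(len(joint["axes"]) for joint in model.values())
```

Extending such a table to the shoulders and elbows yields a parameter vector on the order of the 24 dimensions mentioned above.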

  The CG data 22 are joint angle parameters relating to the object of the CG image that is generated in a pseudo manner for collation with the object in the captured image, and are used to draw the CG image based on the CG character model 21. In FIG. 1, the CG data 22 represent one set of joint angle parameters for generating one CG image corresponding to one captured image input to the frame data processing means 3. Here, one set of joint angle parameters indicates, for example, 24 angle values that specify predetermined joints and their axis directions when the 24-dimensional joint angle parameter is adopted in the human body structure model.

  The parameter changing means 23 changes the values of the CG data (joint angle parameters) 22 read out frame by frame from the model sequence storage means 25 for the captured frame image, within a predetermined range. The parameter changing means 23 finely adjusts the values of one set of joint angle parameters stored in advance in the model sequence storage means 25 corresponding to one posture of the CG image. Here, when focusing on one joint in a person's movement, a change of the angle over a range larger than, for example, ±45° can be regarded as a relatively large adjustment, whereas a change within a smaller range, for example within ±30°, is referred to here as fine adjustment. The parameter changing means 23 finely adjusts the joint angle of the CG data (joint angle parameters) 22 in steps of, for example, 1°. Following this change, the CG image generation means 24 creates a CG frame image based on the changed joint angle value and the CG character model 21, and after the pre-matching image processing and the matching itself, the parameter changing means 23 repeats the process by finely adjusting the joint angle again.
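As a minimal sketch of the fine adjustment (the ±30° range and the 1° step are the example values above; the function name is an assumption):

```python
def candidate_joint_angles(base_angle_deg, adjust_range_deg=30.0, step_deg=1.0):
    """Yield candidate joint angles around the value stored in the model
    sequence, e.g. +/-30 degrees in 1 degree steps."""
    steps = int(adjust_range_deg / step_deg)
    for k in range(-steps, steps + 1):
        yield base_angle_deg + k * step_deg
```

In practice such a sweep would be nested over each of the joints being adjusted.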

  The CG image generation means 24 generates a CG frame image as the CG image for each frame, based on the CG character model 21 and either the CG data (joint angle parameters) 22 read out frame by frame from the model sequence storage means 25 in correspondence with the captured frame image, which is the captured image input to the image input means 5 for each frame, or the joint angle parameter values changed by the parameter changing means 23. The CG image generation means 24 receives the captured image frame number that specifies the captured image input to the image input means 5, and uses it as information for generating a CG frame image corresponding to that captured frame image. The CG image generation means 24 constructs virtual three-dimensional space data based on the CG data, renders the CG object and an alpha plane based on the input joint angles, and outputs the rendered CG object together with the alpha plane to the image input means 5. The alpha plane is an image holding information for distinguishing the object area (subject area) in the CG frame image from the other areas.

  The model sequence storage means 25 stores, as a model sequence, the values of the CG data (joint angle parameters) 22 created in advance for each frame as a model of the object to be estimated performing a series of predetermined actions, and is composed of, for example, a general hard disk or memory. Specifically, the model sequence storage means 25 stores models corresponding to basic movements, such as a model sequence in which frame numbers are associated with sets of joint angle parameters for a person performing a "walking" motion, or a model sequence for a person performing a "kick" motion.
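For example, a model sequence could be laid out as a mapping from frame number to one set of joint angle parameters; the joint names and values below are illustrative placeholders, not data from the patent.

```python
# Hypothetical model sequence for a "kick" motion: frame number -> one set of
# joint angle parameters (degrees).  Values are illustrative placeholders.
KICK_MODEL_SEQUENCE = {
    0: {"left_hip": 5.0, "left_knee": 10.0, "left_ankle": 0.0,
        "right_hip": -10.0, "right_knee": 20.0, "right_ankle": 5.0},
    1: {"left_hip": 8.0, "left_knee": 12.0, "left_ankle": 1.0,
        "right_hip": -25.0, "right_knee": 45.0, "right_ankle": 10.0},
    # ... one entry per frame of the modelled motion
}
```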

  Note that the drawn CG frame images themselves may also be stored in the model sequence storage means 25. In the present embodiment, the posture estimation apparatus 1 is provided with the CG image generation means 24 in order to generate the CG images to be input to the specific area extraction means 31 of the frame data processing means 3; however, if the CG images are stored in the posture estimation apparatus 1 in advance, the CG image generation means 24 is not essential.

  The frame data processing means 3 performs image processing on the captured image and the CG image in units of frames, and comprises specific area extraction means 31, thinning means 32, expansion processing means 33, distance conversion means 34, and gradient feature quantity extraction means 35. In the block diagram of FIG. 1, for convenience of explanation, the suffix "a" is attached to each of the means 31 to 35 that process the captured image and the suffix "b" to each of the means 31 to 35 that process the CG image, but only one of each means is actually required.

  As pre-processing for collating the captured image with the CG image, the specific area extraction means 31 extracts a silhouette obtained by binarizing a specific area of the object (person) from the input captured image, and likewise extracts a silhouette obtained by binarizing the specific area of the object (person) from the input CG image. If the target is a person, as in the present embodiment, the specific area of the object on the image may be a part of the person or the whole body. To specify the lower-body area as a part of the person area, a threshold range of positions on the image, for example "the lower half of the input captured image," may be determined in advance. A known technique, such as predetermining the object position on the image, its size, or a brightness threshold, can be employed to binarize the image and extract the silhouette. A specific example of the image processing is illustrated in the description of operation given later.
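As a rough sketch of this pre-processing, assuming OpenCV and a simple fixed brightness threshold combined with the "lower half of the image" rule (both are illustrative simplifications, not the patent's specific settings):

```python
import cv2
import numpy as np

def extract_lower_body_silhouette(image_bgr, brightness_threshold=128):
    """Binarize the image and keep only the lower half as the specific area.
    Any known person-region extraction method could be substituted for the
    fixed threshold used here."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    _, silhouette = cv2.threshold(gray, brightness_threshold, 255,
                                  cv2.THRESH_BINARY)
    half = silhouette.shape[0] // 2
    lower_body = np.zeros_like(silhouette)
    lower_body[half:, :] = silhouette[half:, :]   # keep the lower half only
    return lower_body
```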

The thinning means 32 performs thinning processing on each silhouette extracted by the specific area extraction means 31.
The expansion processing means 33 performs expansion processing on each silhouette thinned by the thinning means 32.
The distance conversion means 34 generates a grayscale image by performing distance conversion on each silhouette expanded by the expansion processing means 33.

Here, the thinning processing, the expansion processing, and the distance conversion are pre-processing steps for collating the captured image with the CG image, and can be realized using functions provided as a library in general image processing software.
In the thinning process, the silhouette of the binary image is converted into a line image having a width of 1 pixel.
In the expansion process, the fine line is widened to a uniform thickness.
The distance conversion is a conversion that gives the shortest distance from each pixel of a binary image having values 0 and 1 to a pixel having a value 0.
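For illustration, the three operations could be chained as follows, assuming scikit-image for the thinning (skeletonization) and OpenCV for the expansion and the distance transform; the library choices and the kernel size are assumptions, not the patent's implementation.

```python
import cv2
import numpy as np
from skimage.morphology import skeletonize

def silhouette_to_grayscale(silhouette, dilate_px=5):
    """Thin the binary silhouette to a 1-pixel-wide line, widen it back to a
    uniform thickness, then distance-transform it into a grayscale image."""
    thin = skeletonize(silhouette > 0).astype(np.uint8)          # 1 px wide
    kernel = np.ones((dilate_px, dilate_px), np.uint8)
    widened = cv2.dilate(thin, kernel) * 255                     # uniform width
    # For each nonzero pixel: shortest distance to the nearest zero pixel.
    dist = cv2.distanceTransform(widened, cv2.DIST_L2, 3)
    # Scale to 8-bit so the shading follows the shape of the silhouette.
    return cv2.normalize(dist, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```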

  The gradient feature quantity extraction means 35 calculates a HOG (Histogram of Oriented Gradients) as the feature quantity of each grayscale image, which is the image resulting from the preprocessing for collating the captured image with the CG image. The HOG is a feature amount obtained by extracting, for each pixel of interest in an image, the brightness differences between horizontally and vertically adjacent pixels as a luminance gradient. The HOG calculated here is output to the collation means 4 and used for collation between the captured image and the CG image. As a reference for HOG, "N. Dalal and B. Triggs, 'Histograms of Oriented Gradients for Human Detection,' IEEE Computer Vision and Pattern Recognition, pp. 886-893, 2005" is known.

  In the present embodiment, since the specific area extraction means 31 extracts silhouettes by binarizing the captured frame image and the CG frame image, the thinning means 32, the expansion processing means 33, the distance conversion means 34, and the gradient feature amount extraction means 35 described above also perform image processing on the captured frame image and the CG frame image frame by frame.

  The collation means 4 collates the HOG calculated based on the silhouette of the object in the captured image with the HOG calculated based on the silhouette of the object in the CG image, and thereby estimates the joint angle parameters of the object in the captured image. As shown in FIG. 1, the collation means 4 comprises difference calculation means 41, difference data storage means 42, and spatial feature determination means 43.

  The difference calculation means 41 calculates difference data between each HOG based on the silhouette of the CG image, generated using the CG data (joint angle parameters) 22 read from the model sequence storage means 25 or the joint angle parameter values changed by the parameter changing means 23 for the captured frame image, and the HOG based on the silhouette of the captured frame image. The calculated difference data are stored in the difference data storage means 42.

The difference data storage means 42 stores frame numbers 51, parameters 52, and difference data 53 in association with each other, and is a storage device such as a hard disk.
The frame number 51 is the frame number of the CG data (joint angle parameter) 22 read from the model sequence storage unit 25.
The parameter 52 is the value of the joint angle parameter corresponding to the frame number 51 or the value of the joint angle parameter changed by the parameter changing unit 23 in the frame number 51.
The difference data 53 corresponds to the captured frame image and is HOG difference data calculated based on the silhouette of the CG frame image generated from the parameter 52.

When the frame number of the model sequence is fixed, the spatial feature determination means 43 determines, based on the HOG difference data calculated for the captured frame image, the value of the joint angle parameter for which the difference data is minimum.
The collation means 4 outputs that frame number and the value of the joint angle parameter as the estimation result.
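The bookkeeping of the difference data storage means 42 and the decision made by the spatial feature determination means 43 can be pictured roughly as follows; the function names generate_cg_hog and hog_difference are assumptions standing in for the CG rendering and the HOG comparison described elsewhere in this document.

```python
def collate_frame(captured_hog, frame_number, candidate_param_sets,
                  generate_cg_hog, hog_difference):
    """For one captured frame and a fixed model-sequence frame number, try every
    candidate joint-angle parameter set, record (frame number, parameters,
    difference), and return the record with the minimum difference."""
    records = []                                           # difference data 51-53
    for params in candidate_param_sets:
        cg_hog = generate_cg_hog(frame_number, params)     # render CG, compute HOG
        records.append((frame_number, params,
                        hog_difference(captured_hog, cg_hog)))
    return min(records, key=lambda record: record[2])      # spatial determination
```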

[Operation of posture estimation device]
Next, the operation of the posture estimation apparatus 1 will be described with reference to FIGS. 2 and 3 (refer to FIG. 1 as appropriate). FIG. 2 shows a processing example of the specific area extraction unit 31, the thinning unit 32, the expansion processing unit 33, and the distance conversion unit 34 in the frame data processing unit 3 of the posture estimation apparatus 1. Further, in this example, the description will be made assuming that the motion of the lower body is estimated from a captured image in which a person kicks a ball.

FIG. 3 is a flowchart showing the operation of the posture estimation apparatus shown in FIG.
First, in the posture estimation apparatus 1, a captured image is input to the frame data processing means 3 by the image input means 5 (step S1). In the frame data processing means 3, the captured image shown in FIG. 2(a) is then input to the specific area extraction means 31a. The specific area extraction means 31a first binarizes the captured image to extract the silhouette of the person region shown in FIG. 2(b), and then extracts, in this case, the lower half of the silhouette as the specific area, as shown in FIG. 2(c) (step S2). The lower-body region can be extracted, for example, by setting a threshold range on the image position in advance, such as "the lower half of the image".

  The thinning means 32a performs thinning processing on the extracted silhouette as shown in FIG. 2(d) (step S3), and the expansion processing means 33a performs expansion processing on the thinned silhouette as shown in FIG. 2(e) (step S4). Further, the distance conversion means 34a performs distance conversion on the expanded silhouette to generate a grayscale image from the binary image, as shown in FIG. 2(f) (step S5). Then, the gradient feature amount extraction means 35a calculates the HOG for the grayscale image based on the captured image (see FIG. 2(f)) (step S6). A specific example of the processing of the gradient feature quantity extraction means 35 will be described later.

  On the other hand, in the example shown in FIG. 2, attention is paid to the movement of the lower body, so the joint angles of the left hip, left knee, left ankle, right hip, right knee, and right ankle are set in the CG data 22 in order to generate a CG image corresponding to the captured image. Then, the CG image generation means 24 creates a CG frame image as shown in FIG. 2(g), based on the joint angle setting values of the CG data 22 corresponding to the captured image and the CG character model 21 (step S7).

  Then, as was done for the captured image, the processing of the CG image by the specific area extraction means 31b (step S8), the thinning means 32b (step S9), the expansion processing means 33b (step S10), and the distance conversion means 34b (step S11) is executed in sequence, and a grayscale image subjected to distance conversion is generated as shown in FIG. 2(h). Then, the gradient feature amount extraction means 35b calculates the HOG for the grayscale image based on the CG image (see FIG. 2(h)) (step S12).

  Next, the collation means 4 compares the HOG features of the grayscale image of FIG. 2(f) and the grayscale image of FIG. 2(h). Here, the difference calculation means 41 of the collation means 4 calculates difference data between the HOG based on the silhouette of the CG image and the HOG based on the silhouette of the captured frame image, and stores the difference data 53 in the difference data storage means 42 in association with the frame number 51 and the parameter 52 (step S13).

  Then, when all the values of the predetermined parameters (joint angles), for example within the range of ±30°, have not yet been selected (step S14: No), the parameter changing means 23 changes the value of the joint angle parameter (step S15). That is, the parameter changing means 23 finely adjusts the joint angle of the CG data (joint angle parameters) 22 in steps of, for example, 1°, and the process returns to step S7, where the CG image generation means 24 creates a CG frame image based on the changed joint angle value and the CG character model 21, and a new grayscale image as shown in FIG. 2(h) is generated.

  On the other hand, when all the values of the predetermined parameters (joint angles) have been selected in step S14 (step S14: Yes), the spatial feature determination means 43 of the collation means 4 outputs, as the estimation result, the frame number 51 and the parameter 52 for which the difference data is smallest among the HOG difference data 53 stored for the captured frame image (step S16). That is, the collation means 4 outputs the joint angle values with the best degree of matching as the estimated parameters.

  The above is the process for one captured frame image by the frame data processing means 3 of the posture estimation apparatus 1. Therefore, by performing the processing in steps S1 to S16 in the same manner for all frame images of the captured moving image, it is possible to estimate the motion of the person in the captured moving image.

[HOG calculation processing]
Next, the HOG calculation process shown in steps S6 and S12 of FIG. 3 will be described with reference to FIGS. 4 to 7 (refer to FIG. 1 as appropriate). FIG. 4 is a flowchart showing an outline of the HOG calculation process. Since the HOG calculation process is a known technique disclosed in, for example, the above-mentioned HOG reference documents, Non-Patent Document 1, and the like, an outline thereof will be briefly described below.

  In the HOG calculation process, as a first step, a luminance gradient is calculated from the image (step S21). Then, as a second stage, a gradient direction histogram is calculated for each cell from the calculated luminance gradient (step S22). Then, as a third stage, the feature amount is normalized for each block of the image using the calculated gradient direction histogram (step S23). Since HOG is based on a histogram of luminance gradient, for example, it has a property that it is hardly affected by the position and size of the lower body region of a person. Therefore, the gradient feature quantity extraction unit 35 of the posture estimation apparatus 1 executes each process of steps S21 to S23.

  Hereinafter, the first stage (step S21) to the third stage (step S23) of the HOG calculation process will be sequentially described. Here, as an example of the original image, a case is assumed in which a captured image of a walking person as shown in FIG. 5 is used and the motion is estimated from the original image.

<First stage (step S21)>
In the first stage, a luminance gradient (luminance gradient image) is obtained from the original image. Specifically, the gradient intensity m and the gradient direction θ of each pixel of the original image are calculated. Taking the upper-left corner of the image as the origin, let u be the horizontal coordinate of a pixel, v its vertical coordinate, and I(u, v) the luminance value at pixel (u, v). The gradient intensity m(u, v) at pixel (u, v) is then expressed by equation (1), and the gradient direction θ(u, v) at pixel (u, v) is expressed by equation (2).
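Equations (1) and (2) are not reproduced here; in the standard HOG formulation, which is assumed in the following sketch, the gradient intensity and direction are obtained from central differences of the luminance.

```python
import numpy as np

def luminance_gradient(image_gray):
    """Per-pixel gradient intensity m and direction theta, using the standard
    HOG formulation (an assumption): with fu = I(u+1, v) - I(u-1, v) and
    fv = I(u, v+1) - I(u, v-1),
        m(u, v)     = sqrt(fu**2 + fv**2)
        theta(u, v) = arctan(fv / fu)."""
    I = image_gray.astype(np.float64)
    fu = np.zeros_like(I)
    fv = np.zeros_like(I)
    fu[:, 1:-1] = I[:, 2:] - I[:, :-2]      # horizontal luminance difference
    fv[1:-1, :] = I[2:, :] - I[:-2, :]      # vertical luminance difference
    m = np.hypot(fu, fv)
    theta = np.degrees(np.arctan2(fv, fu))  # -180 to +180 degrees
    theta[theta < 0] += 180.0               # fold the sign away: 0 to 180 degrees
    return m, theta
```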

<Second Stage (Step S22)>
In the second stage, a gradient direction histogram is calculated using the luminance gradient θ (luminance gradient image). For this purpose, as shown in FIG. 6(a), the luminance gradient image is divided into a plurality of cells 101 arranged in a matrix. In the image example shown in FIG. 6(a), a region of 5 × 5 = 25 pixels is defined as one cell, and the luminance gradient image is divided into 6 × 12 = 72 cells 101. In the image example shown in FIG. 6(a), the outline of the person is drawn with a thin black line and all other areas are shown in white in the luminance gradient image; in practice, however, all areas including the outline are displayed in color according to the angle of the luminance gradient θ.

  In addition, the 25 arrows shown for the pixels in a cell 101 indicate the luminance gradient θ at each pixel, and their length indicates the gradient intensity m. The luminance gradient θ is actually computed as a value from −180° to +180°; however, since only the orientation of the gradient line is considered and its sign is ignored, negative values are shifted by adding 180°. In the following description, the luminance gradient is therefore assumed to take values from 0° to 180°, where 0° and 180° mean the same direction. The same symbol (θ) is used for the luminance gradient after this shift.

Here, the range 0 to 180° of the luminance gradient θ is divided into nine sections. That is, the luminance gradient θ is divided into the following sections (1) to (9). In each section, for example, the lower limit value is excluded and the upper limit value is included.
(1) 0-20°
(2) 20-40°
(3) 40-60°
(4) 60-80°
(5) 80-100°
(6) 100-120°
(7) 120-140°
(8) 140-160°
(9) 160-180°

  An example of the gradient direction histogram obtained for each cell, that is, with 25 pixels as one unit, is shown in FIG. 6(b). In this example, it can be seen that the luminance gradients in section (5), 80 to 100°, are the most frequent.

Hereinafter, the position coordinates of a cell 101 in the luminance gradient image shown in FIG. 6(a) are denoted (i, j) (1 ≤ i ≤ 6, 1 ≤ j ≤ 12). In the cell (i, j), the magnitudes in the nine directions into which the gradient direction is divided are denoted f1, f2, f3, f4, f5, f6, f7, f8, f9. In this case, the feature vector Fij of one cell (i, j) is expressed in 9 dimensions, as in equation (3), by arranging f1 to f9.
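A sketch of the per-cell gradient direction histogram with the cell size and bin count described above (an illustrative numpy implementation, not code from the patent):

```python
import numpy as np

def cell_histograms(m, theta, cell_size=5, n_bins=9):
    """Accumulate gradient intensities into nine 20-degree direction bins for
    every cell of cell_size x cell_size pixels.  Returns an array of shape
    (cells_y, cells_x, n_bins); each 9-vector corresponds to Fij."""
    bins = np.minimum((theta / (180.0 / n_bins)).astype(int), n_bins - 1)
    cells_y, cells_x = m.shape[0] // cell_size, m.shape[1] // cell_size
    hist = np.zeros((cells_y, cells_x, n_bins))
    for y in range(cells_y * cell_size):
        for x in range(cells_x * cell_size):
            hist[y // cell_size, x // cell_size, bins[y, x]] += m[y, x]
    return hist
```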

<Third stage (step S23)>
In the third stage, the feature amount is normalized for each block of the image from the calculated gradient direction histograms. For this purpose, as shown in FIG. 7, blocks 102 are assumed, each configured by selecting a plurality of cells 101 at a time from the luminance gradient image divided into cells 101. These blocks may partly overlap each other.

In the image example shown in FIG. 7, the 6 × 12 = 72 cells 101 are displayed, and 3 × 3 = 9 cells 101 are selected as one block 102. In this case, if the position of the upper-left cell of a block is (i, j), the feature vector Vk of the block at a certain position (identifier k) is expressed in 81 dimensions, as in equation (4), by concatenating the 9-dimensional feature vectors of equation (3) for the nine cells of the block.

  As described above, some areas of the blocks may overlap each other. In the image example of FIG. 7, consider, for example, the block surrounded by a thick line in which the nine cells from the first to the third column and from the second to the fourth row are selected (call this the block b = 1). When the entire block is shifted upward in the image by one cell, another block is formed (similarly called the block b = 2; reference numerals are omitted in the figure, and likewise below); from that position no block can be selected any further upward. On the other hand, if the entire block is shifted from the first position by one cell to the right in the image, another block (b = 3) is formed. Shifting right by one more cell forms another block (b = 4), and shifting right by yet another cell forms another block (b = 5); no further block can be selected to the right.

  A state in which five blocks (b = 1) to blocks (b = 5) selected by shifting by one cell as described above are superimposed is schematically shown on the upper side of FIG. Each block includes nine cells. When the block is shifted by one cell, cell overlap occurs. In FIG. 7, the pattern is displayed larger and darker as the number of overlapping cells increases. This pattern schematically shows nine sections (9 directions) of the gradient direction θ based on the histogram for each cell and the size thereof.

  In the image example of FIG. 7, 4 × 10 = 40 blocks can be selected as the block is shifted in this way. They are all identified by an identifier k (k = 1 to 40). In FIG. 7, the horizontal scale "1, 2, 3, 4" indicates the number of blocks that can be selected by shifting in the horizontal direction of the image, taking the upper-left block of the image as the origin, and similarly the vertical scale "1, 4, 7, 10" indicates the number of blocks that can be selected by shifting in the vertical direction of the image. In this example, equation (3) and equation (4) above are applied to the 40 blocks, each of which contains 3 × 3 = 9 cells. The value v obtained by normalizing the gradient direction histogram f of each cell by the magnitude of the feature vector V of the block containing it is expressed by equation (5). The resulting feature has the same number of dimensions as the product (gradient directions "= 9") × (cells per block "= 9") × (number of blocks "= 40"), that is, 3240 dimensions.
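Continuing the sketch, blocks of 3 × 3 cells are slid one cell at a time and each block's cell histograms are normalized by the magnitude of the block feature vector; the small constant in the denominator is a common practical safeguard and is an assumption here.

```python
import numpy as np

def hog_descriptor(cell_hist, block_size=3, eps=1e-6):
    """Concatenate block-normalized cell histograms into one HOG vector.
    With 6 x 12 cells, 3 x 3 cell blocks and 9 bins this yields 4 x 10 = 40
    blocks and a 9 * 9 * 40 = 3240-dimensional descriptor."""
    cells_y, cells_x, _ = cell_hist.shape
    blocks = []
    for j in range(cells_y - block_size + 1):
        for i in range(cells_x - block_size + 1):
            V = cell_hist[j:j + block_size, i:i + block_size, :].ravel()  # eq. (4)
            blocks.append(V / (np.linalg.norm(V) + eps))  # v = f / ||V||, eq. (5)
    return np.concatenate(blocks)
```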

Thereby, the distance between vm, the normalized vector v obtained from the HOG calculated based on the silhouette of the captured image, and vcg, the normalized vector v obtained from the HOG calculated based on the silhouette of the CG image, can be evaluated: the smaller this distance, the greater the similarity.

Here, the distance between vm and vcg is the difference between the histograms. This difference (difference data) can be computed, for example, as a cumulative sum of non-negative values obtained by processing the difference for each histogram class (the angle classes of the gradient direction). Methods for computing this cumulative sum include, for example, the sum of squared differences of the magnitudes for each class and the sum of absolute differences.
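For example, the difference data could be computed as either the sum of squared or the sum of absolute per-class differences (a sketch under the same assumptions as the preceding snippets):

```python
import numpy as np

def hog_difference(v_m, v_cg, method="squared"):
    """Difference data between the captured-image HOG v_m and the CG-image HOG
    v_cg: cumulative sum of squared or absolute per-class differences."""
    d = np.asarray(v_m) - np.asarray(v_cg)
    return float(np.sum(d * d)) if method == "squared" else float(np.sum(np.abs(d)))
```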

  According to the first embodiment, highly accurate collation can be performed by means of the distance conversion and the HOG feature, without being affected by the position on the screen, while retaining the robustness of silhouette matching that is insensitive to clothing patterns. In addition, the limbs are given a constant thickness by the thinning and expansion processing, so the collation is not affected by their actual thickness. Therefore, according to the first embodiment, motion data with high spatial accuracy can be obtained robustly from the captured moving image.

(Second Embodiment)
The posture estimation apparatus 1B shown in FIG. 8 estimates the posture or motion by taking into account not only the spatial features of the motion of the object (person) to be estimated but also its temporal features when collating the captured image with the CG image. As shown in FIG. 8, the posture estimation apparatus 1B comprises the CG generation unit 2, the frame data processing means 3, and collation means 4B. In the posture estimation apparatus 1B, components that are the same as those of the posture estimation apparatus 1 shown in FIG. 1 are given the same reference numerals, and their description is omitted.

  It is assumed that CG data (joint angle parameters) 22 are created in the model sequence storage means 25 in advance so that a CG moving image similar to the sequence of the captured moving image can be generated. For example, as shown in FIG. 9(a), when a captured moving image sequence showing a person kicking a ball while turning around and changing the direction of the body is input to the posture estimation apparatus 1B, a similar model sequence is prepared in advance and rendered as a CG moving image. An example of the CG moving image sequence created from the model sequence is shown in FIG. 9(b). The frame sampling period and the number of frames may be the same or different, but are preferably the same.

In the collation means 4B shown in FIG. 8, the method of posture estimation is roughly divided into two stages. The first stage is temporal processing over the entire time series of frames. Here, each frame of the captured moving image, in which a predetermined motion such as a "kick" appears, is collated with the frames of the model sequence representing the same motion, and the most similar frame is extracted. This first stage yields a temporal frame correspondence between the captured moving image and the model sequence.
The second stage is spatial processing for each frame. Here, for the frame of the model sequence extracted in the first stage, CG frames newly created by adjusting the joint angle parameters are repeatedly collated with the corresponding frame of the captured moving image to find joint angles closer to the real posture.

For this reason, as shown in FIG. 8, the collation means 4B comprises the difference calculation means 41, the difference data storage means 42, the spatial feature determination means 43, and temporal feature extraction means 44.
The temporal feature extraction means 44 extracts, based on the HOG difference data for the CG images generated using the values of the CG data (joint angle parameters) 22 read from the model sequence storage means 25 while the frame number of the model sequence is varied for the captured frame image, the frame number of the model sequence for which the difference data is minimum. When frames arranged in time series are observed continuously, the posture of the object is seen to change continuously; this is equivalent to the temporal change of the motion. The temporal feature extraction means 44 estimates the posture of each frame by collating the model frames created in advance with the captured moving image frames, and the continuous change of posture obtained in this way constitutes the temporal feature of the motion. Then, with the frame number extracted by the temporal feature extraction means 44 fixed, the spatial feature determination means 43 specifies, based on the HOG difference data calculated for the captured frame image while the parameter changing means 23 changes the joint angle parameter values, the joint angle parameter values for which the difference data is minimum.
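A sketch of this first stage follows (the function and variable names are assumptions); the one-directional search shortcut described in the next paragraphs is also included as an option.

```python
def temporal_correspondence(captured_hogs, model_hogs, hog_difference,
                            monotonic=True):
    """For each captured frame, find the model-sequence frame whose HOG is most
    similar.  When the motion proceeds in one direction (monotonic=True), the
    search for the next frame starts from the previously matched model frame."""
    correspondence = []
    start = 0
    for cap_hog in captured_hogs:
        frames = range(start, len(model_hogs)) if monotonic \
            else range(len(model_hogs))
        best = min(frames, key=lambda f: hog_difference(cap_hog, model_hogs[f]))
        correspondence.append(best)          # model frame number for this frame
        if monotonic:
            start = best
    return correspondence
```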

  The operation of the posture estimation apparatus 1B is the same as that of the first embodiment except that the processing of steps S14 to S16 shown in FIG. 3 is replaced: in the first stage, corresponding operations are performed to extract the temporal features of the motion, and in the second stage, processing for determining the spatial features of the motion is performed in the same manner as in the posture estimation apparatus 1. A detailed description is therefore omitted.

  FIG. 10 shows an example of the processing result of the first stage of the operation of the posture estimation apparatus 1B; it is the result of estimating the lower-body motion in the human motion shown in FIG. In the graph of FIG. 10, the horizontal axis indicates the frame number of the captured moving image and the vertical axis indicates the frame number of the model sequence. Focusing on one frame number of the captured moving image at a time, the temporal feature extraction means 44 obtains the frame number of the CG frame image that minimizes the difference data calculated between the HOG based on the silhouette of that captured frame image and the HOG based on the silhouette of each CG frame image. In the example shown in FIG. 10, when the frame number of the captured moving image is “1”, a search over all frame numbers of the model sequence shows that the most similar frame number in the model sequence is “0”. The same applies to the subsequent frames.
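A possible implementation of the per-frame difference data is sketched below with OpenCV and scikit-image. The specification does not prescribe these libraries, and the kernel size, cell size, and normalization used here are assumed parameter choices.

```python
import numpy as np
import cv2
from skimage.morphology import skeletonize
from skimage.feature import hog

def silhouette_hog(silhouette, cell=(8, 8), block=(2, 2)):
    """Binary silhouette (0/1 array) -> thinning -> expansion -> distance
    transform (grayscale image) -> HOG feature vector.
    Parameter values are illustrative only."""
    thin = skeletonize(silhouette > 0).astype(np.uint8)                 # thinning
    thick = cv2.dilate(thin, np.ones((5, 5), np.uint8), iterations=1)   # expansion
    gray = cv2.distanceTransform(thick, cv2.DIST_L2, 5)                 # grayscale image
    gray = cv2.normalize(gray, None, 0, 1.0, cv2.NORM_MINMAX)
    return hog(gray, orientations=9, pixels_per_cell=cell,
               cells_per_block=block, feature_vector=True)

def hog_difference(sil_captured, sil_cg):
    """Difference data between the HOG of a captured-frame silhouette and the
    HOG of a CG-frame silhouette (smaller = more similar).  Assumes both
    silhouette images have the same size so the HOG vectors are comparable."""
    return float(np.linalg.norm(silhouette_hog(sil_captured) - silhouette_hog(sil_cg)))
```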

  In a “kick” motion such as this example, the direction of the movement is one-way. Therefore, when the frame number of the captured moving image is, for example, “8”, it is not necessary to search over all frame numbers of the model sequence: if the result already determined in the previous search is used, it suffices to compare only the model-sequence frame number “5” and the remaining frames after it, as sketched below.
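A minimal sketch of this incremental search follows, assuming the HOG differences have already been precomputed into a matrix. The function name and matrix layout are assumptions for illustration.

```python
import numpy as np

def monotonic_frame_matching(diff_matrix):
    """diff_matrix[i, j]: HOG difference between captured frame i and model
    frame j.  Because the "kick" motion proceeds in one direction, the search
    for captured frame i can start from the model frame matched to frame i-1
    instead of scanning the whole model sequence again."""
    matches = []
    start = 0
    for i in range(diff_matrix.shape[0]):
        j = start + int(np.argmin(diff_matrix[i, start:]))
        matches.append(j)
        start = j   # from here on, only the remaining model frames are compared
    return matches
```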

  Further, it is preferable to use DP (Dynamic Programming) matching in order to prevent the temporal feature extraction unit 44 from erroneously extracting model frames that are greatly deviated in time series.
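The following is a simple DP matching sketch over the same difference matrix. The path constraints and cost recurrence below are ordinary dynamic-time-warping choices and are assumptions; the specification only states that DP matching is preferable, not how it is configured.

```python
import numpy as np

def dp_matching(diff_matrix):
    """DP (dynamic programming) matching over diff_matrix[i, j] (captured
    frame i vs. model frame j).  The path may only move forward in both
    sequences, which prevents a captured frame from being matched to a model
    frame far away in the time series."""
    n, m = diff_matrix.shape
    cost = np.full((n, m), np.inf)
    cost[0, 0] = diff_matrix[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(cost[i - 1, j] if i > 0 else np.inf,              # repeat model frame
                       cost[i, j - 1] if j > 0 else np.inf,              # skip model frame
                       cost[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            cost[i, j] = diff_matrix[i, j] + prev
    # Backtrack to recover the monotonic correspondence path (i, j) pairs.
    path, i, j = [], n - 1, m - 1
    while True:
        path.append((i, j))
        if i == 0 and j == 0:
            break
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((cost[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
    return path[::-1]
```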

  Note that the temporal feature extraction means 44 does not necessarily have to create a graph such as that of FIG. 10, as long as the collation search results are held in a table format. It is nevertheless preferable to create such a graph: where its slope is small, the actual person is moving more slowly than the model, and where the slope is large, the person is moving faster, so the graph reveals the movement characteristics of each person in terms of how the actual movement speed changes over time.
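For example, the local slope of the correspondence curve can be read off numerically. The frame numbers below are made-up values for illustration, not those of FIG. 10.

```python
import numpy as np

# Matched model-frame number for each captured frame (illustrative values).
model_frame_of = np.array([0, 1, 2, 2, 3, 5, 6, 8, 8, 9], dtype=float)

# Local slope of the correspondence: < 1 means the person moves more slowly
# than the model around that frame, > 1 means the person moves faster.
slope = np.gradient(model_frame_of)
print(slope)
```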

  As described above, with the frame number extracted by the temporal feature extraction means 44 in the first stage of collation held fixed, the spatial feature determination means 43 specifies the value of the joint angle parameter at which the difference data is minimized while the parameter changing means 23 varies the value of the joint angle parameter. In an experiment on the example shown in FIG. 10, for the posture indicated by a given frame number, fine adjustment of the joint angles within a range of ±30° was sufficient to fit the CG frame image to the posture of the captured frame image.
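A minimal sketch of this second-stage refinement follows, assuming a greedy per-joint sweep in 10° steps; only the ±30° range is taken from the description above, while the step size and search strategy are assumptions. `diff_for` stands in for CG image generation followed by the HOG difference computation against the captured frame.

```python
import numpy as np

def refine_joint_angles(base_angles, diff_for, step=10.0, search=30.0):
    """Second-stage (spatial) refinement: perturb each joint angle within
    +/- `search` degrees around the model-sequence value and keep the change
    that minimises the HOG difference.  diff_for(angles) renders a CG frame
    image for the given joint angles and returns its HOG difference to the
    captured frame image; it stands in for steps defined elsewhere."""
    angles = np.array(base_angles, dtype=float)
    best_diff = diff_for(angles)
    for j in range(len(angles)):
        for delta in np.arange(-search, search + step, step):
            trial = angles.copy()
            trial[j] = base_angles[j] + delta
            d = diff_for(trial)
            if d < best_diff:
                best_diff, angles = d, trial
    return angles
```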

  According to the second embodiment, the motion features of the target object to be estimated are divided into two kinds, features of the change over time (temporal features) and features of the posture (spatial features), and matching is performed in two corresponding stages, so that a motion capture method with high reproducibility of motion features can be provided. That is, according to the second embodiment, motion data with high temporal and spatial accuracy can be obtained robustly from a captured moving image.

  Although the embodiments of the present invention have been described above, the present invention is not limited to these embodiments. For example, the posture estimation apparatus 1B according to the second embodiment includes both the spatial feature determination means 43 and the temporal feature extraction means 44 in the collation means 4B, but it may include only the temporal feature extraction means 44. In other words, of the first and second stages that the posture estimation apparatus 1B according to the second embodiment performs for collation, only the first stage may be performed. A posture estimation apparatus configured in this way can synchronize the timing of each frame of the model sequence created in advance with each frame of the captured image, and can thereby produce natural movement like live action.

  In each embodiment, the CG generation means 2 includes the parameter changing means 23; however, in the present invention the parameter changing means 23 may be provided only as necessary. For example, it may be omitted when, of the first and second stages that the posture estimation apparatus 1B according to the second embodiment performs for collation, only the first stage is performed.

  In each embodiment, the CG generation means 2 includes the model sequence storage means 25; however, in the present invention the model sequence storage means 25 may be provided only as necessary. For example, it may be omitted when the image input as a captured image consists of only one image or a few images.

  When the image input as a captured image is a moving image and no model sequence exists for the series of predetermined motions performed by the person to be estimated, a CG image for comparison with, for example, a captured image of a walking person would have to be created from a human body model by exhaustively changing the joint angle parameter values of several tens of joints starting from an upright posture along the body axis, and each resulting CG image would have to be collated one by one (the rough figures below illustrate the scale of such a search). Because the posture estimation apparatuses 1 and 1B store a model sequence corresponding to a series of predetermined motions, a CG image whose posture approximates the posture of the object in the captured image can be obtained easily, either manually or automatically, and posture estimation by matching can be performed quickly.
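To make the scale of such an exhaustive search concrete, the following back-of-the-envelope figures use assumed joint counts and angular sampling, not values from this specification.

```python
# Illustrative only: the joint count and angular sampling are assumptions.
num_joints = 30          # "several tens of joints"
samples_per_joint = 13   # e.g. -60 to +60 degrees in 10-degree steps
combinations = samples_per_joint ** num_joints
print(f"{combinations:.3e} candidate CG images")  # ~2.6e33, infeasible to collate one by one
```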

Further, the motion performed by the person to be estimated is not limited to the “kick” motion, and the physique of the person to be estimated is not limited to that shown in the figures.
Furthermore, the estimation target is not limited to a person; as long as its motion, such as a change of posture, can be modeled, it may be, for example, an animal, or an artificial object such as a jointed doll, a robot, or any of various machines with moving parts.

  The posture estimation apparatuses 1 and 1B can also be realized by operating a general computer with a program that causes it to function as each of the above-described means. This program can be provided via a communication line, or can be written to a recording medium such as a CD-ROM and distributed.

1, 1B Posture estimation device
2 CG generation means
21 CG character model
22 CG data (joint angle parameter)
23 Parameter changing means
24 CG image generation means
25 Model sequence storage means
3 Frame data processing means
31a, 31b Specific area extraction means
32a, 32b Thinning means
33a, 33b Expansion processing means
34a, 34b Distance conversion means
35a, 35b Gradient feature amount extraction means
4, 4B Collation means
41 Difference calculation means
42 Difference data storage means
43 Spatial feature determination means
44 Temporal feature extraction means
5 Image input means
51 Frame number
52 Parameter
53 Difference data
101 Cell
102 Block

Claims (6)

  1. A posture estimation apparatus that estimates, by image processing, parameters characterizing the posture or motion of a target object from a captured image that is a single-viewpoint still image or moving image in which the target object to be estimated is captured,
    Image input means for inputting the photographed image, and for inputting a CG image generated by pseudo-drawing the object in the photographed image based on a CG character model, in which the object in the photographed image is modeled for computer graphics (CG) as an articulated object, and on a joint angle parameter used in the CG character model;
    A specific area extracting means for extracting a binarized silhouette of the specific area of the object from the input captured image, and extracting a binarized silhouette of the specific area of the object from the input CG image;
    Thinning means for thinning each extracted silhouette;
    Expansion processing means for performing expansion processing on each of the thinned silhouettes;
    Distance conversion means for generating a grayscale image by performing distance conversion on each expanded silhouette;
    Gradient feature amount extraction means for calculating HOG (Histogram of Oriented Gradient) as the feature amount of each grayscale image;
    Matching means for estimating the joint angle parameter of the object in the captured image by comparing the HOG calculated based on the silhouette of the object in the captured image with the HOG calculated based on the silhouette of the object in the CG image;
    A posture estimation apparatus comprising:
  2. In order to generate the CG image to be input to the image input means,
    A model sequence storage unit that stores, as a model sequence, a value of a joint angle parameter created in advance for each frame as a model for the object to be estimated to perform a series of predetermined operations;
    Based on the value of the joint angle parameter read for each frame from the model sequence storage unit in correspondence with the captured frame image that is a captured image input to the image input unit for each frame, and the CG character model, CG image generation means for generating a CG frame image as a CG image for each frame,
    The specific area extraction means extracts a silhouette obtained by binarizing the specific area of the object from the captured frame image, and extracts a silhouette obtained by binarizing the specific area of the object from the CG frame image,
    The posture estimation apparatus according to claim 1, wherein the thinning means, the expansion processing means, the distance conversion means, and the gradient feature amount extraction means perform the image processing for each frame of the captured frame image and the CG frame image.
  3. Parameter change means for changing the value of the joint angle parameter read for each frame from the model sequence storage means for the captured frame image within a predetermined range;
    The CG image generation means generates the CG frame image based on the joint angle parameter read for each frame or the changed joint angle parameter value and the CG character model,
    The verification means includes
    Each HOG based on the silhouette of a CG image generated using the joint angle parameter read from the model sequence storage unit or the joint angle parameter value changed by the parameter changing unit with respect to the captured frame image; Difference calculation means for calculating difference data from the HOG based on the silhouette of the captured frame image,
    Spatial feature determination means for specifying the value of the joint angle parameter at which the difference data is minimized, based on the HOG difference data calculated for the captured frame image when the frame number of the model sequence is fixed, and
    The posture estimation apparatus according to claim 2, wherein the frame number and the value of the joint angle parameter are output as an estimation result.
  4. The verification means includes
    The difference between each HOG based on the silhouette of the CG image generated using the value of the joint angle parameter read from the model sequence storage means for the captured frame image and the HOG based on the silhouette of the captured frame image A difference calculating means for calculating each data,
    Temporal feature extraction means for extracting the frame number of the model sequence at which the difference data is minimized, based on the HOG difference data for the CG images generated using the values of the joint angle parameter read from the model sequence storage means while the frame number of the model sequence is changed with respect to the captured frame image,
    The posture estimation apparatus according to claim 2, wherein the frame number and the value of the joint angle parameter are output as an estimation result.
  5. Parameter change means for changing the value of the joint angle parameter read for each frame from the model sequence storage means for the captured frame image within a predetermined range;
    The CG image generation means generates the CG frame image based on the joint angle parameter read for each frame or the changed joint angle parameter value and the CG character model,
    The verification means includes
    Each HOG based on a silhouette of a CG image generated using a joint angle parameter read from the model sequence storage unit or a value of a joint angle parameter changed by the parameter change unit with respect to the captured frame image; Difference calculation means for calculating difference data from the HOG based on the silhouette of the captured frame image,
    Temporal feature extraction means for extracting the frame number of the model sequence at which the difference data is minimized, based on the HOG difference data for the CG images generated using the values of the joint angle parameter read from the model sequence storage means while the frame number of the model sequence is changed with respect to the captured frame image; and
    Spatial feature determination means for specifying the value of the joint angle parameter at which the difference data is minimized, based on the HOG difference data calculated for the captured frame image when the extracted frame number is fixed and the value of the joint angle parameter is changed by the parameter changing means,
    The posture estimation apparatus according to claim 2, wherein the frame number and the value of the joint angle parameter are output as an estimation result.
  6. In order to estimate, by image processing, parameters characterizing the posture or motion of a target object shown in a captured image that is a single-viewpoint still image or moving image of the target object to be estimated,
    Image input means for inputting the photographed image, and for inputting a CG image generated by pseudo-drawing the object in the photographed image based on a CG character model, in which the object in the photographed image is modeled for computer graphics as an articulated object, and on a joint angle parameter used in the CG character model;
    A specific area extracting means for extracting a binarized silhouette of the specific area of the object from the input captured image and extracting a binarized silhouette of the specific area of the object from the input CG image;
    Thinning means for thinning each of the extracted silhouettes;
    Expansion processing means for performing expansion processing on the thinned silhouettes;
    Distance conversion means for generating a grayscale image by performing distance conversion on each expanded silhouette;
    Gradient feature amount extraction means for calculating HOG as the feature amount of each grayscale image,
    Matching means for estimating the joint angle parameter of the object in the captured image by comparing the HOG calculated based on the silhouette of the object in the captured image with the HOG calculated based on the silhouette of the object in the CG image,
    A posture estimation program for causing a computer to function as the above means.
JP2010260468A 2010-11-22 2010-11-22 Posture estimation apparatus and posture estimation program Expired - Fee Related JP5503510B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2010260468A JP5503510B2 (en) 2010-11-22 2010-11-22 Posture estimation apparatus and posture estimation program


Publications (2)

Publication Number Publication Date
JP2012113438A JP2012113438A (en) 2012-06-14
JP5503510B2 true JP5503510B2 (en) 2014-05-28

Family

ID=46497605

Country Status (1)

Country Link
JP (1) JP5503510B2 (en)




Legal Events

Date Code Title Description
2013-06-24 A621 Written request for application examination
2014-02-06 A977 Report on retrieval
TRDD Decision of grant or rejection written
2014-02-18 A01 Written decision to grant a patent or to grant a registration (utility model)
R150 Certificate of patent or registration of utility model (Ref document number: 5503510; Country of ref document: JP)
2014-03-14 A61 First payment of annual fees (during grant procedure)
LAPS Cancellation because of no payment of annual fees