WO2021235440A1 - Method and device for acquiring movement feature amount using skin information - Google Patents

Method and device for acquiring movement feature amount using skin information Download PDF

Info

Publication number
WO2021235440A1
WO2021235440A1 (PCT/JP2021/018809)
Authority
WO
WIPO (PCT)
Prior art keywords
shape
representative
vertices
skin
posture
Prior art date
Application number
PCT/JP2021/018809
Other languages
French (fr)
Japanese (ja)
Inventor
仁彦 中村
洋介 池上
添威 張
稔尚 赤瀬
Original Assignee
国立大学法人東京大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The University of Tokyo (国立大学法人東京大学)
Publication of WO2021235440A1 publication Critical patent/WO2021235440A1/en

Links

Images

Classifications

    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/103Detecting, measuring or recording devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
    • A61B5/11Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/117Identification of persons
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion

Definitions

  • the present invention relates to a device and a method for acquiring a motion feature amount using skin information.
  • the motion feature amount acquired by the present invention can be used to quantify individuality in motion and as a personal authentication technique.
  • the posture of the target can be represented by a skeletal model, and the posture of the target is determined from each joint angle and position. By using motion capture, it is possible to acquire the motion data of the target from the time series data of the posture of the target.
  • the shape of the object can be represented by a polygon model or a polygon mesh.
  • the body surface of the target is composed of a set of a large number of polygons (typically triangles), and the shape of the target is determined from the coordinates of the vertices of all the polygons.
  • the polygon model can be obtained by acquiring the three-dimensional coordinates of all the vertices of the polygons on the target's body surface using, for example, a 3D body scanner.
  • the shape of the target (coordinates of the vertices of the polygon) changes depending on the posture of the target.
  • skinning is known as the task of associating a 3DCG model with a skeleton. Skinning determines how each vertex of the model's polygons follows the skeleton, and weight adjustments are made to adjust the effect of the skeleton on each vertex.
  • video motion capture, in which two-dimensional joint positions are estimated from camera images by deep learning and integrated to reconstruct the motion in three dimensions, has been realized (Patent Document 4, Non-Patent Document 2).
  • with video motion capture, it is possible to acquire three-dimensional information on the target's skeleton from camera images without interfering with the target person.
  • for example, OpenPose (Non-Patent Document 3) can be used to estimate two-dimensional joint positions from camera images by deep learning.
  • the video motion capture system estimates the 3D information of the skeleton based on the joint positions, but in addition to the skeleton, techniques for estimating 3D body surface information (shape information) from RGB cameras alone have also been developed.
  • as techniques for estimating shape information from RGB cameras alone, a technique that reconstructs the detailed shape of clothing (Non-Patent Document 4) and techniques that construct a skin model of the unclothed body (Non-Patent Documents 5 and 6) are known.
  • biometric authentication is attracting attention as an authentication method that overcomes the drawbacks of conventional password- and PIN-based authentication.
  • Many biometrics such as face recognition, fingerprint recognition, vein recognition, and iris recognition require the acquisition of biometric information at a short distance after obtaining consent.
  • gait authentication which identifies an individual by walking, is attracting attention as a method for acquiring biometric information from a long distance without the cooperation of the subject.
  • many gait authentication methods use walking silhouette images and joint angles as feature quantities; the amount of information used is still limited, gait is presupposed as a specific motion, and the accuracy is not yet satisfactory.
  • in contrast, by using information obtained from skin polygons having posture information, personal authentication that is not necessarily limited to walking becomes possible.
  • in addition, research on shape retrieval of 3D human models is also being conducted, and Non-Patent Document 7 can be referred to. Further, HKS (Heat Kernel Signature) and WKS (Wave Kernel Signature) have been proposed as shape descriptors; HKS is described in Non-Patent Document 8, and WKS in Patent Document 5 and Non-Patent Document 9. These shape retrieval methods and shape descriptors do not focus on changes in shape during motion.
  • SMPL: A skinned multi-person linear model.
  • SHREC'14 track: Shape retrieval of non-rigid 3D human models. Eurographics Association, 2014.
  • Jian Sun, Maks Ovsjanikov, and Leonidas Guibas. A concise and provably informative multi-scale signature based on heat diffusion. Computer Graphics Forum, Vol. 28, No. 5, pp. 1383-1392, 2009.
  • M. Aubry, U. Schlickewei, and D. Cremers. The wave kernel signature: A quantum mechanical approach to shape analysis. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).
  • the present invention aims to extract motion feature amounts from the dynamics of skin polygons that deform during motion.
  • the technical means adopted by the present invention is a method of acquiring a motion feature amount in which: the shape of the target is specified by skin polygons, and each vertex of the skin polygons has coordinates depending on the posture of the target;
  • the shape of the target is represented by one or more representative regions selected from the skin polygons, and each representative region is a vertex group consisting of a plurality of vertices;
  • time-series data of the skin polygons during the motion of the target are prepared, and in a plurality of frames, a shape representative value representing the shape of the target in each frame is calculated using the one or more representative regions;
  • using the time-series data of the shape representative values, a value representing the temporal change of the one or more representative regions accompanying the motion of the target is acquired as the motion feature amount.
  • the shape of the object is represented by a plurality of representative regions.
  • the temporal change of the plurality of representative regions includes a temporal change of the spatial relationship between the representative regions with the movement of the subject.
  • the spatial relationship between the representative regions is defined by a function (shape representative value or shape descriptor) of the vertex coordinates of any two representative regions.
  • the spatial relationship between the representative regions is defined by the distance between vertices (shape representative value or shape descriptor) between any two representative regions.
  • the representative region is an annular vertex group (ring) in which the vertices are arranged in a loop.
  • the annular vertex group (ring) is not limited to a circular arrangement and may be, for example, a group of vertices arranged in a roughly rectangular shape.
  • the annular vertex group is a group of vertices arranged along the circumference of a part of the human body. In one embodiment, the annular vertex group is placed on the surface of a part of the human body.
  • the annular vertex group is obtained by acquiring the HKS (Heat Kernel Signature) value for all vertices, dividing all vertices into two groups according to a threshold, and taking the set of vertices arranged in a ring at the boundary between the two groups; by changing the threshold, a plurality of annular vertex groups are obtained.
  • the representative region may also be a planar vertex group consisting of a plurality of vertices arranged so as to form a surface (the annular vertex group is, so to speak, a linear vertex group).
  • the representative region may represent the shape of the target portion.
  • the entire skin polygon may be selected and used as a representative area.
  • the posture of the target is specified by a skeletal model, and a function that associates each vertex of the skin polygon model with the skeleton has been obtained.
  • the coordinates (initial coordinates) of each vertex of the target's skin polygon model are obtained for a specific posture (initial posture).
  • the coordinates of each vertex in an arbitrary posture can then be obtained from the initial coordinates, the initial posture, and the arbitrary posture by using the function, as illustrated by the sketch below.
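The patent does not spell out this skinning function (it cites Patent Documents 1 to 3 and Non-Patent Document 1); the following is a minimal linear blend skinning sketch, assuming per-vertex bone weights and per-bone rest-to-target transforms are available. All names and shapes are illustrative, not the patent's exact formulation.

```python
import numpy as np

def linear_blend_skinning(rest_vertices, weights, bone_transforms):
    """Minimal linear blend skinning sketch (illustrative only).

    rest_vertices:   (V, 3) vertex coordinates in the initial (rest) posture
    weights:         (V, B) per-vertex bone weights, each row summing to 1
    bone_transforms: (B, 4, 4) homogeneous transforms mapping each bone from
                     the rest posture to the target posture
    returns:         (V, 3) vertex coordinates in the target posture
    """
    V = rest_vertices.shape[0]
    homo = np.hstack([rest_vertices, np.ones((V, 1))])            # (V, 4)
    # Position of every vertex under every bone transform: (B, V, 4)
    per_bone = np.einsum('bij,vj->bvi', bone_transforms, homo)
    # Blend the per-bone positions by the skinning weights: (V, 4)
    blended = np.einsum('vb,bvi->vi', weights, per_bone)
    return blended[:, :3]
```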
  • the posture of the subject is acquired by markerless motion capture using one or more images.
  • in another technical means adopted by the present invention, a method of acquiring shape representative information, the shape of the target is specified by skin polygons, and each vertex of the skin polygon model has coordinates depending on the posture of the target; the HKS (Heat Kernel Signature) value is obtained for all vertices, all vertices are divided into two groups according to a threshold, and an annular vertex group (ring) consisting of the set of vertices arranged in a ring at the boundary between the two groups is acquired as a shape representative region. In one embodiment, the shape of the target is represented by a plurality of annular vertex groups, which are determined by changing the threshold value.
  • in one embodiment, a plurality of annular vertex groups representing the shape of the target are represented by a function (shape representative value or shape descriptor) of the vertex coordinates of two annular vertex groups.
  • in one embodiment, a plurality of annular vertex groups representing the shape of the target are represented by the inter-vertex distances (shape representative value or shape descriptor) between two annular vertex groups.
  • in one embodiment, a plurality of annular vertex groups representing the shape of the target are represented by the area and/or the perimeter of the region enclosed by each annular vertex group.
  • the method for acquiring the motion feature amount and the method for acquiring the shape representative information are executed by a computer, and the present invention is also provided as a computer program for causing the computer to execute these methods.
  • the storage unit stores time-series data of skin polygons that specify the shape of the target during exercise, and each vertex of the skin polygon has a vertex ID and coordinates depending on the posture of the target.
  • the shape of the object is represented by one or more representative regions selected from the skin polygons, and each representative region is a group of vertices specified by the vertex IDs and coordinates of the plurality of vertices.
  • the shape representative value calculation unit calculates a shape representative value that represents the shape of the target depending on the posture by using the one or a plurality of representative regions.
  • the motion feature amount calculation unit uses the time-series data of the shape representative values acquired over a plurality of frames to calculate, as the motion feature amount, a value representing the temporal change of the one or more representative regions accompanying the motion of the target; this constitutes a device for acquiring motion feature amounts.
  • the shape of the object is represented by a plurality of representative regions.
  • the shape representative value calculation unit calculates a shape representative value that represents the shape of the target depending on the posture from the spatial relationship between the representative regions.
  • the motion feature amount calculation unit uses the time-series data of the shape representative values acquired over a plurality of frames to calculate, as the motion feature amount, a value representing the change in the spatial relationship between the representative regions accompanying the motion of the target.
  • the shape representative value is defined by a function of vertex coordinates of any two representative regions.
  • the shape representative value is defined by the distance between vertices between any two representative regions.
  • figure captions: the coordinate set forming a ring extracted from the vertex set S3 by clustering and the plane obtained by principal component analysis; the sets of vertices located at the boundaries detected for S3, S8, S38, S70, and S80, in order from the left; skin models of different body shapes and postures, with all 17 boundaries of the sets S3, S8, S38, S70, and S80 displayed together in one skin model; the 10 postures of the SHREC model (Non-Patent Document 7); the set of boundary vertex sets consisting of S5, S15, S30, and S80; and the joint positions acquired by video motion capture applied to the skin model acquired by HMR.
  • the skeletal model used in video motion capture is shown, and the numbers represent joints.
  • the skeletal model used in video motion capture is shown, and the numbers represent bones.
  • a figure showing the skin model acquired by the method shown in FIG.; the walking motion of the target on the treadmill is shown every 20 frames (0.3 seconds).
  • the feature amount acquisition system using shape information is composed of one or more video cameras that acquire a moving image of the target, and one or more computers that receive the image information, execute predetermined calculations, and calculate and output the motion feature amount of the target. More specifically, the feature amount acquisition system includes a posture information acquisition unit that acquires the posture information of the target from the images constituting the moving image,
  • a shape information acquisition unit that acquires the shape information of the target (skin polygons) using the images, or the images and the posture information,
  • a shape representative value calculation unit that calculates a shape representative value using the shape information,
  • and a feature amount calculation unit that calculates a feature amount using the shape representative value.
  • the posture information acquisition unit, shape information acquisition unit, shape representative value calculation unit, and feature amount calculation unit are realized by a computer processor, and the posture information, shape information (skin polygons), shape representative values, and feature amounts are stored in the computer memory (storage unit).
  • a computer storage unit stores a skeleton model, a skin polygon model, a function (skinning function, etc.) that associates polygon vertex coordinates with the skeleton model.
  • [A-2] Posture acquisition: the posture of the target is specified by the joint positions of the target's skeletal model.
  • posture information is acquired by adopting video motion capture technology.
  • Video motion capture is a method of synchronously photographing the motion of a target human using a plurality of cameras and performing three-dimensional reconstruction of the motion (Patent Document 4, Non-Patent Document 2).
  • the video motion capture system estimates joint positions by processing the images from multiple RGB cameras arranged so as to surround the target with OpenPose (Non-Patent Document 3), and the estimated 3D joint positions are fitted to a skeletal model.
  • the accuracy of 3D joint estimation is improved by performing inverse dynamics calculations using the skeletal model.
  • Patent Document 4 and Non-Patent Document 2 can be referred to.
  • each joint is labeled (Table 1).
  • the line connecting adjacent joints is treated as a bone (see FIG. 21).
  • the number of bones is 17, and each bone is labeled (Table 1).
  • the skeleton model, the target skeleton information (bone length), and the coordinates of each joint position for each frame acquired by motion capture are stored in the storage unit.
  • although OpenPose (Non-Patent Document 3) is used in the video motion capture according to the present embodiment, another method may be used for estimating two-dimensional joint positions from camera images by deep learning. Further, various methods for acquiring the posture of the target are known to those skilled in the art, and the method is not limited to video motion capture (Patent Document 4, Non-Patent Document 2); for example, a method of analyzing an RGB image from a single viewpoint to acquire motion data may be adopted, or markerless motion capture using a camera and a depth sensor may be adopted.
  • the shape of the target is defined by a polygon model or polygon mesh.
  • a polygon model is composed of vertices, edges, and faces. For example, in a triangular mesh model, each face has three vertices, the three sides of the triangular face are edges, and the start point and end point of each edge are two of the three vertices.
  • each polygon can be represented by the 3D coordinate values of all the vertices that make it up, and the simplest data structure for a polygon model is the IDs and 3D coordinate values of all the vertices of the polygon model.
  • various data structures for polygon models are known; it suffices that the shape of the target is quantified, and the specific data structure is not limited.
  • SMPL (Skinned Multi-Person Linear Model; Patent Documents 1 and 2, Non-Patent Document 1) is a parametric model of the unclothed human body that has 72 pose parameters and 10 shape parameters and returns a triangular mesh.
  • three-dimensional shape information is acquired by optimizing the parameters of the model so that they fit the human silhouette in the image.
  • among such methods, there are those that construct a skin model of the body in an unclothed state (Non-Patent Documents 5 and 6).
  • a target mesh model was obtained using the method according to Non-Patent Document 5.
  • by using video motion capture (Patent Document 4, Non-Patent Document 2) together with feature amounts acquired from skin polygons having posture information, it is possible to reconstruct the motion of the skin in three dimensions with only one or more cameras and a computer, and it is also possible to reconstruct the approximate outer shape of the skin in three dimensions from the images of one or more cameras even when clothes are worn (Non-Patent Documents 5 and 6). When the motion of the target is three-dimensionally reconstructed, not only the motion of the skeleton but also the shape information represented by the skin can be three-dimensionally reconstructed to express the rich motion information of the target.
  • the motion information can be obtained from the temporal change (deformed skin polygon) of the shape of the individual.
  • the feature amount of the motion is extracted by quantifying the skin polygon deformed by the motion using the shape representative value or the shape descriptor.
  • a known shape descriptor can be used when obtaining the shape representative value or shape descriptor. Many candidates can be considered, but in this embodiment the Heat Kernel Signature (Non-Patent Document 8) is used. In this embodiment, a plurality of representative regions (closed-curve vertex groups, i.e., a ring set) on the body surface of the target are identified using HKS. The amount of calculation can be reduced by calculating the shape representative value or shape descriptor using only the information in the representative regions.
  • the coordinate values of ring 1 and ring 2 depend on the posture, and the spatial relationship between ring 1 and ring 2 also changes with the posture.
  • the distance from each vertex of one ring to all the vertices of the other ring is calculated, and a vector storing this value is obtained.
  • a vector using the distances between all the rings constituting the ring set representing the target shape or the distances between specific rings is used as the shape representative value or the shape descriptor.
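As a concrete illustration of the inter-vertex distances described above, the following is a minimal sketch, assuming each ring is given as an array of 3D coordinates whose rows correspond to fixed vertex IDs (so the vector has the same length in every frame); the function names are illustrative.

```python
import numpy as np
from itertools import combinations

def ring_pair_distances(ring_a, ring_b):
    """All pairwise vertex distances between two rings, flattened to a vector.

    ring_a: (Na, 3) coordinates of the vertices of one ring
    ring_b: (Nb, 3) coordinates of the vertices of the other ring
    """
    diff = ring_a[:, None, :] - ring_b[None, :, :]      # (Na, Nb, 3)
    return np.linalg.norm(diff, axis=-1).ravel()         # (Na * Nb,)

def shape_representative_value(rings):
    """Concatenate the distance vectors of every ring pair in the ring set."""
    return np.concatenate([ring_pair_distances(rings[i], rings[j])
                           for i, j in combinations(range(len(rings)), 2)])
```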
  • the shape representative value or the shape descriptor is obtained, and the time series data of the shape representative value or the shape descriptor is acquired.
  • the array of time-series data may be used as a motion feature quantity, or the array of time-series data may be reduced in dimension to form a motion feature quantity.
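A sketch of turning the per-frame shape representative values into a motion feature amount. PCA is used here as one possible dimensionality reduction; the text does not prescribe a particular method, so this is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

def motion_feature_from_timeseries(shape_values, n_components=10):
    """shape_values: (T, D) shape representative value for each of T frames.

    Returns both the raw stacked time series and a low-dimensional version;
    PCA is only one possible choice of reduction.
    """
    X = np.asarray(shape_values, dtype=float)
    raw_feature = X.ravel()                       # time-series array as-is
    n = min(n_components, X.shape[0], X.shape[1])
    reduced = PCA(n_components=n).fit_transform(X)   # (T, n)
    return raw_feature, reduced
```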
  • this motion feature amount includes information such as how a plurality of representative parts on the target's skin move apart from or approach each other with the movement of the target (posture 1 → posture 2 → posture 3). It can be said that the motion feature amount obtained from skin polygons having posture information reflects both the motion information and the skeletal information consisting of the time-series data of the target's posture.
  • the motion feature amount acquired from skin polygons having posture information reflects individual differences and individuality in motion, and by using this motion feature amount, the quantification of individuality in motion and a personal authentication technique can be established.
  • this motion feature amount can also be used as an index of change in motion during exercise training and rehabilitation.
  • the degree of similarity between one's own movement and the movement of another person can be obtained, and the distance between the movement of a certain individual and the target movement can be expressed numerically.
  • the motion feature amount acquired from skin polygons having posture information serves as a feature amount of the individual's walking motion, and this feature amount can be used as important information for identifying the individual.
  • the feature quantities of movement can be acquired from the time-series data of posture.
  • the relative position of each joint with NOSE as the origin is calculated, and the RWRIST, LWRIST, RANKLE, and LANKLE elements are extracted from the array containing the relative joint positions.
  • the array of relative joint positions collected over a certain number of frames, with the variance computed for each joint, can be defined as the feature quantity obtained from the motion information. It will be understood by those skilled in the art that there are various methods for acquiring low-dimensional motion features.
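A sketch of the posture-based feature just described; the joint indices are hypothetical placeholders for the labels of Table 1.

```python
import numpy as np

# Hypothetical joint indices; the actual labels come from Table 1 of the text.
NOSE, RWRIST, LWRIST, RANKLE, LANKLE = 0, 4, 7, 10, 13

def posture_motion_feature(joint_positions):
    """joint_positions: (T, J, 3) joint coordinates over T frames.

    Joint positions are taken relative to NOSE, the wrist and ankle joints
    are kept, and the per-joint variance over the frames is the feature.
    """
    rel = joint_positions - joint_positions[:, NOSE:NOSE + 1, :]   # NOSE as origin
    selected = rel[:, [RWRIST, LWRIST, RANKLE, LANKLE], :]          # (T, 4, 3)
    return selected.var(axis=0).ravel()                             # (12,)
```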
  • the length of the labeled bone can be used as a feature. From this three-dimensional skeletal information, skeletal features and motion features are defined. As for the skeletal features, for example, an array containing the bone length values can be created and defined as the skeletal features.
  • the i-th element of the array contains the length of the bone with label i, and the length of the array is 17, equal to the number of bones.
  • the skeletal features may be reduced in dimension.
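A short sketch of the skeletal feature array, assuming the 17 bones are defined by joint index pairs following Table 1 (the pairs themselves are not reproduced here).

```python
import numpy as np

def skeletal_feature(joint_positions, bone_pairs):
    """joint_positions: (J, 3) joint coordinates in one frame.
    bone_pairs: list of 17 (joint_a, joint_b) index pairs, one per labeled bone.

    Returns an array of 17 bone lengths; element i is the length of bone i.
    """
    return np.array([np.linalg.norm(joint_positions[a] - joint_positions[b])
                     for a, b in bone_pairs])
```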
  • the shape feature amount that does not depend on the posture may be acquired.
  • the thickness or volume of a certain part of the human body, or the ratio or volume ratio of the thickness between a plurality of parts may be used as the shape feature amount.
  • the mass distribution may be estimated from the skin polygons of the human body according to the shape, and the estimated value of the force of the joints and muscles generated by the movement may be used for extracting the movement feature amount.
  • the volume of each part of the human body can be calculated from the shape information of the human body.
  • the specific gravity of the body or body part is known, and the volume and specific gravity can be used to roughly calculate the mass distribution.
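A rough sketch of the mass-distribution estimate from part volumes and specific gravities; the specific-gravity values are not given in the text and would have to come from published body-segment data.

```python
WATER_DENSITY = 1000.0  # kg / m^3

def mass_distribution(part_volumes_m3, specific_gravities):
    """Rough per-part mass: mass = volume x specific gravity x density of water.

    part_volumes_m3 / specific_gravities: dicts keyed by body-part name.
    The specific-gravity values are assumed inputs, not derived here.
    """
    return {part: vol * specific_gravities[part] * WATER_DENSITY
            for part, vol in part_volumes_m3.items()}
```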
  • the skin polygon model according to the present embodiment will be described with reference to FIG.
  • the posture of the target is represented by a skeleton model, and time-series data of the posture (posture 1 to 5) is acquired from the moving image data (time-series data of the image).
  • postures 1 to 5 do not necessarily have to be consecutive frames; they are, for example, characteristic time-series frames in a predetermined motion, or time-series frames extracted from consecutive frames at intervals of a predetermined number of frames.
  • the skin polygon model provides a skin polygon having posture information corresponding to each posture.
  • the skin polygons 1 to 5 corresponding to the postures 1 to 5 have vertex coordinates corresponding to the postures 1 to 5, respectively.
  • a shape representative value representing the shape of the target defined by the skin polygon is obtained.
  • time-series data of the shape representative values can be obtained.
  • the time-series data of the shape representative value reflects the motion data of the target, and the feature amount of the motion of the target is acquired using the time-series data of the shape representative value.
  • the target posture information is acquired by motion capture. Based on the image of a certain posture of the target, the initial posture is obtained, and at the same time, the skin polygon corresponding to the initial posture is obtained.
  • Known means can be used to obtain skin polygons from an image. By matching the coordinate systems of the skeleton model and the skin polygon model, the posture (joint position) and the coordinates of the polygon apex are made to correspond.
  • a function (for example, a skinning function) that associates the coordinates of the vertices of the skin polygon with the skeletal model (posture) has been obtained, and the coordinates of each vertex in an arbitrary posture of the target can be obtained from the initial coordinates, the initial posture, and that arbitrary posture using the function. That is, the skin polygons 1 to 6 can be obtained from the postures 1 to 6, respectively.
  • the skin polygon 1 has vertex coordinates corresponding to posture 1; a shape descriptor is calculated using all the vertex coordinates, a plurality of representative regions are extracted using the calculation results, and the shape representative value is calculated using the set consisting of the plurality of representative regions.
  • the representative region is a set of vertices, which is specified by the vertex ID and coordinates.
  • the shape representative value is, for example, a function of the coordinates of all vertices of any two representative regions, and is, for example, the distance between the vertices of any two representative regions.
  • a plurality of representative regions are extracted using the threshold value and the HKS value, and the ring set is used as the shape representative region. More specifically, as shown in FIG. 6, all the vertices are divided into two groups using the HKS value and the threshold value, and the set of vertices located at the boundary between the two groups is detected as a ring. The ring is identified by the vertex ID and coordinates.
  • a threshold value a plurality of rings can be detected, and a ring set composed of a plurality of rings is acquired.
  • Ring sets are extracted by applying HKS to all vertices of skin polygon 1.
  • Each ring in the ring set is identified by a vertex ID.
  • the skin polygons 2 to 5 are each composed of vertices having IDs and coordinates; the rings in the skin polygons 2 to 5 are specified from the IDs of the vertices constituting each ring, so the ring sets specified by vertex IDs and coordinates can be obtained for them as well.
  • the shape representative value is calculated using the ring set.
  • the calculation of the feature amount using the time-series data of the shape representative value will be described.
  • the shape representative value 1 is calculated using the ring set corresponding to posture 1; the ring set corresponding to posture 2 is then acquired, and the shape representative value 2 is calculated in the same manner.
  • the feature amount calculated using the shape representative value 1 and the shape representative value 2 is information on how one ring approaches or separates from another ring when the target posture is displaced from the posture 1 to the posture 2.
  • the shape representative value 1 to the shape representative value 6 are considered to represent the movement of the target's skin during the motion, and the feature amount acquired from the shape representative value 1 to the shape representative value 6 is a motion feature amount.
  • the skin polygon changes depending on the postures 1 to 5. That is, the movement of the skin polygon (skin polygon 1 to skin polygon 5) corresponds to the walking motion, and the time-series data of the shape representative value representing the shape of the skin polygon represents the walking motion.
  • the motion feature amounts obtained from the time-series data of the shape representative values can be important information for identifying an individual.
  • the time-series data of the shape representative value representing the shape of the skin polygon represents the swing motion.
  • the swing motion is acquired from the time-series data of the shape representative values, the motion feature amount is acquired from the swing motion, and differences in performance can be obtained.
  • in the above, the annular vertex group or ring serving as the shape representative region is acquired using HKS, but the setting of the shape representative region is not limited to one using HKS; other existing shape descriptors may be used.
  • alternatively, a predetermined position may be selected to set a shape representative region, and the shape representative region may be determined by the vertex IDs of the vertex set forming it. In the embodiment shown in FIG., two shape representative regions are provided at positions corresponding to the scapulae on the back of the human body, and the movement of the scapulae may be analyzed using the information obtained from the temporal change of the shape representative regions.
  • one shape representative region may be provided over a wide area of the back of the human body, and the movement of the human body may be analyzed using the information obtained from the temporal change of the shape representative region.
  • the shape representative region is a ring-shaped vertex group, but the shape representative region may be a planar vertex group composed of a plurality of vertices.
  • HKS is applied to each of the skin polygons 1 to 5 to acquire the shape representative region 1 (ring set 1) to the shape representative region 5 (ring set 5), respectively.
  • the shape representative value 1 to the shape representative value 5 are calculated from the shape representative region 1 (ring set 1) to the shape representative region 5 (ring set 5).
  • the shape representative value is the perimeter and/or the cross-sectional area of each ring.
  • the feature quantity is an array consisting of the shape representative value 1 to the shape representative value 5 acquired in a plurality of frames.
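A minimal sketch of computing the perimeter and cross-sectional area of one ring, assuming the ring vertices are ordered around the loop; the cross-section is taken in the best-fit (principal component) plane, consistent with the principal component analysis mentioned in the figure descriptions.

```python
import numpy as np

def ring_perimeter_and_area(ring):
    """ring: (N, 3) ring vertices, assumed ordered around the loop.

    Perimeter = sum of consecutive edge lengths (loop closed).
    Area = shoelace formula applied to the projection of the ring onto its
    best-fit plane (the two leading principal directions).
    """
    closed = np.vstack([ring, ring[:1]])
    perimeter = np.linalg.norm(np.diff(closed, axis=0), axis=1).sum()

    centered = ring - ring.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)  # principal directions
    xy = centered @ vt[:2].T                                  # (N, 2) in-plane coords
    x, y = xy[:, 0], xy[:, 1]
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    return perimeter, area
```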
  • HKS (Heat Kernel Signature)
  • HKS is a function defined on the vertex set M of each piece of shape information and depends on the 3D coordinate x and the time t.
  • the time t here is not the time of the model motion, but the elapsed time in HKS.
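The text does not reproduce the defining formula; for reference, the standard definition from Non-Patent Document 8 (Sun et al.) is shown below in LaTeX, where the eigenvalues and eigenfunctions are those of the Laplace-Beltrami operator on the mesh M.

```latex
% Standard Heat Kernel Signature (Sun et al., Non-Patent Document 8):
% \lambda_i, \phi_i are the eigenvalues and eigenfunctions of the
% Laplace--Beltrami operator on the mesh M.
\mathrm{HKS}(x, t) \;=\; k_t(x, x) \;=\; \sum_{i \ge 0} e^{-\lambda_i t}\, \phi_i(x)^2
```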
  • FIG. 12 shows the vertices of the model actually colored according to the HKS value; in reality, FIG. 12 is a color image.
  • the maximum and minimum values of HKS were colored according to the maximum and minimum values of the color map, respectively.
  • the HKS value is the smallest near the torso, and the HKS value increases toward the end of the body (hands and feet).
  • the set M is divided into two sets A and B, above and below a predetermined threshold (Hth), according to the HKS value of each vertex. That is, A = {x ∈ M | HKS(x) ≥ Hth} and B = {x ∈ M | HKS(x) < Hth}.
  • the vertices constituting any triangular polygon whose three vertices simultaneously include a point belonging to group A and a point belonging to group B are taken as the set S (see FIG. 13).
  • the set of vertices that make up the set S can be varied by changing the value of the threshold Hth.
  • FIG. 15 shows the vertex set S when the value of the threshold value H th is changed.
  • the boundary region consisting of the set of vertices S is a closed curved region or ring.
  • FIG. 15 shows the boundary regions corresponding to S3, S8, S38, S70, and S80 in order from the left.
  • FIG. 15 shows a case where the value of H th is increased in order from the left, and it can be seen that when the value of H th is increased, the position of the boundary region moves to the terminal part (hands and feet) of the body.
  • the number of boundary regions may vary depending on the value of the threshold H th. In the two figures on the left side of FIG. 15, the number of boundary areas is 2, in the central view, the number of boundary areas is 5, and in the two figures on the right side, the number of boundary areas is 4.
  • FIG. 17 shows, as an example, skin models having different body shapes and postures, in which all 17 boundaries of the sets S3, S8, S38, S70, and S80 are displayed together in one skin model. From this figure, it can be seen that the positions of the boundary regions are substantially the same even if the body shapes and postures differ.
  • the set of vertices that make up the boundary region may be further classified.
  • a representative region (ring) may be extracted by performing k-means clustering on the vertex set included in each boundary region using the coordinates of the vertices.
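A minimal sketch of the boundary-ring extraction described above, assuming the per-vertex HKS values and the triangle connectivity are already available; the number of rings per threshold must be supplied (it varies with Hth, as noted above), and the function names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def boundary_vertex_set(hks_values, triangles, h_th):
    """Vertices of triangles whose three corners straddle the HKS threshold.

    hks_values: (V,) HKS value per vertex
    triangles:  (F, 3) vertex indices of each triangular polygon
    h_th:       threshold dividing the vertex set M into groups A and B
    returns:    array of vertex indices forming the boundary set S
    """
    in_a = hks_values[triangles] >= h_th                  # (F, 3) group membership
    mixed = np.any(in_a, axis=1) & ~np.all(in_a, axis=1)  # triangles with A and B corners
    return np.unique(triangles[mixed])

def split_into_rings(vertex_ids, coords, n_rings):
    """Separate one boundary set into individual rings by k-means clustering
    on the vertex coordinates (the number of rings depends on the threshold)."""
    labels = KMeans(n_clusters=n_rings, n_init=10).fit_predict(coords[vertex_ids])
    return [vertex_ids[labels == k] for k in range(n_rings)]
```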
  • the shape representative value is calculated using a ring set consisting of all 17 boundary regions (rings) of the set S.
  • shape representative value: cross-sectional area or perimeter of a ring.
  • with HKS, it is possible to match the same body part between people even if their shapes differ. Therefore, the same part (ring) is found using HKS, and the parts are compared to identify individuals by shape.
  • Table 3 shows the five combinations with the highest correct-answer rates among the approximately 3000 combinations, together with their rates. The maximum correct-answer rate was 96.93%, obtained for the combination of S5, S15, S30, and S80.
  • S5 is included in all of the top five combinations, and it can be seen that the area near the torso is an important feature. Except for S35, the top combinations include a boundary below S15, that is, around the torso, and a boundary above S65, that is, in the part from the elbow to the wrist. From this, it is inferred that when the volume is normalized, the area around the torso and the area from the elbow to the wrist contribute most significantly.
  • FIG. 18 shows a ring set consisting of S5, S15, S30, and S80.
  • the speed of the treadmill was set to 4.0 km / h, which is the average walking speed of a person.
  • Four cameras were installed so as to surround the treadmill, and walking was measured at 60 fps for about one minute for three men in their twenties and one in their thirties, for a total of four people.
  • the method of Non-Patent Document 5 reconstructs a skin model from a single image.
  • the skin model may be reconstructed based on a plurality of images.
  • the joint positions acquired by video motion capture (Patent Document 4 and Non-Patent Document 2) were embedded into a skin model acquired in advance in a predetermined posture (for example, an upright posture or a T-pose) to deform it into walking poses,
  • and the deformed model was treated as the shape information acquired from the cameras.
  • FIG. 22 shows the result of deforming the posture by applying the joint positions acquired by video motion capture to the skin model acquired in advance; it shows the walking model every 20 frames (about 0.3 seconds), in order from the left.
  • the shape information may be three-dimensionally reconstructed from the image on the treadmill.
  • features were extracted using S8, S20, S30, and S80 as boundaries, and four people were identified.
  • the shape feature amounts were extracted from a part of the measured walking data, and the average value of the feature amounts for each person was taken and used as that person's base data.
  • each feature amount was then extracted from the walking data other than the frames used in creating the base data, and these were used as test data.
  • as a method of creating the base data, since walking is not stable at the start and end of the measurement, 600 consecutive frames were selected from the frames excluding those parts.
  • a skin model was generated for each of the 60 frames obtained by extracting one frame per 10 frames from the selected 600 frames; for each skin model, an array containing the cross-sectional areas of the representative regions S8, S20, S30, and S80 was created, and the element-wise average over the 60 frames was taken as the base data.
  • as test data, from the walking data excluding the part used for the base data, another 200 frames were selected and one frame was taken every 10 frames from them;
  • the skin model was deformed so that it fit the skeleton, and the total of 20 arrays containing the cross-sectional areas created from those skin models was used as the test data.
  • for each test datum, the L1 norms of the difference from the base data of the four people were compared, the base data with the smallest value was taken as the estimation result, and the ratio of cases in which the estimation result and the test datum belonged to the same person was 88.75%.
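A minimal sketch of the identification step just described, assuming the base data and test arrays have already been computed as above.

```python
import numpy as np

def identify(test_arrays, base_data):
    """test_arrays: list of (K,) feature arrays (one array per test sample).
    base_data:   dict person_id -> (K,) averaged base array.

    For each test array, pick the person whose base data has the smallest
    L1 norm of the difference.
    """
    ids = list(base_data)
    estimates = []
    for t in test_arrays:
        dists = [np.abs(t - base_data[p]).sum() for p in ids]   # L1 norms
        estimates.append(ids[int(np.argmin(dists))])
    return estimates

def accuracy(estimates, true_ids):
    """Fraction of test samples whose estimated person matches the true one."""
    return float(np.mean([e == t for e, t in zip(estimates, true_ids)]))
```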

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Veterinary Medicine (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physiology (AREA)
  • Dentistry (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Image Analysis (AREA)

Abstract

In the present invention, a movement feature amount is extracted from the dynamic state of skin polygons that change in shape during movement. In the present invention, the shape of an object is identified by skin polygons, the vertices of the skin polygons each have coordinates that are dependent on the pose of the object, the shape of the object is represented by a plurality of representative regions selected from the skin polygons, time series data of the skin polygons during the movement of the object is prepared, the plurality of representative regions are used in a plurality of frames to calculate shape representative values that represent the shape of the object in the respective frames, and the time series data of the shape representative values is used to acquire, as a movement feature amount, a value that represents a change in the spatial relation between the representative regions that occurs along with object movement.

Description

Method and device for acquiring motion feature amounts using skin information
The present invention relates to a device and a method for acquiring a motion feature amount using skin information. The motion feature amount acquired by the present invention can be used to quantify individuality in motion and as a personal authentication technique.
The posture (pose) of the target can be represented by a skeletal model, and the posture of the target is determined from the angle and position of each joint. By using motion capture, it is possible to acquire the motion data of the target from the time-series data of the target's posture.
The shape of the target can be represented by a polygon model or polygon mesh. In the polygon model, the body surface of the target is composed of a set of a large number of polygons (typically triangles), and the shape of the target is determined from the coordinates of the vertices of all the polygons. The polygon model can be obtained by acquiring the three-dimensional coordinates of all the vertices of the polygons on the target's body surface using, for example, a 3D body scanner.
The shape of the target (the coordinates of the vertices of the polygons) changes depending on the posture of the target. In the field of animation, skinning is known as the task of associating a 3DCG model with a skeleton. Skinning determines how each vertex of the model's polygons follows the skeleton, and weight adjustments are made to adjust the influence of the skeleton on each vertex. The same applies to a human body model, and the calculation of the three-dimensional coordinates of the surface of the human body (skin polygons) using the posture information of the human body is described in, for example, Patent Documents 1 to 3 and Non-Patent Document 1.
Remarkable progress has been made in motion capture technology that analyzes motion by three-dimensionally reconstructing it from one or more camera images. For example, video motion capture, in which two-dimensional joint positions are estimated from camera images by deep learning and integrated to reconstruct the motion in three dimensions, has been realized (Patent Document 4, Non-Patent Document 2). By using video motion capture, it is possible to acquire three-dimensional information on the target's skeleton from camera images without interfering with the target person. For example, OpenPose (Non-Patent Document 3) can be used to estimate two-dimensional joint positions from camera images by deep learning.
The video motion capture system estimates the 3D information of the skeleton based on the joint positions, but in addition to the skeleton, techniques for estimating 3D body surface information (shape information) from RGB cameras alone have also been developed. As techniques for estimating shape information from RGB cameras alone, a technique that reconstructs the detailed shape of clothing (Non-Patent Document 4) and techniques that construct a skin model of the unclothed body (Non-Patent Documents 5 and 6) are known.
As described above, technologies that make it possible to acquire posture information and shape information more easily from camera images have been developed in recent years. By three-dimensionally reconstructing not only the motion of the skeleton but also the shape information represented by the skin, the rich motion information of a person can be expressed. Based on this motion information, we consider quantifying individuality in general motion and establishing a personal authentication technique that uses it.
Regarding personal authentication technology, biometric authentication is attracting attention as an authentication method that overcomes the drawbacks of conventional password- and PIN-based authentication. Many biometric methods such as face recognition, fingerprint recognition, vein recognition, and iris recognition require the acquisition of biometric information at a short distance after obtaining consent. In contrast, gait authentication, which identifies an individual by the way they walk, is attracting attention as a method for acquiring biometric information from a long distance without the cooperation of the subject. Many gait authentication methods use walking silhouette images and joint angles as feature quantities; the amount of information used is still limited, gait is presupposed as a specific motion, and the accuracy is not yet satisfactory. In contrast, by using the information obtained from skin polygons having posture information, personal authentication that is not necessarily limited to walking becomes possible.
In addition, research on shape retrieval of 3D human models is also being conducted, and Non-Patent Document 7 can be referred to. Further, HKS (Heat Kernel Signature) and WKS (Wave Kernel Signature) have been proposed as shape descriptors. HKS is described in Non-Patent Document 8, and WKS in Patent Document 5 and Non-Patent Document 9. These shape retrieval methods and shape descriptors do not focus on changes in shape during motion.
WO2016/207311A1 (US10,395,511B2), US2020/0058137A1, WO2019/207176A1, JP-A-2020-042476, EP2530623A1
The present invention aims to extract motion feature amounts from the dynamics of skin polygons that deform during motion.
The technical means adopted by the present invention is a method of acquiring a motion feature amount, in which:
the shape of the target is specified by skin polygons,
each vertex of the skin polygons has coordinates depending on the posture of the target,
the shape of the target is represented by one or more representative regions selected from the skin polygons, each representative region being a vertex group consisting of a plurality of vertices,
time-series data of the skin polygons during the motion of the target are prepared,
in a plurality of frames, a shape representative value representing the shape of the target in each frame is calculated using the one or more representative regions, and
using the time-series data of the shape representative values, a value representing the temporal change of the one or more representative regions accompanying the motion of the target is acquired as the motion feature amount.
In one embodiment, the shape of the target is represented by a plurality of representative regions, and
the temporal change of the plurality of representative regions includes a temporal change of the spatial relationship between the representative regions accompanying the motion of the target.
In one embodiment, the spatial relationship between the representative regions is defined by a function (shape representative value or shape descriptor) of the vertex coordinates of any two representative regions.
In one embodiment, the spatial relationship between the representative regions is defined by the inter-vertex distances (shape representative value or shape descriptor) between any two representative regions.
In one embodiment, the representative region is an annular vertex group (ring) in which the vertices are arranged in a loop.
The annular vertex group (ring) is not limited to a circular arrangement and may be, for example, a group of vertices arranged in a roughly rectangular shape.
In one embodiment, the annular vertex group is a group of vertices arranged along the circumference of a part of the human body.
In one embodiment, the annular vertex group is placed on the surface of a part of the human body.
In one embodiment, the annular vertex group is obtained by:
acquiring the HKS (Heat Kernel Signature) value for all vertices,
dividing all vertices into two groups according to a threshold, and
obtaining the annular vertex group consisting of the set of vertices arranged in a ring at the boundary between the two groups;
by changing the threshold value, a plurality of annular vertex groups are acquired.
In one embodiment, the representative region is a planar vertex group consisting of a plurality of vertices arranged so as to form a surface (the annular vertex group is, so to speak, a linear vertex group).
The representative region may also represent the shape of a part of the target.
Alternatively, the entire skin polygon may be selected and used as the representative region.
The posture of the target is specified by a skeletal model,
a function that associates each vertex of the skin polygon model with the skeleton has been obtained,
the coordinates (initial coordinates) of each vertex of the target's skin polygon model are obtained for a specific posture (initial posture), and
the coordinates of each vertex in an arbitrary posture can be obtained from the initial coordinates, the initial posture, and the arbitrary posture by using the function.
In one embodiment, the posture of the target is acquired by markerless motion capture using one or more images.
Another technical means adopted by the present invention is a method of acquiring shape representative information, in which:
the shape of the target is specified by skin polygons,
each vertex of the skin polygon model has coordinates depending on the posture of the target,
the HKS (Heat Kernel Signature) value is obtained for all vertices,
all vertices are divided into two groups according to a threshold, and
an annular vertex group (ring) consisting of the set of vertices arranged in a ring at the boundary between the two groups is acquired as a shape representative region.
In one embodiment, the shape of the target is represented by a plurality of annular vertex groups, and
the plurality of annular vertex groups are determined by changing the threshold value.
In one embodiment, a plurality of annular vertex groups representing the shape of the target are represented by a function (shape representative value or shape descriptor) of the vertex coordinates of two annular vertex groups.
In one embodiment, a plurality of annular vertex groups representing the shape of the target are represented as the inter-vertex distances (shape representative value or shape descriptor) between two annular vertex groups.
In one embodiment, a plurality of annular vertex groups representing the shape of the target are represented by the area and/or the perimeter of the region enclosed by each annular vertex group.
The above method for acquiring the motion feature amount and the above method for acquiring the shape representative information are executed by a computer, and the present invention is also provided as a computer program for causing a computer to execute these methods.
Another technical means adopted by the present invention is a device for acquiring motion feature amounts, comprising a storage unit, a shape representative value calculation unit, and a motion feature amount calculation unit, in which:
the storage unit stores time-series data of skin polygons specifying the shape of the target during motion, and each vertex of the skin polygons has a vertex ID and coordinates depending on the posture of the target,
the shape of the target is represented by one or more representative regions selected from the skin polygons, each representative region being a vertex group specified by the vertex IDs and coordinates of a plurality of vertices,
the shape representative value calculation unit calculates a shape representative value representing the posture-dependent shape of the target using the one or more representative regions, and
the motion feature amount calculation unit uses the time-series data of the shape representative values acquired over a plurality of frames to calculate, as the motion feature amount, a value representing the temporal change of the one or more representative regions accompanying the motion of the target.
In one embodiment, the shape of the target is represented by a plurality of representative regions,
the shape representative value calculation unit calculates a shape representative value representing the posture-dependent shape of the target from the spatial relationship between the representative regions, and
the motion feature amount calculation unit calculates, as a motion feature amount, a value representing the change in the spatial relationship between the representative regions accompanying the motion of the target, using time-series data of the shape representative values acquired in a plurality of frames.
In one embodiment, the shape representative value is defined by a function of the vertex coordinates of any two representative regions.
In one embodiment, the shape representative value is defined by inter-vertex distances between any two representative regions.
According to the present invention, motion feature amounts can be extracted from the dynamics of skin polygons that deform during motion.
A schematic diagram of the feature amount acquisition device using shape information according to the present embodiment.
A diagram showing the flow of acquiring feature amounts of a target using shape information according to the present embodiment.
A diagram explaining the skin polygon model according to the present embodiment.
A diagram explaining the acquisition of skin polygons according to the present embodiment.
A diagram explaining the acquisition of shape representative values according to the present embodiment.
A flow chart showing the acquisition of shape representative values using HKS.
A flow chart showing the acquisition of feature amounts reflecting changes in posture.
A conceptual diagram showing the change in the spatial relationship between two rings as the posture of the target changes.
A conceptual diagram showing that the spatial relationship between two rings is defined by inter-vertex distances.
A diagram explaining the acquisition of shape representative values according to another embodiment.
A flow chart showing the acquisition of feature amounts that are only weakly dependent on posture.
A diagram showing all polygon vertices of the human body model color-coded according to their HKS values (the original is a color image).
A diagram showing the method of detecting the vertex set S located at the boundary between the two groups A and B; the set S consists of the vertices of the triangles whose three vertices simultaneously include vertices belonging to group A and vertices belonging to group B.
The coordinate set forming a ring extracted from the vertex set S3 by clustering, and the plane obtained by principal component analysis.
A diagram showing, from left to right, the vertex sets located at the boundaries detected for S3, S8, S38, S70, and S80.
A diagram in which, for skin models with different body shapes and postures, all 17 boundaries of the sets S3, S8, S38, S70, and S80 are displayed together on one skin model.
A diagram showing the 10 poses of the SHREC model (Non-Patent Document 7).
The set of boundary vertex sets consisting of S5, S15, S30, and S80.
A diagram of applying joint positions acquired by video motion capture to a skin model acquired by HMR.
The skeletal model used in video motion capture, with the numbers representing joints.
The skeletal model used in video motion capture, with the numbers representing bones.
A diagram showing the skin model acquired by the method shown in FIG. 17, with the walking motion of the target on a treadmill shown every 20 frames (0.3 seconds).
A conceptual diagram illustrating a plurality of shape representative regions provided on various parts of the human body.
A conceptual diagram illustrating a plurality of shape representative regions provided on the back of the human body.
A conceptual diagram illustrating one shape representative region provided on the back of the human body.
[A] System for acquiring feature amounts of a target using skin information
[A-1] Overview of the system
As shown in FIG. 1, the feature amount acquisition system using shape information according to the present embodiment comprises one or more video cameras that capture moving images of a target, and one or more computers that receive the image information, execute predetermined calculations, and calculate and output the motion feature amounts of the target. More specifically, the feature amount acquisition system includes a posture information acquisition unit that acquires the posture information of the target from the images constituting the moving images, a shape information acquisition unit that acquires the shape information (skin polygons) of the target using the images, or the images and the posture information, a shape representative value calculation unit that calculates shape representative values using the shape information, and a feature amount calculation unit that calculates feature amounts using the shape representative values. The posture information acquisition unit, the shape information acquisition unit, the shape representative value calculation unit, and the feature amount calculation unit are implemented by the processor of the computer, and the posture information, the shape information (skin polygons), the shape representative values, and the feature amounts are stored in the memory (storage unit) of the computer. Although not shown in FIG. 1, the storage unit of the computer also stores a skeleton model, a skin polygon model, a function that associates polygon vertex coordinates with the skeleton model (such as a skinning function), and the like.
[A-2] Acquisition of the posture (pose)
The posture of the target is specified by the joint positions of the target's skeleton model. In this embodiment, the posture information is acquired using video motion capture. Video motion capture is a method of synchronously capturing the motion of the target person with a plurality of cameras and performing a three-dimensional reconstruction of that motion (Patent Document 4, Non-Patent Document 2). The video motion capture system estimates joint positions by processing the images from a plurality of RGB cameras arranged so as to surround the target with OpenPose (Non-Patent Document 3), and improves the accuracy of the three-dimensional joint estimation by performing inverse dynamics calculations on the estimated three-dimensional joint positions using a skeleton model. For details of the video motion capture system, refer to Patent Document 4 and Non-Patent Document 2.
As shown in FIG. 20, the video motion capture according to this embodiment uses 18 measurement points, and each joint is labeled (Table 1). In addition, using the estimated joint positions, the line connecting adjacent joints is treated as a bone (see FIG. 21). There are 17 bones, and each bone is labeled (Table 1). The skeleton model, the skeleton information of the target (bone lengths), and the joint position coordinates of each frame acquired by motion capture are stored in the storage unit.
[Table 1]
Although OpenPose (Non-Patent Document 3) is used in the video motion capture according to this embodiment, other methods may be used to estimate two-dimensional joint positions from camera images by deep learning. Various methods for acquiring the posture of a target are known to those skilled in the art, and the posture acquisition method is not limited to video motion capture (Patent Document 4, Non-Patent Document 2); for example, a method that acquires motion data by analyzing RGB images from a single viewpoint may be adopted, or markerless motion capture using a camera and a depth sensor may be adopted.
[A-3] Acquisition of the shape
The shape of the target is specified by a polygon model or polygon mesh. A polygon model is composed of vertices, edges, and faces. In the case of a triangle mesh model, for example, each face has three vertices, the three sides of the triangular face are edges, and the start and end points of each edge are two of those three vertices. Each polygon can be represented by the three-dimensional coordinate values of all the vertices constituting it, and the simplest data structure for a polygon model is the IDs and three-dimensional coordinate values of all the vertices of the model. Various data structures for polygon models are known; it suffices that the shape of the target is expressed numerically, and the specific data structure is not limited.
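As an illustration of the data structure just described, the following is a minimal sketch in Python with NumPy (not part of the patented method; the class and field names are hypothetical) that holds a triangle mesh as vertex IDs with 3D coordinates plus triangles referencing those IDs.

import numpy as np
from dataclasses import dataclass

@dataclass
class TriangleMesh:
    # vertices[i] is the 3D coordinate of the vertex whose ID is i
    vertices: np.ndarray   # shape (N, 3), float
    # faces[j] holds the three vertex IDs of triangle j
    faces: np.ndarray      # shape (F, 3), int

    def edges(self) -> np.ndarray:
        # Return the unique undirected edges (pairs of vertex IDs).
        e = np.vstack([self.faces[:, [0, 1]],
                       self.faces[:, [1, 2]],
                       self.faces[:, [2, 0]]])
        return np.unique(np.sort(e, axis=1), axis=0)

# A single triangle as the smallest possible example.
mesh = TriangleMesh(vertices=np.array([[0., 0., 0.],
                                       [1., 0., 0.],
                                       [0., 1., 0.]]),
                    faces=np.array([[0, 1, 2]]))
print(mesh.edges())   # three edges between vertex IDs 0, 1, 2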
Regarding the acquisition of the shape of a human body, methods using a parametric body model, in which the body shape and posture can be specified by parameters, are known. An example of a parametric body model is SMPL (A Skinned Multi-Person Linear Model) (Patent Documents 1 and 2, Non-Patent Document 1). SMPL is a parametric model of a naked human that has 72 pose parameters and 10 body-shape parameters and returns a triangle mesh. In model-based methods, three-dimensional shape information is acquired by optimizing the parameters of the model so that the model fits the human silhouette in the image. Among shape acquisition methods using the SMPL model, there are methods that construct a skin model in a state where no clothes are worn (Non-Patent Documents 5 and 6). In the experiments described later, the mesh model of the target was obtained using the method of Non-Patent Document 5.
[A-4] Feature amounts acquired from skin polygons with posture information
By using video motion capture (Patent Document 4, Non-Patent Document 2) with one or more cameras and a computer, it is possible to reconstruct the motion of a target in three dimensions, and it is also possible to reconstruct in three dimensions the approximate outer shape of the skin from the images of one or more cameras even when clothes are worn (Non-Patent Documents 5 and 6). When the motion of the target is three-dimensionally reconstructed, rich motion information of the target can be expressed by reconstructing not only the motion of the skeleton but also the shape information represented by the skin. That is, by combining the time-series data of the posture and the time-series data of the shape, motion information can be obtained from the temporal change of the individual's shape (the deforming skin polygons). Motion feature amounts are extracted by quantifying, using shape representative values or shape descriptors, the skin polygons that deform with the motion.
A known shape descriptor can be used to obtain the shape representative value or shape descriptor. Many candidates are conceivable; in this embodiment, the Heat Kernel Signature (Non-Patent Document 8) is used. In this embodiment, a plurality of representative regions (a group of closed curves, i.e., a ring set) on the body surface of the target are identified by HKS. Computing the shape representative value or shape descriptor from the information of the representative regions makes it possible to reduce the amount of computation.
As shown in FIG. 8, when the posture of the target changes from posture 1 to posture 2 to posture 3 during motion, the coordinate values of ring 1, the coordinate values of ring 2, and the spatial relationship between ring 1 and ring 2 also change depending on the posture. For example, as shown in FIG. 9, for any two rings, the distance from each vertex of one ring to every vertex of the other ring is calculated, and a vector storing these values is obtained. A vector using the distances between all the rings constituting the ring set representing the shape of the target, or between specific rings, is used as the shape representative value or shape descriptor. The shape representative value or shape descriptor is obtained for each of posture 1, posture 2, and posture 3 to obtain time-series data of the shape representative values or shape descriptors. The array of this time-series data may be used as the motion feature amount, or the array may be reduced in dimension to form the motion feature amount. This motion feature amount contains information such as how the plurality of representative parts on the skin of the target moved apart or closer together with the motion of the target (posture 1 → posture 2 → posture 3); the motion feature amount obtained from skin polygons with posture information can therefore be said to reflect the motion information consisting of the time-series data of the target's posture as well as the skeleton information.
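The following is a minimal sketch (Python with NumPy, illustrative only) of the computation described above: for each frame, the distances from every vertex of one ring to every vertex of another ring are flattened into a vector, and the per-frame vectors are stacked into time-series data; the final reduction to per-distance mean and standard deviation is only one possible way of lowering the dimension, not something prescribed by the text.

import numpy as np

def ring_distance_vector(ring_a: np.ndarray, ring_b: np.ndarray) -> np.ndarray:
    # ring_a: (Na, 3), ring_b: (Nb, 3) vertex coordinates of two rings.
    # Returns the Na*Nb distances from each vertex of ring_a to every vertex of ring_b.
    diff = ring_a[:, None, :] - ring_b[None, :, :]       # (Na, Nb, 3)
    return np.linalg.norm(diff, axis=2).ravel()          # (Na*Nb,)

def motion_feature(frames, ring_ids=(0, 1)) -> np.ndarray:
    # frames[t] maps a ring ID to the (N, 3) coordinates of that ring at frame t.
    # The shape representative value of each frame is the ring-to-ring distance vector;
    # the stacked time series is reduced to per-distance mean and std as a feature.
    series = np.stack([ring_distance_vector(f[ring_ids[0]], f[ring_ids[1]])
                       for f in frames])                 # (T, Na*Nb)
    return np.concatenate([series.mean(axis=0), series.std(axis=0)])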
The motion feature amounts acquired from skin polygons with posture information reflect individual differences and individuality in motion, and by using these motion feature amounts, the individuality of motion can be turned into information and a personal authentication technique can be established. These motion feature amounts can be used as indices of changes in motion in exercise training and rehabilitation. They can also be used to obtain the similarity between one's own motion and another person's motion, and the distance between an individual's motion and a target motion can be expressed numerically. By capturing a person's walking motion as the temporal change of skin polygons with posture information, the motion feature amount acquired from the skin polygons becomes a feature amount of that individual's walking motion, and this feature amount can be used as important information for identifying the individual.
[A-5] Other feature amounts
Motion feature amounts can also be acquired from the time-series data of the posture. For example, the relative position of each joint with NOSE as the origin is calculated, the RWRIST, LWRIST, RANKLE, and LANKLE entries are extracted from the array of relative joint positions, these relative joint positions are collected over a fixed number of frames, and the array of per-joint variances is defined as a feature amount obtained from the motion information. Those skilled in the art will understand that there are various ways to obtain low-dimensional motion feature amounts.
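A sketch of this posture-based feature is given below (illustrative Python; the joint indices used here are hypothetical placeholders, whereas the actual labels follow Table 1).

import numpy as np

# Hypothetical indices into the 18-joint array; the actual labels follow Table 1.
NOSE, RWRIST, LWRIST, RANKLE, LANKLE = 0, 4, 7, 10, 13

def posture_motion_feature(joints: np.ndarray) -> np.ndarray:
    # joints: (T, 18, 3) joint positions over T frames.
    # Returns the variance of the NOSE-relative positions of the four end joints.
    rel = joints - joints[:, NOSE:NOSE + 1, :]            # positions relative to NOSE
    ends = rel[:, [RWRIST, LWRIST, RANKLE, LANKLE], :]    # (T, 4, 3)
    return ends.var(axis=0).ravel()                       # variance over frames, length 12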
The lengths of the labeled bones can be used as feature amounts. From this three-dimensional skeleton information, skeleton feature amounts and motion feature amounts are defined. As a skeleton feature amount, for example, an array containing the bone length values can be created and defined as the skeleton feature amount: the i-th element of the array contains the length of the bone with label i, and the length of the array is 17, equal to the number of bones. The skeleton feature amount may also be reduced in dimension.
Posture-independent shape feature amounts may also be acquired from the shape of the human body. For example, the thickness or volume of a certain part of the human body, or the thickness ratio or volume ratio between a plurality of parts, may be used as a shape feature amount.
Furthermore, a mass distribution may be estimated from the skin polygons of the human body according to its shape, and estimates of the joint and muscle forces generated by the motion may be used to extract motion feature amounts. The volume of each part of the human body can be calculated from the shape information of the human body, the specific gravities of the body and of body parts are known, and an approximate mass distribution can be calculated from the volumes and specific gravities. Using the mass distribution of the human body, the force applied to each joint and each muscle during the motion of an individual with that shape can be estimated. This is also effective information for personal authentication.
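As one way of obtaining the volumes needed for such a mass-distribution estimate, the volume enclosed by a closed triangle mesh can be computed as a sum of signed tetrahedron volumes; the sketch below (Python with NumPy) assumes a closed, consistently oriented mesh and a hypothetical constant density.

import numpy as np

def mesh_volume(vertices: np.ndarray, faces: np.ndarray) -> float:
    # Signed volume of a closed, consistently oriented triangle mesh
    # (divergence theorem: sum of tetrahedra formed with the origin).
    a, b, c = (vertices[faces[:, k]] for k in range(3))
    return float(np.abs(np.einsum('ij,ij->i', a, np.cross(b, c)).sum()) / 6.0)

# Mass of a body part would then be: volume of its sub-mesh x an assumed density.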
[A-6] Skin polygon model
The skin polygon model according to this embodiment is described with reference to FIG. 3. The posture of the target is represented by the skeleton model, and time-series data of the posture (posture 1 to posture 5) are acquired from the moving-image data (time-series image data). Postures 1 to 5 do not necessarily have to be consecutive frames; they are, for example, characteristic time-series frames in a given motion, or time-series frames extracted every predetermined number of frames from consecutive frames. The skin polygon model provides skin polygons with posture information corresponding to each posture. Skin polygon 1 to skin polygon 5, corresponding to posture 1 to posture 5, have vertex coordinates corresponding to posture 1 to posture 5, respectively. Using the vertex coordinates of the skin polygons, a shape representative value representing the shape of the target defined by those skin polygons is obtained. By acquiring shape representative value 1 to shape representative value 5 from skin polygon 1 to skin polygon 5, respectively, time-series data of the shape representative values are obtained. The time-series data of the shape representative values reflect the motion data of the target, and the feature amount of the target's motion is acquired using the time-series data of the shape representative values.
One embodiment of acquiring the skin polygon model is described with reference to FIG. 4. The posture information of the target is acquired from the moving image of the target by motion capture. Based on an image of the target in a certain posture, an initial posture is obtained, and at the same time the skin polygons corresponding to the initial posture are obtained; known means can be used to acquire skin polygons from an image. By aligning the coordinate systems of the skeleton model and the skin polygon model, the posture (joint positions) and the vertex coordinates of the polygons are made to correspond. A function (for example, a skinning function) that associates the vertex coordinates of the skin polygons with the skeleton model (posture) has been obtained, and the coordinates of each vertex in an arbitrary posture of the target can be acquired from the initial coordinates, the initial posture, and the arbitrary posture using that function. That is, skin polygon 1 to skin polygon 6 can be acquired from posture 1 to posture 6, respectively.
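A minimal linear blend skinning (LBS) sketch is given below as one example of such a function mapping a posture to vertex coordinates; the skinning actually used with an SMPL-type model may differ (SMPL additionally applies pose- and shape-dependent blend shapes), so this is only an illustrative assumption.

import numpy as np

def linear_blend_skinning(rest_vertices: np.ndarray,
                          weights: np.ndarray,
                          bone_transforms: np.ndarray) -> np.ndarray:
    # rest_vertices: (N, 3) vertex coordinates in the initial posture.
    # weights: (N, J) skinning weights (each row sums to 1).
    # bone_transforms: (J, 4, 4) transforms from the initial posture to the current posture.
    # Returns the (N, 3) vertex coordinates in the current posture.
    v_h = np.concatenate([rest_vertices, np.ones((len(rest_vertices), 1))], axis=1)  # (N, 4)
    # Blend the per-bone transforms for each vertex, then apply them.
    blended = np.einsum('nj,jab->nab', weights, bone_transforms)                     # (N, 4, 4)
    skinned = np.einsum('nab,nb->na', blended, v_h)                                  # (N, 4)
    return skinned[:, :3]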
One embodiment of acquiring the shape representative value is described with reference to FIG. 5. Skin polygon 1 has vertex coordinates corresponding to posture 1; a shape descriptor is calculated using all the vertex coordinates, a plurality of representative regions are extracted using the calculation results, and the shape representative value is calculated using the set consisting of the plurality of representative regions. A representative region is a vertex set specified by vertex IDs and coordinates. The shape representative value is, for example, a function of all the vertex coordinates of any two representative regions, for example, the distances between the vertices of any two representative regions.
For example, when HKS is used as the shape descriptor, a plurality of representative regions (rings) are extracted using a threshold and the HKS values, and the ring set is used as the shape representative region. More specifically, as shown in FIG. 6, all vertices are divided into two groups using the HKS values and the threshold, and the vertex set located at the boundary between the two groups is detected as a ring. A ring is specified by vertex IDs and coordinates. By changing the threshold, a plurality of rings can be detected, and a ring set consisting of the plurality of rings is acquired.
A ring set is extracted by applying HKS to all the vertices of skin polygon 1. Each ring of the ring set is specified by vertex IDs. Skin polygon 2 to skin polygon 5 are each composed of vertices having IDs and coordinates; the rings in skin polygon 2 to skin polygon 5 are identified from the IDs of the vertices constituting the rings, so that ring sets specified by vertex IDs and coordinates are obtained. The shape representative value is calculated using the ring set.
The calculation of the feature amount using time-series data of the shape representative values is described with reference to FIG. 7. The ring set corresponding to posture 1 is acquired. A function of all vertex coordinates of all rings constituting the ring set, or of selected rings, is defined; this function is, for example, the distance between the vertex coordinates of one ring and those of another ring (see FIG. 9). These distances are used as shape representative value 1. The ring set corresponding to posture 2 is acquired, and shape representative value 2 is calculated in the same way. The feature amount calculated using shape representative value 1 and shape representative value 2 reflects information about how one ring approached or moved away from another ring when the posture of the target changed from posture 1 to posture 2 (see FIG. 8). Therefore, when posture 1 to posture 6 represent a specific motion of the target, shape representative value 1 to shape representative value 6 are considered to represent the movement of the target's skin during that motion, and the feature amount acquired from shape representative value 1 to shape representative value 6 is a motion feature amount.
In FIG. 6, when posture 1 to posture 5 correspond to a walking motion, the skin polygons change depending on posture 1 to posture 5. That is, the movement of the skin polygons (skin polygon 1 to skin polygon 5) corresponds to the walking motion, and the time-series data of the shape representative values representing the shapes of the skin polygons represent the walking motion. The motion feature amount acquired from the time-series data of the shape representative values can be important information for identifying an individual. Similarly, in FIG. 6, when posture 1 to posture 5 correspond to a golf swing, the time-series data of the shape representative values representing the shapes of the skin polygons represent the swing motion. For example, the swing motion when the performance is good can be recorded as a motion feature amount acquired from the time-series data of the shape representative values; when the performance deteriorates, a motion feature amount can be acquired from the current swing motion, and the difference in performance can be quantified as the distance (difference) between the motion feature amounts.
In FIGS. 6 and 7, in one embodiment, the annular vertex groups or rings serving as the shape representative regions are acquired using HKS, but the setting of the shape representative regions is not limited to using HKS, and other existing shape descriptors may be used. Also, as shown in FIG. 23, shape representative regions may be set at selected positions in a plurality of predetermined parts of the human body, for example the thigh, lower leg, forearm, upper arm, chest, abdomen, or waist, and each shape representative region may be determined by the vertex IDs of the vertex set forming that region. In the embodiment shown in FIG. 24, two shape representative regions are provided at positions corresponding to the shoulder blades on the back of the human body, and the movement of the shoulder blades may be analyzed from the information obtained from the temporal change of the shape representative regions. In the embodiment shown in FIG. 25, one shape representative region is provided over a wide area of the back of the human body, and the movement of the human body may be analyzed from the information obtained from the temporal change of the shape representative region. In FIGS. 24 and 25 the shape representative regions are annular vertex groups, but a shape representative region may also be a planar vertex group consisting of a plurality of vertices.
In the embodiment shown in FIG. 10, HKS is applied to each of skin polygon 1 to skin polygon 5 to acquire shape representative region 1 (ring set 1) to shape representative region 5 (ring set 5), respectively. Shape representative value 1 to shape representative value 5 are calculated from shape representative region 1 (ring set 1) to shape representative region 5 (ring set 5). In this embodiment, the shape representative value is the perimeter and/or the cross-sectional area of each ring, and the array consisting of shape representative value 1 to shape representative value 5 acquired over a plurality of frames is used as the feature amount.
The acquisition of shape representative values consisting of cross-sectional areas is described with reference to FIG. 11. The ring set corresponding to posture 1 is acquired, the cross-sectional area of each ring (representative region) is calculated using the vertex coordinates of each ring constituting the ring set, and shape representative value 1 for posture 1 is acquired from the set of cross-sectional areas. The ring set corresponding to posture 2 is acquired, the cross-sectional area of each ring (representative region) is calculated using the vertex coordinates of each ring constituting the ring set, and shape representative value 2 for posture 2 is acquired from the set of cross-sectional areas. Shape representative values based on cross-sectional areas are feature amounts with little dependence on the posture of the target and can be used for personal authentication (see the experiments described later).
[B] Shape descriptors for the body-surface information of the target
[B-1] HKS (Heat Kernel Signature)
Shape descriptors are known as a method for extracting feature amounts from three-dimensional shape information. Shape descriptors are used for matching and retrieving shapes, and various methods have been proposed. HKS (Heat Kernel Signature) and WKS (Wave Kernel Signature) are known as shape descriptors for matching between non-rigid bodies such as the three-dimensional body surface of a human. In this embodiment, HKS, which uses the heat diffusion equation on the body surface, is adopted, and shape representative values representing the shape of the target are acquired using HKS.
HKS (Heat Kernel Signature) as a shape descriptor is explained below. Let x and y be the three-dimensional coordinates of two different points on the body surface, M the set of points of the manifold surface, t the heat diffusion time, and u(x, t) the heat distribution. The heat diffusion equation on the three-dimensional surface is

  Δ_M u(x, t) = -∂u(x, t)/∂t,

and k_t(x, y) in its fundamental solution

  u(x, t) = ∫_M k_t(x, y) u(y, 0) dy

is called the heat kernel. The eigendecomposition of the heat kernel is

  k_t(x, y) = Σ_i e^(-λ_i t) φ_i(x) φ_i(y),

where λ_i and φ_i are the i-th eigenvalue and eigenfunction of Δ, respectively. Setting y = x, H(x, t) = k_t(x, x) is defined as the HKS (Heat Kernel Signature).
From the above equations, the HKS is defined on each set M for each piece of shape information and is a function that depends on the three-dimensional coordinate x and the time t. The time t here is not the time of the model's motion but the elapsed diffusion time within the HKS. FIG. 12 shows the vertices of a model actually colored according to their HKS values (FIG. 12 is in fact a color image); the maximum and minimum HKS values are mapped to the maximum and minimum of the color map, respectively. The HKS value is smallest near the torso and increases toward the extremities of the body (hands and feet).
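A minimal HKS sketch in Python with NumPy/SciPy is given below; for brevity it approximates the Laplace operator by a simple graph Laplacian of the mesh, whereas an implementation following Non-Patent Document 8 would normally use a cotangent Laplace-Beltrami discretization with area weights, and the appropriate scale of t depends on that choice (the experiments described later use t = 1000 with their own discretization).

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def heat_kernel_signature(vertices, faces, t=0.1, n_eig=100):
    # HKS H(x, t) = sum_i exp(-lambda_i * t) * phi_i(x)^2 for every vertex x.
    # A graph Laplacian is used here as a rough stand-in for the Laplace-Beltrami operator.
    n = len(vertices)
    e = np.vstack([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    e = np.unique(np.sort(e, axis=1), axis=0)             # undirected mesh edges
    w = np.ones(len(e))
    adj = sp.coo_matrix((np.r_[w, w],
                         (np.r_[e[:, 0], e[:, 1]], np.r_[e[:, 1], e[:, 0]])),
                        shape=(n, n)).tocsr()
    lap = sp.diags(np.asarray(adj.sum(axis=1)).ravel()) - adj
    # Smallest eigenpairs of the Laplacian (slow but simple; shift-invert is faster in practice).
    lam, phi = eigsh(lap, k=min(n_eig, n - 1), which='SM')
    return (np.exp(-lam * t) * phi ** 2).sum(axis=1)       # (n,) HKS values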
[B-2] Representative regions representing the shape
The extraction of regions representing the shape is described below. Let S be a set of coordinates on the body surface at which the HKS takes similar values, and let M be the set of all coordinates on the body surface of the polygon model. The set M of all coordinates on the body surface and the partial set S on the body surface are sets of mesh vertices.
How the set S is determined is explained next. First, according to the HKS value of each vertex, the set M is divided into two sets A and B, those at or above a predetermined threshold H_th and those below it:

  A = { x ∈ M | H(x, t) ≥ H_th },  B = { x ∈ M | H(x, t) < H_th }.
At this time, the set S consists of the vertices of the triangular polygons whose three vertices simultaneously include points belonging to group A and points belonging to group B (see FIG. 13). The vertex set constituting S can be varied by changing the value of the threshold H_th. The HKS value at the k-percentile point, obtained when the HKS values of all vertices are sorted in ascending order, is used as the threshold H_th,k, and the vertex set detected with this threshold is denoted S_k (k = 1, ..., 99).
The 99 vertex sets S are denoted S_1, S_2, ..., S_99 in ascending order. FIG. 15 shows the vertex sets S obtained as the threshold H_th is varied; a boundary region consisting of a vertex set S is a closed-curve-like region, i.e., a ring. FIG. 15 shows, from left to right, the boundary regions corresponding to S3, S8, S38, S70, and S80, that is, with the value of H_th increasing from left to right; as H_th increases, the positions of the boundary regions move toward the extremities of the body (hands and feet). The number of boundary regions can also differ depending on the value of H_th: in the two figures on the left of FIG. 15 the number of boundary regions is 2, in the center figure it is 5, and in the two figures on the right it is 4.
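A sketch of the boundary detection described above (Python with NumPy, illustrative): for the k-percentile threshold H_th,k, the vertices are split into groups A and B by their HKS values, and the vertex set S_k is collected from the triangles that contain vertices of both groups.

import numpy as np

def boundary_vertex_set(hks: np.ndarray, faces: np.ndarray, k: float) -> np.ndarray:
    # hks: (N,) HKS value of each vertex; faces: (F, 3) vertex IDs per triangle;
    # k: percentile (1..99). Returns the IDs of the vertices forming the set S_k.
    h_th = np.percentile(hks, k)                          # threshold H_th,k
    in_a = hks >= h_th                                    # group A membership per vertex
    face_a = in_a[faces]                                  # (F, 3) booleans
    mixed = face_a.any(axis=1) & (~face_a).any(axis=1)    # triangles straddling the boundary
    return np.unique(faces[mixed])                        # vertex IDs forming S_k

# The individual rings can then be separated, e.g. by clustering these vertices
# by their coordinates (k-means), as described in the text.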
FIG. 17 shows, as an example, skin models with different body shapes and postures in which all 17 boundaries of the sets S3, S8, S38, S70, and S80 are displayed together on one skin model. The figure shows that the positions of the boundary regions are substantially the same even when the body shape and posture differ.
The vertex sets constituting the boundary regions may be further classified. For example, a representative region (ring) may be extracted by applying k-means clustering, using the vertex coordinates, to the vertex set included in each boundary region. The shape representative value is calculated using the ring set consisting of all 17 boundary regions (rings) of the set S.
[B-3] Shape representative values (cross-sectional area or perimeter of a ring)
In this embodiment, personal authentication is performed using feature amounts obtained from shape information via HKS. By using HKS, the same body part can be matched between people even when their shapes differ. Therefore, the same part (ring) is found using HKS, and identification based on shape is performed by comparing those parts.
The array containing the cross-sectional areas at all 17 boundaries of the boundary set S obtained in this way was used as the feature amount of each model. For the vertices of each boundary, principal component analysis is used to find the plane on which the variance of the vertex coordinates is largest, and each vertex is projected onto that plane to reduce it to two dimensions. A convex hull is computed from the projected vertices, and its perimeter and area are used as feature amounts.
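A sketch of this step (Python with NumPy/SciPy): the ring vertices are projected onto the plane of their first two principal components, and the perimeter and area of the 2D convex hull are returned; note that for a two-dimensional ConvexHull in SciPy the volume attribute is the enclosed area and the area attribute is the perimeter.

import numpy as np
from scipy.spatial import ConvexHull

def ring_perimeter_and_area(ring: np.ndarray) -> tuple:
    # ring: (N, 3) coordinates of one boundary (ring). Returns (perimeter, area)
    # of the convex hull of the vertices projected onto their principal plane.
    centered = ring - ring.mean(axis=0)
    # Principal axes via SVD; the first two right singular vectors span the
    # plane with the largest variance of the vertex coordinates.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pts2d = centered @ vt[:2].T                           # (N, 2) projected vertices
    hull = ConvexHull(pts2d)
    return hull.area, hull.volume                         # perimeter, enclosed area (2D case)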
For each boundary set S_i, arrays i_c and i_a are created by sorting, in ascending order, the arrays containing the perimeters and the areas of the convex hulls obtained from S_i. Next, let S = {S_a, S_b, ...} be a set combining a plurality of boundary sets S_i. With m the label of the person and p the label of the pose, the arrays c(m, p) = [a_c, b_c, ...] and a(m, p) = [a_a, b_a, ...], which have the person, the pose, and the set S (that is, the combination of boundary sets S_i) as variables, are created by stacking the arrays i_c and i_a defined for each S_i included in this set S. The c(m, p) and a(m, p) obtained in this way are used as the feature amounts from the shape information. In the example of FIG. 15, the set S is S = {S3, S8, S38, S70, S80}; from the numbers of boundaries, the numbers of elements k_i of the arrays 3_c, 8_c, 38_c, 70_c, 80_c are k_3 = 2, k_8 = 2, k_38 = 5, k_70 = 4, k_80 = 4, so the array length of c(m, p) is 17.
[C] Experiments
[C-1] Data set used
In this chapter, people in a data set were identified using the proposed feature arrays, and the usefulness of the features obtained by the proposed method was confirmed. The SHREC'14 (Shape Retrieval of Non-Rigid 3D Human Models) Human Dataset (Non-Patent Document 7) was used for the experiment. It consists of a total of 400 mesh models in which 40 people take 10 kinds of poses (see FIG. 17), and each mesh has around 15,000 vertices. Since the accuracy of comparison within the same sex should be prioritized in personal authentication, the models of the 20 men among the 40 people were used. Some of these 200 models could not be used because part of the mesh was missing and the HKS could not be computed; excluding them, a total of 163 skin models (20 people, 9 poses) were finally used for identification. For each of these models, the HKS at time t = 1000 was computed and used as the data.
[C-2] Normalization
Although the HKS value is normalized so that it does not depend on the size of the model, the cross-sectional areas of the boundary regions derived from the HKS were normalized by making the model sizes uniform, so that the feature amounts do not depend on model size. Possible normalization methods include unifying the height (femur length), unifying the cross-sectional area of the neck, and unifying the volume; in the experiment, normalization by volume was adopted. Note that, depending on the purpose for which the feature amounts are used, normalization may or may not be necessary, and those skilled in the art will understand that the normalization means is not limited to the above.
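The volume normalization can be sketched as follows, under the assumption (not stated explicitly in the text) that each model is uniformly rescaled so that all models share a reference volume, which multiplies every cross-sectional area by (V_ref / V)^(2/3).

import numpy as np

def normalize_areas_by_volume(areas: np.ndarray, volume: float, ref_volume: float = 1.0) -> np.ndarray:
    # areas: cross-sectional areas of the boundary rings of one model.
    # Rescaling the model so its volume equals ref_volume multiplies lengths by
    # (ref_volume / volume) ** (1/3) and therefore areas by the 2/3 power.
    return areas * (ref_volume / volume) ** (2.0 / 3.0)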
[C-3] Difference in correct answer rate depending on the pose
The relationship between the pose and the correct answer rate was investigated. Each pose is shown in FIG. 17; from the upper left to the lower right they are poses 0 to 9. Since the mesh was missing in pose 8, 9 of the 10 poses were used in the experiment. Table 2 shows the correct answer rates at n = 1, 2, 3 when normalizing by volume; the horizontal axis shows the pose type (p), and the vertical axis shows the correct answer rate at n, where n is the number of body-shape candidates inferred from a given test data (that is, the rate at which the correct person is included among the top n candidates).
[Table 2]
The results suggest that the arm positions and arm joint angles have little effect on the correct answer rate, whereas the knee joint angle can affect it. When the knee joint angle is obtuse, as in p = 3, 4, 6, there is no significant effect on identification, but when it becomes acute the effect can no longer be ignored.
[C-4] Optimal boundary regions
The verification above used a ring set consisting of the sets S3, S8, S38, S70, and S80, but these are merely examples, and the set of boundary regions is not limited to them. There are 99 sets S_h (h = 1, ..., 99). Of the 19 sets whose index h is a multiple of 5 (h = 5, 10, ..., 90, 95), a total of 15 sets S_h were examined, excluding S20, which does not form a ring, and S85, S90, and S95, which correspond to the hands and feet. Four sets were chosen from the 15 sets S_h (about 3,000 combinations, 15C4), and the optimal combination among them was determined. Table 3 shows the five combinations with the highest correct answer rates among the approximately 3,000 combinations.
[Table 3]
The maximum correct answer rate was 96.93%, obtained with the combination of S5, S15, S30, and S80. S5 is included in all of the top five combinations, showing that the area near the torso is an important feature. Except for S35, the top combinations include a boundary of S15 or below, that is, around the torso, and a boundary of S65 or above, that is, between the elbow and the wrist. This suggests that, when the volume is normalized, the regions around the torso and from the elbow to the wrist contribute most. The ring set consisting of S5, S15, S30, and S80 is shown in FIG. 18.
[C-5] Identification experiment from walking motion
In this experiment, the walking motion of the subjects on a treadmill was measured. The speed of the treadmill was set to 4.0 km/h, the average human walking speed. Four cameras were installed so as to surround the treadmill, and about one minute of walking was recorded at 60 fps for four subjects: three men in their twenties and one man in his thirties.
Since we wanted to use skin models without clothing as the input in the experiment, skin models were created by HMR (Human Mesh Recovery) disclosed in Non-Patent Document 5. The method of Non-Patent Document 5 reconstructs a skin model from a single image; the skin model may also be reconstructed from a plurality of images.
In this experiment, a skin model acquired in advance in a predetermined posture (for example, an upright posture or a T-pose) was deformed into the walking poses by embedding the joint positions acquired by video motion capture (Patent Document 4, Non-Patent Document 2), and the result was treated as the shape information acquired from the cameras. FIG. 22 shows the result of deforming the posture by fitting the joint positions acquired by video motion capture to the skin model acquired in advance; from left to right, it shows the walking model every 20 frames (about 0.3 seconds). The shape information may instead be three-dimensionally reconstructed from the video on the treadmill.
For this skin model, feature amounts were extracted using S8, S20, S30, and S80 as the boundaries, and the four subjects were identified. As the evaluation method, shape feature amounts were extracted from part of the measured walking data, and the average feature amount for each person was taken as that person's base data. Next, the feature amounts extracted from walking data in frames other than those used to create the base data were used as test data. To create the base data, since walking is not stable at the start and end, 600 consecutive frames were selected from the frames excluding those parts. From the selected 600 frames, one frame was extracted every 10 frames, for a total of 60 frames; a skin model was generated for each of these frames, an array containing the cross-sectional areas at the representative regions S8, S20, S30, and S80 was created for each skin model, and the element-wise average over the 60 arrays was taken as the base data.
Next, to create the test data, from the walking data excluding the portion used for the base data, a further 200 frames were selected, and one frame was taken every 10 frames, for a total of 20 frames. For each of these frames, the skin model was deformed to fit the skeleton, and the array of cross-sectional areas created from that skin model was used as test data, giving 20 arrays per person. In total, 4 base data (one per person) and 80 test data (20 per person) were created. For each test data, the L1 norms of the differences from the base data of the four people were compared, the base data with the smallest value was taken as the estimation result, and the proportion of cases in which the estimation result and the test data belonged to the same person was 88.75%.
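A sketch of this evaluation (Python with NumPy, illustrative; function names are hypothetical): each person's base data is the element-wise mean of the feature arrays from the sampled frames, and each test array is assigned to the person whose base data minimizes the L1 norm of the difference.

import numpy as np

def build_base_data(features_per_person: dict) -> dict:
    # features_per_person[name]: (n_frames, n_features) arrays of ring cross-sectional
    # areas from the base frames. The base data is the per-element mean.
    return {name: f.mean(axis=0) for name, f in features_per_person.items()}

def identify(test_feature: np.ndarray, base_data: dict) -> str:
    # Return the person whose base data has the smallest L1 distance to the test array.
    return min(base_data, key=lambda name: np.abs(test_feature - base_data[name]).sum())

def accuracy(tests: list, base_data: dict) -> float:
    # tests: list of (true_name, feature_array) pairs.
    hits = sum(identify(f, base_data) == name for name, f in tests)
    return hits / len(tests)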

Claims (17)

1. A method for acquiring a motion feature amount, wherein
the shape of a target is specified by skin polygons,
each vertex of the skin polygons has coordinates that depend on the posture of the target, and
the shape of the target is represented by one or more representative regions selected from the skin polygons, each representative region being a vertex group consisting of a plurality of vertices,
the method comprising:
preparing time-series data of the skin polygons during motion of the target;
calculating, in a plurality of frames, a shape representative value representing the shape of the target in each frame using the one or more representative regions; and
acquiring, as a motion feature amount, a value representing the temporal change of the one or more representative regions accompanying the motion of the target, using the time-series data of the shape representative values.
2. The method for acquiring a motion feature amount according to claim 1, wherein
the shape of the target is represented by a plurality of representative regions, and
the temporal change of the plurality of representative regions includes the temporal change of the spatial relationship between the representative regions accompanying the motion of the target.
3. The method for acquiring a motion feature amount according to claim 2, wherein
the spatial relationship between the representative regions is defined by a function of the vertex coordinates of any two representative regions.
4. The method for acquiring a motion feature amount according to claim 2 or 3, wherein
the spatial relationship between the representative regions is defined by inter-vertex distances between any two representative regions.
  5.  The method for acquiring a motion feature amount according to any one of claims 1 to 3, wherein
      the representative region is a ring-shaped vertex group whose vertices are arranged in a ring.
  6.  The method for acquiring a motion feature amount according to claim 5, wherein the ring-shaped vertex groups are obtained by:
      acquiring an HKS value for every vertex using the Heat Kernel Signature (HKS);
      dividing all vertices into two groups by a threshold;
      acquiring a ring-shaped vertex group consisting of the set of vertices that lie on the boundary between the two groups and are arranged in a ring; and
      acquiring a plurality of ring-shaped vertex groups by varying the threshold.
  7.  The method for acquiring a motion feature amount according to any one of claims 1 to 6, wherein
      the posture of the target is specified by a skeleton model,
      a function associating each vertex of the skin polygon model with the skeleton has been obtained,
      initial coordinates of each vertex of the skin polygon model of the target have been obtained depending on a specific initial posture, and
      the coordinates of each vertex in an arbitrary posture can be acquired from the initial coordinates, the initial posture, and the arbitrary posture by using the function.
  8.  The method for acquiring a motion feature amount according to any one of claims 1 to 7, wherein
      the posture of the target is acquired by markerless motion capture using one or more images.
  9.  A method for acquiring shape representative information, wherein
      the shape of a target is specified by skin polygons, and
      each vertex of the skin polygon model has coordinates that depend on the posture of the target,
      the method comprising:
      acquiring an HKS value for every vertex using the Heat Kernel Signature (HKS);
      dividing all vertices into two groups by a threshold; and
      acquiring, as a shape representative region, a ring-shaped vertex group consisting of the set of vertices that lie on the boundary between the two groups and are arranged in a ring.
  10.  The method for acquiring shape representative information according to claim 9, wherein
      the shape of the target is represented by a plurality of ring-shaped vertex groups, and
      the plurality of ring-shaped vertex groups are determined by varying the threshold.
  11.  The method for acquiring shape representative information according to claim 10, wherein
      the plurality of ring-shaped vertex groups representing the shape of the target are represented by a function of the vertex coordinates of two ring-shaped vertex groups.
  12.  The method for acquiring shape representative information according to claim 10 or 11, wherein
      the plurality of ring-shaped vertex groups representing the shape of the target are represented as an inter-vertex distance between two ring-shaped vertex groups.
  13.  The method for acquiring shape representative information according to claim 9 or 10, wherein
      the plurality of ring-shaped vertex groups representing the shape of the target are represented by the area of the region enclosed by each ring-shaped vertex group and/or by its perimeter.
  14.  A device for acquiring a motion feature amount, comprising a storage unit, a shape representative value calculation unit, and a motion feature amount calculation unit, wherein
      the storage unit stores time-series data of skin polygons specifying the shape of a target during motion, each vertex of the skin polygons having a vertex ID and coordinates that depend on the posture of the target,
      the shape of the target is represented by one or more representative regions selected from the skin polygons, each representative region being a vertex group specified by the vertex IDs and coordinates of a plurality of vertices,
      the shape representative value calculation unit calculates, using the one or more representative regions, a shape representative value representing the posture-dependent shape of the target, and
      the motion feature amount calculation unit calculates, as a motion feature amount, a value representing the temporal change of the one or more representative regions accompanying the motion of the target, using the time-series data of the shape representative values acquired in a plurality of frames.
  15.  The device for acquiring a motion feature amount according to claim 14, wherein
      the shape of the target is represented by a plurality of representative regions,
      the shape representative value calculation unit calculates, from the spatial relationship between the representative regions, a shape representative value representing the posture-dependent shape of the target, and
      the motion feature amount calculation unit calculates, as a motion feature amount, a value representing the change in the spatial relationship between the representative regions accompanying the motion of the target, using the time-series data of the shape representative values acquired in a plurality of frames.
  16.  The device for acquiring a motion feature amount according to claim 15, wherein
      the shape representative value is defined by a function of the vertex coordinates of any two representative regions.
  17.  The device for acquiring a motion feature amount according to claim 15 or 16, wherein
      the shape representative value is defined by an inter-vertex distance between any two representative regions.
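For readers who want to experiment with the ring-shaped vertex groups recited in claims 5, 6 and 9 to 13, the sketch below shows one possible, non-authoritative implementation. It assumes per-vertex HKS values have already been computed (the Laplace-Beltrami eigendecomposition needed for HKS is omitted), and the angular ordering of the boundary vertices is a simplification valid only for roughly convex cross-sections.

```python
import numpy as np

def ring_vertex_group(hks, edges, threshold):
    """Vertices on the boundary between the two threshold-defined groups.

    hks       : (V,) per-vertex Heat Kernel Signature values (precomputed)
    edges     : (E, 2) integer vertex-index pairs of the mesh edges
    threshold : scalar used to split the vertices into two groups
    """
    above = hks >= threshold                              # group label per vertex
    crossing = above[edges[:, 0]] != above[edges[:, 1]]   # edges linking both groups
    return np.unique(edges[crossing])                     # boundary (ring) vertices

def order_ring(ring_xyz):
    """Order ring vertices by angle in their best-fit plane (assumes a roughly
    convex cross-section, e.g. around a limb)."""
    centered = ring_xyz - ring_xyz.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    uv = centered @ vt[:2].T                              # 2D plane coordinates
    return ring_xyz[np.argsort(np.arctan2(uv[:, 1], uv[:, 0]))]

def ring_perimeter_and_area(ring_xyz):
    """Perimeter and approximate enclosed area of an *ordered* ring of 3D points."""
    closed = np.vstack([ring_xyz, ring_xyz[:1]])
    perimeter = np.linalg.norm(np.diff(closed, axis=0), axis=1).sum()
    centered = ring_xyz - ring_xyz.mean(axis=0)
    normal = np.linalg.svd(centered, full_matrices=False)[2][-1]  # best-fit normal
    cross_sum = np.cross(closed[:-1], closed[1:]).sum(axis=0)
    area = 0.5 * abs(np.dot(cross_sum, normal))           # planar shoelace formula
    return perimeter, area

def ring_distance(ring_a_xyz, ring_b_xyz):
    """One simple inter-ring distance: distance between the ring centroids."""
    return np.linalg.norm(ring_a_xyz.mean(axis=0) - ring_b_xyz.mean(axis=0))

# Varying `threshold` yields multiple ring-shaped vertex groups (e.g. the
# boundaries referred to as S8, S20, S30, S80 in the description above).
```

With these pieces, the cross-sectional-area array used in the evaluation above can be obtained by applying `ring_perimeter_and_area` to each ring-shaped vertex group of a skin model fitted to one frame.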
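Claim 7 relies on a function that maps the initial vertex coordinates, the initial posture and an arbitrary posture to posture-dependent vertex coordinates. The application does not prescribe a specific form for this function; as an illustrative assumption, the sketch below uses linear blend skinning (LBS), a common choice for associating skin-polygon vertices with a skeleton.

```python
import numpy as np

def linear_blend_skinning(v0, weights, init_pose, new_pose):
    """Posture-dependent vertex coordinates via linear blend skinning (LBS).

    v0        : (V, 3) vertex coordinates in the initial posture
    weights   : (V, J) per-vertex joint weights, each row summing to 1
    init_pose : (J, 4, 4) world transforms of the J joints in the initial posture
    new_pose  : (J, 4, 4) world transforms of the J joints in the target posture
    """
    v0_h = np.hstack([v0, np.ones((v0.shape[0], 1))])          # homogeneous coords
    # per-joint transform taking initial-posture vertices to the new posture
    joint_tf = np.stack([n @ np.linalg.inv(b) for n, b in zip(new_pose, init_pose)])
    per_joint = np.einsum('jab,vb->vja', joint_tf, v0_h)       # (V, J, 4)
    blended = np.einsum('vj,vja->va', weights, per_joint)      # weighted blend
    return blended[:, :3]
```

Here the weight matrix plays the role of the function in claim 7 that associates each vertex with the skeleton; any other skinning scheme satisfying the claim could be substituted.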
PCT/JP2021/018809 2020-05-22 2021-05-18 Method and device for acquiring movement feature amount using skin information WO2021235440A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020-090047 2020-05-22
JP2020090047A JP2021184215A (en) 2020-05-22 2020-05-22 Method and apparatus for acquiring motion feature quantity using skin information

Publications (1)

Publication Number Publication Date
WO2021235440A1 true WO2021235440A1 (en) 2021-11-25

Family

ID=78708546

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/018809 WO2021235440A1 (en) 2020-05-22 2021-05-18 Method and device for acquiring movement feature amount using skin information

Country Status (2)

Country Link
JP (1) JP2021184215A (en)
WO (1) WO2021235440A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7120532B1 (en) * 2021-12-17 2022-08-17 株式会社ワコール Program, device and method for statistically analyzing body shape based on flesh from skin model

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010508609A (en) * 2006-11-01 2010-03-18 ソニー株式会社 Surface capture in motion pictures

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KANEZAKI, ASAKO ET AL.: "Learning Similarities for Rigid and Non-Rigid Object Detection", IEICE TECHNICAL REPORT, vol. 114, no. 230, 2014, pages 13-18 *
KOGA, HIROTAKA ET AL.: "Extraction and evaluation of gait feature values from video motion capture", PROCEEDINGS OF THE 2018 JSME ANNUAL CONFERENCE ON ROBOTICS AND MECHATRONICS, 2018, paper 1P1-D09 *

Also Published As

Publication number Publication date
JP2021184215A (en) 2021-12-02

Similar Documents

Publication Publication Date Title
Hesse et al. Computer vision for medical infant motion analysis: State of the art and rgb-d data set
Pala et al. Multimodal person reidentification using RGB-D cameras
Loper et al. MoSh: motion and shape capture from sparse markers.
US8023726B2 (en) Method and system for markerless motion capture using multiple cameras
CN108717531B (en) Human body posture estimation method based on Faster R-CNN
JP5873442B2 (en) Object detection apparatus and object detection method
CN107392086B (en) Human body posture assessment device, system and storage device
Uddin et al. Human activity recognition using body joint‐angle features and hidden Markov model
CN105740780B (en) Method and device for detecting living human face
CN105740781B (en) Three-dimensional human face living body detection method and device
Sundaresan et al. Model driven segmentation of articulating humans in Laplacian Eigenspace
JP2010176380A (en) Information processing device and method, program, and recording medium
CN102609683A (en) Automatic labeling method for human joint based on monocular video
CN110263605A (en) Pedestrian's dress ornament color identification method and device based on two-dimension human body guise estimation
CN104794449A (en) Gait energy image acquisition method based on human body HOG (histogram of oriented gradient) features and identity identification method
Iwasawa et al. Real-time, 3D estimation of human body postures from trinocular images
Wang Analysis and evaluation of Kinect-based action recognition algorithms
WO2021235440A1 (en) Method and device for acquiring movement feature amount using skin information
Zhang et al. Local surface geometric feature for 3D human action recognition
Imani et al. Histogram of the node strength and histogram of the edge weight: two new features for RGB-D person re-identification
Yamauchi et al. Recognition of walking humans in 3D: Initial results
Zhang A comprehensive survey on face image analysis
El-Sallam et al. A low cost 3D markerless system for the reconstruction of athletic techniques
Seely et al. View invariant gait recognition
Rafi et al. A parametric approach to gait signature extraction for human motion identification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21809534

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21809534

Country of ref document: EP

Kind code of ref document: A1