WO2009145071A1

WO2009145071A1 - Motion database structure, motion data normalization method for the motion database structure, and searching device and method using the motion database structure

Info

Publication number: WO2009145071A1
Application number: PCT/JP2009/059037
Authority: WO
Inventors: 仁彦中村; 克山根; 能迪山口
Original assignee: 国立大学法人東京大学
Priority date: 2008-05-28
Filing date: 2009-05-15
Publication date: 2009-12-03
Also published as: JPWO2009145071A1

Abstract

Provided is a database structure for storing and reusing a large volume of motion data obtained from motion capture data and manual work by the animator. The database structure uses a plurality of sets of normalized motion data and is a hierarchical database having a tree structure. The highest layer of the tree structure is a node including all frames of the plurality of sets of normalized motion data. Each of the frames of the normalized motion data is included in any one of the nodes of each hierarchy of the tree structure, and each node includes a frame having a nearer state quantity as going from an upper layer to a lower layer. The transition probabilities between a plurality of nodes constituting each hierarchy are stored on a per-hierarchy basis.

Description

Motion database structure, motion data normalization method for the motion database structure, and search device and method using the motion database structure

The present invention relates to a motion database structure, a motion data normalization method for the motion database structure, and a search device and method using the motion database structure.

In the medical, sports, and entertainment fields, motion data of human skeleton models is being handled more and more often. Motion capture requires substantial equipment and labor costs, and the resulting data is difficult to reuse when the marker arrangement or the skeleton model differs.

In practice, the CG and game fields have had no means of classifying and accumulating the large amount of motion data measured so far, so reusing it has been virtually impossible.

Attempts have been made to construct motion databases, as shown in Non-Patent Documents 1 to 4. Because these are one- or two-layer databases that depend on the joint structure and the marker arrangement, their search speed is high, but the motions to which they can be applied are limited to walking.
Non-Patent Document 1: L. Kovar, M. Gleicher, and F. Pighin: "Motion graphs," Proceedings of ACM SIGGRAPH 2002, pp. 473-482, 2002.
Non-Patent Document 2: O. Arikan and D. A. Forsyth: "Interactive Motion Generation from Examples," Proceedings of SIGGRAPH 2002, pp. 483-490, 2002.
Non-Patent Document 3: J. Lee et al.: "Interactive control of avatars animated with human motion data," Proceedings of SIGGRAPH 2002, pp. 491-500, 2002.
Non-Patent Document 4: W. Takano, K. Yamane, and Y. Nakamura: "Capture Database through Symbolization, Recognition and Generation of Motion Patterns," Proceedings of IEEE ICRA, pp. 3092-3097, 2007.

The present invention proposes a database structure for accumulating and reusing a large amount of motion data obtained from motion capture or from manual work by an animator, a method for normalizing motion data for the database structure, and a method for searching motion data using the database structure.
The present invention proposes a method for constructing a large-scale motion database in a manner that does not depend on the marker positions used when the motion data was measured or generated, or on the joint structure of the model.
The present invention proposes a method for searching a motion database for a motion or pose, using a motion or pose of one or more frames as a key.

The first technical means adopted by the present invention is:
A database structure using a plurality of normalized motion data,
The time interval of each frame of the plurality of normalized motion data is made common,
The database structure is a hierarchical database having a tree structure,
The top layer of the tree structure is one node that contains all frames of the plurality of normalized motion data,
Each frame of the normalized motion data is included in any one of the nodes of each hierarchy of the tree structure, and each node contains frames having closer state quantities as one goes from the upper layers to the lower layers,
For each hierarchy, transition probabilities between the plurality of nodes constituting that hierarchy are stored.
A motion database structure.

One motion data is composed of a plurality of frames arranged in time series. Each frame corresponds to the posture of the skeleton model, and each frame is specified by a state quantity.
Normalized motion data is typically obtained by normalizing motion data acquired with different skeleton models and methods, but this does not exclude a database structure consisting of a large amount of motion data for the same skeleton model (which, as a result, has been normalized by the same method).
In constructing a motion database from normalized motion data, the time interval Δt of each frame affects the transition probability, so the time interval Δt is made common. Those skilled in the art will understand that, given motion data composed of time-series frames, the frame interval can be changed to an arbitrary interval by polynomial interpolation. In one embodiment, the motion data is converted into joint angle data by inverse kinematics calculation and then subjected to polynomial interpolation to obtain motion data at every Δt. The motion data may instead be interpolated at the marker trajectory stage. In this specification, “the time interval is made common” also covers the case where the time interval is the same from the beginning across the plurality of motion data constituting the motion database.
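As an illustration of bringing motion data onto a common time interval, the following minimal sketch resamples a sequence of frames onto a uniform grid of step Δt. It is not the method of the specification: piecewise-linear interpolation is swapped in here for the polynomial interpolation mentioned above, and the function name `resample` and its arguments are hypothetical.

```python
from bisect import bisect_right

def resample(times, frames, dt):
    """Resample frames onto a uniform time grid with step dt.

    times:  strictly increasing sample times, one per frame.
    frames: list of equal-length lists of floats (e.g. joint angles).
    Assumption: piecewise-linear interpolation stands in for the
    polynomial interpolation mentioned in the text.
    """
    if len(times) != len(frames) or len(times) < 2:
        raise ValueError("need at least two frames with matching times")
    n_steps = int((times[-1] - times[0]) / dt + 1e-9)
    out = []
    for k in range(n_steps + 1):
        t = times[0] + k * dt
        # Index of the right-hand neighbour, clamped to the last frame.
        j = min(bisect_right(times, t), len(times) - 1)
        i = j - 1
        w = (t - times[i]) / (times[j] - times[i])
        out.append([(1.0 - w) * a + w * b
                    for a, b in zip(frames[i], frames[j])])
    return out
```

For example, three frames sampled at 0, 1, and 2 seconds resampled with `dt = 0.5` yield five frames, with the new frames lying midway between their neighbours.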
The skeleton model is a character on the computer for reproducing the joint angle data or the marker positions, and the posture of the skeleton model is specified by the joint angle data or by the positions of the marker set.
The marker set is a mark for measuring the movement of the subject, and is also a target value for causing the character to reproduce the movement.
The state quantity is a variable that uniquely represents the “state” of the robot, the system, or the like, and here represents the state of the skeleton model. In one aspect, the state quantity is determined by the position and velocity of each marker in the marker set that can determine any pose of the skeleton model. Further, the speed of the root link may be included in the state quantity (particularly, when a marker is not arranged on the root link). Alternatively, the joint angle of the skeleton model or a vector in which the joint angle and the joint speed are arranged may be used as the state quantity.
The root link is a link corresponding to a root of a tree structure that represents a joint model. In a typical embodiment, the root link is a trunk link, one of the links when the trunk is composed of a plurality of links, or a link of a part corresponding to the waist.

In one aspect, the tree structure is a binary tree structure. The binary tree structure constituting the database structure of the present invention may be a full binary tree or a complete binary tree, but need not be either. The tree structure of the database structure of the present invention may be mainly a binary tree structure that partly includes a balanced tree (B-tree) or a multiway tree. For example, a balanced tree (B-tree) may be used so that the binary tree does not become biased. The “binary tree structure” in this specification includes a binary tree structure in which some branches are not binary.
In addition, although a binary tree structure is advantageous from the viewpoint of search speed, the tree structure of the database structure of the present invention is not limited to a binary tree structure and may be a balanced tree (B-tree) or a multiway tree.
Each node of the binary tree includes frames having close state quantities regardless of the type of motion (each motion data) among the frames included in the motion data set. Each node includes a frame having a closer state quantity in accordance with the hierarchy from the upper layer to the lower layer.
In one aspect, a node in the lowest layer of the tree structure includes a predetermined number or more of frame data. Generally, unless a minimum value of the number of frames included in the lowest layer node is set, the search speed is slowed down. Moreover, the data amount of the whole database will increase.
It is understood by those skilled in the art that the minimum value of the number of frames included in the lowest layer node can be appropriately determined in consideration of the granularity of search, the upper limit of the total data amount, and the like. For example, in the case of 200 fps data and assuming a granularity that transitions to the next node every 0.2 seconds on average, the number of frames per node can be set to about 40. Alternatively, after determining the upper limit of the data amount of the entire database, the number of frames of the lowest layer node is determined so that the data amount of the entire database falls within the determined upper limit. By increasing the number of frames of the lowest layer node, the total number of nodes can be reduced and the data amount of the entire database can be reduced. In particular, when data compression is performed by storing only the feature values (average, variance, etc.) of each node, it is advantageous to reduce the total number of nodes.
Each node of the tree structure is characterized by a feature quantity based on the state quantities of the plurality of frames included in the node. Examples of the feature quantity include the mean (including the trimmed mean and the weighted mean), the median, and the variance (covariance matrix). In one aspect, the feature quantity of each node is stored as a table.
In one aspect, a representative value of the state quantities of the plurality of frames included in each node is used as the feature quantity of that node, and the feature quantity of each node is stored in place of, or in addition to, the state quantities of the plurality of frames included in that node.

The motion database structure according to the present invention stores transition probabilities between nodes in each hierarchy. In the present invention, there are two kinds of tables of transition probabilities between nodes. One is a table of transition probabilities between nodes in each hierarchy, irrespective of which motion data a frame belongs to. The other is a table of transition probabilities between nodes in each hierarchy for each motion data.

In one aspect, the state quantity of each frame of the normalized motion data is determined by the position and velocity of each virtual marker of a virtual marker set that can determine an arbitrary posture of the skeleton model. By adopting a virtual marker set, a motion database can be constructed in a manner that does not depend on the marker positions at the time of measurement or generation of the motion data, or on the joint structure of the model. However, as long as the motion database structure of the present invention holds normalized motion data, the normalization method is not limited; for example, data normalized by a method based on joint angles or on link positions in the absolute coordinate system is not excluded.

<Summary of the motion database according to the present invention> In the motion database, all frames included in the plurality of motions to be learned serve as samples and constitute one database. The sampling time of the learned motions is constant and equal to Δt. The database has a binary tree structure in which at most two child nodes are connected to each node, and in each layer every sample is included in one of the nodes belonging to that layer.

In each node, a statistic (representative value) is calculated for evaluating the likelihood that a given state quantity is generated from the node. In the simplest embodiment, the mean and covariance matrix of all samples (frame state quantities) are used, and the likelihood is calculated assuming a Gaussian distribution. To calculate a more accurate likelihood, a Gaussian mixture distribution or the like may be used. Hereinafter, the likelihood that the state quantity s is generated from the node n is written F(s | n). For a Gaussian distribution with mean μ and covariance matrix Σ, F(s | n) can be calculated as:

$$F(s \mid n) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(s-\mu)^{T}\Sigma^{-1}(s-\mu)\right)$$

Here, D is the dimension of the state quantity vector.
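The Gaussian likelihood F(s | n) can be evaluated numerically as in the following minimal sketch. It works in the log domain, which is a choice made here to avoid underflow when D is large and is not stated in the specification. The function names `cholesky` and `gaussian_log_likelihood` are hypothetical, and the covariance matrix is assumed to be positive definite.

```python
import math

def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance must be positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def gaussian_log_likelihood(s, mu, sigma):
    """log F(s | n) for a node with mean mu and covariance matrix sigma."""
    d = len(s)
    low = cholesky(sigma)
    diff = [si - mi for si, mi in zip(s, mu)]
    # Forward substitution solves low * z = diff, so that the sum of
    # z_i^2 equals the squared Mahalanobis distance (s-mu)^T Sigma^-1 (s-mu).
    z = []
    for i in range(d):
        z.append((diff[i] - sum(low[i][k] * z[k] for k in range(i)))
                 / low[i][i])
    # log|Sigma| = 2 * sum(log L_ii)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    maha = sum(v * v for v in z)
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + maha)
```

Using a Cholesky factor avoids forming the inverse of Σ explicitly; a node whose frames are nearly identical may need a small value added to the diagonal of Σ before this is applied.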

Furthermore, the transitions between nodes observed in the learned motions and their respective transition probabilities are obtained. There are two kinds of transition probabilities: those obtained by counting transitions without regard to which motion a sample belongs to, and those obtained by counting only transitions within the same motion. The former transition probability table has one set for each layer. The latter has one set for each layer and each motion.

The second technical means adopted by the present invention is:
A top node is formed from all frames of a plurality of normalized motion data to form the root of a tree structure,
A first division step of performing node division until the number of frames included in the nodes of each hierarchy is equal to or less than a first threshold;
A second division step of performing node division sequentially from the lowest layer nodes of the first division step until the number of frames included in the nodes of each hierarchy is equal to or less than a second threshold;
Wherein
The first division step performs node division based on a one-dimensional index acquired from the state quantities of the frames included in each node;
The second division step performs node division by hierarchical clustering using the state quantities of the frames included in each node.
A motion database construction method using a plurality of normalized motion data.

In one aspect, the first dividing step and the second dividing step divide each node into two lower nodes. As a result, a binary tree structured database is constructed from the normalized data.

In one aspect, the first division step is node division based on principal component analysis. The covariance matrix of the node is calculated, and principal component analysis is performed on the covariance matrix. The first principal component score of each frame is calculated, and the node is bisected in descending order of the first principal component score. This is repeated until the number of frames in a node falls to or below the threshold. The first division step is not limited to the one using principal component analysis. For example, the node may be divided using the height of the center of gravity as an index. Examples of how to obtain the coordinates of the center of gravity include a simple or weighted average of the marker positions, and a method of calculating the center of gravity in consideration of the mass of each link after inverse kinematics calculation. Those skilled in the art will understand that, as the one-dimensional index used in the first division step, other coordinate values of the center of gravity, the height of the root link, and the like can be adopted in addition to the first principal component score and the height of the center of gravity.
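The bisection by first principal component score can be sketched as follows. This is a minimal illustration, not the implementation of the specification: the first principal axis is found by power iteration (an assumption made to keep the example dependency-free), the split puts half of the frames in each child, and the function name `pca_bisect` is hypothetical.

```python
import math

def pca_bisect(frames, iters=200):
    """Split frames into two halves by first principal component score.

    frames: list of equal-length lists of floats (state quantities).
    Returns (high, low): lists of frame indices, sorted by descending score.
    """
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    centered = [[f[j] - mean[j] for j in range(d)] for f in frames]
    # Covariance matrix of the state quantities in the node.
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    # Power iteration for the dominant eigenvector of the covariance.
    # Assumption: the start vector is not orthogonal to that eigenvector.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate node: no variance along the iterate
        v = [x / norm for x in w]
    # First principal component score of each frame.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    half = n // 2
    return order[:half], order[half:]
```

Because the sign of a principal axis is arbitrary, which child receives the "high" half carries no meaning; only the partition itself matters.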

As the second division step, the Ward method is shown in the embodiment described later, but other hierarchical clustering methods such as the shortest distance method may be used.
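As an illustration of dividing one node into two by the Ward method, the following minimal sketch merges clusters agglomeratively until two remain, at each step merging the pair whose union least increases the within-cluster sum of squares. The naive pairwise search is an assumption suited only to the small nodes handled in the second division step, and the function name `ward_bisect` is hypothetical.

```python
def ward_bisect(frames):
    """Cluster frames into two groups by Ward's method.

    frames: list of equal-length lists of floats (state quantities).
    Returns a list of two sorted lists of frame indices.
    """
    # Each cluster is (member indices, centroid); start from singletons.
    clusters = [([i], list(f)) for i, f in enumerate(frames)]
    while len(clusters) > 2:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ia, ca), (ib, cb) = clusters[a], clusters[b]
                na, nb = len(ia), len(ib)
                dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Ward criterion: increase in within-cluster sum of squares.
                cost = na * nb / (na + nb) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ia, ca), (ib, cb) = clusters[a], clusters[b]
        na, nb = len(ia), len(ib)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        merged = (ia + ib, centroid)
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return [sorted(members) for members, _ in clusters]
```

Unlike the division by a one-dimensional index, this uses the full state quantity of each frame, so the two children need not contain equal numbers of frames.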

When distributing a frame to nodes, information about which node each frame and the next frame belong to is stored, and the transition probability between each node in the same layer is obtained.
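The counting of transitions within one layer can be sketched as follows. The sketch builds both kinds of table: one pooled over all motions and one per motion. The input format (a mapping from a motion name to the node identifier of each successive frame) and the function name `transition_tables` are assumptions made for illustration; transitions from a node to itself are counted like any other.

```python
from collections import Counter

def transition_tables(node_sequences):
    """Estimate transition probabilities between nodes of one layer.

    node_sequences: {motion name: [node id of frame 0, frame 1, ...]}.
    Returns (pooled, per_motion), where each table maps
    (source node, destination node) to a probability.
    """
    def normalize(counts):
        # Divide each count by the total number of transitions leaving
        # its source node.
        totals = Counter()
        for (src, _), c in counts.items():
            totals[src] += c
        return {(src, dst): c / totals[src]
                for (src, dst), c in counts.items()}

    pooled = Counter()
    per_motion = {}
    for name, seq in node_sequences.items():
        # Pair each frame's node with the node of the next frame,
        # never crossing the boundary between two motions.
        counts = Counter(zip(seq, seq[1:]))
        pooled.update(counts)
        per_motion[name] = normalize(counts)
    return normalize(pooled), per_motion
```

Running this once per layer gives one pooled table per layer and one table per layer and per motion.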

The third technical means adopted by the present invention is:
A method for normalizing motion data,
The motion data consists of time series data of each joint angle of a skeleton model,
A virtual marker set composed of a plurality of virtual markers is arranged at predetermined positions of the skeleton model so that an arbitrary posture of the skeleton model can be determined,
From the time series data of the joint angles, the position and velocity of the virtual marker set in each frame are obtained, and the position and velocity of the virtual marker set are taken as the state quantity of the frame,
The state quantity of each frame is converted to a coordinate system based on the root link of the skeleton model,
The state quantity of each converted frame is scaled,
Normalized motion data is obtained by arranging the scaled state quantities of the frames in time series.
This is a method for normalizing motion data.
The motion data normalization method may further include making the time interval of each frame common.

A virtual marker set composed of virtual markers is prepared so that an arbitrary posture of an arbitrary skeleton model can be determined. For example, human skeleton models, even when they differ from one another, can be regarded as having almost the same link structure, so a virtual marker set consisting of a sufficient number of virtual markers can determine any posture of any such skeleton model. In one aspect, the mounting position of each marker in the virtual marker set (which position on which link) is determined with respect to a basic skeleton model (which may be a virtual model or an actual model). By associating the links of the target skeleton model with those of the basic skeleton model, the virtual marker set is arranged on the target skeleton model.

In one aspect, the coordinate system based on the root link of the skeleton model is obtained as follows: the vector that, in the coordinate system of the root link, represents the direction regarded as the “front” of the humanoid link mechanism in an upright posture is taken as one axis, and this axis is projected onto the horizontal plane to form a new axis. As long as correspondence between different humanoid link mechanisms can be obtained, the direction need not be the “front”; the axis may instead be taken along a vector in the “rear”, “left”, or “right” direction.

In one aspect, the time-series data of the joint angle is obtained by inverse kinematic calculation from the time-series data of the marker positions arranged in the skeleton model.
In one embodiment, the time series Q of the joint angle data is obtained by inverse kinematics using the measurement model S_u and the measurement marker set M_u; the time series P_v' of the virtual marker positions is obtained by forward kinematics using Q, the measurement model S_u, and the virtual marker set M_v; and the time series P_v of the virtual marker positions is obtained by scaling P_v'.
In one aspect, the time-series data of the joint angle is directly given to the skeleton model.

The fourth technical means adopted by the present invention is:
A motion data search device comprising the database structure according to any one of claims 1 to 6, an input unit, an output unit, and a processing unit,
The input unit is configured to input a normalized state quantity or state quantity sequence corresponding to motion data of one frame or more,
The processing unit calculates the likelihood of generating the input state quantity or state quantity sequence using the representative value of each node based on the state quantities of a plurality of frames included in each node,
The output unit is configured to output one or more nodes or one or more node transition sequences having a high likelihood as candidates,
A motion data search device.
As the representative value of each node based on the state quantities of the plurality of frames included in the node, the mean, median, trimmed mean, weighted mean, or the like can be used.
The time interval between the frames of the input state quantity sequence does not necessarily need to coincide with the time interval of the frames constituting the database structure. It will also be understood by those skilled in the art that the time interval between frames of the input state quantity sequence can be changed by interpolation.

In one embodiment,
The processing unit uses the inter-node transition probability to calculate one or more next nodes to which the calculated node can transition,
The output unit outputs one or more calculated next nodes as candidates.
In one embodiment,
The state quantity input from the input unit is determined by the position and speed of each virtual marker in the virtual marker set.
The processing unit calculates the likelihood using all or partial information of the state quantity input from the input unit. For example, the next posture with attention paid to a part of the body may be presented, or the likelihood considering only the marker to be noticed may be used.

The fifth technical means adopted by the present invention is:
A motion data search method using the database structure according to claim 1,
Inputting a normalized state quantity or state quantity sequence corresponding to at least one frame of motion data;
Calculating a likelihood of generating an input state quantity or state quantity sequence using a representative value of each node based on a state quantity of a plurality of frames included in each node;
Outputting one or more nodes or one or more node transition sequences having a high likelihood as candidates,
This is a motion data search method.

In one embodiment,
Furthermore, using the inter-node transition probability, calculating one or more next nodes to which the calculated node can transition;
One or more calculated next nodes are output as candidates.

In one embodiment,
The state quantity to be input is determined by the position and speed of each virtual marker in the virtual marker set.
In one aspect, the motion data of one frame or more is specified by the joint angle data of a certain joint model,
The normalized state quantity to be input is obtained by:
Obtaining the position, or the position and velocity, of the virtual markers by forward kinematics calculation based on the joint angle data;
Converting the obtained position, or position and velocity, of the virtual markers into a coordinate system based on the root link of the skeleton model; and
Scaling the converted position, or position and velocity, of the virtual markers.
The one or more node candidates or the one or more node transition sequence candidates are displayed by:
Obtaining joint angle data by inverse kinematics calculation based on the representative value of the state quantities included in the node candidate or node transition sequence candidate; and
Applying the obtained joint angle data to the joint model.

In the single-frame search, a state quantity specifying a certain posture of the joint model is input, and a node having a high likelihood of generating the input state quantity is searched for. Typically, a node to which the found node readily transitions is then searched for and presented as a node candidate for the next frame.
In one aspect, the search for a node corresponding to the frame following a certain posture proceeds as follows.
(1) The position of the virtual marker is obtained by forward kinematics calculation based on the joint angle data of the posture (the joint model is known).
(2) The position of the virtual marker is converted into the coordinate system {O ′} and scaled to form the input data.
(3) One or more node candidates corresponding to the input data are obtained.
(4) Node candidates are selected based on the inter-node transition probabilities.
(5) Based on the representative value (for example, the average value) of the virtual marker position information included in the node candidate, the scaling is undone, and joint angle data is obtained by inverse kinematics calculation.
(6) The joint angle data is applied to the joint model and displayed.
In the multi-frame search, the likelihood that the motion being searched for is generated from a certain transition sequence is defined as the product of “the likelihood that the state of each frame is generated from the corresponding node” and “the transition probabilities between the nodes”, and the transition sequence that maximizes this likelihood is searched for. In the embodiment described later, a search using the Dijkstra method is described, but the search method applicable to the present invention is not limited to the Dijkstra method. Examples of other search methods include depth-first search, breadth-first search, and, if the problem allows a good heuristic function to be designed, A* search.
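The objective of the multi-frame search, a product of per-frame likelihoods and inter-node transition probabilities, can be maximized as in the following minimal sketch. Note that this is not the Dijkstra-based search of the embodiment: a Viterbi-style dynamic program over the nodes of a single layer is swapped in here because it is short and exact for this objective. The function name `best_node_sequence` and the input formats are hypothetical, and products are replaced by sums of logarithms.

```python
import math

def best_node_sequence(log_lik, trans):
    """Find the node sequence maximizing likelihood times transitions.

    log_lik: list over frames of {node: log-likelihood of that frame
             being generated from the node}.
    trans:   {(a, b): probability of a transition from node a to node b};
             missing pairs are treated as impossible.
    Returns (path, log score of the path).
    """
    nodes = list(log_lik[0].keys())
    score = dict(log_lik[0])
    back = []
    for t in range(1, len(log_lik)):
        new_score, pointers = {}, {}
        for b in nodes:
            best_a, best_val = None, -math.inf
            for a in nodes:
                p = trans.get((a, b), 0.0)
                if p > 0.0 and score[a] > -math.inf:
                    val = score[a] + math.log(p)
                    if val > best_val:
                        best_a, best_val = a, val
            pointers[b] = best_a
            new_score[b] = (best_val + log_lik[t][b]
                            if best_a is not None else -math.inf)
        score = new_score
        back.append(pointers)
    last = max(score, key=score.get)
    if score[last] == -math.inf:
        raise ValueError("no feasible transition sequence")
    # Follow the back-pointers from the last frame to the first.
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]
```

The cost is proportional to the number of frames times the square of the number of nodes in the layer, so in practice the candidate nodes per frame would be restricted to those with high likelihood.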

The hardware for the motion database structure, the motion data normalization method for the motion database structure, and the search device and method using the motion database structure according to the present invention is composed of one or more computer devices. The computer device includes storage means for storing data, input means for inputting data, processing means for performing calculations using the input data and the stored data, output means for outputting processing results, and display means for displaying the output processing results.
The present invention can be provided as a computer program that causes a computer to normalize motion data. The present invention can also be provided as a computer program that causes a computer to execute a motion data search using the motion database structure.

The motion database according to the present invention is characterized in that it is databased in a reusable form independent of the marker arrangement at the time of measurement / generation and the joint structure / size of the model. Moreover, the transition probability between each node of each layer of a tree structure is stored. As a result, motion capture data scattered on the network can be widely used, and the search function can recognize and predict motions, which can be applied to generation of animations and human motion recognition by robots.

As for the database search method, multiple candidates must be presented as quickly as possible for a single frame, so a binary tree with a search time of O(log N) was adopted in consideration of search speed and ease of implementation. Similar motions can therefore be searched for at high speed, in O(log N), using poses and motions as keys.

[A] Construction of Motion Database FIG. 1 shows an outline of a database construction method. First, in order to store various marker arrangements and model motion data in one database, the motion data is normalized. Furthermore, in order to perform a search with various accuracy at high speed, the normalized data is registered in a binary tree structure database. Moreover, the transition probability between the nodes of each hierarchy of the binary tree structure is calculated using the frame information of the original motion data.

[A-1] Normalization of motion data In order to make the database independent of the data format, a virtual marker set M_v is defined, and the motion is expressed using the position and velocity of each marker included in it. If the number N_m of markers included in M_v is made sufficiently large, the motion of most practical skeleton models can be expressed. One aspect of the virtual marker set is shown in FIG. The skeleton model shown in FIG. 7 is a model with 155 degrees of freedom, and the virtual marker set is composed of 34 virtual markers. Although the number of virtual markers is not limited, it is considered that an arbitrary posture of the skeleton model can be expressed by arranging about 34 virtual markers on the skeleton model.

When constructing a general-purpose, large-scale motion database, the original data must be normalized into registration data. The original data can be created by various tools. In motion capture, the user obtains the time-series data P_u of the measurement marker positions using the measurement marker set M_u and the measurement model S_u. Registering this data in the motion database of the present invention requires the measurement marker set M_u, the model S_u, and the virtual marker set M_v on that model.

Therefore, for motion capture data, the model S_u and the marker set M_u used at the time of measurement, and the arrangement of M_v on S_u, are specified first. The time-series data Θ of the joint angles θ is obtained by inverse kinematics calculation from S_u, M_u, and the time-series data P_u of the marker positions p_u. From Θ, S_u, and the virtual marker set M_v, the time-series data P_v of the virtual marker positions p_v is then obtained by forward kinematics calculation.

In the case of manual animation, the animator operates the character directly, so the time-series data Θ of the joint angles of the generation model can be obtained directly. Registration in the motion database of the present invention therefore requires the generation model S_u and the virtual marker set M_v on that model. That is, for a motion generated manually by an animator, S_u and Θ are given directly, and P_v is obtained by giving the arrangement of M_v on S_u.

The position p_v of the virtual marker set is then converted into the state quantity x to be registered in the database. Hereinafter, a coordinate system {R} fixed to the root link of S_u is used, with the front direction of the character as the x-axis and the upward direction as the z-axis. The vertically upward direction of the absolute coordinate system {O} is taken as its z-axis. In each frame, the x-axis of the coordinate system {R} is projected onto the xy plane of the absolute coordinate system {O}; the projected start point is taken as the origin of the coordinate system {O ′}, and the projected axis is taken as its x-axis. The z-axis of {O ′} is taken in the same direction as the z-axis of the absolute coordinate system {O}. Finally, choosing the y-axis according to the right-handed convention determines {O ′}. In other words, the origin is the foot of the perpendicular dropped to the ground from the center of gravity of the waist link (root link), the z-axis is the direction perpendicular to the ground, and the x-axis is the human's front direction projected onto the ground. With this local coordinate system, frame data can be classified based only on the posture, without distinguishing the position at which the human stands or the direction the human faces. The choice of axes in this conversion, that is, of the x-axis, y-axis, and z-axis, is not limited. For example, in the coordinate system {R}, the forward direction of the character may be the z-axis and the upward direction may be the y-axis.

The position of each virtual marker i as seen from {O ′},

$${}^{O'}p_i \in \mathbb{R}^{3},$$

and its velocity,

$${}^{O'}v_i \in \mathbb{R}^{3},$$

are arranged for all markers to obtain the state quantity x ′:

$$x' = \begin{bmatrix} {}^{O'}p_1^{T} & {}^{O'}v_1^{T} & \cdots & {}^{O'}p_{N_m}^{T} & {}^{O'}v_{N_m}^{T} \end{bmatrix}^{T}$$

Finally, scaling is applied to eliminate differences due to the size of the model S_u. Using the coefficient α for normalizing the model S_u, the state quantity x registered in the database is obtained as

$$x = \alpha x'$$

α is determined, for example, so that the height is one unit length. Alternatively, the ratio between the model S_u used by the user and a predetermined basic model may be used as α.

These state quantities x are arranged in time series to define the normalized motion data M_a representing motion a with N_f frames:

$$M_a = \begin{bmatrix} x_1 & x_2 & \cdots & x_{N_f} \end{bmatrix}$$

Here, the dimension 6N_m of the state quantity is denoted D.
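The conversion of one frame into the coordinate system {O ′} followed by scaling can be sketched as follows. Several points are assumptions made for illustration and are not taken from the specification: the ground is the plane z = 0 of the absolute coordinate system, marker velocities are only rotated into the axes of {O ′} (the motion of {O ′} itself is not subtracted), each marker contributes its position followed by its velocity, and the function name `to_state_quantity` is hypothetical.

```python
import math

def to_state_quantity(root_pos, front_dir, marker_pos, marker_vel, alpha):
    """Build the scaled state quantity of one frame.

    root_pos:   root link position in the absolute frame {O}.
    front_dir:  x-axis of the root link frame {R}, expressed in {O}.
    marker_pos: list of virtual marker positions (3-vectors) in {O}.
    marker_vel: list of virtual marker velocities (3-vectors) in {O}.
    alpha:      scaling coefficient, e.g. 1 / character height.
    """
    fx, fy = front_dir[0], front_dir[1]
    norm = math.hypot(fx, fy)
    if norm == 0.0:
        raise ValueError("front direction is vertical; heading undefined")
    # x-axis of {O'}: the front direction projected onto the ground.
    ex = (fx / norm, fy / norm)
    # y-axis of {O'}: right-handed with z pointing up, i.e. z cross x.
    ey = (-ex[1], ex[0])
    # Origin of {O'}: foot of the perpendicular from the root to z = 0.
    ox, oy = root_pos[0], root_pos[1]
    state = []
    for p, v in zip(marker_pos, marker_vel):
        rx, ry = p[0] - ox, p[1] - oy
        state += [alpha * (rx * ex[0] + ry * ex[1]),
                  alpha * (rx * ey[0] + ry * ey[1]),
                  alpha * p[2]]
        # Assumption: velocities are rotated only, not made relative
        # to the motion of {O'}.
        state += [alpha * (v[0] * ex[0] + v[1] * ex[1]),
                  alpha * (v[0] * ey[0] + v[1] * ey[1]),
                  alpha * v[2]]
    return state
```

With N_m markers the returned list has 6N_m entries, and two characters in the same pose but standing at different places and facing different directions produce the same state quantity.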

[A-2] Database construction A motion database is constructed using all frames of one or more motion data M = [M_1, M_2, ..., M_{N_a}]. To enable transitions between motions and to perform searches of various accuracies at high speed, all frames are stored in a single hierarchical database with a binary tree structure, regardless of the type of motion. Each frame is included in one of the nodes in each hierarchy of the binary tree, and each node contains frames with close state quantities at a granularity corresponding to the hierarchy. The feature quantity of each node is the mean and variance of the state quantities of the frames included in that node.

Various methods have been proposed for hierarchical clustering, but in general they require distance information between all samples. Although the present invention does not exclude clustering that uses such distance information, it is not preferable, from the viewpoint of memory and computation, for data containing a large number of samples (thousands to tens of thousands or more) as in the present invention. On the other hand, since human motion is continuous, a set of frame data is not expected to have a clear cluster structure when it covers a sufficiently large number of motions.

From the above, when a node in an upper layer of the binary tree is divided into nodes of the layer immediately below, a simple division method based on principal component analysis is applied because the node contains a large number of frames. Once the number of frames in a node falls to a constant value n_c or less, clustering is performed by the Ward method, one of the hierarchical clustering methods.

[A-2-1] Upper Layer Node Division Method First, a node B_{1,1} containing all frames of the motion data set M is created in the first layer. Thereafter, when the i-th node B_{l,i} in the l-th layer contains n_c or more samples (frame data), B_{l,i} is divided into two child nodes by the following procedure. First, principal component analysis is applied to the covariance matrix Σ_{l,i} of the state quantities included in B_{l,i}, and each state quantity in B_{l,i} is projected onto the axis of the first principal component to obtain its first principal component score. To divide the frames into two nodes, an appropriate threshold on this score is needed. Here, the automatic threshold selection method of Kittler et al. (J. Kittler and J. Illingworth. Minimum error thresholding. Pattern Recognition, Vol. 19, No. 1, pp. 41-47, 1986.) is used. Alternatively, the frames in the node may be sorted in descending order of first principal component score and divided half and half between the two child nodes.
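The half-and-half alternative described above can be sketched as follows. This is an illustrative sketch only: it approximates the first principal axis by power iteration instead of a full eigendecomposition (a substitution made here to keep the example dependency-free), it does not implement the Kittler and Illingworth threshold, and the function names are hypothetical.

```python
import math

def first_principal_axis(rows, iterations=200):
    """Approximate the first principal axis of the rows by power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        # w = X^T (X v), which is proportional to the covariance matrix times v.
        w = [0.0] * d
        for c in centered:
            s = sum(cj * vj for cj, vj in zip(c, v))
            for j in range(d):
                w[j] += s * c[j]
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm == 0.0:
            break  # degenerate case: no variance along the current vector
        v = [wj / norm for wj in w]
    return mean, v

def split_node(rows):
    """Split frame indices into two halves by first principal component score."""
    mean, axis = first_principal_axis(rows)
    score = [sum((r[j] - mean[j]) * axis[j] for j in range(len(axis))) for r in rows]
    order = sorted(range(len(rows)), key=lambda i: score[i], reverse=True)
    half = len(order) // 2
    return order[:half], order[half:]
```

Applying `split_node` recursively to each child, until a child holds n_c frames or fewer, would give the upper layers of the tree. Which child receives the high-score half depends on the sign of the axis, which is arbitrary.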

[A-2-2] Node Clustering by the Ward Method When the number of samples in a node falls below n_c, clustering is performed by the Ward method, starting from an initial state in which each sample forms a node of its own. After clustering is completed, nodes whose number of samples is equal to or less than a certain threshold are deleted.
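A naive sketch of Ward's criterion, starting from one cluster per sample as described above, is given below. It is an illustration under assumptions: it stops at a requested number of clusters instead of recording the full merge tree, it uses an O(n³) pair search that is only practical for small nodes, and the function name is hypothetical.

```python
def ward_cluster(points, n_clusters):
    """Agglomerative clustering with Ward's criterion (n_clusters >= 1).

    Returns a list of clusters, each a sorted list of point indices.
    """
    dim = len(points[0])
    # Each cluster is (member indices, centroid); start from singletons.
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ia, ca), (ib, cb) = clusters[a], clusters[b]
                na, nb = len(ia), len(ib)
                dist2 = sum((ca[j] - cb[j]) ** 2 for j in range(dim))
                # Ward: increase in within-cluster sum of squares caused by the merge.
                cost = na * nb / (na + nb) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ia, ca), (ib, cb) = clusters[a], clusters[b]
        na, nb = len(ia), len(ib)
        merged = (ia + ib, [(na * ca[j] + nb * cb[j]) / (na + nb) for j in range(dim)])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [sorted(members) for members, _ in clusters]
```

Recording each merge instead of only the final partition would give the lower layers of the binary tree.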

[A-3] Internode Transition Probability First, the internode transition probability irrespective of the type of motion is obtained. As described above, all frame data in the motion data M are distributed to the nodes of each layer. When a frame is distributed, the node to which it belongs and the node to which the next frame belongs are recorded. This information is used to determine the internode transition probabilities of each layer: the transition probability a_ij between nodes in the same layer is the number of transitions N_ij from node i to node j divided by the total number of transitions from node i to all nodes, including itself.

For example, in FIG. 5, if the transition 1 → 1 occurs five times, 1 → 2 twice, and 1 → 3 three times, then a₁₃ = 3 / (5 + 2 + 3) = 0.3. This transition information is used when candidates for the next frame are presented at retrieval time.
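The counting rule can be sketched as follows for one layer. The input format (one list of node labels per motion, one label per frame) and the function name are assumptions made for illustration; transitions are counted only between consecutive frames of the same motion.

```python
from collections import defaultdict

def transition_probabilities(label_sequences):
    """Estimate a[i][j] = N_ij / sum_j N_ij from per-motion node label sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for labels in label_sequences:
        # Pair each frame's node with the node of the next frame in the same motion.
        for i, j in zip(labels, labels[1:]):
            counts[i][j] += 1
    probs = {}
    for i, row in counts.items():
        total = sum(row.values())  # includes self-transitions i -> i
        probs[i] = {j: n / total for j, n in row.items()}
    return probs
```

With sequences that contain 1 → 1 five times, 1 → 2 twice, and 1 → 3 three times, the entry for the transition from node 1 to node 3 is 3 / 10.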

The transition probabilities between the nodes of each layer are also obtained for each motion. The transition information between the nodes of each layer for the entire motion data M is obtained as described above, but the information specific to each individual motion data M_i is lost in it. However, depending on how the database is used, transition information for each motion may be required. Therefore, as shown in FIG. 6, the node transition probabilities are also expressed for each motion data M_i. The basic method is the same as above, but a left-to-right representation is used in order to express the time series. The transition probability between nodes is updated whenever a node transition occurs, and is set so that the probability of self-transition and the probability of transition to another node sum to 1. Each layer holds a table of node transition probabilities for each motion (each motion data).

[B] Retrieval using motion database [B-1] Outline Using a database constructed from a plurality of motion data, the node transition sequence that best corresponds to given motion data of one or more frames, that is, the sequence that maximizes the likelihood of generating the given motion data, is obtained. Using this result, it is possible to recognize which motion in the database the given motion corresponds to. In addition, since the nodes to which a transition from the last node is possible and their transition probabilities are known, the next motion can be predicted.

In the search, the likelihood that the state quantity x corresponding to the posture in a certain frame is generated from node B_{l,i} is used (Equation 8); it is computed from the average μ_{l,i} and the covariance matrix Σ_{l,i} of the frames in B_{l,i}. As the state quantity x, the virtual marker positions and velocities calculated from the posture are normalized by the same method as in [A-1]. That is, to search for a node corresponding to a certain posture, the positions of the virtual markers are obtained by forward kinematics calculation based on the joint angle data of the posture (the joint model is known). Next, the positions of the virtual markers are converted into the coordinate system {O′}, and the state quantity obtained by scaling is used as input data. However, if velocity information is not available, or if only some of the virtual markers are considered, partial observation (DH Lee and Y. Nakamura. Mimesis from partial observations. In Proceedings of IEEE / RSJ International Conference on Intelligent Robots and Systems, pp. 1911-1916, 2005.) is applied: for the components not considered, the average value is used as it is.
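One way to evaluate such a likelihood is sketched below. It assumes a Gaussian density and, as a simplification made here, an axis-aligned (diagonal) covariance, whereas the text uses the full covariance matrix; the function name and the use of `None` to mark unobserved components are also assumptions. The value is a density and may therefore exceed 1.

```python
import math

def log_likelihood(x, mean, var):
    """Log density of x under a diagonal Gaussian with the node's mean and variance.

    Components of x given as None are treated as unobserved and replaced by the
    node mean, so they contribute nothing to the quadratic term.
    All variances are assumed to be strictly positive.
    """
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        if xi is None:
            xi = mi  # partial observation: use the node average as it is
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total
```

Working with the logarithm avoids numerical underflow when the dimension D = 6N_m is large.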

[B-2] Corresponding Node Candidate Candidates for the lowest-layer node corresponding to each frame of the motion data of one or more frames given as the search key are obtained. By exploiting the hierarchical structure of the database and following nodes with a high likelihood (Equation 8) down from the upper layers, candidates can be obtained at high speed. Possible ways of discarding low-likelihood nodes when moving to a lower layer include cutting off at a fixed threshold, keeping a fixed number of nodes, and cutting off where the likelihood drops sharply.

By the above method, one or more corresponding node candidates are obtained for each frame, and are arranged in descending order of likelihood.
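The layer-by-layer descent can be sketched as follows, using the "keep a fixed number of nodes" pruning rule. The `Node` class, the diagonal Gaussian likelihood, and the `keep` parameter are illustrative assumptions and are not taken from the original.

```python
import math

class Node:
    """A tree node holding the mean and per-component variance of its frames."""
    def __init__(self, name, mean, var, children=()):
        self.name, self.mean, self.var = name, mean, var
        self.children = list(children)

def log_likelihood(x, node):
    # Diagonal Gaussian log density (a simplifying assumption).
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, node.mean, node.var))

def leaf_candidates(root, x, keep=2):
    """Descend layer by layer, keeping the `keep` most likely nodes in each layer."""
    frontier = [root]
    while any(n.children for n in frontier):
        # Nodes without children are carried down unchanged.
        next_layer = [c for n in frontier for c in (n.children or [n])]
        next_layer.sort(key=lambda n: log_likelihood(x, n), reverse=True)
        frontier = next_layer[:keep]
    return [(n.name, log_likelihood(x, n)) for n in frontier]
```

Only `keep` nodes are expanded per layer, so the number of likelihood evaluations grows with the depth of the tree and not with the number of lowest-layer nodes.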

[B-3] Optimal node transition sequence When a plurality of frames are given, the node transition sequence with the highest likelihood is searched for, taking the transition probabilities between nodes into account. Using the internode transition probabilities and the corresponding node candidates, the node transition sequence that maximizes the likelihood of generating the given motion is obtained. Consider a search that uses motion data M = [x₁, x₂, ..., x_K] consisting of K frames as the key. The likelihood that the node transition sequence B = [B₁, B₂, ..., B_K] generates the motion data M is calculated from the likelihood of each frame x_k under node B_k and the internode transition probabilities.

Here, a _{k, k + 1} is a transition probability from the node B _k to B _{k + 1} , and it is assumed that a _{K, K + 1} = 1.

In one embodiment, the negative logarithm of this likelihood is used as the cost, and the Dijkstra method (Steven M. LaValle. Planning Algorithms. Cambridge University Press, New York, NY, 2006.) is applied.
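One form of the sequence likelihood and its cost that is consistent with the definitions above is written out below. It is stated as an assumption, not as the original equations; the symbol P(x_k | B_k) for the per-frame likelihood of Equation 8 is introduced here for illustration.

```latex
P(M \mid B) = \prod_{k=1}^{K} P(x_k \mid B_k)\, a_{k,k+1}, \qquad a_{K,K+1} = 1,
\qquad
-\log P(M \mid B) = -\sum_{k=1}^{K} \bigl( \log P(x_k \mid B_k) + \log a_{k,k+1} \bigr)
```

Because the cost is a sum of per-step terms, a shortest-path search over node sequences corresponds to maximizing the likelihood.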

[B-4] Detailed Explanation of Search Using Motion Database As the search target, state quantities s₁, s₂, ..., s_N at N key frames and their times t₁, t₂, ..., t_N are given. The purpose of the search is to find the node transition sequence that best represents these key frames, i.e., that has the maximum generation likelihood. The search method has the following options.

[Presence of interpolation]
(1) With interpolation: the given key frames are interpolated in advance, and a node is associated with each frame after interpolation.
(2) Without interpolation: a node transition sequence that passes through each key frame at its given time is obtained.
[Whether switching between motions is allowed]
(1) If node transitions are limited to those within the same motion, the search does not allow switching to another motion partway through.
(2) If all possible transitions are considered regardless of the type of motion, the search allows switching to another motion partway through.
Since these options can be combined freely, there are four ways to search.

The Dijkstra method is used as one embodiment of the search method of the present invention. The Dijkstra method is a discrete optimization method that expands candidates in order of increasing cost, enumerating the inputs possible one step after the current lowest-cost candidate. When the lowest-cost candidate is one that has reached the final step, the optimal input sequence is guaranteed to have been obtained. In this method, the input is the choice among the possible transitions, and the logarithm of the reciprocal of the generation likelihood is used as the cost.

[B-4-1] Preparation Among the lowest-layer nodes of the database, the set C_i of nodes that may correspond to the state quantity s_i (i = 1, 2, ..., N) of each key frame is obtained. This can generally be done at high speed by following nodes with a high likelihood down from the highest layer. At this time, in each layer, only the nodes whose generation likelihood exceeds a certain threshold may be carried to the next layer, or the nodes may be sorted in descending order of likelihood and cut off where the likelihood changes sharply.
Hereinafter, the probability of transition from node n_i to n_j is written T(n_i, n_j). This includes the probability of self-transition, that is, the case n_i = n_j.

[B-4-2] Upper Search The upper search looks for the optimal route that passes in order through nodes included in C_i (i = 1, 2, ..., N).
The generation likelihood P for the node transition sequence {n₁ → n₂ → ... → n_N} (n_i ∈ C_i) is the product of the likelihoods that each key frame s_i is generated from n_i and the transition probabilities T̂_{M_i}(n_i, n_{i+1}) between successive key frames.

Here, T̂_m(n_i, n_j) represents the probability of transition from node n_i to n_j in m steps, and is obtained by the lower search described in the next section. M_i is the time interval between the key frames s_i and s_{i+1} expressed as a number of frames of the learned motion data, obtained as M_i = (t_{i+1} − t_i) / Δt. The Dijkstra method with −log P as the cost is applied to P.
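The upper search can be sketched as a maximization over one candidate node per key frame. The sketch below uses a layer-by-layer dynamic programme (Viterbi style) in place of the Dijkstra search described in the text; this substitution is made here because key frame likelihoods may exceed 1, which makes some −log costs negative, and Dijkstra's method requires non-negative costs. The data layout and function name are assumptions. The numbers in the usage note are those given for s₁ and s₂ in the example of [B-5].

```python
import math

def best_keyframe_path(candidates, t_hat):
    """Find the most likely node sequence through the key frames.

    candidates[i]: {node: likelihood that key frame i is generated from node}
    t_hat[i]:      {(n_i, n_j): best M_i-step transition probability}
    Returns (path, likelihood), or (None, 0.0) if no feasible path exists.
    """
    # best[n] = (log-likelihood of the best path ending in n, that path)
    best = {n: (math.log(p), [n]) for n, p in candidates[0].items() if p > 0}
    for i in range(1, len(candidates)):
        nxt = {}
        for nj, pj in candidates[i].items():
            for ni, (score, path) in best.items():
                t = t_hat[i - 1].get((ni, nj), 0.0)
                if t <= 0 or pj <= 0:
                    continue  # this transition is impossible
                s = score + math.log(t) + math.log(pj)
                if nj not in nxt or s > nxt[nj][0]:
                    nxt[nj] = (s, path + [nj])
        best = nxt
    if not best:
        return None, 0.0
    score, path = max(best.values(), key=lambda sp: sp[0])
    return path, math.exp(score)
```

With candidates `[{"n1": 3.0, "n3": 0.1}, {"n4": 2.0, "n1": 0.5}]` and two-step transition probabilities `[{("n1", "n4"): 0.02, ("n1", "n1"): 0.49}]`, the route n₁ → n₁ is selected, with likelihood 3.0 × 0.49 × 0.5.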

[B-4-3] Lower Search T̂_m(n_i, n_j) is defined as the probability of the most probable node transition sequence among those that move from node n_i to n_j in m steps. It is obtained by applying the Dijkstra method, with −log P as the cost, to the occurrence probability P of each such transition sequence, that is, the product of the transition probabilities T along the sequence.
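A sketch of the lower search as a Dijkstra search over (node, step) states is given below. Transition probabilities are at most 1, so every step cost −log T is non-negative, as Dijkstra's method requires. The table layout and function name are assumptions, and the intermediate node names `na` and `nb` in the usage note are hypothetical.

```python
import heapq
import math

def best_m_step_transition(T, start, goal, m):
    """Most probable way to move from start to goal in exactly m steps.

    T[n] is a dict {next_node: probability}. Returns (probability, path),
    or (0.0, None) when no m-step route exists.
    """
    # Heap entries: (cost so far = -log probability, steps taken, node, path)
    heap = [(0.0, 0, start, [start])]
    done = set()
    while heap:
        cost, k, node, path = heapq.heappop(heap)
        if (node, k) in done:
            continue
        done.add((node, k))
        if k == m:
            if node == goal:
                return math.exp(-cost), path
            continue  # m steps used up without reaching the goal
        for nxt, p in T.get(node, {}).items():
            if p > 0 and (nxt, k + 1) not in done:
                heapq.heappush(heap, (cost - math.log(p), k + 1, nxt, path + [nxt]))
    return 0.0, None
```

For a table such as `{"n1": {"n1": 0.7, "na": 0.2, "nb": 0.05}, "na": {"n4": 0.1, "n1": 0.3}, "nb": {"n4": 0.3}}`, the two-step search from `n1` to `n4` compares 0.2 × 0.1 with 0.05 × 0.3 and returns the former route.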

[B-4-4] Correspondence to each option Presence / absence of interpolation: When interpolation is performed in advance, all frames after interpolation are regarded as key frames. The lower search is therefore always a single step and is very fast. However, since the number of steps in the upper search increases, it cannot be said in general which option gives the shorter overall search time.
Presence / absence of switching between motions: In the upper and lower searches, if only transitions within the same motion as the transition of the previous step are considered when listing possible transitions, the search stays within a single motion; if all possible transitions in the layer of the binary tree are considered, the search allows switching between motions partway through.

[B-5] Example FIG. 8A shows an example of the possible transitions between lowest-layer nodes in a database. The upper part shows all possible transitions and their transition probabilities (T) without regard to the type of motion, and the lower part shows the transitions in the three motion data used to construct the database. In the per-motion transition diagrams, the transition probabilities and the arrows indicating self-transitions are omitted. A search example based on these transition diagrams is shown below.

FIG. 8B is an example of input to the search system. Here, three key frames s₁, s₂, and s₃ are designated, and the time intervals between the key frames correspond to 2 frames and 3 frames of the motion data included in the database, respectively.

First, the nodes to which each key frame may correspond are listed. Assume that the results, arranged in descending order of likelihood, are as shown in the lower part of FIG. 8B. That is, C₁ = {n₁, n₃}, and so on. The number next to each node is its likelihood (it may be greater than 1 because it is not a probability).

Two nodes, n₁ and n₃, may correspond to the key frame s₁, with likelihoods P = 3.0 and 0.1, respectively. Since n₁ has the higher likelihood and therefore the lower cost, the case where n₁ corresponds to s₁ is considered first.

Since n₄ and n₁ may correspond to s₂, a search is first made for ways to transition from n₁ to n₄ in two steps. The result shows that the following transitions are possible.

The transition probabilities are 0.2 × 0.1 = 0.02 and 0.05 × 0.3 = 0.015, respectively. Both of these may be retained, or only the optimal transition may be kept. Here, for simplicity, only the former, which is the optimal transition, is kept. Combining this with the likelihood that the key frame s₂ is generated from node n₄ gives a likelihood of 2.0 × 0.02 = 0.04. Meanwhile, there are the following two ways to transition from n₁ to n₁ in two steps.

Since the transition probabilities are 0.7 × 0.7 = 0.49 and 0.2 × 0.3 = 0.06, respectively, the former is the optimal transition, and combining it with the key frame generation likelihood gives 0.5 × 0.49 = 0.245. From the above, when n₁ corresponds to s₁, there are two transitions to s₂, in descending order of likelihood: n₁ → n₁ (P = 0.245) and n₁ → n₄ (P = 0.04).

Taking the generation likelihood of s ₁ into consideration, the current candidates are arranged in descending order of likelihood as follows.

Since the Dijkstra method always expands the candidate with the lowest cost (the highest likelihood in this example), the candidate that has transitioned from s₁ : n₁ to s₂ : n₁ is considered next.

n₂ and n₅ may correspond to s₃. First, searching for ways to transition from n₁ to n₂ in three steps gives the following result.

Among these, the one with the maximum transition probability is adopted. The ways to transition from n₁ to n₅ in three steps and their probabilities are searched for in the same manner. The candidates obtained in this way are added to the list, which is again arranged in descending order of likelihood, and the candidate with the highest likelihood is considered next. By the nature of the Dijkstra method, the optimal transition sequence with the maximum likelihood has been found when the selected candidate is one that has already reached s₃, and the search ends.

[C] Application to Key Frame Generation Support Tool [C-1] Background As an application example of the proposed database, an implementation as a key frame generation support tool for character animation is shown. In general, key frames in character animation are often created manually by animators using inverse kinematics calculation or the like, but this takes time even for a skilled animator.

Therefore, a tool was implemented in which, once a small number of poses have been created, next pose candidates are presented by a search using those poses as the key, and a large number of key frames can be generated easily by selecting the most appropriate candidate. The next pose candidates are obtained by finding the nodes to which a transition is possible from the last node of the transition sequence obtained by the search of the previous section, and performing inverse kinematics calculation using the marker positions obtained from the average value of each such node.

[C-2] Database construction Ten types of motion capture data (running, walking, jumping, etc.) acquired at 200 fps were used. After inverse kinematics calculation in each frame, the data were resampled at 120 fps to facilitate conversion to the 30 fps or 24 fps normally used in animation, giving a total of 3480 frames. A virtual marker set (see FIG. 7) with N_m = 34 markers was used. With the maximum number of samples for clustering by the Ward method set to n_c = 200 and the minimum number of samples per node set to 24, the total time required for file reading, inverse kinematics, forward kinematics, and database construction was 1190 seconds. If simple node division is performed in all layers, the time is 260 seconds.

[C-3] Results FIG. 9 shows an example in which the top eight of the next pose candidates obtained from the search results are presented, using a single frame as the key. The character displayed in the center is the input posture, and the next pose candidates are arranged from the left in descending order of likelihood. It can be seen that appropriate search results are obtained for each input. Table 1 shows the number of frames of input motion data, the time required for the search and for the inverse kinematics calculation, and the number of candidates obtained.

For the inverse kinematics calculation, the method of Yamane and Nakamura (Katsu Yamane, Yoshihiko Nakamura. Cooperative structured interface for generating human figure whole body motion. Journal of the Robotics Society of Japan, Vol. 20, No. 3, pp. 335-343, 2002.) was used, with the convergence calculation repeated 10 times. Compared with the inverse kinematics calculation, in which 34 × 3 = 102 constraints must be taken into account for each candidate, the time required for the search is short.

[D] Additional registration of motion data The method of initial registration of motion data has been described above; to make the database more versatile, a function for adding motion data to the database is required. One mode of additional registration of motion data is described here. In this mode, the nodes are reconfigured using backup data. The backup database is where all frames of the motion data used to construct the motion database, and the information of the constructed motion database, are recorded.

As shown in FIG. 10, first, initial registration of motion data in the motion database is performed (step 1).
All frames for which initial registration has been completed, the tree configuration, and the average and covariance matrix of each node are stored in the backup database (step 2).
Next, when the motion database is used, the node averages and covariance matrices are read from the backup database, and the motion database is reconstructed (step 3).
When motion data is additionally registered, information on the frames belonging to each node is acquired from the backup database, and the average and covariance matrix of each node and the indices of the frames held by the node are updated to take the added data into account (step 4).
When the update is completed, the average and covariance matrix of each node and the frames held by each node are registered again in the backup database; the motion database can then be reconstructed from it, and the same steps are repeated thereafter.

One aspect of the node update method will be described. Assume that, before additional registration, node B_{l,i} contains n frames, with average μ_{l,i} and covariance matrix Σ_{l,i}. When a frame x_{n+1} is added to this node, the mean and covariance matrix become:

As a result, the node average and the covariance matrix are updated sequentially. The node to which an added frame is assigned is determined by the likelihood calculated by Equation 8 ((A) in FIG. 11).
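A sequential update of this kind can be sketched with the standard one-sample update formulas. The population (divide-by-n) covariance convention used below is an assumption made here, as is the function name; the original equations may use a different convention.

```python
def add_frame(n, mean, cov, x):
    """Return (n + 1, new mean, new covariance) after adding frame x to a node.

    mean is a list of length D and cov a D x D nested list, both computed
    from the n frames already in the node (population covariance).
    """
    d = len(mean)
    delta = [x[j] - mean[j] for j in range(d)]
    new_mean = [mean[j] + delta[j] / (n + 1) for j in range(d)]
    w = n / (n + 1) ** 2
    # Shrink the old covariance and add the spread contributed by the new frame.
    new_cov = [[n / (n + 1) * cov[i][j] + w * delta[i] * delta[j]
                for j in range(d)] for i in range(d)]
    return n + 1, new_mean, new_cov
```

Only the count, mean, and covariance of the node are needed for the update, so the frames already registered in the node do not have to be revisited.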

Descending from the highest node to the child nodes, x_{n+1} is added to the node with the maximum likelihood in each layer. As in the case of new registration, when the number of frames in a node exceeds the threshold, the node is divided into two child nodes ((B) in FIG. 11). When this leaves the tree with different depths on different branches, single child nodes are added so that every branch matches the maximum depth ((C) in FIG. 11).

The present invention can be used not only in computer graphics but also in various fields such as sports science, rehabilitation, and medical fields.

FIG. 1 shows an overview of how motion data is registered in the motion database. In the figure, IK indicates inverse kinematics calculation and FK indicates forward kinematics calculation.
FIG. 2 shows the conversion of the positions of the virtual marker set acquired by forward kinematics calculation to the coordinate system based on the root link of the skeleton model.
FIG. 3 shows the scaling of the positions of the virtual marker set obtained in FIG. 2.
FIG. 4 shows an outline of the binary tree structure. In the present invention, the depth of each node of the binary tree is called a layer, and the highest layer is the first layer. The number of nodes in the l-th layer is 2^(l−1). Each frame data (state quantity x) of the set M of motion data is included in one node of each layer of the binary tree.
FIG. 5 illustrates the calculation of the transition probabilities between the nodes constituting each layer (the first layer).
FIG. 6 illustrates the calculation of the transition probabilities between the nodes constituting each layer for each motion data.
FIG. 7 shows a virtual marker set.
FIG. 8A shows the transition diagram of the lowest-layer nodes.
FIG. 8B shows an example of input to the search.
FIG. 9 shows experimental results.
FIG. 10 shows the overall flow of additional registration of motion data.
FIG. 11 shows the additional registration of motion data in the binary tree structure.

Claims

A database structure using a plurality of normalized motion data,
The time interval of each frame of the plurality of normalized motion data is shared,
The database structure is a hierarchical database having a tree structure;
The top layer of the tree structure is one node that contains all frames of multiple normalized motion data,
Each frame of the normalized motion data is included in one node of each layer of the tree structure, and each node includes frames whose state quantities are closer to one another going from the upper layers to the lower layers,
For each tier, transition probabilities between a plurality of nodes constituting each tier are stored.
Exercise database structure.
The exercise database structure according to claim 1, wherein the tree structure is a binary tree structure.
The exercise database structure according to claim 1, wherein transition probabilities between a plurality of nodes constituting each hierarchy are stored for each exercise data.
The exercise database structure according to any one of claims 1 to 3, wherein a state quantity of each frame of the normalized motion data is determined by a position and a velocity of each virtual marker of a virtual marker set capable of determining an arbitrary posture of the skeleton model.
The exercise database structure according to any one of claims 1 to 4, wherein a node of the lowest layer of the tree structure includes a predetermined number of frames or more.
The exercise database structure according to any one of claims 1 to 5, wherein a representative value of the state quantities of the frames included in each node is used as the feature quantity of that node, and the feature quantity of each node is stored in place of or in addition to the state quantities of the frames included in the node.
A top node is formed from all frames of a plurality of normalized motion data to form a tree structure root,
A first division step of performing node division until the number of frames included in the nodes of each hierarchy is equal to or less than a first threshold;
A second division step of performing node division sequentially from the lowest layer node of the first division step until the number of frames included in the nodes of each hierarchy is equal to or less than the second threshold;
are included, wherein
The first division step performs node division based on a one-dimensional index acquired from a state quantity of a frame included in each node;
In the second dividing step, node division is performed by hierarchical clustering using a state quantity of a frame included in each node.
Exercise database construction method using multiple normalized exercise data.
The exercise database construction method according to claim 7, wherein the first division step and the second division step divide each node into two lower nodes.
The exercise database construction method according to claim 7, wherein a state quantity of each frame of the normalized motion data is determined by a position and a speed of each virtual marker of a virtual marker set capable of determining an arbitrary posture of the skeleton model.
The exercise database construction method according to claim 7, wherein the first division step is node division based on principal component analysis, and the index is a first principal component score.
The exercise database construction method according to claim 7, wherein the index in the first division step is a height of a center of gravity.
The exercise database construction method according to any one of claims 7 to 11, further comprising, when distributing the frames included in one upper node to a plurality of lower nodes, storing information on which node each frame of each motion data and its next frame belong to, and obtaining the transition probabilities between nodes.
A method for normalizing exercise data,
The motion data consists of time series data of each joint angle of the skeletal model,
A virtual marker set composed of a plurality of virtual markers is arranged at a predetermined position of the skeleton model so that an arbitrary posture of the skeleton model can be determined,
From the time series data of the joint angles, obtain the position and speed of the virtual marker set of each frame, the position and speed of the virtual marker set as the state quantity of the frame,
Convert the state quantity of each frame to a coordinate system based on the root link of the skeleton model,
Scale the state quantity of each converted frame,
Obtain normalized motion data by arranging the scaled state quantities of the frames in time series,
Normalization method for exercise data.
The method for normalizing motion data according to claim 13, comprising sharing the time interval of each frame.
The method for normalizing motion data according to claim 13, wherein the time-series data of the joint angles is obtained by inverse kinematics calculation from time-series data of marker positions arranged on a certain skeleton model.
The method for normalizing motion data according to claim 13, wherein the time-series data of the joint angles is directly given to the skeleton model.
An exercise data search device comprising the database structure according to any one of claims 1 to 6, an input unit, an output unit, and a processing unit,
The input unit is configured to input a normalized state quantity or state quantity sequence corresponding to motion data of one frame or more,
The processing unit calculates the likelihood of generating the input state quantity or state quantity sequence using the representative value of each node based on the state quantities of a plurality of frames included in each node,
The output unit is configured to output one or more nodes or one or more node transition sequences having a high likelihood as candidates,
Exercise data retrieval device.
The processing unit uses the inter-node transition probability to calculate one or more next nodes to which the calculated node can transition,
The output unit outputs one or more calculated next nodes as candidates,
The exercise data search device according to claim 17.
The exercise data search device according to claim 17 or 18, wherein the state quantity input from the input unit is determined by the position and speed of each virtual marker of the virtual marker set.
An exercise data search method using the database structure according to claim 1,
Inputting a normalized state quantity or state quantity sequence corresponding to at least one frame of motion data;
Calculating a likelihood of generating an input state quantity or state quantity sequence using a representative value of each node based on a state quantity of a plurality of frames included in each node;
Outputting one or more nodes or one or more node transition sequences having a high likelihood as candidates,
Exercise data search method.
Furthermore, using the inter-node transition probability, calculating one or more next nodes to which the calculated node can transition;
Output one or more calculated next nodes as candidates,
The exercise data search method according to claim 20.
The exercise data search method according to claim 20, wherein the input state quantity is determined by the position and speed of each virtual marker in the virtual marker set.
The motion data of one frame or more is specified by joint angle data of a certain joint model,
The normalized state quantity that is input is obtained by:
Obtaining the position, or the position and speed, of the virtual markers by forward kinematics calculation based on the joint angle data;
Converting the acquired position, or position and speed, of the virtual markers into a coordinate system based on the root link of the skeleton model; and
Scaling the converted position, or position and speed, of the virtual markers,
The exercise data search method according to claim 22.
The one or more node candidates or the one or more node transition sequence candidates are:
Based on the representative value of the state quantity included in the node candidate or node transition series candidate, obtain joint angle data by inverse kinematics calculation,
Displayed by applying the acquired joint angle data to the joint model.
The exercise data search method according to claim 23.