CN110516112B - Human body action retrieval method and device based on hierarchical model

Human body action retrieval method and device based on hierarchical model

Info

Publication number
CN110516112B
CN110516112B (application CN201910799466.5A)
Authority
CN
China
Prior art keywords
motion
joint point
data
file
hierarchical model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910799466.5A
Other languages
Chinese (zh)
Other versions
CN110516112A (en)
Inventor
黄天羽
黄晓舟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN201910799466.5A
Publication of CN110516112A
Application granted
Publication of CN110516112B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73Querying
    • G06F16/732Query formulation
    • G06F16/7328Query by example, e.g. a complete video frame or video sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F16/784Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G06F16/786Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using motion, e.g. object motion or camera motion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to a human body action retrieval method and device based on a hierarchical model, wherein the method comprises the following steps: extracting information of each joint point of each frame from the human motion file, and calculating the position of the joint point in a world coordinate system; determining the joint point with the largest position change and the main movement direction of the joint point; coding the motion file data according to a hierarchical model; storing the file motion data and the hierarchical model code thereof into a motion database; establishing an index tree for all files according to the hierarchical model code corresponding to each file in the database; retrieving the database using the hierarchical model coding. The invention uses coding based on the hierarchical model to preserve the main geometric characteristics of the motion data, converts the retrieval of complex motion data into the retrieval of simple digital codes, and greatly improves the time efficiency of retrieval.

Description

Human body action retrieval method and device based on hierarchical model
Technical Field
The invention relates to a data retrieval method and equipment, in particular to a human body action retrieval method and equipment based on a hierarchical model.
Background
With the rise of motion capture technology and the progress of various optical and mechanical motion capture devices, people can quickly acquire a large number of human motion three-dimensional data files. Because such a file accurately records the complete motion trajectory of an experimenter over each time period, the fine details of human motion can be obtained by analyzing the captured data, which greatly improves the convenience and reliability of work based on motion data. The reuse of motion data and the establishment of large-scale motion databases provide a more time-saving and economical solution for motion capture technology, and also place higher demands on the organization and retrieval technologies of motion databases.
Motion retrieval is a key technology for realizing the reuse of motion capture data. A human body action sequence is a typical high-dimensional time series, and retrieving such high-dimensional information with conventional methods consumes a large amount of running time and memory. It is therefore important to choose a suitable feature representation so that both retrieval speed and retrieval quality are acceptable. Existing retrieval of motion databases is mainly realized by extracting geometric features and calculating Euclidean distances, with the aim of achieving content-based retrieval; however, computing complex geometric features and Euclidean distances consumes a large amount of running time during database retrieval and cannot meet real-time requirements.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a human body action retrieval method and device based on a hierarchical model, which use new geometric features to realize content-based motion data retrieval and improve the time efficiency of conventional retrieval methods.
In order to achieve the above object, the present invention provides a human body action retrieval method based on a hierarchical model, comprising the following steps:
extracting information of each joint point of each frame from the human motion file, and calculating the position of the joint point in a world coordinate system;
determining the joint point with the largest position change and the main movement direction of the joint point;
coding the motion file data according to a hierarchical model, wherein a first-level code corresponds to a joint point with the maximum position change, a second-level code corresponds to the main motion direction of the joint point, and a third-level code corresponds to the motion frequency;
storing the file motion data and the hierarchical model code thereof into a motion database;
establishing an index tree for all files according to the hierarchical model code corresponding to each file in the database, wherein non-leaf nodes of the index tree comprise codes of corresponding hierarchies, and leaf nodes comprise an index structure inode and record the following index information: { hierarchical model encoding, filename, pointer to previous inode, pointer to next inode };
when the motion database is searched, firstly, hierarchical model coding is carried out on the queried motion, leaf nodes with similar codes are then found in the index tree, and finally, all motion files with similar codes are located according to the pointer information of the inodes, so that a similar motion candidate file set is obtained.
Preferably, the joint point with the largest position change and the main movement direction of the joint point are determined according to the following method:
respectively setting oscillation factors Sx, Sy and Sz to record the maximum displacement of each joint point of the human skeleton framework in the X, Y and Z directions under the world coordinate system; by finding the maximum among Sx, Sy and Sz, the joint point with the largest position change and the main movement direction of the joint point can be determined.
Preferably, after the position of each joint point of each frame under the world coordinate system is calculated, the key frame is distinguished and extracted, the hierarchical model code of the file is determined by using the key frame data, and only the key frame motion data and the hierarchical model code thereof are stored in the motion database.
Preferably, after the key frame extraction, the motion period is judged according to the position of each joint point of each frame, and the key frame in one motion period is selected to determine the joint point with the largest position change and the main motion direction of the joint point.
Preferably, the data after dimensionality reduction is subjected to filtering processing, the difference between the current position of each data frame of each joint point and the maximum change position or the minimum change position of the joint in the whole motion is compared, if the difference value is larger than the dynamic change average value of the joint point, the difference value is regarded as interference information, and the position information of the joint point under the data frame is deleted.
The invention also provides a storage device for human body action retrieval, having stored therein a plurality of instructions adapted to be loaded and executed by a processor:
extracting information of each joint point of each frame from the human motion file, and calculating the position of the joint point in a world coordinate system;
determining the joint point with the largest position change and the main movement direction of the joint point;
coding the motion file data according to a hierarchical model, wherein a first-level code corresponds to a joint point with the maximum position change, a second-level code corresponds to the main motion direction of the joint point, and a third-level code corresponds to the motion frequency;
storing the file motion data and the hierarchical model code thereof into a motion database;
establishing an index tree for all files according to the hierarchical model code corresponding to each file in the database, wherein non-leaf nodes of the index tree comprise codes of corresponding hierarchies, and leaf nodes comprise an index structure inode and record the following index information: { hierarchical model encoding, filename, pointer to previous inode, pointer to next inode };
when the motion database is searched, firstly, hierarchical model coding is carried out on the queried motion, leaf nodes with similar codes are then found in the index tree, and finally, all motion files with similar codes are located according to the pointer information of the inodes, so that a similar motion candidate file set is obtained.
Preferably, the instructions determine the joint point with the largest position change and the main movement direction of the joint point according to the following method:
respectively setting oscillation factors Sx, Sy and Sz to record the maximum displacement of each joint point of the human skeleton framework in the X, Y and Z directions under the world coordinate system; by finding the maximum among Sx, Sy and Sz, the joint point with the largest position change and the main movement direction of the joint point can be determined.
Preferably, after the instruction calculates the position of each joint point of each frame in the world coordinate system, the instruction performs key frame discrimination and extraction, determines the hierarchical model code of the file by using key frame data, and only stores the key frame motion data and the hierarchical model code thereof in the motion database.
Preferably, after the instruction extracts the key frames, the instruction determines a motion period according to the position of each joint point of each frame, and selects a key frame in a motion period to determine the joint point with the largest position change and the main motion direction of the joint point.
Preferably, the instruction performs filtering processing on the data after dimensionality reduction, compares the difference between the current position of each data frame of each joint point and the maximum change position or the minimum change position of the joint in the whole motion, and if the difference value is greater than the dynamic change average value of the joint point, the difference value is regarded as interference information, that is, the position information of the joint point in the data frame is deleted.
Advantageous effects
The human body action retrieval method and device based on the hierarchical model preserve the main geometric characteristics of the motion data by using coding based on the hierarchical model, convert the retrieval of complex motion data into the retrieval of simple digital codes, and greatly improve the time efficiency of retrieval.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a file structure of a BVH file header;
fig. 3 shows the file structure of the BVH file data segment.
Fig. 4 is a human skeleton level model with Hips joints as root nodes.
Fig. 5 is a schematic plan view of human body movement.
FIG. 6 is a human motion hierarchy model.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
When acquiring human motion data, conventional inertial motion capture devices typically store the raw motion data as a BVH file, which follows the idea of data transformation based on parent-child relationships. Because a BVH file encodes parent-child node relationships, the file can generally be divided into two parts. The first is a file header storing the human skeleton hierarchical structure, as shown in FIG. 2: HIERARCHY is used as the file start identifier and ROOT as the identifier of the starting point of the human skeleton hierarchy, so that the internal relations of the human skeleton hierarchy are presented clearly in a concise form, and the type and range of data required for each skeleton node are well specified. The second is a data segment storing the specific MOTION data, as shown in FIG. 3: MOTION is used as the start identifier of the data segment, and Frames indicates how many motion frames the sequence contains, so that the specific human skeleton hierarchy can be distinguished from the specific motion data.
As can be seen from the examples of FIG. 2 and FIG. 3, the human skeletal hierarchy has Hips as the root node, as shown in FIG. 4. The translation of each bone joint point relative to its parent node in the local coordinate system is given in the skeleton definition part of the BVH file, and the rotation relative to the parent node is given in each data frame. The root node data in a data frame comprises six dimensions, namely the displacement of the root node in the world coordinate system and its rotation angles in the local coordinate system. The data (-195.76, 92.90, 76.64) in FIG. 3 is the displacement of the root node in the start frame, and the data (6.83, -1.04, 91.45) is its rotation angles in the start frame. Apart from the root node, the data of any bone node consists only of three rotation angles, namely the pitch, roll and yaw of that bone joint point relative to its parent node. From these data, the position of each joint in the world coordinate system can be obtained for each frame.
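The forward-kinematics computation described above can be sketched in Python as follows. This is a minimal illustration, assuming the BVH hierarchy and frame data have already been parsed into plain dictionaries (the offset/children/rotations field names are illustrative, not part of the patent), and assuming the common Z-X-Y Euler channel order of BVH files; a real parser should follow the channel order declared in the header.

```python
import numpy as np

def euler_to_matrix(z_deg, x_deg, y_deg):
    """Rotation matrix for Z-X-Y Euler angles in degrees (a common BVH channel order)."""
    z, x, y = np.radians([z_deg, x_deg, y_deg])
    rz = np.array([[np.cos(z), -np.sin(z), 0.0],
                   [np.sin(z),  np.cos(z), 0.0],
                   [0.0, 0.0, 1.0]])
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(x), -np.sin(x)],
                   [0.0, np.sin(x),  np.cos(x)]])
    ry = np.array([[ np.cos(y), 0.0, np.sin(y)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(y), 0.0, np.cos(y)]])
    return rz @ rx @ ry

def world_positions(joint, frame, parent_rot=np.eye(3), parent_pos=np.zeros(3)):
    """Recursively compute the world position of every joint for one frame.

    joint: {"name": str, "offset": (x, y, z), "children": [...]} from the header.
    frame: {"root_name": str, "root_translation": (x, y, z),
            "rotations": {joint_name: (z_deg, x_deg, y_deg)}} from one data line.
    """
    if joint["name"] == frame["root_name"]:
        pos = np.asarray(frame["root_translation"], dtype=float)    # root carries absolute displacement
    else:
        pos = parent_pos + parent_rot @ np.asarray(joint["offset"], dtype=float)
    angles = frame["rotations"].get(joint["name"], (0.0, 0.0, 0.0))  # end sites have no channels
    rot = parent_rot @ euler_to_matrix(*angles)
    positions = {joint["name"]: pos}
    for child in joint["children"]:
        positions.update(world_positions(child, frame, rot, pos))
    return positions
```

Applying world_positions to every data frame yields, for each joint point, the world-coordinate trajectory used in the following steps.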
Whether common sports in daily life or competitive sports on track and field competition fields, all important joint points of human skeleton present regular space-time characteristics and movement frequency, and constitute semantic characteristics of human movement. An important space-time characteristic of human motion is the joint point with the largest position change and the main motion direction of the joint point, and most motion types can be distinguished through the semantic characteristic. The idea of the invention is to utilize the semantic characteristics of human motion to realize efficient human motion retrieval based on content.
When processing human motion data, the basic planes of the human body are generally defined as the horizontal plane, the frontal plane and the sagittal plane. The horizontal plane is a section that cuts the body transversely, parallel to the ground in the upright posture, and divides the body into an upper part and a lower part; the frontal plane is a longitudinal section made along the left-right diameter of the body, dividing the body into a front part and a rear part; the sagittal plane is a longitudinal section made along the front-back diameter of the body, dividing the body into a left part and a right part. The basic axes of the human body are defined as the frontal axis, the vertical axis and the sagittal axis. The frontal axis is the X axis, perpendicular to the sagittal plane, oriented in the left-right direction, and is the intersection line of the frontal plane and the horizontal plane; the vertical axis is the Y axis, perpendicular to the horizontal plane, oriented in the up-down direction, and is the intersection line of the frontal plane and the sagittal plane; the sagittal axis is the Z axis, perpendicular to the frontal plane, oriented in the front-back direction, and is the intersection line of the sagittal plane and the horizontal plane.
Taking running, the most common human exercise, as an example, as shown in FIG. 5, the human body has a definite displacement in the sagittal plane. The trunk shows no obvious large oscillation in the horizontal plane or the frontal plane; the four limbs swing obviously and regularly in the sagittal plane, and the swing amplitude of the lower limbs is far greater than that of the upper limbs, so the lower limbs are judged to be the main force-exerting part. Meanwhile, the specific exercise intensity can be judged from the swing frequency of the limbs and the speed of position exchange. Therefore, keeping the position information of each joint point of the human skeleton model in the sagittal plane is enough to determine the main semantic features.
Based on this idea, embodiment 1 of the present invention implements a human body action retrieval method based on a hierarchical model, including the following steps:
S1: extracting information of each joint point of each frame from the human motion file, and calculating the position of the joint point in a world coordinate system;
S2: determining the joint point with the largest position change and the main movement direction of the joint point;
S3: coding the motion file data according to a hierarchical model, wherein a first-level code corresponds to the joint point with the maximum position change, a second-level code corresponds to the main motion direction of the joint point, and a third-level code corresponds to the motion frequency;
S4: storing the file motion data and the hierarchical model code thereof into a motion database;
S5: establishing an index tree for all files according to the hierarchical model code corresponding to each file in the database, wherein non-leaf nodes of the index tree comprise codes of corresponding hierarchies, and leaf nodes comprise an index structure inode and record the following index information: { hierarchical model encoding, filename, pointer to previous inode, pointer to next inode };
S6: when the motion database is searched, firstly, hierarchical model coding is carried out on the queried motion, leaf nodes with similar codes are then found in the index tree, and finally, all motion files with similar codes are located according to the pointer information of the inodes, so that a similar motion candidate file set is obtained.
In embodiment 1, step S2 specifies the joint point with the largest position change and the main movement direction of the joint point according to the following method:
respectively setting oscillation factors Sx, Sy and Sz to record the maximum displacement of each joint point of the human skeleton framework in the X, Y and Z directions under the world coordinate system; by finding the maximum among Sx, Sy and Sz, the joint point with the largest position change and the main movement direction of the joint point can be determined.
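As a minimal sketch of this step (assuming the per-frame world positions computed earlier; the function names are illustrative), the oscillation factors and the dominant joint can be obtained as follows:

```python
import numpy as np

def oscillation_factors(positions_per_frame):
    """positions_per_frame: list of {joint_name: (x, y, z)} dicts, one per frame.
    Returns {joint_name: (Sx, Sy, Sz)}, the max-minus-min displacement of each
    joint along the X, Y and Z world axes."""
    factors = {}
    for name in positions_per_frame[0]:
        track = np.array([frame[name] for frame in positions_per_frame], dtype=float)
        factors[name] = tuple(track.max(axis=0) - track.min(axis=0))
    return factors

def dominant_joint_and_axis(factors):
    """Joint with the largest single-axis oscillation and that axis (0=X, 1=Y, 2=Z),
    i.e. the main force-exerting joint point and its main movement direction."""
    name, s = max(factors.items(), key=lambda kv: max(kv[1]))
    return name, int(np.argmax(s))
```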
Step S3 is a core step of the search method of the present embodiment, and embodies the core idea of the present invention. An important space-time characteristic of human motion is the joint point with the largest position change and the main motion direction of the joint point, and most motion types can be distinguished through the semantic characteristic. Based on the semantic features of human motion, the invention constructs a human motion level model as shown in fig. 6, and realizes efficient content-based human motion retrieval based on the model. In the human motion level model, the first level takes the main force-exerting part as a motion distinguishing standard to distinguish human motion according to the joint points of the human model; the second level takes the main movement direction of the main force-exerting movement joint point as a movement distinguishing standard and is divided into movement generating displacement on an X axis, movement generating displacement on a Y axis and movement generating displacement on a Z axis; and in the third level, the movement frequency is used as a movement distinguishing standard, and a user defines the movement frequency as different specifications of fast, medium and slow movement by self-defining a threshold value so as to distinguish and define movement.
All movements can be coded according to the well-defined hierarchical model. Table 1 shows an example of hierarchical model coding in embodiment 1, where the first-level motion code is the number of a joint point, the second-level motion code takes values in {0,1,2}, representing motion data whose main motion direction is the X-axis, Y-axis or Z-axis respectively, and the third-level motion code takes values in {0,1,2}, representing motion data whose motion frequency is fast, medium or slow respectively.
TABLE 1 human action hierarchical model coding
(Table 1 is provided as images in the original publication and is not reproduced here.)
The definition of the hierarchical model can vary widely within the inventive idea. For example, joint points may be grouped and classified into upper-limb force-exerting motions, lower-limb force-exerting motions, trunk force-exerting motions, and comprehensive force-exerting motions, so that the main force-exerting part falls into 4 cases. The hierarchical model may be encoded with specific decimal numbers representing each class, as shown in Table 1 (for example, 12 denotes the joint point numbered 12), or binary coding may be used: the three coordinate axes of the main movement direction can be represented with two binary digits, and if the main force-exerting part is divided into 4 cases, it can likewise be represented with two binary digits.
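A small sketch of the decimal variant of this three-level coding is given below; the movement-frequency thresholds are user-defined according to the description, so the default values here are placeholders only.

```python
def hierarchical_code(joint_number, main_axis, frequency_hz,
                      fast_threshold=2.0, slow_threshold=0.5):
    """Three-level code: (joint number, main direction, frequency class).

    main_axis    : 0, 1 or 2 for the X, Y or Z axis (second-level code).
    frequency_hz : estimated movement frequency; the fast/slow thresholds are
                   user-defined per the description, the defaults are illustrative.
    Third-level code: 0 = fast, 1 = medium, 2 = slow.
    """
    if frequency_hz >= fast_threshold:
        freq_code = 0
    elif frequency_hz > slow_threshold:
        freq_code = 1
    else:
        freq_code = 2
    return (joint_number, main_axis, freq_code)

# Example: joint 12 moving mainly along the Z axis at medium frequency -> (12, 2, 1)
print(hierarchical_code(12, 2, 1.2))
```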
In step S4, the file motion data and its hierarchical model code are stored in the motion database.
In step S5, an index tree is built for all files according to the hierarchical model code corresponding to each file in the database; non-leaf nodes of the index tree comprise the codes of the corresponding hierarchies, and leaf nodes comprise an index structure inode that records the following index information: { hierarchical model encoding, filename, pointer to previous inode, pointer to next inode }.
In the conventional search of a motion database, content-based retrieval generally requires extracting geometric features and calculating Euclidean distances, and complicated geometric computation is needed both when constructing the index and at every search, which consumes a great deal of time. In the invention, coding based on the hierarchical model preserves the main geometric characteristics of the motion data and converts the retrieval of complex motion data into the retrieval of simple digital codes, which can greatly improve the retrieval efficiency of the database; the method is well suited to coarse-grained application scenarios such as classification of motion data, as well as to file-level retrieval.
Step S6 searches the database with the established index: when the motion database is queried, hierarchical model coding is first carried out on the queried motion, leaf nodes with similar codes are then found in the index tree, and finally all motion files with similar codes are located according to the pointer information of the inodes, so that a similar motion candidate file set is obtained.
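Steps S5 and S6 can be illustrated with the simplified sketch below: a dictionary-based tree keyed level by level on the code, with leaf inodes of the form {code, filename, previous, next} chained in a doubly linked list. The class names are illustrative, and only exact-code matching is shown; a complete implementation would also visit neighbouring leaves when collecting similar codes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class INode:
    code: tuple                      # (joint number, main direction, frequency class)
    filename: str
    prev: Optional["INode"] = None   # pointer to previous inode
    next: Optional["INode"] = None   # pointer to next inode

class MotionIndex:
    """Three-level index tree: level-1 key -> level-2 key -> level-3 key -> inode leaf."""

    def __init__(self):
        self.tree = {}

    def insert(self, code, filename):
        joint, axis, freq = code
        leaf = self.tree.setdefault(joint, {}).setdefault(axis, {}).setdefault(freq, [])
        inode = INode(code, filename)
        if leaf:                                   # link into the leaf's doubly linked list
            inode.prev, leaf[-1].next = leaf[-1], inode
        leaf.append(inode)

    def search(self, code):
        """Return the filenames of all motions whose code matches the query,
        by walking the linked inodes of the matching leaf."""
        joint, axis, freq = code
        leaf = self.tree.get(joint, {}).get(axis, {}).get(freq, [])
        results, node = [], (leaf[0] if leaf else None)
        while node is not None:
            results.append(node.filename)
            node = node.next
        return results

# Usage: index every file by its code, then query with the code of the example motion.
index = MotionIndex()
index.insert((12, 2, 1), "run_01.bvh")
index.insert((12, 2, 1), "run_02.bvh")
index.insert((12, 2, 0), "sprint_01.bvh")
print(index.search((12, 2, 1)))                    # ['run_01.bvh', 'run_02.bvh']
```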
Example 2 on the basis of example 1, further optimization was performed, including:
1) After the position of each joint point of each frame in the world coordinate system is calculated, key frames are identified and extracted, and the subsequent processing uses the key frame data. The consistency of human motion means that the position changes of each joint point in the human skeleton framework are strongly similar within a small unit of time, which directly leads to redundancy in human motion data. To improve the effective utilization of the data and reduce redundancy, key frames need to be identified and extracted from the human motion data. Key frame extraction mainly filters out unrepresentative interference frames and invalid frames arising under special conditions, while still maintaining overall consistency with the human motion in the original three-dimensional motion data file.
2) After the key frames are extracted, the motion period is determined from the position of each joint point in each frame, and the key frames within one motion period are selected for subsequent processing. Because the data in different motion periods are strongly similar, selecting the data of a single motion period for calculation simplifies processing.
3) The data after dimensionality reduction are filtered. When human motion data are collected with wearable motion capture equipment, unavoidable differences in individual movement habits and body posture exist among the individuals participating in sample collection, which to some extent makes the experimental data irregular: a small amount of invalid or strongly interfering information exists in the human motion data. The filtering method adopted in embodiment 2 is as follows: compare the difference between the current position of a joint point in each data frame and the maximum-change position or minimum-change position of that joint over the whole motion; if the difference is larger than the average dynamic change of the joint point, the position is regarded as interference information, i.e. the position information of the joint point in that data frame is deleted, thereby filtering out invalid and interfering data.
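A sketch of this filtering rule follows. The text does not pin down the distance measure, the "maximum/minimum change position" or the "dynamic change average value", so the per-axis extreme positions, the Euclidean distance and the mean frame-to-frame displacement used below are assumptions.

```python
import numpy as np

def filter_interference(track):
    """track: (num_frames, 3) array with one joint point's world positions.

    Keeps a frame when the joint's position deviates from the joint's extreme
    positions by no more than the joint's average dynamic change; otherwise the
    position is treated as interference and dropped.
    Returns the indices of the retained frames."""
    track = np.asarray(track, dtype=float)
    p_max = track.max(axis=0)          # assumed "maximum change position"
    p_min = track.min(axis=0)          # assumed "minimum change position"
    avg_change = np.mean(np.linalg.norm(np.diff(track, axis=0), axis=1))
    keep = []
    for i, p in enumerate(track):
        deviation = min(np.linalg.norm(p - p_max), np.linalg.norm(p - p_min))
        if deviation <= avg_change:    # larger deviation -> interference, drop it
            keep.append(i)
    return keep
```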
Embodiment 3 realizes a storage device for human body motion retrieval, in which a plurality of instructions are stored, the instructions being adapted to be loaded by a processor and to perform the method as shown in fig. 1:
extracting information of each joint point of each frame from the human motion file, and calculating the position of the joint point in a world coordinate system;
determining the joint point with the largest position change and the main movement direction of the joint point;
coding the motion file data according to a hierarchical model, wherein a first-level code corresponds to a joint point with the maximum position change, a second-level code corresponds to the main motion direction of the joint point, and a third-level code corresponds to the motion frequency;
storing the file motion data and the hierarchical model code thereof into a motion database;
establishing an index tree for all files according to the hierarchical model code corresponding to each file in the database, wherein non-leaf nodes of the index tree comprise codes of corresponding hierarchies, and leaf nodes comprise an index structure inode and record the following index information: { hierarchical model encoding, filename, pointer to previous inode, pointer to next inode };
when the motion database is searched, firstly, hierarchical model coding is carried out on the queried motion, leaf nodes with similar codes are then found in the index tree, and finally, all motion files with similar codes are located according to the pointer information of the inodes, so that a similar motion candidate file set is obtained.
The instructions of embodiment 3 determine the joint point with the largest position change and the main movement direction of the joint point according to the following method:
respectively setting oscillation factors Sx, Sy and Sz to record the maximum displacement of each joint point of the human skeleton framework in the X, Y and Z directions under the world coordinate system; by finding the maximum among Sx, Sy and Sz, the joint point with the largest position change and the main movement direction of the joint point can be determined.
The instructions of embodiment 3 calculate the position of each joint point in each frame in the world coordinate system, then perform the key frame discrimination and extraction, determine the hierarchical model code of the file using the key frame data, and store only the key frame motion data and the hierarchical model code thereof in the motion database.
The instruction of embodiment 3 determines a motion period according to the position of each joint point of each frame after extracting the key frame, and selects a key frame in a motion period to determine the joint point with the largest position change and the main motion direction of the joint point.
The instruction of embodiment 3 performs filtering processing on the data after dimensionality reduction, compares the difference between the current position of each data frame of each joint point and the maximum change position or the minimum change position of the joint in the whole motion, and if the difference is greater than the average value of the dynamic changes of the joint point, the difference is regarded as interference information, that is, the position information of the joint point in the data frame is deleted.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art will be able to make various modifications and variations without departing from the spirit and scope of the present invention, and such modifications and variations fall within the scope of the invention defined by the appended claims.

Claims (10)

1. A human body action retrieval method based on a hierarchical model is characterized by comprising the following steps:
extracting information of each joint point of each frame from the human motion file, and calculating the position of the joint point in a world coordinate system;
determining the joint point with the largest position change and the main movement direction of the joint point;
coding the motion file data according to a hierarchical model, wherein a first-level code corresponds to a joint point with the maximum position change, a second-level code corresponds to the main motion direction of the joint point, and a third-level code corresponds to the motion frequency;
storing the file motion data and the hierarchical model code thereof into a motion database;
establishing an index tree for all files according to the hierarchical model code corresponding to each file in the database, wherein non-leaf nodes of the index tree comprise codes of corresponding hierarchies, and leaf nodes comprise an index structure inode and record the following index information: { hierarchical model encoding, filename, pointer to previous inode, pointer to next inode };
when the motion database is searched, firstly, hierarchical model coding is carried out on the queried motion, leaf nodes with similar codes are then found in the index tree, and finally, all motion files with similar codes are located according to the pointer information of the inodes, so that a similar motion candidate file set is obtained.
2. The human motion search method according to claim 1, wherein the joint point whose position changes most and the main movement direction of the joint point are determined according to the following method:
respectively setting oscillation factors Sx, Sy and Sz to record the maximum displacement of each joint point of the human skeleton framework in the X, Y and Z directions under the world coordinate system; by finding the maximum among Sx, Sy and Sz, the joint point with the largest position change and the main movement direction of the joint point can be determined.
3. The human body motion search method according to claim 2, wherein the positions of the joint points in the world coordinate system of each frame are calculated, and then the determination and extraction of the key frames are performed, and the hierarchical model code of the file is determined using the key frame data, and only the key frame motion data and the hierarchical model code thereof are stored in the motion database.
4. The human body motion retrieval method of claim 3, wherein after the key frame extraction, the motion period is determined according to the position of each joint point of each frame, and the key frame in one motion period is selected to determine the joint point with the largest position change and the main motion direction of the joint point.
5. The human body motion search method according to claim 4, wherein the data after dimension reduction is filtered, the difference between the current position of each data frame of each joint point and the maximum variation position or the minimum variation position of the joint in the whole motion is compared, and if the difference is larger than the average value of the dynamic variation of the joint point, the information is regarded as interference information, and the position information of the joint point in the data frame is deleted.
6. A memory device having stored therein a plurality of instructions adapted to be loaded and executed by a processor to:
extracting information of each joint point of each frame from the human motion file, and calculating the position of the joint point in a world coordinate system;
determining the joint point with the largest position change and the main movement direction of the joint point;
coding the motion file data according to a hierarchical model, wherein a first-level code corresponds to a joint point with the maximum position change, a second-level code corresponds to the main motion direction of the joint point, and a third-level code corresponds to the motion frequency;
storing the file motion data and the hierarchical model code thereof into a motion database;
establishing an index tree for all files according to the hierarchical model code corresponding to each file in the database, wherein non-leaf nodes of the index tree comprise codes of corresponding hierarchies, and leaf nodes comprise an index structure inode and record the following index information: { hierarchical model encoding, filename, pointer to previous inode, pointer to next inode };
when the motion database is searched, firstly, hierarchical model coding is carried out on the queried motion, leaf nodes with similar codes are then found in the index tree, and finally, all motion files with similar codes are located according to the pointer information of the inodes, so that a similar motion candidate file set is obtained.
7. The memory device of claim 6, wherein the instructions determine the articulation point having the greatest change in position and the primary direction of motion for the articulation point according to the following method:
respectively setting oscillation factors Sx, Sy and Sz to record the maximum displacement of each joint point of the human skeleton framework in the X, Y and Z directions under the world coordinate system; by finding the maximum among Sx, Sy and Sz, the joint point with the largest position change and the main movement direction of the joint point can be determined.
8. The storage device of claim 7, wherein the instructions calculate the position of each joint point in the world coordinate system for each frame, perform key frame identification and extraction, determine the hierarchical model code of the file using the key frame data, and store only the key frame motion data and the hierarchical model code thereof in the motion database.
9. The storage device of claim 8, wherein the instructions determine a motion period according to the position of each joint point in each frame after performing key frame extraction, and select a key frame in a motion period to determine the joint point with the largest position change and the main motion direction of the joint point.
10. The storage device of claim 9, wherein the instructions filter the reduced dimension data, compare the difference between the current position of each data frame of each joint point and the maximum variation position or the minimum variation position of the joint in the whole motion, and if the difference is larger than the average value of the dynamic variations of the joint point, the difference is regarded as interference information, i.e. the position information of the joint point under the data frame is deleted.
CN201910799466.5A 2019-08-28 2019-08-28 Human body action retrieval method and device based on hierarchical model Active CN110516112B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910799466.5A CN110516112B (en) 2019-08-28 2019-08-28 Human body action retrieval method and device based on hierarchical model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910799466.5A CN110516112B (en) 2019-08-28 2019-08-28 Human body action retrieval method and device based on hierarchical model

Publications (2)

Publication Number Publication Date
CN110516112A CN110516112A (en) 2019-11-29
CN110516112B (en) 2021-04-27

Family

ID=68628320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910799466.5A Active CN110516112B (en) 2019-08-28 2019-08-28 Human body action retrieval method and device based on hierarchical model

Country Status (1)

Country Link
CN (1) CN110516112B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095196B (en) * 2021-04-02 2022-09-30 山东师范大学 Human body abnormal behavior detection method and system based on graph structure attitude clustering
CN113987285B (en) * 2021-12-27 2022-04-26 北京理工大学 Hidden state-based motion characteristic database generation method and search method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763560A (en) * 2018-06-04 2018-11-06 大连大学 3 d human motion search method based on graph model
CN110008857A (en) * 2019-03-21 2019-07-12 浙江工业大学 A kind of human action matching methods of marking based on artis

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018069981A1 (en) * 2016-10-11 2018-04-19 Fujitsu Limited Motion recognition device, motion recognition program, and motion recognition method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763560A (en) * 2018-06-04 2018-11-06 大连大学 3 d human motion search method based on graph model
CN110008857A (en) * 2019-03-21 2019-07-12 浙江工业大学 A kind of human action matching methods of marking based on artis

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A motion search based on geometric feature coding; 黄天羽 et al.; Journal of System Simulation; October 2006; Vol. 18, No. 10; pp. 2767-2769, 2773 *
A survey of reuse techniques for human motion capture data; 孙怀江; Journal of Data Acquisition and Processing; January 2017; Vol. 32, No. 1; pp. 1-16 *
Hierarchical retrieval of motion data based on posture feature coding; 聂永丹 et al.; Journal of Jilin University (Information Science Edition); July 2017; Vol. 35, No. 4; pp. 410-417 *

Also Published As

Publication number Publication date
CN110516112A (en) 2019-11-29

Similar Documents

Publication Publication Date Title
Zhong et al. An end-to-end dense-inceptionnet for image copy-move forgery detection
CN110263659B (en) Finger vein recognition method and system based on triplet loss and lightweight network
Lu et al. A bag-of-importance model with locality-constrained coding based feature learning for video summarization
CN111199550A (en) Training method, segmentation method, device and storage medium of image segmentation network
CN110516112B (en) Human body action retrieval method and device based on hierarchical model
CN108304573A (en) Target retrieval method based on convolutional neural networks and supervision core Hash
CN105678244B (en) A kind of near video search method based on improved edit-distance
CN112686153B (en) Three-dimensional skeleton key frame selection method for human behavior recognition
CN107358185B (en) Palm print and palm vein image-recognizing method and device based on topological analysis
CN113987285B (en) Hidden state-based motion characteristic database generation method and search method
CN109308324A (en) A kind of image search method and system based on hand drawing style recommendation
CN113516164A (en) Fruit tree disease and insect pest diagnosis method integrating knowledge map and deep learning
CN114973418A (en) Behavior identification method of cross-modal three-dimensional point cloud sequence space-time characteristic network
CN108121806A (en) One kind is based on the matched image search method of local feature and system
CN110222011B (en) Human motion data file compression method
CN110517336B (en) Human motion data compression method and device based on main force joint point
CN110502564A (en) Motion characteristic data library generating method, search method and terminal based on posture base
CN116189309B (en) Method for processing human motion data and electronic equipment
Wang et al. Hierarchical spatio-temporal representation learning for gait recognition
CN111862160A (en) Target tracking method, medium and system based on ARM platform
CN106980878A (en) The determination method and device of three-dimensional model geometric style
CN112463894B (en) Multi-label feature selection method based on conditional mutual information and interactive information
Yang et al. Keyframe extraction from motion capture data for visualization
Yang et al. Unsupervised co-segmentation of 3d shapes based on components
CN112925936B (en) Motion capture data retrieval method and system based on deep hash

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant