US11303925B2 - Image coding method, action recognition method, and action recognition apparatus
- Publication number
- US11303925B2 (application US16/903,938, US202016903938A)
- Authority
- US
- United States
- Prior art keywords
- angular velocity
- linear velocity
- matrix
- velocity
- recognized
- Prior art date
- Legal status: Active
Classifications
- H04N19/54—Motion estimation other than block-based using feature points or meshes
- G06K9/00335
- G06N3/008—Artificial life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
Definitions
- This application relates to the field of artificial intelligence technologies, and in particular, to an image coding method, an action recognition method, and a computer device.
- action recognition technologies based on artificial intelligence technologies have a wide range of application scenarios in social life, including a home child care-giving robot, dangerous behavior monitoring in a public place, human-computer interaction game development, and the like.
- a user's action can be recognized so that a timely warning is issued about the user's dangerous action, to avoid a dangerous event.
- a plurality of groups of human skeleton data need to be collected by using a collection device, an action feature vector sequence is formed by using joint point features extracted from each group of human skeleton data, and action feature vector sequences corresponding to the plurality of groups of human skeleton data are stored, so as to recognize an action based on the stored action feature vector sequences.
- Embodiments of this application provide an image coding method, an action recognition method, and a computer device, so as to resolve a problem that storage resources and calculation resources are greatly consumed because of a relatively large data amount of human skeleton data in the related art.
- an image coding method includes: obtaining a plurality of groups of human skeleton data of performing a target action, where each group of human skeleton data includes joint point data of performing the target action; extracting, based on joint point data in the plurality of groups of human skeleton data, a motion feature matrix corresponding to the plurality of groups of human skeleton data; and encoding the motion feature matrix to obtain a motion feature image.
- the plurality of groups of human skeleton data are encoded as one motion feature image, thereby reducing consumption of storage resources and calculation resources.
- the motion feature matrix includes a linear velocity matrix
- the joint point data includes coordinates of a corresponding joint point in a three-dimensional coordinate system
- coordinates of a joint point in a first group of human skeleton data in the three-dimensional coordinate system may be subtracted from coordinates of the corresponding joint point in a second group of human skeleton data in the three-dimensional coordinate system to obtain linear velocity units corresponding to the first group of human skeleton data; and further, a linear velocity matrix corresponding to the plurality of groups of human skeleton data is formed by using all the obtained linear velocity units.
- the first group of human skeleton data and the second group of human skeleton data are any two adjacent groups of human skeleton data in the plurality of groups of human skeleton data, and the first group of human skeleton data is a previous group of human skeleton data of the second group of human skeleton data.
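- By way of illustration only, the following Python sketch (the function name, the list-of-arrays representation of the skeleton groups, and the use of NumPy are assumptions, not part of the claims) forms the linear velocity units by subtracting each group's joint coordinates from those of the next group and stacks them into a linear velocity matrix:

```python
import numpy as np

def linear_velocity_matrix(skeleton_groups):
    """skeleton_groups: a list of T arrays, each of shape (M, 3), holding the
    x, y, z coordinates of the M joint points in one group of skeleton data.
    Returns an array of shape (T-1, M, 3): one linear velocity unit per pair of
    adjacent groups, obtained by subtracting the previous group's coordinates
    from the next group's coordinates."""
    units = [curr - prev for prev, curr in zip(skeleton_groups[:-1], skeleton_groups[1:])]
    return np.stack(units, axis=0)
```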
- the linear velocity matrix may be preprocessed, a plurality of linear velocity units in the preprocessed linear velocity matrix are encoded to obtain a plurality of linear velocity pixel frames, and further, a linear velocity image is formed by using the plurality of linear velocity pixel frames.
- the preprocessing includes size normalization or the like.
- the linear velocity matrix is preprocessed, a plurality of linear velocity units in the preprocessed linear velocity matrix are encoded to obtain a plurality of linear velocity pixel frames, a plurality of key linear velocity pixel frames are extracted from the plurality of linear velocity pixel frames, and further, a linear velocity image is formed by using the plurality of key linear velocity pixel frames.
- the key linear velocity pixel frame is a pixel frame that includes various action information and that can distinguish between different actions.
- a maximum linear velocity element value and a minimum linear velocity element value in the linear velocity matrix may be obtained, and then normalization processing is performed on each linear velocity element value in the linear velocity matrix based on the maximum linear velocity element value and the minimum linear velocity element value, to obtain a normalized linear velocity matrix.
- Each linear velocity element value in the normalized linear velocity matrix is between a first value and a second value.
- the first value is less than the second value.
- the first value may be 0, and the second value may be 255.
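- A minimal sketch of this normalization step, assuming the velocity matrix is held as a NumPy array; the function name is hypothetical and the default bounds follow the example values of 0 and 255 given above:

```python
import numpy as np

def normalize_velocity_matrix(matrix, first_value=0.0, second_value=255.0):
    """Min-max normalization of every element of a (linear or angular) velocity
    matrix into [first_value, second_value]."""
    v_min, v_max = matrix.min(), matrix.max()
    if v_max == v_min:                       # constant matrix: avoid division by zero
        return np.full(matrix.shape, first_value, dtype=np.float64)
    scale = (second_value - first_value) / (v_max - v_min)
    return first_value + (matrix - v_min) * scale
```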
- coordinates of a joint point in each preprocessed linear velocity unit in the three-dimensional coordinate system are used as image channels, and a plurality of preprocessed linear velocity units are encoded to obtain a plurality of linear velocity pixel frames.
- the image channels are primary colors used to form pixels in an image, and include a red channel, a green channel, a blue channel, and the like.
- coordinates in the three-dimensional coordinate system are used as image channels to encode an image, thereby providing a method for encoding, as an image, a motion feature matrix represented by numbers.
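- The channel encoding described above could look like the following sketch, which treats the x, y, z components of each joint's normalized linear velocity unit as the red, green, and blue channels of one pixel; the array shapes and helper name are assumptions:

```python
import numpy as np

def encode_linear_velocity_pixel_frames(normalized_units):
    """normalized_units: array of shape (T-1, M, 3) whose values already lie in
    [0, 255].  The three coordinate components (x, y, z) of each joint's linear
    velocity unit are used as the red, green and blue channels of one pixel, so
    each unit becomes a 1 x M row of RGB pixels (a linear velocity pixel frame)."""
    frames = normalized_units.astype(np.uint8)
    return [frame.reshape(1, -1, 3) for frame in frames]   # list of (1, M, 3) pixel frames
```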
- linear velocity energy change values of the plurality of linear velocity pixel frames are calculated based on the preprocessed linear velocity matrix, and then the plurality of key linear velocity pixel frames are extracted from the plurality of linear velocity pixel frames in descending order of the linear velocity energy change values.
- a quadratic sum of coordinates of each joint point in a first linear velocity pixel frame in the three-dimensional coordinate system is calculated, and the quadratic sums of the coordinates of all the joint points in the three-dimensional coordinate system are added up to obtain a linear velocity energy value of the first linear velocity pixel frame;
- a quadratic sum of coordinates of each joint point in a second linear velocity pixel frame in the three-dimensional coordinate system is calculated, and the quadratic sums of the coordinates of all the joint points in the three-dimensional coordinate system are added up to obtain a linear velocity energy value of the second linear velocity pixel frame; and further, the linear velocity energy value of the first linear velocity pixel frame is subtracted from the linear velocity energy value of the second linear velocity pixel frame to obtain a linear velocity energy change value of the first linear velocity pixel frame.
- the first linear velocity pixel frame and the second linear velocity pixel frame are any two adjacent linear velocity pixel frames, and the first linear velocity pixel frame is the previous linear velocity pixel frame of the second linear velocity pixel frame.
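- A sketch of the energy-based key frame selection described above: the quadratic sums follow the definition of the linear velocity energy value, while attributing each energy change to the earlier frame and re-sorting the selected frames into temporal order are assumptions of the sketch:

```python
import numpy as np

def linear_velocity_energy(units):
    """units: array (F, M, 3), one (preprocessed) linear velocity unit per pixel
    frame.  The energy of a frame is the sum, over all joints, of the quadratic
    sums of the joint's coordinates."""
    return (units ** 2).sum(axis=(1, 2))

def extract_key_frame_indices(units, num_key_frames):
    """Energy change of frame i = energy of frame i+1 minus energy of frame i
    (attributed to the earlier frame).  Frames are ranked by energy change in
    descending order and the top num_key_frames indices are returned."""
    energy = linear_velocity_energy(units)
    change = energy[1:] - energy[:-1]
    ranked = np.argsort(change)[::-1]            # descending energy change
    return np.sort(ranked[:num_key_frames])      # keep the selected frames in temporal order
```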
- direction angles of joint points of the plurality of groups of human skeleton data in the three-dimensional coordinate system may be calculated based on coordinates of the joint points of the plurality of groups of human skeleton data in the three-dimensional coordinate system; direction angles of a joint point in a first group of human skeleton data in the three-dimensional coordinate system are subtracted from direction angles of the corresponding joint point in a second group of human skeleton data in the three-dimensional coordinate system to obtain angular velocity units; and further, an angular velocity matrix corresponding to the plurality of groups of human skeleton data is formed by using all the obtained angular velocity units.
- the first group of human skeleton data and the second group of human skeleton data are any two adjacent groups of human skeleton data, and the first group of human skeleton data is a previous group of human skeleton data of the second group of human skeleton data.
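- The direction-angle computation is not spelled out above; the following sketch assumes the common definition of direction angles (arccos of the normalized coordinate components with respect to the x, y, and z axes) and forms the angular velocity units by differencing the direction angles of adjacent groups:

```python
import numpy as np

def direction_angles(coords, eps=1e-8):
    """coords: array (M, 3) of joint coordinates.  Returns the three direction
    angles of each joint's position vector with respect to the x, y and z axes.
    This particular definition is an assumption; the description only states
    that direction angles are computed from the coordinates."""
    r = np.linalg.norm(coords, axis=1, keepdims=True)
    return np.arccos(np.clip(coords / np.maximum(r, eps), -1.0, 1.0))

def angular_velocity_matrix(skeleton_groups):
    """Angular velocity unit = direction angles of a group minus direction
    angles of the previous group, stacked over all adjacent pairs of groups."""
    angles = [direction_angles(group) for group in skeleton_groups]
    units = [curr - prev for prev, curr in zip(angles[:-1], angles[1:])]
    return np.stack(units, axis=0)
```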
- the angular velocity matrix may be preprocessed, a plurality of angular velocity units in the preprocessed angular velocity matrix are encoded to obtain a plurality of angular velocity pixel frames, and further, an angular velocity image is formed by using the plurality of angular velocity pixel frames.
- the angular velocity matrix may be preprocessed, a plurality of angular velocity units in the preprocessed angular velocity matrix are encoded to obtain a plurality of angular velocity pixel frames, then a plurality of key angular velocity pixel frames are extracted from the plurality of angular velocity pixel frames, and further, an angular velocity image is formed by using the plurality of key angular velocity pixel frames.
- a maximum angular velocity element value and a minimum angular velocity element value in the angular velocity matrix may be obtained, and normalization processing is performed on each angular velocity element value in the angular velocity matrix based on the maximum angular velocity element value and the minimum angular velocity element value, to obtain a normalized angular velocity matrix.
- Each angular velocity element value in the normalized angular velocity matrix is between a first value and a second value, and the first value is less than the second value.
- the first value is 0, and the second value is 255.
- direction angles of a joint point in each preprocessed angular velocity unit in the three-dimensional coordinate system may be used as image channels, and a plurality of preprocessed angular velocity units are encoded to obtain a plurality of angular velocity pixel frames.
- angular velocity energy change values of the plurality of angular velocity pixel frames may be calculated based on the preprocessed angular velocity matrix, and then the plurality of key angular velocity pixel frames are extracted from the plurality of angular velocity pixel frames in descending order of the angular velocity energy change values.
- a quadratic sum of direction angles of each joint point in a first angular velocity pixel frame in the three-dimensional coordinate system is calculated, and the quadratic sums of the direction angles of all the joint points in the three-dimensional coordinate system are added up to obtain an angular velocity energy value of the first angular velocity pixel frame;
- a quadratic sum of direction angles of each joint point in a second angular velocity pixel frame in the three-dimensional coordinate system is calculated, and the quadratic sums of the direction angles of all the joint points in the three-dimensional coordinate system are added up to obtain an angular velocity energy value of the second angular velocity pixel frame, where the first angular velocity pixel frame and the second angular velocity pixel frame are any two adjacent angular velocity pixel frames, and the first angular velocity pixel frame is a previous angular velocity pixel frame of the second angular velocity pixel frame; and further, the angular velocity energy value of the first angular velocity pixel frame is subtracted from the angular velocity energy value of the second angular velocity pixel frame to obtain an angular velocity energy change value of the first angular velocity pixel frame.
- At least one motion feature image of the target action and an identifier of the target action are input into a CNN (Convolutional Neural Network) model, and training is performed to obtain an action recognition model.
- an action recognition model training method includes: obtaining a plurality of reference motion feature images respectively corresponding to a plurality of types of actions; and inputting the plurality of reference motion feature images and identifiers of the plurality of actions into a CNN model, and performing training to obtain an action recognition model.
- Each reference motion feature image may be obtained by using the method described in the first aspect.
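- A minimal PyTorch sketch of training a CNN on reference motion feature images and their action identifiers; the architecture, optimizer, and hyperparameters below are illustrative assumptions rather than the model claimed in this application:

```python
import torch
import torch.nn as nn

class ActionCNN(nn.Module):
    """Minimal CNN classifier; layer sizes are illustrative assumptions."""
    def __init__(self, num_actions):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_actions)

    def forward(self, x):            # x: (batch, 3, H, W) motion feature images
        return self.classifier(self.features(x).flatten(1))

def train_action_recognition_model(model, loader, epochs=10, lr=1e-3):
    """loader yields (motion_feature_image, action_identifier) batches."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss_fn(model(images), labels).backward()
            optimizer.step()
    return model
```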
- an action recognition method includes: obtaining a to-be-recognized motion feature image, and recognizing the to-be-recognized motion feature image based on an action recognition model, to obtain a recognition result.
- the to-be-recognized motion feature image is an image obtained by encoding a plurality of groups of to-be-recognized human skeleton data of a to-be-recognized action.
- the action recognition model is obtained through training based on a plurality of reference motion feature images respectively corresponding to a plurality of types of actions and identifiers of the plurality of types of actions, and the recognition result is used to indicate an action type of the to-be-recognized action.
- the to-be-recognized motion feature image is obtained, and then the to-be-recognized motion feature image is recognized based on the established action recognition model, so as to obtain the recognition result of the to-be-recognized action. Because a data amount of the motion feature image is smaller than a data amount of a plurality of action feature vector sequences, storage resources and calculation resources are greatly saved while recognition accuracy is ensured.
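- A short sketch of the recognition step, assuming the action recognition model is a PyTorch module such as the one sketched above and that the to-be-recognized motion feature image has already been produced by the encoding steps; the function name is hypothetical:

```python
import torch

def recognize_action(model, motion_feature_image):
    """motion_feature_image: NumPy array of shape (H, W, 3) obtained by encoding
    the to-be-recognized human skeleton data, with values in [0, 255].
    Returns the index of the recognized action type."""
    x = torch.from_numpy(motion_feature_image).float().permute(2, 0, 1).unsqueeze(0)
    model.eval()
    with torch.no_grad():
        return model(x).argmax(dim=1).item()
```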
- the plurality of groups of to-be-recognized human skeleton data of performing the to-be-recognized action are collected; a to-be-recognized motion feature matrix corresponding to the plurality of groups of to-be-recognized human skeleton data is extracted based on joint point data in the plurality of groups of to-be-recognized human skeleton data; and further, the to-be-recognized motion feature matrix is encoded to obtain the to-be-recognized motion feature image.
- Each group of to-be-recognized human skeleton data includes joint point data of performing the to-be-recognized action.
- the to-be-recognized motion feature matrix includes a to-be-recognized linear velocity matrix
- the joint point data includes coordinates of a corresponding joint point in a three-dimensional coordinate system
- coordinates of a joint point in a first group of to-be-recognized human skeleton data in the three-dimensional coordinate system are subtracted from coordinates of the corresponding joint point in a second group of to-be-recognized human skeleton data in the three-dimensional coordinate system to obtain to-be-recognized linear velocity units of the first group of to-be-recognized human skeleton data
- a to-be-recognized linear velocity matrix corresponding to the plurality of groups of to-be-recognized human skeleton data is formed by using all the obtained to-be-recognized linear velocity units.
- the first group of to-be-recognized human skeleton data and the second group of to-be-recognized human skeleton data are any two adjacent groups of to-be-recognized human skeleton data in the plurality of groups of to-be-recognized human skeleton data, and the first group of to-be-recognized human skeleton data is a previous group of to-be-recognized human skeleton data of the second group of to-be-recognized human skeleton data.
- the to-be-recognized linear velocity matrix may be preprocessed, a plurality of linear velocity units in the preprocessed to-be-recognized linear velocity matrix are encoded to obtain a plurality of to-be-recognized linear velocity pixel frames, and further, a to-be-recognized linear velocity image is formed by using the plurality of to-be-recognized linear velocity pixel frames.
- the to-be-recognized linear velocity matrix may be preprocessed, and a plurality of to-be-recognized linear velocity units in the preprocessed to-be-recognized linear velocity matrix are encoded to obtain a plurality of to-be-recognized linear velocity pixel frames; then a plurality of to-be-recognized key linear velocity pixel frames are extracted from the plurality of to-be-recognized linear velocity pixel frames; and further, a to-be-recognized linear velocity image is formed by using the plurality of to-be-recognized key linear velocity pixel frames.
- a maximum to-be-recognized linear velocity element value and a minimum to-be-recognized linear velocity element value in the to-be-recognized linear velocity matrix are obtained, and then normalization processing is performed on each to-be-recognized linear velocity element value in the to-be-recognized linear velocity matrix based on the maximum to-be-recognized linear velocity element value and the minimum to-be-recognized linear velocity element value, to obtain a normalized to-be-recognized linear velocity matrix.
- Each to-be-recognized linear velocity element value in the normalized to-be-recognized linear velocity matrix is between a first value and a second value, and the first value is less than the second value.
- the first value may be 0, and the second value may be 255.
- coordinates of a joint point in each preprocessed to-be-recognized linear velocity unit in the three-dimensional coordinate system are used as image channels, and a plurality of preprocessed to-be-recognized linear velocity units are encoded to obtain a plurality of to-be-recognized linear velocity pixel frames.
- linear velocity energy change values of the plurality of to-be-recognized linear velocity pixel frames may be calculated based on the preprocessed to-be-recognized linear velocity matrix, and then the plurality of to-be-recognized key linear velocity pixel frames are extracted from the plurality of to-be-recognized linear velocity pixel frames in descending order of the linear velocity energy change values.
- a quadratic sum of coordinates of each joint point in a first to-be-recognized linear velocity pixel frame in the three-dimensional coordinate system is calculated, and the quadratic sums of the coordinates of all the joint points in the three-dimensional coordinate system are added up to obtain a linear velocity energy value of the first to-be-recognized linear velocity pixel frame;
- a quadratic sum of coordinates of each joint point in a second to-be-recognized linear velocity pixel frame in the three-dimensional coordinate system is further calculated, and the quadratic sums of the coordinates of all the joint points in the three-dimensional coordinate system are added up to obtain a linear velocity energy value of the second to-be-recognized linear velocity pixel frame; and the linear velocity energy value of the first to-be-recognized linear velocity pixel frame is subtracted from the linear velocity energy value of the second to-be-recognized linear velocity pixel frame to obtain a linear velocity energy change value of the first to-be-recognized linear velocity pixel frame.
- the first to-be-recognized linear velocity pixel frame and the second to-be-recognized linear velocity pixel frame are any two adjacent to-be-recognized linear velocity pixel frames, and the first to-be-recognized linear velocity pixel frame is a previous to-be-recognized linear velocity pixel frame of the second to-be-recognized linear velocity pixel frame.
- direction angles of a joint point in the plurality of groups of to-be-recognized human skeleton data in the three-dimensional coordinate system may be calculated based on a coordinate matrix corresponding to the plurality of groups of to-be-recognized human skeleton data; direction angles of a joint point in a first group of to-be-recognized human skeleton data in the three-dimensional coordinate system are subtracted from direction angles of the corresponding joint point in a second group of to-be-recognized human skeleton data in the three-dimensional coordinate system to obtain to-be-recognized angular velocity units; and further, a to-be-recognized angular velocity matrix corresponding to the plurality of groups of to-be-recognized human skeleton data is formed by using all the obtained to-be-recognized angular velocity units.
- the first group of to-be-recognized human skeleton data and the second group of to-be-recognized human skeleton data are any two adjacent groups of to-be-recognized human skeleton data, and the first group of to-be-recognized human skeleton data is a previous group of to-be-recognized human skeleton data of the second group of to-be-recognized human skeleton data.
- the to-be-recognized angular velocity matrix is preprocessed, a plurality of to-be-recognized angular velocity units in the preprocessed to-be-recognized angular velocity matrix are encoded to obtain a plurality of to-be-recognized angular velocity pixel frames, and further, a to-be-recognized angular velocity image is formed by using the plurality of to-be-recognized angular velocity pixel frames.
- the to-be-recognized angular velocity matrix may be preprocessed, and a plurality of to-be-recognized angular velocity units in the preprocessed to-be-recognized angular velocity matrix are encoded to obtain a plurality of to-be-recognized angular velocity pixel frames; then a plurality of to-be-recognized key angular velocity pixel frames are extracted from the plurality of to-be-recognized angular velocity pixel frames; and further, a to-be-recognized angular velocity image is formed by using the plurality of to-be-recognized key angular velocity pixel frames.
- a maximum to-be-recognized angular velocity element value and a minimum to-be-recognized angular velocity element value in the to-be-recognized angular velocity matrix are obtained, and normalization processing is performed on each to-be-recognized angular velocity element value in the to-be-recognized angular velocity matrix based on the maximum to-be-recognized angular velocity element value and the minimum to-be-recognized angular velocity element value, to obtain a normalized to-be-recognized angular velocity matrix.
- Each to-be-recognized angular velocity element value in the normalized to-be-recognized angular velocity matrix is between a first value and a second value, and the first value is less than the second value.
- the first value is 0, and the second value is 255.
- direction angles of a joint point in each preprocessed to-be-recognized angular velocity unit in the three-dimensional coordinate system are used as image channels, and a plurality of preprocessed to-be-recognized angular velocity units are encoded to obtain a plurality of to-be-recognized angular velocity pixel frames.
- angular velocity energy change values of the plurality of to-be-recognized angular velocity pixel frames may be calculated based on the preprocessed to-be-recognized angular velocity matrix, and then the plurality of to-be-recognized key angular velocity pixel frames are extracted from the plurality of to-be-recognized angular velocity pixel frames in descending order of the angular velocity energy change values.
- a quadratic sum of direction angles of each joint point in a first to-be-recognized angular velocity pixel frame in the three-dimensional coordinate system may be calculated, and the quadratic sums of the direction angles of all the joint points in the three-dimensional coordinate system are added up to obtain an angular velocity energy value of the first to-be-recognized angular velocity pixel frame;
- a quadratic sum of direction angles of each joint point in a second to-be-recognized angular velocity pixel frame in the three-dimensional coordinate system is calculated, and the quadratic sums of the direction angles of all the joint points in the three-dimensional coordinate system are added up to obtain an angular velocity energy value of the second to-be-recognized angular velocity pixel frame; and further, the angular velocity energy value of the first to-be-recognized angular velocity pixel frame is subtracted from the angular velocity energy value of the second to-be-recognized angular velocity pixel frame to obtain an angular velocity energy change value of the first to-be-recognized angular velocity pixel frame.
- the first to-be-recognized angular velocity pixel frame and the second to-be-recognized angular velocity pixel frame are any two adjacent to-be-recognized angular velocity pixel frames, and the first to-be-recognized angular velocity pixel frame is a previous to-be-recognized angular velocity pixel frame of the second to-be-recognized angular velocity pixel frame.
- a zero padding operation is further performed on the to-be-recognized motion feature image, and the to-be-recognized motion feature image obtained through the zero padding operation is recognized based on the action recognition model, to obtain the recognition result.
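- A sketch of the zero padding operation, assuming the padding is applied to the bottom and right of the to-be-recognized motion feature image so that it reaches a fixed input size expected by the model; the target size and function name are assumptions:

```python
import numpy as np

def zero_pad_motion_feature_image(image, target_height, target_width):
    """Pads the to-be-recognized motion feature image (H, W, 3) with zeros on
    the bottom and right so that it matches the fixed target size."""
    h, w, c = image.shape
    assert h <= target_height and w <= target_width
    padded = np.zeros((target_height, target_width, c), dtype=image.dtype)
    padded[:h, :w, :] = image
    return padded
```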
- the action recognition model used in the third aspect may be obtained through training by using the method in the second aspect.
- an image coding apparatus includes:
- a data obtaining unit configured to obtain a plurality of groups of human skeleton data of performing a target action, where each group of human skeleton data includes joint point data of performing the target action;
- a feature extraction unit configured to extract, based on joint point data in the plurality of groups of human skeleton data, a motion feature matrix corresponding to the plurality of groups of human skeleton data;
- a feature coding unit configured to encode the motion feature matrix to obtain a motion feature image.
- the motion feature matrix includes a linear velocity matrix
- the joint point data includes coordinates of a corresponding joint point in a three-dimensional coordinate system
- the feature extraction unit is configured to: subtract coordinates of a joint point in a first group of human skeleton data in the three-dimensional coordinate system from coordinates of the corresponding joint point in a second group of human skeleton data in the three-dimensional coordinate system to obtain linear velocity units corresponding to the first group of human skeleton data, where the first group of human skeleton data and the second group of human skeleton data are any two adjacent groups of human skeleton data in the plurality of groups of human skeleton data, and the first group of human skeleton data is a previous group of human skeleton data of the second group of human skeleton data; and form, by using all the obtained linear velocity units, a linear velocity matrix corresponding to the plurality of groups of human skeleton data.
- the feature coding unit is configured to: preprocess the linear velocity matrix; encode a plurality of linear velocity units in the preprocessed linear velocity matrix to obtain a plurality of linear velocity pixel frames; and form a linear velocity image by using the plurality of linear velocity pixel frames.
- the feature coding unit is configured to: preprocess the linear velocity matrix; encode a plurality of linear velocity units in the preprocessed linear velocity matrix to obtain a plurality of linear velocity pixel frames; extract a plurality of key linear velocity pixel frames from the plurality of linear velocity pixel frames; and form a linear velocity image by using the plurality of key linear velocity pixel frames.
- the feature coding unit is configured to: obtain a maximum linear velocity element value and a minimum linear velocity element value in the linear velocity matrix; and perform normalization processing on each linear velocity element value in the linear velocity matrix based on the maximum linear velocity element value and the minimum linear velocity element value, to obtain a normalized linear velocity matrix.
- Each linear velocity element value in the normalized linear velocity matrix is between a first value and a second value.
- the first value is less than the second value.
- the feature coding unit is configured to: use coordinates of a joint point in each preprocessed linear velocity unit in the three-dimensional coordinate system as image channels, and encode a plurality of preprocessed linear velocity units to obtain a plurality of linear velocity pixel frames.
- the feature coding unit is configured to: calculate linear velocity energy change values of the plurality of linear velocity pixel frames based on the preprocessed linear velocity matrix; and extract the plurality of key linear velocity pixel frames from the plurality of linear velocity pixel frames in descending order of the linear velocity energy change values.
- the feature coding unit is configured to: calculate a quadratic sum of coordinates of each joint point in a first linear velocity pixel frame in the three-dimensional coordinate system, and add up the quadratic sums of the coordinates of all the joint points in the three-dimensional coordinate system to obtain a linear velocity energy value of the first linear velocity pixel frame; calculate a quadratic sum of coordinates of each joint point in a second linear velocity pixel frame in the three-dimensional coordinate system, and add up the quadratic sums of the coordinates of all the joint points in the three-dimensional coordinate system to obtain a linear velocity energy value of the second linear velocity pixel frame, where the first linear velocity pixel frame and the second linear velocity pixel frame are any two adjacent linear velocity pixel frames, and the first linear velocity pixel frame is a previous linear velocity pixel frame of the second linear velocity pixel frame; and subtract the linear velocity energy value of the first linear velocity pixel frame from the linear velocity energy value of the second linear velocity pixel frame to obtain a linear velocity energy change value of the first linear velocity pixel frame.
- the motion feature matrix includes an angular velocity matrix
- the joint point data includes coordinates of a corresponding joint point in a three-dimensional coordinate system
- the feature extraction unit is configured to: calculate direction angles of joint points of the plurality of groups of human skeleton data in the three-dimensional coordinate system based on coordinates of the joint points of the plurality of groups of human skeleton data in the three-dimensional coordinate system; subtract direction angles of a joint point in a first group of human skeleton data in the three-dimensional coordinate system from direction angles of the corresponding joint point in a second group of human skeleton data in the three-dimensional coordinate system to obtain angular velocity units, where the first group of human skeleton data and the second group of human skeleton data are any two adjacent groups of human skeleton data, and the first group of human skeleton data is a previous group of human skeleton data of the second group of human skeleton data; and form, by using all the obtained angular velocity units, an angular velocity matrix corresponding to the plurality of groups of human skeleton data.
- the feature coding unit is configured to: preprocess the angular velocity matrix; encode a plurality of angular velocity units in the preprocessed angular velocity matrix to obtain a plurality of angular velocity pixel frames; and form an angular velocity image by using the plurality of angular velocity pixel frames.
- the feature coding unit is configured to: preprocess the angular velocity matrix; encode a plurality of angular velocity units in the preprocessed angular velocity matrix to obtain a plurality of angular velocity pixel frames; extract a plurality of key angular velocity pixel frames from the plurality of angular velocity pixel frames; and form an angular velocity image by using the plurality of key angular velocity pixel frames.
- the feature coding unit is configured to: obtain a maximum angular velocity element value and a minimum angular velocity element value in the angular velocity matrix; and perform normalization processing on each angular velocity element value in the angular velocity matrix based on the maximum angular velocity element value and the minimum angular velocity element value, to obtain a normalized angular velocity matrix.
- Each angular velocity element value in the normalized angular velocity matrix is between a first value and a second value.
- the first value is less than the second value.
- the feature coding unit is configured to: use direction angles of a joint point in each preprocessed angular velocity unit in the three-dimensional coordinate system as image channels, and encode a plurality of preprocessed angular velocity units to obtain a plurality of angular velocity pixel frames.
- the feature coding unit is configured to: calculate angular velocity energy change values of the plurality of angular velocity pixel frames based on the preprocessed angular velocity matrix; and extract the plurality of key angular velocity pixel frames from the plurality of angular velocity pixel frames in descending order of the angular velocity energy change values.
- the feature coding unit is configured to: calculate a quadratic sum of direction angles of each joint point in a first angular velocity pixel frame in the three-dimensional coordinate system, and add up the quadratic sums of the direction angles of all the joint points in the three-dimensional coordinate system to obtain an angular velocity energy value of the first angular velocity pixel frame; calculate a quadratic sum of direction angles of each joint point in a second angular velocity pixel frame in the three-dimensional coordinate system, and add up the quadratic sums of the direction angles of all the joint points in the three-dimensional coordinate system to obtain an angular velocity energy value of the second angular velocity pixel frame, where the first angular velocity pixel frame and the second angular velocity pixel frame are any two adjacent angular velocity pixel frames, and the first angular velocity pixel frame is a previous angular velocity pixel frame of the second angular velocity pixel frame; and subtract the angular velocity energy value of the first angular velocity pixel frame from the angular velocity energy value of the second angular velocity pixel frame to obtain an angular velocity energy change value of the first angular velocity pixel frame.
- the apparatus further includes:
- a model training module configured to: input at least one motion feature image of the target action and an identifier of the target action into a CNN model, and perform training to obtain an action recognition model.
- an action recognition model training apparatus includes:
- an image obtaining unit configured to obtain a plurality of reference motion feature images respectively corresponding to a plurality of types of actions, where each reference motion feature image is obtained by using the method in the first aspect
- a model training unit configured to: input the plurality of reference motion feature images and identifiers of the plurality of actions into a convolutional neural network (CNN) model, and perform training to obtain an action recognition model.
- an action recognition apparatus includes:
- an image obtaining unit configured to: obtain a to-be-recognized motion feature image, where the to-be-recognized motion feature image is an image obtained by encoding a plurality of groups of to-be-recognized human skeleton data of a to-be-recognized action;
- an image recognition unit configured to recognize the to-be-recognized motion feature image based on an action recognition model, to obtain a recognition result, where the action recognition model is obtained through training based on a plurality of reference motion feature images respectively corresponding to a plurality of types of actions and identifiers of the plurality of types of actions, and the recognition result is used to indicate an action type of the to-be-recognized action.
- the image obtaining unit is configured to: collect the plurality of groups of to-be-recognized human skeleton data of performing the to-be-recognized action, where each group of to-be-recognized human skeleton data includes joint point data of performing the to-be-recognized action; extract, based on joint point data in the plurality of groups of to-be-recognized human skeleton data, a to-be-recognized motion feature matrix corresponding to the plurality of groups of to-be-recognized human skeleton data; and encode the to-be-recognized motion feature matrix to obtain a to-be-recognized motion feature image.
- the to-be-recognized motion feature matrix includes a to-be-recognized linear velocity matrix
- the joint point data includes coordinates of a corresponding joint point in a three-dimensional coordinate system
- the image collection unit is configured to: subtract coordinates of a joint point in a first group of to-be-recognized human skeleton data in the three-dimensional coordinate system from coordinates of the corresponding joint point in a second group of to-be-recognized human skeleton data in the three-dimensional coordinate system to obtain to-be-recognized linear velocity units of the first group of to-be-recognized human skeleton data, where the first group of to-be-recognized human skeleton data and the second group of to-be-recognized human skeleton data are any two adjacent groups of to-be-recognized human skeleton data in the plurality of groups of to-be-recognized human skeleton data, and the first group of to-be-recognized human skeleton data is a previous group of to-be-recognized human skeleton data of the second group of to-be-recognized human skeleton data; and form, by using all the obtained to-be-recognized linear velocity units, a to-be-recognized linear velocity matrix corresponding to the plurality of groups of to-be-recognized human skeleton data.
- the image collection unit is configured to: preprocess the to-be-recognized linear velocity matrix; encode a plurality of linear velocity units in the preprocessed to-be-recognized linear velocity matrix to obtain a plurality of to-be-recognized linear velocity pixel frames; and form a to-be-recognized linear velocity image by using the plurality of to-be-recognized linear velocity pixel frames.
- the image collection unit is configured to: preprocess the to-be-recognized linear velocity matrix; encode a plurality of to-be-recognized linear velocity units in the preprocessed to-be-recognized linear velocity matrix to obtain a plurality of to-be-recognized linear velocity pixel frames; extract a plurality of to-be-recognized key linear velocity pixel frames from the plurality of to-be-recognized linear velocity pixel frames; and form a to-be-recognized linear velocity image by using the plurality of to-be-recognized key linear velocity pixel frames.
- the image collection unit is configured to: obtain a maximum to-be-recognized linear velocity element value and a minimum to-be-recognized linear velocity element value in the to-be-recognized linear velocity matrix; and perform normalization processing on each to-be-recognized linear velocity element value in the to-be-recognized linear velocity matrix based on the maximum to-be-recognized linear velocity element value and the minimum to-be-recognized linear velocity element value, to obtain a normalized to-be-recognized linear velocity matrix, where each to-be-recognized linear velocity element value in the normalized to-be-recognized linear velocity matrix is between a first value and a second value, and the first value is less than the second value.
- the image collection unit is configured to: use coordinates of a joint point in each preprocessed to-be-recognized linear velocity unit in the three-dimensional coordinate system as image channels, and encode a plurality of preprocessed to-be-recognized linear velocity units to obtain a plurality of to-be-recognized linear velocity pixel frames.
- the image collection unit is configured to: calculate linear velocity energy change values of the plurality of to-be-recognized linear velocity pixel frames based on the preprocessed to-be-recognized linear velocity matrix; and extract the plurality of to-be-recognized key linear velocity pixel frames from the plurality of to-be-recognized linear velocity pixel frames in descending order of the linear velocity energy change values.
- the image collection unit is configured to: calculate a quadratic sum of coordinates of each joint point in a first to-be-recognized linear velocity pixel frame in the three-dimensional coordinate system, and add up the quadratic sums of the coordinates of all the joint points in the three-dimensional coordinate system to obtain a linear velocity energy value of the first to-be-recognized linear velocity pixel frame; calculate a quadratic sum of coordinates of each joint point in a second to-be-recognized linear velocity pixel frame in the three-dimensional coordinate system, and add up the quadratic sums of the coordinates of all the joint points in the three-dimensional coordinate system to obtain a linear velocity energy value of the second to-be-recognized linear velocity pixel frame, where the first to-be-recognized linear velocity pixel frame and the second to-be-recognized linear velocity pixel frame are any two adjacent to-be-recognized linear velocity pixel frames, and the first to-be-recognized linear velocity pixel frame is a previous to-be-recognized linear velocity pixel frame of the second to-be-recognized linear velocity pixel frame; and subtract the linear velocity energy value of the first to-be-recognized linear velocity pixel frame from the linear velocity energy value of the second to-be-recognized linear velocity pixel frame to obtain a linear velocity energy change value of the first to-be-recognized linear velocity pixel frame.
- the to-be-recognized motion feature matrix includes a to-be-recognized angular velocity matrix
- the joint point data includes coordinates of a corresponding joint point in a three-dimensional coordinate system
- the image collection unit is configured to: calculate direction angles of a joint point in the plurality of groups of to-be-recognized human skeleton data in the three-dimensional coordinate system based on a coordinate matrix corresponding to the plurality of groups of to-be-recognized human skeleton data; subtract direction angles of a joint point in a first group of to-be-recognized human skeleton data in the three-dimensional coordinate system from direction angles of the corresponding joint point in a second group of to-be-recognized human skeleton data in the three-dimensional coordinate system to obtain to-be-recognized angular velocity units, where the first group of to-be-recognized human skeleton data and the second group of to-be-recognized human skeleton data are any two adjacent groups of to-be-recognized human skeleton data, and the first group of to-be-recognized human skeleton data is a previous group of to-be-recognized human skeleton data of the second group of to-be-recognized human skeleton data; and form, by using all the obtained to-be-recognized angular velocity units, a to-be-recognized angular velocity matrix corresponding to the plurality of groups of to-be-recognized human skeleton data.
- the image collection unit is configured to: preprocess the to-be-recognized angular velocity matrix; encode a plurality of to-be-recognized angular velocity units in the preprocessed to-be-recognized angular velocity matrix to obtain a plurality of to-be-recognized angular velocity pixel frames; and form a to-be-recognized angular velocity image by using the plurality of to-be-recognized angular velocity pixel frames.
- the image collection unit is configured to: preprocess the to-be-recognized angular velocity matrix; encode a plurality of to-be-recognized angular velocity units in the preprocessed to-be-recognized angular velocity matrix to obtain a plurality of to-be-recognized angular velocity pixel frames; extract a plurality of to-be-recognized key angular velocity pixel frames from the plurality of to-be-recognized angular velocity pixel frames; and form a to-be-recognized angular velocity image by using the plurality of to-be-recognized key angular velocity pixel frames.
- the image collection unit is configured to: obtain a maximum to-be-recognized angular velocity element value and a minimum to-be-recognized angular velocity element value in the to-be-recognized angular velocity matrix; and perform normalization processing on each to-be-recognized angular velocity element value in the to-be-recognized angular velocity matrix based on the maximum to-be-recognized angular velocity element value and the minimum to-be-recognized angular velocity element value, to obtain a normalized to-be-recognized angular velocity matrix, where each to-be-recognized angular velocity element value in the normalized to-be-recognized angular velocity matrix is between a first value and a second value, and the first value is less than the second value.
- the image collection unit is configured to: use direction angles of a joint point in each preprocessed to-be-recognized angular velocity unit in the three-dimensional coordinate system as image channels, and encode a plurality of preprocessed to-be-recognized angular velocity units to obtain a plurality of to-be-recognized angular velocity pixel frames.
- the image collection unit is configured to: calculate angular velocity energy change values of the plurality of to-be-recognized angular velocity pixel frames based on the preprocessed to-be-recognized angular velocity matrix; and extract the plurality of to-be-recognized key angular velocity pixel frames from the plurality of to-be-recognized angular velocity pixel frames in descending order of the angular velocity energy change values.
- the image collection unit is configured to: calculate a quadratic sum of direction angles of each joint point in a first to-be-recognized angular velocity pixel frame in the three-dimensional coordinate system, and add up the quadratic sums of the direction angles of all the joint points in the three-dimensional coordinate system to obtain an angular velocity energy value of the first to-be-recognized angular velocity pixel frame; calculate a quadratic sum of direction angles of each joint point in a second to-be-recognized angular velocity pixel frame in the three-dimensional coordinate system, and add up the quadratic sums of the direction angles of all the joint points in the three-dimensional coordinate system to obtain an angular velocity energy value of the second to-be-recognized angular velocity pixel frame, where the first to-be-recognized angular velocity pixel frame and the second to-be-recognized angular velocity pixel frame are any two adjacent to-be-recognized angular velocity pixel frames, and the first to-be-recognized angular velocity pixel frame is a previous to-be-recognized angular velocity pixel frame of the second to-be-recognized angular velocity pixel frame; and subtract the angular velocity energy value of the first to-be-recognized angular velocity pixel frame from the angular velocity energy value of the second to-be-recognized angular velocity pixel frame to obtain an angular velocity energy change value of the first to-be-recognized angular velocity pixel frame.
- the apparatus further includes:
- a zero-padding unit configured to perform a zero padding operation on the to-be-recognized motion feature image
- an image recognition unit configured to recognize, based on the action recognition model, the to-be-recognized motion feature image obtained through the zero padding operation, to obtain the recognition result.
- the action recognition model according to the sixth aspect is obtained by using the method in the second aspect.
- a computer device including a processor, a memory, a communications interface, and a bus.
- the memory, the processor, and the communications interface are connected to each other by using the bus, the memory is configured to store a computer instruction, and when the computer device runs, the processor runs the computer instruction, so that the computer device performs the image coding method in the first aspect.
- a computer device including a processor, a memory, a communications interface, and a bus.
- the memory, the processor, and the communications interface are connected to each other by using the bus, the memory is configured to store a computer instruction, and when the computer device runs, the processor runs the computer instruction, so that the computer device performs the action recognition model training method in the second aspect.
- a computer device including a processor, a memory, a communications interface, and a bus.
- the memory, the processor, and the communications interface are connected to each other by using the bus, the memory is configured to store a computer instruction, and when the computer device runs, the processor runs the computer instruction, so that the computer device performs the action recognition method in the third aspect.
- a computer-readable storage medium stores at least one instruction, and when the instruction is run on a computer device, the computer device is enabled to perform the image coding method in the first aspect.
- a computer-readable storage medium stores at least one instruction, and when the instruction is run on a computer device, the computer device is enabled to perform the action recognition model training method in the second aspect.
- a computer-readable storage medium stores at least one instruction, and when the instruction is run on a computer device, the computer device is enabled to perform the action recognition method in the third aspect.
- a computer program product that includes an instruction is provided. When the instruction is run on a computer device, the computer device is enabled to perform the method in the first aspect.
- a computer program product that includes an instruction is provided. When the instruction is run on a computer device, the computer device is enabled to perform the method in the second aspect.
- a computer program product that includes an instruction is provided. When the instruction is run on a computer device, the computer device is enabled to perform the method in the third aspect.
- FIG. 1 shows an implementation environment of an image coding method, an action recognition model training method, and an action recognition method according to an embodiment of this application.
- FIG. 2 is a schematic diagram of an application scenario of an action recognition method according to an embodiment of this application.
- FIG. 3 is a flowchart of an image coding method according to an embodiment of this application.
- FIG. 4 is a schematic diagram of main joint points of a human skeleton according to an embodiment of this application.
- FIG. 5 shows a matrix formed by data of M main joint points according to an embodiment of this application.
- FIG. 6 is a schematic diagram of a linear velocity matrix according to an embodiment of this application.
- FIG. 7 is a schematic diagram of a spatial angle of a three-dimensional coordinate system according to an embodiment of this application.
- FIG. 8 is a schematic diagram of an angular velocity matrix according to an embodiment of this application.
- FIG. 9 is a flowchart of an action recognition model training method according to an embodiment of this application.
- FIG. 10 is a schematic diagram of a CNN model according to an embodiment of this application.
- FIG. 11 is a flowchart of an action recognition method according to an embodiment of this application.
- FIG. 12 is a schematic structural diagram of an image coding apparatus according to an embodiment of this application.
- FIG. 13 is a schematic structural diagram of an action recognition model training apparatus according to an embodiment of this application.
- FIG. 14 is a schematic structural diagram of an action recognition apparatus according to an embodiment of this application.
- FIG. 15 is a schematic structural diagram of a computer device according to an embodiment of this application.
- Step 1 Collect, based on a motion sensing device, human skeleton data generated when a user performs a target action.
- the motion sensing device is a collection device that can obtain at least three-dimensional (3D) spatial location information and angle information of each joint point of a human skeleton.
- the human skeleton data includes data of each joint point that is collected by the motion sensing collection device.
- Step 2 Extract data of a main joint point from the human skeleton data.
- the main joint point is a joint point that plays a key role in action or behavior recognition.
- Step 3 Extract an action feature from the data of the main joint point, and form an action feature vector sequence by using the extracted action feature.
- the action features include features such as a position, an angle, and a velocity of a main joint point, and an included angle between main joints.
- the action feature vector sequence is a sequence of feature vectors formed by action features.
- Step 4 Perform normalization processing on the action feature vector sequence to obtain a normalized action feature vector sequence.
- Step 5 Store a correspondence between the normalized action feature vector sequence and the target action as an action sample to an action sample template library.
- Step 6 Collect human skeleton data of the user in real time based on the motion sensing device, process the human skeleton data according to the method in step 2 to step 5, to obtain a to-be-recognized action feature vector sequence, and then calculate, by using a dynamic time warping algorithm, a distance value between the to-be-recognized action feature vector sequence and each normalized action feature vector sequence stored in the action sample template library.
- Step 7 Calculate, based on the distance value calculated in step 6, a similarity between the to-be-recognized action feature vector sequence and each normalized action feature vector sequence in the action sample template library, and then recognize an action or a behavior of the user based on the similarity.
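- For reference, the following is a textbook dynamic time warping distance of the kind invoked in step 6 of the related-art procedure above; the exact variant used in the related art is not specified, so this formulation is an assumption:

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two action feature vector
    sequences, each given as a list of 1-D feature vectors."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(np.asarray(seq_a[i - 1]) - np.asarray(seq_b[j - 1]))
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])
```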
- a plurality of groups of reference human skeleton data of performing each type of action are encoded based on the provided image coding method, to obtain a plurality of reference motion feature images; the plurality of reference motion feature images and identifiers of a plurality of actions are input into a CNN model based on a provided action recognition model training method, to obtain an action recognition model through training; and then the user's action is recognized based on a provided action recognition method and the action recognition model.
- FIG. 1 shows an implementation environment of an image coding method, an action recognition model training method, and an action recognition method according to an embodiment of this application.
- the implementation environment includes an image coding device 101, a model training device 102, and an action recognition device 103.
- the image coding device 101 can extract, based on a plurality of groups of human skeleton data collected by a motion sensing collection device, a motion feature matrix corresponding to the plurality of groups of human skeleton data, and then encode the motion feature matrix to obtain a motion feature image.
- the image coding device 101 may be a server, or may be a terminal. This embodiment of this application sets no specific limitation on the image coding device 101 .
- the motion sensing collection device may be a kinect camera or the like.
- the kinect camera can provide a real-time depth image according to a structural optical principle.
- human skeleton data can be obtained by using a random forest algorithm.
- the random forest algorithm is a classifier that includes a plurality of decision trees, and a class output by using the random forest algorithm is determined by a class of an individual tree.
- the model training device 102 has a model training capability, and may perform training based on the motion feature image obtained through encoding by the image coding device 101 , to obtain an action recognition model.
- the model training device 102 may be a server, or may be a terminal. This embodiment of this application sets no specific limitation on the model training device 102 .
- the action recognition device 103 has an image collection function, and can collect human skeleton data in real time.
- the action recognition device 103 also has a calculation processing capability, and may recognize an action of a user based on an action recognition model obtained through training by the model training device 102 and the collected human skeleton data.
- the action recognition device 103 may be paired with another motion sensing collection device, and may further have a built-in skeleton information collection unit, where the skeleton information collection unit has the same function as the motion sensing collection device.
- the action recognition device 103 may be a home child care-giving robot, a dangerous action monitoring device in a public place, a human-computer interaction game device, or the like.
- the action recognition device 103 in FIG. 1 is a home child care-giving robot, and a kinect camera is disposed in the home child care-giving robot, mainly to prevent accidental injury of a child at home, for example, getting an electric shock due to touching a socket or falling after climbing to a higher place.
- FIG. 2 is a diagram of a working procedure of a home child care-giving robot. Referring to FIG. 2 , the home child care-giving robot collects an action image of a child in a home environment in real time by using a kinect camera, identifies human skeleton data of the child by using an algorithm of the kinect, and then recognizes an action of the child in real time based on the human skeleton data.
- the home child care-giving robot When determining that the action of the child is a dangerous action, the home child care-giving robot sends a warning immediately to attract a family member's attention; otherwise, the home child care-giving robot continues to obtain human skeleton data by using the kinect, so as to monitor the action of the child.
- An embodiment of this application provides an image coding method.
- That an image coding device performs this embodiment of this application is used as an example.
- Referring to FIG. 3, a method procedure provided in this embodiment of this application includes the following steps.
- An image coding device obtains a plurality of groups of human skeleton data of performing a target action, where each group of human skeleton data includes joint point data of performing the target action.
- the target actions include a stoop action, an action of standing at attention, an action of lifting a hand leftwards, an action of lifting a hand rightwards, and the like.
- a quantity of groups of obtained human skeleton data may be determined based on complexity of the target action. A more complex target action comes with a larger quantity of groups of obtained human skeleton data, and a simpler target action comes with a smaller quantity of groups of obtained human skeleton data.
- the plurality of groups of obtained human skeleton data may be consecutive human skeleton data, or may be a plurality of groups of inconsecutive human skeleton data selected from a plurality of groups of consecutive human skeleton data according to a preset rule.
- the preset rule may be selecting at an interval of one group of human skeleton data, selecting at an interval of two groups of human skeleton data, or the like.
- a stoop action includes 44 groups of consecutive human skeleton data.
- the 44 groups of consecutive human skeleton data may be used as 44 groups of obtained human skeleton data of performing the stoop action; or 22 groups of human skeleton data such as a first group of human skeleton data, a third group of human skeleton data, a fifth group of human skeleton data, . . . , and a 43rd group of human skeleton data may be used as 22 groups of obtained human skeleton data of performing the stoop action.
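- Expressed as a simple slicing operation over an array of consecutive groups, this every-other-group selection looks as follows; the variable names are illustrative assumptions.

```python
import numpy as np

skeleton = np.random.rand(44, 20, 3)  # 44 consecutive groups, 20 joints each
sampled = skeleton[::2]               # groups 1, 3, 5, ..., 43 -> 22 groups
```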
- the joint points may be all joint points included in a human skeleton, or may be main joint points that play a key role in action recognition.
- a preset quantity of joint points may be selected for calculation.
- the preset quantity may be 20, 25, or the like. This embodiment of this application sets no specific limitation on the preset quantity. Referring to FIG.
- the 20 joint points include a head joint point, a shoulder center joint point, a left shoulder joint point, a left elbow joint point, a left hand joint point, a right shoulder joint point, a right elbow joint point, a right hand joint point, a spine joint point, a hip center joint point, a left hip joint point, a right hip joint point, a left knee joint point, a right knee joint point, a left ankle joint point, a right ankle joint point, a left foot joint point, and a right foot joint point.
- the image coding device extracts, based on joint point data in the plurality of groups of human skeleton data, a motion feature matrix corresponding to the plurality of groups of human skeleton data.
- the motion feature matrix includes a linear velocity matrix or an angular velocity matrix. According to different motion feature matrices, that the image coding device extracts, based on the joint point data in the plurality of groups of human skeleton data, a motion feature matrix corresponding to the plurality of groups of human skeleton data may include but is not limited to the following two cases.
- the motion feature matrix includes a linear velocity matrix
- the joint point data includes coordinates of a corresponding joint point in a three-dimensional coordinate system.
- the image coding device may perform the following steps to extract, based on the joint point data in the plurality of groups of human skeleton data, the motion feature matrix corresponding to the plurality of groups of human skeleton data:
- the image coding device subtracts coordinates of a joint point in a first group of human skeleton data in the three-dimensional coordinate system from coordinates of the corresponding joint point in a second group of human skeleton data in the three-dimensional coordinate system to obtain linear velocity units corresponding to the first group of human skeleton data.
- a three-dimensional coordinate system needs to be established, and based on the established three-dimensional coordinate system, the image coding device can obtain coordinates of a joint point in each group of human skeleton data in the three-dimensional coordinate system.
- for example, coordinates of an i-th joint point in a t-th group of human skeleton data in the three-dimensional coordinate system may be denoted as P_t^i = (p_x, p_y, p_z), where a value range of i is [1, M], a value range of t is [1, N], and p_x, p_y, and p_z are coordinates of the i-th joint point in an X-axis direction, a Y-axis direction, and a Z-axis direction, respectively.
- N groups of human skeleton data that include M joint points may be represented by using an N×M×3 matrix.
- For example, 44 groups of human skeleton data may be obtained, each group of human skeleton data includes 20 pieces of joint point data of performing the stoop action, and the 44 groups of human skeleton data may be represented by using a 44×20×3 matrix.
- Coordinates of the 20 joint points in the 1st group of human skeleton data in a three-dimensional coordinate system are (−0.6197, 0.3280, 3.1819), (−0.6204, 0.3820, 3.1629), (−0.6255, 0.6453, 3.0822), (−0.6614, 0.8672, 2.9904), and so on.
- Coordinates of the 20 joint points in the 44th group of human skeleton data in the three-dimensional coordinate system are (0.1460, 0.2145, 2.1690), (0.1428, 0.1927, 2.1485), (0.1210, 0.5332, 2.0699), (0.1993, 0.6894, 1.9873), (−0.0031, 0.4087, 2.0452), (−0.0944, 0.1501, 2.0784), (−0.1050, −0.0680, 2.1074), (−0.0945, −0.1476, 2.1227), (0.2512, 0.4655, 2.2222), (0.2743, 0.2475, 2.3574), (0.3129, 0.0278, 2.5084), (0.3781, −0.0206, 2.5579), and so on.
- the image coding device subtracts the coordinates of the joint point in the first group of human skeleton data in the three-dimensional coordinate system from the coordinates of the corresponding joint point in the second group of human skeleton data in the three-dimensional coordinate system based on an established coordinate matrix, and the linear velocity units corresponding to the first group of human skeleton data can be obtained.
- the first group of human skeleton data and the second group of human skeleton data are any two adjacent groups of human skeleton data in the plurality of groups of human skeleton data, and the first group of human skeleton data is a previous group of human skeleton data of the second group of human skeleton data.
- for example, coordinates of a joint point i in the r-th group of human skeleton data in the three-dimensional coordinate system are P_r^i, and coordinates of the joint point i in the (r+1)-th group of human skeleton data in the three-dimensional coordinate system are P_{r+1}^i, so a coordinate difference of the joint point i in the r-th group of human skeleton data may be obtained by subtracting P_r^i from P_{r+1}^i.
- a coordinate difference of each joint point in the r-th group of human skeleton data and the (r+1)-th group of human skeleton data is calculated in this manner, and a linear velocity unit corresponding to the r-th group of human skeleton data is formed by using coordinate differences of the M main joint points.
- the linear velocity unit is actually an M×3 matrix.
- the image coding device forms, by using all the obtained linear velocity units, a linear velocity matrix corresponding to the plurality of groups of human skeleton data.
- N−1 linear velocity units may be obtained by subtracting coordinates of a joint point in a previous group of human skeleton data in the three-dimensional coordinate system from coordinates of the corresponding joint point in a current group of human skeleton data in the three-dimensional coordinate system.
- Each linear velocity unit is a matrix with an order of M×3, and therefore a linear velocity matrix corresponding to the N groups of human skeleton data may be represented by using one (N−1)×M×3 matrix.
- a 43×20×3 matrix shown in FIG. 6 may be obtained by subtracting coordinates of a joint point in a previous group of human skeleton data in the three-dimensional coordinate system from coordinates of the corresponding joint point in a current group of human skeleton data in the three-dimensional coordinate system.
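- The extraction of the linear velocity matrix can be sketched as follows, assuming the N groups of human skeleton data are stored in an N×M×3 NumPy array; the function and variable names are illustrative assumptions.

```python
import numpy as np

def linear_velocity_matrix(skeleton):
    """Compute the (N-1) x M x 3 linear velocity matrix: each linear velocity
    unit is the coordinate difference between a group of skeleton data and
    the group that follows it."""
    return skeleton[1:] - skeleton[:-1]

skeleton = np.random.rand(44, 20, 3)       # e.g. 44 groups, 20 joints
velocity = linear_velocity_matrix(skeleton)
print(velocity.shape)                       # (43, 20, 3)
```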
- the motion feature matrix includes an angular velocity matrix
- the joint point data includes coordinates of a corresponding joint point in a three-dimensional coordinate system.
- the image coding device may perform the following steps to extract, based on the joint point data in the plurality of groups of human skeleton data, the motion feature matrix corresponding to the plurality of groups of human skeleton data:
- the image coding device calculates direction angles of joint points of the plurality of groups of human skeleton data in the three-dimensional coordinate system based on coordinates of the joint points of the plurality of groups of human skeleton data in the three-dimensional coordinate system.
- for example, a direction angle of a joint point i in a t-th group of human skeleton data in an X-axis direction is α_t^i, a direction angle of the joint point i in a Y-axis direction is β_t^i, and a direction angle of the joint point i in a Z-axis direction is γ_t^i, where a value range of i is [1, M] and a value range of t is [1, N].
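- The direction-angle formula is not spelled out here; assuming the standard direction-cosine definition with respect to the coordinate axes, the three angles of a joint point could be computed as follows.

```latex
\alpha_t^i = \arccos\frac{p_x}{\lVert P_t^i \rVert},\qquad
\beta_t^i  = \arccos\frac{p_y}{\lVert P_t^i \rVert},\qquad
\gamma_t^i = \arccos\frac{p_z}{\lVert P_t^i \rVert},\qquad
\lVert P_t^i \rVert = \sqrt{p_x^2 + p_y^2 + p_z^2}
```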
- the image coding device subtracts direction angles of a joint point in a first group of human skeleton data in the three-dimensional coordinate system from direction angles of the corresponding joint point in a second group of human skeleton data in the three-dimensional coordinate system to obtain angular velocity units corresponding to the first group of human skeleton data.
- the first group of human skeleton data and the second group of human skeleton data are any two adjacent groups of human skeleton data, and the first group of human skeleton data is a previous group of human skeleton data of the second group of human skeleton data.
- for example, direction angles of a joint point i in the r-th group of human skeleton data in the three-dimensional coordinate system are (α_r^i, β_r^i, γ_r^i), and direction angles of the joint point i in the (r+1)-th group of human skeleton data in the three-dimensional coordinate system are (α_{r+1}^i, β_{r+1}^i, γ_{r+1}^i), so an angle difference of the joint point i in the r-th group of human skeleton data may be obtained by subtracting the former from the latter.
- An angle difference of each joint point in the r-th group of human skeleton data and the (r+1)-th group of human skeleton data is calculated in this manner, and an angular velocity unit corresponding to the r-th group of human skeleton data is formed by using angle differences of the M main joint points.
- the angular velocity unit is actually an M×3 matrix.
- the image coding device forms, by using all the obtained angular velocity units, an angular velocity matrix corresponding to the plurality of groups of human skeleton data.
- N−1 angular velocity units may be obtained by subtracting direction angles of a joint point in a previous group of human skeleton data in the three-dimensional coordinate system from direction angles of the corresponding joint point in a current group of human skeleton data in the three-dimensional coordinate system.
- Each angular velocity unit is a matrix with an order of M×3, and therefore an angular velocity matrix corresponding to the N groups of human skeleton data may be represented by using one (N−1)×M×3 matrix.
- a 43×20×3 matrix shown in FIG. 7 may be obtained by subtracting direction angles of a joint point in a previous group of human skeleton data in the three-dimensional coordinate system from direction angles of the corresponding joint point in a current group of human skeleton data in the three-dimensional coordinate system.
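- Mirroring the linear velocity case, the extraction of the angular velocity matrix can be sketched as a frame-to-frame difference of the direction angles, assuming they are stored in an N×M×3 array; the names are illustrative assumptions.

```python
import numpy as np

def angular_velocity_matrix(direction_angles):
    """Compute the (N-1) x M x 3 angular velocity matrix: each angular
    velocity unit is the direction-angle difference between a group of
    skeleton data and the group that follows it."""
    return direction_angles[1:] - direction_angles[:-1]
```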
- the image coding device encodes the motion feature matrix to obtain a motion feature image.
- for different motion feature matrices, different motion feature images are obtained through encoding by the image coding device. There may be the following two cases when the image coding device encodes different motion feature matrices.
- the motion feature matrix is a linear velocity matrix.
- That the image coding device encodes the motion feature matrix to obtain a motion feature image includes but is not limited to the following steps 30311 to 30313.
- the image coding device preprocesses the linear velocity matrix.
- That the image coding device preprocesses the linear velocity matrix includes the following steps:
- Step 1 The image coding device obtains a maximum linear velocity element value and a minimum linear velocity element value in the linear velocity matrix.
- Step 2 The image coding device performs normalization processing on each linear velocity element value in the linear velocity matrix based on the maximum linear velocity element value and the minimum linear velocity element value, to obtain a normalized linear velocity matrix.
- Each linear velocity element value in the normalized linear velocity matrix is between a first value and a second value, and the first value is less than the second value.
- the first value is 0, and the second value is 255.
- the normalization may be performed according to the following formula: X_norm = (X − min(X)) / (max(X) − min(X)) × 255, where X is a linear velocity element value in the linear velocity matrix, min(X) is the minimum linear velocity element value, and max(X) is the maximum linear velocity element value.
- When X is the maximum linear velocity element value in the linear velocity matrix, X − min(X) is equal to max(X) − min(X), and a value of X_norm is 255.
- When X is the minimum linear velocity element value in the linear velocity matrix, X − min(X) is equal to 0, and a value of X_norm is 0.
- When X is any other linear velocity element value in the linear velocity matrix, a value of X_norm is also between the first value and the second value.
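- A minimal sketch of this normalization, assuming the linear velocity matrix is held in a NumPy array; the function name is illustrative.

```python
import numpy as np

def normalize_to_pixel_range(matrix):
    """Min-max normalize every element of a motion feature matrix to the
    range [0, 255], following the formula above."""
    x_min, x_max = matrix.min(), matrix.max()
    return (matrix - x_min) / (x_max - x_min) * 255.0
```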
- the image coding device encodes a plurality of linear velocity units in the preprocessed linear velocity matrix to obtain a plurality of linear velocity pixel frames.
- the image coding device uses coordinates of a joint point in each preprocessed linear velocity unit in the three-dimensional coordinate system as image channels, and encodes a plurality of preprocessed linear velocity units to obtain a plurality of linear velocity pixel frames.
- the image coding device randomly specifies coordinates of a joint point in each preprocessed linear velocity unit on an X axis, a Y axis, and a Z axis as R, G, and B image channels, and further encodes each linear velocity unit based on the specified image channels to obtain a plurality of linear velocity pixel frames.
- the foregoing method may be used to encode a linear velocity matrix that includes N−1 linear velocity units, to obtain N−1 linear velocity pixel frames.
- the image coding device forms a linear velocity image by using the plurality of linear velocity pixel frames.
- the image coding device forms, by using the plurality of linear velocity pixel frames, the linear velocity image based on a collection time sequence of human skeleton data corresponding to each linear velocity pixel frame.
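- A rough sketch of this encoding step, assuming the normalized matrix produced in the preceding step: each normalized linear velocity unit becomes one pixel frame whose R, G, and B channels hold the X, Y, and Z components, and the frames are stacked in collection time order. The function names are illustrative assumptions.

```python
import numpy as np

def to_pixel_frames(normalized_units):
    """Treat each normalized M x 3 linear velocity unit as one pixel frame:
    M pixels whose R, G, and B channels hold the X, Y, and Z components."""
    return [unit.astype(np.uint8) for unit in normalized_units]

def to_velocity_image(pixel_frames):
    """Stack the pixel frames in collection time order into a single
    num_frames x M x 3 motion feature image."""
    return np.stack(pixel_frames, axis=0)
```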
- one type of action includes 40 to 120 groups of human skeleton data, and some groups of human skeleton data include more action information. Pixel frames obtained by encoding such groups of human skeleton data are referred to as key pixel frames. Extracting key pixel frames from the plurality of pixel frames helps reduce a calculation amount during subsequent image processing. A process of extracting a key pixel frame is as follows:
- Step a The image coding device calculates linear velocity energy change values of the plurality of linear velocity pixel frames based on the preprocessed linear velocity matrix.
- a value range of r is [1, N−1]; j denotes a joint point, and a value range of j is [1, 20]; the quadratic sum of the coordinates of the j-th joint point on the X axis, Y axis, and Z axis is actually the quadratic sum of the linear velocity of the j-th joint point; and E_r is a linear velocity energy value of an r-th linear velocity pixel frame, and is actually the sum of the quadratic sums of the linear velocities of the 20 joint points in the r-th group of human skeleton data.
- a derivative of a linear velocity energy function of the r-th linear velocity pixel frame with respect to time is actually equal to a linear velocity energy value of an (r+1)-th linear velocity pixel frame minus the linear velocity energy value of the r-th linear velocity pixel frame, namely, a linear velocity energy change value ΔE_r = E_{r+1} − E_r. Because a last linear velocity pixel frame does not have a next linear velocity pixel frame, N−2 linear velocity energy change values may be calculated for the first N−1 linear velocity pixel frames.
- a larger absolute value of the derivative indicates a larger change degree of an action and a larger information amount corresponding to a pixel frame.
- the pixel frame with a large information amount is actually the key pixel frame to be obtained in this embodiment of this application.
- the image coding device calculates a quadratic sum of coordinates of each joint point in a first linear velocity pixel frame in the three-dimensional coordinate system, and adds up the quadratic sums of the coordinates of all the joint points in the three-dimensional coordinate system to obtain a linear velocity energy value of the first linear velocity pixel frame.
- the image coding device may calculate the linear velocity energy value of the first linear velocity pixel frame by calculating the quadratic sum of the coordinates of each joint point in the first linear velocity pixel frame in the three-dimensional coordinate system, and adding up the quadratic sums of the coordinates of all the joint points in the three-dimensional coordinate system.
- the image coding device calculates a quadratic sum of coordinates of each joint point in a second linear velocity pixel frame in the three-dimensional coordinate system, and adds up the quadratic sums of the coordinates of all the joint points in the three-dimensional coordinate system to obtain a linear velocity energy value of the second linear velocity pixel frame.
- the image coding device may calculate the linear velocity energy value of the second linear velocity pixel frame by calculating the quadratic sum of the coordinates of each joint point in the second linear velocity pixel frame in the three-dimensional coordinate system, and adding up the quadratic sums of the coordinates of all the joint points in the three-dimensional coordinate system.
- the first linear velocity pixel frame and the second linear velocity pixel frame are any two adjacent linear velocity pixel frames, and the first linear velocity pixel frame is a previous linear velocity pixel frame of the second linear velocity pixel frame.
- the image coding device subtracts the linear velocity energy value of the first linear velocity pixel frame from the linear velocity energy value of the second linear velocity pixel frame to obtain a linear velocity energy change value of the first linear velocity pixel frame.
- Step b The image coding device extracts a plurality of key linear velocity pixel frames from the plurality of linear velocity pixel frames in descending order of the linear velocity energy change values.
- the image coding device may sort the linear velocity energy change values of the plurality of linear velocity pixel frames in descending order of the linear velocity energy change values, and then extract the plurality of key linear velocity pixel frames from the plurality of linear velocity pixel frames based on a sorting result.
- Based on the extracted plurality of key linear velocity pixel frames, the image coding device encodes the plurality of key linear velocity pixel frames in a time sequence to obtain a linear velocity image. For example, for a stoop action, 44 groups of human skeleton data may be obtained and encoded as 43 linear velocity pixel frames. According to the linear velocity energy function, 32 key linear velocity pixel frames are extracted from the 43 linear velocity pixel frames, and are finally encoded as one linear velocity image that includes 32×20 pixels.
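- The key-frame selection described above can be illustrated with a short sketch: the energy of a frame is the sum of squared joint velocities, the change value is the difference between consecutive energies, and the frames with the largest change values are kept in their original time order. The function name and the choice of 32 key frames are illustrative assumptions.

```python
import numpy as np

def extract_key_frames(norm_velocity, num_key_frames=32):
    """Select key pixel frames from a normalized (N-1) x M x 3 linear
    velocity matrix by the energy-change criterion."""
    energy = (norm_velocity ** 2).sum(axis=(1, 2))       # E_r for each frame
    change = energy[1:] - energy[:-1]                    # N-2 change values
    ranked = np.argsort(change)[::-1][:num_key_frames]   # descending order
    keep = np.sort(ranked)                               # restore time order
    return norm_velocity[keep]
```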
- the motion feature matrix is an angular velocity matrix.
- That the image coding device encodes the motion feature matrix to obtain a motion feature image includes but is not limited to the following steps 30321 to 30323.
- the image coding device preprocesses the angular velocity matrix.
- That the image coding device preprocesses the angular velocity matrix includes the following steps:
- Step 1 The image coding device obtains a maximum angular velocity element value and a minimum angular velocity element value in the angular velocity matrix.
- Step 2 The image coding device performs normalization processing on each angular velocity element value in the angular velocity matrix based on the maximum angular velocity element value and the minimum angular velocity element value, to obtain a normalized angular velocity matrix.
- Each angular velocity element value in the normalized angular velocity matrix is between a first value and a second value, and the first value is less than the second value.
- the first value is 0, and the second value is 255.
- the normalization may be performed according to the following formula: Y_norm = (Y − min(Y)) / (max(Y) − min(Y)) × 255, where Y is an angular velocity element value in the angular velocity matrix, min(Y) is the minimum angular velocity element value, and max(Y) is the maximum angular velocity element value.
- When Y is the maximum angular velocity element value in the angular velocity matrix, Y − min(Y) is equal to max(Y) − min(Y), and a value of Y_norm is 255.
- When Y is the minimum angular velocity element value in the angular velocity matrix, Y − min(Y) is equal to 0, and a value of Y_norm is 0.
- When Y is any other angular velocity element value in the angular velocity matrix, a value of Y_norm is also between the first value and the second value.
- the image coding device encodes a plurality of angular velocity units in the preprocessed angular velocity matrix to obtain a plurality of angular velocity pixel frames.
- the image coding device uses direction angles of a joint point in each preprocessed angular velocity unit in the three-dimensional coordinate system as image channels, and encodes a plurality of preprocessed angular velocity units to obtain a plurality of angular velocity pixel frames.
- the image coding device randomly specifies direction angles of a joint point in each preprocessed angular velocity unit on an X axis, a Y axis, and a Z axis as R, G, and B image channels, and further encodes each angular velocity unit based on the specified image channels to obtain a plurality of angular velocity pixel frames.
- the foregoing method may be used to encode an angular velocity matrix that includes N−1 angular velocity units, to obtain N−1 angular velocity pixel frames.
- the image coding device forms an angular velocity image by using the plurality of angular velocity pixel frames.
- the image coding device forms, by using the plurality of angular velocity pixel frames, the angular velocity image based on a collection time sequence of human skeleton data corresponding to each angular velocity pixel frame.
- one type of action includes 40 to 120 groups of human skeleton data, and some groups of human skeleton data include more action information. Pixel frames obtained by encoding such groups of human skeleton data are referred to as key pixel frames. Extracting key pixel frames from the plurality of pixel frames helps reduce a calculation amount during subsequent image processing. A process of extracting a key pixel frame is as follows:
- Step a The image coding device calculates angular velocity energy change values of the plurality of angular velocity pixel frames based on the preprocessed angular velocity matrix.
- a value range of r is [1, N−1]; j denotes a joint point, and a value range of j is [1, 20]; the quadratic sum of the direction-angle components of the j-th joint point on the X axis, Y axis, and Z axis is actually the quadratic sum of the angular velocity of the j-th joint point; and E_r is an angular velocity energy value of an r-th angular velocity pixel frame, and is actually the sum of the quadratic sums of the angular velocities of the 20 joint points in the r-th group of human skeleton data.
- ΔE_r = E_{r+1} − E_r.
- a derivative of an angular velocity energy function of the r th angular velocity pixel frame with respect to time is actually equal to an angular velocity energy value of an (r+1) th angular velocity pixel frame minus the angular velocity energy value of the r th angular velocity pixel frame, namely, an angular velocity energy change value.
- N−2 angular velocity energy change values may be calculated for the first N−1 angular velocity pixel frames.
- a larger absolute value of the derivative indicates a larger change degree of an action and a larger information amount corresponding to a pixel frame.
- the pixel frame with a large information amount is actually the key pixel frame to be obtained in this embodiment of this application.
- the image coding device calculates a quadratic sum of direction angles of each joint point in a first angular velocity pixel frame in the three-dimensional coordinate system, and adds up the quadratic sums of the direction angles of all the joint points in the three-dimensional coordinate system to obtain an angular velocity energy value of the first angular velocity pixel frame.
- the image coding device may calculate the angular velocity energy value of the first angular velocity pixel frame by calculating the quadratic sum of the direction angles of each joint point in the first angular velocity pixel frame in the three-dimensional coordinate system, and adding up the quadratic sums of the direction angles of all the joint points in the three-dimensional coordinate system.
- the image coding device calculates a quadratic sum of direction angles of each joint point in a second angular velocity pixel frame in the three-dimensional coordinate system, and adds up the quadratic sums of the direction angles of all the joint points in the three-dimensional coordinate system to obtain an angular velocity energy value of the second angular velocity pixel frame.
- the image coding device may calculate the angular velocity energy value of the second angular velocity pixel frame by calculating the quadratic sum of the direction angles of each joint point in the second angular velocity pixel frame in the three-dimensional coordinate system, and adding up the quadratic sums of the direction angles of all the joint points in the three-dimensional coordinate system.
- the first angular velocity pixel frame and the second angular velocity pixel frame are any two adjacent angular velocity pixel frames, and the first angular velocity pixel frame is a previous angular velocity pixel frame of the second angular velocity pixel frame.
- the image coding device subtracts the angular velocity energy value of the first angular velocity pixel frame from the angular velocity energy value of the second angular velocity pixel frame to obtain an angular velocity energy change value of the first angular velocity pixel frame.
- Step b The image coding device extracts a plurality of key angular velocity pixel frames from the plurality of angular velocity pixel frames in descending order of the angular velocity energy change values.
- the image coding device may sort the angular velocity energy change values of the plurality of angular velocity pixel frames in descending order of the angular velocity energy change values, and then extract the plurality of key angular velocity pixel frames from the plurality of angular velocity pixel frames based on a sorting result.
- the image coding device may encode the plurality of key angular velocity pixel frames in a time sequence to obtain an angular velocity image. For example, for a stoop action, 44 groups of human skeleton data may be obtained and encoded as 43 angular velocity pixel frames. According to the angular velocity energy function, 32 key angular velocity pixel frames are extracted from the 43 angular velocity pixel frames, and are finally encoded as one angular velocity image that includes 32×20 pixels.
- At least one motion feature image, obtained through encoding, of the target action may be used to perform training to obtain an action recognition model.
- the image coding device inputs the at least one motion feature image of the target action and an identifier of the target action into a convolutional neural network CNN model, and may perform training to obtain an action recognition model.
- the motion feature matrix corresponding to the plurality of groups of human skeleton data is extracted, and then the extracted motion feature matrix corresponding to the plurality of groups of human skeleton data is encoded as the motion feature image. Because a data amount of the motion feature image is smaller than a data amount of a plurality of action feature vector sequences, consumption of storage resources and calculation resources is reduced.
- An embodiment of this application provides an action recognition model training method. That a model training device performs this application is used as an example. Referring to FIG. 9 , a method procedure provided in this embodiment of this application includes the following steps.
- the model training device obtains a plurality of reference motion feature images respectively corresponding to a plurality of types of actions.
- the plurality of actions include a stoop action, a head lowering action, an action of lifting a hand leftwards, an action of lifting a hand rightwards, and the like.
- a reference motion feature image corresponding to each type of action may be obtained through encoding by using the image coding method shown in FIG. 3 .
- the model training device inputs the plurality of reference motion feature images and identifiers of the plurality of actions into a CNN model, and performs training to obtain an action recognition model.
- a VGG 16 network structure is used to train a CNN model.
- the network structure is shown in FIG. 10 , and includes five convolutional layers, five pooling layers, and two fully connected layers.
- One maximum pooling layer is disposed after each convolutional layer.
- a convolution operation needs to be performed on a feature (namely, a matrix) and several filtering templates at each convolutional layer or fully connected layer, and an output of the layer is an input of a next layer.
- the pooling layer is responsible for compressing an output feature, so as to ensure that the feature is highly compact.
- a weight of a filtering template, as a parameter, may be continuously updated iteratively in a training process of the CNN, and a final output of the CNN may be a multidimensional vector for encoding an original input image.
- the multidimensional vector directly corresponds to a probability description of classifying the object.
- an input of a VGG 16 network is a 224×224×3 color image, and after being input into the network, the image first passes through the first convolutional layer (convolution+ReLU).
- a convolution kernel of this layer is 3×3×64. Therefore, after passing through the first convolutional layer, the input 224×224×3 image becomes a 224×224×64 image.
- After the image passes through the first maximum pooling layer (max pooling), a size of the image decreases by half, and the image becomes a 112×112×64 image.
- After the image passes through the second convolutional layer (convolution+ReLU), whose convolution kernel is 3×3×128, the 112×112×64 image becomes a 112×112×128 image.
- After the image passes through the second maximum pooling layer (max pooling), a size of the image decreases by half, and the image becomes a 56×56×128 image.
- a size of a convolution kernel of the third convolutional layer is 3×3×256.
- a size of a convolution kernel of the fourth convolutional layer is 3×3×512.
- a size of a convolution kernel of the fifth convolutional layer is 3×3×512, and so on.
- After the fifth maximum pooling layer, a size of the output image is 7×7×512.
- the 7×7×512 image is input into the first fully connected layer, and may be compressed into a 1×1×4096 image.
- the 1×1×4096 image is input into the second fully connected layer, and may be compressed into a 1×1×1000 image. In other words, there are 1000 possible classes of the image.
- the VGG 16 network structure is designed for the 224×224×3 image, and is used to classify the image into 1000 classes.
- However, a 32×32×3 image is input in this application, and a quantity of action classes that need to be recognized in this application does not reach 1000. Therefore, to reduce a calculation amount and shorten a recognition time in a modeling and application process, the VGG 16 network structure needs to be modified in this application: the first fully connected layer is changed from 1×1×4096 to 1×1×512, and the second fully connected layer is changed from 1×1×1000 to a corresponding quantity of action classes.
- For example, if 20 action classes need to be recognized, the second fully connected layer is changed from 1×1×1000 to 1×1×20; if 100 action classes need to be recognized, the second fully connected layer is changed from 1×1×1000 to 1×1×100.
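- A hedged PyTorch sketch of the modified network, following the description above (five convolutional blocks of 64/128/256/512/512 channels, each followed by max pooling, a first fully connected layer of 512 units, and a second fully connected layer sized to the number of action classes); the exact layer arrangement and hyperparameters are illustrative assumptions rather than the precise configuration used here.

```python
import torch
import torch.nn as nn

class ActionRecognitionCNN(nn.Module):
    """Simplified VGG 16-style network: five conv blocks with max pooling,
    then two fully connected layers (512 units, then num_classes)."""

    def __init__(self, num_classes=20, in_channels=3):
        super().__init__()
        layers, prev = [], in_channels
        for out in (64, 128, 256, 512, 512):
            layers += [nn.Conv2d(prev, out, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True),
                       nn.MaxPool2d(kernel_size=2, stride=2)]
            prev = out
        self.features = nn.Sequential(*layers)
        # A 32 x 32 input is halved five times, leaving a 1 x 1 x 512 feature map.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = ActionRecognitionCNN(num_classes=20)
logits = model(torch.randn(8, 3, 32, 32))  # batch of 32 x 32 x 3 images -> (8, 20)
```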
- the model training device may adjust model parameters of the CNN model by inputting a plurality of reference motion feature images and identifiers of a plurality of actions into the CNN model, and then use the CNN model corresponding to the obtained adjusted model parameters as the action recognition model.
- That the model training device inputs the plurality of reference motion feature images and the identifiers of the plurality of actions into the CNN model and performs training to obtain the action recognition model may be: inputting linear velocity images in the plurality of reference motion feature images and identifiers of corresponding actions into the CNN model; or inputting angular velocity images in the plurality of reference motion feature images and identifiers of corresponding actions into the CNN model; or inputting both the linear velocity images and the angular velocity images in the plurality of reference motion feature images, together with identifiers of corresponding actions, into the CNN model.
- An action recognition model finally obtained through training varies with an input image.
- if only linear velocity images and identifiers of corresponding actions are input, an action recognition model obtained through training can only recognize a linear velocity image.
- if only angular velocity images and identifiers of corresponding actions are input, an action recognition model obtained through training can only recognize an angular velocity image.
- if both linear velocity images and angular velocity images, together with identifiers of corresponding actions, are input, an action recognition model obtained through training can recognize both a linear velocity image and an angular velocity image.
- Before inputting the plurality of reference motion feature images and the identifiers of the plurality of actions into the CNN model, the model training device further performs a zero padding operation on the reference motion feature images.
- one S×[(S−M)/2]×3 all-zero matrix may be added to the left side and the right side of each reference motion feature image (which is actually adding grayscale pixels), so that the S×M×3 reference motion feature image finally becomes a motion feature image whose pixel quantity is S×S.
- For example, a 32×20×3 motion feature image may be obtained, and one 32×6×3 all-zero matrix is added to the left side and the right side of the motion feature image, so that the motion feature image finally becomes a 32×32×3 motion feature image.
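- A minimal sketch of this zero padding operation, assuming the reference motion feature image is an S×M×3 NumPy array with S − M even, as in the 32×20 example; the function name is illustrative.

```python
import numpy as np

def pad_to_square(image):
    """Zero-pad an S x M x 3 motion feature image symmetrically on the left
    and right so that it becomes S x S x 3."""
    s, m, _ = image.shape
    pad = (s - m) // 2
    return np.pad(image, ((0, 0), (pad, pad), (0, 0)), mode="constant")

feature_image = np.zeros((32, 20, 3), dtype=np.uint8)
print(pad_to_square(feature_image).shape)  # (32, 32, 3)
```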
- an action recognition model is obtained through training based on a reference motion feature image, so that a calculation amount in a model training process is reduced while recognition accuracy is ensured.
- An embodiment of this application provides an action recognition method. That an action recognition device performs this application is used as an example. Referring to FIG. 11 , a method procedure provided in this embodiment of this application includes the following steps.
- the action recognition device obtains a to-be-recognized motion feature image.
- the to-be-recognized motion feature image is an image obtained by encoding a plurality of groups of to-be-recognized human skeleton data of a to-be-recognized action.
- the action recognition device may perform the following steps to obtain the to-be-recognized motion feature image:
- the action recognition device collects the plurality of groups of to-be-recognized human skeleton data of performing the to-be-recognized action, where each group of to-be-recognized human skeleton data includes joint point data of performing the to-be-recognized action.
- the action recognition device extracts, based on joint point data in the plurality of groups of to-be-recognized human skeleton data, a to-be-recognized motion feature matrix corresponding to the plurality of groups of to-be-recognized human skeleton data.
- the to-be-recognized motion feature matrix includes a to-be-recognized linear velocity matrix
- the joint point data includes coordinates of a corresponding joint point in a three-dimensional coordinate system. That the action recognition device extracts, based on joint point data in the plurality of groups of to-be-recognized human skeleton data, a to-be-recognized motion feature matrix corresponding to the plurality of groups of to-be-recognized human skeleton data includes the following steps.
- Step 1 The action recognition device subtracts coordinates of a joint point in a first group of to-be-recognized human skeleton data in the three-dimensional coordinate system from coordinates of the corresponding joint point in a second group of to-be-recognized human skeleton data in the three-dimensional coordinate system to obtain to-be-recognized linear velocity units of the first group of to-be-recognized human skeleton data.
- the first group of to-be-recognized human skeleton data and the second group of to-be-recognized human skeleton data are any two adjacent groups of to-be-recognized human skeleton data in the plurality of groups of to-be-recognized human skeleton data, and the first group of to-be-recognized human skeleton data is a previous group of to-be-recognized human skeleton data of the second group of to-be-recognized human skeleton data.
- Step 2 The action recognition device forms, by using all the obtained to-be-recognized linear velocity units, a to-be-recognized linear velocity matrix corresponding to the plurality of groups of to-be-recognized human skeleton data.
- the to-be-recognized motion feature matrix includes a to-be-recognized angular velocity matrix
- the joint point data includes coordinates of a corresponding joint point in a three-dimensional coordinate system. That the action recognition device extracts, based on joint point data in the plurality of groups of to-be-recognized human skeleton data, a to-be-recognized motion feature matrix corresponding to the plurality of groups of to-be-recognized human skeleton data includes the following steps.
- Step 1 The action recognition device calculates direction angles of a joint point in the plurality of groups of to-be-recognized human skeleton data in the three-dimensional coordinate system based on a coordinate matrix corresponding to the plurality of groups of to-be-recognized human skeleton data.
- Step 2 The action recognition device subtracts direction angles of a joint point in a first group of to-be-recognized human skeleton data in the three-dimensional coordinate system from direction angles of the corresponding joint point in a second group of to-be-recognized human skeleton data in the three-dimensional coordinate system to obtain to-be-recognized angular velocity units.
- the first group of to-be-recognized human skeleton data and the second group of to-be-recognized human skeleton data are any two adjacent groups of to-be-recognized human skeleton data, and the first group of to-be-recognized human skeleton data is a previous group of to-be-recognized human skeleton data of the second group of to-be-recognized human skeleton data.
- Step 3 The action recognition device forms, by using all the obtained to-be-recognized angular velocity units, a to-be-recognized angular velocity matrix corresponding to the plurality of groups of to-be-recognized human skeleton data.
- the action recognition device encodes the to-be-recognized motion feature matrix to obtain a to-be-recognized motion feature image.
- the action recognition device may perform the following steps 1101211 to 1101213 to encode the to-be-recognized motion feature matrix to obtain a to-be-recognized motion feature image.
- the action recognition device preprocesses the to-be-recognized linear velocity matrix.
- the action recognition device may obtain a maximum to-be-recognized linear velocity element value and a minimum to-be-recognized linear velocity element value in the to-be-recognized linear velocity matrix, and perform normalization processing on each to-be-recognized linear velocity element value in the to-be-recognized linear velocity matrix based on the maximum to-be-recognized linear velocity element value and the minimum to-be-recognized linear velocity element value, to obtain a normalized to-be-recognized linear velocity matrix.
- Each to-be-recognized linear velocity element value in the normalized to-be-recognized linear velocity matrix is between a first value and a second value, and the first value is less than the second value.
- the first value is 0, and the second value is 255.
- the action recognition device encodes a plurality of linear velocity units in the preprocessed to-be-recognized linear velocity matrix to obtain a plurality of to-be-recognized linear velocity pixel frames.
- the action recognition device uses coordinates of a joint point in each preprocessed to-be-recognized linear velocity unit in the three-dimensional coordinate system as image channels, and encodes a plurality of preprocessed to-be-recognized linear velocity units to obtain a plurality of to-be-recognized linear velocity pixel frames.
- the action recognition device may further extract a plurality of to-be-recognized key linear velocity pixel frames from a plurality of to-be-recognized linear velocity pixel frames, and then form a to-be-recognized linear velocity image by using the plurality of to-be-recognized key linear velocity pixel frames.
- the plurality of to-be-recognized key linear velocity pixel frames may be extracted from the plurality of to-be-recognized linear velocity pixel frames by using the following steps.
- Step 1 The action recognition device calculates linear velocity energy change values of the plurality of to-be-recognized linear velocity pixel frames based on the preprocessed to-be-recognized linear velocity matrix.
- the action recognition device may calculate a quadratic sum of coordinates of each joint point in a first to-be-recognized linear velocity pixel frame in the three-dimensional coordinate system, and add up the quadratic sums of the coordinates of all the joint points in the three-dimensional coordinate system to obtain a linear velocity energy value of the first to-be-recognized linear velocity pixel frame.
- the action recognition device calculates a quadratic sum of coordinates of each joint point in a second to-be-recognized linear velocity pixel frame in the three-dimensional coordinate system, and adds up the quadratic sums of the coordinates of all the joint points in the three-dimensional coordinate system to obtain a linear velocity energy value of the second to-be-recognized linear velocity pixel frame. Further, the action recognition device subtracts the linear velocity energy value of the first to-be-recognized linear velocity pixel frame from the linear velocity energy value of the second to-be-recognized linear velocity pixel frame to obtain a linear velocity energy change value of the first to-be-recognized linear velocity pixel frame.
- the first to-be-recognized linear velocity pixel frame and the second to-be-recognized linear velocity pixel frame are any two adjacent to-be-recognized linear velocity pixel frames, and the first to-be-recognized linear velocity pixel frame is a previous to-be-recognized linear velocity pixel frame of the second to-be-recognized linear velocity pixel frame.
- Step 2 The action recognition device extracts a plurality of to-be-recognized key linear velocity pixel frames from the plurality of to-be-recognized linear velocity pixel frames in descending order of the linear velocity energy change values.
- the action recognition device forms a to-be-recognized linear velocity image by using the plurality of to-be-recognized linear velocity pixel frames.
- the action recognition device may form a to-be-recognized linear velocity image by using the plurality of to-be-recognized key linear velocity pixel frames.
- the action recognition device may perform the following steps 1101221 to 1101223 to encode the to-be-recognized motion feature matrix to obtain a to-be-recognized motion feature image.
- the action recognition device preprocesses the to-be-recognized angular velocity matrix.
- the action recognition device may obtain a maximum to-be-recognized angular velocity element value and a minimum to-be-recognized angular velocity element value in the to-be-recognized angular velocity matrix, and perform normalization processing on each to-be-recognized angular velocity element value in the to-be-recognized angular velocity matrix based on the maximum to-be-recognized angular velocity element value and the minimum to-be-recognized angular velocity element value, to obtain a normalized to-be-recognized angular velocity matrix.
- Each to-be-recognized angular velocity element value in the normalized to-be-recognized angular velocity matrix is between a first value and a second value, and the first value is less than the second value.
- the first value is 0, and the second value is 255.
- the action recognition device encodes a plurality of angular velocity units in the preprocessed to-be-recognized angular velocity matrix to obtain a plurality of to-be-recognized angular velocity pixel frames.
- the action recognition device uses direction angles of a joint point in each preprocessed to-be-recognized angular velocity unit in the three-dimensional coordinate system as image channels, and encodes a plurality of preprocessed to-be-recognized angular velocity units to obtain a plurality of to-be-recognized angular velocity pixel frames.
- the action recognition device may further extract a plurality of to-be-recognized key angular velocity pixel frames from a plurality of to-be-recognized angular velocity pixel frames, and then form a to-be-recognized angular velocity image by using the plurality of to-be-recognized key angular velocity pixel frames.
- the plurality of to-be-recognized key angular velocity pixel frames may be extracted from the plurality of to-be-recognized angular velocity pixel frames by using the following steps.
- Step 1 The action recognition device calculates angular velocity energy change values of the plurality of to-be-recognized angular velocity pixel frames based on the preprocessed to-be-recognized angular velocity matrix.
- the action recognition device may calculate a quadratic sum of direction angles of each joint point in a first to-be-recognized angular velocity pixel frame in the three-dimensional coordinate system, and add up the quadratic sums of the direction angles of all the joint points in the three-dimensional coordinate system to obtain an angular velocity energy value of the first to-be-recognized angular velocity pixel frame.
- the action recognition device calculates a quadratic sum of direction angles of each joint point in a second to-be-recognized angular velocity pixel frame in the three-dimensional coordinate system, and adds up the quadratic sums of the direction angles of all the joint points in the three-dimensional coordinate system to obtain an angular velocity energy value of the second to-be-recognized angular velocity pixel frame.
- the action recognition device subtracts the angular velocity energy value of the first to-be-recognized angular velocity pixel frame from the angular velocity energy value of the second to-be-recognized angular velocity pixel frame to obtain an angular velocity energy change value of the first to-be-recognized angular velocity pixel frame.
- the first to-be-recognized angular velocity pixel frame and the second to-be-recognized angular velocity pixel frame are any two adjacent to-be-recognized angular velocity pixel frames, and the first to-be-recognized angular velocity pixel frame is a previous to-be-recognized angular velocity pixel frame of the second to-be-recognized angular velocity pixel frame.
- Step 2 The action recognition device extracts a plurality of to-be-recognized key angular velocity pixel frames from the plurality of to-be-recognized angular velocity pixel frames in descending order of the angular velocity energy change values.
- the action recognition device forms a to-be-recognized angular velocity image by using the plurality of to-be-recognized angular velocity pixel frames.
- the action recognition device may form a to-be-recognized angular velocity image by using the plurality of to-be-recognized key angular velocity pixel frames.
- the action recognition device recognizes the to-be-recognized motion feature image based on an action recognition model, to obtain a recognition result.
- the action recognition model is obtained through training based on a plurality of reference motion feature images respectively corresponding to a plurality of types of actions and identifiers of the plurality of types of actions, and the recognition result is used to indicate an action type of the to-be-recognized action.
- the action recognition device may input the to-be-recognized motion feature image into the action recognition model to obtain the recognition result.
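- A minimal sketch of this recognition step, assuming a trained PyTorch model such as the one sketched earlier, a to-be-recognized motion feature image stored as an H×W×3 array with values in [0, 255], and an illustrative label list; the preprocessing details are assumptions.

```python
import torch

def recognize_action(model, feature_image, action_labels):
    """Return the predicted action label and its probability for one
    to-be-recognized motion feature image."""
    x = torch.from_numpy(feature_image).float().div(255.0)
    x = x.permute(2, 0, 1).unsqueeze(0)            # -> (1, 3, H, W)
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)
    idx = int(probs.argmax(dim=1))
    return action_labels[idx], float(probs[0, idx])
```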
- When the recognition result indicates that the to-be-recognized action is a dangerous action, the action recognition device sends a warning to warn another user, so as to avoid a dangerous event.
- the to-be-recognized motion feature image is obtained, and then the to-be-recognized motion feature image is recognized based on the established action recognition model, so as to obtain the recognition result of the to-be-recognized action. Because a data amount of the motion feature image is smaller than a data amount of a plurality of action feature vector sequences, storage resources and calculation resources are greatly saved while recognition accuracy is ensured.
- An embodiment of this application provides an image coding apparatus. The apparatus includes a data obtaining unit 1201, a feature extraction unit 1202, and a feature coding unit 1203.
- the data obtaining unit 1201 is configured to perform step 301 in FIG. 3 .
- the feature extraction unit 1202 is configured to perform step 302 in FIG. 3 .
- the feature coding unit 1203 is configured to perform step 303 in FIG. 3 .
- An embodiment of this application provides an action recognition model training apparatus.
- the apparatus includes an image obtaining unit 1301 and a model training unit 1302 .
- the image obtaining unit 1301 is configured to perform step 901 in FIG. 9.
- the model training unit 1302 is configured to perform step 902 in FIG. 9.
- An embodiment of this application provides an action recognition apparatus.
- the apparatus includes an image obtaining unit 1401 and an image recognition unit 1402 .
- the image obtaining unit 1401 is configured to perform step 1101 in FIG. 11 .
- the image recognition unit 1402 is configured to perform step 1102 in FIG. 11.
- FIG. 15 shows a computer device 1500 used in an embodiment of this application.
- the computer device 1500 includes a processor 1501, a memory 1502, a communications interface 1503, and a bus 1504.
- the processor 1501 , the memory 1502 , and the communications interface 1503 are connected to each other by using the bus 1504 .
- the computer device 1500 may be configured to perform the image coding method in FIG. 3, the action recognition model training method in FIG. 9, or the action recognition method in FIG. 11.
- the memory 1502 includes a computer storage medium.
- the computer storage medium includes volatile, nonvolatile, movable, and unmovable media that are configured to store information such as a computer-readable instruction, a data structure, a program module, or other data and that are implemented in any method or technology.
- the computer storage medium includes a RAM, a ROM, an EPROM, an EEPROM, a flash memory or another solid-state storage technology, a CD-ROM, a DVD or another optical storage, a cassette, a magnetic tape, a magnetic disk storage or another magnetic storage device.
- the computer device 1500 may be further connected by using a network such as the Internet to a remote computer on a network for running.
- the computer device 1500 may be connected to the network by using a network interface unit 1505 connected to the bus 1504 , or may be connected to another type of network or a remote computer system (not shown) by using a network interface unit 1505 .
- An embodiment of this application provides a computer-readable storage medium.
- the storage medium includes at least one instruction.
- When the instruction is run on a computer device, the computer device is enabled to perform the image coding method in FIG. 3, the action recognition model training method in FIG. 9, or the action recognition method in FIG. 11.
- the foregoing function module division is merely an example for description.
- the foregoing functions may be allocated to different function modules for implementation according to a requirement, that is, an internal structure of the device is divided into different function modules, so as to complete all or some of the functions described above.
- the image coding method, the action recognition model training method, the action recognition method, the image coding apparatus, the action recognition model training apparatus, the action recognition apparatus, and the computer device provided in the foregoing embodiments belong to a same concept. For a specific implementation process, refer to the method embodiments. Details are not described herein again.
- the program may be stored in a computer-readable storage medium.
- the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Signal Processing (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Robotics (AREA)
- Image Analysis (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/716,665 US11825115B2 (en) | 2017-12-19 | 2022-04-08 | Image coding method, action recognition method, and action recognition apparatus |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711378734.3 | 2017-12-19 | ||
CN201711378734.3A CN109934881B (zh) | 2017-12-19 | 2017-12-19 | 图像编码方法、动作识别的方法及计算机设备 |
PCT/CN2018/120337 WO2019120108A1 (zh) | 2017-12-19 | 2018-12-11 | 图像编码方法、动作识别的方法及计算机设备 |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/120337 Continuation WO2019120108A1 (zh) | 2017-12-19 | 2018-12-11 | 图像编码方法、动作识别的方法及计算机设备 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/716,665 Continuation US11825115B2 (en) | 2017-12-19 | 2022-04-08 | Image coding method, action recognition method, and action recognition apparatus |
Publications (2)
Publication Number | Publication Date |
---|---|
US20200322626A1 (en) | 2020-10-08 |
US11303925B2 (en) | 2022-04-12 |
Family
ID=66984194
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/903,938 Active US11303925B2 (en) | 2017-12-19 | 2020-06-17 | Image coding method, action recognition method, and action recognition apparatus |
US17/716,665 Active US11825115B2 (en) | 2017-12-19 | 2022-04-08 | Image coding method, action recognition method, and action recognition apparatus |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/716,665 Active US11825115B2 (en) | 2017-12-19 | 2022-04-08 | Image coding method, action recognition method, and action recognition apparatus |
Country Status (4)
Country | Link |
---|---|
US (2) | US11303925B2 (zh) |
EP (1) | EP3716212A4 (zh) |
CN (1) | CN109934881B (zh) |
WO (1) | WO2019120108A1 (zh) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111079535B (zh) * | 2019-11-18 | 2022-09-16 | Huazhong University of Science and Technology | Human skeleton action recognition method, apparatus, and terminal |
CN111267099B (zh) * | 2020-02-24 | 2023-02-28 | Southeast University | Companion machine control system based on virtual reality |
US20210312236A1 (en) * | 2020-03-30 | 2021-10-07 | Cherry Labs, Inc. | System and method for efficient machine learning model training |
CN113705284A (zh) * | 2020-05-22 | 2021-11-26 | Hangzhou Ezviz Software Co., Ltd. | Climbing recognition method, apparatus, and camera |
CN111754619B (zh) * | 2020-06-29 | 2024-07-02 | Wuhan Donglv Technology Co., Ltd. | Skeleton spatial data collection method, collection apparatus, electronic device, and storage medium |
CN113971230A (zh) * | 2020-07-24 | 2022-01-25 | Beijing Dajia Internet Information Technology Co., Ltd. | Action search method, apparatus, electronic device, and storage medium |
CN112446313A (zh) * | 2020-11-20 | 2021-03-05 | Shandong University | Volleyball action recognition method based on an improved dynamic time warping algorithm |
CN112507870A (zh) * | 2020-12-08 | 2021-03-16 | Nanjing Daiwei Technology Co., Ltd. | Behavior recognition method and system based on human skeleton extraction technology |
US11854305B2 (en) | 2021-05-09 | 2023-12-26 | International Business Machines Corporation | Skeleton-based action recognition using bi-directional spatial-temporal transformer |
WO2023122543A1 (en) * | 2021-12-20 | 2023-06-29 | Canon U.S.A., Inc. | Apparatus and method for gesture recognition stabilization |
WO2024038517A1 (ja) * | 2022-08-17 | 2024-02-22 | NEC Corporation | Video processing system, video processing method, and image quality control device |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1619037B1 (en) * | 2003-04-30 | 2011-07-06 | Mitsubishi Kagaku Media Co., Ltd. | Phase-change recording material and information recording medium |
JP4735795B2 (ja) * | 2003-12-26 | 2011-07-27 | Japan Aerospace Exploration Agency | Control method for redundant manipulator |
US8786680B2 (en) * | 2011-06-21 | 2014-07-22 | Disney Enterprises, Inc. | Motion capture from body mounted cameras |
US20140347479A1 (en) * | 2011-11-13 | 2014-11-27 | Dor Givon | Methods, Systems, Apparatuses, Circuits and Associated Computer Executable Code for Video Based Subject Characterization, Categorization, Identification, Tracking, Monitoring and/or Presence Response |
US9826923B2 (en) * | 2013-10-31 | 2017-11-28 | Roshanak Houmanfar | Motion analysis method |
JP6760490B2 (ja) * | 2017-04-10 | 2020-09-23 | Fujitsu Limited | Recognition device, recognition method, and recognition program |
JP2019036899A (ja) * | 2017-08-21 | 2019-03-07 | Toshiba Corporation | Information processing device, information processing method, and program |
JP7409390B2 (ja) * | 2019-10-03 | 2024-01-09 | Fujitsu Limited | Motion recognition method, motion recognition program, and information processing device |
- 2017
  - 2017-12-19 CN CN201711378734.3A patent/CN109934881B/zh active Active
- 2018
  - 2018-12-11 EP EP18891958.3A patent/EP3716212A4/en active Pending
  - 2018-12-11 WO PCT/CN2018/120337 patent/WO2019120108A1/zh unknown
- 2020
  - 2020-06-17 US US16/903,938 patent/US11303925B2/en active Active
- 2022
  - 2022-04-08 US US17/716,665 patent/US11825115B2/en active Active
Patent Citations (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040119716A1 (en) | 2002-12-20 | 2004-06-24 | Chang Joon Park | Apparatus and method for high-speed marker-free motion capture |
US20100271615A1 (en) * | 2009-02-20 | 2010-10-28 | Digital Signal Corporation | System and Method for Generating Three Dimensional Images Using Lidar and Video Measurements |
CN101763647A (zh) | 2010-02-02 | 2010-06-30 | 浙江大学 | 一种基于关键帧的实时摄像机跟踪方法 |
US20110228976A1 (en) * | 2010-03-19 | 2011-09-22 | Microsoft Corporation | Proxy training data for human body tracking |
US20120057761A1 (en) | 2010-09-01 | 2012-03-08 | Sony Corporation | Three dimensional human pose recognition method and apparatus |
CN102663449A (zh) | 2012-03-12 | 2012-09-12 | 西安电子科技大学 | 基于最大几何流向直方图的人体运动跟踪方法 |
CN103310191A (zh) | 2013-05-30 | 2013-09-18 | 上海交通大学 | 运动信息图像化的人体动作识别方法 |
US20150117540A1 (en) | 2013-10-29 | 2015-04-30 | Sony Corporation | Coding apparatus, decoding apparatus, coding data, coding method, decoding method, and program |
US20160328604A1 (en) * | 2014-01-07 | 2016-11-10 | Arb Labs Inc. | Systems and methods of monitoring activities at a gaming venue |
US20150367174A1 (en) * | 2014-06-19 | 2015-12-24 | Sumitomo Rubber Industries, Ltd. | Golf swing analysis apparatus and golf club fitting apparatus |
US20160042227A1 (en) | 2014-08-06 | 2016-02-11 | BAE Systems Information and Electronic Systems Integraton Inc. | System and method for determining view invariant spatial-temporal descriptors for motion detection and analysis |
US20160100165A1 (en) | 2014-10-03 | 2016-04-07 | Microsoft Technology Licensing, Llc | Adapting Encoding Properties |
CN104573665A (zh) | 2015-01-23 | 2015-04-29 | 北京理工大学 | 一种基于改进维特比算法的连续动作识别方法 |
CN104850846A (zh) | 2015-06-02 | 2015-08-19 | 深圳大学 | 一种基于深度神经网络的人体行为识别方法及识别系统 |
US9633282B2 (en) | 2015-07-30 | 2017-04-25 | Xerox Corporation | Cross-trained convolutional neural networks using multimodal images |
US9600717B1 (en) * | 2016-02-25 | 2017-03-21 | Zepp Labs, Inc. | Real-time single-view action recognition based on key pose analysis for sports videos |
CN105930767A (zh) | 2016-04-06 | 2016-09-07 | 南京华捷艾米软件科技有限公司 | 一种基于人体骨架的动作识别方法 |
CN106056035A (zh) | 2016-04-06 | 2016-10-26 | 南京华捷艾米软件科技有限公司 | 一种基于体感技术的幼儿园智能监控方法 |
CN106384093A (zh) | 2016-09-13 | 2017-02-08 | 东北电力大学 | 一种基于降噪自动编码器和粒子滤波的人体动作识别方法 |
CN106647282A (zh) | 2017-01-19 | 2017-05-10 | 北京工业大学 | 一种考虑末端运动误差的六自由度机器人轨迹规划方法 |
CN106897670A (zh) | 2017-01-19 | 2017-06-27 | 南京邮电大学 | 一种基于计算机视觉的快递暴力分拣识别方法 |
CN107301370A (zh) | 2017-05-08 | 2017-10-27 | 上海大学 | 一种基于Kinect三维骨架模型的肢体动作识别方法 |
US20200178851A1 (en) * | 2017-07-10 | 2020-06-11 | Georgia Tech Research Corporation | Systems and methods for tracking body movement |
Non-Patent Citations (11)
Title |
---|
Extended European Search Report issued in European Application No. 18891958.3 dated Jan. 14, 2021, 12 pages. |
Jing et al., "Sudden violence identification algorithm based on motion feature", Journal of Computer Applications, vol. 31 No. 2, Feb. 2011, 4 pages (With English translation). |
Liu et al., "Enhanced skeleton visualization for view invariant human action recognition," Pattern Recognition, vol. 68, Mar. 3, 2017, 18 pages. |
Lou et al., "A novel scheme of ROI detection and transcoding for mobile devices in high-definition videoconferencing," Proceedings of the 5th Workshop on Mobile Video, Feb. 27, 2013, 6 pages. |
Naka et al., "A Compression/Decompression Method for Streaming Based Humanoid Animation," Proceedings from Fourth Symposium on the Virtual Reality Modeling Language 1999, Feb. 23, 1999, 8 pages. |
Office Action issued in Chinese Application No. 201711378734.3 dated Jun. 3, 2020, 23 pages (With English Translation). |
Office Action issued in Chinese Application No. 201711378734.3 dated May 8, 2021, 28 pages (with English translation). |
PCT International Search Report and Written Opinion in International Application No. PCT/CN2018/120337, dated Mar. 6, 2019, 17 pages (With English Translation).
Wu, "Human Behavior Recognition Based on Multi-sensor Devices". Nanjing University of Science and Technology, Jul. 2014, 71 pages (With English Abstract). |
Ye et al., "A Survey on Human Motion Analysis from Depth Data," Big Data Analytics in the Social and Ubiquitous Context: 5th International Workshop On Modeling Social Media, Jan. 1, 2013, 40 pages.
Zou et al., "Human Action Recognition by Mining Discriminative Segment with Novel Skeleton Joint Feature," Big Data Analytics in the Social and Ubiquitous Context, 5th International Workshop On Modeling Social Media, Dec. 13, 2013, 12 pages. |
Also Published As
Publication number | Publication date |
---|---|
EP3716212A4 (en) | 2021-01-27 |
US11825115B2 (en) | 2023-11-21 |
CN109934881A (zh) | 2019-06-25 |
US20220232247A1 (en) | 2022-07-21 |
CN109934881B (zh) | 2022-02-18 |
US20200322626A1 (en) | 2020-10-08 |
EP3716212A1 (en) | 2020-09-30 |
WO2019120108A1 (zh) | 2019-06-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
AS | Assignment |
Owner name: HUAWEI TECHNOLOGIES CO., LTD., CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WANG, JIANGLIU;YUAN, KEBIN;LIU, YUNHUI;SIGNING DATES FROM 20200513 TO 20200716;REEL/FRAME:054107/0330 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: ADVISORY ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |