CN110858277A - Method and device for obtaining attitude classification model

Method and device for obtaining attitude classification model

Info

Publication number
CN110858277A
Authority
CN
China
Prior art keywords
image
posture
gesture
preset
classification model
Prior art date
Legal status
Pending
Application number
CN201810958437.4A
Other languages
Chinese (zh)
Inventor
邵长东
姚迪狄
吴志华
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201810958437.4A
Publication of CN110858277A

Classifications

    • G06V40/20 Recognition of biometric, human-related or animal-related patterns in image or video data: movements or behaviour, e.g. gesture recognition
    • G06F18/214 Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Pattern recognition: classification techniques
    • G06T7/215 Image analysis, analysis of motion: motion-based segmentation
    • G06T7/246 Image analysis: analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/254 Image analysis: analysis of motion involving subtraction of images
    • G06V10/44 Extraction of image or video features: local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V10/462 Extraction of image or video features: salient features, e.g. scale invariant feature transforms [SIFT]


Abstract

The application discloses a method and a device for obtaining a posture classification model. The method comprises the following steps: performing posture recognition on an image containing a predetermined posture through a key point detection model to obtain a posture label of the image containing the predetermined posture; and performing model training according to the image containing the predetermined posture and the posture label to obtain a posture classification model. The posture classification model obtained by the method has a simpler network structure and lower requirements on computing resources, and can run on mobile devices in real time. Furthermore, the posture classification model recognizes the human body posture through a posture recognition and matching process alone, without detecting human skeleton key points, so the recognition process is simple, the recognition efficiency is high, and the processing time of a single-frame image is reduced.

Description

Method and device for obtaining attitude classification model
Technical Field
The application relates to the field of computer vision, and in particular to a method for obtaining a posture classification model. The application also relates to a device for obtaining the posture classification model and an electronic device. The application also relates to a posture recognition method, a posture recognition device and an electronic device. The application further relates to another posture recognition method, another posture recognition device and another electronic device.
Background
With the progress of technology and the development of market, intelligent devices based on computer vision are widely used, for example, various monitoring devices and intelligent game devices, which need to accurately analyze and recognize the posture information of a target object, so as to achieve the purpose of monitoring the target object or interacting with the target object.
At the present stage, human body posture recognition is an important research direction in the field of computer vision, is widely applied to scenes such as input of motion sensing games, fall detection, identity recognition, control of intelligent equipment and the like, and is substantially the positioning of key points of a human body.
At present, mainstream gesture recognition models are based on skeletal key points, such as the AlphaPose and OpenPose models. These models are complex and demand substantial computing resources, so it is difficult for them to run in real time at a high frame rate (for example, above 24 FPS) on mobile devices. Taking OpenPose as an example, the algorithm model is more than 200 MB in size; when it runs on a mobile device, it can only run at a low frame rate even with the support of a powerful graphics processor, and the real-time performance of gesture recognition cannot be ensured. In addition, the human body posture recognition process of a skeleton-key-point-based recognition model requires two stages, skeleton key point detection and posture recognition matching, so the recognition process is complex and the recognition efficiency is low.
Disclosure of Invention
The application provides a method for obtaining a posture classification model, which aims to solve the problems that existing mobile devices cannot ensure the real-time performance of posture recognition, and that posture recognition models based on skeleton key points have a complex recognition process and low recognition efficiency. The application further provides a device for obtaining the posture classification model and an electronic device. The application also provides a gesture recognition method, a gesture recognition device and an electronic device. The application further provides another gesture recognition method, another gesture recognition device and another electronic device.
The application provides a method for obtaining a posture classification model, which comprises the following steps:
performing gesture recognition on an image containing a preset gesture through a key point detection model to obtain a gesture tag of the image containing the preset gesture;
and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
Optionally, the performing model training according to the image containing the predetermined pose and the pose tag includes:
performing characterization processing on the image containing the preset posture to obtain the posture characteristic of the image containing the preset posture;
and taking the posture features and the posture labels as training samples, and carrying out model training on a preset classification model.
Optionally, the predetermined classification model is a trained image classification model, and performing model training on the predetermined classification model by using the posture features and the posture labels as training samples includes:
and carrying out transfer learning on the trained image classification model according to the training sample to obtain a posture classification model.
Optionally, the characterizing the image containing the predetermined pose includes:
converting the image containing the preset gesture into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Optionally, the extracting a moving target based on the Y component data of the YUV image includes:
extracting a moving target through an inter-frame difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through a background difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through an optical flow method based on the Y component data of the YUV image.
Optionally, the performing pose recognition on the image including the predetermined pose by using the keypoint detection model includes:
performing key point detection on the image containing the preset posture through a key point detection model to obtain key points in the image containing the preset posture;
and performing gesture recognition on the key points through a motion matching algorithm to obtain the gesture tag of the image containing the preset gesture.
The application also provides a gesture recognition method, which comprises the following steps:
acquiring an image to be recognized, which needs gesture recognition;
carrying out attitude classification on the image to be recognized through an attitude classification model to obtain an attitude classification result of the image to be recognized;
wherein the posture classification model is obtained by:
performing gesture recognition on an image containing a preset gesture through a key point detection model to obtain a gesture tag of the image containing the preset gesture;
and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
Optionally, the performing posture classification on the image to be recognized through the posture classification model includes:
performing characterization processing on the image to be recognized to obtain attitude characteristics contained in the image to be recognized;
and inputting the posture characteristics contained in the image to be recognized into the posture classification model for posture classification.
Optionally, the characterizing the image to be recognized includes:
converting the image to be identified into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Optionally, the extracting a moving target based on the Y component data of the YUV image includes:
extracting a moving target through an inter-frame difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through a background difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through an optical flow method based on the Y component data of the YUV image.
The application also provides a gesture recognition method, which comprises the following steps:
performing gesture recognition on an image containing a preset gesture through a key point detection model to obtain a gesture tag of the image containing the preset gesture;
performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model;
and carrying out attitude classification on the image to be recognized through the attitude classification model to obtain an attitude classification result of the image to be recognized.
Optionally, the performing model training according to the image containing the predetermined pose and the pose tag includes:
performing characterization processing on the image containing the preset posture to obtain the posture characteristic of the image containing the preset posture;
and taking the posture features and the posture labels as training samples, and carrying out model training on a preset classification model.
Optionally, the predetermined classification model is a trained image classification model, and performing model training on the predetermined classification model by using the posture features and the posture labels as training samples includes:
and carrying out transfer learning on the trained image classification model according to the training sample to obtain a posture classification model.
Optionally, the characterizing the image containing the predetermined pose includes:
converting the image containing the preset gesture into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Optionally, the extracting a moving target based on the Y component data of the YUV image includes:
extracting a moving target through an inter-frame difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through a background difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through an optical flow method based on the Y component data of the YUV image.
Optionally, the performing, by the gesture classification model, gesture classification on the image to be recognized includes:
performing characterization processing on the image to be recognized to obtain attitude characteristics contained in the image to be recognized;
inputting the posture characteristics contained in the image to be recognized into the posture classification model for posture classification;
the gesture features contained in the image to be recognized and the gesture features of the image containing the preset gesture are the same type of feature set.
Optionally, the characterizing the image to be recognized includes:
converting the image to be identified into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Optionally, the extracting a moving target based on the Y component data of the YUV image includes:
extracting a moving target through an inter-frame difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through a background difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through an optical flow method based on the Y component data of the YUV image.
Optionally, the performing pose recognition on the image including the predetermined pose by using the keypoint detection model includes:
performing key point detection on the image containing the preset posture through a key point detection model to obtain key points in the image containing the preset posture;
and performing gesture recognition on the key points through a motion matching algorithm to obtain the gesture tag of the image containing the preset gesture.
The present application further provides a device for obtaining a posture classification model, including:
the gesture tag obtaining unit is used for carrying out gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture;
and the posture classification model obtaining unit is used for carrying out model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
The present application further provides an electronic device, comprising:
a processor;
a memory for storing a program for obtaining a pose classification model, which when read and executed by the processor, performs the following operations:
performing gesture recognition on an image containing a preset gesture through a key point detection model to obtain a gesture tag of the image containing the preset gesture;
and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
The present application further provides a gesture recognition apparatus, including:
the device comprises a to-be-recognized image obtaining unit, a to-be-recognized image acquiring unit and a gesture recognizing unit, wherein the to-be-recognized image obtaining unit is used for obtaining an image to be recognized which needs gesture recognition;
and the gesture classification result obtaining unit is used for carrying out gesture classification on the image to be recognized through a gesture classification model to obtain a gesture classification result of the image to be recognized.
The present application further provides an electronic device, comprising:
a processor;
a memory for storing a gesture recognition program which, when read and executed by the processor, performs the following operations:
acquiring an image to be recognized, which needs gesture recognition;
and carrying out posture classification on the image to be recognized through a posture classification model to obtain a posture classification result of the image to be recognized.
The present application further provides a gesture recognition apparatus, including:
the gesture tag obtaining unit is used for carrying out gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture;
the attitude classification model obtaining unit is used for carrying out model training according to the image containing the preset attitude and the attitude label to obtain an attitude classification model;
and the attitude classification result obtaining unit is used for carrying out attitude classification on the image to be recognized through the attitude classification model to obtain an attitude classification result of the image to be recognized.
The present application further provides an electronic device, comprising:
a processor;
a memory for storing a gesture recognition program which, when read and executed by the processor, performs the following operations:
performing gesture recognition on an image containing a preset gesture through a key point detection model to obtain a gesture tag of the image containing the preset gesture;
performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model;
and carrying out attitude classification on the image to be recognized through the attitude classification model to obtain an attitude classification result of the image to be recognized.
Compared with the prior art, the method has the following advantages:
according to the method for obtaining the posture classification model, the posture of the image containing the preset posture is identified through the key point detection model, and the posture label of the image containing the preset posture is obtained; and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model. The posture classification model obtained by the method has a simpler network structure and lower requirements on computing resources, and can run on mobile equipment in real time; in addition, the process of recognizing the human body posture by the posture classification model is only a process of posture recognition matching, human skeleton key point detection is not needed, the recognition process is simple, the recognition efficiency is high, and the processing time of a single-frame image is reduced.
Drawings
FIG. 1 is a flow chart of a method provided in a first embodiment of the present application;
FIG. 2 is a flow chart of a model training method provided in a first embodiment of the present application;
FIG. 3 is a flow chart of a method provided by a second embodiment of the present application;
FIG. 4 is a flow chart of gesture classification provided by a second embodiment of the present application;
FIG. 5 is a flow chart of a method provided by a third embodiment of the present application;
FIG. 6 is a flow chart of a model training method provided in a third embodiment of the present application;
FIG. 7 is a block diagram of the apparatus unit provided in the fourth embodiment of the present application;
fig. 8 is a schematic diagram of an electronic device provided in a fifth embodiment of the present application;
FIG. 9 is a block diagram of the apparatus provided in the sixth embodiment of the present application;
FIG. 10 is a schematic diagram of an electronic device provided by a seventh embodiment of the present application;
FIG. 11 is a block diagram of the apparatus unit provided in the eighth embodiment of the present application;
fig. 12 is a schematic diagram of an electronic device according to a ninth embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the present application can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the present application; therefore, the present application is not limited to the specific implementations disclosed below.
For existing gesture recognition scenarios such as motion sensing game input and limb action detection, and in order to improve the efficiency with which a gesture recognition model recognizes a target gesture and broaden the application range of such models, the application provides a method for obtaining a gesture classification model, a device corresponding to the method, and an electronic device. The application also provides a gesture recognition method, a device corresponding to that method, and an electronic device. The application further provides another gesture recognition method, a corresponding device, and an electronic device. The following embodiments explain the methods, apparatuses, and electronic devices in detail.
The first embodiment of the present application provides a method for obtaining a posture classification model, which can be applied to a mobile device terminal for recognizing a human posture. Fig. 1 is a flowchart of a method for obtaining a pose classification model according to a first embodiment of the present application, and the method according to this embodiment is described in detail below with reference to fig. 1. The following description refers to embodiments for the purpose of illustrating the principles of the methods, and is not intended to be limiting in actual use.
As shown in fig. 1, the method for obtaining a pose classification model provided in this embodiment includes the following steps:
s101, performing gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture.
Key point detection is one of the basic algorithms in the computer vision field: preset key points of a recognized object are detected through a predetermined detection algorithm, and the posture information of the recognized object can then be obtained by performing posture recognition on the detected key points. For example, in the widely applied detection of human skeleton key points (Pose Estimation), which covers both multi-person and single-person cases, important positions of the human body such as the wrist, elbow, shoulder and head are set as key points in advance. In this application, the detected key points can describe the skeleton information of the human body, and the detection method can be applied to many scenarios, such as intelligent video monitoring, patient monitoring systems, human-computer interaction, virtual reality, human animation, smart home, intelligent security, and assisted training of athletes.
In this embodiment, the key point detection model may be the skeleton key point detection model OpenPose, which is a real-time multi-person key point detection model.
The predetermined gesture refers to a gesture type preset in a specific application scenario. For example, for a dance application in a motion sensing game, a common flow is as follows: the user is instructed to stand at a preset position so that human skeleton key points can be detected and matched; the detected skeleton key points are then used to track and recognize the user's dance movements, and the recognized movements are compared with and scored against the dance movements preset by the machine. In this process, the preset dance movement is the predetermined gesture.
The source of the image containing the predetermined gesture may be a public data set or a private data set, for example, an image of a dance action pre-recorded by a camera, or an image in the public data set. The image type containing the predetermined pose may be an RGB image or a YUV image. In this embodiment, the image containing the predetermined gesture is an RGB image containing a predetermined dance motion.
In this embodiment, the above gesture recognition of the image including the predetermined gesture by the keypoint detection model may be performed as follows:
the key point detection is carried out on the image containing the preset human body posture through the skeleton key point detection model OpenPose, the key points in the image containing the preset human body posture are obtained, the process is substantially the position of each skeleton key point of the human body, and a foundation is provided for practical scenes such as follow-up further action recognition, action abnormity detection, intelligent monitoring, automatic driving and the like, and the method specifically can be as follows: the image containing the preset human body posture is used as the input of a skeleton key point detection model OpenPose, and the horizontal and vertical coordinates of each skeleton key point of the human body in the image are output; and performing gesture recognition on the obtained key points through a motion matching algorithm to obtain a recognition result, wherein the recognition result is the gesture label of the image containing the preset gesture.
And S102, performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
After the pose tag of the image containing the preset pose is obtained through the above steps, the step is used for performing model training on the image containing the preset pose and the pose tag of the image, and obtaining a pose classification model capable of classifying images of the same category of the image containing the preset pose.
Images of the same category as the image containing the predetermined gesture are images whose contained gestures belong to the same category as the predetermined gesture; for example, if the predetermined gesture is a dance movement preset in a motion sensing game, images of the same category can be captured images of the user's dance movements.
In this embodiment, the process of performing model training according to the image containing the predetermined pose and the pose tag as shown in fig. 2 includes the following steps:
and S1021, performing characterization processing on the image containing the preset posture to obtain the posture characteristic of the image containing the preset posture.
In this embodiment, the characterizing process of the image including the predetermined pose includes the following steps:
First, the image containing the predetermined pose is converted into a YUV image. A YUV image has three components: the "Y" component represents brightness (the gray value), and the "U" and "V" components represent chrominance and saturation and are used to specify the color of a pixel. In this embodiment, the image containing the predetermined pose is an RGB image. Compared with an RGB image, a YUV image occupies much less data storage space and data transmission bandwidth, and in the subsequent processing of this embodiment it is the texture of the image rather than its color that matters; therefore, the RGB image containing the predetermined pose is converted into a YUV image. Specifically, the conversion can be performed with the following formulas, and a minimal code sketch follows them:
Y=0.299R+0.587G+0.114B;
U=-0.147R-0.289G+0.436B;
V=0.615R-0.515G-0.100B.
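A minimal sketch of the conversion formulas above, applied channel-wise with NumPy; the array layout (H x W x 3) is an assumption made for illustration.

```python
import numpy as np

def rgb_to_yuv(rgb: np.ndarray) -> np.ndarray:
    # rgb: H x W x 3 array (R, G, B channels); applies the formulas above
    # channel-wise and returns an H x W x 3 YUV array.
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.147 * r - 0.289 * g + 0.436 * b
    v = 0.615 * r - 0.515 * g - 0.100 * b
    return np.stack([y, u, v], axis=-1)

# Only the Y (luma) plane is needed for the moving-target extraction below:
# y_plane = rgb_to_yuv(frame.astype(np.float32))[..., 0]
```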
and secondly, extracting a moving target based on the Y component data of the YUV image to obtain the contour data contained in the YUV image. The method for extracting the moving target based on the Y component data of the YUV image is more, for example, the moving target is extracted by a background difference algorithm, which specifically includes: selecting a proper background image, carrying out differential operation on the current frame and the background image to obtain a differential image, selecting a proper threshold value, and carrying out binarization on the differential image; the method can also extract the moving object by an optical flow method, and the method mainly aims to calculate an optical flow field, namely, under the condition of proper smoothness constraint, a motion field is estimated according to the space-time gradient of an image sequence, and the moving object and a scene are detected and segmented by analyzing the change of the motion field.
In this embodiment, the moving target is extracted by an inter-frame difference algorithm. Its basic principle is to extract the motion region in the image by thresholding the pixel-wise temporal difference between two or three adjacent frames of the image sequence. The specific method is as follows: the pixel values of adjacent frames are subtracted to obtain a difference image, and the difference image is binarized. When the ambient brightness does not change much, a pixel whose value changes less than a preset threshold is regarded as a background pixel; if the pixel values of an image region change significantly, the change is considered to be caused by a moving object in the image, the region is marked as foreground pixels, and the position of the moving object in the image can be determined from the marked pixel region. A minimal sketch of this step is shown below.
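A minimal sketch of the inter-frame difference step described above, operating on two adjacent Y (luma) frames; the threshold value is an assumption chosen only for illustration.

```python
import numpy as np

def frame_difference_mask(y_prev: np.ndarray, y_curr: np.ndarray,
                          threshold: float = 25.0) -> np.ndarray:
    # Binarize the absolute difference of two adjacent Y (luma) frames:
    # pixels whose change exceeds the preset threshold are marked as
    # foreground (moving target), the rest as background.
    diff = np.abs(y_curr.astype(np.float32) - y_prev.astype(np.float32))
    return (diff > threshold).astype(np.uint8)  # 1 = foreground, 0 = background

# mask = frame_difference_mask(y_frames[i - 1], y_frames[i])
# The marked foreground region gives the contour / position of the moving target.
```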
Finally, the contour data contained in the YUV image is normalized to obtain the posture features. The purpose of the normalization is to convert the YUV image into a corresponding unique standard form that is invariant to affine transformations such as translation, rotation and scaling. A partial sketch of this step is shown below.
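A partial sketch of the normalization step, handling translation and scale only (rotation invariance is omitted); the exact normalization scheme and the fixed output size are assumptions, since the application does not specify them.

```python
import numpy as np

def normalize_contour(mask: np.ndarray, size: int = 64) -> np.ndarray:
    # Crop the foreground to its bounding box (removes translation) and
    # resample it onto a fixed-size grid (removes scale), yielding a
    # standard-form feature map with values in [0, 1].
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return np.zeros((size, size), dtype=np.float32)
    crop = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1].astype(np.float32)
    rows = np.linspace(0, crop.shape[0] - 1, size).astype(int)
    cols = np.linspace(0, crop.shape[1] - 1, size).astype(int)
    return crop[np.ix_(rows, cols)]  # nearest-neighbour resample to a fixed grid

# feature = normalize_contour(mask)  # posture feature used as a training sample
```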
And S1022, performing model training on a preset classification model by taking the posture features and the posture labels as training samples.
After the pose features and the pose labels of the image containing the preset pose are obtained through the steps, the steps are used for performing model training by taking the pose features and the pose labels obtained through the steps as training samples.
The predetermined classification model refers to a preselected image classification model that has already been trained and has a complete image classification function, for example an image classification model trained on ImageNet images with the Caffe framework. In this embodiment, performing model training on the predetermined classification model with the pose features and pose labels as training samples means: performing transfer learning on the preselected, trained image classification model according to the pose features and pose labels of the images containing the predetermined poses, so as to obtain the new pose classification model required by this embodiment, which can classify images of the same category as the images containing the predetermined poses.
The transfer learning process is as follows: based on the preselected, trained image classification model with a complete image classification function, training is continued on the data set formed by the pose features and pose labels of the images containing the predetermined poses, and the network architecture and other aspects of the image classification model are adjusted according to the output requirements, so as to obtain a pose classification model capable of classifying images of the same category as the images containing the predetermined poses. A minimal sketch follows.
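A minimal transfer-learning sketch. The application only requires fine-tuning a pre-trained image classification model; PyTorch/torchvision and ResNet-18 are used here purely for illustration and are assumptions, not the framework or network named in the embodiment, as are the number of classes and the optimizer settings.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_POSE_CLASSES = 10  # number of predetermined postures (assumed)

# Load an ImageNet-pretrained backbone and replace its classifier head.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                      # freeze the pre-trained backbone
model.fc = nn.Linear(model.fc.in_features, NUM_POSE_CLASSES)  # new posture head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(features: torch.Tensor, labels: torch.Tensor) -> float:
    # features: batch of posture-feature images prepared to match the backbone's
    # expected input (3 x 224 x 224); labels: posture tags produced in step S101.
    optimizer.zero_grad()
    loss = criterion(model(features), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```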
In the method for obtaining a pose classification model provided in this embodiment, a key point detection model (for example, the skeleton key point detection model OpenPose) is used to perform pose recognition on an image containing a predetermined pose, obtaining a pose tag for that image; the image containing the predetermined pose is characterized to obtain its pose features; and the pose features and pose labels are used as training samples to perform transfer learning on a preselected, trained image classification model with a complete image classification function, obtaining the pose classification model required by this embodiment, which can classify images of the same category as the image containing the predetermined pose. Compared with existing pose recognition models based on skeleton key points, this pose classification model has a simpler network structure and lower requirements on computing resources, and can run on mobile devices in real time. In addition, the pose classification model recognizes the human body pose through a pose recognition and matching process alone, without detecting human skeleton key points, so the recognition process is simple, the recognition efficiency is high, and the processing time of a single-frame image is reduced.
The second embodiment of the present application provides a gesture recognition method, which is applicable to a gesture recognition scenario of a mobile device. Fig. 3 is a flowchart of a method provided in a second embodiment of the present application, and the method provided in this embodiment is described in detail below with reference to fig. 3.
As shown in fig. 3, the gesture recognition method provided in this embodiment includes the following steps:
s201, acquiring an image to be recognized, which needs gesture recognition.
The method comprises the step of acquiring the image to be recognized, which needs gesture recognition. The image to be recognized can be an image in any format containing posture information, such as a user limb action image captured in a motion sensing game.
S202, carrying out posture classification on the image to be recognized through a posture classification model to obtain a posture classification result of the image to be recognized.
The method comprises the following steps of carrying out gesture classification on the image to be recognized obtained in the previous step through a pre-trained gesture classification model, and obtaining a recognition result of the image to be recognized.
The posture classification model is obtained by the following steps: performing gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture; model training is performed according to the image with the predetermined posture and the posture label of the image to obtain a posture classification model, which is the posture classification model obtained in the first embodiment, and the detailed contents of this part refer to the related description provided in the first embodiment and are not repeated herein.
In this embodiment, the process of performing pose classification on the image to be recognized through the pose classification model is shown in fig. 4, and includes the following processes:
s2021, performing characterization processing on the image to be recognized to obtain the posture features contained in the image to be recognized.
In this embodiment, the characterizing process of the image to be recognized includes the following steps:
First, the image to be recognized is converted into a YUV image. A YUV image has three components: the "Y" component represents brightness (the gray value), and the "U" and "V" components represent chrominance and saturation and are used to specify the color of a pixel. The image to be recognized can be in any format that can be converted into a YUV image; in this embodiment it is an RGB image. Compared with an RGB image, a YUV image occupies much less data storage space and data transmission bandwidth, and in the subsequent processing it is mainly the texture of the image rather than its color that matters; therefore, the RGB image to be recognized is converted into a YUV image. Specifically, the conversion can be performed with the following formulas:
Y=0.299R+0.587G+0.114B;
U=-0.147R-0.289G+0.436B;
V=0.615R-0.515G-0.100B.
and secondly, extracting a moving target based on the Y component data of the YUV image to obtain the contour data contained in the YUV image. For example, the moving object may be extracted by a background difference algorithm, specifically: selecting a proper background image, carrying out differential operation on the current frame and the background image to obtain a differential image, selecting a proper threshold value, and carrying out binarization on the differential image; the method can also extract the moving object by an optical flow method, and the method mainly aims to calculate an optical flow field, namely, under the condition of proper smoothness constraint, a motion field is estimated according to the space-time gradient of an image sequence, and the moving object and a scene are detected and segmented by analyzing the change of the motion field.
In this embodiment, the moving target is extracted by an inter-frame difference algorithm. Its basic principle is to extract the motion region in the image by thresholding the pixel-wise temporal difference between two or three adjacent frames of the image sequence. The specific method is as follows: the pixel values of adjacent frames are subtracted to obtain a difference image, and the difference image is binarized. When the ambient brightness does not change much, a pixel whose value changes less than a preset threshold is regarded as a background pixel; if the pixel values of an image region change significantly, the change is considered to be caused by a moving object in the image, the region is marked as foreground pixels, and the position of the moving object in the image can be determined from the marked pixel region.
And finally, carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics. The normalization process aims to convert the YUV images into corresponding unique standard forms which have invariant characteristics to affine transformations such as translation, rotation, scaling and the like.
And S2022, inputting the posture characteristics contained in the image to be recognized into the posture classification model for posture classification, and obtaining a classification result of the image to be recognized.
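The two steps S2021 and S2022 can be combined into a single inference routine. The sketch below is self-contained and condenses the characterization (Y-plane frame difference, thresholding, fixed-size normalization) before calling the trained posture classification model; the threshold, feature size, and the single-channel input layout are assumptions and must match whatever classifier was actually trained.

```python
import numpy as np
import torch

def characterize(y_prev: np.ndarray, y_curr: np.ndarray, size: int = 64) -> torch.Tensor:
    # Condensed characterization: inter-frame difference of the Y planes,
    # thresholding, and nearest-neighbour resampling to a fixed grid.
    diff = np.abs(y_curr.astype(np.float32) - y_prev.astype(np.float32))
    mask = (diff > 25.0).astype(np.float32)
    rows = np.linspace(0, mask.shape[0] - 1, size).astype(int)
    cols = np.linspace(0, mask.shape[1] - 1, size).astype(int)
    return torch.from_numpy(mask[np.ix_(rows, cols)])

def classify_pose(model: torch.nn.Module, y_prev: np.ndarray, y_curr: np.ndarray) -> int:
    # The trained posture classification model assigns a class index to the
    # characterized frame; the input layout (1 x 1 x H x W here) must match
    # what the chosen classifier actually expects.
    feature = characterize(y_prev, y_curr).unsqueeze(0).unsqueeze(0)
    with torch.no_grad():
        logits = model(feature)
    return int(logits.argmax(dim=1).item())
```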
The third embodiment of the present application provides a gesture recognition method. Fig. 5 is a flowchart of a method provided in a third embodiment of the present application, and the method provided in this embodiment is described in detail below with reference to fig. 5.
As shown in fig. 5, the gesture recognition method provided in this embodiment includes the following steps:
s301, performing gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture.
The key point detection means that preset key points of the identified object are detected and obtained through a preset detection algorithm, and the posture information of the identified object can be obtained through posture recognition of the key points obtained through detection.
The predetermined gesture refers to a gesture type preset in a specific application scene, for example, for a dance type application in a motion sensing game, the common situation is as follows: and indicating the user station to detect and match human skeleton key points at a preset position, then tracking and identifying the dance action of the user by using the detected human skeleton key points, comparing and grading the identified dance action with the dance action preset by the machine, wherein the preset dance action is a preset gesture in the process.
The source of the image containing the predetermined gesture may be a public data set or a private data set, for example, a dance motion image pre-recorded through a camera, or a dance motion image in the public data set. The image type containing the predetermined pose may be an RGB image or a YUV image. In this embodiment, the image containing the predetermined gesture is an RGB image containing a predetermined dance motion.
The above gesture recognition of the image containing the predetermined gesture by the key point detection model may proceed as follows: key point detection is performed on the image containing the predetermined human body posture through the skeleton key point detection model OpenPose to obtain the key points in the image. This process essentially locates each skeleton key point of the human body and provides a basis for practical scenarios such as subsequent action recognition, abnormal action detection, intelligent monitoring and automatic driving. Specifically, the image containing the predetermined human body posture is used as the input of the skeleton key point detection model OpenPose, and the horizontal and vertical coordinates of each skeleton key point of the human body in the image are output. Posture recognition is then performed on the obtained key points through an action matching algorithm to obtain a recognition result, and this recognition result is the posture label of the image containing the predetermined posture.
S302, performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
After the pose tag of the image containing the preset pose is obtained through the above steps, the step is used for performing model training on the image containing the preset pose and the pose tag of the image, and obtaining a pose classification model capable of classifying images of the same category of the image containing the preset pose.
In this embodiment, the process of performing model training according to the image containing the predetermined pose and the pose tag as shown in fig. 6 includes the following steps:
and S3021, performing characterization processing on the image containing the preset posture to obtain the posture characteristic of the image containing the preset posture.
The image containing the preset gesture is characterized, and the method comprises the following steps:
First, the image containing the predetermined pose is converted into a YUV image. A YUV image has three components: the "Y" component represents brightness (the gray value), and the "U" and "V" components represent chrominance and saturation and are used to specify the color of a pixel. In this embodiment, the image containing the predetermined pose is an RGB image. Compared with an RGB image, a YUV image occupies much less data storage space and data transmission bandwidth, and in the subsequent processing of this embodiment it is mainly the texture of the image rather than its color that matters; therefore, the RGB image containing the predetermined pose is converted into a YUV image. Specifically, the conversion can be performed with the following formulas:
Y=0.299R+0.587G+0.114B;
U=-0.147R-0.289G+0.436B;
V=0.615R-0.515G-0.100B.
and secondly, extracting a moving target based on the Y component data of the YUV image to obtain the contour data contained in the YUV image. The method for extracting the moving target based on the Y component data of the YUV image is more, for example, the moving target is extracted by a background difference algorithm, which specifically includes: selecting a proper background image, carrying out differential operation on the current frame and the background image to obtain a differential image, selecting a proper threshold value, and carrying out binarization on the differential image; the method can also extract the moving object by an optical flow method, and the method mainly aims to calculate an optical flow field, namely, under the condition of proper smoothness constraint, a motion field is estimated according to the space-time gradient of an image sequence, and the moving object and a scene are detected and segmented by analyzing the change of the motion field.
In this embodiment, the moving target is extracted by an inter-frame difference algorithm. Its basic principle is to extract the motion region in the image by thresholding the pixel-wise temporal difference between two or three adjacent frames of the image sequence. The specific method is as follows: the pixel values of adjacent frames are subtracted to obtain a difference image, and the difference image is binarized. When the ambient brightness does not change much, a pixel whose value changes less than a preset threshold is regarded as a background pixel; if the pixel values of an image region change significantly, the change is considered to be caused by a moving object in the image, the region is marked as foreground pixels, and the position of the moving object in the image can be determined from the marked pixel region.
And finally, carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics. The normalization process aims to convert the YUV images into corresponding unique standard forms which have invariant characteristics to affine transformations such as translation, rotation, scaling and the like.
And S3022, performing model training on the preset classification model by taking the posture features and the posture labels as training samples.
After the pose features and the pose labels of the image containing the preset pose are obtained through the steps, the steps are used for performing model training by taking the pose features and the pose labels obtained through the steps as training samples.
The predetermined classification model refers to a preselected image classification model that has already been trained and has a complete image classification function, for example an image classification model trained on ImageNet images with the Caffe framework. In this embodiment, performing model training on the predetermined classification model with the pose features and pose labels as training samples means: performing transfer learning on the preselected, trained image classification model according to the pose features and pose labels of the images containing the predetermined poses, so as to obtain the pose classification model required by this embodiment, which can classify images of the same category as the images containing the predetermined poses.
The transfer learning process is as follows: based on the preselected, fully trained image classification model with a complete image classification function, training is continued on the new data set formed by the pose features and pose labels of the images containing the predetermined poses, and the network architecture and other aspects of the image classification model are adjusted according to the output requirements, so as to obtain a new pose classification model capable of classifying images of the same category as the images containing the predetermined poses.
S303, carrying out posture classification on the image to be recognized through the posture classification model to obtain a posture classification result of the image to be recognized.
The image to be recognized may be an image containing posture information, such as a user limb motion image captured by a camera in a motion sensing game.
In this embodiment, the process of performing pose classification on the image to be recognized through the pose classification model may be as follows:
the image to be recognized is characterized in the same way as the image including the predetermined pose in the step S3021, and the pose feature included in the image to be recognized is obtained, where the pose feature included in the image to be recognized is the same type of feature set as the pose feature of the image including the predetermined pose. The process of obtaining the posture features contained in the image to be recognized is as follows: converting an image to be identified into a YUV image; extracting a moving target based on the Y component data of the YUV image to obtain contour data included in the YUV image, for example, extracting the moving target by a background difference algorithm or extracting the moving target by an optical flow method, in this embodiment, extracting the moving target by an inter-frame difference algorithm; and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics contained in the image to be recognized.
And inputting the posture characteristics contained in the image to be recognized into the posture classification model for posture classification to obtain a classification result of the image to be recognized.
The fourth embodiment of the present application also provides a device for obtaining a pose classification model, which is substantially similar to the method embodiment and therefore is relatively simple to describe, and the details of the related technical features can be found in the corresponding description of the method embodiment provided above, and the following description of the device embodiment is only illustrative.
Please refer to fig. 7 for understanding this embodiment; fig. 7 is a block diagram of the units of the apparatus provided in this embodiment. As shown in fig. 7, the apparatus provided in this embodiment includes:
a pose tag obtaining unit 401, configured to perform pose recognition on an image including a predetermined pose through the key point detection model, and obtain a pose tag of the image including the predetermined pose;
a pose classification model obtaining unit 402, configured to perform model training according to the image and the pose tag that include the predetermined pose, and obtain a pose classification model.
The posture classification model obtaining unit 402 includes:
the image posture characteristic obtaining subunit is used for carrying out characterization processing on the image containing the preset posture to obtain the posture characteristic of the image containing the preset posture;
and the model training subunit is used for performing model training on the preset classification model by taking the posture characteristics and the posture labels as training samples.
The predetermined classification model is a trained image classification model, and the model training subunit is specifically configured to: and carrying out transfer learning on the trained image classification model according to the training samples to obtain a posture classification model.
The image posture feature obtaining subunit is specifically configured for:
converting the image containing the preset gesture into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
The extracting of the moving target based on the Y component data of the YUV image comprises the following steps:
extracting a moving target through an inter-frame difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through a background difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through an optical flow method based on the Y component data of the YUV image.
The posture tag obtaining unit 401 includes:
the key point obtaining subunit is used for performing key point detection on the image containing the preset posture through the key point detection model to obtain key points in the image containing the preset posture;
and the label obtaining subunit is used for carrying out gesture recognition on the key points through an action matching algorithm to obtain a gesture label of the image containing the preset gesture.
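To make the label-generation step concrete, the hypothetical sketch below assumes a key point detection model that returns named joint coordinates and uses a simple rule-based matcher in place of the action matching algorithm, which this application does not specify in detail. The joint names, angle thresholds, and pose names are illustrative assumptions only.

```python
import math

def joint_angle(a, b, c):
    """Angle at joint b formed by the segments b-a and b-c, in degrees."""
    ang = math.degrees(math.atan2(c[1] - b[1], c[0] - b[0]) -
                       math.atan2(a[1] - b[1], a[0] - b[0]))
    ang = abs(ang)
    return ang if ang <= 180 else 360 - ang

def match_pose(keypoints):
    """keypoints: dict mapping joint name -> (x, y) in image coordinates.
    Returns a pose tag; the rules below are purely illustrative."""
    elbow = joint_angle(keypoints["shoulder"], keypoints["elbow"], keypoints["wrist"])
    if keypoints["wrist"][1] < keypoints["shoulder"][1] and elbow > 150:
        return "arm_raised"   # straight arm held above the shoulder
    if elbow < 60:
        return "arm_bent"
    return "other"

# Each returned tag becomes the pose label paired with its source image for training.
```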
In the foregoing embodiments, a method and a device for obtaining a pose classification model are provided. In addition, a fifth embodiment of the present application further provides an electronic device; the embodiment of the electronic device is as follows:
Please refer to fig. 8 for understanding the present embodiment; fig. 8 is a schematic view of the electronic device provided in the present embodiment.
As shown in fig. 8, the electronic apparatus includes: a processor 501; a memory 502;
a memory 502 for storing a program for obtaining a pose classification model, which, when read and executed by the processor 501, performs the following operations:
performing gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture;
and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
For example, the electronic device is a computer, and the computer can perform pose recognition on an image containing a predetermined pose through a key point detection model to obtain a pose tag of the image containing the predetermined pose; and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
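Putting these operations together, a rough end-to-end sketch of what such a program might do is shown below. It relies on the hypothetical helpers sketched earlier (a keypoint model with a detect method, match_pose, and extract_pose_feature); none of these names come from this application, and the classifier is assumed to follow the scikit-learn fit convention.

```python
import numpy as np

def build_pose_classifier(frame_pairs, keypoint_model, classifier):
    """frame_pairs: list of (prev_frame, curr_frame) pairs containing predetermined poses."""
    features, labels = [], []
    for prev_frame, curr_frame in frame_pairs:
        keypoints = keypoint_model.detect(curr_frame)         # key point detection (assumed API)
        labels.append(match_pose(keypoints))                  # pose tag via the matching rule
        features.append(extract_pose_feature(prev_frame, curr_frame))  # characterization
    X = np.stack(features).reshape(len(features), -1)         # flatten for a sklearn-style model
    y = np.array(labels)
    classifier.fit(X, y)                                      # model training
    return classifier
```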
Optionally, performing model training according to the image containing the predetermined pose and the pose tag, including:
performing characterization processing on the image containing the preset posture to obtain the posture characteristic of the image containing the preset posture;
and taking the posture features and the posture labels as training samples, and carrying out model training on a preset classification model.
Optionally, the predetermined classification model is an image classification model after training, and the model training is performed on the predetermined classification model by using the posture feature and the posture label as training samples, including:
and carrying out transfer learning on the trained image classification model according to the training samples to obtain a posture classification model.
Optionally, the characterizing the image including the predetermined pose includes:
converting the image containing the preset gesture into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Optionally, extracting a moving target based on the Y component data of the YUV image includes:
extracting a moving target by an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target by a background difference algorithm based on the Y component data of the YUV image; or,
and extracting the moving object by an optical flow method based on the Y component data of the YUV image.
Optionally, performing pose recognition on the image including the predetermined pose through the key point detection model, including:
performing key point detection on the image containing the preset posture through a key point detection model to obtain key points in the image containing the preset posture;
and performing gesture recognition on the key points through a motion matching algorithm to obtain a gesture label of the image containing the preset gesture.
The sixth embodiment of the present application also provides a gesture recognition apparatus. Since the apparatus embodiment is substantially similar to the method embodiment, its description is relatively brief; details of the related technical features can be found in the corresponding description of the method embodiment provided above. The following description of the apparatus embodiment is illustrative only.
Please refer to fig. 9 for understanding this embodiment; fig. 9 is a block diagram of the units of the apparatus provided in this embodiment. As shown in fig. 9, the apparatus provided in this embodiment includes:
an image to be recognized obtaining unit 601, configured to obtain an image to be recognized that needs gesture recognition;
a pose classification result obtaining unit 602, configured to perform pose classification on the image to be recognized through the pose classification model, and obtain a pose classification result of the image to be recognized.
The posture classification model is obtained by the following steps:
performing gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture;
and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
The posture classification result obtaining unit 602 includes:
the gesture feature obtaining subunit is used for performing characterization processing on the image to be recognized to obtain the gesture features contained in the image to be recognized;
and the gesture classification subunit is used for inputting the gesture features contained in the image to be recognized into the gesture classification model for gesture classification.
The subunit for obtaining the posture features contained in the image to be recognized is specifically configured to:
converting an image to be identified into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Extracting the moving target based on the Y component data of the YUV image includes:
extracting the moving target by an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting the moving target by a background difference algorithm based on the Y component data of the YUV image; or,
and extracting the moving object by an optical flow method based on the Y component data of the YUV image.
In the foregoing embodiment, a gesture recognition method and a gesture recognition apparatus are provided. In addition, a seventh embodiment of the present application further provides an electronic device; the embodiment of the electronic device is as follows:
Please refer to fig. 10 for understanding the present embodiment; fig. 10 is a schematic view of the electronic device provided in the present embodiment.
As shown in fig. 10, the electronic apparatus includes: a processor 701; a memory 702;
the memory 702 is used for storing a gesture recognition program which, when read and executed by the processor 701, performs the following operations:
acquiring an image to be recognized, which needs gesture recognition;
carrying out attitude classification on the image to be recognized through an attitude classification model to obtain an attitude classification result of the image to be recognized;
wherein the posture classification model is obtained by the following method:
performing gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture;
and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
Optionally, the gesture classification of the image to be recognized through the gesture classification model includes:
performing characterization processing on an image to be recognized to obtain attitude characteristics contained in the image to be recognized;
and inputting the posture characteristics contained in the image to be recognized into a posture classification model for posture classification.
Optionally, the characterizing the image to be recognized includes:
converting an image to be identified into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Optionally, extracting a moving target based on the Y component data of the YUV image includes:
extracting a moving target by an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target by a background difference algorithm based on the Y component data of the YUV image; or,
and extracting the moving object by an optical flow method based on the Y component data of the YUV image.
The eighth embodiment of the present application further provides a gesture recognition apparatus. Since the apparatus embodiment is substantially similar to the method embodiment, its description is relatively brief; details of the related technical features can be found in the corresponding description of the method embodiment provided above. The following description of the apparatus embodiment is illustrative only.
Please refer to fig. 11 for understanding this embodiment; fig. 11 is a block diagram of the units of the apparatus provided in this embodiment. As shown in fig. 11, the apparatus provided in this embodiment includes:
a pose tag obtaining unit 801, configured to perform pose recognition on an image including a predetermined pose through the key point detection model, and obtain a pose tag of the image including the predetermined pose;
a pose classification model obtaining unit 802, configured to perform model training according to an image including a predetermined pose and a pose tag, to obtain a pose classification model;
a pose classification result obtaining unit 803, configured to perform pose classification on the image to be recognized through the pose classification model, and obtain a pose classification result of the image to be recognized.
The pose classification model obtaining unit 802 includes:
the gesture feature obtaining subunit is used for performing characterization processing on the image containing the preset gesture to obtain a gesture feature of the image containing the preset gesture;
and the model training subunit is used for performing model training on the preset classification model by taking the posture characteristics and the posture labels as training samples.
The predetermined classification model is an image classification model after training, and the model training subunit is specifically configured to:
and carrying out transfer learning on the trained image classification model according to the training samples to obtain a posture classification model.
The gesture feature obtaining subunit is specifically configured to:
converting the image containing the preset gesture into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Extracting the moving target based on the Y component data of the YUV image includes:
extracting the moving target by an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting the moving target by a background difference algorithm based on the Y component data of the YUV image; or,
and extracting the moving object by an optical flow method based on the Y component data of the YUV image.
The posture classification result obtaining unit 803 includes:
the gesture feature obtaining subunit is used for performing characterization processing on the image to be recognized to obtain the gesture features contained in the image to be recognized;
the gesture classification subunit is used for inputting gesture features contained in the image to be recognized into a gesture classification model for gesture classification;
the gesture features contained in the image to be recognized and the gesture features of the image containing the preset gesture are the same type of feature set.
The subunit for obtaining the posture features contained in the image to be recognized is specifically configured to:
converting an image to be identified into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Extracting the moving target based on the Y component data of the YUV image includes:
extracting the moving target by an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting the moving target by a background difference algorithm based on the Y component data of the YUV image; or,
and extracting the moving object by an optical flow method based on the Y component data of the YUV image.
The posture tag obtaining unit 801 includes:
the key point obtaining subunit is used for performing key point detection on the image containing the preset posture through the key point detection model to obtain key points in the image containing the preset posture;
and the gesture tag obtaining subunit is used for performing gesture recognition on the key points through a motion matching algorithm to obtain a gesture tag of the image containing the preset gesture.
In the foregoing embodiment, a gesture recognition method and a gesture recognition apparatus are provided. In addition, a ninth embodiment of the present application further provides an electronic device; the embodiment of the electronic device is as follows:
Please refer to fig. 12 for understanding the present embodiment; fig. 12 is a schematic view of the electronic device provided in the present embodiment.
As shown in fig. 12, the electronic apparatus includes: a processor 901; a memory 902;
the memory 902 is used for storing a program for obtaining a pose classification model, which when read and executed by the processor 901 performs the following operations:
performing gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture; performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model; and carrying out attitude classification on the image to be recognized through the attitude classification model to obtain an attitude classification result of the image to be recognized.
Optionally, performing model training according to the image containing the predetermined pose and the pose tag, including:
performing characterization processing on the image containing the preset posture to obtain the posture characteristic of the image containing the preset posture;
and taking the posture features and the posture labels as training samples, and carrying out model training on a preset classification model.
Optionally, the predetermined classification model is an image classification model after training, and the model training is performed on the predetermined classification model by using the posture feature and the posture label as training samples, including:
and carrying out transfer learning on the trained image classification model according to the training samples to obtain a posture classification model.
Optionally, the characterizing the image including the predetermined pose includes:
converting the image containing the preset gesture into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Optionally, extracting a moving target based on the Y component data of the YUV image includes:
extracting a moving target by an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target by a background difference algorithm based on the Y component data of the YUV image; or,
and extracting the moving object by an optical flow method based on the Y component data of the YUV image.
Optionally, the gesture classification of the image to be recognized through the gesture classification model includes:
performing characterization processing on an image to be recognized to obtain attitude characteristics contained in the image to be recognized;
inputting the posture characteristics contained in the image to be recognized into a posture classification model for posture classification;
the gesture features contained in the image to be recognized and the gesture features of the image containing the preset gesture are the same type of feature set.
Optionally, the characterizing the image to be recognized includes:
converting an image to be identified into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Optionally, extracting a moving target based on the Y component data of the YUV image includes:
extracting a moving target by an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target by a background difference algorithm based on the Y component data of the YUV image; or,
and extracting the moving object by an optical flow method based on the Y component data of the YUV image.
Optionally, performing pose recognition on the image including the predetermined pose through the key point detection model, including:
performing key point detection on the image containing the preset posture through a key point detection model to obtain key points in the image containing the preset posture;
and performing gesture recognition on the key points through a motion matching algorithm to obtain a gesture label of the image containing the preset gesture.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Although the present application has been described with reference to the preferred embodiments, these embodiments are not intended to limit the present application. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of the present application should be determined by the claims that follow.

Claims (22)

1. A method of obtaining a pose classification model, comprising:
performing gesture recognition on an image containing a preset gesture through a key point detection model to obtain a gesture tag of the image containing the preset gesture;
and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
2. The method of claim 1, wherein the model training from the image containing the predetermined pose and the pose tag comprises:
performing characterization processing on the image containing the preset posture to obtain the posture characteristic of the image containing the preset posture;
and taking the posture features and the posture labels as training samples, and carrying out model training on a preset classification model.
3. The method of claim 2, wherein the predetermined classification model is a trained image classification model, and the model training of the predetermined classification model using the pose features and the pose labels as training samples comprises:
and carrying out transfer learning on the trained image classification model according to the training sample to obtain a posture classification model.
4. The method of claim 2, wherein the characterizing the image containing the predetermined pose comprises:
converting the image containing the preset gesture into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
5. The method of claim 4, wherein extracting a motion target based on Y component data of the YUV image comprises:
extracting a moving target through an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through a background difference algorithm based on the Y component data of the YUV image; or,
and extracting a moving target by an optical flow method based on the Y component data of the YUV image.
6. The method of claim 1, wherein the performing pose recognition on the image containing the predetermined pose by the keypoint detection model comprises:
performing key point detection on the image containing the preset posture through a key point detection model to obtain key points in the image containing the preset posture;
and performing gesture recognition on the key points through a motion matching algorithm to obtain the gesture tag of the image containing the preset gesture.
7. A gesture recognition method, comprising:
acquiring an image to be recognized, which needs gesture recognition;
carrying out attitude classification on the image to be recognized through an attitude classification model to obtain an attitude classification result of the image to be recognized;
wherein the posture classification model is obtained by:
performing gesture recognition on an image containing a preset gesture through a key point detection model to obtain a gesture tag of the image containing the preset gesture;
and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
8. The method of claim 7, wherein the pose classification of the image to be recognized by the pose classification model comprises:
performing characterization processing on the image to be recognized to obtain attitude characteristics contained in the image to be recognized;
and inputting the posture characteristics contained in the image to be recognized into the posture classification model for posture classification.
9. The method according to claim 8, wherein the characterizing the image to be recognized comprises:
converting the image to be identified into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
10. The method of claim 9, wherein extracting a motion target based on Y component data of the YUV images comprises:
extracting a moving target through an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through a background difference algorithm based on the Y component data of the YUV image; or,
and extracting a moving target by an optical flow method based on the Y component data of the YUV image.
11. A gesture recognition method, comprising:
performing gesture recognition on an image containing a preset gesture through a key point detection model to obtain a gesture tag of the image containing the preset gesture;
performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model;
and carrying out attitude classification on the image to be recognized through the attitude classification model to obtain an attitude classification result of the image to be recognized.
12. The method of claim 11, wherein the model training from the image containing the predetermined pose and the pose tag comprises:
performing characterization processing on the image containing the preset posture to obtain the posture characteristic of the image containing the preset posture;
and taking the posture features and the posture labels as training samples, and carrying out model training on a preset classification model.
13. The method of claim 12, wherein the predetermined classification model is a trained image classification model, and the model training of the predetermined classification model using the pose features and the pose labels as training samples comprises:
and carrying out transfer learning on the trained image classification model according to the training sample to obtain a posture classification model.
14. The method of claim 12, wherein the characterizing the image containing the predetermined pose comprises:
converting the image containing the preset gesture into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
15. The method of claim 14, wherein extracting a motion target based on Y component data of the YUV images comprises:
extracting a moving target through an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through a background difference algorithm based on the Y component data of the YUV image; or,
and extracting a moving target by an optical flow method based on the Y component data of the YUV image.
16. The method according to any one of claims 12-15, wherein the pose classification of the image to be recognized by the pose classification model comprises:
performing characterization processing on the image to be recognized to obtain attitude characteristics contained in the image to be recognized;
inputting the posture characteristics contained in the image to be recognized into the posture classification model for posture classification;
the gesture features contained in the image to be recognized and the gesture features of the image containing the preset gesture are the same type of feature set.
17. The method according to claim 16, wherein the characterizing the image to be recognized comprises:
converting the image to be identified into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
18. The method of claim 17, wherein extracting a motion target based on Y component data of the YUV images comprises:
extracting a moving target through an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through a background difference algorithm based on the Y component data of the YUV image; or,
and extracting a moving target by an optical flow method based on the Y component data of the YUV image.
19. The method of claim 11, wherein the performing pose recognition on the image containing the predetermined pose by the keypoint detection model comprises:
performing key point detection on the image containing the preset posture through a key point detection model to obtain key points in the image containing the preset posture;
and performing gesture recognition on the key points through a motion matching algorithm to obtain the gesture tag of the image containing the preset gesture.
20. An apparatus for obtaining a pose classification model, comprising:
the gesture tag obtaining unit is used for carrying out gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture;
and the posture classification model obtaining unit is used for carrying out model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
21. An attitude recognition apparatus characterized by comprising:
the device comprises a to-be-recognized image obtaining unit, a to-be-recognized image acquiring unit and a gesture recognizing unit, wherein the to-be-recognized image obtaining unit is used for obtaining an image to be recognized which needs gesture recognition;
and the gesture classification result obtaining unit is used for carrying out gesture classification on the image to be recognized through a gesture classification model to obtain a gesture classification result of the image to be recognized.
22. An attitude recognition apparatus characterized by comprising:
the gesture tag obtaining unit is used for carrying out gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture;
the attitude classification model obtaining unit is used for carrying out model training according to the image containing the preset attitude and the attitude label to obtain an attitude classification model;
and the attitude classification result obtaining unit is used for carrying out attitude classification on the image to be recognized through the attitude classification model to obtain an attitude classification result of the image to be recognized.
CN201810958437.4A 2018-08-22 2018-08-22 Method and device for obtaining attitude classification model Pending CN110858277A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810958437.4A CN110858277A (en) 2018-08-22 2018-08-22 Method and device for obtaining attitude classification model


Publications (1)

Publication Number Publication Date
CN110858277A true CN110858277A (en) 2020-03-03

Family

ID=69635782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810958437.4A Pending CN110858277A (en) 2018-08-22 2018-08-22 Method and device for obtaining attitude classification model

Country Status (1)

Country Link
CN (1) CN110858277A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103646425A (en) * 2013-11-20 2014-03-19 深圳先进技术研究院 A method and a system for body feeling interaction
CN104616028A (en) * 2014-10-14 2015-05-13 北京中科盘古科技发展有限公司 Method for recognizing posture and action of human limbs based on space division study
CN107609479A (en) * 2017-08-09 2018-01-19 上海交通大学 Attitude estimation method and system based on the sparse Gaussian process with noise inputs
CN108062526A (en) * 2017-12-15 2018-05-22 厦门美图之家科技有限公司 A kind of estimation method of human posture and mobile terminal
CN108304819A (en) * 2018-02-12 2018-07-20 北京易真学思教育科技有限公司 Gesture recognition system and method, storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郭钧 et al.: "Moving human body posture recognition based on multi-neural-network fusion" *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102947A (en) * 2020-04-13 2020-12-18 国家体育总局体育科学研究所 Apparatus and method for body posture assessment
CN112102947B (en) * 2020-04-13 2024-02-13 国家体育总局体育科学研究所 Apparatus and method for body posture assessment
CN111899192A (en) * 2020-07-23 2020-11-06 北京字节跳动网络技术有限公司 Interaction method, interaction device, electronic equipment and computer-readable storage medium
CN111899192B (en) * 2020-07-23 2022-02-01 北京字节跳动网络技术有限公司 Interaction method, interaction device, electronic equipment and computer-readable storage medium
US11842425B2 (en) 2020-07-23 2023-12-12 Beijing Bytedance Network Technology Co., Ltd. Interaction method and apparatus, and electronic device and computer-readable storage medium
CN111931725A (en) * 2020-09-23 2020-11-13 北京无垠创新科技有限责任公司 Human body action recognition method, device and storage medium
CN111931725B (en) * 2020-09-23 2023-10-13 北京无垠创新科技有限责任公司 Human motion recognition method, device and storage medium
CN113190104A (en) * 2021-01-18 2021-07-30 郭奕忠 Method for realizing man-machine interaction by recognizing human actions through visual analysis by intelligent equipment
CN114393575A (en) * 2021-12-17 2022-04-26 重庆特斯联智慧科技股份有限公司 Robot control method and system based on high-efficiency recognition of user posture
CN114393575B (en) * 2021-12-17 2024-04-02 重庆特斯联智慧科技股份有限公司 Robot control method and system based on high-efficiency recognition of user gestures
CN115270997A (en) * 2022-09-20 2022-11-01 中国人民解放军32035部队 Rocket target attitude stability discrimination method based on transfer learning and related device
CN115270997B (en) * 2022-09-20 2022-12-27 中国人民解放军32035部队 Rocket target attitude stability discrimination method based on transfer learning and related device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination