CN110858277A - Method and device for obtaining attitude classification model

Method and device for obtaining attitude classification model

Info

Publication number
CN110858277A
Authority
CN
China
Prior art keywords
image
posture
gesture
preset
classification model
Prior art date
Legal status
Pending
Application number
CN201810958437.4A
Other languages
Chinese (zh)
Inventor
邵长东
姚迪狄
吴志华
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201810958437.4A
Publication of CN110858277A

Classifications

    • G06V40/20 Recognition of biometric, human-related or animal-related patterns in image or video data: movements or behaviour, e.g. gesture recognition
    • G06F18/214 Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Pattern recognition: classification techniques
    • G06T7/215 Image analysis, analysis of motion: motion-based segmentation
    • G06T7/246 Image analysis: analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/254 Image analysis: analysis of motion involving subtraction of images
    • G06V10/44 Extraction of image or video features: local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V10/462 Extraction of image or video features: salient features, e.g. scale invariant feature transforms [SIFT]


Abstract

The application discloses a method and a device for obtaining a posture classification model. The method comprises the following steps: performing posture recognition on an image containing a predetermined posture through a key point detection model to obtain a posture label of the image containing the predetermined posture; and performing model training according to the image containing the predetermined posture and the posture label to obtain a posture classification model. The posture classification model obtained by the method has a simpler network structure and lower requirements on computing resources, and can run on mobile devices in real time. Furthermore, the posture classification model recognizes the human body posture through a posture recognition and matching process alone, without detecting human skeleton key points, so the recognition process is simple, the recognition efficiency is high, and the processing time of a single-frame image is reduced.

Description

Method and device for obtaining attitude classification model
Technical Field
The application relates to the field of computer vision, and in particular to a method for obtaining a posture classification model. The application also relates to a device for obtaining the posture classification model and an electronic device. The application also relates to a posture recognition method, a posture recognition device and an electronic device. The application further relates to another posture recognition method, another posture recognition device and another electronic device.
Background
With the progress of technology and the development of market, intelligent devices based on computer vision are widely used, for example, various monitoring devices and intelligent game devices, which need to accurately analyze and recognize the posture information of a target object, so as to achieve the purpose of monitoring the target object or interacting with the target object.
At the present stage, human body posture recognition is an important research direction in the field of computer vision, is widely applied to scenes such as input of motion sensing games, fall detection, identity recognition, control of intelligent equipment and the like, and is substantially the positioning of key points of a human body.
At present, mainstream gesture recognition models are based on skeletal key points, such as the AlphaPose and OpenPose models. These models are complex and demand substantial computing resources, so it is difficult for them to run in real time at a high frame rate (for example, above 24 FPS) on mobile devices. Taking OpenPose as an example, the algorithm model is more than 200 MB in size; when it runs on a mobile device, it can only run at a low frame rate even with the support of a powerful graphics processor, and the real-time performance of gesture recognition cannot be ensured. In addition, the human body posture recognition process of a skeleton-key-point-based recognition model requires two stages, skeleton key point detection and posture recognition matching, so the recognition process is complex and the recognition efficiency is low.
Disclosure of Invention
The application provides a method for obtaining a posture classification model, which aims to solve the problems that existing mobile devices cannot ensure the real-time performance of posture recognition, and that posture recognition models based on skeleton key points have a complex recognition process and low recognition efficiency. The application further provides a device for obtaining the posture classification model and an electronic device. The application also provides a gesture recognition method, a gesture recognition device and an electronic device. The application further provides another gesture recognition method, another gesture recognition device and another electronic device.
The application provides a method for obtaining a posture classification model, which comprises the following steps:
performing gesture recognition on an image containing a preset gesture through a key point detection model to obtain a gesture tag of the image containing the preset gesture;
and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
Optionally, the performing model training according to the image containing the predetermined pose and the pose tag includes:
performing characterization processing on the image containing the preset posture to obtain the posture characteristic of the image containing the preset posture;
and taking the posture features and the posture labels as training samples, and carrying out model training on a preset classification model.
Optionally, the predetermined classification model is a trained image classification model, and performing model training on the predetermined classification model by using the posture features and the posture labels as training samples includes:
and carrying out transfer learning on the trained image classification model according to the training sample to obtain a posture classification model.
Optionally, the characterizing the image containing the predetermined pose includes:
converting the image containing the preset gesture into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Optionally, the extracting a moving target based on the Y component data of the YUV image includes:
extracting a moving target through an inter-frame difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through a background difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through an optical flow method based on the Y component data of the YUV image.
Optionally, the performing pose recognition on the image including the predetermined pose by using the keypoint detection model includes:
performing key point detection on the image containing the preset posture through a key point detection model to obtain key points in the image containing the preset posture;
and performing gesture recognition on the key points through a motion matching algorithm to obtain the gesture tag of the image containing the preset gesture.
The application also provides a gesture recognition method, which comprises the following steps:
acquiring an image to be recognized, which needs gesture recognition;
carrying out attitude classification on the image to be recognized through an attitude classification model to obtain an attitude classification result of the image to be recognized;
wherein the posture classification model is obtained by:
performing gesture recognition on an image containing a preset gesture through a key point detection model to obtain a gesture tag of the image containing the preset gesture;
and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
Optionally, the performing posture classification on the image to be recognized through the posture classification model includes:
performing characterization processing on the image to be recognized to obtain attitude characteristics contained in the image to be recognized;
and inputting the posture characteristics contained in the image to be recognized into the posture classification model for posture classification.
Optionally, the characterizing the image to be recognized includes:
converting the image to be identified into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Optionally, the extracting a moving target based on the Y component data of the YUV image includes:
extracting a moving target through an inter-frame difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through a background difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through an optical flow method based on the Y component data of the YUV image.
The application also provides a gesture recognition method, which comprises the following steps:
performing gesture recognition on an image containing a preset gesture through a key point detection model to obtain a gesture tag of the image containing the preset gesture;
performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model;
and carrying out attitude classification on the image to be recognized through the attitude classification model to obtain an attitude classification result of the image to be recognized.
Optionally, the performing model training according to the image containing the predetermined pose and the pose tag includes:
performing characterization processing on the image containing the preset posture to obtain the posture characteristic of the image containing the preset posture;
and taking the posture features and the posture labels as training samples, and carrying out model training on a preset classification model.
Optionally, the predetermined classification model is a trained image classification model, and performing model training on the predetermined classification model by using the posture features and the posture labels as training samples includes:
and carrying out transfer learning on the trained image classification model according to the training sample to obtain a posture classification model.
Optionally, the characterizing the image containing the predetermined pose includes:
converting the image containing the preset gesture into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Optionally, the extracting a moving target based on the Y component data of the YUV image includes:
extracting a moving target through an inter-frame difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through a background difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through an optical flow method based on the Y component data of the YUV image.
Optionally, the performing, by the gesture classification model, gesture classification on the image to be recognized includes:
performing characterization processing on the image to be recognized to obtain attitude characteristics contained in the image to be recognized;
inputting the posture characteristics contained in the image to be recognized into the posture classification model for posture classification;
the gesture features contained in the image to be recognized and the gesture features of the image containing the preset gesture are the same type of feature set.
Optionally, the characterizing the image to be recognized includes:
converting the image to be identified into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Optionally, the extracting a moving target based on the Y component data of the YUV image includes:
extracting a moving target through an inter-frame difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through a background difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through an optical flow method based on the Y component data of the YUV image.
Optionally, the performing pose recognition on the image including the predetermined pose by using the keypoint detection model includes:
performing key point detection on the image containing the preset posture through a key point detection model to obtain key points in the image containing the preset posture;
and performing gesture recognition on the key points through a motion matching algorithm to obtain the gesture tag of the image containing the preset gesture.
The present application further provides a device for obtaining a posture classification model, including:
the gesture tag obtaining unit is used for carrying out gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture;
and the posture classification model obtaining unit is used for carrying out model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
The present application further provides an electronic device, comprising:
a processor;
a memory for storing a program for obtaining a pose classification model, which when read and executed by the processor, performs the following operations:
performing gesture recognition on an image containing a preset gesture through a key point detection model to obtain a gesture tag of the image containing the preset gesture;
and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
The present application further provides a gesture recognition apparatus, including:
the device comprises a to-be-recognized image obtaining unit, a to-be-recognized image acquiring unit and a gesture recognizing unit, wherein the to-be-recognized image obtaining unit is used for obtaining an image to be recognized which needs gesture recognition;
and the gesture classification result obtaining unit is used for carrying out gesture classification on the image to be recognized through a gesture classification model to obtain a gesture classification result of the image to be recognized.
The present application further provides an electronic device, comprising:
a processor;
a memory for storing a gesture recognition program which, when read and executed by the processor, performs the following operations:
acquiring an image to be recognized, which needs gesture recognition;
and carrying out posture classification on the image to be recognized through a posture classification model to obtain a posture classification result of the image to be recognized.
The present application further provides a gesture recognition apparatus, including:
the gesture tag obtaining unit is used for carrying out gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture;
the attitude classification model obtaining unit is used for carrying out model training according to the image containing the preset attitude and the attitude label to obtain an attitude classification model;
and the attitude classification result obtaining unit is used for carrying out attitude classification on the image to be recognized through the attitude classification model to obtain an attitude classification result of the image to be recognized.
The present application further provides an electronic device, comprising:
a processor;
a memory for storing a gesture recognition program which, when read and executed by the processor, performs the following operations:
performing gesture recognition on an image containing a preset gesture through a key point detection model to obtain a gesture tag of the image containing the preset gesture;
performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model;
and carrying out attitude classification on the image to be recognized through the attitude classification model to obtain an attitude classification result of the image to be recognized.
Compared with the prior art, the method has the following advantages:
according to the method for obtaining the posture classification model, the posture of the image containing the preset posture is identified through the key point detection model, and the posture label of the image containing the preset posture is obtained; and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model. The posture classification model obtained by the method has a simpler network structure and lower requirements on computing resources, and can run on mobile equipment in real time; in addition, the process of recognizing the human body posture by the posture classification model is only a process of posture recognition matching, human skeleton key point detection is not needed, the recognition process is simple, the recognition efficiency is high, and the processing time of a single-frame image is reduced.
Drawings
FIG. 1 is a flow chart of a method provided in a first embodiment of the present application;
FIG. 2 is a flow chart of a model training method provided in a first embodiment of the present application;
FIG. 3 is a flow chart of a method provided by a second embodiment of the present application;
FIG. 4 is a flow chart of gesture classification provided by a second embodiment of the present application;
FIG. 5 is a flow chart of a method provided by a third embodiment of the present application;
FIG. 6 is a flow chart of a model training method provided in a third embodiment of the present application;
FIG. 7 is a block diagram of the apparatus unit provided in the fourth embodiment of the present application;
fig. 8 is a schematic diagram of an electronic device provided in a fifth embodiment of the present application;
FIG. 9 is a block diagram of the apparatus provided in the sixth embodiment of the present application;
FIG. 10 is a schematic diagram of an electronic device provided by a seventh embodiment of the present application;
FIG. 11 is a block diagram of the apparatus unit provided in the eighth embodiment of the present application;
fig. 12 is a schematic diagram of an electronic device according to a ninth embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the present application can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the present application; therefore, the present application is not limited to the specific implementations disclosed below.
For existing gesture recognition scenarios such as motion sensing game input and limb action detection, and in order to improve the efficiency with which a gesture recognition model recognizes a target gesture and broaden the application range of such models, the application provides a method for obtaining a gesture classification model, a device corresponding to the method, and an electronic device. The application also provides a gesture recognition method, a device corresponding to that method, and an electronic device. The application further provides another gesture recognition method, a corresponding device, and an electronic device. The following embodiments explain the methods, apparatuses, and electronic devices in detail.
The first embodiment of the present application provides a method for obtaining a posture classification model, which can be applied to a mobile device terminal for recognizing a human posture. Fig. 1 is a flowchart of a method for obtaining a pose classification model according to a first embodiment of the present application, and the method according to this embodiment is described in detail below with reference to fig. 1. The following description refers to embodiments for the purpose of illustrating the principles of the methods, and is not intended to be limiting in actual use.
As shown in fig. 1, the method for obtaining a pose classification model provided in this embodiment includes the following steps:
s101, performing gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture.
Key point detection is one of the basic algorithms in the computer vision field: preset key points of a recognized object are detected through a predetermined detection algorithm, and the posture information of the recognized object can then be obtained by performing posture recognition on the detected key points. For example, in the widely applied detection of human skeleton key points (Pose Estimation), which covers both multi-person and single-person cases, important positions of the human body such as the wrist, elbow, shoulder and head are set as key points in advance. In this application, the detected key points can describe the skeleton information of the human body, and the detection method can be applied to many scenarios, such as intelligent video monitoring, patient monitoring systems, human-computer interaction, virtual reality, human animation, smart home, intelligent security, and assisted training of athletes.
In this embodiment, the key point detection model may be the skeleton key point detection model OpenPose, which is a real-time multi-person key point detection model.
The predetermined gesture refers to a gesture type preset in a specific application scenario. For example, for a dance application in a motion sensing game, a common flow is as follows: the user is instructed to stand at a preset position so that human skeleton key points can be detected and matched; the detected skeleton key points are then used to track and recognize the user's dance movements, and the recognized movements are compared with and scored against the dance movements preset by the machine. In this process, the preset dance movement is the predetermined gesture.
The source of the image containing the predetermined gesture may be a public data set or a private data set, for example, an image of a dance action pre-recorded by a camera, or an image in the public data set. The image type containing the predetermined pose may be an RGB image or a YUV image. In this embodiment, the image containing the predetermined gesture is an RGB image containing a predetermined dance motion.
In this embodiment, the above gesture recognition of the image including the predetermined gesture by the keypoint detection model may be performed as follows:
the key point detection is carried out on the image containing the preset human body posture through the skeleton key point detection model OpenPose, the key points in the image containing the preset human body posture are obtained, the process is substantially the position of each skeleton key point of the human body, and a foundation is provided for practical scenes such as follow-up further action recognition, action abnormity detection, intelligent monitoring, automatic driving and the like, and the method specifically can be as follows: the image containing the preset human body posture is used as the input of a skeleton key point detection model OpenPose, and the horizontal and vertical coordinates of each skeleton key point of the human body in the image are output; and performing gesture recognition on the obtained key points through a motion matching algorithm to obtain a recognition result, wherein the recognition result is the gesture label of the image containing the preset gesture.
And S102, performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
After the pose tag of the image containing the preset pose is obtained through the above steps, the step is used for performing model training on the image containing the preset pose and the pose tag of the image, and obtaining a pose classification model capable of classifying images of the same category of the image containing the preset pose.
Images of the same category as the image containing the predetermined gesture are images whose contained gestures belong to the same category as the predetermined gesture; for example, if the predetermined gesture is a dance movement preset in a motion sensing game, images of the same category can be captured images of the user's dance movements.
In this embodiment, the process of performing model training according to the image containing the predetermined pose and the pose tag as shown in fig. 2 includes the following steps:
and S1021, performing characterization processing on the image containing the preset posture to obtain the posture characteristic of the image containing the preset posture.
In this embodiment, the characterizing process of the image including the predetermined pose includes the following steps:
First, the image containing the predetermined pose is converted into a YUV image. A YUV image has three components: the "Y" component represents brightness (the gray value), and the "U" and "V" components represent chrominance and saturation and are used to specify the color of a pixel. In this embodiment, the image containing the predetermined pose is an RGB image. Compared with an RGB image, a YUV image occupies much less data storage space and data transmission bandwidth, and in the subsequent processing of this embodiment it is the texture of the image rather than its color that matters; therefore, the RGB image containing the predetermined pose is converted into a YUV image. Specifically, the conversion can be performed with the following formulas, and a minimal code sketch follows them:
Y=0.299R+0.587G+0.114B;
U=-0.147R-0.289G+0.436B;
V=0.615R-0.515G-0.100B.
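A minimal sketch of the conversion formulas above, applied channel-wise with NumPy; the array layout (H x W x 3) is an assumption made for illustration.

```python
import numpy as np

def rgb_to_yuv(rgb: np.ndarray) -> np.ndarray:
    # rgb: H x W x 3 array (R, G, B channels); applies the formulas above
    # channel-wise and returns an H x W x 3 YUV array.
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.147 * r - 0.289 * g + 0.436 * b
    v = 0.615 * r - 0.515 * g - 0.100 * b
    return np.stack([y, u, v], axis=-1)

# Only the Y (luma) plane is needed for the moving-target extraction below:
# y_plane = rgb_to_yuv(frame.astype(np.float32))[..., 0]
```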
and secondly, extracting a moving target based on the Y component data of the YUV image to obtain the contour data contained in the YUV image. The method for extracting the moving target based on the Y component data of the YUV image is more, for example, the moving target is extracted by a background difference algorithm, which specifically includes: selecting a proper background image, carrying out differential operation on the current frame and the background image to obtain a differential image, selecting a proper threshold value, and carrying out binarization on the differential image; the method can also extract the moving object by an optical flow method, and the method mainly aims to calculate an optical flow field, namely, under the condition of proper smoothness constraint, a motion field is estimated according to the space-time gradient of an image sequence, and the moving object and a scene are detected and segmented by analyzing the change of the motion field.
In this embodiment, the moving target is extracted by an inter-frame difference algorithm. Its basic principle is to extract the motion region in the image by thresholding the pixel-wise temporal difference between two or three adjacent frames of the image sequence. The specific method is as follows: the pixel values of adjacent frames are subtracted to obtain a difference image, and the difference image is binarized. When the ambient brightness does not change much, a pixel whose value changes less than a preset threshold is regarded as a background pixel; if the pixel values of an image region change significantly, the change is considered to be caused by a moving object in the image, the region is marked as foreground pixels, and the position of the moving object in the image can be determined from the marked pixel region. A minimal sketch of this step is shown below.
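A minimal sketch of the inter-frame difference step described above, operating on two adjacent Y (luma) frames; the threshold value is an assumption chosen only for illustration.

```python
import numpy as np

def frame_difference_mask(y_prev: np.ndarray, y_curr: np.ndarray,
                          threshold: float = 25.0) -> np.ndarray:
    # Binarize the absolute difference of two adjacent Y (luma) frames:
    # pixels whose change exceeds the preset threshold are marked as
    # foreground (moving target), the rest as background.
    diff = np.abs(y_curr.astype(np.float32) - y_prev.astype(np.float32))
    return (diff > threshold).astype(np.uint8)  # 1 = foreground, 0 = background

# mask = frame_difference_mask(y_frames[i - 1], y_frames[i])
# The marked foreground region gives the contour / position of the moving target.
```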
Finally, the contour data contained in the YUV image is normalized to obtain the posture features. The purpose of the normalization is to convert the YUV image into a corresponding unique standard form that is invariant to affine transformations such as translation, rotation and scaling. A partial sketch of this step is shown below.
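A partial sketch of the normalization step, handling translation and scale only (rotation invariance is omitted); the exact normalization scheme and the fixed output size are assumptions, since the application does not specify them.

```python
import numpy as np

def normalize_contour(mask: np.ndarray, size: int = 64) -> np.ndarray:
    # Crop the foreground to its bounding box (removes translation) and
    # resample it onto a fixed-size grid (removes scale), yielding a
    # standard-form feature map with values in [0, 1].
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return np.zeros((size, size), dtype=np.float32)
    crop = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1].astype(np.float32)
    rows = np.linspace(0, crop.shape[0] - 1, size).astype(int)
    cols = np.linspace(0, crop.shape[1] - 1, size).astype(int)
    return crop[np.ix_(rows, cols)]  # nearest-neighbour resample to a fixed grid

# feature = normalize_contour(mask)  # posture feature used as a training sample
```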
And S1022, performing model training on a preset classification model by taking the posture features and the posture labels as training samples.
After the pose features and the pose labels of the image containing the preset pose are obtained through the steps, the steps are used for performing model training by taking the pose features and the pose labels obtained through the steps as training samples.
The predetermined classification model refers to a preselected image classification model that has already been trained and has a complete image classification function, for example an image classification model trained on ImageNet images with the Caffe framework. In this embodiment, performing model training on the predetermined classification model with the pose features and pose labels as training samples means: performing transfer learning on the preselected, trained image classification model according to the pose features and pose labels of the images containing the predetermined poses, so as to obtain the new pose classification model required by this embodiment, which can classify images of the same category as the images containing the predetermined poses.
The transfer learning process is as follows: based on the preselected, trained image classification model with a complete image classification function, training is continued on the data set formed by the pose features and pose labels of the images containing the predetermined poses, and the network architecture and other aspects of the image classification model are adjusted according to the output requirements, so as to obtain a pose classification model capable of classifying images of the same category as the images containing the predetermined poses. A minimal sketch follows.
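A minimal transfer-learning sketch. The application only requires fine-tuning a pre-trained image classification model; PyTorch/torchvision and ResNet-18 are used here purely for illustration and are assumptions, not the framework or network named in the embodiment, as are the number of classes and the optimizer settings.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_POSE_CLASSES = 10  # number of predetermined postures (assumed)

# Load an ImageNet-pretrained backbone and replace its classifier head.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                      # freeze the pre-trained backbone
model.fc = nn.Linear(model.fc.in_features, NUM_POSE_CLASSES)  # new posture head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(features: torch.Tensor, labels: torch.Tensor) -> float:
    # features: batch of posture-feature images prepared to match the backbone's
    # expected input (3 x 224 x 224); labels: posture tags produced in step S101.
    optimizer.zero_grad()
    loss = criterion(model(features), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```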
In the method for obtaining a pose classification model provided in this embodiment, a key point detection model (for example, the skeleton key point detection model OpenPose) is used to perform pose recognition on an image containing a predetermined pose, obtaining a pose tag for that image; the image containing the predetermined pose is characterized to obtain its pose features; and the pose features and pose labels are used as training samples to perform transfer learning on a preselected, trained image classification model with a complete image classification function, obtaining the pose classification model required by this embodiment, which can classify images of the same category as the image containing the predetermined pose. Compared with existing pose recognition models based on skeleton key points, this pose classification model has a simpler network structure and lower requirements on computing resources, and can run on mobile devices in real time. In addition, the pose classification model recognizes the human body pose through a pose recognition and matching process alone, without detecting human skeleton key points, so the recognition process is simple, the recognition efficiency is high, and the processing time of a single-frame image is reduced.
The second embodiment of the present application provides a gesture recognition method, which is applicable to a gesture recognition scenario of a mobile device. Fig. 3 is a flowchart of a method provided in a second embodiment of the present application, and the method provided in this embodiment is described in detail below with reference to fig. 3.
As shown in fig. 3, the gesture recognition method provided in this embodiment includes the following steps:
s201, acquiring an image to be recognized, which needs gesture recognition.
The method comprises the step of acquiring the image to be recognized, which needs gesture recognition. The image to be recognized can be an image in any format containing posture information, such as a user limb action image captured in a motion sensing game.
S202, carrying out posture classification on the image to be recognized through a posture classification model to obtain a posture classification result of the image to be recognized.
The method comprises the following steps of carrying out gesture classification on the image to be recognized obtained in the previous step through a pre-trained gesture classification model, and obtaining a recognition result of the image to be recognized.
The posture classification model is obtained by the following steps: performing gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture; model training is performed according to the image with the predetermined posture and the posture label of the image to obtain a posture classification model, which is the posture classification model obtained in the first embodiment, and the detailed contents of this part refer to the related description provided in the first embodiment and are not repeated herein.
In this embodiment, the process of performing pose classification on the image to be recognized through the pose classification model is shown in fig. 4, and includes the following processes:
s2021, performing characterization processing on the image to be recognized to obtain the posture features contained in the image to be recognized.
In this embodiment, the characterizing process of the image to be recognized includes the following steps:
First, the image to be recognized is converted into a YUV image. A YUV image has three components: the "Y" component represents brightness (the gray value), and the "U" and "V" components represent chrominance and saturation and are used to specify the color of a pixel. The image to be recognized can be in any format that can be converted into a YUV image; in this embodiment it is an RGB image. Compared with an RGB image, a YUV image occupies much less data storage space and data transmission bandwidth, and in the subsequent processing it is mainly the texture of the image rather than its color that matters; therefore, the RGB image to be recognized is converted into a YUV image. Specifically, the conversion can be performed with the following formulas:
Y=0.299R+0.587G+0.114B;
U=-0.147R-0.289G+0.436B;
V=0.615R-0.515G-0.100B.
and secondly, extracting a moving target based on the Y component data of the YUV image to obtain the contour data contained in the YUV image. For example, the moving object may be extracted by a background difference algorithm, specifically: selecting a proper background image, carrying out differential operation on the current frame and the background image to obtain a differential image, selecting a proper threshold value, and carrying out binarization on the differential image; the method can also extract the moving object by an optical flow method, and the method mainly aims to calculate an optical flow field, namely, under the condition of proper smoothness constraint, a motion field is estimated according to the space-time gradient of an image sequence, and the moving object and a scene are detected and segmented by analyzing the change of the motion field.
In this embodiment, the moving target is extracted by an inter-frame difference algorithm. Its basic principle is to extract the motion region in the image by thresholding the pixel-wise temporal difference between two or three adjacent frames of the image sequence. The specific method is as follows: the pixel values of adjacent frames are subtracted to obtain a difference image, and the difference image is binarized. When the ambient brightness does not change much, a pixel whose value changes less than a preset threshold is regarded as a background pixel; if the pixel values of an image region change significantly, the change is considered to be caused by a moving object in the image, the region is marked as foreground pixels, and the position of the moving object in the image can be determined from the marked pixel region.
And finally, carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics. The normalization process aims to convert the YUV images into corresponding unique standard forms which have invariant characteristics to affine transformations such as translation, rotation, scaling and the like.
And S2022, inputting the posture characteristics contained in the image to be recognized into the posture classification model for posture classification, and obtaining a classification result of the image to be recognized.
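The two steps S2021 and S2022 can be combined into a single inference routine. The sketch below is self-contained and condenses the characterization (Y-plane frame difference, thresholding, fixed-size normalization) before calling the trained posture classification model; the threshold, feature size, and the single-channel input layout are assumptions and must match whatever classifier was actually trained.

```python
import numpy as np
import torch

def characterize(y_prev: np.ndarray, y_curr: np.ndarray, size: int = 64) -> torch.Tensor:
    # Condensed characterization: inter-frame difference of the Y planes,
    # thresholding, and nearest-neighbour resampling to a fixed grid.
    diff = np.abs(y_curr.astype(np.float32) - y_prev.astype(np.float32))
    mask = (diff > 25.0).astype(np.float32)
    rows = np.linspace(0, mask.shape[0] - 1, size).astype(int)
    cols = np.linspace(0, mask.shape[1] - 1, size).astype(int)
    return torch.from_numpy(mask[np.ix_(rows, cols)])

def classify_pose(model: torch.nn.Module, y_prev: np.ndarray, y_curr: np.ndarray) -> int:
    # The trained posture classification model assigns a class index to the
    # characterized frame; the input layout (1 x 1 x H x W here) must match
    # what the chosen classifier actually expects.
    feature = characterize(y_prev, y_curr).unsqueeze(0).unsqueeze(0)
    with torch.no_grad():
        logits = model(feature)
    return int(logits.argmax(dim=1).item())
```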
The third embodiment of the present application provides a gesture recognition method. Fig. 5 is a flowchart of a method provided in a third embodiment of the present application, and the method provided in this embodiment is described in detail below with reference to fig. 5.
As shown in fig. 5, the gesture recognition method provided in this embodiment includes the following steps:
s301, performing gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture.
The key point detection means that preset key points of the identified object are detected and obtained through a preset detection algorithm, and the posture information of the identified object can be obtained through posture recognition of the key points obtained through detection.
The predetermined gesture refers to a gesture type preset in a specific application scene, for example, for a dance type application in a motion sensing game, the common situation is as follows: and indicating the user station to detect and match human skeleton key points at a preset position, then tracking and identifying the dance action of the user by using the detected human skeleton key points, comparing and grading the identified dance action with the dance action preset by the machine, wherein the preset dance action is a preset gesture in the process.
The source of the image containing the predetermined gesture may be a public data set or a private data set, for example, a dance motion image pre-recorded through a camera, or a dance motion image in the public data set. The image type containing the predetermined pose may be an RGB image or a YUV image. In this embodiment, the image containing the predetermined gesture is an RGB image containing a predetermined dance motion.
The above gesture recognition of the image containing the predetermined gesture by the key point detection model may proceed as follows: key point detection is performed on the image containing the predetermined human body posture through the skeleton key point detection model OpenPose to obtain the key points in the image. This process essentially locates each skeleton key point of the human body and provides a basis for practical scenarios such as subsequent action recognition, abnormal action detection, intelligent monitoring and automatic driving. Specifically, the image containing the predetermined human body posture is used as the input of the skeleton key point detection model OpenPose, and the horizontal and vertical coordinates of each skeleton key point of the human body in the image are output. Posture recognition is then performed on the obtained key points through an action matching algorithm to obtain a recognition result, and this recognition result is the posture label of the image containing the predetermined posture.
S302, performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
After the pose tag of the image containing the preset pose is obtained through the above steps, the step is used for performing model training on the image containing the preset pose and the pose tag of the image, and obtaining a pose classification model capable of classifying images of the same category of the image containing the preset pose.
In this embodiment, the process of performing model training according to the image containing the predetermined pose and the pose tag as shown in fig. 6 includes the following steps:
and S3021, performing characterization processing on the image containing the preset posture to obtain the posture characteristic of the image containing the preset posture.
The image containing the preset gesture is characterized, and the method comprises the following steps:
First, the image containing the predetermined pose is converted into a YUV image. A YUV image has three components: the "Y" component represents brightness (the gray value), and the "U" and "V" components represent chrominance and saturation and are used to specify the color of a pixel. In this embodiment, the image containing the predetermined pose is an RGB image. Compared with an RGB image, a YUV image occupies much less data storage space and data transmission bandwidth, and in the subsequent processing of this embodiment it is mainly the texture of the image rather than its color that matters; therefore, the RGB image containing the predetermined pose is converted into a YUV image. Specifically, the conversion can be performed with the following formulas:
Y=0.299R+0.587G+0.114B;
U=-0.147R-0.289G+0.436B;
V=0.615R-0.515G-0.100B.
and secondly, extracting a moving target based on the Y component data of the YUV image to obtain the contour data contained in the YUV image. The method for extracting the moving target based on the Y component data of the YUV image is more, for example, the moving target is extracted by a background difference algorithm, which specifically includes: selecting a proper background image, carrying out differential operation on the current frame and the background image to obtain a differential image, selecting a proper threshold value, and carrying out binarization on the differential image; the method can also extract the moving object by an optical flow method, and the method mainly aims to calculate an optical flow field, namely, under the condition of proper smoothness constraint, a motion field is estimated according to the space-time gradient of an image sequence, and the moving object and a scene are detected and segmented by analyzing the change of the motion field.
In this embodiment, the moving target is extracted by an inter-frame difference algorithm. Its basic principle is to extract the motion region in the image by thresholding the pixel-wise temporal difference between two or three adjacent frames of the image sequence. The specific method is as follows: the pixel values of adjacent frames are subtracted to obtain a difference image, and the difference image is binarized. When the ambient brightness does not change much, a pixel whose value changes less than a preset threshold is regarded as a background pixel; if the pixel values of an image region change significantly, the change is considered to be caused by a moving object in the image, the region is marked as foreground pixels, and the position of the moving object in the image can be determined from the marked pixel region.
And finally, carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics. The normalization process aims to convert the YUV images into corresponding unique standard forms which have invariant characteristics to affine transformations such as translation, rotation, scaling and the like.
And S3022, performing model training on the preset classification model by taking the posture features and the posture labels as training samples.
After the pose features and the pose labels of the image containing the preset pose are obtained through the steps, the steps are used for performing model training by taking the pose features and the pose labels obtained through the steps as training samples.
The predetermined classification model refers to a preselected image classification model that has already been trained and has a complete image classification function, for example an image classification model trained on ImageNet images with the Caffe framework. In this embodiment, performing model training on the predetermined classification model with the pose features and pose labels as training samples means: performing transfer learning on the preselected, trained image classification model according to the pose features and pose labels of the images containing the predetermined poses, so as to obtain the pose classification model required by this embodiment, which can classify images of the same category as the images containing the predetermined poses.
The transfer learning process is as follows: based on the preselected, fully trained image classification model with a complete image classification function, training is continued on the new data set formed by the pose features and pose labels of the images containing the predetermined poses, and the network architecture and other aspects of the image classification model are adjusted according to the output requirements, so as to obtain a new pose classification model capable of classifying images of the same category as the images containing the predetermined poses.
S303, carrying out posture classification on the image to be recognized through the posture classification model to obtain a posture classification result of the image to be recognized.
The image to be recognized may be an image containing posture information, such as a user limb motion image captured by a camera in a motion sensing game.
In this embodiment, the process of performing pose classification on the image to be recognized through the pose classification model may be as follows:
the image to be recognized is characterized in the same way as the image including the predetermined pose in the step S3021, and the pose feature included in the image to be recognized is obtained, where the pose feature included in the image to be recognized is the same type of feature set as the pose feature of the image including the predetermined pose. The process of obtaining the posture features contained in the image to be recognized is as follows: converting an image to be identified into a YUV image; extracting a moving target based on the Y component data of the YUV image to obtain contour data included in the YUV image, for example, extracting the moving target by a background difference algorithm or extracting the moving target by an optical flow method, in this embodiment, extracting the moving target by an inter-frame difference algorithm; and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics contained in the image to be recognized.
And inputting the posture characteristics contained in the image to be recognized into the posture classification model for posture classification to obtain a classification result of the image to be recognized.
The fourth embodiment of the present application also provides a device for obtaining a pose classification model, which is substantially similar to the method embodiment and therefore is relatively simple to describe, and the details of the related technical features can be found in the corresponding description of the method embodiment provided above, and the following description of the device embodiment is only illustrative.
Please refer to fig. 7 for understanding this embodiment; fig. 7 is a block diagram of the units of the apparatus provided in this embodiment. As shown in fig. 7, the apparatus provided in this embodiment includes:
a pose tag obtaining unit 401, configured to perform pose recognition on an image including a predetermined pose through the key point detection model, and obtain a pose tag of the image including the predetermined pose;
a pose classification model obtaining unit 402, configured to perform model training according to the image and the pose tag that include the predetermined pose, and obtain a pose classification model.
The posture classification model obtaining unit 402 includes:
the image posture characteristic obtaining subunit is used for carrying out characterization processing on the image containing the preset posture to obtain the posture characteristic of the image containing the preset posture;
and the model training subunit is used for performing model training on the preset classification model by taking the posture characteristics and the posture labels as training samples.
The predetermined classification model is a trained image classification model, and the model training subunit is specifically configured to: and carrying out transfer learning on the trained image classification model according to the training samples to obtain a posture classification model.
The image posture feature obtaining subunit is specifically configured for:
converting the image containing the preset gesture into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
The extracting of the moving target based on the Y component data of the YUV image comprises the following steps:
extracting a moving target through an inter-frame difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through a background difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through an optical flow method based on the Y component data of the YUV image.
The posture tag obtaining unit 401 includes:
the key point obtaining subunit is used for performing key point detection on the image containing the preset posture through the key point detection model to obtain key points in the image containing the preset posture;
and the label obtaining subunit is used for carrying out gesture recognition on the key points through an action matching algorithm to obtain a gesture label of the image containing the preset gesture.
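To make the label-generation step concrete, the hypothetical sketch below assumes a key point detection model that returns named joint coordinates and uses a simple rule-based matcher in place of the action matching algorithm, which this application does not specify in detail. The joint names, angle thresholds, and pose names are illustrative assumptions only.

```python
import math

def joint_angle(a, b, c):
    """Angle at joint b formed by the segments b-a and b-c, in degrees."""
    ang = math.degrees(math.atan2(c[1] - b[1], c[0] - b[0]) -
                       math.atan2(a[1] - b[1], a[0] - b[0]))
    ang = abs(ang)
    return ang if ang <= 180 else 360 - ang

def match_pose(keypoints):
    """keypoints: dict mapping joint name -> (x, y) in image coordinates.
    Returns a pose tag; the rules below are purely illustrative."""
    elbow = joint_angle(keypoints["shoulder"], keypoints["elbow"], keypoints["wrist"])
    if keypoints["wrist"][1] < keypoints["shoulder"][1] and elbow > 150:
        return "arm_raised"   # straight arm held above the shoulder
    if elbow < 60:
        return "arm_bent"
    return "other"

# Each returned tag becomes the pose label paired with its source image for training.
```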
In the foregoing embodiments, a method and a device for obtaining a pose classification model are provided. In addition, a fifth embodiment of the present application further provides an electronic device; the embodiment of the electronic device is as follows:
Please refer to fig. 8 for understanding the present embodiment; fig. 8 is a schematic view of the electronic device provided in the present embodiment.
As shown in fig. 8, the electronic apparatus includes: a processor 501; a memory 502;
a memory 502 for storing a program for obtaining a pose classification model, which, when read and executed by the processor 501, performs the following operations:
performing gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture;
and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
For example, the electronic device is a computer, and the computer can perform pose recognition on an image containing a predetermined pose through a key point detection model to obtain a pose tag of the image containing the predetermined pose; and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
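Putting these operations together, a rough end-to-end sketch of what such a program might do is shown below. It relies on the hypothetical helpers sketched earlier (a keypoint model with a detect method, match_pose, and extract_pose_feature); none of these names come from this application, and the classifier is assumed to follow the scikit-learn fit convention.

```python
import numpy as np

def build_pose_classifier(frame_pairs, keypoint_model, classifier):
    """frame_pairs: list of (prev_frame, curr_frame) pairs containing predetermined poses."""
    features, labels = [], []
    for prev_frame, curr_frame in frame_pairs:
        keypoints = keypoint_model.detect(curr_frame)         # key point detection (assumed API)
        labels.append(match_pose(keypoints))                  # pose tag via the matching rule
        features.append(extract_pose_feature(prev_frame, curr_frame))  # characterization
    X = np.stack(features).reshape(len(features), -1)         # flatten for a sklearn-style model
    y = np.array(labels)
    classifier.fit(X, y)                                      # model training
    return classifier
```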
Optionally, performing model training according to the image containing the predetermined pose and the pose tag, including:
performing characterization processing on the image containing the preset posture to obtain the posture characteristic of the image containing the preset posture;
and taking the posture features and the posture labels as training samples, and carrying out model training on a preset classification model.
Optionally, the predetermined classification model is an image classification model after training, and the model training is performed on the predetermined classification model by using the posture feature and the posture label as training samples, including:
and carrying out transfer learning on the trained image classification model according to the training samples to obtain a posture classification model.
Optionally, the characterizing the image including the predetermined pose includes:
converting the image containing the preset gesture into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Optionally, extracting a moving target based on the Y component data of the YUV image includes:
extracting a moving target by an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target by a background difference algorithm based on the Y component data of the YUV image; or,
and extracting the moving object by an optical flow method based on the Y component data of the YUV image.
Optionally, performing pose recognition on the image including the predetermined pose through the key point detection model, including:
performing key point detection on the image containing the preset posture through a key point detection model to obtain key points in the image containing the preset posture;
and performing gesture recognition on the key points through a motion matching algorithm to obtain a gesture label of the image containing the preset gesture.
The sixth embodiment of the present application also provides a gesture recognition apparatus. Since the apparatus embodiment is substantially similar to the method embodiment, its description is relatively brief; details of the related technical features can be found in the corresponding description of the method embodiment provided above. The following description of the apparatus embodiment is illustrative only.
Please refer to fig. 9 for understanding this embodiment; fig. 9 is a block diagram of the units of the apparatus provided in this embodiment. As shown in fig. 9, the apparatus provided in this embodiment includes:
an image to be recognized obtaining unit 601, configured to obtain an image to be recognized that needs gesture recognition;
a pose classification result obtaining unit 602, configured to perform pose classification on the image to be recognized through the pose classification model, and obtain a pose classification result of the image to be recognized.
The posture classification model is obtained by the following steps:
performing gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture;
and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
The posture classification result obtaining unit 602 includes:
the gesture feature obtaining subunit is used for performing characterization processing on the image to be recognized to obtain the gesture features contained in the image to be recognized;
and the gesture classification subunit is used for inputting the gesture features contained in the image to be recognized into the gesture classification model for gesture classification.
The subunit for obtaining the posture features contained in the image to be recognized is specifically configured to:
converting an image to be identified into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Extracting the moving target based on the Y component data of the YUV image includes:
extracting the moving target by an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting the moving target by a background difference algorithm based on the Y component data of the YUV image; or,
and extracting the moving object by an optical flow method based on the Y component data of the YUV image.
In the foregoing embodiment, a gesture recognition method and a gesture recognition apparatus are provided. In addition, a seventh embodiment of the present application further provides an electronic device; the embodiment of the electronic device is as follows:
Please refer to fig. 10 for understanding the present embodiment; fig. 10 is a schematic view of the electronic device provided in the present embodiment.
As shown in fig. 10, the electronic apparatus includes: a processor 701; a memory 702;
the memory 702 is used for storing a gesture recognition program which, when read and executed by the processor 701, performs the following operations:
acquiring an image to be recognized, which needs gesture recognition;
carrying out attitude classification on the image to be recognized through an attitude classification model to obtain an attitude classification result of the image to be recognized;
wherein the posture classification model is obtained by the following method:
performing gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture;
and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
Optionally, the gesture classification of the image to be recognized through the gesture classification model includes:
performing characterization processing on an image to be recognized to obtain attitude characteristics contained in the image to be recognized;
and inputting the posture characteristics contained in the image to be recognized into a posture classification model for posture classification.
Optionally, the characterizing the image to be recognized includes:
converting an image to be identified into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Optionally, extracting a moving target based on the Y component data of the YUV image includes:
extracting a moving target by an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target by a background difference algorithm based on the Y component data of the YUV image; or,
and extracting the moving object by an optical flow method based on the Y component data of the YUV image.
The eighth embodiment of the present application further provides a gesture recognition apparatus. Since the apparatus embodiment is substantially similar to the method embodiment, its description is relatively brief; details of the related technical features can be found in the corresponding description of the method embodiment provided above. The following description of the apparatus embodiment is illustrative only.
Please refer to fig. 11 for understanding this embodiment; fig. 11 is a block diagram of the units of the apparatus provided in this embodiment. As shown in fig. 11, the apparatus provided in this embodiment includes:
a pose tag obtaining unit 801, configured to perform pose recognition on an image including a predetermined pose through the key point detection model, and obtain a pose tag of the image including the predetermined pose;
a pose classification model obtaining unit 802, configured to perform model training according to an image including a predetermined pose and a pose tag, to obtain a pose classification model;
a pose classification result obtaining unit 803, configured to perform pose classification on the image to be recognized through the pose classification model, and obtain a pose classification result of the image to be recognized.
The pose classification model obtaining unit 802 includes:
the gesture feature obtaining subunit is used for performing characterization processing on the image containing the preset gesture to obtain a gesture feature of the image containing the preset gesture;
and the model training subunit is used for performing model training on the preset classification model by taking the posture characteristics and the posture labels as training samples.
The predetermined classification model is an image classification model after training, and the model training subunit is specifically configured to:
and carrying out transfer learning on the trained image classification model according to the training samples to obtain a posture classification model.
The gesture feature obtaining subunit is specifically configured to:
converting the image containing the preset gesture into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Extracting the moving target based on the Y component data of the YUV image includes:
extracting the moving target by an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting the moving target by a background difference algorithm based on the Y component data of the YUV image; or,
and extracting the moving object by an optical flow method based on the Y component data of the YUV image.
The posture classification result obtaining unit 803 includes:
the gesture feature obtaining subunit is used for performing characterization processing on the image to be recognized to obtain the gesture features contained in the image to be recognized;
the gesture classification subunit is used for inputting gesture features contained in the image to be recognized into a gesture classification model for gesture classification;
the gesture features contained in the image to be recognized and the gesture features of the image containing the preset gesture are the same type of feature set.
The subunit for obtaining the posture features contained in the image to be recognized is specifically configured to:
converting an image to be identified into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Extracting the moving target based on the Y component data of the YUV image includes:
extracting the moving target by an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting the moving target by a background difference algorithm based on the Y component data of the YUV image; or,
and extracting the moving object by an optical flow method based on the Y component data of the YUV image.
The posture tag obtaining unit 801 includes:
the key point obtaining subunit is used for performing key point detection on the image containing the preset posture through the key point detection model to obtain key points in the image containing the preset posture;
and the gesture tag obtaining subunit is used for performing gesture recognition on the key points through a motion matching algorithm to obtain a gesture tag of the image containing the preset gesture.
In the foregoing embodiment, a gesture recognition method and a gesture recognition apparatus are provided. In addition, a ninth embodiment of the present application further provides an electronic device; the embodiment of the electronic device is as follows:
Please refer to fig. 12 for understanding the present embodiment; fig. 12 is a schematic view of the electronic device provided in the present embodiment.
As shown in fig. 12, the electronic apparatus includes: a processor 901; a memory 902;
the memory 902 is used for storing a program for obtaining a pose classification model, which when read and executed by the processor 901 performs the following operations:
performing gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture; performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model; and carrying out attitude classification on the image to be recognized through the attitude classification model to obtain an attitude classification result of the image to be recognized.
Optionally, performing model training according to the image containing the predetermined pose and the pose tag, including:
performing characterization processing on the image containing the preset posture to obtain the posture characteristic of the image containing the preset posture;
and taking the posture features and the posture labels as training samples, and carrying out model training on a preset classification model.
Optionally, the predetermined classification model is an image classification model after training, and the model training is performed on the predetermined classification model by using the posture feature and the posture label as training samples, including:
and carrying out transfer learning on the trained image classification model according to the training samples to obtain a posture classification model.
Optionally, the characterizing the image including the predetermined pose includes:
converting the image containing the preset gesture into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Optionally, extracting a moving target based on the Y component data of the YUV image includes:
extracting a moving target by an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target by a background difference algorithm based on the Y component data of the YUV image; or,
and extracting the moving object by an optical flow method based on the Y component data of the YUV image.
Optionally, the gesture classification of the image to be recognized through the gesture classification model includes:
performing characterization processing on an image to be recognized to obtain attitude characteristics contained in the image to be recognized;
inputting the posture characteristics contained in the image to be recognized into a posture classification model for posture classification;
the gesture features contained in the image to be recognized and the gesture features of the image containing the preset gesture are the same type of feature set.
Optionally, the characterizing the image to be recognized includes:
converting an image to be identified into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
Optionally, extracting a moving target based on the Y component data of the YUV image includes:
extracting a moving target by an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target by a background difference algorithm based on the Y component data of the YUV image; or,
and extracting the moving object by an optical flow method based on the Y component data of the YUV image.
Optionally, performing pose recognition on the image including the predetermined pose through the key point detection model, including:
performing key point detection on the image containing the preset posture through a key point detection model to obtain key points in the image containing the preset posture;
and performing gesture recognition on the key points through a motion matching algorithm to obtain a gesture label of the image containing the preset gesture.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Although the present application has been described with reference to the preferred embodiments, these embodiments are not intended to limit the present application. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of the present application should be determined by the claims that follow.

Claims (22)

1. A method of obtaining a pose classification model, comprising:
performing gesture recognition on an image containing a preset gesture through a key point detection model to obtain a gesture tag of the image containing the preset gesture;
and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
2. The method of claim 1, wherein the model training from the image containing the predetermined pose and the pose tag comprises:
performing characterization processing on the image containing the preset posture to obtain the posture characteristic of the image containing the preset posture;
and taking the posture features and the posture labels as training samples, and carrying out model training on a preset classification model.
3. The method of claim 2, wherein the predetermined classification model is a trained image classification model, and the model training of the predetermined classification model using the pose features and the pose labels as training samples comprises:
and carrying out transfer learning on the trained image classification model according to the training sample to obtain a posture classification model.
4. The method of claim 2, wherein the characterizing the image containing the predetermined pose comprises:
converting the image containing the preset gesture into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
5. The method of claim 4, wherein extracting a motion target based on Y component data of the YUV image comprises:
extracting a moving target through an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through a background difference algorithm based on the Y component data of the YUV image; or,
and extracting a moving target by an optical flow method based on the Y component data of the YUV image.
6. The method of claim 1, wherein the performing pose recognition on the image containing the predetermined pose by the keypoint detection model comprises:
performing key point detection on the image containing the preset posture through a key point detection model to obtain key points in the image containing the preset posture;
and performing gesture recognition on the key points through a motion matching algorithm to obtain the gesture tag of the image containing the preset gesture.
7. A gesture recognition method, comprising:
acquiring an image to be recognized, which needs gesture recognition;
carrying out attitude classification on the image to be recognized through an attitude classification model to obtain an attitude classification result of the image to be recognized;
wherein the posture classification model is obtained by:
performing gesture recognition on an image containing a preset gesture through a key point detection model to obtain a gesture tag of the image containing the preset gesture;
and performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
8. The method of claim 7, wherein the pose classification of the image to be recognized by the pose classification model comprises:
performing characterization processing on the image to be recognized to obtain attitude characteristics contained in the image to be recognized;
and inputting the posture characteristics contained in the image to be recognized into the posture classification model for posture classification.
9. The method according to claim 8, wherein the characterizing the image to be recognized comprises:
converting the image to be identified into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
10. The method of claim 9, wherein extracting a motion target based on Y component data of the YUV images comprises:
extracting a moving target through an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through a background difference algorithm based on the Y component data of the YUV image; or,
and extracting a moving target by an optical flow method based on the Y component data of the YUV image.
11. A gesture recognition method, comprising:
performing gesture recognition on an image containing a preset gesture through a key point detection model to obtain a gesture tag of the image containing the preset gesture;
performing model training according to the image containing the preset posture and the posture label to obtain a posture classification model;
and carrying out attitude classification on the image to be recognized through the attitude classification model to obtain an attitude classification result of the image to be recognized.
12. The method of claim 11, wherein the model training from the image containing the predetermined pose and the pose tag comprises:
performing characterization processing on the image containing the preset posture to obtain the posture characteristic of the image containing the preset posture;
and taking the posture features and the posture labels as training samples, and carrying out model training on a preset classification model.
13. The method of claim 12, wherein the predetermined classification model is a trained image classification model, and the model training of the predetermined classification model using the pose features and the pose labels as training samples comprises:
and carrying out transfer learning on the trained image classification model according to the training sample to obtain a posture classification model.
14. The method of claim 12, wherein the characterizing the image containing the predetermined pose comprises:
converting the image containing the preset gesture into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
15. The method of claim 14, wherein extracting a motion target based on Y component data of the YUV images comprises:
extracting a moving target through an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through a background difference algorithm based on the Y component data of the YUV image; or,
and extracting a moving target by an optical flow method based on the Y component data of the YUV image.
16. The method according to any one of claims 12-15, wherein the pose classification of the image to be recognized by the pose classification model comprises:
performing characterization processing on the image to be recognized to obtain attitude characteristics contained in the image to be recognized;
inputting the posture characteristics contained in the image to be recognized into the posture classification model for posture classification;
the gesture features contained in the image to be recognized and the gesture features of the image containing the preset gesture are the same type of feature set.
17. The method according to claim 16, wherein the characterizing the image to be recognized comprises:
converting the image to be identified into a YUV image;
extracting a moving target based on Y component data of the YUV image to obtain contour data contained in the YUV image;
and carrying out normalization processing on the contour data contained in the YUV image to obtain the attitude characteristics.
18. The method of claim 17, wherein extracting a motion target based on Y component data of the YUV images comprises:
extracting a moving target through an interframe difference algorithm based on the Y component data of the YUV image; or,
extracting a moving target through a background difference algorithm based on the Y component data of the YUV image; or,
and extracting a moving target by an optical flow method based on the Y component data of the YUV image.
19. The method of claim 11, wherein the performing pose recognition on the image containing the predetermined pose by the keypoint detection model comprises:
performing key point detection on the image containing the preset posture through a key point detection model to obtain key points in the image containing the preset posture;
and performing gesture recognition on the key points through a motion matching algorithm to obtain the gesture tag of the image containing the preset gesture.
20. An apparatus for obtaining a pose classification model, comprising:
the gesture tag obtaining unit is used for carrying out gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture;
and the posture classification model obtaining unit is used for carrying out model training according to the image containing the preset posture and the posture label to obtain a posture classification model.
21. An attitude recognition apparatus characterized by comprising:
the device comprises a to-be-recognized image obtaining unit, a to-be-recognized image acquiring unit and a gesture recognizing unit, wherein the to-be-recognized image obtaining unit is used for obtaining an image to be recognized which needs gesture recognition;
and the gesture classification result obtaining unit is used for carrying out gesture classification on the image to be recognized through a gesture classification model to obtain a gesture classification result of the image to be recognized.
22. An attitude recognition apparatus characterized by comprising:
the gesture tag obtaining unit is used for carrying out gesture recognition on the image containing the preset gesture through the key point detection model to obtain a gesture tag of the image containing the preset gesture;
the attitude classification model obtaining unit is used for carrying out model training according to the image containing the preset attitude and the attitude label to obtain an attitude classification model;
and the attitude classification result obtaining unit is used for carrying out attitude classification on the image to be recognized through the attitude classification model to obtain an attitude classification result of the image to be recognized.
CN201810958437.4A 2018-08-22 2018-08-22 Method and device for obtaining attitude classification model Pending CN110858277A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810958437.4A CN110858277A (en) 2018-08-22 2018-08-22 Method and device for obtaining attitude classification model


Publications (1)

Publication Number Publication Date
CN110858277A true CN110858277A (en) 2020-03-03

Family

ID=69635782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810958437.4A Pending CN110858277A (en) 2018-08-22 2018-08-22 Method and device for obtaining attitude classification model

Country Status (1)

Country Link
CN (1) CN110858277A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103646425A (en) * 2013-11-20 2014-03-19 深圳先进技术研究院 A method and a system for body feeling interaction
CN104616028A (en) * 2014-10-14 2015-05-13 北京中科盘古科技发展有限公司 Method for recognizing posture and action of human limbs based on space division study
CN107609479A (en) * 2017-08-09 2018-01-19 上海交通大学 Attitude estimation method and system based on the sparse Gaussian process with noise inputs
CN108062526A (en) * 2017-12-15 2018-05-22 厦门美图之家科技有限公司 A kind of estimation method of human posture and mobile terminal
CN108304819A (en) * 2018-02-12 2018-07-20 北京易真学思教育科技有限公司 Gesture recognition system and method, storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郭钧 et al.: "Moving human body posture recognition based on multi-neural-network fusion" *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102947A (en) * 2020-04-13 2020-12-18 国家体育总局体育科学研究所 Apparatus and method for body posture assessment
CN112102947B (en) * 2020-04-13 2024-02-13 国家体育总局体育科学研究所 Apparatus and method for body posture assessment
CN111899192A (en) * 2020-07-23 2020-11-06 北京字节跳动网络技术有限公司 Interaction method, interaction device, electronic equipment and computer-readable storage medium
CN111899192B (en) * 2020-07-23 2022-02-01 北京字节跳动网络技术有限公司 Interaction method, interaction device, electronic equipment and computer-readable storage medium
US11842425B2 (en) 2020-07-23 2023-12-12 Beijing Bytedance Network Technology Co., Ltd. Interaction method and apparatus, and electronic device and computer-readable storage medium
CN111931725A (en) * 2020-09-23 2020-11-13 北京无垠创新科技有限责任公司 Human body action recognition method, device and storage medium
CN111931725B (en) * 2020-09-23 2023-10-13 北京无垠创新科技有限责任公司 Human motion recognition method, device and storage medium
CN113190104A (en) * 2021-01-18 2021-07-30 郭奕忠 Method for realizing man-machine interaction by recognizing human actions through visual analysis by intelligent equipment
CN114393575A (en) * 2021-12-17 2022-04-26 重庆特斯联智慧科技股份有限公司 Robot control method and system based on high-efficiency recognition of user posture
CN114393575B (en) * 2021-12-17 2024-04-02 重庆特斯联智慧科技股份有限公司 Robot control method and system based on high-efficiency recognition of user gestures
CN115270997A (en) * 2022-09-20 2022-11-01 中国人民解放军32035部队 Rocket target attitude stability discrimination method based on transfer learning and related device
CN115270997B (en) * 2022-09-20 2022-12-27 中国人民解放军32035部队 Rocket target attitude stability discrimination method based on transfer learning and related device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination