CN115798033A - Piano training method, system, equipment and storage medium based on gesture recognition - Google Patents

Piano training method, system, equipment and storage medium based on gesture recognition

Info

Publication number
CN115798033A
CN115798033A (application CN202111060343.3A)
Authority
CN
China
Prior art keywords
gesture
key
image
standard
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111060343.3A
Other languages
Chinese (zh)
Inventor
孙晓静
黄龙祥
汪博
朱力
吕方璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Guangjian Technology Co Ltd
Original Assignee
Shenzhen Guangjian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Guangjian Technology Co Ltd
Priority to CN202111060343.3A
Publication of CN115798033A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a piano training method, system, equipment and storage medium based on gesture recognition, comprising the following steps: acquiring a video of a user's hand motion while playing the piano, and extracting key frame images of the hand motion from the video; extracting a plurality of hand key points from each key frame image and generating a corresponding key point sequence; identifying the corresponding standard key point sequence from the key point sequence, and comparing the key points in each key point sequence with those in the standard key point sequence to generate a standard degree for the hand gesture; and extracting the key frame images whose standard degree is below a preset threshold to generate error gesture images, and generating training guidance information from the error gesture images and the corresponding standard images. The invention can obtain the user's gesture evaluation in real time, which makes targeted training convenient for the user.

Description

Piano training method, system, equipment and storage medium based on gesture recognition
Technical Field
The invention relates to the field of computer vision, and in particular to a piano training method, system, equipment and storage medium based on gesture recognition.
Background
With the growth of computer hardware computing power, deep learning on big data has demonstrated strong data processing and learning capabilities. Driven by deep learning, artificial intelligence technologies such as speech recognition, image recognition and data mining have made substantial progress and been successfully applied in many products. In the field of computer vision, deep learning has become a focus of research and is one of the methods commonly used to handle complex environments. Computer vision, a milestone in the history of science and technology, plays a crucial role in the development of intelligent technology and has received extensive attention from both academia and industry. Among existing deep learning methods, the convolutional neural network has achieved good results in gesture recognition by virtue of its strengths.
However, in education and teaching, supervision and guidance by teachers or parents is still the norm. In piano teaching, long-term, frequent practice is essential to improving piano skills. This undoubtedly poses a considerable challenge to piano learners, because piano practice is relatively tedious and normally requires professional guidance on site.
Traditional gesture recognition relies on wearable devices, which are cumbersome to use and inconvenient to operate, and cannot provide a comfortable, pleasant experience. Moreover, professional staff must be involved throughout their use, and users cannot see their training results intuitively. At present, intelligent systems that use computer vision to help piano learners practice are still immature. One main reason is that very little technology has so far been applied to piano teaching, and the combination of the two remains unexplored. Another is the limit of computing resources: the heavy computation of deep learning is usually done in the cloud. With the development of technology, however, convolutional neural networks can be ported to embedded computing platforms, so that they can be deployed on embedded computing devices.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a piano training method, system, equipment and storage medium based on gesture recognition.
The piano training method based on gesture recognition provided by the invention comprises the following steps:
step S1: acquiring a hand motion video of a user when playing a piano, and extracting a key frame image of hand motion from the hand motion video;
step S2: extracting a plurality of key points of a hand from each key frame image and generating a corresponding key point sequence;
and step S3: identifying a corresponding standard key point sequence according to the key point sequence, and comparing key points in each key point sequence with key points in the standard key point sequence to generate a standard degree of hand gesture;
and step S4: and extracting the key frame images with the standard degree lower than a preset threshold value to generate error gesture images, and generating training guidance information according to the error gesture images and the corresponding standard images.
Preferably, the key frame image includes at least:
a gesture start point image;
a gesture end point image;
a gesture pose image extracted at a preset time interval between the gesture start point image and the gesture end point image.
Preferably, the step S2 includes the steps of:
step S201: acquiring a gesture pose estimation model based on a convolutional neural network;
step S202: inputting each key frame image into the gesture pose estimation model, and extracting the key points of the hand;
step S203: and extracting the key points of each key frame image to generate the coordinates of each key point.
Preferably, the step S3 includes the steps of:
step S301: performing action recognition according to the key point sequence to determine a corresponding standard action category, and determining a corresponding standard key point sequence according to the standard action category;
step S302: comparing the key point sequence with the standard key point sequence point by point, and generating the distance between each pair of corresponding key points;
step S303: generating, from the distances between all pairs of corresponding key points, a standard degree for evaluating the deviation between the hand gesture in the key frame image and the hand gesture in the corresponding standard image.
Preferably, the step S4 includes the steps of:
step S401: extracting the key frame image with the standard degree lower than a preset threshold value to generate an error gesture image;
step S402: generating point movement directions according to the offsets between corresponding key points in the error gesture image and the corresponding standard image;
step S403: generating the training guidance information from the plurality of point movement directions.
Preferably, the method further comprises the following steps:
step S5: dividing the plurality of key frame images into a plurality of action groups according to the standard image, wherein each action group at least comprises a gesture starting point image and a gesture end point image;
step S6: and determining the completion time of each action group according to the gesture starting point image and the gesture end point image in each action group, and comparing the completion time of a plurality of action groups with the standard time corresponding to the standard image to determine the consistency value of the action.
Preferably, the training of the gesture recognition model includes the following steps:
step M1: acquiring a plurality of gesture pose images, annotating key points on them, and dividing the annotated gesture pose images into a training image set and a test image set;
step M2: inputting the training image set into a convolutional neural network model for training to generate the gesture recognition model;
step M3: and testing the gesture recognition model according to the test image set to determine and store the optimal parameters of the gesture recognition model.
The piano training system based on gesture recognition provided by the invention comprises the following modules:
the key frame extraction module is used for acquiring a hand motion video of a user when the user plays a piano and extracting a key frame image of hand motion from the hand motion video;
the key point extraction module is used for extracting a plurality of key points of the hand from each key frame image and generating a corresponding key point sequence;
the standard degree generating module is used for identifying a corresponding standard key point sequence according to the key point sequence, comparing key points in each key point sequence with key points in the standard key point sequence and generating the standard degree of the hand gesture;
and the guide information generation module is used for extracting the key frame images with the standard degree lower than a preset threshold value to generate error gesture images, and generating training guide information according to the error gesture images and the corresponding standard images.
The piano training device based on gesture recognition provided by the invention comprises:
a processor;
a memory module having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the piano training method based on gesture recognition via execution of the executable instructions.
According to the present invention, there is provided a computer readable storage medium for storing a program, which when executed, performs the steps of the piano training method based on gesture recognition.
Compared with the prior art, the invention has the following beneficial effects:
according to the method, a hand motion video of a user playing a piano is collected, a key frame image of hand motion is extracted from the video, key points of the hand are extracted according to the key frame image, the key points in the image are compared with key points in a pre-stored standard key point sequence, the standard degree of hand posture is generated, then the key frame image with the standard degree lower than a preset threshold value is extracted to generate an error gesture image, training guidance information is generated according to the error gesture image and the corresponding standard image, namely, the gesture action evaluation condition of the user can be obtained in real time, and the method is convenient for the user to carry out targeted training.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below are merely embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort. Other features, objects and advantages of the invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of the steps of a piano training method based on gesture recognition in an embodiment of the present invention;
FIG. 2 is a flowchart of the steps of a piano training method based on gesture recognition in accordance with a variation of the present invention;
FIG. 3 is a flowchart illustrating a process of extracting key points of a hand according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating the steps of generating the standard degree of hand gesture in an embodiment of the present invention;
FIG. 5 is a flowchart illustrating the steps of training guidance information generation in an embodiment of the present invention;
FIG. 6 is a flowchart illustrating the steps of training a gesture recognition model according to an embodiment of the present invention;
FIG. 7 is a schematic block diagram of a piano training system based on gesture recognition according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a piano training device based on gesture recognition in the embodiment of the present invention;
fig. 9 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will help those skilled in the art to further understand the invention, but do not limit it in any way. It should be noted that persons skilled in the art can make variations and modifications without departing from the spirit of the invention, all of which fall within its scope.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The technical solutions of the present invention, and how they solve the above technical problems, are described in detail below through specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart illustrating steps of a piano training method based on gesture recognition in an embodiment of the present invention, and as shown in fig. 1, the piano training method based on gesture recognition provided in the present invention includes the following steps:
step S1: acquiring a hand motion video of a user when playing a piano, and extracting a key frame image of hand motion from the hand motion video;
in an embodiment of the present invention, the key frame image includes at least:
a gesture start point image;
a gesture end point image;
a gesture pose image extracted at a preset time interval between the gesture start point image and the gesture end point image.
Step S2: extracting a plurality of key points of a hand from each key frame image and generating a corresponding key point sequence;
fig. 3 is a flowchart of a step of extracting key points of a hand according to an embodiment of the present invention, and as shown in fig. 3, the step S2 includes the following steps:
step S201: acquiring a gesture pose estimation model based on a convolutional neural network;
step S202: inputting each key frame image into the gesture pose estimation model, and extracting the key points of the hand;
step S203: and extracting the key points of each key frame image to generate the coordinates of each key point.
And step S3: identifying a corresponding standard key point sequence according to the key point sequence, and comparing key points in each key point sequence with key points in the standard key point sequence to generate a standard degree of hand gesture;
fig. 4 is a flowchart of the step of generating the standard degree of the hand gesture in the embodiment of the present invention, and as shown in fig. 4, the step S3 includes the following steps:
step S301: performing action recognition according to the key point sequence to determine a corresponding standard action category, and determining a corresponding standard key point sequence according to the standard action category;
step S302: comparing the key point sequence with the standard key point sequence point by point, and generating the distance between each pair of corresponding key points;
step S303: generating, from the distances between all pairs of corresponding key points, a standard degree for evaluating the deviation between the hand gesture in the key frame image and the hand gesture in the corresponding standard image.
In the embodiment of the present invention, so that the key point sequence can be compared with the standard key point sequence, the key points of both sequences are stored in the same order.
And step S4: extracting key frame images with the standard degree lower than a preset threshold value to generate error gesture images, and generating training guidance information according to the error gesture images and the corresponding standard images;
fig. 5 is a flowchart of a step of generating training guidance information in the embodiment of the present invention, and as shown in fig. 5, the step S4 includes the following steps:
step S401: extracting the key frame image with the standard degree lower than a preset threshold value to generate an error gesture image;
step S402: generating point movement directions according to the offsets between corresponding key points in the error gesture image and the corresponding standard image;
step S403: generating the training guidance information from the plurality of point movement directions.
Fig. 6 is a flowchart of steps of training a gesture recognition model in an embodiment of the present invention, and as shown in fig. 6, the steps of training the gesture recognition model include:
step M1: acquiring a plurality of gesture pose images, annotating key points on them, and dividing the annotated gesture pose images into a training image set and a test image set;
step M2: inputting the training image set into a convolutional neural network model for training to generate the gesture recognition model;
step M3: and testing the gesture recognition model according to the test image set to determine and store the optimal parameters of the gesture recognition model.
Fig. 2 is a flowchart illustrating steps of a piano training method based on gesture recognition according to a modification of the present invention, and as shown in fig. 2, the piano training method based on gesture recognition according to the present invention further includes the following steps:
step S5: dividing the plurality of key frame images into a plurality of action groups according to the standard image, wherein each action group at least comprises a gesture starting point image and a gesture end point image;
step S6: and determining the completion time of each action group according to the gesture starting point image and the gesture end point image in each action group, and comparing the completion time of a plurality of action groups with the standard time corresponding to the standard image to determine the consistency value of the action.
Fig. 7 is a schematic block diagram of a piano training system based on gesture recognition in the embodiment of the present invention, and as shown in fig. 7, the piano training system based on gesture recognition is characterized by including the following modules:
the key frame extraction module is used for acquiring a hand motion video of a user playing a piano and extracting a key frame image of hand motion from the hand motion video;
the key point extraction module is used for extracting a plurality of key points of the hand from each key frame image and generating a corresponding key point sequence;
the standard degree generating module is used for identifying a corresponding standard key point sequence according to the key point sequence, comparing key points in each key point sequence with key points in the standard key point sequence and generating the standard degree of the hand gesture;
and the guide information generation module is used for extracting the key frame images with the standard degree lower than a preset threshold value to generate error gesture images, and generating training guide information according to the error gesture images and the corresponding standard images.
The embodiment of the invention also provides a piano training device based on gesture recognition, comprising a processor and a memory storing executable instructions of the processor, wherein the processor is configured to perform the steps of the piano training method based on gesture recognition via execution of the executable instructions.
As described above, in this embodiment a video of the user's hand motion while playing the piano is acquired; key frame images of the hand motion are extracted from the video; hand key points are extracted from each key frame image and compared with the key points in a pre-stored standard key point sequence to generate a standard degree for the hand gesture; key frame images whose standard degree is below a preset threshold are extracted as error gesture images; and training guidance information is generated from the error gesture images and the corresponding standard images, so that the user's gesture evaluation can be obtained in real time and targeted training is convenient.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "platform."
Fig. 8 is a schematic structural diagram of a piano training device based on gesture recognition in the embodiment of the present invention, and as shown in fig. 8, the piano training device based on gesture recognition provided in the present invention includes the following modules:
the human-computer interaction module is used for acquiring a user instruction;
the information acquisition module is used for acquiring a hand motion video and extracting key frame images of hand motion from the hand motion video so as to remove redundant information. The definition of the key frame is: including a complete hand motion for a certain motion cycle. And performing hand detection and positioning on the extracted key frame.
The gesture recognition module is used for recognizing the acquired gesture, that is, identifying the corresponding standard image from the key frame images. Specifically, its input is the key frames processed in the previous stage. A gesture pose estimation sub-network is built with a convolutional neural network, and the final output of this network is the three-dimensional coordinates of the hand key points. The output of the pose estimation sub-network is the input of the action recognition sub-network, which is also built with a convolutional neural network; it recognizes the action from the three-dimensional coordinates of the input gesture key points and finally outputs the action recognition result.
The gesture evaluation module is used for quantifying and evaluating gesture poses, completing the evaluation of the current hand pose from the hand skeleton key point data and the pose classification result produced by the gesture recognition module. The evaluation criteria include, but are not limited to, action completeness, action completion time and action consistency, and the evaluation results are stored. Specifically, based on the recognition result, three criteria are applied: a quantified key point distance error, a completion time criterion, and an action fluency criterion. For the key point distance error, the current hand pose is compared with the pose data in the standard image and the error is quantified into a number. For completion time, the total time taken to complete the action is compared against time-phase thresholds and a corresponding rating is given. For fluency, it is judged whether there are long pauses during the evaluation period.
The information display module is used for displaying relevant information, such as the standard degree, the training guidance information and suggestions.
The voice module is used for voice playing;
the storage module is used for storing the collected video;
The external power supply is used for supplying power to the whole set of training equipment.
When the piano training equipment based on gesture recognition provided by the invention is used, the user issues instructions through the human-computer interaction module, via text, voice or another carrier, and piano action training begins. The human-computer interaction module then enters training song selection; the user selects a piano piece, again via text, voice or another carrier. Once the instruction is received, the training cycle starts and the camera begins to work, running continuously throughout the training cycle of the piece. When the training piece ends, the control module receives an instruction and turns the camera off. The video data of the training cycle is saved in the storage module. After preprocessing, the video data is fed into the gesture recognition module to recognize the key points and then into the gesture evaluation module, which finally gives the evaluation result of the whole training cycle. The information display module displays the evaluation result.
The embodiment of the invention also provides a computer-readable storage medium for storing a program which, when executed, implements the steps of the piano training method based on gesture recognition. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code which, when run on a terminal device, causes the terminal device to carry out the steps of the various exemplary embodiments of the method described above in this specification.
As shown above, when the program on the computer-readable storage medium of this embodiment is executed, a video of the user's hand motion while playing the piano is acquired; key frame images of the hand motion are extracted from the video; hand key points are extracted from the key frame images and compared with the key points in a pre-stored standard key point sequence to generate a standard degree for the hand gesture; key frame images whose standard degree is below a preset threshold are extracted as error gesture images; and training guidance information is generated from the error gesture images and the corresponding standard images. The user's gesture evaluation can thus be obtained in real time, which makes targeted training convenient.
Fig. 9 is a schematic structural diagram of a computer-readable storage medium in an embodiment of the present invention. Referring to fig. 9, a program product 800 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable storage medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable storage medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In the embodiment of the invention, by acquiring a video of the user's hand motion while playing the piano, extracting key frame images of the hand motion from the video, extracting hand key points from the key frame images, comparing them with the key points in a pre-stored standard key point sequence to generate a standard degree for the hand gesture, extracting the key frame images whose standard degree is below a preset threshold as error gesture images, and generating training guidance information from the error gesture images and the corresponding standard images, the user's gesture evaluation can be obtained in real time, so that the user can train in a targeted manner.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has described specific embodiments of the present invention. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.

Claims (10)

1. A piano training method based on gesture recognition is characterized by comprising the following steps:
step S1: acquiring a hand motion video of a user when playing a piano, and extracting a key frame image of hand motion from the hand motion video;
step S2: extracting a plurality of key points of a hand from each key frame image and generating a corresponding key point sequence;
and step S3: identifying a corresponding standard key point sequence according to the key point sequence, and comparing key points in each key point sequence with key points in the standard key point sequence to generate a standard degree of hand gesture;
and step S4: and extracting the key frame images with the standard degree lower than a preset threshold value to generate error gesture images, and generating training guidance information according to the error gesture images and the corresponding standard images.
2. The piano training method based on gesture recognition of claim 1, wherein the key frame image comprises at least:
a gesture start point image;
a gesture end point image;
a gesture pose image extracted at a preset time interval between the gesture start point image and the gesture end point image.
3. The piano training method based on gesture recognition of claim 1, wherein said step S2 comprises the steps of:
step S201: acquiring a gesture pose estimation model based on a convolutional neural network;
step S202: inputting each key frame image into the gesture pose estimation model for recognition, and extracting the key points of the hand;
step S203: and extracting the key points of each key frame image to generate the coordinates of each key point.
4. The piano training method based on gesture recognition of claim 1, wherein said step S3 comprises the steps of:
step S301: performing action recognition according to the key point sequence to determine a corresponding standard action category, and determining a corresponding standard key point sequence according to the standard action category;
step S302: comparing the key point sequence with the standard key point sequence point by point, and generating the distance between each pair of corresponding key points;
step S303: generating, from the distances between all pairs of corresponding key points, a standard degree for evaluating the deviation between the hand gesture in the key frame image and the hand gesture in the corresponding standard image.
5. The piano training method based on gesture recognition of claim 1, wherein said step S4 comprises the steps of:
step S401: extracting the key frame image with the standard degree lower than a preset threshold value to generate an error gesture image;
step S402: generating point movement directions according to the offsets between corresponding key points in the error gesture image and the corresponding standard image;
step S403: generating the training guidance information from the plurality of point movement directions.
6. The piano training method based on gesture recognition of claim 1, further comprising the steps of:
step S5: dividing the plurality of key frame images into a plurality of action groups according to the standard image, wherein each action group at least comprises a gesture starting point image and a gesture end point image;
step S6: and determining the completion time of each action group according to the gesture starting point image and the gesture end point image in each action group, and comparing the completion time of a plurality of action groups with the standard time corresponding to the standard image to determine the consistency value of the action.
7. The piano training method based on gesture recognition according to claim 3, wherein training the gesture recognition model comprises the steps of:
step M1: acquiring a plurality of gesture pose images, annotating key points on them, and dividing the annotated gesture pose images into a training image set and a test image set;
step M2: inputting the training image set into a convolutional neural network model for training to generate the gesture recognition model;
step M3: and testing the gesture recognition model according to the test image set to determine and store the optimal parameters of the gesture recognition model.
8. A piano training system based on gesture recognition is characterized by comprising the following modules:
the key frame extraction module is used for acquiring a hand motion video of a user playing a piano and extracting a key frame image of hand motion from the hand motion video;
the key point extraction module is used for extracting a plurality of key points of the hand from each key frame image and generating a corresponding key point sequence;
the standard degree generating module is used for identifying a corresponding standard key point sequence according to the key point sequence, comparing key points in each key point sequence with key points in the standard key point sequence and generating the standard degree of the hand gesture;
and the guide information generation module is used for extracting the key frame images with the standard degree lower than a preset threshold value to generate error gesture images, and generating training guide information according to the error gesture images and the corresponding standard images.
9. A gesture-recognition-based piano training device, comprising:
a processor;
a memory module having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the piano training method based on gesture recognition of any one of claims 1 to 7 via execution of the executable instructions.
10. A computer-readable storage medium storing a program, wherein the program, when executed, implements the steps of the piano training method based on gesture recognition of any one of claims 1 to 7.
CN202111060343.3A 2021-09-10 2021-09-10 Piano training method, system, equipment and storage medium based on gesture recognition Pending CN115798033A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111060343.3A CN115798033A (en) 2021-09-10 2021-09-10 Piano training method, system, equipment and storage medium based on gesture recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111060343.3A CN115798033A (en) 2021-09-10 2021-09-10 Piano training method, system, equipment and storage medium based on gesture recognition

Publications (1)

Publication Number Publication Date
CN115798033A 2023-03-14

Family

ID=85473288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111060343.3A Pending CN115798033A (en) 2021-09-10 2021-09-10 Piano training method, system, equipment and storage medium based on gesture recognition

Country Status (1)

Country Link
CN (1) CN115798033A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117392762A (en) * 2023-12-13 2024-01-12 中国石油集团川庆钻探工程有限公司 Characteristic behavior recognition method based on human body key point gesture coding



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination