CN116994542A - Method and device for generating musical instrument training information, electronic equipment and computer readable medium - Google Patents


Info

Publication number
CN116994542A
Authority
CN
China
Prior art keywords
user
sample
playing
neural network
musical instrument
Prior art date
Legal status
Pending
Application number
CN202210420927.5A
Other languages
Chinese (zh)
Inventor
曾文韬
徐豪骏
李山亭
Current Assignee
Shanghai Miaoke Information Technology Co ltd
Original Assignee
Shanghai Miaoke Information Technology Co ltd
Priority date
Application filed by Shanghai Miaoke Information Technology Co ltd filed Critical Shanghai Miaoke Information Technology Co ltd
Priority to CN202210420927.5A
Publication of CN116994542A


Classifications

    • G10G1/02: Chord or note indicators, fixed or adjustable, for keyboard of fingerboards
    • G06N3/04: Neural networks; architecture, e.g. interconnection topology
    • G06N3/08: Neural networks; learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G10L25/18: Speech or voice analysis, the extracted parameters being spectral information of each sub-band
    • G10L25/30: Speech or voice analysis, the analysis technique using neural networks
    • G10L25/51: Speech or voice analysis, specially adapted for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

Embodiments of the present disclosure disclose a method, an apparatus, an electronic device, and a computer readable medium for generating musical instrument practice information. One embodiment of the method comprises the following steps: capturing the instrument sound played by a user during a preset time period to obtain user playing audio; converting the user playing audio into an audio mel spectrum; inputting the audio mel spectrum into a pre-trained note recognition model to obtain a set of user-played notes; generating user practice information from the set of user-played notes; and displaying the user practice information. This embodiment improves the efficiency with which the user learns to play the instrument.

Description

Method and device for generating musical instrument training information, electronic equipment and computer readable medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a method and apparatus for generating musical instrument training information, an electronic device, and a computer readable medium.
Background
Playing a musical instrument skillfully requires constant practice, and users often need guidance while practicing. At present, such guidance is usually provided manually.
However, when the above manner is adopted to provide guidance for the user, there are often the following technical problems:
First, manual instruction cannot be provided anytime and anywhere, and manual instruction can miss mistakes, so the parts the user has not mastered receive less practice and the practice is less targeted, which lowers the efficiency with which the user learns to play the instrument.
Second, when guidance is provided, giving a large amount of guidance to a user with a low mastery level prevents the user from fully absorbing it, further reducing learning efficiency; in addition, excessive guidance affects the user's state of mind, which is detrimental to learning the instrument.
Third, relatively accurate guidance cannot be provided to the user automatically, resulting in low learning efficiency.
Fourth, after an instrument has been played many times its pitch drifts, and this drift cannot be recognized automatically, so the instrument is not tuned in time and its pitch becomes inaccurate.
Disclosure of Invention
This summary is provided to introduce concepts in a simplified form that are described further in the detailed description below. It is not intended to identify key or essential features of the claimed subject matter, nor to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose a musical instrument practice information generating method, apparatus, electronic device, and computer readable medium to solve one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a method for generating musical instrument practice information, the method including: capturing the instrument sound played by a user during a preset time period to obtain user playing audio; converting the user playing audio into an audio mel spectrum; inputting the audio mel spectrum into a pre-trained note recognition model to obtain a set of user-played notes; generating user practice information from the set of user-played notes; and displaying the user practice information.
In a second aspect, some embodiments of the present disclosure provide a musical instrument practice information generating apparatus, including: a sound pickup unit configured to capture the instrument sound played by a user during a preset time period to obtain user playing audio; a conversion unit configured to convert the user playing audio into an audio mel spectrum; an input unit configured to input the audio mel spectrum into a pre-trained note recognition model to obtain a set of user-played notes; a generation unit configured to generate user practice information from the set of user-played notes; and a presentation unit configured to display the user practice information.
In a third aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; and a storage device having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the method described in any of the implementations of the first aspect above.
In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect above.
The above embodiments of the present disclosure have the following beneficial effects: the practice information generation method of some embodiments of the present disclosure improves the efficiency with which a user learns to play an instrument. Specifically, that efficiency is low because manual instruction cannot be provided anytime and anywhere, and because manual instruction can miss mistakes, so the unmastered parts receive less, and less targeted, practice. Based on this, the method of some embodiments of the present disclosure first captures the instrument sound played by the user during a preset time period, obtaining user playing audio that characterizes the sound the user produced. The user playing audio is then converted into an audio mel spectrum suitable as input to the note recognition model. Next, the audio mel spectrum is fed into the pre-trained note recognition model, yielding a set of user-played notes characterizing what the user actually played. From that set, user practice information characterizing the user's pitch, rhythm, and completeness is generated and finally displayed. The user can thereby gauge his or her playing level and practice in a targeted way. Because the practice information is generated automatically, guidance is available anytime and anywhere; in subsequent practice the user can focus on the parts not yet mastered, increasing both the amount and the relevance of practice and thus the efficiency of learning the instrument.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a flow chart of some embodiments of a method of generating musical instrument training information according to the present disclosure;
fig. 2 is a schematic structural diagram of some embodiments of a musical instrument practice information generating apparatus according to the present disclosure;
fig. 3 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "a" and "an" in this disclosure are illustrative rather than limiting; those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates a flow 100 of some embodiments of a musical instrument practice information generation method according to the present disclosure. The method comprises the following steps:
Step 101: capture the instrument sound played by the user during a preset time period to obtain user playing audio.
In some embodiments, the execution body of the practice information generation method (for example, a computing device) may capture the instrument sound played by the user during a preset time period to obtain user playing audio. In practice, the execution body may pick up the sound through a microphone. Thus, user playing audio characterizing the instrument sound played by the user can be obtained.
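As an illustration only (the patent prescribes no implementation), the sound pickup step could be realized with a recording library such as sounddevice; the sample rate and preset time period below are assumed values, not taken from the disclosure.

```python
import numpy as np
import sounddevice as sd  # assumed third-party recording library

SAMPLE_RATE = 16_000    # assumed sample rate in Hz
PRESET_PERIOD_S = 10.0  # assumed "preset time period" in seconds

def record_user_playing(duration_s: float = PRESET_PERIOD_S,
                        sample_rate: int = SAMPLE_RATE) -> np.ndarray:
    """Capture instrument sound from the default microphone and return it
    as a mono float32 waveform (the "user playing audio")."""
    audio = sd.rec(int(duration_s * sample_rate),
                   samplerate=sample_rate, channels=1, dtype="float32")
    sd.wait()  # block until the preset time period has elapsed
    return audio.squeeze()
```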
The computing device may be hardware or software. When it is hardware, it may be implemented as a distributed cluster of servers or terminal devices, or as a single server or terminal device. When it is software, it may be installed in the hardware devices listed above and implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or a single module. No specific limitation is made here, and any number of computing devices may be used as the implementation requires.
Optionally, the user playing audio may first be input into a pre-trained user voice command recognition model to obtain a command recognition result. The user voice command recognition model may be a neural network model that takes user playing audio as input and outputs a command recognition result. The user playing audio may include user command speech, that is, words spoken by the user while playing, for example "show the keyboard prompt". The command recognition result represents the recognized command: for example, "0" may represent a keyboard prompt and "1" a demonstration video prompt. Then, in response to the command recognition result representing a keyboard prompt, an on-screen keyboard may be displayed that dynamically highlights the corresponding keys, for example by blinking the keys for the notes to be prompted. The specific manner of displaying the keyboard is not restricted here; it may, for example, rise into view or fade in. Finally, in response to the command recognition result representing a demonstration video prompt, a demonstration video corresponding to the user playing audio (a pre-stored video for the piece being played) may be played. The user can thus be prompted toward the correct keys.
Optionally, the user voice command recognition model may be trained by the execution body or another computing device as follows:
first, a sample set is obtained. The samples in the sample set comprise sample playing audio and sample command identification results corresponding to the sample playing audio.
Second, the following training steps are performed based on the sample set:
and a first sub-step of respectively inputting sample playing audios of at least one sample in the sample set to the initial neural network to obtain a prediction command recognition result corresponding to each sample in the at least one sample. The initial neural network may be various neural networks capable of obtaining a predicted command recognition result from a sample playing audio, for example, a convolutional neural network, a deep neural network, and the like.
And a second sub-step of comparing the predicted command recognition result corresponding to each of the at least one sample with the corresponding sample command recognition result.
And a third sub-step of determining whether the initial neural network reaches a preset optimization target according to the comparison result. The optimization target may be that the accuracy of the recognition result of the prediction command generated by the initial neural network is greater than a preset accuracy threshold.
And a fourth sub-step, in response to determining that the initial neural network reaches the optimization target, using the initial neural network as a trained user voice command recognition model.
And a fifth sub-step of, in response to determining that the initial neural network has not reached the optimization objective, adjusting the network parameters of the initial neural network, forming a new sample set from unused samples, and performing the training step again with the adjusted network as the initial neural network. As an example, the network parameters of the initial neural network may be adjusted using the backpropagation algorithm (BP algorithm) and gradient descent (for example, stochastic mini-batch gradient descent).
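The five sub-steps above describe a standard supervised training loop (predict, compare, test an accuracy target, stop or back-propagate and repeat), and the same procedure is reused below for the hand gesture model and the note recognition model. A minimal PyTorch sketch of that loop follows; the loss, learning rate, accuracy threshold, and epoch cap are assumptions rather than values from the disclosure.

```python
import torch
from torch import nn

def train_until_target(model: nn.Module,
                       loader,                            # yields (inputs, labels)
                       accuracy_threshold: float = 0.95,  # assumed preset threshold
                       lr: float = 1e-3,
                       max_epochs: int = 100) -> nn.Module:
    """Generic loop for the patent's five training sub-steps."""
    criterion = nn.CrossEntropyLoss()
    # Back-propagation with a stochastic mini-batch gradient method,
    # as suggested by the text above.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(max_epochs):
        correct, total = 0, 0
        for inputs, labels in loader:
            logits = model(inputs)                # sub-step 1: predict
            loss = criterion(logits, labels)      # sub-step 2: compare
            optimizer.zero_grad()
            loss.backward()                       # sub-step 5: adjust parameters
            optimizer.step()
            correct += (logits.argmax(dim=1) == labels).sum().item()
            total += labels.numel()
        if correct / total > accuracy_threshold:  # sub-steps 3-4: target reached
            break
    return model
```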
Optionally, first, in response to detecting that the mode confirmation operation information records that the user selected the first mode, the execution body may determine whether the user playing audio is identical to preset audio. The mode confirmation operation information records the mode selected by the user. The first mode may be a strict mode, meaning the user playing audio must be identical to the preset audio. Second, in response to the user playing audio differing from the preset audio (in practice, in response to the notes represented by the user playing audio differing from those represented by the preset audio), an instrument playing prompt screen may be displayed; this may be the on-screen keyboard or the screen on which the demonstration video is played. Then, the instrument sound the user continues to play can be picked up (in practice, through the microphone) to obtain the continued playing audio as the new user playing audio. Finally, in response to the newly captured user playing audio being identical to the preset audio, the instrument playing prompt screen may be closed; in practice, the on-screen keyboard may be folded away or lowered. Thus, by presenting playing prompts to a user who selects the strict mode, a more skilled user can be helped to master the complete playing process.
Optionally, first, the scene of the user playing the instrument may be filmed to obtain an instrument video. Then, key calibration processing may be performed on the instrument video to obtain piano position information. In practice, for the first video frame of the instrument video, an object detection network may be used to locate the smallest bounding rectangle of the piano keyboard, and the image coordinates of that rectangle in the first frame are taken as the piano position information. The object detection network may be an R-CNN (Region-based Convolutional Neural Network). Next, according to the piano position information, the scene of the user playing may be recorded to obtain a piano playing video showing the same piano as the instrument video; since the position of the piano does not change, the recording can stay fixed on it. Then, a hand gesture recognition result is generated from the piano playing video and a pre-trained hand gesture model, a neural network model that takes the piano playing video as input and outputs the hand gesture recognition result. In practice, the piano playing video is input into the pre-trained hand gesture model to obtain the result, which may be a preset number of coordinates representing the hand pose, for example 42 hand joint coordinates. Finally, in response to the hand gesture recognition result satisfying a preset joint coordinate condition (for example, some hand joint coordinate differing from the preset joint coordinates), a demonstration video corresponding to the user playing audio may be played to prompt the user with the correct fingering.
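As a sketch of the fingering check only: compare the predicted hand joint coordinates against preset reference coordinates and trigger the demonstration video when any joint deviates. The array shapes and the pixel tolerance are assumptions; the patent states only that a differing joint coordinate satisfies the condition.

```python
import numpy as np

def fingering_needs_prompt(predicted: np.ndarray,
                           reference: np.ndarray,
                           tol_px: float = 15.0) -> bool:
    """predicted/reference: (42, 2) arrays of hand joint image coordinates.
    Returns True when some joint differs from the preset coordinates,
    i.e. the "preset joint point coordinate condition" holds and the
    demonstration video should be played."""
    deviations = np.linalg.norm(predicted - reference, axis=1)
    return bool((deviations > tol_px).any())  # tol_px is an assumed tolerance
```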
Alternatively, the hand gesture model may be trained by the executing subject or other computing device by:
first, a sample set is obtained. The samples in the sample set comprise sample piano playing videos and sample hand gesture recognition results corresponding to the sample piano playing videos.
Second, the following training steps are performed based on the sample set:
and a first sub-step of respectively inputting sample piano-playing videos of at least one sample in the sample set to an initial neural network to obtain a predicted hand gesture recognition result corresponding to each sample in the at least one sample. The initial neural network may be various neural networks capable of obtaining a predicted hand gesture recognition result from a sample piano playing video, for example, a convolutional neural network, a deep neural network, and the like.
And a second sub-step of comparing the predicted hand gesture recognition result corresponding to each of the at least one sample with the corresponding sample hand gesture recognition result.
And a third sub-step of determining whether the initial neural network reaches a preset optimization target according to the comparison result. The optimization target may be that the accuracy of the predicted hand gesture recognition result generated by the initial neural network is greater than a preset accuracy threshold.
And a fourth sub-step, in response to determining that the initial neural network reaches the optimization target, using the initial neural network as a hand gesture model after training.
And a fifth sub-step of, in response to determining that the initial neural network has not reached the optimization objective, adjusting the network parameters of the initial neural network, forming a new sample set from unused samples, and performing the training step again with the adjusted network as the initial neural network. As an example, the network parameters of the initial neural network may be adjusted using the backpropagation algorithm (BP algorithm) and gradient descent (for example, stochastic mini-batch gradient descent).
Step 102, converting the user playing audio into an audio mel frequency spectrum.
In some embodiments, the execution body may convert the user playing audio into mel spectrum form to obtain an audio mel spectrum. Thus, an audio mel spectrum suitable as input to the note recognition model can be obtained.
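As one possible realization (the patent names no library), the conversion could use librosa; the FFT size, hop length, and mel band count below are common assumed defaults.

```python
import librosa
import numpy as np

def to_mel_spectrum(audio: np.ndarray, sample_rate: int = 16_000) -> np.ndarray:
    """Convert user playing audio into a log-scaled mel spectrogram
    suitable as input to the note recognition model."""
    mel = librosa.feature.melspectrogram(
        y=audio, sr=sample_rate,
        n_fft=2048, hop_length=512, n_mels=128)  # assumed parameters
    return librosa.power_to_db(mel, ref=np.max)  # log scale for the network
```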
Step 103, inputting the audio mel frequency spectrum into a pre-trained note identification model to obtain a note set played by the user.
In some embodiments, the executing entity may input the audio mel spectrum into a pre-trained note recognition model to obtain a set of played notes for the user. The note recognition model may be a neural network model that takes an audio mel spectrum as an input and a set of notes played by a user as an output. Thus, a set of user-played notes characterizing the notes played by the user can be obtained.
In some optional implementations of some embodiments, the note recognition model may be trained by the execution body, or by another execution body used for training the note recognition model, as follows:
first, a sample set is obtained. The samples in the sample set comprise a sample mel frequency spectrum and a sample playing note set corresponding to the sample mel frequency spectrum.
Second, the following training steps are performed based on the sample set:
and a first sub-step of respectively inputting sample mel spectrums of at least one sample in the sample set into an initial neural network to obtain a predicted playing note set corresponding to each sample in the at least one sample. The initial neural network may be various neural networks capable of obtaining a predicted playing note set from a sample mel spectrum, for example, a convolutional neural network, a deep neural network, and the like.
A second sub-step of comparing the set of predicted playing notes corresponding to each of the at least one sample with the set of corresponding sample playing notes.
And a third sub-step of determining whether the initial neural network reaches a preset optimization target according to the comparison result. The optimization objective may be that the accuracy of the predicted playing note set generated by the initial neural network is greater than a preset accuracy threshold.
And a fourth sub-step, in response to determining that the initial neural network reaches the optimization objective, using the initial neural network as a trained note recognition model.
And a fifth sub-step of, in response to determining that the initial neural network has not reached the optimization objective, adjusting the network parameters of the initial neural network, forming a new sample set from unused samples, and performing the training step again with the adjusted network as the initial neural network. As an example, the network parameters of the initial neural network may be adjusted using the backpropagation algorithm (BP algorithm) and gradient descent (for example, stochastic mini-batch gradient descent).
Step 104, generating the user practice information according to the note set played by the user.
In some embodiments, the execution body may generate the user practice information from the set of user-played notes. In practice, the set of user-played notes can be scored to obtain the user practice information. Thus, user practice information characterizing the user's pitch, rhythm, and completeness can be obtained.
Optionally, before the user practice information is generated: in a first step, in response to detecting that the mode confirmation operation information records that the user selected the second mode, the user-played notes corresponding to a preset note set may be selected from the set of user-played notes as target user-played notes, yielding a target user-played note set. The second mode may be a loose mode, meaning the notes played by the user need only include the preset notes. In a second step, the number of notes in the target user-played note set is taken as the target number. In a third step, the number of notes in the preset note set is taken as the preset note number. In a fourth step, in response to determining that the target number differs from the preset note number (in practice, in response to the target number being smaller than the preset note number), an instrument playing prompt screen may be displayed. In a fifth step, the instrument sound the user continues to play may be picked up to obtain the continued playing audio as the new user playing audio. In a sixth step, in response to the target number corresponding to the note set of the newly captured user playing audio being equal to the preset note number, the instrument playing prompt screen is closed.
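Read this way, the second-mode check reduces to a multiset intersection: count how many of the preset notes were actually played and prompt only when some are missing. A sketch under that reading (the note names and boolean interface are illustrative):

```python
from collections import Counter

def second_mode_prompt_needed(user_notes: list[str],
                              preset_notes: list[str]) -> bool:
    """Loose mode: extra notes are tolerated. The target user-played note
    set is the part of the user's notes matching the preset set; a prompt
    is needed only when its size (the target number) falls short of the
    preset note number."""
    matched = Counter(user_notes) & Counter(preset_notes)  # multiset intersection
    target_number = sum(matched.values())
    preset_number = len(preset_notes)
    return target_number < preset_number

# An extra C4 does not trigger a prompt as long as all preset notes appear.
assert second_mode_prompt_needed(["C4", "C4", "E4"], ["C4", "E4"]) is False
assert second_mode_prompt_needed(["C4"], ["C4", "E4"]) is True
```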
The foregoing content is an inventive point of the embodiments of the present disclosure and solves the second technical problem mentioned in the background: when guidance is provided, giving a large amount of guidance to a user with a low mastery level prevents the user from fully absorbing it, further reduces the efficiency with which the user learns to play the instrument, and affects the user's state of mind, which is detrimental to learning. If these factors are resolved, the user's learning efficiency can be improved. To achieve this effect, the present disclosure provides the second mode: the user-played notes matching the preset note set are taken as the target user-played note set, the size of that set is taken as the target number, and an instrument playing prompt screen is shown only when the target number differs from the preset note number. The notes in the target set are exactly the notes the user played correctly, so extra playing is tolerated: even if the user plays additional notes, no prompt appears as long as the required notes are played correctly, and a prompt is shown only when required notes are missing. This reduces the number of prompts and avoids flooding the user with guidance, improving the efficiency with which the user learns to play the instrument; fewer prompts also mean less impact on the user's state of mind, which benefits the user's learning.
Optionally, in a first step, the execution body may determine the number of notes contained in the set of user-played notes as the user-played note number. In a second step, an intonation score is generated from the user-played note number and the score note number. In practice, the quotient of the user-played note number and the score note number, multiplied by 100, may be taken as the intonation score. In a third step, a duration score is generated from the note duration and the standard note duration corresponding to each note in the set of user-played notes. In practice, a score duration_i may first be computed for each user-played note from an equation relating duration_i (the score of that note), cnt (the played duration of the note), and StdCnt (the standard note duration).
The scores of all user-played notes in the set are then averaged to obtain the duration score.
In a fourth step, a rhythm score may be generated from the onset time and the standard onset time corresponding to each note in the set of user-played notes. In practice, the time difference between each note's onset time and its standard onset time is first determined; the sum of the absolute values of these differences is taken as the total time difference; and the difference between a preset time threshold and the total time difference is taken as the time value. Then, in response to the time value being greater than or equal to 0, the product of the time value and 100 is taken as the rhythm score; in response to the time value being less than 0, the rhythm score is 0.
In a fifth step, the user practice information is generated from the intonation score, the duration score, and the rhythm score. In practice, the sum of the three scores may be taken as the user practice score, and the practice score, intonation score, duration score, and rhythm score may be filled into a preset practice corpus template, a preset text template with slots for the four scores. For example, the template may read: "Your playing score is [ ]. The intonation score is [ ]. The duration score is [ ]. The rhythm score is [ ]."
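A sketch of the three scores follows. The intonation and rhythm computations mirror the text above; the per-note duration equation does not survive in the text, so the ratio used below (played duration over standard duration, capped at one) is an assumed stand-in, not the patent's formula.

```python
def intonation_score(num_played_notes: int, num_score_notes: int) -> float:
    # Quotient of user-played note number and score note number, times 100.
    return num_played_notes / num_score_notes * 100

def duration_score(durations: list[float], std_durations: list[float]) -> float:
    # ASSUMED per-note formula standing in for the patent's duration_i equation.
    per_note = [min(cnt, std) / max(cnt, std) * 100
                for cnt, std in zip(durations, std_durations)]
    return sum(per_note) / len(per_note)  # average over all user-played notes

def rhythm_score(onsets: list[float], std_onsets: list[float],
                 time_threshold: float) -> float:
    total_diff = sum(abs(t - s) for t, s in zip(onsets, std_onsets))
    time_value = time_threshold - total_diff
    return time_value * 100 if time_value >= 0 else 0.0
```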
The foregoing content is an inventive point of the embodiments of the present disclosure and solves the third technical problem mentioned in the background: relatively accurate guidance cannot be provided to the user automatically, so the user learns to play the instrument inefficiently. If this factor is resolved, the user's learning efficiency can be improved. To achieve this effect, the present disclosure generates the intonation, duration, and rhythm scores separately and only then generates the user practice information, which improves the accuracy of the guidance provided to the user and thus the efficiency with which the user learns the instrument.
Step 105: display the user practice information.
In some embodiments, the execution body may present the user practice information on an associated display device. The user can thereby gauge his or her playing level from the practice information and practice in a targeted way.
Optionally, in a first step, a historical deviated-pitch set and a historical key-play-count set may be acquired. A historical deviated pitch in the historical deviated-pitch set is the absolute value of the difference between a key's standard pitch and the historical average pitch at which the user has played that key; the historical average pitch is the mean pitch of the key over a preset historical period. A historical key play count in the historical key-play-count set is the number of times the user struck the corresponding key during the preset historical period. In a second step, an updated deviated-pitch set may be generated from the user playing audio and the historical deviated-pitch set. In practice, the pitch of each key in the user playing audio is first determined, and the absolute value of the difference between each key's pitch and its standard pitch is taken as the current deviated-pitch set; then, for each key, the updated deviated pitch is determined by the following equation, yielding the updated deviated-pitch set:
NewDB_DeviateVec[n] = λ × DB_DeviateVec[n] + (1 − λ) × DeviateVec[n].
where NewDB_DeviateVec[n] denotes an updated deviated pitch in the updated deviated-pitch set and n indexes the keys of the instrument; when the instrument is a piano, n is an integer in [1, 88], so that, for example, NewDB_DeviateVec[11] is the updated deviated pitch of the 11th key. DB_DeviateVec[n] denotes the corresponding historical deviated pitch, DeviateVec[n] denotes the corresponding current deviated pitch, and λ is a preset parameter.
In a third step, an updated key-play-count set may be generated from the user playing audio and the historical key-play-count set. In practice, the number of times each key is struck, as characterized by the user playing audio, may be added to the historical key-play-count set. In a fourth step, a key weight set may be generated from a Gaussian distribution and the updated key-play-count set. The Gaussian distribution is obtained by sampling a Gaussian with mean 0 and standard deviation 3 at 88 equidistant points over the abscissa range [−2³, 2³] and normalizing the ordinates of the 88 samples; the 88 sampled points correspond in order to the 88 keys of the piano. In practice, the updated key-play-count set is first normalized to obtain the normalized updated key-play-count set; then, for each key, the key weight is determined by the following equation, yielding the key weight set:
ConfidenceVec[n]=ρ×KeyStrickDist[n]+(1-ρ)×GaussVec[n]。
where ConfidenceVec[n] denotes the key weight of the n-th key, KeyStrickDist[n] denotes the normalized updated key play count of the n-th key, GaussVec[n] denotes the ordinate of the Gaussian distribution corresponding to the n-th key, and ρ is a preset parameter.
In a fifth step, a user deviated-pitch value may be generated from the updated deviated-pitch set and the key weight set. In practice, the dot product of the updated deviated pitches and the key weights may be taken as the user deviated-pitch value. In a sixth step, in response to determining that the user deviated-pitch value is greater than a preset deviated-pitch threshold, tuning prompt information may be displayed to prompt the user to tune the instrument. For example, the tuning prompt information may be: "The pitch of the keys has changed; please tune the instrument promptly!"
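A sketch combining the two update equations above into the tuning check. λ, ρ, and the deviated-pitch threshold are the patent's unspecified preset parameters, so the values below are placeholders, and normalizing the Gaussian ordinates to sum to one is one interpretation of "normalized".

```python
import numpy as np

NUM_KEYS = 88
LAM, RHO = 0.9, 0.5        # preset parameters lambda and rho (placeholder values)
DEVIATION_THRESHOLD = 0.3  # preset deviated-pitch threshold (assumed)

# Fixed Gaussian weights: 88 equidistant abscissa samples of a Gaussian with
# mean 0 and standard deviation 3 over [-2**3, 2**3], ordinates normalized
# (here, to sum to one).
xs = np.linspace(-(2**3), 2**3, NUM_KEYS)
gauss = np.exp(-xs**2 / (2 * 3**2))
GAUSS_VEC = gauss / gauss.sum()

def needs_tuning(db_deviate: np.ndarray,       # historical deviated pitches, (88,)
                 current_deviate: np.ndarray,  # |pitch - standard| this session, (88,)
                 key_play_counts: np.ndarray   # updated key play counts, (88,)
                 ) -> bool:
    new_deviate = LAM * db_deviate + (1 - LAM) * current_deviate
    key_strike_dist = key_play_counts / key_play_counts.sum()  # normalized counts
    confidence = RHO * key_strike_dist + (1 - RHO) * GAUSS_VEC
    user_deviated_pitch = float(np.dot(new_deviate, confidence))
    return user_deviated_pitch > DEVIATION_THRESHOLD  # prompt the user to tune
```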
The foregoing content is an inventive point of the embodiments of the present disclosure and solves the fourth technical problem mentioned in the background: after an instrument has been played many times its pitch drifts, and because this drift is not recognized automatically the instrument is not tuned in time, leaving its pitch inaccurate. If this factor is resolved, pitch accuracy can be improved. To achieve this effect, the present disclosure generates a user deviated-pitch value from the historical deviated-pitch set, the historical key-play-count set, and the user playing audio, weighted by a Gaussian distribution, and prompts the user to tune when that value exceeds a preset deviated-pitch threshold. The user is thus reminded to calibrate the instrument when repeated playing has shifted its pitch, improving pitch accuracy.
With continued reference to fig. 2, as an implementation of the method shown in the above figures, the present disclosure provides some embodiments of a musical instrument practice information generating apparatus. These apparatus embodiments correspond to the method embodiments shown in fig. 1, and the apparatus is particularly applicable to various electronic devices.
As shown in fig. 2, the practice information generating apparatus 200 of some embodiments includes: a sound pickup unit 201, a conversion unit 202, an input unit 203, a generation unit 204, and a presentation unit 205. The sound pickup unit 201 is configured to capture the instrument sound played by the user during a preset time period to obtain user playing audio; the conversion unit 202 is configured to convert the user playing audio into an audio mel spectrum; the input unit 203 is configured to input the audio mel spectrum into a pre-trained note recognition model to obtain a set of user-played notes; the generation unit 204 is configured to generate user practice information from the set of user-played notes; and the presentation unit 205 is configured to display the user practice information.
It will be appreciated that the elements described in the apparatus 200 correspond to the various steps in the method described with reference to fig. 1. Thus, the operations, features and resulting benefits described above for the method are equally applicable to the apparatus 200 and the units contained therein, and are not described in detail herein.
Referring now to FIG. 3, a schematic diagram of an electronic device (e.g., computing device) 300 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 3 is merely an example and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various suitable actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data required for the operation of the electronic apparatus 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.
In general, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 308 including, for example, magnetic tape, hard disk, etc.; and communication means 309. The communication means 309 may allow the electronic device 300 to communicate with other devices wirelessly or by wire to exchange data. While fig. 3 shows an electronic device 300 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 3 may represent one device or a plurality of devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via communications device 309, or from storage device 308, or from ROM 302. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing means 301.
It should be noted that, the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected by digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be contained in the electronic device, or it may exist separately without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: capture the instrument sound played by a user during a preset time period to obtain user playing audio; convert the user playing audio into an audio mel spectrum; input the audio mel spectrum into a pre-trained note recognition model to obtain a set of user-played notes; generate user practice information from the set of user-played notes; and display the user practice information.
Computer program code for carrying out operations for some embodiments of the present disclosure may be written in one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, for example described as: a processor comprising a sound reception unit, a conversion unit, an input unit, a generation unit, and a presentation unit. In some cases the name of a unit does not limit the unit itself; for example, the presentation unit may also be described as "a unit that displays the user musical instrument training information".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SoCs), complex programmable logic devices (CPLDs), and the like.
The foregoing description presents only preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above technical features, but also encompasses other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept; for example, technical solutions in which the above features are interchanged with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.

Claims (10)

1. A musical instrument training information generation method, comprising the following steps:
receiving musical instrument sounds played by a user in a preset time period to obtain user playing audio;
converting the user playing audio into an audio mel frequency spectrum;
inputting the audio mel frequency spectrum into a pre-trained note recognition model to obtain a set of notes played by the user;
generating user musical instrument training information according to the set of played notes; and
displaying the user musical instrument training information.
2. The method of claim 1, wherein after the receiving of the musical instrument sounds played by the user in the preset time period, the method further comprises:
in response to detecting that a mode confirmation operation information record indicates that the user has selected a first mode, determining whether the user playing audio is the same as a preset audio;
displaying a musical instrument playing prompt screen in response to the user playing audio being different from the preset audio;
performing continuous sound reception on the musical instrument sounds continuously played by the user to obtain continuous user playing audio as the user playing audio; and
closing the musical instrument playing prompt screen in response to the user playing audio obtained after the sound reception being the same as the preset audio.
3. The method of claim 1, wherein after the receiving of the musical instrument sounds played by the user in the preset time period, the method further comprises:
inputting the user playing audio into a pre-trained user voice command recognition model to obtain a command recognition result;
displaying a keyboard in response to the command recognition result indicating a keyboard presentation, such that the keyboard dynamically presents the corresponding keys; and
playing a demonstration video corresponding to the user playing audio in response to the command recognition result indicating a demonstration video presentation.
4. The method of claim 1, wherein after the receiving of the musical instrument sounds played by the user in the preset time period, the method further comprises:
shooting a picture of the user playing the musical instrument to obtain a musical instrument video;
performing key calibration on the musical instrument video to obtain piano position information;
recording a picture of the user playing the piano according to the piano position information to obtain a piano playing video, wherein the piano in the piano playing video is the same as the piano in the musical instrument video;
generating a hand gesture recognition result according to the piano playing video and a pre-trained hand gesture model; and
playing a demonstration video corresponding to the user playing audio in response to the hand gesture recognition result meeting a preset node coordinate condition, so as to prompt the user with correct fingering.
5. The method of claim 1, wherein the note recognition model is trained by:
obtaining a sample set, wherein each sample in the sample set comprises a sample mel frequency spectrum and a sample playing note set corresponding to the sample mel frequency spectrum; and
performing the following training steps based on the sample set (a minimal sketch of this loop follows the claims):
inputting the sample mel frequency spectrum of each of at least one sample in the sample set into an initial neural network to obtain a predicted playing note set corresponding to each of the at least one sample;
comparing the predicted playing note set corresponding to each of the at least one sample with the corresponding sample playing note set;
determining, according to the comparison result, whether the initial neural network reaches a preset optimization target;
in response to determining that the initial neural network reaches the optimization target, taking the initial neural network as the trained note recognition model; and
in response to determining that the initial neural network does not reach the optimization target, adjusting network parameters of the initial neural network, forming a sample set from unused samples, taking the adjusted initial neural network as the initial neural network, and performing the training steps again.
6. The method of claim 3, wherein the user voice command recognition model is trained by:
obtaining a sample set, wherein each sample in the sample set comprises sample playing audio and a sample command recognition result corresponding to the sample playing audio; and
performing the following training steps based on the sample set:
inputting the sample playing audio of each of at least one sample in the sample set into an initial neural network to obtain a predicted command recognition result corresponding to each of the at least one sample;
comparing the predicted command recognition result corresponding to each of the at least one sample with the corresponding sample command recognition result;
determining, according to the comparison result, whether the initial neural network reaches a preset optimization target;
in response to determining that the initial neural network reaches the optimization target, taking the initial neural network as the trained user voice command recognition model; and
in response to determining that the initial neural network does not reach the optimization target, adjusting network parameters of the initial neural network, forming a sample set from unused samples, taking the adjusted initial neural network as the initial neural network, and performing the training steps again.
7. The method of claim 4, wherein the hand gesture model is trained by:
obtaining a sample set, wherein each sample in the sample set comprises a sample piano playing video and a sample hand gesture recognition result corresponding to the sample piano playing video; and
performing the following training steps based on the sample set:
inputting the sample piano playing video of each of at least one sample in the sample set into an initial neural network to obtain a predicted hand gesture recognition result corresponding to each of the at least one sample;
comparing the predicted hand gesture recognition result corresponding to each of the at least one sample with the corresponding sample hand gesture recognition result;
determining, according to the comparison result, whether the initial neural network reaches a preset optimization target;
in response to determining that the initial neural network reaches the optimization target, taking the initial neural network as the trained hand gesture model; and
in response to determining that the initial neural network does not reach the optimization target, adjusting network parameters of the initial neural network, forming a sample set from unused samples, taking the adjusted initial neural network as the initial neural network, and performing the training steps again.
8. A musical instrument training information generation apparatus, comprising:
a sound reception unit configured to receive musical instrument sounds played by a user in a preset time period to obtain user playing audio;
a conversion unit configured to convert the user playing audio into an audio mel frequency spectrum;
an input unit configured to input the audio mel frequency spectrum into a pre-trained note recognition model to obtain a set of notes played by the user;
a generation unit configured to generate user musical instrument training information according to the set of played notes; and
a presentation unit configured to display the user musical instrument training information.
9. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method of any one of claims 1-7.
Application: CN202210420927.5A | Priority Date: 2022-04-21 | Filing Date: 2022-04-21 | Title: Method and device for generating musical instrument training information, electronic equipment and computer readable medium | Status: Pending | Publication: CN116994542A (en)

Priority Applications (1)

Application Number: CN202210420927.5A | Publication: CN116994542A (en) | Priority Date: 2022-04-21 | Filing Date: 2022-04-21 | Title: Method and device for generating musical instrument training information, electronic equipment and computer readable medium

Publications (1)

Publication Number: CN116994542A | Publication Date: 2023-11-03

Family

ID=88528878

Family Applications (1)

Application Number: CN202210420927.5A | Status: Pending | Publication: CN116994542A (en) | Title: Method and device for generating musical instrument training information, electronic equipment and computer readable medium

Country Status (1)

Country: CN | Publication: CN116994542A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination