Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be understood that in the description of the embodiments of the present invention, terms such as "first" and "second" are used only to distinguish technical features, and are not to be understood as indicating or implying relative importance, the number of the indicated technical features, or the precedence of the indicated technical features. "At least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and covers three cases; for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone, where A and B can each be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, and c may represent: a; b; c; a and b; a and c; b and c; or a, b and c, where a, b and c can each be single or multiple.
In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The embodiments of the present application can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge, and use that knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly comprises computer vision technology, robot technology, biometric recognition technology, speech processing technology, natural language processing technology, machine learning/deep learning and the like.
As living standards rise, people's demand for improving their musical literacy grows day by day, and applying artificial intelligence technology to learning to play musical instruments has become an important means. Speech processing, deep learning and other branches of artificial intelligence software technology provide means for processing and learning from performance audio information. Artificial intelligence technology improves the performance evaluation function, so that the performance audio information of the user can be evaluated quickly and scientifically.
The performance evaluation method provided by the embodiment of the invention replaces professional music teachers with computer equipment to provide evaluation for performance trainees, and provides real-time reminding and guidance for the performance trainees, so that the performance evaluation method is widely applied to the fields of musical instrument learning, singing exercise and the like.
In the related art, audio evaluation methods address only the accuracy of a player's rendition: an accuracy level is obtained by comparing the audio data with an existing music score. This solution has a number of disadvantages: because it relies solely on that comparison, the player's sense of rhythm, expressiveness, musicality, style and the like cannot be comprehensively evaluated, and the user cannot rapidly improve the overall performance level.
Based on the above, the embodiments of the present invention provide a performance evaluation method and apparatus based on eigen decomposition, wherein performance audio information of a user is obtained and input to a performance evaluation model to obtain reconstructed score information and reconstructed audio information; an accuracy level is obtained from the reconstructed score information and a target performance music score, and a skill level is obtained from the performance audio information and master performance audio information. In this way, the accuracy and the playing skill of the performance audio information are evaluated reasonably and scientifically, the difference between the performance audio information and the master performance audio information is shown intuitively, and the overall performance level of the user is improved.
It should be understood that eigen decomposition is a method of decomposing a matrix into a product of matrices formed from its eigenvalues and eigenvectors. In practical applications, the autocorrelated feature components are usually decomposed from the acquired data to obtain feature vectors of the acquired data, and feature recognition is performed on these vectors to extract the important information that deserves attention. In addition, by calculating the distance between the feature vector of the target data and the feature vector of the user data, the user data can be evaluated scientifically and reasonably, and its level in each aspect can be assessed accurately.
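As a reminder of the standard textbook definition (it is not specific to this embodiment), the eigen decomposition of a diagonalizable n×n matrix A can be written as follows, where the columns q_i of Q are eigenvectors of A and Λ is the diagonal matrix of the corresponding eigenvalues:

```latex
A = Q \Lambda Q^{-1}, \qquad
\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n), \qquad
A q_i = \lambda_i q_i \quad (i = 1, \dots, n)
```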
Referring to fig. 1 and fig. 2, fig. 1 and fig. 2 show a flow of a performance evaluation method based on feature decomposition according to an embodiment of the present invention. As shown in fig. 2, the performance evaluation method based on feature decomposition according to the embodiment of the present invention includes the following steps:
S100, collecting the target performance music score and the master performance audio information to form a training set in which each target performance music score corresponds one-to-one to master performance audio information.
It should be understood that, in order to establish the standard for performance evaluation, the target performance music score and the corresponding master performance audio information need to be collected, which facilitates evaluation of the performance audio information of the user. Meanwhile, the application range of the training set can be expanded by collecting target performance music scores and master performance audio information, wherein the target performance music scores include, but are not limited to, music score information for guitar, piano, violin, drum, vertical bamboo flute and other instruments. The master performance audio information is the master performance audio corresponding to the target performance music score, and is stored in a unified format to form the training set for performance evaluation.
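The one-to-one pairing of step S100 can be illustrated with a minimal Python sketch. The names `TrainingPair` and `build_training_set`, and the encodings of the score and audio fields, are hypothetical and chosen only for illustration; the embodiment does not prescribe a storage format beyond being unified.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TrainingPair:
    """One target performance score paired with its master performance audio."""
    score_notes: Sequence[int]     # assumed encoding: one note index per step
    master_audio: Sequence[float]  # assumed encoding: audio samples or features


def build_training_set(scores, audios) -> List[TrainingPair]:
    """Pair scores and master recordings one by one; reject mismatched inputs."""
    if len(scores) != len(audios):
        raise ValueError("each target score needs exactly one master recording")
    return [TrainingPair(tuple(s), tuple(a)) for s, a in zip(scores, audios)]
```

Rejecting inputs of unequal length is one simple way to enforce the one-to-one correspondence before training starts.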
S200, inputting the training set into the performance evaluation model for training, and updating the performance evaluation model.
It should be understood that the performance evaluation model is a self-supervised model based on an autoencoder (AE). During training, the parameters of the performance evaluation model are obtained by minimizing the reconstruction error, so that the target performance music score and the master performance audio information in the training set, after being reconstructed by the performance evaluation model, are as close as possible to the originals. Therefore, the more target performance music scores and master performance audio information the training set contains, the more fully the performance evaluation model is trained, and the more reference value the resulting model parameters have.
It should be understood that the performance evaluation model is a deep learning network model composed of four component units, each realized as a neural network: a music score encoder, an audio encoder, a music score decoder and a global decoder. These units are trained end to end by minimizing the music score information error and the performance audio information error between the inputs and their reconstructions.
Referring to fig. 3, step S200 may be implemented by the following steps:
S210, generating a reconstructed performance score according to the target performance music score.
It should be understood that the score encoder is provided with multiple network layers, and the output obtained after the target performance music score is inferred by the score encoder is generally set to a vector of length 256, i.e., the score information vector includes 256 target audio sequences. In addition, the score decoder is provided with a recurrent layer and is adapted to output a parameter value for each target audio sequence. Therefore, the score information vector is input into the score decoder, which converts it into a reconstructed performance score.
S220, generating reconstructed audio information according to the target performance music score and the master performance audio information.
It should be understood that the audio encoder is provided with a plurality of network layers, and the output obtained after the master performance audio information is inferred by the audio encoder is generally set to a vector of length 256, i.e., the audio information vector includes 256 target audio sequences. In addition, the score decoder is provided with a recurrent layer and is adapted to output a parameter value for each target audio sequence. Therefore, the score information vector and the audio information vector are combined to obtain a global information vector, and the global information vector is input into the global decoder, which converts it into reconstructed audio information.
It should be appreciated that by combining the score information vector and the performance skill information vector, the global information vector can be made to characterize both the accuracy and skill of the performance audio information.
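How the two vectors are combined is not spelled out here; the following Python sketch assumes simple concatenation, which is only one possible choice. The function name `combine_vectors` and the constant `VECTOR_DIM` are hypothetical; the length 256 is the value given in the description.

```python
from typing import List, Sequence

VECTOR_DIM = 256  # vector length stated in the description


def combine_vectors(score_vec: Sequence[float],
                    skill_vec: Sequence[float]) -> List[float]:
    """Build a global information vector (assumption: by concatenation)."""
    if len(score_vec) != VECTOR_DIM or len(skill_vec) != VECTOR_DIM:
        raise ValueError(f"both vectors must have length {VECTOR_DIM}")
    # The first half carries score information, the second half skill information.
    return list(score_vec) + list(skill_vec)
```

With concatenation, the global decoder sees both kinds of information side by side, while the score decoder only ever sees the score information vector.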
Referring to fig. 7, fig. 7 is a schematic diagram of a performance evaluation method based on feature decomposition according to an embodiment of the present invention.
It is to be understood that the global information vector is converted into reconstructed audio information by decoding it. During training, the music score information vector thus plays two roles, participating in both the reconstruction of the music score information and the reconstruction of the audio information, whereas the performance skill information vector participates only in the reconstruction of the audio information. Because the error value of the reconstructed audio information is required to be smaller than the preset audio error threshold, the performance skill information vector contains only performance-related information and can therefore represent the playing skill in the performance audio information.
S230, updating the parameters of the performance evaluation model, so that the error between the reconstructed performance score and the target performance music score, and the error between the reconstructed audio information and the master performance audio information, are each smaller than a preset error threshold.
It should be understood that the error of the reconstructed performance score is composed of the cross entropy of each generated audio sequence, and is calculated as follows:
$$L_{NS} = -\sum_{t=1}^{N} p_t \log \hat{p}_t$$
wherein $L_{NS}$ is the error value of the reconstructed score information, $N$ represents the length of the generated score sequence, $t$ represents the step number in the sequence, $\hat{p}_t$ is the distribution over all notes predicted by the score decoder, and $p_t$ is the true note distribution.
It should be understood that, by updating the parameters of the score encoder and the score decoder, the error value $L_{NS}$ between the reconstructed performance score and the target performance music score is made smaller than the preset music score error threshold, which effectively reduces the distortion of the performance evaluation model and ensures the reference value of the reconstructed performance score.
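The cross-entropy error of the reconstructed score can be sketched in plain Python as below. This is a minimal illustration, assuming the per-step cross entropies are summed without further normalization; the function name, the list-of-lists representation of the note distributions and the `eps` guard against log(0) are all assumptions made for the example.

```python
import math
from typing import Sequence


def score_reconstruction_error(
    true_dists: Sequence[Sequence[float]],
    predicted_dists: Sequence[Sequence[float]],
    eps: float = 1e-12,
) -> float:
    """Sum over steps t of the cross entropy between the true and predicted note distributions."""
    if len(true_dists) != len(predicted_dists):
        raise ValueError("both sequences must have the same length N")
    total = 0.0
    for p_t, q_t in zip(true_dists, predicted_dists):
        # Cross entropy at step t: -sum_k p_t[k] * log(q_t[k]); eps avoids log(0).
        total -= sum(p * math.log(max(q, eps)) for p, q in zip(p_t, q_t))
    return total
```

When the true distribution at each step is one-hot, the error reduces to the negative log-probability that the decoder assigns to the correct note.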
It should be understood that the error of the reconstructed audio information is composed of the errors, at each moment, between the generated reconstructed audio information and the master performance audio information. Denoting this error value by $L_{MS}$, it is accumulated over the steps $t = 1, \dots, N$ of the sequence from the error between $\hat{v}_t$ and $v_t$, wherein $L_{MS}$ is the error value of the reconstructed audio information, $N$ represents the length of the generated score sequence, $t$ represents the step number in the sequence, $\hat{v}_t$ is the audio vector data reconstructed at time $t$, and $v_t$ is the audio vector data of the master performance audio information at time $t$.
It should be appreciated that, by updating the parameters of the audio encoder and the global decoder, the error value $L_{MS}$ between the reconstructed audio information and the master performance audio information is made smaller than the preset audio error threshold, which ensures the fidelity of the performance evaluation model and improves the reference value of the reconstructed audio information.
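The audio reconstruction error and the stopping condition of step S230 can be sketched as follows. The per-moment error measure is not fixed by the text available here, so the squared Euclidean distance used below is an assumption, as are the function names and the use of two separate thresholds for the score and audio errors.

```python
from typing import Sequence


def audio_reconstruction_error(
    reconstructed: Sequence[Sequence[float]],
    master: Sequence[Sequence[float]],
) -> float:
    """Accumulate a per-step error between reconstructed and master audio vectors.

    Assumption: the per-step error is the squared Euclidean distance.
    """
    if len(reconstructed) != len(master):
        raise ValueError("both sequences must have the same length N")
    total = 0.0
    for v_hat, v in zip(reconstructed, master):
        total += sum((a - b) ** 2 for a, b in zip(v_hat, v))
    return total


def training_converged(score_err: float, audio_err: float,
                       score_threshold: float, audio_threshold: float) -> bool:
    """True once both reconstruction errors are below their preset thresholds."""
    return score_err < score_threshold and audio_err < audio_threshold
```

In a training loop, `training_converged` would be checked after each parameter update, and updating would stop once it returns `True`.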
S300, acquiring the performance audio information of a user, and inputting the performance audio information into a preset performance evaluation model to obtain reconstructed music score information.
Referring to fig. 4, step S300 can be implemented by the following steps:
S310, inputting the performance audio information into the performance evaluation model;
S320, extracting a music score information vector from the performance audio information;
S330, generating reconstructed music score information according to the music score information vector.
It should be understood that, as in step S210 of the above embodiment, the music score information is passed through the music score encoder to extract a music score information vector, and the score information vector is input into the score decoder to obtain the reconstructed score information. The reconstruction process is the same as that by which the reconstructed performance score is generated from the target performance music score by the music score encoder and the music score decoder, and is not described in detail herein.
S400, determining the accuracy level according to the matching degree between the reconstructed music score information and the target performance music score.
Referring to fig. 5, step S400 may be implemented by:
S410, calculating the cross entropy between the reconstructed music score information and the corresponding note sequence in the target performance music score.
It should be understood that the calculation of this cross entropy is consistent with the calculation process of step S230 in the above embodiment, and is not described herein again.
S420, acquiring the matching number of notes in the reconstructed music score information according to the cross entropy and a preset music score matching threshold.
It should be understood that, by comparing the cross entropy with the preset score matching threshold, the number of notes of the reconstructed score information that match the target performance music score within the score matching threshold can be obtained, which reflects the accuracy of the performance score information.
S430, determining the accuracy level according to the matching number.
It will be appreciated that the accuracy level is obtained from the ratio of the matching number to the total number of notes, as shown in the following equation:
$$S_{accuracy} = \frac{M_{matched}}{M_{total}} \times 100$$
wherein $M_{matched}$ is the matching number and $M_{total}$ is the total number of notes.
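Steps S410 to S430 can be sketched in Python as below. The sketch assumes that a note counts as matched when its cross entropy does not exceed the matching threshold, and that the ratio is scaled to the range 0 to 100 so that it is on the same scale as the skill level; both points, and the function name, are assumptions made for illustration.

```python
from typing import Sequence


def accuracy_level(note_cross_entropies: Sequence[float],
                   match_threshold: float) -> float:
    """Percentage of notes whose cross entropy is within the score matching threshold."""
    total = len(note_cross_entropies)
    if total == 0:
        raise ValueError("the target score must contain at least one note")
    # A note is counted as matched when its cross entropy is within the threshold.
    matched = sum(1 for ce in note_cross_entropies if ce <= match_threshold)
    return matched / total * 100
```

For example, with four notes of which three fall within the threshold, the function returns 75.0.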
S500, determining the skill level according to the information vector distance value between the performance audio information and the master performance audio information.
Referring to fig. 6, step S500 may be implemented by the following steps:
S510, extracting the performance skill information vector from the performance audio information.
S520, extracting the master skill information vector from the master performance audio information.
It should be understood that the process of extracting the performance skill information vector and the master skill information vector is consistent with the manner of extracting the score information vector from the performance audio information in step S320 in the above embodiment, and will not be described herein again.
It should be understood that, through the trained audio encoder, the master skill information vector is extracted from the master performance audio information, so that the fidelity of the master skill information vector can be further ensured, and the accuracy rate of the skill level is prevented from being influenced by distortion and errors of the master skill information vector.
It should be understood that the score encoder has an influence on both the reconstructed score information and the reconstructed audio information, and therefore the updating of the parameters thereof comes from the supervisory signals of the reconstructed score information and the reconstructed audio information, and the updating of the parameters thereof also balances the errors of the reconstructed score information and the reconstructed audio information.
S530, calculating the skill distance between the playing skill information vector and the master skill information vector.
It should be understood that the skill distance $\mathrm{Distance}(v_1, v_2)$ is calculated from the performance skill information vector and the master skill information vector, wherein $v_1$ is the performance skill information vector, $v_2$ is the master skill information vector, and the value range of $\mathrm{Distance}(v_1, v_2)$ is $[-1, +1]$.
S540, determining the skill level according to the skill distance.
It should be understood that, after the skill distance $\mathrm{Distance}(v_1, v_2)$ is obtained, since the greater the skill distance, the lower the skill level, the skill level is calculated as follows:
$$S_{perform} = (1 - |\mathrm{Distance}(v_1, v_2)|) \times 100$$
wherein $S_{perform}$ is the skill level.
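The mapping from skill distance to skill level can be written directly in Python. The sketch below takes the distance as an already computed input, since how it is computed from the two vectors is left to the embodiment; the function name and the range check are illustrative additions.

```python
def skill_level(skill_distance: float) -> float:
    """Skill level (1 - |Distance(v1, v2)|) * 100 for a distance in [-1, +1]."""
    if not -1.0 <= skill_distance <= 1.0:
        raise ValueError("skill distance is expected to lie in [-1, +1]")
    return (1.0 - abs(skill_distance)) * 100.0
```

A distance of 0 therefore yields the maximum skill level of 100, and a distance of magnitude 1 yields 0.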
S600, obtaining a comprehensive evaluation result according to the accuracy level and the skill level.
It should be understood that the weighted average of the accuracy level and the skill level is calculated to obtain the comprehensive evaluation result of the performance audio information. Therefore, the influence of the accuracy level and the skill level on the comprehensive evaluation result is fully reflected, the calculation process of the comprehensive evaluation result is more reasonable, and the comprehensive evaluation result obtained by the user has higher reference value.
Illustratively, the comprehensive evaluation result takes the average of the accuracy level and the skill level. Therefore, with the accuracy level and the skill level calculated in steps S400 and S500, the comprehensive evaluation result $S_{total}$ is calculated as follows:
$$S_{total} = (S_{accuracy} + S_{perform}) / 2$$
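The weighted average of step S600 can be sketched as follows. The parameter `accuracy_weight` is a hypothetical name; its default of 0.5 reproduces the plain average used in the illustrative case, and other weights are shown only as a possible generalization.

```python
def overall_score(accuracy: float, skill: float,
                  accuracy_weight: float = 0.5) -> float:
    """Weighted average; the default weight 0.5 gives (S_accuracy + S_perform) / 2."""
    if not 0.0 <= accuracy_weight <= 1.0:
        raise ValueError("accuracy_weight must lie in [0, 1]")
    return accuracy_weight * accuracy + (1.0 - accuracy_weight) * skill
```

Raising `accuracy_weight` above 0.5 would make note accuracy count more than playing skill in the comprehensive evaluation result, and lowering it would do the opposite.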
Referring to fig. 8, fig. 8 is a schematic structural diagram of a performance evaluation device based on feature decomposition according to another embodiment of the present invention. The performance evaluation device based on feature decomposition provided by the embodiment of the invention comprises:
the acquisition module 710 is configured to acquire performance audio information of a user, and input the performance audio information to a preset performance evaluation model to obtain reconstructed music score information;
a matching module 720, configured to determine an accuracy level according to a matching degree between the reconstructed score information and the target performance score;
the calculating module 730 is used for determining the skill level according to the information vector distance value between the performance audio information and the master performance audio information;
and the evaluation module 740 is used for obtaining a comprehensive evaluation result according to the accuracy level and the skill level.
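The data flow between the four modules can be sketched in Python as below. Each module is represented by a plain callable supplied by the caller; the function name `run_evaluation` and the parameter names are hypothetical, and the sketch shows only the order of calls, not how any module is implemented.

```python
from typing import Any, Callable


def run_evaluation(
    performance_audio: Any,
    acquire: Callable[[Any], Any],              # module 710: audio -> reconstructed score
    match: Callable[[Any], float],              # module 720: reconstructed score -> accuracy level
    calculate: Callable[[Any], float],          # module 730: audio -> skill level
    evaluate: Callable[[float, float], float],  # module 740: levels -> comprehensive result
) -> float:
    """Wire the four modules together in the order S300, S400, S500, S600."""
    reconstructed_score = acquire(performance_audio)
    accuracy = match(reconstructed_score)
    skill = calculate(performance_audio)
    return evaluate(accuracy, skill)
```

Passing the modules in as callables keeps the sketch independent of any particular model or framework.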
The performance evaluation device based on feature decomposition provided by the embodiment of the invention further comprises:
the collecting module 750 is configured to collect the target performance score and the master performance audio information to form a training set in which the target performance score and the master performance audio information correspond to each other one by one;
and the training module 760 is used for inputting the training set into the performance evaluation model for training and updating the performance evaluation model.
Referring to fig. 9, in the performance evaluation apparatus based on feature decomposition according to the embodiment of the present invention, the training module 760 further includes:
a score reconstruction module 761 for generating a reconstructed performance score according to the target performance score;
an audio reconstruction module 762 for generating reconstructed audio information according to the target performance score and the master performance audio information;
an updating module 763, configured to update parameters of the performance evaluation model, so that errors between the reconstructed performance score and the target performance score and errors between the reconstructed audio information and the master performance audio information are smaller than a preset error threshold.
In the performance evaluating apparatus based on feature decomposition according to the embodiment of the present invention, the score reconstructing module 761 further includes:
a score encoder for extracting a score information vector from the performance audio information;
and the music score decoder is used for generating reconstructed music score information according to the music score information vector.
In the performance evaluating apparatus based on feature decomposition provided in the embodiment of the present invention, the matching module 720 is further configured to:
calculating the cross entropy of the reconstructed music score information and the corresponding note sequence in the target playing music score;
acquiring the matching number of notes in the reconstructed music score information according to the cross entropy and a preset music score matching threshold;
determining the accuracy level according to the matching number.
In the performance evaluating apparatus based on feature decomposition provided in the embodiment of the present invention, the audio reconstructing module 762 further includes:
an audio encoder for extracting a performance skill information vector from the performance audio information; extracting a master skill information vector from master performance audio information;
the calculation module 730 is further configured to: calculating the skill distance between the playing skill information vector and the master skill information vector;
determining a skill level based on the skill distance.
It should be noted that, because the content of information interaction, execution process, and the like between the modules of the apparatus is based on the same concept as the method embodiment of the present invention, specific functions and technical effects thereof may be referred to specifically in the method embodiment section, and are not described herein again.
Fig. 10 illustrates an electronic device 800 provided by an embodiment of the invention. The electronic device 800 includes, but is not limited to:
a memory 820 for storing programs;
and a control processor 810 for executing the program stored in the memory 820, wherein when the control processor 810 executes the program stored in the memory 820, the control processor 810 is configured to execute the performance evaluation method based on feature decomposition.
Control processor 810 and memory 820 may be connected by a bus or other means.
The memory 820 is a non-transitory computer readable storage medium that can be used to store non-transitory software programs and non-transitory computer executable programs, such as the performance evaluation method based on feature decomposition described in any of the embodiments of the present invention. The control processor 810 implements the above-described performance evaluation method based on feature decomposition by executing the non-transitory software programs and instructions stored in the memory 820.
The memory 820 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data used in performing the above-described performance evaluation method based on feature decomposition. Further, the memory 820 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 820 may optionally include memory located remotely from the control processor 810, which may be connected to the control processor 810 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions necessary to implement the above-described feature-decomposition-based performance evaluation method are stored in the memory 820 and, when executed by the one or more control processors 810, perform the feature-decomposition-based performance evaluation method provided by any of the embodiments of the present invention.
The embodiment of the invention also provides a storage medium, which stores computer executable instructions, and the computer executable instructions are used for executing the performance evaluation method based on the feature decomposition.
In one embodiment, the storage medium stores computer-executable instructions, which are executed by one or more control processors 810, for example, by one control processor 810 in the electronic device 800, so that the one or more control processors 810 can execute the method for evaluating performance based on feature decomposition according to any embodiment of the present invention.
The above described embodiments are merely illustrative, wherein elements illustrated as separate components may or may not be physically separate, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
While the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and such equivalent modifications or substitutions are included in the scope of the invention defined by the claims.