CN113851146A - Performance evaluation method and device based on feature decomposition - Google Patents
- Publication number
- CN113851146A CN202111131753.2A
- Authority
- CN
- China
- Prior art keywords
- performance
- information
- audio information
- music score
- reconstructed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Auxiliary Devices For Music (AREA)
- Electrophonic Musical Instruments (AREA)
Abstract
An embodiment of the invention discloses a performance evaluation method and device based on feature decomposition. The method comprises: acquiring performance audio information of a user, and inputting the performance audio information into a preset performance evaluation model to obtain reconstructed music score information; determining an accuracy level according to the matching degree between the reconstructed music score information and a target performance music score; determining a skill level according to an information vector distance value between the performance audio information and preset master performance audio information; and comprehensively evaluating the performance audio information according to the accuracy level and the skill level to obtain a comprehensive evaluation result. The scheme of the embodiment can reasonably and scientifically evaluate both the accuracy and the performance skill of the input audio signal, visually show the difference between the performance audio information and the master performance audio information, and improve the rationality and accuracy of performance evaluation.
Description
Technical Field
The invention relates to an artificial intelligence technology, in particular to a performance evaluation method and a performance evaluation device based on feature decomposition.
Background
With rising living standards, people's demand for musical literacy is growing, and learning to play a musical instrument is an important way to meet it. However, learning an instrument requires one-to-one instruction from professional teachers and a great deal of practice, so educational resources are strained, learning costs are high, and many people cannot obtain enough professional instruction time. In recent years, a large number of training products have appeared to reduce this dependence on expensive educational resources. They aim to improve their functions and performance through artificial intelligence technology so that, in guiding instrument performance, they can approach the professional level of human teachers. One of the core technologies in these products is the instrument performance evaluation method. Existing methods generally extract the musical features of the user's performance (such as pitch, onset, and offset) using professional equipment or audio signal processing to form a user performance spectrogram, and then evaluate the user's performance by comparison with a standard spectrogram. Such evaluation therefore differs greatly from that of a professional teacher, and users of these products cannot improve their comprehensive performance level as they would under professional guidance.
Disclosure of Invention
The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
The embodiment of the invention provides a performance evaluation method and device based on feature decomposition, which can evaluate the performance skill of an input audio signal and improve the reasonability and accuracy of performance evaluation.
In a first aspect, an embodiment of the present invention provides a performance evaluation method based on feature decomposition, where the method includes:
acquiring performance audio information of a user, and inputting the performance audio information into a preset performance evaluation model to obtain reconstructed music score information;
determining an accuracy level according to the matching degree between the reconstructed music score information and the target performance music score;
determining a skill level according to an information vector distance value between the performance audio information and preset master performance audio information;
and comprehensively evaluating the playing audio information according to the accuracy level and the skill level to obtain a comprehensive evaluation result.
In a second aspect, an embodiment of the present invention provides a performance evaluation apparatus based on feature decomposition, including:
the acquisition module is used for acquiring the performance audio information of a user and inputting the performance audio information into a preset performance evaluation model to obtain reconstructed music score information;
the matching module is used for determining the accuracy level according to the matching degree between the reconstructed music score information and the target performance music score;
the computing module is used for determining a skill level according to an information vector distance value between the performance audio information and preset master performance audio information;
and the evaluation module is used for comprehensively evaluating the playing audio information according to the accuracy level and the skill level to obtain a comprehensive evaluation result.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the performance evaluation method based on feature decomposition provided by the embodiment of the invention is implemented.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the performance evaluation method based on feature decomposition according to the embodiment of the present invention is implemented.
According to the embodiment of the invention, performance audio information of a user is acquired and input into a preset performance evaluation model to obtain reconstructed music score information. The accuracy level is obtained from the reconstructed music score information and the target performance music score, and the skill level is obtained from the performance audio information and the master performance audio information. In this way the accuracy and the performance skill of the performance audio information are evaluated reasonably and scientifically, the difference between the performance audio information and the master performance audio information is shown visually, and the user is helped to improve their comprehensive performance level.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the example serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a schematic flow chart of a performance evaluation method based on feature decomposition according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a performance evaluation method based on feature decomposition according to another embodiment of the present invention;
FIG. 3 is a schematic diagram of an implementation of step S200 in FIG. 2;
FIG. 4 is a schematic diagram of an implementation of step S300 in FIG. 1;
FIG. 5 is a schematic diagram of an implementation of step S400 in FIG. 1;
FIG. 6 is a schematic diagram of an implementation of step S500 in FIG. 1;
FIG. 7 is a schematic diagram of a performance evaluation method based on feature decomposition according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a performance evaluation device based on feature decomposition according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of the structure of the training module of FIG. 8;
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be understood that in the description of the embodiments of the present invention, terms such as "first" and "second" are used only to distinguish technical features, and are not to be understood as indicating or implying relative importance, the number of the indicated technical features, or their order. "At least one" means one or more, and "a plurality" means two or more. "And/or" describes the association between associated objects and covers three relationships; for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone, where A and B may each be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of singular or plural items. For example, "at least one of a, b, and c" may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where a, b, and c may each be single or multiple.
In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
With the increasing living standard, the demand of people for improving the music literacy is increasing day by day, and the application of the artificial intelligence technology to the learning of musical instrument playing becomes an important means. The directions of voice processing technology, deep learning and the like in the artificial intelligence software technology provide a means for processing and learning the performance audio information. The performance evaluation function is improved through the artificial intelligence technology, and the performance audio information of the user is evaluated quickly and scientifically.
The performance evaluation method provided by the embodiment of the invention replaces professional music teachers with computer equipment to provide evaluation for performance trainees, and provides real-time reminding and guidance for the performance trainees, so that the performance evaluation method is widely applied to the fields of musical instrument learning, singing exercise and the like.
In the related art, the audio evaluation method aims at the problem of deductive accuracy of a player, and an accuracy level can be obtained only by comparing audio data with an existing music score. This solution has a number of disadvantages: the accuracy level can be obtained only by comparing the audio data with the existing music score, the rhythm sense, expressive force, musicality, style and the like of a player cannot be comprehensively evaluated, and the comprehensive performance level cannot be rapidly improved by the user.
Based on the above, the embodiments of the present invention provide a performance evaluation method and device based on feature decomposition (eigendecomposition), in which performance audio information of a user is acquired and input into a performance evaluation model to obtain reconstructed score information and reconstructed audio information. The accuracy level is obtained from the reconstructed score information and the target performance score, and the skill level is obtained from the performance audio information and the master performance audio information. In this way the accuracy and the performance skill of the performance audio information are evaluated reasonably and scientifically, the difference between the performance audio information and the master performance audio information is shown visually, and the user is helped to improve their comprehensive performance level.
It should be understood that eigendecomposition is a method of factorizing a matrix into a product of matrices built from its eigenvalues and eigenvectors. In practical applications, autocorrelated feature components are usually decomposed from the acquired data to obtain feature vectors of that data, and feature recognition is then performed on these vectors to extract the information that deserves attention. In addition, by calculating the distance between the feature vector of the target data and the feature vector of the user data, the user data can be evaluated scientifically and reasonably, and its level in each aspect can be assessed accurately.
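As a brief reminder of the standard linear-algebra definition (general background, not a formula taken from this embodiment), a diagonalizable square matrix $A$ with eigenvectors $q_i$ as the columns of $Q$ and eigenvalues $\lambda_i$ on the diagonal of $\Lambda$ factorizes as:

```latex
A = Q \Lambda Q^{-1}, \qquad A q_i = \lambda_i q_i, \qquad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)
```

When $A$ is real and symmetric, $Q$ can be chosen orthogonal, so that $Q^{-1} = Q^{\top}$.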
Referring to fig. 1 and fig. 2, fig. 1 and fig. 2 show a flow of a performance evaluation method based on feature decomposition according to an embodiment of the present invention. As shown in fig. 2, the performance evaluation method based on feature decomposition according to the embodiment of the present invention includes the following steps:
S100, collecting the target performance music score and the master performance audio information to form a training set in which the target performance music score corresponds to the master performance audio information one by one.
It should be understood that, in order to establish the standard of performance evaluation, the target performance score and the corresponding master performance audio information need to be collected, so as to facilitate evaluation of the performance audio information of the user. Meanwhile, the application range of the training set can be expanded by collecting target performance music scores and master performance audio information, wherein the target performance music scores include but are not limited to music score information of guitar, piano, violin, drum, vertical bamboo and other instruments. And the master performance audio information is master performance audio corresponding to the target performance music score, and is stored in a unified format to form a training set for performance evaluation.
S200, inputting the training set into the performance evaluation model for training, and updating the performance evaluation model.
It should be understood that the performance evaluation model is a self-supervised model based on an autoencoder (AE). During training, the parameters of the performance evaluation model are obtained by minimizing the reconstruction error, so that the target performance score and the master performance audio information reconstructed by the model are as close as possible to the originals in the training set. Therefore, the more target performance scores and master performance audio information the training set contains, the more fully the performance evaluation model is trained and the greater the reference value of the resulting model parameters.
It should be understood that the performance evaluation model is a deep learning network model in which four component units, each represented by a neural network (a score encoder, an audio encoder, a score decoder, and a global decoder), are trained end to end by minimizing the score information error and the performance audio information error before and after reconstruction.
Referring to fig. 3, step S200 may be implemented by the following steps:
S210, generating a reconstructed performance score according to the target performance score.
It should be understood that the score encoder has multiple network layers, and its output for the target performance score is generally set to a vector of length 256, i.e., the score information vector includes 256 target audio sequences. In addition, the score decoder has a recurrent layer and outputs a parameter value for each target audio sequence. The score information vector is therefore input into the score decoder, which converts it into a reconstructed performance score.
S220, generating reconstructed audio information according to the target performance music score and the master performance audio information.
It should be understood that the audio encoder has multiple network layers, and its output for the master performance audio information is generally set to a vector of length 256, i.e., the audio information vector includes 256 target audio sequences. In addition, the score decoder has a recurrent layer and outputs a parameter value for each target audio sequence. The score information vector and the audio information vector are therefore combined to obtain a global information vector, which is input into the global decoder and converted into reconstructed audio information.
It should be appreciated that by combining the score information vector and the performance skill information vector, the global information vector can be made to characterize both the accuracy and skill of the performance audio information.
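A minimal sketch of the combination step, assuming that "combining" means concatenation (the text does not name the operation). The names `combine_vectors`, `score_vec`, and `skill_vec` are hypothetical and used only for illustration:

```python
from typing import List

VECTOR_DIM = 256  # length stated in the text for each encoder output


def combine_vectors(score_vec: List[float], skill_vec: List[float]) -> List[float]:
    """Build a global information vector from the two encoder outputs.

    Assumption: the combination is a plain concatenation, score part first.
    """
    if len(score_vec) != VECTOR_DIM or len(skill_vec) != VECTOR_DIM:
        raise ValueError(f"both vectors must have length {VECTOR_DIM}")
    return list(score_vec) + list(skill_vec)


# Placeholder vectors standing in for real encoder outputs.
global_vec = combine_vectors([0.0] * VECTOR_DIM, [1.0] * VECTOR_DIM)
```

Under this assumption the global decoder would receive a 512-dimensional input whose first half carries score content and whose second half carries performance skill.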
Referring to fig. 7, fig. 7 is a schematic diagram of a performance evaluation method based on feature decomposition according to an embodiment of the present invention.
It should be understood that the global information vector is converted into reconstructed audio information by decoding it. During training, the score information vector thus plays two roles, contributing to both the reconstruction of the score information and the reconstruction of the audio information, whereas the performance skill information vector participates only in the reconstruction of the audio information. With the error value of the reconstructed audio information kept below the preset audio error threshold, the performance skill information vector contains only performance-related information and can therefore represent the performance skill in the performance audio information.
S230, updating the parameters of the performance evaluation model so that the errors between the reconstructed performance score and the target performance score, and between the reconstructed audio information and the master performance audio information, are smaller than a preset error threshold.
It should be understood that the error of the reconstructed performance score is composed of the cross entropy of each generated audio sequence, where $L_{NS}$ is the error value of the reconstructed score information, $N$ represents the length of the generated score sequence, $t$ represents the step index in the sequence, $\hat{p}_t$ is the distribution over all notes predicted by the score decoder at step $t$, and $p_t$ is the true note distribution.
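One common form of such a sequence cross entropy, with $\hat{p}_t$ denoting the distribution predicted by the score decoder and $p_t$ the true distribution, can be sketched as follows. The average over the $N$ steps, the note vocabulary $K$, and the index $k$ are assumptions made for illustration:

```latex
L_{NS} = -\frac{1}{N} \sum_{t=1}^{N} \sum_{k \in K} p_t(k) \, \log \hat{p}_t(k)
```

When $p_t$ is one-hot, the inner sum reduces to the negative log-probability that the decoder assigns to the true note at step $t$.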
It should be understood that by updating the parameters of the score encoder and the score decoder so that the error value $L_{NS}$ between the reconstructed performance score and the target performance score is less than the preset score error threshold, the distortion of the performance evaluation model can be effectively reduced and the reference value of the reconstructed performance score is ensured.
It should be understood that the error of the reconstructed audio information is composed of the errors between the reconstructed audio information generated at each moment and the master performance audio information, where $L_{MS}$ is the error value of the reconstructed audio information, $N$ represents the length of the generated score sequence, $t$ represents the step index in the sequence, $\hat{v}_t$ is the audio vector data reconstructed at time $t$, and $v_t$ is the audio vector data of the master performance audio information at time $t$.
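One plausible form of this error, with $\hat{v}_t$ denoting the reconstructed audio vector and $v_t$ the master audio vector at time $t$, is a mean squared Euclidean error. Both the choice of norm and the normalization by $N$ are assumptions:

```latex
L_{MS} = \frac{1}{N} \sum_{t=1}^{N} \lVert \hat{v}_t - v_t \rVert_2^2
```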
It should be appreciated that by updating the parameters of the audio encoder and the global decoder so that the error value $L_{MS}$ between the reconstructed audio information and the master performance audio information is less than the preset audio error threshold, the fidelity of the performance evaluation model is ensured and the reference value of the reconstructed audio information is improved.
S300, acquiring the performance audio information of a user, and inputting the performance audio information into a preset performance evaluation model to obtain reconstructed music score information.
Referring to FIG. 4, step S300 can be implemented by the following steps:
S310, inputting the performance audio information into the performance evaluation model;
S320, extracting a music score information vector from the performance audio information;
S330, generating reconstructed music score information according to the music score information vector.
It should be understood that, as in step S210 of the above embodiment, the music score information is passed through the score encoder to extract a score information vector, and the score information vector is input into the score decoder to obtain the reconstructed score information. This reconstruction process is the same as the one by which the score encoder and score decoder generate the reconstructed performance score from the target performance score, and is not described in detail here.
S400, determining the accuracy level according to the matching degree of the reconstructed music score information and the target performance music score.
Referring to FIG. 5, step S400 may be implemented by the following steps:
S410, calculating the cross entropy of the reconstructed music score information and the corresponding note sequence in the target performance music score.
It should be understood that the calculation of the cross entropy between the reconstructed score information and the corresponding note sequence in the target performance score is consistent with the calculation process of step S230 in the above embodiment, and is not described herein again.
S420, acquiring the matching number of notes in the reconstructed music score information according to the cross entropy and a preset music score matching threshold.
It should be understood that by comparing the cross entropy with the preset score matching threshold, the number of notes of the reconstructed score information and the target performance score within the score matching threshold can be obtained, and the accuracy of the performance score information can be reflected.
S430, determining the accuracy level according to the matching number.
It will be appreciated that the accuracy level is obtained from the ratio of the matching number $M_{matched}$ to the total number of notes $M_{total}$.
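Steps S410 to S430 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the true note distribution is taken to be one-hot, a note counts as matched when its cross entropy is at or below the threshold, and the ratio is scaled to 0 to 100 so that it is comparable with the skill level. The names `note_cross_entropy`, `accuracy_level`, and `match_threshold` are hypothetical:

```python
import math
from typing import List, Sequence


def note_cross_entropy(true_index: int, predicted: Sequence[float],
                       eps: float = 1e-12) -> float:
    """Cross entropy for one note, assuming a one-hot true distribution."""
    return -math.log(max(predicted[true_index], eps))


def accuracy_level(true_notes: List[int],
                   predicted_dists: List[Sequence[float]],
                   match_threshold: float) -> float:
    """Percentage of notes whose cross entropy is within the threshold."""
    if not true_notes or len(true_notes) != len(predicted_dists):
        raise ValueError("need one predicted distribution per true note")
    matched = sum(
        1
        for idx, dist in zip(true_notes, predicted_dists)
        if note_cross_entropy(idx, dist) <= match_threshold
    )
    return 100.0 * matched / len(true_notes)


# Three notes over a three-note vocabulary; the third prediction is wrong.
preds = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]
s_accuracy = accuracy_level([0, 1, 2], preds, match_threshold=0.5)
```

In this toy input two of the three notes fall within the threshold, so the accuracy level is two thirds of the full scale.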
S500, determining the skill level according to the information vector distance value between the performance audio information and the master performance audio information.
Referring to FIG. 6, step S500 may be implemented by the following steps:
S510, extracting the performance skill information vector from the performance audio information.
S520, extracting the master skill information vector from the master performance audio information.
It should be understood that the process of extracting the performance skill information vector and the master skill information vector is consistent with the manner of extracting the score information vector from the performance audio information in step S310 in the above embodiment, and will not be described herein again.
It should be understood that, through the trained audio encoder, the master skill information vector is extracted from the master performance audio information, so that the fidelity of the master skill information vector can be further ensured, and the accuracy rate of the skill level is prevented from being influenced by distortion and errors of the master skill information vector.
It should be understood that the score encoder has an influence on both the reconstructed score information and the reconstructed audio information, and therefore the updating of the parameters thereof comes from the supervisory signals of the reconstructed score information and the reconstructed audio information, and the updating of the parameters thereof also balances the errors of the reconstructed score information and the reconstructed audio information.
S530, calculating the skill distance between the playing skill information vector and the master skill information vector.
It should be understood that the skill distance $\mathrm{Distance}(v_1, v_2)$ is calculated from the performance skill information vector $v_1$ and the master skill information vector $v_2$, and has a value range of $[-1, +1]$.
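A measure with exactly this value range is the cosine of the angle between the two vectors. Assuming that this is the intended measure (the text does not say so), it would read:

```latex
\mathrm{Distance}(v_1, v_2) = \frac{v_1 \cdot v_2}{\lVert v_1 \rVert \, \lVert v_2 \rVert}
```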
And S540, determining the skill level according to the skill distance.
It should be understood that, after the skill distance $\mathrm{Distance}(v_1, v_2)$ is obtained, since the greater the skill distance, the lower the skill level, the calculation formula for obtaining the skill level is as follows:
$$S_{perform} = \left(1 - \left|\mathrm{Distance}(v_1, v_2)\right|\right) \times 100$$
wherein $S_{perform}$ is the skill level.
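The mapping from skill distance to skill level can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `skill_level` and the range check are assumptions added here, and the distance is taken as an already computed input in [-1, +1].

```python
def skill_level(distance: float) -> float:
    """Map a skill distance in [-1, +1] to a skill level in [0, 100].

    Implements S_perform = (1 - |Distance(v1, v2)|) * 100 from the text.
    The range check is an added assumption, not part of the source.
    """
    if not -1.0 <= distance <= 1.0:
        raise ValueError("distance must lie in [-1, +1]")
    return (1.0 - abs(distance)) * 100.0
```

A distance of 0 yields the highest level, 100, and a distance of magnitude 1 yields the lowest level, 0.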
S600, obtaining a comprehensive evaluation result according to the accuracy level and the skill level.
It should be understood that the weighted average of the accuracy level and the skill level is calculated to obtain the comprehensive evaluation result of the performance audio information. In this way, the influence of both the accuracy level and the skill level on the comprehensive evaluation result is fully reflected, the calculation of the comprehensive evaluation result is more reasonable, and the result obtained by the user has a higher reference value.
Illustratively, the comprehensive evaluation result takes the average of the accuracy level and the skill level. Therefore, with the accuracy level and the skill level calculated in steps S400 and S500, the calculation formula of the comprehensive evaluation result $S_{total}$ is as follows:
$$S_{total} = \frac{S_{accuracy} + S_{perform}}{2}$$
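The weighted average of step S600 can be sketched as below. The function name `comprehensive_score` and the weight parameters `w_accuracy` and `w_perform` are hypothetical; the source only states that a weighted average is used and gives the equal-weight case as an example, which is the default here.

```python
def comprehensive_score(
    s_accuracy: float,
    s_perform: float,
    w_accuracy: float = 0.5,
    w_perform: float = 0.5,
) -> float:
    """Weighted average of the accuracy level and the skill level.

    With the default equal weights this reduces to
    (s_accuracy + s_perform) / 2, the illustrative case in the text.
    The weights themselves are assumptions for illustration.
    """
    if w_accuracy < 0 or w_perform < 0:
        raise ValueError("weights must be non-negative")
    total_weight = w_accuracy + w_perform
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")
    return (w_accuracy * s_accuracy + w_perform * s_perform) / total_weight
```

Normalizing by the sum of the weights keeps the result on the same 0 to 100 scale as its inputs, whatever weights are chosen.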
Referring to fig. 8, fig. 8 is a schematic structural diagram of a performance evaluation device based on feature decomposition according to another embodiment of the present invention. The performance evaluation device based on feature decomposition provided by the embodiment of the invention comprises:
the acquisition module 710 is configured to acquire performance audio information of a user, and input the performance audio information to a preset performance evaluation model to obtain reconstructed music score information;
a matching module 720, configured to determine an accuracy level according to a matching degree between the reconstructed score information and the target performance score;
the calculating module 730 is used for determining the skill level according to the information vector distance value between the performance audio information and the master performance audio information;
and the evaluation module 740 is used for obtaining a comprehensive evaluation result according to the accuracy level and the skill level.
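The way the four modules hand data to one another can be sketched as a small pipeline. This is only an illustration under stated assumptions: the class `PerformanceEvaluator`, its field names, and the representation of audio and scores as plain sequences are hypothetical, and each module is passed in as a callable so that no particular model is implied.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class PerformanceEvaluator:
    # Module 710: performance audio -> reconstructed score information.
    acquire: Callable[[Sequence[float]], Sequence[int]]
    # Module 720: (reconstructed score, target score) -> accuracy level.
    match: Callable[[Sequence[int], Sequence[int]], float]
    # Module 730: (performance audio, master audio) -> skill level.
    skill: Callable[[Sequence[float], Sequence[float]], float]
    # Module 740: (accuracy level, skill level) -> comprehensive result.
    combine: Callable[[float, float], float]

    def evaluate(
        self,
        audio: Sequence[float],
        target_score: Sequence[int],
        master_audio: Sequence[float],
    ) -> float:
        reconstructed = self.acquire(audio)
        s_accuracy = self.match(reconstructed, target_score)
        s_perform = self.skill(audio, master_audio)
        return self.combine(s_accuracy, s_perform)
```

Keeping the modules as injected callables mirrors the text's separation of concerns: the accuracy branch and the skill branch are computed independently and only meet in the evaluation module.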
The performance evaluation device based on feature decomposition provided by the embodiment of the invention further comprises:
the collecting module 750 is configured to collect the target performance score and the master performance audio information to form a training set in which the target performance score and the master performance audio information correspond to each other one by one;
and the training module 760 is used for inputting the training set into the performance evaluation model for training and updating the performance evaluation model.
Referring to fig. 9, in the performance evaluation apparatus based on feature decomposition according to the embodiment of the present invention, the training module 760 further includes:
a score reconstruction module 761 for generating a reconstructed performance score according to the target performance score;
an audio reconstruction module 762 for generating reconstructed audio information according to the target performance score and the master performance audio information;
an updating module 763, configured to update parameters of the performance evaluation model, so that errors between the reconstructed performance score and the target performance score and errors between the reconstructed audio information and the master performance audio information are smaller than a preset error threshold.
In the performance evaluating apparatus based on feature decomposition according to the embodiment of the present invention, the score reconstructing module 761 further includes:
a score encoder for extracting a score information vector from the performance audio information;
and the music score decoder is used for generating reconstructed music score information according to the music score information vector.
In the performance evaluating apparatus based on feature decomposition provided in the embodiment of the present invention, the matching module 720 is further configured to:
calculating the cross entropy of the reconstructed music score information and the corresponding note sequence in the target playing music score;
acquiring the matching number of notes in the reconstructed music score information according to the cross entropy and a preset music score matching threshold;
determining the accuracy level according to the number of matches.
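The three matching steps above can be sketched as follows. Several details are assumptions made for illustration and are not stated in this section: the reconstructed score information is modeled as one probability distribution over note classes per position, a note counts as matched when its cross entropy against the target note is below the threshold, and the accuracy level is the percentage of matched notes. The names `note_cross_entropy` and `accuracy_level` are hypothetical.

```python
import math
from typing import Sequence


def note_cross_entropy(
    probs: Sequence[float], target_index: int, eps: float = 1e-12
) -> float:
    """Cross entropy between a predicted distribution and a one-hot target."""
    return -math.log(max(probs[target_index], eps))


def accuracy_level(
    predicted: Sequence[Sequence[float]],
    target_notes: Sequence[int],
    threshold: float,
) -> float:
    """Percentage of notes whose cross entropy is below the threshold.

    Both the "below threshold means matched" rule and the percentage
    scale are assumptions for this sketch.
    """
    if len(predicted) != len(target_notes):
        raise ValueError("predicted and target sequences differ in length")
    if not target_notes:
        raise ValueError("target note sequence is empty")
    matches = sum(
        1
        for probs, target in zip(predicted, target_notes)
        if note_cross_entropy(probs, target) < threshold
    )
    return 100.0 * matches / len(target_notes)
```

For example, with four notes of which three fall below the threshold, the sketch returns an accuracy level of 75.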
In the performance evaluating apparatus based on feature decomposition provided in the embodiment of the present invention, the audio reconstructing module 762 further includes:
an audio encoder for extracting a performance skill information vector from the performance audio information, and for extracting a master skill information vector from the master performance audio information;
the calculation module 730 is further configured to: calculating the skill distance between the performance skill information vector and the master skill information vector;
determining a skill level based on the skill distance.
It should be noted that, because the content of information interaction, execution process, and the like between the modules of the apparatus is based on the same concept as the method embodiment of the present invention, specific functions and technical effects thereof may be referred to specifically in the method embodiment section, and are not described herein again.
Fig. 10 illustrates an electronic device 800 provided by an embodiment of the invention. The electronic device 800 includes, but is not limited to:
a memory 820 for storing programs;
and a control processor 810 for executing the program stored in the memory 820, wherein when the control processor 810 executes the program stored in the memory 820, the control processor 810 is configured to execute the performance evaluation method based on feature decomposition.
The memory 820 is a non-transitory computer readable storage medium that can be used to store non-transitory software programs and non-transitory computer executable programs, such as the performance evaluation method based on feature decomposition described in any of the embodiments of the present invention. The control processor 810 implements the above-described performance evaluation method based on feature decomposition by executing the non-transitory software programs and instructions stored in the memory 820.
The memory 820 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function; the data storage area may store data used in executing the above-described performance evaluation method based on feature decomposition. Further, the memory 820 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 820 may optionally include memory located remotely from the control processor 810, which may be connected to the control processor 810 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions necessary to implement the above-described feature-decomposition-based performance evaluation method are stored in the memory 820 and, when executed by the one or more control processors 810, perform the feature-decomposition-based performance evaluation method provided by any of the embodiments of the present invention.
The embodiment of the invention also provides a storage medium, which stores computer executable instructions, and the computer executable instructions are used for executing the performance evaluation method based on the feature decomposition.
In one embodiment, the storage medium stores computer-executable instructions, which are executed by one or more control processors 810, for example, by one control processor 810 in the electronic device 800, so that the one or more control processors 810 can execute the method for evaluating performance based on feature decomposition according to any embodiment of the present invention.
The above described embodiments are merely illustrative, wherein elements illustrated as separate components may or may not be physically separate, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
While the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and such equivalent modifications or substitutions are included in the scope of the invention defined by the claims.
Claims (10)
1. A performance evaluation method based on feature decomposition is characterized by comprising the following steps:
acquiring performance audio information of a user, and inputting the performance audio information into a preset performance evaluation model to obtain reconstructed music score information;
determining an accuracy level according to the matching degree between the reconstructed music score information and the target performance music score;
calculating an information vector distance value between the performance audio information and preset master performance audio information, and determining a skill level according to the information vector distance value;
and comprehensively evaluating the performance audio information according to the accuracy level and the skill level to obtain a comprehensive evaluation result.
2. The method of claim 1, wherein, before the acquiring of the performance audio information of the user, the method further comprises:
collecting a target performance music score and the master performance audio information to form a training set in which the target performance music score and the master performance audio information are in one-to-one correspondence;
and inputting the training set into a performance evaluation model for training, and updating the performance evaluation model.
3. The method according to claim 2, wherein the inputting the training set into a performance evaluation model for training and updating the performance evaluation model comprises:
generating a reconstructed performance music score according to the target performance music score;
generating reconstructed audio information according to the target performance music score and the master performance audio information;
and updating the parameters of the performance evaluation model, so that the errors between the reconstructed performance music score and the target performance music score and between the reconstructed audio information and the master performance audio information are smaller than a preset error threshold.
4. The method of claim 3, wherein the acquiring of the user performance audio information and the inputting of the user performance audio information into a preset performance evaluation model to obtain reconstructed score information comprises:
inputting the performance audio information into the performance evaluation model;
extracting a score information vector from the performance audio information;
and generating the reconstructed music score information according to the music score information vector.
5. The method of claim 4, wherein determining an accuracy level based on a degree of match between the reconstructed score information and a target performance score comprises:
calculating the cross entropy of the reconstructed music score information and the corresponding note sequence in the target performance music score;
acquiring the matching number of notes in the reconstructed music score information according to the cross entropy and a preset music score matching threshold;
determining the accuracy level based on the number of matches.
6. The method of claim 4, wherein the calculating of an information vector distance value between the performance audio information and preset master performance audio information, and the determining of the skill level according to the information vector distance value, comprise:
extracting a performance skill information vector from the performance audio information;
extracting a master skill information vector from the master performance audio information;
calculating a skill distance between the performance skill information vector and the master skill information vector;
determining the skill level based on the skill distance.
7. The method according to claim 1, wherein the comprehensive evaluation of the performance audio information according to the accuracy level and the skill level to obtain a comprehensive evaluation result comprises:
and calculating the weighted average value of the accuracy level and the skill level to obtain the comprehensive evaluation result of the performance audio information.
8. A performance evaluation device based on feature decomposition is characterized by comprising:
the acquisition module is used for acquiring the performance audio information of a user and inputting the performance audio information into a preset performance evaluation model to obtain reconstructed music score information;
the matching module is used for determining the accuracy level according to the matching degree between the reconstructed music score information and the target performance music score;
the calculation module is used for calculating an information vector distance value between the performance audio information and the master performance audio information and determining the skill level according to the information vector distance value;
and the evaluation module is used for comprehensively evaluating the performance audio information according to the accuracy level and the skill level to obtain a comprehensive evaluation result.
9. An electronic device, comprising: a memory, a control processor, and a computer program stored on the memory and executable on the control processor, wherein the control processor, when executing the computer program, implements the performance evaluation method based on feature decomposition according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, carries out the performance evaluation method based on feature decomposition according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111131753.2A CN113851146A (en) | 2021-09-26 | 2021-09-26 | Performance evaluation method and device based on feature decomposition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113851146A (en) | 2021-12-28 |
Family
ID=78980221
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111131753.2A Pending CN113851146A (en) | 2021-09-26 | 2021-09-26 | Performance evaluation method and device based on feature decomposition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113851146A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107767847A (en) * | 2017-09-29 | 2018-03-06 | 小叶子(北京)科技有限公司 | A kind of intelligent piano performance assessment method and system |
CN108492817A (en) * | 2018-02-11 | 2018-09-04 | 北京光年无限科技有限公司 | A kind of song data processing method and performance interactive system based on virtual idol |
KR102107588B1 (en) * | 2018-10-31 | 2020-05-07 | 미디어스코프 주식회사 | Method for evaluating about singing and apparatus for executing the method |
CN111554256A (en) * | 2020-04-21 | 2020-08-18 | 华南理工大学 | Piano playing ability evaluation system based on strong and weak standards |
CN111554255A (en) * | 2020-04-21 | 2020-08-18 | 华南理工大学 | MIDI playing style automatic conversion system based on recurrent neural network |
CN111898753A (en) * | 2020-08-05 | 2020-11-06 | 字节跳动有限公司 | Music transcription model training method, music transcription method and corresponding device |
CN112669796A (en) * | 2020-12-29 | 2021-04-16 | 西交利物浦大学 | Method and device for converting music into music book based on artificial intelligence |
CN113780811A (en) * | 2021-09-10 | 2021-12-10 | 平安科技(深圳)有限公司 | Musical instrument performance evaluation method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40062551; Country of ref document: HK |