CN111681674B - Musical instrument type identification method and system based on naive Bayesian model - Google Patents

Musical instrument type identification method and system based on naive Bayesian model

Info

Publication number
CN111681674B
Authority
CN
China
Prior art keywords
musical
music
musical instrument
instrument
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010483915.8A
Other languages
Chinese (zh)
Other versions
CN111681674A (en)
Inventor
丁戌倩
梁循
武文娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Renmin University of China
Original Assignee
Renmin University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Renmin University of China filed Critical Renmin University of China
Priority to CN202010483915.8A priority Critical patent/CN111681674B/en
Publication of CN111681674A publication Critical patent/CN111681674A/en
Application granted granted Critical
Publication of CN111681674B publication Critical patent/CN111681674B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/041 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal based on mfcc [mel-frequency spectral coefficients]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/056 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction or identification of individual instrumental parts, e.g. melody, chords, bass; Identification or separation of instrumental parts by their characteristic voices or timbres

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The invention relates to a musical instrument type identification method and system based on a naive Bayesian model, comprising the following steps: S1, dividing the music to be identified into a plurality of audio frames; S2, extracting the time domain information, frequency domain information and mel-frequency cepstral coefficients of each audio frame to form the feature vector corresponding to that frame; S3, inputting the audio feature vectors corresponding to a number of musical instruments, together with the feature vectors corresponding to all the audio frames, into a naive Bayesian model, and identifying the instruments according to the probability that each instrument appears in the piece of music. By extracting musical features from the data, the method enables artificial intelligence to identify instrument types, timbres and playing techniques, and helps to distinguish finely between homogeneous and heterogeneous instruments, in particular to separate and accurately identify instruments of the same family whose fine-grained sounds, timbres and playing techniques are similar or overlapping.

Description

Musical instrument type identification method and system based on naive Bayesian model
Technical Field
The invention relates to a musical instrument type recognition method and system based on a naive Bayesian model, and belongs to the technical field of musical instrument recognition.
Background
In recent years, with the rapid development of the internet, music applications increasingly affect people's daily lives and digital music has grown explosively in the entertainment field. Music is no longer scarce in daily life, music communities are becoming widespread and the P2P transmission mode is increasingly popular, so helping people find the music they need is an important direction for the future development of music identification technology. Music recognition has progressed from recognition based on textual attributes such as song titles and singer names, which became widespread in the 1990s, to recognition based on musical features such as melody and rhythm; music recognition based on musical features became a very widely applied technology as soon as it appeared and has promoted the development of music recognition technology. Patent applications related to music recognition began between 1980 and 1996, although their total number was small; from 1998 to 2008 the number of music recognition patents began to grow, a stage of rapid development that included emotion recognition and music style recognition based on textual attributes and on melody and rhythm attributes.
Currently, there is no system for identifying the musical instruments used in a musical composition. For a music library of relatively large size, identifying the instruments used in a composition is far more difficult than recognizing textual attributes or melody and rhythm. Although waveforms alone distinguish some instruments to a certain degree, features such as pitch, signal peaks and loudness are far from sufficient to identify the instruments in a composition, so more precise and more characteristic audio features must be analysed to distinguish the sounds played by different instruments. Timbre is the attribute of sound quality, distinct from loudness and intensity, by which the ear tells different instruments apart even when they produce the same sound. For example, the human auditory system can distinguish a violin from an oboe at 4410 Hz because the composition of their high-frequency overtones differs, as do the amplitudes of the high-frequency components; this difference is timbre. The key to distinguishing the instruments in a musical composition is therefore to distinguish their timbres, but how to characterise a composition by feature values is the problem to be solved in this field.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a musical instrument type identification method and system based on a naive Bayesian model, which enable artificial intelligence to identify instrument types, timbres and playing techniques by extracting musical features from the data, and help to distinguish finely between homogeneous and heterogeneous instruments, in particular to separate and accurately identify instruments of the same family whose fine-grained sounds, timbres and playing techniques are similar or overlapping.
In order to achieve the above object, the present invention provides a musical instrument type recognition method based on a naive bayes model, comprising the steps of: s1, dividing music to be identified into a plurality of audio frames; s2, extracting time domain information, frequency domain information and Mel frequency cepstrum coefficients in the audio frame to form a feature vector corresponding to the audio frame; s3, inputting the existing audio feature vectors corresponding to a plurality of musical instruments and feature vectors corresponding to all audio frames into a naive Bayesian model, and identifying the musical instruments according to the probability that the musical instruments appear in the music.
Further, if the probability that the instrument appears in the musical composition exceeds the threshold value, it is judged that the instrument appears in the musical composition, and if the probability that the instrument appears in the musical composition does not exceed the threshold value, it is judged that the instrument does not appear in the musical composition.
Further, the musical instruments used in the musical composition include a primary instrument and secondary instruments, and the probability of each instrument appearing in the composition is obtained from the naive Bayesian model in order to distinguish the primary instrument from the secondary instruments.
Further, the musical instrument with the highest probability of appearing in the musical composition is the primary musical instrument, and the other musical instruments appearing in the musical composition are the secondary musical instruments.
Further, the output formula of the naive bayes model is:
wherein X_i represents a certain frame of the piece of music X, which has z frames in total; y_j represents a certain musical instrument, of which there are n in total.
Further, the specific operation process of S3 is: S3.1, inputting the audio feature vectors corresponding to a plurality of musical instruments and the feature vectors corresponding to the audio frames into a pre-trained naive Bayesian model; S3.2, calculating P(y_1|X_i), P(y_2|X_i), …, P(y_n|X_i) using the output formula of the naive Bayesian model; S3.3, obtaining the probability that the musical instrument y_j appears in the piece of music X by the formula.
Further, the pre-training process of the pre-trained naive Bayesian model is as follows: music whose instrument types are known is input into the original naive Bayesian model, the probability that a certain instrument appears in the music is obtained from the output formula of the naive Bayesian model, and it is judged whether the probability exceeds the threshold value; the judgement result is compared with the instruments that actually play the music, and if the results are the same, the naive Bayesian model at that point is taken as the final output model; if the results are different, the output formula of the naive Bayesian model is adjusted until the results are the same.
Further, the frequency domain information is obtained by performing a Fourier transform on each audio frame; the inverse frequency domain information is obtained by rotating the frequency domain map composed of the frequency domain information, with the amplitude of the frequency domain map represented by a grey-scale map; and the time domain information is obtained by stacking the frequency domain maps along the time dimension.
Further, a Hamming window is applied to the audio frames to prevent frequency leakage.
The invention also discloses a musical instrument type recognition system based on the naive Bayesian model, which comprises: the preprocessing module is used for dividing the music to be identified into a plurality of audio frames; the feature extraction module is used for extracting time domain information, frequency domain information and mel frequency cepstrum coefficient in the audio frame to form a feature vector corresponding to the audio frame; and the identification module is used for inputting the audio feature vectors corresponding to the plurality of musical instruments and the feature vectors corresponding to all the audio frames into a naive Bayesian model, and identifying the musical instruments according to the probability that the musical instruments appear in the music.
Due to the adoption of the technical scheme, the invention has the following advantages:
1. The invention enables artificial intelligence to identify instrument types, timbres and playing techniques by extracting musical features from the data, and helps to distinguish finely between homogeneous and heterogeneous instruments, in particular to separate and accurately identify instruments of the same family whose fine-grained sounds, timbres and playing techniques are similar or overlapping.
2. The method of extracting musical features and forming feature vectors in the invention reduces the time consumed in identifying the instruments in a piece of music without affecting the precision and accuracy of instrument identification.
3. The method can be widely applied to a plurality of fields such as music appreciation, music classification, music recommendation and the like, and the musical instrument used in the music greatly influences the style of the music, so the method plays a certain role in music information retrieval.
4. The invention trains on music with a naive Bayesian classification model and represents the instruments that a piece may correspond to as probabilities, so that artificial intelligence model learning can be applied to key elements of music and to the recognition of common musical structures and rules, providing a reference for better applications of artificial intelligence in the music field, such as music restoration and music composition.
Drawings
FIG. 1 is a flow chart of a naive Bayesian model-based instrument class identification method in an embodiment of the present invention;
FIG. 2 is a flow chart of a preprocessing process for musical compositions in an embodiment of the present invention;
FIG. 3 is a flow chart of an audio frame timbre feature extraction process in an embodiment of the invention;
FIG. 4 is a flowchart of an audio frame mel-frequency cepstrum coefficient feature extraction process according to an embodiment of the present invention;
FIG. 5 is a flow chart of a naive Bayesian classification model identification procedure in an embodiment of the present invention;
fig. 6 is a flow chart of a naive bayesian classification model training process in accordance with an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to specific embodiments so that those skilled in the art can better understand its technical approach. It should be understood, however, that the detailed description is presented only to provide a better understanding of the invention and should not be taken to limit it. In the description of the present invention, it is to be understood that the terminology used is for the purpose of description only and is not to be interpreted as indicating or implying relative importance.
The invention uses a naive Bayesian model to identify the instruments playing a piece of music. These instruments comprise a primary instrument that plays the leading role and several secondary instruments that accompany it. For example, if a piece carries its main melody on the piano, the piano is the primary instrument, while instruments such as the violin and flute that provide the accompaniment are the secondary instruments. The technical scheme of the invention can also be used to rank the relative importance of the secondary instruments.
Example 1
A naive bayes model-based instrument type recognition method, as shown in fig. 1, includes the following steps:
s1, dividing the music to be identified into a plurality of audio frames, and determining the number of the audio frames. As shown in fig. 2, each piece of music in the original data set is divided into a plurality of audio frames. The hamming window is then applied to the audio frames to prevent frequency leakage, and serves to smooth out the frame-to-frame and eliminate the gibbs effect. In order to save both time domain information and frequency domain information, short-time fourier transform is required to be performed on the audio frames after framing and windowing to obtain a spectrogram.
The process of generating the spectrogram through short-time Fourier transform comprises the following steps:
framing long signals of music, and adding windows; performing Fourier transform on each frame of audio frame; the audio frame at this time is a short-time signal, i.e., a short-time fourier transform. Rotating the spectrogram; the spectrogram amplitude is represented by a gray scale; and stacking the frequency domain graphs obtained by Fourier transformation according to the time dimension to finally obtain the spectrogram. The frequency domain information is obtained by carrying out Fourier transform on each audio frame, the frequency domain information is obtained by rotating a frequency domain diagram formed by the frequency domain information, and the amplitude of the frequency domain diagram is represented by a gray level diagram; the time domain information is obtained by stacking frequency domain maps in a time dimension.
S2, extracting time domain information, frequency domain information and Mel frequency cepstrum coefficients in the audio frame to form a feature vector corresponding to the audio frame.
As shown in fig. 3, based on the MPEG-7 (Multimedia Content Description Interface) standard, instrument timbre is characterised at three levels: the time domain (temporal shape of the tone), the frequency domain (spectrum of the tone waveform) and the inverse frequency domain (cepstrum of the tone waveform); the timbre feature elements at these three levels are extracted and stored for each frame of each piece of music in the original data set.
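As a rough illustration of per-frame descriptors at these three levels, the sketch below computes a few simple time-domain, frequency-domain and cepstral quantities; the specific descriptors (RMS energy, zero-crossing rate, spectral centroid, real cepstrum) are assumptions for illustration and are not the MPEG-7 descriptor set itself:

```python
import numpy as np

def timbre_features(frame, sr=16000):
    # time domain: root-mean-square energy and zero-crossing rate
    rms = np.sqrt(np.mean(frame ** 2))
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
    # frequency domain: spectral centroid of the magnitude spectrum
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-10)
    # inverse frequency (cepstral) domain: real cepstrum of the frame
    cepstrum = np.fft.irfft(np.log(mag + 1e-10))
    return np.hstack([rms, zcr, centroid, cepstrum[:13]])
```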
As shown in fig. 4, the procedure of extracting mel-frequency cepstral coefficients (Mel Frequency Cepstrum Coefficient, MFCC) is as follows:
2.1 Pre-emphasis
The intensity of the data at low frequencies is typically much larger than at high frequencies, which hinders processing, so the low-frequency components of the data are filtered out to make the high-frequency characteristics more prominent.
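A pre-emphasis step of this kind is commonly implemented as a first-order high-pass filter; the sketch below assumes the usual coefficient of 0.97, which is not specified in this passage:

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]: attenuates low-frequency content so that
    # high-frequency characteristics stand out
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])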
2.2 framing
Framing groups N sampling points into one observation unit. Each frame covers 25 ms, and since the sampling rate is 16000 Hz, each frame contains 400 sample points. In addition, to avoid excessive variation between two adjacent frames, adjacent frames overlap; the overlap is set to 15 ms, so a new frame is taken every 10 ms.
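With these values (25 ms frames at 16000 Hz, a new frame every 10 ms), framing can be sketched as below; the function name and implementation details are illustrative assumptions:

```python
import numpy as np

SR = 16000                      # sampling rate
FRAME_LEN = int(0.025 * SR)     # 25 ms -> 400 sample points per frame
HOP = int(0.010 * SR)           # one frame every 10 ms -> 160 samples (15 ms overlap)

def frame_signal(signal):
    n_frames = 1 + max(0, (len(signal) - FRAME_LEN) // HOP)
    return np.stack([signal[i * HOP:i * HOP + FRAME_LEN] for i in range(n_frames)])
```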
2.3 windowing for each frame
Because the signal within a frame is treated as periodic during the transform, abrupt discontinuities at the two end points of the frame would make the transformed spectrum differ greatly from the spectrum of the original signal. Each frame is therefore windowed so that no abrupt changes occur at the end points when the in-frame signal is Fourier transformed.
2.4 zero padding for each frame
Since each frame signal is Fourier transformed and the transform requires a fixed input length, the 400 sample points of each frame are zero-padded to the nearest 512 points.
2.5 Fourier transform of frame signals
A 512-point Fourier transform is performed on each framed and windowed frame to obtain the spectrum of each frame. Taking the absolute value, or its square, of the spectrum gives the power spectrum of the signal.
2.6 Mel filtering
The 40 triangular filters are distributed uniformly on the mel scale, and adjacent filters overlap by 50%. The actual frequency range, from a minimum of 0 Hz to a maximum of 16000/2 = 8000 Hz, is first converted to mel frequencies; the mel-frequency positions of the 40 triangular filters are then calculated, and the mel frequencies are converted back to actual frequencies.
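A standard construction of such a filter bank is sketched below; the mel conversion formula (2595·log10(1 + f/700)) is the commonly used one and is assumed here rather than quoted from the patent:

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters=40, n_fft=512, sr=16000, f_low=0.0, f_high=8000.0):
    # centre frequencies spaced uniformly on the mel scale; adjacent filters overlap 50%
    mel_points = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)
    return fbank
```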
2.7 taking the logarithm
Taking the logarithm of the output of the triangular filter bank yields a result similar to a homomorphic transformation.
2.8 discrete cosine transform (DCT transform)
A DCT is applied to the logarithmic-energy mel spectrum and the first 13 dimensions are taken as the output, giving the mel cepstrum.
2.9 normalization
All mel cepstra are normalised: the mean vector of all cepstral vectors is computed first, and this mean is then subtracted from each cepstral vector to obtain the output mel-frequency cepstral coefficient feature vectors.
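Steps 2.6 to 2.9 can be summarised in one short sketch using scipy's DCT; the helper names and the small constant used to avoid log(0) are assumptions for illustration:

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_from_power(power_frames, fbank, n_ceps=13):
    """power_frames: (n_frames, n_fft//2 + 1); fbank: (40, n_fft//2 + 1)."""
    mel_energies = power_frames @ fbank.T                            # 2.6 mel filtering
    log_mel = np.log(mel_energies + 1e-10)                           # 2.7 take the logarithm
    ceps = dct(log_mel, type=2, axis=1, norm='ortho')[:, :n_ceps]    # 2.8 DCT, keep first 13 dims
    return ceps - ceps.mean(axis=0)                                  # 2.9 subtract the mean vector
```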
S3, inputting the existing audio feature vectors corresponding to a plurality of musical instruments and feature vectors corresponding to all audio frames into a naive Bayesian model, and identifying the musical instruments according to the probability that the musical instruments appear in the music.
As shown in fig. 5, the specific operation procedure of step S3 is as follows:
S3.1, the audio feature vectors corresponding to the set of instruments C = {y_1, y_2, …, y_j, …, y_n} (j ∈ n) and the feature vectors corresponding to the audio frames are input into a pre-trained naive Bayesian model;
S3.2, P(y_1|X_i), P(y_2|X_i), …, P(y_n|X_i) are calculated using the output formula of the naive Bayesian model;
S3.3, the probability that the musical instrument y_j appears in the piece of music X is obtained by the formula.
The output formula of the naive Bayesian model is as follows:
wherein X_i represents a certain frame of the piece of music X, which has z frames in total; y_j represents a certain musical instrument, of which there are n in total.
From the above procedure, the probability that each instrument appears in the piece of music X can be obtained. Since the probability of an instrument that does not actually appear in the piece is not necessarily exactly zero, a threshold must be set for the probability: if the probability that an instrument appears in the piece exceeds the threshold, the instrument is judged to appear in the piece; if it does not exceed the threshold, the instrument is judged not to appear. The threshold needs to be determined according to the specific music or a general standard; the guiding principle is to exclude instruments that do not appear in the piece while not removing secondary instruments that play for a relatively short time, and the threshold can be adjusted when the model is pre-trained.
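As a sketch of how per-frame posteriors might be turned into this piece-level decision (the patent's exact aggregation formula appears only as a figure and is not reproduced above, so a simple mean over the z frames is assumed as a stand-in, and the threshold value of 0.2 is purely illustrative):

```python
import numpy as np

def instrument_presence(frame_posteriors, threshold=0.2):
    """frame_posteriors: (z, n) array, row i holding P(y_1|X_i), ..., P(y_n|X_i)."""
    piece_prob = frame_posteriors.mean(axis=0)   # assumed aggregation over the z frames
    present = piece_prob > threshold             # presence decided against the threshold
    return piece_prob, present
```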
The instruments used in a piece of music include a primary instrument and secondary instruments, and the probability of each instrument appearing in the piece, obtained from the naive Bayesian model, is used to distinguish the primary instrument from the secondary instruments. The instrument with the highest probability of appearing in the piece is the primary instrument, and the other instruments that appear are the secondary instruments. Typically a piece has only one primary instrument, but some pieces are played predominantly by several instruments (two or more) whose probabilities of occurrence are comparable. When the probabilities of several instruments appearing in a piece differ little, the primary and secondary instruments should be judged according to the style of the piece rather than by a blanket rule.
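Under the same assumptions as the sketch above, ranking the instruments judged present by probability picks out the primary instrument (highest probability) and the secondary instruments (the rest):

```python
def rank_instruments(piece_prob, present, names):
    # sort the instruments judged present by descending probability;
    # the first entry is the primary instrument, the rest are secondary
    order = sorted(range(len(names)), key=lambda i: piece_prob[i], reverse=True)
    return [(names[i], float(piece_prob[i])) for i in order if present[i]]
```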
As shown in fig. 6, the pre-training process of the pre-trained naive Bayesian model is as follows: music whose instrument types are known is input into the original naive Bayesian model, the probability that a certain instrument appears in the music is obtained from the output formula of the naive Bayesian model, and it is judged whether the probability exceeds the threshold value; the judgement result is compared with the instruments that actually play the music, and if the results are the same, the naive Bayesian model at that point is taken as the final output model; if the results are different, the output formula of the naive Bayesian model is adjusted until the results are the same.
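A minimal pre-training sketch is given below, assuming continuous frame feature vectors so that scikit-learn's GaussianNB can stand in for the "original naive Bayesian model"; the function and variable names are illustrative, and the adjustment loop of the patent is only indicated by a comment:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def pretrain(X_train, y_train, X_piece, true_instruments, threshold=0.2):
    """X_train: frame feature vectors; y_train: known instrument label per frame.
    X_piece: frames of one piece whose instruments (true_instruments) are known."""
    model = GaussianNB().fit(X_train, y_train)
    frame_post = model.predict_proba(X_piece)          # (z, n) per-frame posteriors
    piece_prob = frame_post.mean(axis=0)               # assumed aggregation over frames
    predicted = {c for c, p in zip(model.classes_, piece_prob) if p > threshold}
    if predicted != set(true_instruments):
        pass  # in the patent, the output formula / threshold is adjusted and re-checked
    return model
```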
Once the naive Bayesian model has been trained through the above steps, the instruments used in the music to be identified can be classified; and since the output is a probability value for each instrument in each piece, the results can be sorted as needed to distinguish the primary instrument of a piece from its secondary instruments.
Example two
Based on the same inventive concept, the embodiment also discloses a musical instrument type recognition system based on a naive bayes model, comprising:
the preprocessing module is used for dividing the music to be identified into a plurality of audio frames;
the feature extraction module is used for extracting time domain information, frequency domain information and mel frequency cepstrum coefficient in the audio frame to form a feature vector corresponding to the audio frame;
and the identification module is used for inputting the audio feature vectors corresponding to the plurality of musical instruments and the feature vectors corresponding to all the audio frames into a naive Bayesian model, and identifying the musical instruments according to the probability that the musical instruments appear in the music.
The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes or substitutions should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (6)

1. A naive bayes model-based instrument type recognition method, comprising the steps of:
s1, dividing music to be identified into a plurality of audio frames;
s2, extracting time domain information, frequency domain information and Mel frequency cepstrum coefficients in the audio frame to form a feature vector corresponding to the audio frame;
s3, inputting the audio feature vectors corresponding to a plurality of existing musical instruments and the feature vectors corresponding to all the audio frames into a naive Bayesian model, and identifying the musical instruments according to the probability that the musical instruments appear in the music;
the specific operation process of the S3 is as follows:
S3.1, inputting the audio feature vectors corresponding to a plurality of musical instruments and the feature vectors corresponding to the audio frames into a pre-trained naive Bayesian model;
S3.2, calculating P(y_1|X_i), P(y_2|X_i), …, P(y_n|X_i) using the output formula of the naive Bayesian model;
S3.3, obtaining the probability that the musical instrument y_j appears in the piece of music X by the formula;
the output formula of the naive Bayes model is as follows:
wherein X_i represents a certain frame of a piece of music X, which has z frames in total; y_j represents a certain musical instrument, of which there are n in total;
judging that the musical instrument appears in the music if the probability that the musical instrument appears in the music exceeds a threshold value, and judging that the musical instrument does not appear in the music if the probability does not exceed the threshold value;
the musical instruments used in the musical composition include a primary musical instrument and secondary musical instruments, and the probability of each of the musical instruments appearing in the musical composition is obtained from the naive Bayesian model to distinguish the primary musical instrument from the secondary musical instruments.
2. The method for identifying musical instrument types based on a naive bayes model according to claim 1, wherein the musical instrument having the highest probability of appearing in the musical composition is a primary musical instrument, and the other musical instruments appearing in the musical composition are secondary musical instruments.
3. The naive bayes model-based instrument class identification method of claim 1, wherein the pre-training process of the naive bayes model is:
inputting a musical instrument with known musical instrument types into an original naive Bayesian model, obtaining the probability of a certain musical instrument in the musical instrument according to an output formula of the naive Bayesian model, judging whether the probability exceeds a threshold value, comparing a judging result with the type of the musical instrument actually played, and if the judging result is the same, inputting the naive Bayesian model into a final output model; and if the results are different, adjusting an output formula of the naive Bayes model until the results are the same.
4. The naive bayes model based instrument class recognition method according to claim 1, wherein the frequency domain information is obtained by fourier transforming each of the audio frames, the inverse frequency domain information is obtained by rotating a frequency domain map constituted by the frequency domain information, and the amplitude of the frequency domain map is represented by a gray scale map; the time domain information is obtained by stacking the frequency domain maps in a time dimension.
5. The naive bayes model based instrument class identification method of claim 1, wherein hamming windows are added to several of said audio frames to prevent frequency leakage.
6. A naive bayes model-based instrument class recognition system, comprising:
the preprocessing module is used for dividing the music to be identified into a plurality of audio frames;
the feature extraction module is used for extracting time domain information, frequency domain information and mel frequency cepstrum coefficient in the audio frame to form a feature vector corresponding to the audio frame;
the recognition module is used for inputting the audio feature vectors corresponding to a plurality of musical instruments and the feature vectors corresponding to all the audio frames into a naive Bayesian model, and recognizing the musical instruments according to the probability that the musical instruments appear in the music;
the specific operation process of the identification module is as follows:
S3.1, inputting the audio feature vectors corresponding to a plurality of musical instruments and the feature vectors corresponding to the audio frames into a pre-trained naive Bayesian model;
S3.2, calculating P(y_1|X_i), P(y_2|X_i), …, P(y_n|X_i) using the output formula of the naive Bayesian model;
S3.3, obtaining the probability that the musical instrument y_j appears in the piece of music X by the formula;
the output formula of the naive Bayes model is as follows:
wherein X_i represents a certain frame of a piece of music X, which has z frames in total; y_j represents a certain musical instrument, of which there are n in total;
judging that the musical instrument appears in the music if the probability that the musical instrument appears in the music exceeds a threshold value, and judging that the musical instrument does not appear in the music if the probability does not exceed the threshold value;
the musical instruments used in the musical composition include a primary musical instrument and secondary musical instruments, and the probability of each of the musical instruments appearing in the musical composition is obtained from the naive Bayesian model to distinguish the primary musical instrument from the secondary musical instruments.
CN202010483915.8A 2020-06-01 2020-06-01 Musical instrument type identification method and system based on naive Bayesian model Active CN111681674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010483915.8A CN111681674B (en) 2020-06-01 2020-06-01 Musical instrument type identification method and system based on naive Bayesian model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010483915.8A CN111681674B (en) 2020-06-01 2020-06-01 Musical instrument type identification method and system based on naive Bayesian model

Publications (2)

Publication Number Publication Date
CN111681674A CN111681674A (en) 2020-09-18
CN111681674B true CN111681674B (en) 2024-03-08

Family

ID=72453206

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010483915.8A Active CN111681674B (en) 2020-06-01 2020-06-01 Musical instrument type identification method and system based on naive Bayesian model

Country Status (1)

Country Link
CN (1) CN111681674B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113421589B (en) * 2021-06-30 2024-03-01 平安科技(深圳)有限公司 Singer identification method, singer identification device, singer identification equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10319948A (en) * 1997-05-15 1998-12-04 Nippon Telegr & Teleph Corp <Ntt> Sound source kind discriminating method of musical instrument included in musical playing
CN101546556A (en) * 2008-03-28 2009-09-30 展讯通信(上海)有限公司 Classification system for identifying audio content
CN103761965A (en) * 2014-01-09 2014-04-30 太原科技大学 Method for classifying musical instrument signals
CN105719661A (en) * 2016-01-29 2016-06-29 西安交通大学 Automatic discrimination method for playing timbre of string instrument
CN106952644A (en) * 2017-02-24 2017-07-14 华南理工大学 A kind of complex audio segmentation clustering method based on bottleneck characteristic
CN108962279A (en) * 2018-07-05 2018-12-07 平安科技(深圳)有限公司 New Method for Instrument Recognition and device, electronic equipment, the storage medium of audio data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7842878B2 (en) * 2007-06-20 2010-11-30 Mixed In Key, Llc System and method for predicting musical keys from an audio source representing a musical composition

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10319948A (en) * 1997-05-15 1998-12-04 Nippon Telegr & Teleph Corp <Ntt> Sound source kind discriminating method of musical instrument included in musical playing
CN101546556A (en) * 2008-03-28 2009-09-30 展讯通信(上海)有限公司 Classification system for identifying audio content
CN103761965A (en) * 2014-01-09 2014-04-30 太原科技大学 Method for classifying musical instrument signals
CN105719661A (en) * 2016-01-29 2016-06-29 西安交通大学 Automatic discrimination method for playing timbre of string instrument
CN106952644A (en) * 2017-02-24 2017-07-14 华南理工大学 A kind of complex audio segmentation clustering method based on bottleneck characteristic
CN108962279A (en) * 2018-07-05 2018-12-07 平安科技(深圳)有限公司 New Method for Instrument Recognition and device, electronic equipment, the storage medium of audio data

Also Published As

Publication number Publication date
CN111681674A (en) 2020-09-18

Similar Documents

Publication Publication Date Title
EP2659482B1 (en) Ranking representative segments in media data
Gerhard Audio signal classification: History and current techniques
US20100332222A1 (en) Intelligent classification method of vocal signal
Rocamora et al. Comparing audio descriptors for singing voice detection in music audio files
Lu Indexing and retrieval of audio: A survey
Lehner et al. Online, loudness-invariant vocal detection in mixed music signals
Zlatintsi et al. Multiscale fractal analysis of musical instrument signals with application to recognition
Hu et al. Separation of singing voice using nonnegative matrix partial co-factorization for singer identification
Fan et al. Singing voice separation and pitch extraction from monaural polyphonic audio music via DNN and adaptive pitch tracking
Park Towards automatic musical instrument timbre recognition
CN116665669A (en) Voice interaction method and system based on artificial intelligence
Yu et al. Sparse cepstral codes and power scale for instrument identification
CN115050387A (en) Multi-dimensional singing playing analysis evaluation method and system in art evaluation
Li et al. A comparative study on physical and perceptual features for deepfake audio detection
Liu et al. Content-based audio classification and retrieval using a fuzzy logic system: towards multimedia search engines
CN111681674B (en) Musical instrument type identification method and system based on naive Bayesian model
Gao et al. Vocal melody extraction via dnn-based pitch estimation and salience-based pitch refinement
Kitahara et al. Instrogram: A new musical instrument recognition technique without using onset detection nor f0 estimation
Barthet et al. Speech/music discrimination in audio podcast using structural segmentation and timbre recognition
CN114678039A (en) Singing evaluation method based on deep learning
Patil et al. Content-based audio classification and retrieval: A novel approach
CN113129923A (en) Multi-dimensional singing playing analysis evaluation method and system in art evaluation
Kos et al. Online speech/music segmentation based on the variance mean of filter bank energy
Liang et al. [Retracted] Extraction of Music Main Melody and Multi‐Pitch Estimation Method Based on Support Vector Machine in Big Data Environment
Hosain et al. Deep-Learning-Based Speech Emotion Recognition Using Synthetic Bone-Conducted Speech

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant