CN111681674A - Method and system for identifying musical instrument types based on naive Bayes model - Google Patents
- Publication number
- CN111681674A (application CN202010483915.8A)
- Authority
- CN
- China
- Prior art keywords
- music
- naive bayes
- bayes model
- instrument
- musical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/041—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal based on mfcc [mel -frequency spectral coefficients]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/056—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction or identification of individual instrumental parts, e.g. melody, chords, bass; Identification or separation of instrumental parts by their characteristic voices or timbres
Abstract
The invention relates to a method and system for identifying musical instrument types based on a naive Bayes model, comprising the following steps: S1, dividing the music piece to be identified into a number of audio frames; S2, extracting the time-domain information, frequency-domain information and Mel-frequency cepstral coefficients in each audio frame to form the feature vector corresponding to that frame; S3, inputting the audio feature vectors corresponding to a number of musical instruments and the feature vectors corresponding to all the audio frames into a naive Bayes model, and identifying the instruments according to the probability of each instrument appearing in the piece. Through this digitized music feature extraction, artificial intelligence can identify instrument types, timbres and playing techniques, helping to finely distinguish the relationships between homogeneous and heterogeneous instruments, and in particular to separate and precisely identify instruments of the same family whose sound subdivisions, timbres and playing techniques closely overlap.
Description
Technical Field
The invention relates to a method and a system for identifying musical instrument types based on a naive Bayes model, and belongs to the technical field of musical instrument identification.
Background
In recent years, with the rapid development of the internet era, more and more music applications influence people's daily lives, and digital music has grown explosively in the entertainment field. Music is no longer scarce in daily life, music communities have become popular, and the P2P mode of distribution has developed steadily, so helping people find the music they need is an important direction for the future development of music identification technology. Identification based on textual attributes such as song titles and singers is already widespread, and in the nineties identification based on musical features such as melody and rhythm was developed; it became a very widely applied technology soon after it appeared and promoted the development of music identification as a whole.
Currently, systems for identifying the musical instruments used in music are not common. For a music library of large scale, identifying instruments is more difficult than identifying textual attributes or melody and rhythm. Although some instruments can be distinguished to a large degree by waveform analysis, features such as tone, pitch and loudness alone are far from sufficient, so the audio features must be analyzed more accurately and more distinctively to separate the different sounds played by different instruments. Timbre is the attribute of sound quality, distinct from loudness and intensity, by which the ear can distinguish different instruments playing the same note. For example, the human auditory system can distinguish a violin from an oboe at 4410 Hz because their high-frequency overtone components differ and the amplitudes of those high-frequency components differ; this difference is timbre. The key to distinguishing the instruments in music is therefore to distinguish their timbres, and how to characterize music in terms of feature values is an urgent problem to be solved in this field.
Disclosure of Invention
In view of the above disadvantages of the prior art, the present invention provides a method and system for identifying musical instrument types based on a naive Bayes model. Through digitized music feature extraction, artificial intelligence can identify the type, timbre and playing technique of instruments, helping to finely distinguish the relationships between homogeneous and heterogeneous instruments, and in particular to separate and precisely identify instruments of the same family whose sound subdivisions, timbre similarities and playing techniques closely overlap.
In order to achieve this aim, the invention provides a method for identifying musical instrument types based on a naive Bayes model, comprising the following steps: S1, dividing the music piece to be identified into a number of audio frames; S2, extracting the time-domain information, frequency-domain information and Mel-frequency cepstral coefficients in each audio frame to form the feature vector corresponding to that frame; S3, inputting the audio feature vectors corresponding to a number of known musical instruments and the feature vectors corresponding to all the audio frames into a naive Bayes model, and identifying the instruments according to the probability of each instrument appearing in the piece.
Further, if the probability of the instrument appearing in the music exceeds the threshold value, it is determined that the instrument appears in the music, and if the probability of the instrument appearing in the music does not exceed the threshold value, it is determined that the instrument does not appear in the music.
Further, the musical instruments used in the music piece include a primary instrument and secondary instruments, and the primary and secondary instruments are distinguished by the probability of each instrument appearing in the piece as obtained from the naive Bayes model.
Further, the instrument with the highest probability of appearing in the music piece is the main instrument, and the other instruments appearing in the music piece are the secondary instruments.
Further, the output formula of the naive Bayes model is:
P(y_j | X_i) = P(y_j) · P(X_i | y_j) / Σ_{k=1..n} P(y_k) · P(X_i | y_k)
where X_i denotes a frame of the music piece X (z frames in total), y_j denotes a musical instrument (n instruments in total), and, under the naive independence assumption, P(X_i | y_j) is the product of the per-feature likelihoods of the feature vector of frame X_i.
Further, the specific operation process of S3 is as follows: S3.1, inputting the audio feature vectors corresponding to the plurality of musical instruments and the feature vectors corresponding to the audio frames into a pre-trained naive Bayes model; S3.2, calculating P(y_1 | X_i), P(y_2 | X_i), …, P(y_n | X_i) with the output formula of the naive Bayes model; S3.3, obtaining the probability of the musical instrument y_j appearing in the music piece X by the formula P(y_j | X) = (1/z) · Σ_{i=1..z} P(y_j | X_i).
Further, the pre-training process of the pre-trained naive Bayes model is as follows: a music piece whose instruments are known is input into the original naive Bayes model; the probability of a certain instrument appearing in the piece is obtained from the output formula of the model; whether the probability exceeds the threshold is judged, and the judgment result is compared with the instruments actually playing the piece. If they are the same, the naive Bayes model is taken as the final output model; if they differ, the output formula of the naive Bayes model is adjusted until the results are the same.
Further, the frequency-domain information is obtained by performing a Fourier transform on each audio frame; the inverse-frequency-domain information is obtained by rotating the frequency-domain graph formed from the frequency-domain information and representing its amplitude as a gray-scale map; the time-domain information is obtained by stacking the frequency-domain graphs along the time dimension.
Further, a Hamming window is applied to the audio frames to prevent spectral leakage.
The invention also discloses a system for identifying musical instrument types based on the naive Bayes model, comprising: a preprocessing module for dividing the music to be identified into a number of audio frames; a feature extraction module for extracting the time-domain information, frequency-domain information, cepstral-domain information and Mel-frequency cepstral coefficients in each audio frame to form the feature vector corresponding to the frame; and a recognition module for inputting the audio feature vectors corresponding to the musical instruments and the feature vectors corresponding to all the audio frames into the naive Bayes model and recognizing the instruments according to the probability of each instrument appearing in the music.
Due to the adoption of the technical scheme, the invention has the following advantages:
1. Through digitized music feature extraction, the invention enables artificial intelligence to identify the type, timbre and playing technique of musical instruments, helping to finely distinguish the relationships between homogeneous and heterogeneous instruments, and in particular to separate and precisely identify instruments of the same family by their sound subdivisions, timbre similarity and technique overlap.
2. The music feature extraction method and the extracted feature vectors reduce the time consumed by instrument identification in music without affecting the precision and accuracy of the identification.
3. The method can be widely applied to a plurality of fields such as music appreciation, music classification and music recommendation, and the musical instruments used in the music greatly influence the style of the music, so the method can play a certain role in music information retrieval.
4. The invention trains on music with a naive Bayes classification model and represents the instruments that may correspond to a piece as probabilities, so that artificial-intelligence model learning can be applied to identifying key elements in music and common musical structures and rules, providing a reference for better applying artificial intelligence in the music field, for example in sound modification and music composition.
Drawings
FIG. 1 is a flow chart of a naive Bayes model based instrument type identification method in an embodiment of the invention;
FIG. 2 is a flow diagram of a pre-processing process for a musical composition in accordance with an embodiment of the present invention;
FIG. 3 is a flow chart of the audio frame timbre feature extraction process in one embodiment of the present invention;
FIG. 4 is a flow chart of a process for extracting the Mel cepstral coefficient feature of an audio frame according to an embodiment of the present invention;
FIG. 5 is a flow diagram of a naive Bayes classification model identification process in an embodiment of the invention;
FIG. 6 is a flow chart of a naive Bayes classification model training process in an embodiment of the invention.
Detailed Description
The present invention is described in detail through specific embodiments so that those skilled in the art can better understand its technical direction. It should be understood, however, that the detailed description is provided only for a better understanding of the invention and should not be taken as limiting it. In describing the present invention, the terminology used is for the purpose of description only and is not intended to indicate or imply relative importance.
The core of the invention is to form the feature vector of a piece of music by fusing its timbre features with its Mel-frequency cepstral coefficient (MFCC) features, to use this feature vector as input, and to identify the instruments playing the piece with a naive Bayes model. The instruments comprise a primary instrument that plays the leading role and several secondary instruments that accompany it. For example, if the main melody of a piece is carried by the piano, the piano is the primary instrument, while accompanying instruments such as the violin and flute are secondary instruments. The technical scheme of the invention can also be used to rank the importance of each of the secondary instruments.
Example one
A method for identifying a musical instrument category based on a naive bayes model, as shown in fig. 1, comprises the following steps:
S1 divides the music piece to be recognized into a number of audio frames and determines the number of frames. As shown in FIG. 2, each piece of music in the original data set is divided into a number of audio frames. A Hamming window is then applied to each audio frame to prevent spectral leakage; the window also smooths the Gibbs effect between frames. In order to keep both the time-domain and the frequency-domain information, a short-time Fourier transform is performed on the framed and windowed audio frames to obtain a spectrogram.
The process of generating the spectrogram by short-time Fourier transform comprises the following steps:
Frame the long music signal and apply a window; perform a Fourier transform on each audio frame (each frame is a short-time signal, so this is a short-time Fourier transform); rotate the spectrum and represent its magnitude as a gray-scale map; stack the frequency-domain graphs obtained by the Fourier transform along the time dimension to finally obtain the spectrogram. The frequency-domain information is thus obtained by Fourier-transforming each audio frame; the inverse-frequency-domain information is obtained by rotating the frequency-domain graph formed from the frequency-domain information and representing its amplitude as a gray-scale map; and the time-domain information is obtained by stacking the frequency-domain graphs along the time dimension.
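The framing, windowing and stacking steps above can be sketched in a few lines of numpy. This is a minimal sketch, not the patent's implementation; the frame length (25 ms), hop (10 ms), sampling rate (16000 Hz) and FFT size (512) are taken from the MFCC parameters given later, and `signal` is a hypothetical stand-in for a decoded music excerpt.

```python
import numpy as np

sr, frame_len, hop, n_fft = 16000, 400, 160, 512   # 25 ms frames, 10 ms hop
signal = np.random.randn(sr * 5)                   # stand-in for a 5 s music excerpt

# Frame the long signal and apply a Hamming window to each frame.
n_frames = 1 + (len(signal) - frame_len) // hop
frames = np.stack([signal[i * hop : i * hop + frame_len] * np.hamming(frame_len)
                   for i in range(n_frames)])

# Short-time Fourier transform: FFT each windowed frame, take the magnitude,
# and stack the per-frame spectra along the time dimension.
spectrogram = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)).T  # frequency x time
```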
S2, extracting the time domain information, frequency domain information and Mel frequency cepstrum coefficient in the audio frame, and forming the feature vector corresponding to the audio frame.
As shown in FIG. 3, based on the MPEG-7 (Multimedia Content Description Interface) standard, the timbre of each musical instrument is captured at three levels: the time domain (the timbre over time), the frequency domain (the frequencies of the timbre waveform) and the inverse frequency domain (the frequencies of the inverted timbre waveform). The timbre feature elements of these three levels are finely extracted and stored for each frame of each piece in the original data set.
As shown in fig. 4, the process of extracting Mel-Frequency Cepstrum Coefficient (MFCC) is:
2.1 Pre-emphasis
The low-frequency components of the data are stronger than the high-frequency components and are not easy to process, so the low-frequency components are filtered out so that the high-frequency characteristics become more prominent.
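A common realization of this step is a first-order pre-emphasis filter; the sketch below assumes the conventional coefficient 0.97, which the patent does not specify.

```python
# y[n] = x[n] - 0.97 * x[n-1]: attenuates low frequencies relative to high ones.
emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
```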
2.2 Framing
Framing assembles N sampling points into one observation unit. The time covered by each frame is set to 25 ms; since the sampling rate is 16000 Hz, each frame contains 400 sample points. In addition, to avoid excessive variation between two adjacent frames, adjacent frames overlap; since the overlap region is set to 15 ms, a new frame starts every 10 ms.
2.3 windowing of each frame
Because each frame signal is treated as periodic during the transform, sudden changes occur at the two end points of the frame, making the transformed spectrum differ greatly from the spectrum of the original signal. Each frame is therefore windowed so that no abrupt change occurs at the two end points of the Fourier transform of the in-frame signal.
2.4 zero padding for each frame
Since each frame signal is Fourier-transformed, and the transform requires an input of a certain length, the 400 samples in each frame are zero-padded to the nearest suitable length of 512 points.
2.5 Fourier transform of the signals of each frame
A 512-point Fourier transform is performed on each windowed frame to obtain the spectrum of each frame, and the power spectrum of the signal is obtained by taking the absolute value or the square of the spectrum.
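Condensing steps 2.2 to 2.5 and continuing the numpy sketch above (`frames` is the windowed frame matrix), `rfft` zero-pads each 400-sample frame to 512 points before transforming:

```python
mag = np.abs(np.fft.rfft(frames, n=512, axis=1))  # 512-point FFT of each frame
power = mag ** 2                                  # power spectrum, shape (n_frames, 257)
```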
2.6 mel filtering
Forty triangular filters are distributed evenly on the mel-frequency scale, with 50% overlap between every two adjacent filters. The actual frequency is converted to mel frequency; the minimum actual frequency is 0 Hz and the maximum is 16000/2 = 8000 Hz. After conversion to mel frequency, the mel-frequency distribution of the 40 triangular filters is calculated, and the filter positions are then converted back to actual frequency.
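A sketch of this filterbank under the stated parameters (40 triangular filters spanning 0 Hz to 8000 Hz with 50% overlap); the mel conversion formula mel = 2595 · log10(1 + f/700) is the standard one and is assumed here rather than quoted from the patent.

```python
def mel_filterbank(n_filters=40, n_fft=512, sr=16000):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2;
    each filter rises from its left neighbour's centre and falls to its
    right neighbour's centre, giving the 50% overlap described above."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, mid, hi = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[i - 1, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    return fbank

mel_energies = power @ mel_filterbank().T  # shape (n_frames, 40)
```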
2.7 logarithm of
Taking the logarithm of the outputs of the triangular filterbank yields a result similar to a homomorphic transformation.
2.8 discrete cosine transform (DCT transform)
A DCT (discrete cosine transform) is applied to the log-energy mel spectrum, and the first 13 dimensions of the output are taken, giving the mel cepstrum.
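Steps 2.7 and 2.8 in one short sketch, using scipy's type-II DCT; the small epsilon guarding the logarithm is an implementation detail assumed here.

```python
from scipy.fftpack import dct

log_mel = np.log(mel_energies + 1e-10)                     # step 2.7: logarithm
mfcc = dct(log_mel, type=2, axis=1, norm='ortho')[:, :13]  # step 2.8: first 13 dims
```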
2.9 normalization
All mel cepstra are normalized: first the mean of all cepstral vectors is computed, then the mean vector is subtracted from each cepstral vector to obtain the output Mel-frequency cepstral coefficient feature vectors.
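The normalization, plus the fusion with the S2 timbre features into per-frame feature vectors, might look as follows; `timbre` is a hypothetical array standing in for the time-domain, frequency-domain and inverse-frequency-domain features.

```python
mfcc -= mfcc.mean(axis=0, keepdims=True)  # subtract the mean cepstral vector
timbre = np.random.randn(len(mfcc), 8)    # hypothetical per-frame timbre features
features = np.hstack([timbre, mfcc])      # per-frame feature vectors for S3
```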
S3, inputting the audio feature vectors corresponding to a plurality of existing musical instruments and the feature vectors corresponding to all audio frames into a naive Bayes model, and identifying the musical instruments according to the probability of the musical instruments appearing in the music.
As shown in fig. 5, the specific operation procedure of step S3 is as follows:
S3.1 inputs the set of musical instruments C = {y_1, y_2, …, y_j, …, y_n} and the feature vectors corresponding to the audio frames into a pre-trained naive Bayes model;
S3.2 calculates P(y_1 | X_i), P(y_2 | X_i), …, P(y_n | X_i) with the output formula of the naive Bayes model:
P(y_j | X_i) = P(y_j) · P(X_i | y_j) / Σ_{k=1..n} P(y_k) · P(X_i | y_k)
where X_i denotes a frame of the music piece X (z frames in total) and y_j denotes a musical instrument (n instruments in total).
The probability of each instrument appearing in the music piece X is obtained by the above procedure. Since the probability of an instrument that does not appear in the piece is not necessarily exactly zero, a threshold must be set: if the probability of an instrument appearing in the piece exceeds the threshold, it is judged to appear; otherwise it is judged not to appear. It should be noted that the threshold value must be determined for the specific music or by a general standard; the principle is to remove the instruments that do not appear in the piece without removing secondary instruments that appear only briefly. The threshold can be adjusted when the model is pre-trained.
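A minimal inference sketch using scikit-learn's Gaussian naive Bayes, one plausible realization of the model (the patent names no library). `model` is assumed to be trained as in the pre-training section below, `features` holds the per-frame vectors of the piece, and the threshold value is illustrative.

```python
# model: a pre-trained sklearn.naive_bayes.GaussianNB (see pre-training below).
frame_post = model.predict_proba(features)  # P(y_j | X_i) per frame, shape (z, n)
piece_prob = frame_post.mean(axis=0)        # P(y_j | X): average over the z frames
threshold = 0.05                            # illustrative; tuned during pre-training
present = piece_prob > threshold            # which instruments appear in the piece
```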
The instruments used in the piece comprise primary and secondary instruments, which are distinguished by the probability of each instrument appearing in the piece as given by the naive Bayes model: the instrument with the highest probability is the primary instrument, and the other instruments appearing in the piece are secondary instruments. Usually a piece has only one primary instrument, but some pieces are played by multiple instruments (two or more) whose probabilities differ little. Such cases cannot be generalized; the primary and secondary instruments must then be judged from the style of the piece.
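Continuing the sketch, sorting the piece-level probabilities separates the primary instrument from the secondary ones:

```python
order = np.argsort(piece_prob)[::-1]              # instruments ranked by P(y_j | X)
primary = order[0]                                # highest probability: primary instrument
secondary = [j for j in order[1:] if present[j]]  # others above threshold: secondary
```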
As shown in FIG. 6, the pre-training process of the pre-trained naive Bayes model is as follows: a music piece whose instruments are known is input into the original naive Bayes model; the probability of each instrument appearing in the piece is obtained from the output formula of the model; whether the probability exceeds the threshold is judged, and the judgment result is compared with the instruments actually playing the piece. If they agree, the naive Bayes model is taken as the final output model; if they differ, the output formula of the naive Bayes model is adjusted until the results agree.
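A hedged sketch of this loop: fit a Gaussian naive Bayes on frames whose instrument labels are known, then adjust the threshold until the piece-level decisions match the known instrumentation. `train_features`, `train_labels`, `val_features` and `val_truth` are hypothetical labeled data, and the candidate thresholds are illustrative.

```python
from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(train_features, train_labels)  # frames with known instrument labels

# Sweep candidate thresholds until the piece-level decisions agree with
# the instruments actually known to play the validation piece.
for t in (0.01, 0.02, 0.05, 0.1):
    decisions = model.predict_proba(val_features).mean(axis=0) > t
    if np.array_equal(decisions, val_truth):
        threshold = t
        break
```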
Through the steps, on the basis of obtaining a naive Bayes model through training, musical instruments used by the music needing to be identified can be classified, and meanwhile, because the obtained output result is the probability value of each musical instrument used by each music, the results can be sorted according to needs, and the main musical instrument and the secondary musical instrument of the music can be distinguished.
Example two
Based on the same inventive concept, the embodiment also discloses a system for identifying the type of musical instrument based on the naive bayes model, which comprises:
the preprocessing module is used for dividing the music to be identified into a plurality of audio frames;
the characteristic extraction module is used for extracting time domain information, frequency domain information, cepstrum domain information and Mel frequency cepstrum coefficients in the audio frame to form a characteristic vector corresponding to the audio frame;
and the recognition module is used for inputting the audio feature vectors corresponding to the plurality of musical instruments and the feature vectors corresponding to all the audio frames into the naive Bayes model and recognizing the musical instruments according to the probability of the musical instruments appearing in the music.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto; any change or substitution that a person skilled in the art can easily conceive within the technical scope disclosed in the present application shall be covered by it. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A naive Bayes model-based musical instrument type identification method is characterized by comprising the following steps:
s1 dividing the music to be identified into a plurality of audio frames;
s2, extracting the time-domain information, inverse-frequency-domain information and Mel-frequency cepstral coefficients in the audio frames to form the feature vectors corresponding to the audio frames;
s3, inputting the audio feature vectors corresponding to a plurality of existing musical instruments and the feature vectors corresponding to all the audio frames into a naive Bayes model, and identifying the musical instruments according to the probability of the musical instruments appearing in the music.
2. The naive bayes model-based instrument class identification method of claim 1, wherein if the probability that the instrument appears in the music piece exceeds a threshold value, it is judged that the instrument appears in the music piece, and if the probability that the instrument appears in the music piece does not exceed the threshold value, it is judged that the instrument does not appear in the music piece.
3. The naive bayes model-based instrument class identification method of claim 2, wherein the instruments used in the piece of music comprise primary and secondary instruments, the primary and secondary instruments being distinguished by the probability of each of said instruments appearing in the piece of music obtained through the naive bayes model.
4. The naive bayes model-based instrument class identification method as claimed in claim 3, wherein the instrument with the highest probability appearing in said piece of music is the primary instrument and the other instruments appearing in said piece of music are the secondary instruments.
5. The naive bayes model-based instrument class identification method as claimed in any of claims 1-4, wherein the output formula of the naive Bayes model is:
P(y_j | X_i) = P(y_j) · P(X_i | y_j) / Σ_{k=1..n} P(y_k) · P(X_i | y_k)
where X_i denotes a frame of the music piece X (z frames in total) and y_j denotes a musical instrument (n instruments in total).
6. The naive bayes model-based instrument class identification method as claimed in claim 5, wherein the specific operation procedure of S3 is as follows:
s3.1, inputting the audio feature vectors corresponding to the plurality of musical instruments and the feature vectors corresponding to the audio frames into a pre-trained naive Bayes model;
s3.2 calculating P (y) by using output formula of naive Bayes model1|Xi),P(y2|Xi),…,P(yn|Xi);
7. The naive bayes model-based instrument class identification method of claim 6, wherein the pre-training process of the pre-trained naive bayes model is:
inputting a music piece whose instruments are known into an original naive Bayes model; obtaining from the output formula of the naive Bayes model the probability of a certain instrument appearing in the piece; judging whether the probability exceeds a threshold value and comparing the judgment result with the instruments actually playing the piece; if they are the same, taking the naive Bayes model as the final output model; if they differ, adjusting the output formula of the naive Bayes model until the results are the same.
8. The naive bayes model-based instrument class identification method as set forth in any of claims 1-4, wherein said frequency domain information is obtained by performing a fourier transform on each of said audio frames, and said inverse frequency domain information is obtained by rotating a frequency domain graph formed by said frequency domain information and representing the amplitude of said frequency domain graph by a gray scale; the time domain information is obtained by stacking the frequency domain maps in a time dimension.
9. A naive Bayes model based instrument class identification method as in any of claims 1-4, wherein a Hamming window is applied to a number of said audio frames to prevent spectral leakage.
10. A naive Bayes model based instrument type identification system, comprising:
the preprocessing module is used for dividing the music to be identified into a plurality of audio frames;
the feature extraction module is used for extracting time domain information, frequency domain information, cepstrum domain information and Mel frequency cepstrum coefficients in the audio frame to form feature vectors corresponding to the audio frame;
and the recognition module is used for inputting the audio feature vectors corresponding to the plurality of musical instruments and the feature vectors corresponding to all the audio frames into a naive Bayes model and recognizing the musical instruments according to the probability of the musical instruments appearing in the music.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010483915.8A CN111681674B (en) | 2020-06-01 | 2020-06-01 | Musical instrument type identification method and system based on naive Bayesian model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111681674A true CN111681674A (en) | 2020-09-18 |
CN111681674B CN111681674B (en) | 2024-03-08 |
Family
ID=72453206
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010483915.8A Active CN111681674B (en) | 2020-06-01 | 2020-06-01 | Musical instrument type identification method and system based on naive Bayesian model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111681674B (en) |
- 2020-06-01: CN CN202010483915.8A, patent CN111681674B (en), status: Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH10319948A (en) * | 1997-05-15 | 1998-12-04 | Nippon Telegr & Teleph Corp <Ntt> | Sound source kind discriminating method of musical instrument included in musical playing |
US20080314231A1 (en) * | 2007-06-20 | 2008-12-25 | Mixed In Key, Llc | System and method for predicting musical keys from an audio source representing a musical composition |
CN101546556A (en) * | 2008-03-28 | 2009-09-30 | 展讯通信(上海)有限公司 | Classification system for identifying audio content |
CN103761965A (en) * | 2014-01-09 | 2014-04-30 | 太原科技大学 | Method for classifying musical instrument signals |
CN105719661A (en) * | 2016-01-29 | 2016-06-29 | 西安交通大学 | Automatic discrimination method for playing timbre of string instrument |
CN106952644A (en) * | 2017-02-24 | 2017-07-14 | 华南理工大学 | A kind of complex audio segmentation clustering method based on bottleneck characteristic |
CN108962279A (en) * | 2018-07-05 | 2018-12-07 | 平安科技(深圳)有限公司 | New Method for Instrument Recognition and device, electronic equipment, the storage medium of audio data |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113421589A (en) * | 2021-06-30 | 2021-09-21 | 平安科技(深圳)有限公司 | Singer identification method, singer identification device, singer identification equipment and storage medium |
CN113421589B (en) * | 2021-06-30 | 2024-03-01 | 平安科技(深圳)有限公司 | Singer identification method, singer identification device, singer identification equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111681674B (en) | 2024-03-08 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |