CN111681674B - Musical instrument type identification method and system based on naive Bayesian model - Google Patents

Musical instrument type identification method and system based on naive Bayesian model

Info

Publication number
CN111681674B
Authority
CN
China
Prior art keywords
musical
music
musical instrument
instrument
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010483915.8A
Other languages
Chinese (zh)
Other versions
CN111681674A (en)
Inventor
丁戌倩
梁循
武文娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Renmin University of China
Original Assignee
Renmin University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Renmin University of China filed Critical Renmin University of China
Priority to CN202010483915.8A priority Critical patent/CN111681674B/en
Publication of CN111681674A publication Critical patent/CN111681674A/en
Application granted granted Critical
Publication of CN111681674B publication Critical patent/CN111681674B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/041 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal based on mfcc [mel-frequency spectral coefficients]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/056 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction or identification of individual instrumental parts, e.g. melody, chords, bass; Identification or separation of instrumental parts by their characteristic voices or timbres

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The invention relates to a musical instrument type identification method and system based on a naive Bayesian model, comprising the following steps: S1, dividing the music to be identified into a plurality of audio frames; S2, extracting the time domain information, frequency domain information and mel-frequency cepstral coefficients of each audio frame to form the feature vector corresponding to that frame; S3, inputting the audio feature vectors corresponding to a number of musical instruments, together with the feature vectors corresponding to all the audio frames, into a naive Bayesian model, and identifying the instruments according to the probability that each instrument appears in the piece of music. By extracting musical features from the data, the method enables artificial intelligence to identify instrument types, timbres and playing techniques, and helps to distinguish finely between homogeneous and heterogeneous instruments, in particular to separate and accurately identify instruments of the same family whose fine-grained sounds, timbres and playing techniques are similar or overlapping.

Description

Musical instrument type identification method and system based on naive Bayesian model
Technical Field
The invention relates to a musical instrument type recognition method and system based on a naive Bayesian model, and belongs to the technical field of musical instrument recognition.
Background
In recent years, with the rapid development of the internet, music applications increasingly affect people's daily lives and digital music has grown explosively in the entertainment field. Music is no longer scarce in daily life, music communities are becoming widespread and the P2P transmission mode is increasingly popular, so helping people find the music they need is an important direction for the future development of music identification technology. Music recognition has progressed from recognition based on textual attributes such as song titles and singer names, which became widespread in the 1990s, to recognition based on musical features such as melody and rhythm; music recognition based on musical features became a very widely applied technology as soon as it appeared and has promoted the development of music recognition technology. Patent applications related to music recognition began between 1980 and 1996, although their total number was small; from 1998 to 2008 the number of music recognition patents began to grow, a stage of rapid development that included emotion recognition and music style recognition based on textual attributes and on melody and rhythm attributes.
Currently, there is no system for identifying the musical instruments used in a musical composition. For a music library of relatively large size, identifying the instruments used in a composition is far more difficult than recognizing textual attributes or melody and rhythm. Although waveforms alone distinguish some instruments to a certain degree, features such as pitch, signal peaks and loudness are far from sufficient to identify the instruments in a composition, so more precise and more characteristic audio features must be analysed to distinguish the sounds played by different instruments. Timbre is the attribute of sound quality, distinct from loudness and intensity, by which the ear tells different instruments apart even when they produce the same sound. For example, the human auditory system can distinguish a violin from an oboe at 4410 Hz because the composition of their high-frequency overtones differs, as do the amplitudes of the high-frequency components; this difference is timbre. The key to distinguishing the instruments in a musical composition is therefore to distinguish their timbres, but how to characterise a composition by feature values is the problem to be solved in this field.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a musical instrument type identification method and system based on a naive Bayesian model, which enable artificial intelligence to identify instrument types, timbres and playing techniques by extracting musical features from the data, and help to distinguish finely between homogeneous and heterogeneous instruments, in particular to separate and accurately identify instruments of the same family whose fine-grained sounds, timbres and playing techniques are similar or overlapping.
In order to achieve the above object, the present invention provides a musical instrument type recognition method based on a naive bayes model, comprising the steps of: s1, dividing music to be identified into a plurality of audio frames; s2, extracting time domain information, frequency domain information and Mel frequency cepstrum coefficients in the audio frame to form a feature vector corresponding to the audio frame; s3, inputting the existing audio feature vectors corresponding to a plurality of musical instruments and feature vectors corresponding to all audio frames into a naive Bayesian model, and identifying the musical instruments according to the probability that the musical instruments appear in the music.
Further, if the probability that the instrument appears in the musical composition exceeds the threshold value, it is judged that the instrument appears in the musical composition, and if the probability that the instrument appears in the musical composition does not exceed the threshold value, it is judged that the instrument does not appear in the musical composition.
Further, the musical instruments used in the musical composition include a primary instrument and secondary instruments, and the probability of each instrument appearing in the composition is obtained from the naive Bayesian model in order to distinguish the primary instrument from the secondary instruments.
Further, the musical instrument with the highest probability of appearing in the musical composition is the primary musical instrument, and the other musical instruments appearing in the musical composition are the secondary musical instruments.
Further, the output formula of the naive bayes model is:
wherein X_i represents a certain frame of the piece of music X, which has z frames in total; y_j represents a certain musical instrument, of which there are n in total.
Further, the specific operation process of S3 is: S3.1, inputting the audio feature vectors corresponding to a plurality of musical instruments and the feature vectors corresponding to the audio frames into a pre-trained naive Bayesian model; S3.2, calculating P(y_1|X_i), P(y_2|X_i), …, P(y_n|X_i) using the output formula of the naive Bayesian model; S3.3, obtaining the probability that the musical instrument y_j appears in the piece of music X by the formula.
Further, the pre-training process of the pre-trained naive Bayesian model is as follows: music whose instrument types are known is input into the original naive Bayesian model, the probability that a certain instrument appears in the music is obtained from the output formula of the naive Bayesian model, and it is judged whether the probability exceeds the threshold value; the judgement result is compared with the instruments that actually play the music, and if the results are the same, the naive Bayesian model at that point is taken as the final output model; if the results are different, the output formula of the naive Bayesian model is adjusted until the results are the same.
Further, the frequency domain information is obtained by performing a Fourier transform on each audio frame; the inverse frequency domain information is obtained by rotating the frequency domain map composed of the frequency domain information, with the amplitude of the frequency domain map represented by a grey-scale map; and the time domain information is obtained by stacking the frequency domain maps along the time dimension.
Further, a Hamming window is applied to the audio frames to prevent frequency leakage.
The invention also discloses a musical instrument type recognition system based on the naive Bayesian model, which comprises: the preprocessing module is used for dividing the music to be identified into a plurality of audio frames; the feature extraction module is used for extracting time domain information, frequency domain information and mel frequency cepstrum coefficient in the audio frame to form a feature vector corresponding to the audio frame; and the identification module is used for inputting the audio feature vectors corresponding to the plurality of musical instruments and the feature vectors corresponding to all the audio frames into a naive Bayesian model, and identifying the musical instruments according to the probability that the musical instruments appear in the music.
Due to the adoption of the technical scheme, the invention has the following advantages:
1. The invention enables artificial intelligence to identify instrument types, timbres and playing techniques by extracting musical features from the data, and helps to distinguish finely between homogeneous and heterogeneous instruments, in particular to separate and accurately identify instruments of the same family whose fine-grained sounds, timbres and playing techniques are similar or overlapping.
2. The method of extracting musical features and forming feature vectors in the invention reduces the time consumed in identifying the instruments in a piece of music without affecting the precision and accuracy of instrument identification.
3. The method can be widely applied to a plurality of fields such as music appreciation, music classification, music recommendation and the like, and the musical instrument used in the music greatly influences the style of the music, so the method plays a certain role in music information retrieval.
4. The invention trains on music with a naive Bayesian classification model and represents the instruments that a piece may correspond to as probabilities, so that artificial intelligence model learning can be applied to key elements of music and to the recognition of common musical structures and rules, providing a reference for better applications of artificial intelligence in the music field, such as music restoration and music composition.
Drawings
FIG. 1 is a flow chart of a naive Bayesian model-based instrument class identification method in an embodiment of the present invention;
FIG. 2 is a flow chart of a preprocessing process for musical compositions in an embodiment of the present invention;
FIG. 3 is a flow chart of an audio frame timbre feature extraction process in an embodiment of the invention;
FIG. 4 is a flowchart of an audio frame mel-frequency cepstrum coefficient feature extraction process according to an embodiment of the present invention;
FIG. 5 is a flow chart of a naive Bayesian classification model identification procedure in an embodiment of the present invention;
fig. 6 is a flow chart of a naive bayesian classification model training process in accordance with an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to specific embodiments so that those skilled in the art can better understand its technical approach. It should be understood, however, that the detailed description is presented only to provide a better understanding of the invention and should not be taken to limit it. In the description of the present invention, it is to be understood that the terminology used is for the purpose of description only and is not to be interpreted as indicating or implying relative importance.
The invention uses a naive Bayesian model to identify the instruments playing a piece of music. These instruments comprise a primary instrument that plays the leading role and several secondary instruments that accompany it. For example, if a piece carries its main melody on the piano, the piano is the primary instrument, while instruments such as the violin and flute that provide the accompaniment are the secondary instruments. The technical scheme of the invention can also be used to rank the relative importance of the secondary instruments.
Example 1
A naive bayes model-based instrument type recognition method, as shown in fig. 1, includes the following steps:
s1, dividing the music to be identified into a plurality of audio frames, and determining the number of the audio frames. As shown in fig. 2, each piece of music in the original data set is divided into a plurality of audio frames. The hamming window is then applied to the audio frames to prevent frequency leakage, and serves to smooth out the frame-to-frame and eliminate the gibbs effect. In order to save both time domain information and frequency domain information, short-time fourier transform is required to be performed on the audio frames after framing and windowing to obtain a spectrogram.
The process of generating the spectrogram through short-time Fourier transform comprises the following steps:
framing long signals of music, and adding windows; performing Fourier transform on each frame of audio frame; the audio frame at this time is a short-time signal, i.e., a short-time fourier transform. Rotating the spectrogram; the spectrogram amplitude is represented by a gray scale; and stacking the frequency domain graphs obtained by Fourier transformation according to the time dimension to finally obtain the spectrogram. The frequency domain information is obtained by carrying out Fourier transform on each audio frame, the frequency domain information is obtained by rotating a frequency domain diagram formed by the frequency domain information, and the amplitude of the frequency domain diagram is represented by a gray level diagram; the time domain information is obtained by stacking frequency domain maps in a time dimension.
S2, extracting time domain information, frequency domain information and Mel frequency cepstrum coefficients in the audio frame to form a feature vector corresponding to the audio frame.
As shown in fig. 3, based on the MPEG-7 (Multimedia Content Description Interface) standard, instrument timbre is characterised at three levels: the time domain (temporal shape of the tone), the frequency domain (spectrum of the tone waveform) and the inverse frequency domain (cepstrum of the tone waveform); the timbre feature elements at these three levels are extracted and stored for each frame of each piece of music in the original data set.
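As a rough illustration of per-frame descriptors at these three levels, the sketch below computes a few simple time-domain, frequency-domain and cepstral quantities; the specific descriptors (RMS energy, zero-crossing rate, spectral centroid, real cepstrum) are assumptions for illustration and are not the MPEG-7 descriptor set itself:

```python
import numpy as np

def timbre_features(frame, sr=16000):
    # time domain: root-mean-square energy and zero-crossing rate
    rms = np.sqrt(np.mean(frame ** 2))
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
    # frequency domain: spectral centroid of the magnitude spectrum
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-10)
    # inverse frequency (cepstral) domain: real cepstrum of the frame
    cepstrum = np.fft.irfft(np.log(mag + 1e-10))
    return np.hstack([rms, zcr, centroid, cepstrum[:13]])
```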
As shown in fig. 4, the procedure of extracting mel-frequency cepstral coefficients (Mel Frequency Cepstrum Coefficient, MFCC) is as follows:
2.1 Pre-emphasis
The intensity of the data at low frequencies is typically much larger than at high frequencies, which hinders processing, so the low-frequency components of the data are filtered out to make the high-frequency characteristics more prominent.
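A pre-emphasis step of this kind is commonly implemented as a first-order high-pass filter; the sketch below assumes the usual coefficient of 0.97, which is not specified in this passage:

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]: attenuates low-frequency content so that
    # high-frequency characteristics stand out
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])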
2.2 framing
Framing groups N sampling points into one observation unit. Each frame covers 25 ms, and since the sampling rate is 16000 Hz, each frame contains 400 sample points. In addition, to avoid excessive variation between two adjacent frames, adjacent frames overlap; the overlap is set to 15 ms, so a new frame is taken every 10 ms.
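With these values (25 ms frames at 16000 Hz, a new frame every 10 ms), framing can be sketched as below; the function name and implementation details are illustrative assumptions:

```python
import numpy as np

SR = 16000                      # sampling rate
FRAME_LEN = int(0.025 * SR)     # 25 ms -> 400 sample points per frame
HOP = int(0.010 * SR)           # one frame every 10 ms -> 160 samples (15 ms overlap)

def frame_signal(signal):
    n_frames = 1 + max(0, (len(signal) - FRAME_LEN) // HOP)
    return np.stack([signal[i * HOP:i * HOP + FRAME_LEN] for i in range(n_frames)])
```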
2.3 windowing for each frame
Because the signal within a frame is treated as periodic during the transform, abrupt discontinuities at the two end points of the frame would make the transformed spectrum differ greatly from the spectrum of the original signal. Each frame is therefore windowed so that no abrupt changes occur at the end points when the in-frame signal is Fourier transformed.
2.4 zero padding for each frame
Since each frame signal is Fourier transformed and the transform requires a fixed input length, the 400 sample points of each frame are zero-padded to the nearest 512 points.
2.5 Fourier transform of frame signals
A 512-point Fourier transform is performed on each framed and windowed frame to obtain the spectrum of each frame. Taking the absolute value, or its square, of the spectrum gives the power spectrum of the signal.
2.6 Mel filtering
The 40 triangular filters are distributed uniformly on the mel scale, and adjacent filters overlap by 50%. The actual frequency range, from a minimum of 0 Hz to a maximum of 16000/2 = 8000 Hz, is first converted to mel frequencies; the mel-frequency positions of the 40 triangular filters are then calculated, and the mel frequencies are converted back to actual frequencies.
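A standard construction of such a filter bank is sketched below; the mel conversion formula (2595·log10(1 + f/700)) is the commonly used one and is assumed here rather than quoted from the patent:

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters=40, n_fft=512, sr=16000, f_low=0.0, f_high=8000.0):
    # centre frequencies spaced uniformly on the mel scale; adjacent filters overlap 50%
    mel_points = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)
    return fbank
```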
2.7 taking the logarithm
Taking the logarithm of the output of the triangular filter bank yields a result similar to a homomorphic transformation.
2.8 discrete cosine transform (DCT transform)
A DCT is applied to the logarithmic-energy mel spectrum and the first 13 dimensions are taken as the output, giving the mel cepstrum.
2.9 normalization
All mel cepstra are normalised: the mean vector of all cepstral vectors is computed first, and this mean is then subtracted from each cepstral vector to obtain the output mel-frequency cepstral coefficient feature vectors.
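Steps 2.6 to 2.9 can be summarised in one short sketch using scipy's DCT; the helper names and the small constant used to avoid log(0) are assumptions for illustration:

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_from_power(power_frames, fbank, n_ceps=13):
    """power_frames: (n_frames, n_fft//2 + 1); fbank: (40, n_fft//2 + 1)."""
    mel_energies = power_frames @ fbank.T                            # 2.6 mel filtering
    log_mel = np.log(mel_energies + 1e-10)                           # 2.7 take the logarithm
    ceps = dct(log_mel, type=2, axis=1, norm='ortho')[:, :n_ceps]    # 2.8 DCT, keep first 13 dims
    return ceps - ceps.mean(axis=0)                                  # 2.9 subtract the mean vector
```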
S3, inputting the existing audio feature vectors corresponding to a plurality of musical instruments and feature vectors corresponding to all audio frames into a naive Bayesian model, and identifying the musical instruments according to the probability that the musical instruments appear in the music.
As shown in fig. 5, the specific operation procedure of step S3 is as follows:
S3.1, the audio feature vectors corresponding to the set of instruments C = {y_1, y_2, …, y_j, …, y_n} (j ∈ n) and the feature vectors corresponding to the audio frames are input into a pre-trained naive Bayesian model;
S3.2, P(y_1|X_i), P(y_2|X_i), …, P(y_n|X_i) are calculated using the output formula of the naive Bayesian model;
S3.3, the probability that the musical instrument y_j appears in the piece of music X is obtained by the formula.
The output formula of the naive Bayesian model is as follows:
wherein X_i represents a certain frame of the piece of music X, which has z frames in total; y_j represents a certain musical instrument, of which there are n in total.
From the above procedure, the probability that each instrument appears in the piece of music X can be obtained. Since the probability of an instrument that does not actually appear in the piece is not necessarily exactly zero, a threshold must be set for the probability: if the probability that an instrument appears in the piece exceeds the threshold, the instrument is judged to appear in the piece; if it does not exceed the threshold, the instrument is judged not to appear. The threshold needs to be determined according to the specific music or a general standard; the guiding principle is to exclude instruments that do not appear in the piece while not removing secondary instruments that play for a relatively short time, and the threshold can be adjusted when the model is pre-trained.
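As a sketch of how per-frame posteriors might be turned into this piece-level decision (the patent's exact aggregation formula appears only as a figure and is not reproduced above, so a simple mean over the z frames is assumed as a stand-in, and the threshold value of 0.2 is purely illustrative):

```python
import numpy as np

def instrument_presence(frame_posteriors, threshold=0.2):
    """frame_posteriors: (z, n) array, row i holding P(y_1|X_i), ..., P(y_n|X_i)."""
    piece_prob = frame_posteriors.mean(axis=0)   # assumed aggregation over the z frames
    present = piece_prob > threshold             # presence decided against the threshold
    return piece_prob, present
```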
The instruments used in a piece of music include a primary instrument and secondary instruments, and the probability of each instrument appearing in the piece, obtained from the naive Bayesian model, is used to distinguish the primary instrument from the secondary instruments. The instrument with the highest probability of appearing in the piece is the primary instrument, and the other instruments that appear are the secondary instruments. Typically a piece has only one primary instrument, but some pieces are played predominantly by several instruments (two or more) whose probabilities of occurrence are comparable. When the probabilities of several instruments appearing in a piece differ little, the primary and secondary instruments should be judged according to the style of the piece rather than by a blanket rule.
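Under the same assumptions as the sketch above, ranking the instruments judged present by probability picks out the primary instrument (highest probability) and the secondary instruments (the rest):

```python
def rank_instruments(piece_prob, present, names):
    # sort the instruments judged present by descending probability;
    # the first entry is the primary instrument, the rest are secondary
    order = sorted(range(len(names)), key=lambda i: piece_prob[i], reverse=True)
    return [(names[i], float(piece_prob[i])) for i in order if present[i]]
```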
As shown in fig. 6, the pre-training process of the pre-trained naive Bayesian model is as follows: music whose instrument types are known is input into the original naive Bayesian model, the probability that a certain instrument appears in the music is obtained from the output formula of the naive Bayesian model, and it is judged whether the probability exceeds the threshold value; the judgement result is compared with the instruments that actually play the music, and if the results are the same, the naive Bayesian model at that point is taken as the final output model; if the results are different, the output formula of the naive Bayesian model is adjusted until the results are the same.
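A minimal pre-training sketch is given below, assuming continuous frame feature vectors so that scikit-learn's GaussianNB can stand in for the "original naive Bayesian model"; the function and variable names are illustrative, and the adjustment loop of the patent is only indicated by a comment:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def pretrain(X_train, y_train, X_piece, true_instruments, threshold=0.2):
    """X_train: frame feature vectors; y_train: known instrument label per frame.
    X_piece: frames of one piece whose instruments (true_instruments) are known."""
    model = GaussianNB().fit(X_train, y_train)
    frame_post = model.predict_proba(X_piece)          # (z, n) per-frame posteriors
    piece_prob = frame_post.mean(axis=0)               # assumed aggregation over frames
    predicted = {c for c, p in zip(model.classes_, piece_prob) if p > threshold}
    if predicted != set(true_instruments):
        pass  # in the patent, the output formula / threshold is adjusted and re-checked
    return model
```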
Once the naive Bayesian model has been trained through the above steps, the instruments used in the music to be identified can be classified; and since the output is a probability value for each instrument in each piece, the results can be sorted as needed to distinguish the primary instrument of a piece from its secondary instruments.
Example two
Based on the same inventive concept, the embodiment also discloses a musical instrument type recognition system based on a naive bayes model, comprising:
the preprocessing module is used for dividing the music to be identified into a plurality of audio frames;
the feature extraction module is used for extracting time domain information, frequency domain information and mel frequency cepstrum coefficient in the audio frame to form a feature vector corresponding to the audio frame;
and the identification module is used for inputting the audio feature vectors corresponding to the plurality of musical instruments and the feature vectors corresponding to all the audio frames into a naive Bayesian model, and identifying the musical instruments according to the probability that the musical instruments appear in the music.
The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes or substitutions should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (6)

1. A naive bayes model-based instrument type recognition method, comprising the steps of:
s1, dividing music to be identified into a plurality of audio frames;
s2, extracting time domain information, frequency domain information and Mel frequency cepstrum coefficients in the audio frame to form a feature vector corresponding to the audio frame;
s3, inputting the audio feature vectors corresponding to a plurality of existing musical instruments and the feature vectors corresponding to all the audio frames into a naive Bayesian model, and identifying the musical instruments according to the probability that the musical instruments appear in the music;
the specific operation process of the S3 is as follows:
S3.1, inputting the audio feature vectors corresponding to a plurality of musical instruments and the feature vectors corresponding to the audio frames into a pre-trained naive Bayesian model;
S3.2, calculating P(y_1|X_i), P(y_2|X_i), …, P(y_n|X_i) using the output formula of the naive Bayesian model;
S3.3, obtaining the probability that the musical instrument y_j appears in the piece of music X by the formula;
the output formula of the naive Bayes model is as follows:
wherein X_i represents a certain frame of a piece of music X, which has z frames in total; y_j represents a certain musical instrument, of which there are n in total;
judging that the musical instrument appears in the music if the probability that the musical instrument appears in the music exceeds a threshold value, and judging that the musical instrument does not appear in the music if the probability does not exceed the threshold value;
the musical instruments used in the musical composition include a primary musical instrument and secondary musical instruments, and the probability of each of the musical instruments appearing in the musical composition is obtained from the naive Bayesian model to distinguish the primary musical instrument from the secondary musical instruments.
2. The method for identifying musical instrument types based on a naive bayes model according to claim 1, wherein the musical instrument having the highest probability of appearing in the musical composition is a primary musical instrument, and the other musical instruments appearing in the musical composition are secondary musical instruments.
3. The naive bayes model-based instrument class identification method of claim 1, wherein the pre-training process of the naive bayes model is:
inputting a musical instrument with known musical instrument types into an original naive Bayesian model, obtaining the probability of a certain musical instrument in the musical instrument according to an output formula of the naive Bayesian model, judging whether the probability exceeds a threshold value, comparing a judging result with the type of the musical instrument actually played, and if the judging result is the same, inputting the naive Bayesian model into a final output model; and if the results are different, adjusting an output formula of the naive Bayes model until the results are the same.
4. The naive bayes model based instrument class recognition method according to claim 1, wherein the frequency domain information is obtained by fourier transforming each of the audio frames, the inverse frequency domain information is obtained by rotating a frequency domain map constituted by the frequency domain information, and the amplitude of the frequency domain map is represented by a gray scale map; the time domain information is obtained by stacking the frequency domain maps in a time dimension.
5. The naive bayes model based instrument class identification method of claim 1, wherein hamming windows are added to several of said audio frames to prevent frequency leakage.
6. A naive bayes model-based instrument class recognition system, comprising:
the preprocessing module is used for dividing the music to be identified into a plurality of audio frames;
the feature extraction module is used for extracting time domain information, frequency domain information and mel frequency cepstrum coefficient in the audio frame to form a feature vector corresponding to the audio frame;
the recognition module is used for inputting the audio feature vectors corresponding to a plurality of musical instruments and the feature vectors corresponding to all the audio frames into a naive Bayesian model, and recognizing the musical instruments according to the probability that the musical instruments appear in the music;
the specific operation process of the identification module is as follows:
S3.1, inputting the audio feature vectors corresponding to a plurality of musical instruments and the feature vectors corresponding to the audio frames into a pre-trained naive Bayesian model;
S3.2, calculating P(y_1|X_i), P(y_2|X_i), …, P(y_n|X_i) using the output formula of the naive Bayesian model;
S3.3, obtaining the probability that the musical instrument y_j appears in the piece of music X by the formula;
the output formula of the naive Bayes model is as follows:
wherein X_i represents a certain frame of a piece of music X, which has z frames in total; y_j represents a certain musical instrument, of which there are n in total;
judging that the musical instrument appears in the music if the probability that the musical instrument appears in the music exceeds a threshold value, and judging that the musical instrument does not appear in the music if the probability does not exceed the threshold value;
the musical instruments used in the musical composition include a primary musical instrument and secondary musical instruments, and the probability of each of the musical instruments appearing in the musical composition is obtained from the naive Bayesian model to distinguish the primary musical instrument from the secondary musical instruments.
CN202010483915.8A 2020-06-01 2020-06-01 Musical instrument type identification method and system based on naive Bayesian model Active CN111681674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010483915.8A CN111681674B (en) 2020-06-01 2020-06-01 Musical instrument type identification method and system based on naive Bayesian model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010483915.8A CN111681674B (en) 2020-06-01 2020-06-01 Musical instrument type identification method and system based on naive Bayesian model

Publications (2)

Publication Number Publication Date
CN111681674A CN111681674A (en) 2020-09-18
CN111681674B true CN111681674B (en) 2024-03-08

Family

ID=72453206

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010483915.8A Active CN111681674B (en) 2020-06-01 2020-06-01 Musical instrument type identification method and system based on naive Bayesian model

Country Status (1)

Country Link
CN (1) CN111681674B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113421589B (en) * 2021-06-30 2024-03-01 平安科技(深圳)有限公司 Singer identification method, singer identification device, singer identification equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10319948A (en) * 1997-05-15 1998-12-04 Nippon Telegr & Teleph Corp <Ntt> Sound source kind discriminating method of musical instrument included in musical playing
CN101546556A (en) * 2008-03-28 2009-09-30 展讯通信(上海)有限公司 Classification system for identifying audio content
CN103761965A (en) * 2014-01-09 2014-04-30 太原科技大学 Method for classifying musical instrument signals
CN105719661A (en) * 2016-01-29 2016-06-29 西安交通大学 Automatic discrimination method for playing timbre of string instrument
CN106952644A (en) * 2017-02-24 2017-07-14 华南理工大学 A kind of complex audio segmentation clustering method based on bottleneck characteristic
CN108962279A (en) * 2018-07-05 2018-12-07 平安科技(深圳)有限公司 New Method for Instrument Recognition and device, electronic equipment, the storage medium of audio data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7842878B2 (en) * 2007-06-20 2010-11-30 Mixed In Key, Llc System and method for predicting musical keys from an audio source representing a musical composition

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10319948A (en) * 1997-05-15 1998-12-04 Nippon Telegr & Teleph Corp <Ntt> Sound source kind discriminating method of musical instrument included in musical playing
CN101546556A (en) * 2008-03-28 2009-09-30 展讯通信(上海)有限公司 Classification system for identifying audio content
CN103761965A (en) * 2014-01-09 2014-04-30 太原科技大学 Method for classifying musical instrument signals
CN105719661A (en) * 2016-01-29 2016-06-29 西安交通大学 Automatic discrimination method for playing timbre of string instrument
CN106952644A (en) * 2017-02-24 2017-07-14 华南理工大学 A kind of complex audio segmentation clustering method based on bottleneck characteristic
CN108962279A (en) * 2018-07-05 2018-12-07 平安科技(深圳)有限公司 New Method for Instrument Recognition and device, electronic equipment, the storage medium of audio data

Also Published As

Publication number Publication date
CN111681674A (en) 2020-09-18

Similar Documents

Publication Publication Date Title
EP2659482B1 (en) Ranking representative segments in media data
Gerhard Audio signal classification: History and current techniques
US20100332222A1 (en) Intelligent classification method of vocal signal
Rocamora et al. Comparing audio descriptors for singing voice detection in music audio files
Lu Indexing and retrieval of audio: A survey
Lehner et al. Online, loudness-invariant vocal detection in mixed music signals
Zlatintsi et al. Multiscale fractal analysis of musical instrument signals with application to recognition
Hu et al. Separation of singing voice using nonnegative matrix partial co-factorization for singer identification
Fan et al. Singing voice separation and pitch extraction from monaural polyphonic audio music via DNN and adaptive pitch tracking
Park Towards automatic musical instrument timbre recognition
CN116665669A (en) Voice interaction method and system based on artificial intelligence
Yu et al. Sparse cepstral codes and power scale for instrument identification
CN115050387A (en) Multi-dimensional singing playing analysis evaluation method and system in art evaluation
Li et al. A comparative study on physical and perceptual features for deepfake audio detection
Liu et al. Content-based audio classification and retrieval using a fuzzy logic system: towards multimedia search engines
CN111681674B (en) Musical instrument type identification method and system based on naive Bayesian model
Gao et al. Vocal melody extraction via dnn-based pitch estimation and salience-based pitch refinement
Kitahara et al. Instrogram: A new musical instrument recognition technique without using onset detection nor f0 estimation
Barthet et al. Speech/music discrimination in audio podcast using structural segmentation and timbre recognition
CN114678039A (en) Singing evaluation method based on deep learning
Patil et al. Content-based audio classification and retrieval: A novel approach
CN113129923A (en) Multi-dimensional singing playing analysis evaluation method and system in art evaluation
Kos et al. Online speech/music segmentation based on the variance mean of filter bank energy
Liang et al. [Retracted] Extraction of Music Main Melody and Multi‐Pitch Estimation Method Based on Support Vector Machine in Big Data Environment
Hosain et al. Deep-Learning-Based Speech Emotion Recognition Using Synthetic Bone-Conducted Speech

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant