CN115910099A - Musical instrument automatic identification method based on depth probability map neural network - Google Patents

Musical instrument automatic identification method based on depth probability map neural network

Info

Publication number
CN115910099A
CN115910099A (application CN202211391028.3A)
Authority
CN
China
Prior art keywords
formula
label
musical instrument
frequency spectrum
crbm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211391028.3A
Other languages
Chinese (zh)
Other versions
CN115910099B (en)
Inventor
张健
侯海薇
杜威
丁世飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Mining and Technology CUMT filed Critical China University of Mining and Technology CUMT
Priority to CN202211391028.3A
Publication of CN115910099A
Application granted
Publication of CN115910099B
Legal status: Active
Anticipated expiration

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Analysis (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

A musical instrument automatic identification method based on a depth probability map neural network divides audio data into time slices: a piece of audio is split into N fixed-length time slices, and the label corresponding to each time slice is recorded. The audio data of each time slice is converted into a Mel frequency spectrum image, which is then regularized. A convolutional neural network extracts features from the regularized Mel frequency spectrum image; the extracted features are mapped from two dimensions into a one-dimensional form and combined with the labels to form time-slice Mel frequency spectrum image feature-label pairs. An improved conditional restricted Boltzmann machine (CRBM) model is constructed and its conditional probability distributions are obtained; the improved CRBM model is trained using Gibbs sampling. An objective function is constructed and used to train the improved CRBM model, yielding an automatic musical instrument recognition model that outputs the predicted instrument labels. The method can effectively solve the problem that instruments in polyphonic music are difficult to identify accurately with the prior art.

Description

Musical instrument automatic identification method based on depth probability map neural network
Technical Field
The invention belongs to the technical field of sound recognition, and particularly relates to an automatic musical instrument recognition method based on a depth probability map neural network.
Background
With the development of artificial intelligence, intelligent music analysis methods based on machine learning have gradually become a core technology and research direction in fields such as melody recognition and music style detection, and the automatic recognition of musical instruments in polyphonic music is a key step toward intelligent music analysis. In current research and applications, the mainstream approach combines machine learning methods to realize intelligent music analysis from the perspective of signal processing. However, for the task of identifying the instruments in polyphonic music, the harmonics of the instruments cause complex signal superposition in both the time domain and the frequency domain, which degrades the accuracy of instrument identification; this phenomenon has long been a difficult point in polyphonic instrument recognition.
From the point of view of signal processing, instrument recognition can be regarded as a branch of audio data processing. Unlike other audio data, however, vocal and instrumental music have distinctive harmonic energy distributions, and some researchers extract features from polyphonic music from an acoustic (and psychoacoustic) point of view to obtain a feature representation, for example attack time, spectral centroid, energy envelope, and Mel-frequency cepstral coefficients, and then design a corresponding traditional machine learning or deep learning method to carry out the instrument identification task. Although such methods extract acoustic (psychoacoustic) features manually, their recognition of polyphonic music is still unsatisfactory, and it is especially difficult to distinguish different instruments within the same instrument family. The reason is that harmonic distributions differ between instrument families but are similar among instruments within the same family, and the harmonic nature of instruments means that polyphonic music contains complex signal superposition in the time and frequency domains, so that the signal at a given frequency may simultaneously be the fundamental frequency of one instrument and a fundamental or harmonic component of other instruments. Therefore, although the result of instrument identification is unrelated to the pitch (fundamental frequency) being played, machine-learning-based instrument identification is strongly affected by the fundamental frequency and its harmonics, and the harmonic distributions of different instruments are difficult to distinguish effectively. With the development of deep learning, researchers have exploited the excellent performance of deep neural networks on image processing tasks: polyphonic music in the time domain is represented in the frequency domain by constructing a spectrogram or Mel spectrogram, and a deep neural network then completes the instrument identification task through supervised learning.
Deep neural networks based on frequency-domain images have brought great progress to the instrument recognition task, but some problems remain unsolved. First, such algorithms extract the harmonic features of an instrument from single-instrument audio and then apply them to a multi-instrument polyphonic test set for multi-label classification, so the classification result depends on whether the model can fully learn the instrument features on the training set. Furthermore, because the test data are polyphonic music, the superposition of multiple instruments on the frequency spectrum can greatly interfere with the identification result, while neural-network-based instrument identification methods generally analyze the problem only from the perspective of spectral image features or of label-specific (generic) features, thereby converting instrument identification into image identification; they rarely consider the similarity between instrument categories as a key feature for distinguishing instrument harmonic distributions, which leads to low identification accuracy.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an automatic musical instrument identification method based on a depth probability map neural network, which has simple steps and high identification precision and can effectively solve the difficulty of accurately identifying instruments in polyphonic music in the prior art.
In order to achieve the above object, the present invention provides a musical instrument automatic identification method based on a depth probability map neural network, comprising the following steps:
step one: preprocessing data;
s11, time slice division is carried out on the audio data, a section of audio is divided into N time slices with fixed length, and meanwhile, a label corresponding to each time slice is recorded;
s12, converting the obtained audio data of each time slice into a Mel frequency spectrum image, then carrying out regularization processing on the image, and regularizing the value range of pixel points to obtain a regularized Mel frequency spectrum image;
step two: extracting the characteristics of the data;
performing feature extraction on the regularized Mel frequency spectrum image by using a convolutional neural network to obtain the image features of the Mel frequency spectrum, mapping the extracted features from two dimensions into a one-dimensional form, and then combining them with the labels to form time-slice Mel frequency spectrum image feature-label pairs;
step three: modeling the tag correlation characteristics by using an improved CRBM model;
s31, constructing an improved CRBM model according to an energy function provided by a formula (1);
[Formula (1): energy function of the improved CRBM; rendered as an image in the original and not reproduced here]
where x represents the input spectral feature data, y represents the label corresponding to x (which is also the expected output in the prediction phase), h represents the desired feature representation, s is an introduced additional variable, and W_y, β, μ, b are training parameters;
s32, obtaining conditional joint probability distribution according to the formula (1), wherein the conditional joint probability distribution is shown in a formula (2);
[Formula (2): conditional joint probability distribution derived from formula (1), with partition function Z; rendered as an image in the original and not reproduced here]
wherein Z represents a partition function;
s33, obtaining formulas (3) and (4) based on the formula (2), and obtaining the conditional probability P (h | x, y) of h according to the formula (3); obtaining the activation probability of each component of h according to formula (4); obtaining a conditional probability formula (5) of y based on x and h according to the formulas (2) and (3);
P(h | x, y) = ∏_i P(h_i | x, y)    (3);
[Formula (4): activation probability of each component of h; rendered as an image in the original and not reproduced here]
[Formula (5): conditional probability of y given x and h; rendered as an image in the original and not reproduced here]
s34, obtaining formulas (6) and (7) based on the formulas (2) and (4), and obtaining the activation probability of each component of S according to the formula (6); obtaining an activation probability of each component of y according to formula (7);
[Formula (6): activation probability of each component of s; rendered as an image in the original and not reproduced here]
[Formula (7): activation probability of each component of y; rendered as an image in the original and not reproduced here]
wherein N represents a Gaussian distribution;
step four: training the improved CRBM model, with classification as the objective, by utilizing the correlation features, and outputting the predicted musical instrument labels;
s41, constructing an objective function according to the formula (8), and training an improved CRBM model by using the objective function;
Loss = log(p(y|x)) + Rank-Loss(y|x) + σ||y||_l1    (8);
in the formula, log(p(y|x)) represents the likelihood function, Rank-Loss(y|x) represents the ranking loss function, ||y||_l1 represents the l1 regularization term, and σ is a hyperparameter;
s42, based on the formulas (3), (4), (6) and (7), using Gibbs sampling to obtain a gradient formula (9) of a likelihood function in the formula (8), calculating the gradient of the formula (8) according to the formula (9), training an improved CRBM model according to the gradient of the formula (8), and then obtaining a feature expression h containing label correlation and a conditional probability of a label through training;
[Formula (9): gradient of the likelihood function; rendered as an image in the original and not reproduced here]
where E represents the mathematical expectation, θ represents the set of parameters, and both mathematical expectations are obtained using Gibbs sampling according to equations (4), (6), and (7);
s43, after the improved CRBM training is completed, in the face of input of musical instrument categories needing to be predicted, calculating a label y which enables log (p (y | x) to be maximum according to a formula (10), and accordingly outputting a predicted musical instrument label according to the musical instrument automatic recognition model based on the improved CRBM to obtain a musical instrument automatic recognition model;
[Formula (10): selection of the label y that maximizes log(p(y|x)); rendered as an image in the original and not reproduced here]
preferably, in S12 of step one, the audio data is converted into a mel-frequency spectrum image using an open source tool.
Further, in step two, a neural network ResNet101 pre-trained on the ImageNet dataset is used to extract features of the Mel frequency spectrum image. In this way, the Mel frequency spectrum mapping combined with ResNet101 image feature extraction effectively extracts features usable for the instrument identification task, and, together with the polyphonic instrument identification method based on label correlation features, further improves identification precision.
In the data processing part, the polyphonic music audio is cut into time slices and each slice is converted into a Mel frequency spectrum. In the model construction part, a convolutional neural network extracts the image features of the Mel frequency spectrum of the polyphonic music; an improved conditional restricted Boltzmann machine (CRBM) model then models the correlation between these image features and the instrument labels corresponding to the time-slice Mel frequency spectrum; finally, the improved CRBM model is trained, and the predicted instrument labels are output based on the obtained correlation features. Since a piece of polyphonic music usually has multiple instruments playing simultaneously, polyphonic music identification is naturally a multi-label identification problem. Meanwhile, existing neural-network-based instrument identification methods have not considered the correlation between instrument categories as a key feature for distinguishing instrument harmonic distributions when identifying polyphonic music. To this end, the invention first learns, from the two perspectives of spectral feature extraction and instrument label-specific (generic) features, the correlation between the harmonic features of an instrument and the different instrument labels and generic features in polyphonic music (where the generic, i.e. label-specific, features are intended to extract, from the perspective of the label, the features in the data directly related to that label). The generic features are expressed in the improved CRBM in the form of the conditional probability P(h | x, y); the variable pair (x, y) introduced in the improved CRBM model is used to model the correlation between the image features and the labels, as well as the correlation between the labels; and P(y | x) and the activation probability P(y_i | x, h, s) model the association of harmonic features with different instrument labels, so as to fully extract features usable for the instrument recognition task. The method therefore draws on the idea of multi-label learning and models the correlation among the multiple instrument labels in polyphonic music from the two angles of image features and generic features, so that the instruments in polyphonic music can be identified according to the label correlation and the correlation between labels and spectral image features: the label correlation models which instruments are likely to produce harmonic overlap, and the conditional probability between the spectral image features and the corresponding labels associates this possible overlap with the corresponding spectral images, so that the instruments in polyphonic music can be effectively distinguished and identified. The method overcomes the defect that prior identification methods analyze the problem only from the aspect of spectral image features or only from the aspect of generic features, thereby greatly improving identification precision and solving the problem of instrument identification in polyphonic music.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is an improved CRBM model constructed in the present invention;
fig. 3 is a structural diagram of an automatic recognition model of musical instruments in the present invention.
Detailed Description
The present invention is further described below.
As shown in fig. 1 and 3, the present invention provides an automatic musical instrument identification method based on a depth probability map neural network, comprising the following steps:
step one: preprocessing data;
s11, time slice division is carried out on the audio data, a section of audio is divided into N time slices with fixed length, and meanwhile, a label corresponding to each time slice is recorded;
s12, converting the obtained audio data of each time slice into a Mel frequency spectrum image, then carrying out regularization processing on the image, and regularizing the value range of pixel points to obtain a regularized Mel frequency spectrum image;
step two: extracting the characteristics of the data;
performing feature extraction on the regularized Mel frequency spectrum image by using a convolutional neural network to obtain the image features of the Mel frequency spectrum, mapping the extracted features from two dimensions into a one-dimensional form, and then combining them with the labels to form time-slice Mel frequency spectrum image feature-label pairs;
step three: modeling the tag correlation features using an improved CRBM model, as shown in FIG. 2;
s31, constructing an improved CRBM model according to an energy function provided by a formula (1);
[Formula (1): energy function of the improved CRBM; rendered as an image in the original and not reproduced here]
where x represents the input spectral feature data, y represents the label corresponding to x (which is also the expected output in the prediction phase), h represents the desired feature representation, s is an introduced additional variable, and W_y, β, μ, b are training parameters;
s32, obtaining conditional joint probability distribution according to the formula (1), wherein the conditional joint probability distribution is shown in the formula (2);
[Formula (2): conditional joint probability distribution derived from formula (1), with partition function Z; rendered as an image in the original and not reproduced here]
wherein Z represents a partition function;
s33, obtaining formulas (3) and (4) based on the formula (2), and obtaining the conditional probability P (h | x, y) of h according to the formula (3); obtaining the activation probability of each component of h according to formula (4); obtaining a conditional probability formula (5) of y based on x and h according to the formulas (2) and (3);
P(h | x, y) = ∏_i P(h_i | x, y)    (3);
[Formula (4): activation probability of each component of h; rendered as an image in the original and not reproduced here]
[Formula (5): conditional probability of y given x and h; rendered as an image in the original and not reproduced here]
However, the covariance matrix of formula (5) is non-diagonal; although formula (5) can represent the correlation between the components of the label y through this covariance matrix, it is difficult to sample from directly and is therefore not suitable for training the improved CRBM model. In order to train the model, the invention further decomposes the conditional probability of y in S34.
S34, obtaining formulas (6) and (7) based on formulas (2) and (4), and obtaining the activation probability of each component of s according to formula (6); obtaining the activation probability of each component of y according to formula (7);
[Formula (6): activation probability of each component of s; rendered as an image in the original and not reproduced here]
[Formula (7): activation probability of each component of y; rendered as an image in the original and not reproduced here]
wherein N represents a Gaussian distribution; thus, given s, the components of y are conditionally independent, and the computation can be carried out through Gibbs sampling.
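Because the equation images for formulas (4), (6) and (7) are not reproduced above, the sketch below only illustrates the block Gibbs scheme the text describes: h, s and y are sampled alternately from their component-wise conditionals, which become independent given s. The specific distributional forms (Bernoulli h, Gaussian s and y) and the parameter names U and V are illustrative assumptions, not the patent's actual formulas.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gibbs_sweep(x, y, h, s, params):
    """One block-Gibbs sweep over (h, s, y) given the spectral features x.
    The conditionals used here are illustrative stand-ins for formulas (4),
    (6) and (7); only the sampling structure mirrors the text."""
    W, U, V, beta, mu, b = params          # U, V are hypothetical extra weight matrices

    # h | x, y: component-wise activation probabilities (role of formula (4))
    p_h = sigmoid(x @ W + y @ U + b)
    h = (rng.random(p_h.shape) < p_h).astype(float)

    # s | h: Gaussian additional variable (role of formula (6));
    # s is given the same dimensionality as y in this sketch
    s = rng.normal(loc=h @ V + mu)

    # y | x, h, s: components are conditionally independent given s (role of formula (7))
    y = rng.normal(loc=x @ beta + s)
    return h, s, y
```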
Step four: training the improved CRBM model, with classification as the objective, by utilizing the correlation features, and outputting the predicted musical instrument labels;
s41, constructing an objective function according to the formula (8), and training an improved CRBM model by using the objective function;
Loss = log(p(y|x)) + Rank-Loss(y|x) + σ||y||_l1    (8);
in the formula, log(p(y|x)) represents the likelihood function, Rank-Loss(y|x) represents the ranking loss function, ||y||_l1 represents the l1 regularization term, and σ is a hyperparameter;
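The likelihood term of formula (8) depends on the improved CRBM itself, but the other two terms can be written down directly. The sketch below (PyTorch assumed) shows one common pairwise form of a multi-label ranking loss together with the σ||y||_l1 term; the patent does not spell out its exact ranking-loss variant, and the likelihood term is passed in as a precomputed value.

```python
import torch

def rank_loss(scores, targets, margin=1.0):
    """Pairwise multi-label ranking loss: penalizes every (negative, positive) label
    pair in which the negative label is not scored below the positive one by at
    least the margin. One standard form; the patent's exact variant is not given."""
    pos, neg = targets > 0.5, targets <= 0.5
    if pos.sum() == 0 or neg.sum() == 0:
        return scores.new_zeros(())
    diff = scores[neg].unsqueeze(1) - scores[pos].unsqueeze(0)
    return torch.clamp(margin + diff, min=0.0).mean()

def objective(log_p_y_given_x, scores, targets, sigma=0.01):
    """Formula (8) as written: likelihood term + ranking loss + sigma * l1 term,
    with the l1 penalty applied to the predicted label scores."""
    return log_p_y_given_x + rank_loss(scores, targets) + sigma * scores.abs().sum()
```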
s42, based on the formulas (3), (4), (6) and (7), obtaining a gradient formula (9) of the likelihood function in the formula (8) by using Gibbs sampling,
[Formula (9): gradient of the likelihood function; rendered as an image in the original and not reproduced here]
where E represents the mathematical expectation and θ represents the parameter set; both mathematical expectations can be obtained using Gibbs sampling according to formulas (4), (6) and (7). Thus, the gradient of formula (8) can be calculated according to formula (9), and the improved CRBM model can be trained according to the gradient of formula (8); training yields a feature representation h containing the label correlation, together with the conditional probability of the labels;
s43, after the improved CRBM training is completed, in the face of input of musical instrument categories needing to be predicted, calculating a label y which enables log (p (y | x) to be maximum according to a formula (10);
[Formula (10): selection of the label y that maximizes log(p(y|x)); rendered as an image in the original and not reproduced here]
Calculating the label y that maximizes log(p(y|x)) is achieved by computing the gradient of formula (10) with respect to y; the trained improved CRBM thereby serves as the automatic musical instrument recognition model, which outputs the predicted instrument labels.
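A schematic of this prediction step is shown below, assuming the trained model exposes a differentiable surrogate log_p_y_given_x for log(p(y|x)) (not defined here) and that the continuous y is thresholded at 0.5 to obtain the final multi-label output; both assumptions are illustrative.

```python
import torch

def predict_labels(log_p_y_given_x, x, n_labels, steps=100, lr=0.1):
    """Gradient-based search for the label vector y that maximizes log p(y|x),
    following S43; log_p_y_given_x is a placeholder for the trained improved
    CRBM's log-conditional."""
    y = torch.zeros(n_labels, requires_grad=True)
    opt = torch.optim.SGD([y], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -log_p_y_given_x(y, x)   # ascend log p(y|x) by descending its negative
        loss.backward()
        opt.step()
    return (y.detach() > 0.5).int()     # thresholded multi-label prediction (assumed cutoff)
```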
Preferably, in S12 of step one, the audio data is converted into a mel-frequency spectrum image using an open source tool.
In order to more fully extract features usable for the musical instrument recognition task, in step two the neural network ResNet101 pre-trained on the ImageNet dataset is used to extract the features of the Mel spectrum image. In this way, the Mel frequency spectrum mapping combined with ResNet101 image feature extraction effectively extracts features usable for the instrument identification task, and, together with the polyphonic instrument identification method based on label correlation features, further improves identification precision.
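A minimal sketch of this feature-extraction step, assuming torchvision's ImageNet-pretrained ResNet101 with the final classification layer removed; replicating the single-channel spectrogram to three channels and the batch layout are illustrative choices.

```python
import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained ResNet101 with the final fully connected layer replaced by
# an identity, so the network outputs a 2048-dimensional feature vector per image.
backbone = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()
backbone.eval()

def feature_label_pairs(mel_batch, labels):
    """mel_batch: tensor of shape (N, 1, H, W) holding regularized Mel spectrograms.
    Returns time-slice (feature, label) pairs with features already in one-dimensional form."""
    x = mel_batch.repeat(1, 3, 1, 1)      # replicate to the 3 channels expected by ResNet
    with torch.no_grad():
        feats = backbone(x)               # (N, 2048) one-dimensional feature vectors
    return list(zip(feats, labels))
```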
In the data processing part, the polyphonic music audio is cut into time slices and each slice is converted into a Mel frequency spectrum. In the model construction part, a convolutional neural network extracts the image features of the Mel frequency spectrum of the polyphonic music; an improved conditional restricted Boltzmann machine (CRBM) model then models the correlation between these image features and the instrument labels corresponding to the time-slice Mel frequency spectrum; finally, the improved CRBM model is trained, and the predicted instrument labels are output based on the obtained correlation features. Since a piece of polyphonic music usually has multiple instruments playing simultaneously, polyphonic music identification is naturally a multi-label identification problem. Meanwhile, existing neural-network-based instrument identification methods do not consider the correlation between instrument categories as a key feature for distinguishing instrument harmonic distributions when identifying polyphonic music. To this end, the invention first learns, from the two perspectives of spectral feature extraction and instrument label-specific (generic) features, the correlation between the harmonic features of an instrument and the different instrument labels and generic features in polyphonic music (where the generic, i.e. label-specific, features are intended to extract, from the perspective of the label, the features in the data directly related to that label). The generic features are expressed in the improved CRBM in the form of the conditional probability P(h | x, y); the variable pair (x, y) introduced in the improved CRBM model is used to model the correlation between the image features and the labels, as well as the correlation between the labels; and P(y | x) and the activation probability P(y_i | x, h, s) model the association of harmonic features with different instrument labels, so as to fully extract features usable for the instrument recognition task. The method therefore not only draws on the idea of multi-label learning but also models the correlation among the multiple instrument labels in polyphonic music from the two aspects of image features and generic features, so that the instruments in polyphonic music can be identified according to the label correlation and the correlation between labels and spectral image features: the label correlation models which instruments in polyphonic music are likely to produce harmonic overlap, and the conditional probability between the spectral image features and the corresponding labels associates this possible overlap with the corresponding spectral images, so that the instruments in polyphonic music can be effectively distinguished and identified. The method overcomes the defect that prior identification methods analyze the problem only from the aspect of spectral image features or only from the aspect of generic features, thereby greatly improving identification precision and solving the problem of instrument identification in polyphonic music.

Claims (3)

1. An automatic musical instrument identification method based on a depth probability map neural network is characterized by comprising the following steps:
step one: preprocessing data;
s11, time slice division is carried out on the audio data, a section of audio is divided into N time slices with fixed length, and meanwhile, a label corresponding to each time slice is recorded;
s12, converting the obtained audio data of each time slice into a Mel frequency spectrum image, then carrying out regularization processing on the image, and regularizing the value range of pixel points to obtain a regularized Mel frequency spectrum image;
step two: extracting the characteristics of the data;
performing feature extraction on the regularized Mel frequency spectrum image by using a convolutional neural network to obtain the image features of the Mel frequency spectrum, mapping the extracted features from two dimensions into a one-dimensional form, and then combining them with the labels to form time-slice Mel frequency spectrum image feature-label pairs;
step three: modeling the tag correlation characteristics by using an improved CRBM model;
s31, constructing an improved CRBM model according to an energy function provided by the formula (1);
[Formula (1): energy function of the improved CRBM; rendered as an image in the original and not reproduced here]
where x represents the input spectral feature data, y represents the label corresponding to x (which is also the desired output in the prediction phase), h represents the desired feature representation, s is an introduced additional variable, and W_y, β, μ, b are training parameters;
s32, obtaining conditional joint probability distribution according to the formula (1), wherein the conditional joint probability distribution is shown in a formula (2);
[Formula (2): conditional joint probability distribution derived from formula (1), with partition function Z; rendered as an image in the original and not reproduced here]
wherein Z represents a partition function;
s33, obtaining formulas (3) and (4) based on the formula (2), and obtaining the conditional probability P (h | x, y) of h according to the formula (3); obtaining the activation probability of each component of h according to formula (4); obtaining a conditional probability formula (5) of y based on x and h according to the formulas (2) and (3);
P(h | x, y) = ∏_i P(h_i | x, y)    (3);
[Formula (4): activation probability of each component of h; rendered as an image in the original and not reproduced here]
[Formula (5): conditional probability of y given x and h; rendered as an image in the original and not reproduced here]
s34, obtaining formulas (6) and (7) based on the formulas (2) and (4), and obtaining the activation probability of each component of S according to the formula (6); obtaining an activation probability of each component of y according to formula (7);
[Formula (6): activation probability of each component of s; rendered as an image in the original and not reproduced here]
[Formula (7): activation probability of each component of y; rendered as an image in the original and not reproduced here]
wherein N represents a Gaussian distribution;
step four: training the improved CRBM model, with classification as the objective, by utilizing the correlation features, and outputting the predicted musical instrument labels;
s41, constructing an objective function according to the formula (8), and training an improved CRBM model by using the objective function;
Loss = log(p(y|x)) + Rank-Loss(y|x) + σ||y||_l1    (8);
in the formula, log(p(y|x)) represents the likelihood function, Rank-Loss(y|x) represents the ranking loss function, ||y||_l1 represents the l1 regularization term, and σ is a hyperparameter;
s42, based on the formulas (3), (4), (6) and (7), a gradient formula (9) of a likelihood function in the formula (8) is obtained by using Gibbs sampling, the gradient of the formula (8) is calculated according to the formula (9), an improved CRBM model is trained according to the gradient of the formula (8), and then the feature expression h containing the label correlation and the conditional probability of the label are obtained through training;
[Formula (9): gradient of the likelihood function; rendered as an image in the original and not reproduced here]
where E represents the mathematical expectation, θ represents the set of parameters, and both mathematical expectations are obtained using Gibbs sampling according to equations (4), (6), and (7);
s43, after the improved CRBM training is completed, in the face of the input of musical instrument categories needing to be predicted, calculating a label y which enables log (p (y | x) to be maximum according to a formula (10), and therefore outputting a predicted musical instrument label according to an automatic musical instrument recognition model based on the improved CRBM to obtain an automatic musical instrument recognition model;
[Formula (10): selection of the label y that maximizes log(p(y|x)); rendered as an image in the original and not reproduced here]
2. the method for automatically identifying musical instruments based on the deep probability map neural network as claimed in claim 1, wherein in step one, S12, an open source tool is used to convert the audio data into mel-frequency spectrum images.
3. The method for automatically identifying musical instruments based on the deep probability map neural network as claimed in claim 1 or 2, wherein in the second step, a neural network ResNet101 pre-trained on ImageNet data set is used to extract the features of the Mel frequency spectrum image.
CN202211391028.3A 2022-11-08 2022-11-08 Automatic musical instrument identification method based on depth probability map neural network Active CN115910099B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211391028.3A CN115910099B (en) 2022-11-08 2022-11-08 Automatic musical instrument identification method based on depth probability map neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211391028.3A CN115910099B (en) 2022-11-08 2022-11-08 Automatic musical instrument identification method based on depth probability map neural network

Publications (2)

Publication Number Publication Date
CN115910099A true CN115910099A (en) 2023-04-04
CN115910099B CN115910099B (en) 2023-08-04

Family

ID=86492715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211391028.3A Active CN115910099B (en) 2022-11-08 2022-11-08 Automatic musical instrument identification method based on depth probability map neural network

Country Status (1)

Country Link
CN (1) CN115910099B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140079297A1 (en) * 2012-09-17 2014-03-20 Saied Tadayon Application of Z-Webs and Z-factors to Analytics, Search Engine, Learning, Recognition, Natural Language, and Other Utilities
CN106328121A (en) * 2016-08-30 2017-01-11 南京理工大学 Chinese traditional musical instrument classification method based on depth confidence network
US20180150728A1 (en) * 2016-11-28 2018-05-31 D-Wave Systems Inc. Machine learning systems and methods for training with noisy labels
CN106920544A (en) * 2017-03-17 2017-07-04 深圳市唯特视科技有限公司 A kind of audio recognition method based on deep neural network features training
CN109918535A (en) * 2019-01-18 2019-06-21 华南理工大学 Music automatic marking method based on label depth analysis
CN110909820A (en) * 2019-12-02 2020-03-24 齐鲁工业大学 Image classification method and system based on self-supervision learning

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
GRAHAM W. TAYLOR, GEOFFREY HINTON: "Factored Conditional Restricted Boltzmann Machines for Modeling Motion Style", PROCEEDINGS OF THE 26TH ANNUAL INTERNATIONAL CONFERENCE ON MACHINE LEARNING, vol. 26, no. 06 *
XIN LI: "Conditional Restricted Boltzmann Machines for Multi-label Learning with incomplete Labels", PROCEEDINGS OF MACHINE LEARNING RESEARCH, vol. 38 *
ZHOU CHANG: "Research on Musical Instrument Classification Methods Based on Deep Learning", China Master's Theses Full-text Database, vol. 2019, no. 01 *
ZHANG JIAN: "Research on Deep Neural Network Algorithms Based on RBM", China Doctoral Dissertations Full-text Database (Electronic Journal), Information Science and Technology, vol. 2021, no. 01 *
ZHANG JIAN: "Research on Deep Generative Networks Based on Real-valued RBM", Journal of Software, vol. 32, no. 12 *
WANG FANG: "Research on Recognition and Classification of Music Genres and Traditional Chinese Instruments Based on Deep Learning", China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology, vol. 2017, no. 7 *
HU ZHEN: "Composer Classification Based on Deep Learning", Journal of Computer Research and Development, vol. 2014, no. 51 *

Also Published As

Publication number Publication date
CN115910099B (en) 2023-08-04

Similar Documents

Publication Publication Date Title
CN105023573B (en) It is detected using speech syllable/vowel/phone boundary of auditory attention clue
CN107680582A (en) Acoustic training model method, audio recognition method, device, equipment and medium
Lindgren et al. Speech recognition using reconstructed phase space features
CN111724770B (en) Audio keyword identification method for generating confrontation network based on deep convolution
CN112259105A (en) Training method of voiceprint recognition model, storage medium and computer equipment
CN112750442B (en) Crested mill population ecological system monitoring system with wavelet transformation and method thereof
CN111128236B (en) Main musical instrument identification method based on auxiliary classification deep neural network
CN112289326B (en) Noise removal method using bird identification integrated management system with noise removal function
Han et al. Sparse feature learning for instrument identification: Effects of sampling and pooling methods
CN113111786A (en) Underwater target identification method based on small sample training image convolutional network
CN110415730B (en) Music analysis data set construction method and pitch and duration extraction method based on music analysis data set construction method
Mousavi et al. Persian classical music instrument recognition (PCMIR) using a novel Persian music database
Permana et al. Implementation of constant-Q transform (CQT) and mel spectrogram to converting bird’s sound
CN116312484B (en) Cross-language domain invariant acoustic feature extraction method and system
Pikrakis et al. Unsupervised singing voice detection using dictionary learning
CN112052880A (en) Underwater sound target identification method based on weight updating support vector machine
JP4219539B2 (en) Acoustic classification device
CN115910099B (en) Automatic musical instrument identification method based on depth probability map neural network
CN112735442B (en) Wetland ecology monitoring system with audio separation voiceprint recognition function and audio separation method thereof
Pawar et al. Automatic tonic (shruti) identification system for indian classical music
CN112687280B (en) Biodiversity monitoring system with frequency spectrum-time space interface
CN114678039A (en) Singing evaluation method based on deep learning
Kohlsdorf et al. Feature Learning and Automatic Segmentation for Dolphin Communication Analysis.
Guerrero-Turrubiates et al. Guitar chords classification using uncertainty measurements of frequency bins
Dodia et al. Identification of raga by machine learning with chromagram

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant