CN114627885A - Small sample data set musical instrument identification method based on ASRT algorithm - Google Patents
Small sample data set musical instrument identification method based on ASRT algorithm
- Publication number
- CN114627885A (application CN202210182234.7A)
- Authority
- CN
- China
- Prior art keywords
- musical instrument
- audio file
- sample
- layer
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/18—Artificial neural networks; Connectionist approaches
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/02—Preprocessing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/12—Classification; Matching
Abstract
The invention relates to the field of natural language processing, and in particular to a small sample data set musical instrument identification method based on the ASRT algorithm. In its model structure the invention draws on VGG, the network configuration with the best results in image recognition: it has strong expressive capability, can take very long past and future context into account, and is more robust than an RNN. At the output end it can be combined seamlessly with a CTC scheme to train the whole model end to end, so that the sound waveform signal is transcribed directly into the musical instrument's waveform, from which the instrument is finally judged and the predicted musical instrument type is output.
Description
Technical Field
The invention relates to the field of natural language processing, in particular to a small sample data set musical instrument identification method based on an ASRT algorithm.
Background
With the development of artificial intelligence — the emergence of the convolutional neural network (CNN) and connectionist temporal classification (CTC) methods and the rapid progress of deep neural networks — traditional hand-crafted approaches can no longer meet people's needs, and artificial intelligence is applied ever more widely to natural language processing. Musical instrument recognition has always received comparatively little attention, yet people sometimes need to distinguish instrument types: in a complete piece of music in particular, non-professionals and even professionals can hardly tell which instruments are being used. A tool is therefore needed to judge the type of sound.
ASRT is a deep-learning-based Chinese speech recognition system implemented with TensorFlow.
Disclosure of Invention
Aiming at the problems in the prior art, the invention aims to provide a small sample data set musical instrument identification method based on an ASRT algorithm, which can effectively identify the musical instrument type in audio data.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme.
A small sample data set musical instrument identification method based on an ASRT algorithm comprises the following steps:
step 1, obtaining a sample set;
step 2, preprocessing the samples in the sample set, and training a musical instrument recognition model with the preprocessed samples to obtain a trained musical instrument recognition model;
step 3, preprocessing the audio file whose musical instrument type is to be identified, and inputting the preprocessed file into the trained musical instrument recognition model to obtain the type of musical instrument contained in the audio file.
Compared with the prior art, the invention has the following beneficial effects: the model structure draws on VGG, the network configuration with the best results in image recognition; it has strong expressive capability, can take very long past and future context into account, and is more robust than an RNN (recurrent neural network). At the output end it can be combined seamlessly with a CTC scheme to train the whole model end to end, so that the sound waveform signal is transcribed directly into the musical instrument's waveform, from which the instrument is finally judged and the predicted musical instrument type is output.
Drawings
The invention is described in further detail below with reference to the figures and the specific embodiments.
FIG. 1 is a schematic diagram of an instrument recognition model according to the present invention;
fig. 2 is a schematic structural diagram of the instrument recognition model in the present invention during training.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to examples, but it will be understood by those skilled in the art that the following examples are only illustrative of the present invention and should not be construed as limiting the scope of the present invention.
A small sample data set musical instrument identification method based on an ASRT algorithm comprises the following steps:
step 1, obtaining a sample set;
specifically, a plurality of audio files are obtained from an existing database, and all of them are converted to 1600 Hz, WAV format; each audio file contains either a single musical instrument or several musical instruments;
a converted audio file together with the types of instrument it contains forms one sample; all samples together form the sample set.
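The WAV conversion in step 1 can be sketched with Python's standard-library wave module; the helper names are assumptions for illustration, and resampling to the target rate is assumed to happen upstream (e.g. with an external tool):

```python
import wave

def write_mono_wav(path, samples, sample_rate=1600):
    """Write 16-bit mono PCM samples (a list of ints) to a WAV file.

    The 1600 Hz rate mirrors the rate quoted in the text; actual
    resampling from the source rate is assumed to be done elsewhere.
    """
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit PCM
        wf.setframerate(sample_rate)
        frames = b"".join(
            int(s).to_bytes(2, "little", signed=True) for s in samples
        )
        wf.writeframes(frames)

def read_wav_params(path):
    """Return (channels, sample width, rate, frame count) of a WAV file."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels(), wf.getsampwidth(), wf.getframerate(), wf.getnframes()
```

Reading the parameters back after conversion is a convenient way to verify that every file in the sample set really has the same format.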
Step 2, preprocessing the samples in the sample set, and training the musical instrument recognition model with the preprocessed samples to obtain a trained musical instrument recognition model;
substep 2.1, preprocessing the samples: first, an audio file in a sample is parsed to obtain the data it contains; the data are then converted, through framing, windowing, and similar operations, into a two-dimensional frequency-spectrum image, i.e. a spectrogram;
substep 2.2, the spectrogram is used as the network input to the musical instrument recognition model, and the instrument types in the sample are used as the network labels; the model outputs the musical instrument type contained in the audio file corresponding to the spectrogram;
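The framing-and-windowing pipeline of substep 2.1 can be sketched as follows; the frame length, hop size, and Hamming window are typical choices and are assumptions here, since the text does not fix them:

```python
import numpy as np

def spectrogram(signal, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames, apply a Hamming
    window to each frame, and take the magnitude of its FFT, giving
    a 2-D time-frequency image (the spectrogram)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Keep only the non-redundant half of the spectrum.
    return np.abs(np.fft.rfft(frames, axis=1))

spec = spectrogram(np.sin(np.linspace(0.0, 100.0, 4000)))
```

The resulting matrix has one row per frame and one column per frequency bin, which is the two-dimensional image fed to the network.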
specifically, referring to fig. 1, the musical instrument recognition model comprises, connected in sequence, a convolutional layer, a pooling layer, a convolutional layer, a reshape (matrix-recombination) layer, a fully-connected layer, and an Activation layer;
wherein the convolution kernel size of the first convolutional layer is 1 × 3 × 32, of the second 32 × 3 × 32, of the third 64 × 3 × 32, of the fourth 128 × 3 × 32, of the fifth 256 × 3 × 32, and of the sixth 512 × 3 × 32; after each convolution, the ReLU activation function is applied.
Each time data are input, padding is performed once: the matrix is enlarged by 2 in each of the four directions (up, down, left, right) and filled with zeros; the pooling size pool_size is 2, and the padding mode used is 'same'.
The convolutional network is built on the Keras and TensorFlow frameworks, using this VGG-inspired deep convolutional neural network as the network model.
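With 'same' padding for convolutions and pooling of size 2, the spatial size of the feature map halves at each pooling layer while the channel count grows. The shape bookkeeping can be traced in plain Python; the channel progression, the pooling placement, and the input size below are assumed readings of the kernel sizes listed above, not values fixed by the text:

```python
def trace_shapes(h, w, channel_plan=(32, 32, 64, 128, 256, 512), pool_every=2):
    """Walk a VGG-style stack: convolutions with 'same' padding keep
    the spatial size; after every `pool_every` convolutions a 2x2
    pooling halves the height and width (integer division)."""
    shapes = []
    for i, out_ch in enumerate(channel_plan, start=1):
        shapes.append((h, w, out_ch))   # conv output, 'same' padding
        if i % pool_every == 0:
            h, w = h // 2, w // 2       # 2x2 pooling
            shapes.append((h, w, out_ch))
    return shapes

# Example: an assumed 200 x 201 spectrogram input.
shapes = trace_shapes(200, 201)
```

The final feature map is then flattened by the reshape layer before the fully-connected layers.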
Substep 2.3, the parameters of the instrument recognition model are iteratively updated with the regression loss function to obtain the trained instrument recognition model.
Referring to fig. 2, during training the instrument recognition model adds Dropout layers between the first and second convolutional layers, after each pooling layer, between the Reshape layer and the first fully-connected layer, and at each fully-connected layer; a Dropout layer randomly disables part of the network and prevents the instrument recognition model from overfitting during training.
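Dropout's role — randomly zeroing a fraction of activations during training only — can be illustrated in plain Python; the rate of 0.5 is an assumed example, since the patent does not state one:

```python
import random

def dropout(values, rate=0.5, training=True, rng=random.Random(0)):
    """Inverted dropout: during training, zero each value with
    probability `rate` and scale the survivors by 1/(1-rate) so
    the expected activation is unchanged; at inference time the
    input passes through untouched."""
    if not training:
        return list(values)
    scale = 1.0 / (1.0 - rate)
    return [v * scale if rng.random() >= rate else 0.0 for v in values]
```

Because the surviving values are rescaled, the layer can simply be switched off at inference with no further correction, which is how framework Dropout layers behave.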
Step 3, preprocessing the audio file whose musical instrument type is to be identified, and inputting the preprocessed file into the trained musical instrument recognition model to obtain the type of musical instrument contained in the audio file.
Specifically, the audio file whose instrument type is to be identified is preprocessed as follows: it is first converted to 1600 Hz, WAV format; the converted file is then parsed to obtain the data it contains; finally, the data are converted, through framing, windowing, and similar operations, into a two-dimensional frequency-spectrum image, i.e. a spectrogram.
In addition, HTTP-based server software is provided: using Python's basic HTTP-protocol server package, a speech-recognition API over the network HTTP protocol is exposed, so an API server can easily be set up and client software can send API requests over the network to perform instrument identification.
API interface based on HTTP protocol:
the project uses an http server packet built in Python to realize a basic API server based on an http protocol. By using the server program, a simple API server can be directly realized, and data interaction between a user and the server is carried out in a POST mode.
The following table is a list of POST parameters:
simulation experiment
The trained instrument recognition model is used to identify 100 pieces of audio data obtained from the external IRMAS website, predicting the probability of each instrument type contained in each piece of audio data.
Results of the experiment
For one of the 100 pieces of audio data chosen at random, the prediction is: a 0.0115% probability of containing flute, 21.4380% of gel (electric guitar), 6.0032% of piano, 0.0000767% of violin, 0.0000923% of tru (trumpet), 0.7711% of gac (acoustic guitar), 0.0475% of saxophone, 18.4390% of organ, 0.0002497% of cello, 53.2890% of voi (human voice), 0.0000048% of clarinet, and 0.00000005% of other musical instruments.
For the 100 pieces of audio data, the accuracy of predicting the single most likely instrument (top-1) is 74%;
for the 100 pieces, the accuracy of the true instruments appearing among the five most likely instruments (top-5) is 92.5%;
over the 100 pieces, the overall prediction accuracy reaches 99%, showing that the musical instrument identification method can accurately identify the instruments in the audio.
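The top-1 and top-5 figures above correspond to the usual top-k accuracy metric, which can be computed as follows; the tiny example data are invented purely to exercise the function:

```python
def top_k_accuracy(predictions, labels, k=1):
    """predictions: one dict per sample, mapping instrument name to
    predicted probability; labels: the true instrument per sample.
    A sample counts as correct if its true label is among the k
    instruments with the highest predicted probability."""
    correct = 0
    for probs, label in zip(predictions, labels):
        ranked = sorted(probs, key=probs.get, reverse=True)[:k]
        correct += label in ranked
    return correct / len(labels)

# Hypothetical two-sample example using IRMAS-style short labels.
preds = [
    {"gel": 0.2, "voi": 0.5, "org": 0.3},
    {"pia": 0.9, "cel": 0.1},
]
labels = ["org", "pia"]
```

With k=1 only the second sample is correct (0.5); with k=2 both are (1.0).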
Although the present invention has been described in detail in this specification with reference to specific embodiments and illustrative embodiments, it will be apparent to those skilled in the art that modifications and improvements can be made thereto based on the present invention. Accordingly, it is intended that all such modifications and alterations be included within the scope of this invention as defined in the appended claims.
Claims (4)
1. A small sample data set musical instrument identification method based on an ASRT algorithm is characterized by comprising the following steps:
step 1, obtaining a sample set;
step 2, preprocessing the samples in the sample set, and training a musical instrument recognition model with the preprocessed samples to obtain a trained musical instrument recognition model;
step 3, preprocessing the audio file whose musical instrument type is to be identified, and inputting the preprocessed file into the trained musical instrument recognition model to obtain the type of musical instrument contained in the audio file.
2. The ASRT-algorithm-based small sample data set musical instrument identification method according to claim 1, wherein step 1 specifically comprises: obtaining a plurality of audio files from an existing database and converting all of them to 1600 Hz, WAV format, each audio file containing either a single musical instrument or several musical instruments; a converted audio file together with the types of instrument it contains forms one sample; all samples together form the sample set.
3. The ASRT algorithm-based small sample dataset musical instrument identification method according to claim 1, characterised in that the substep of step 2 is as follows:
substep 2.1, preprocessing the samples: first, an audio file in a sample is parsed to obtain the data it contains; the data are then converted, through framing, windowing, and similar operations, into a two-dimensional frequency-spectrum image, i.e. a spectrogram;
substep 2.2, the spectrogram is used as the network input to the musical instrument recognition model, and the instrument types in the sample are used as the network labels; the model outputs the musical instrument type contained in the audio file corresponding to the spectrogram;
substep 2.3, iteratively updating the parameters of the instrument recognition model with the regression loss function to obtain the trained instrument recognition model.
4. The ASRT-algorithm-based small sample data set musical instrument identification method according to claim 1, wherein the musical instrument recognition model in step 2 comprises, connected in sequence, a convolutional layer, a pooling layer, a convolutional layer, a reshape (matrix-recombination) layer, a fully-connected layer, and an Activation layer;
wherein the convolution kernel size of the first convolutional layer is 1 × 3 × 32, of the second 32 × 3 × 32, of the third 64 × 3 × 32, of the fourth 128 × 3 × 32, of the fifth 256 × 3 × 32, and of the sixth 512 × 3 × 32; after each convolution, the ReLU activation function is applied.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210182234.7A CN114627885A (en) | 2022-02-25 | 2022-02-25 | Small sample data set musical instrument identification method based on ASRT algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114627885A true CN114627885A (en) | 2022-06-14 |
Family
ID=81901000
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210182234.7A Pending CN114627885A (en) | 2022-02-25 | 2022-02-25 | Small sample data set musical instrument identification method based on ASRT algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114627885A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024048492A1 (en) * | 2022-08-30 | 2024-03-07 | ヤマハ株式会社 | Musical instrument identifying method, musical instrument identifying device, and musical instrument identifying program |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021232725A1 (en) | Voice interaction-based information verification method and apparatus, and device and computer storage medium | |
CN111933129B (en) | Audio processing method, language model training method and device and computer equipment | |
CN110211565B (en) | Dialect identification method and device and computer readable storage medium | |
CN1169115C (en) | Prosodic databases holding fundamental frequency templates for use in speech synthesis | |
CN112712813B (en) | Voice processing method, device, equipment and storage medium | |
WO2019109787A1 (en) | Audio classification method and apparatus, intelligent device, and storage medium | |
CN113836277A (en) | Machine learning system for digital assistant | |
WO2020006898A1 (en) | Method and device for recognizing audio data of instrument, electronic apparatus, and storage medium | |
US20020173956A1 (en) | Method and system for speech recognition using phonetically similar word alternatives | |
JP7266683B2 (en) | Information verification method, apparatus, device, computer storage medium, and computer program based on voice interaction | |
US11011160B1 (en) | Computerized system for transforming recorded speech into a derived expression of intent from the recorded speech | |
WO2023245389A1 (en) | Song generation method, apparatus, electronic device, and storage medium | |
WO2019137392A1 (en) | File classification processing method and apparatus, terminal, server, and storage medium | |
CN106295717A (en) | A kind of western musical instrument sorting technique based on rarefaction representation and machine learning | |
Mahanta et al. | Deep neural network for musical instrument recognition using MFCCs | |
CN114627885A (en) | Small sample data set musical instrument identification method based on ASRT algorithm | |
US20220093089A1 (en) | Model constructing method for audio recognition | |
JP2010009446A (en) | System, method and program for retrieving voice file | |
JP7376896B2 (en) | Learning device, learning method, learning program, generation device, generation method, and generation program | |
Qiu et al. | A Voice Cloning Method Based on the Improved HiFi‐GAN Model | |
CN115273826A (en) | Singing voice recognition model training method, singing voice recognition method and related device | |
CN115662465A (en) | Voice recognition algorithm and device suitable for national stringed instruments | |
CN115359775A (en) | End-to-end tone and emotion migration Chinese voice cloning method | |
CN113470612A (en) | Music data generation method, device, equipment and storage medium | |
JP7376895B2 (en) | Learning device, learning method, learning program, generation device, generation method, and generation program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||