CN114627885A - Small sample data set musical instrument identification method based on ASRT algorithm - Google Patents

Small sample data set musical instrument identification method based on ASRT algorithm

Info

Publication number
CN114627885A
Authority
CN
China
Prior art keywords
musical instrument
audio file
sample
layer
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210182234.7A
Other languages
Chinese (zh)
Inventor
王树龙
刘钰
薛慧敏
赵银峰
马兰
孙承坤
陈树鹏
刘红侠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202210182234.7A
Publication of CN114627885A
Legal status: Pending

Classifications

    • G10L 17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 Classification techniques
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G10L 17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L 17/04 Training, enrolment or model building
    • G10L 17/18 Artificial neural networks; Connectionist approaches
    • G10L 25/18 Speech or voice analysis techniques characterised by the extracted parameters being spectral information of each sub-band
    • G10L 25/51 Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G06F 2218/02 Preprocessing (pattern recognition specially adapted for signal processing)
    • G06F 2218/12 Classification; Matching (pattern recognition specially adapted for signal processing)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Signal Processing (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The invention relates to the field of natural language processing, in particular to a small sample data set musical instrument identification method based on the ASRT algorithm. In its model structure, the invention borrows from VGG, the network configuration with the best performance in image recognition: the model has strong expressive power, can take very long past and future context into account, and is more robust than an RNN. At the output end it can be combined with a CTC scheme to train the whole model end to end, transcribing the sound waveform signal directly into the instrument's waveform, from which the instrument type is finally judged and the predicted instrument type is output.

Description

Small sample data set musical instrument identification method based on ASRT algorithm
Technical Field
The invention relates to the field of natural language processing, in particular to a small sample data set musical instrument identification method based on an ASRT algorithm.
Background
With the development of artificial intelligence, the emergence of convolutional neural networks (CNN) and connectionist temporal classification (CTC), and the rapid progress of deep neural networks, traditional handwritten-character classification applications can no longer meet people's needs, and artificial intelligence is applied ever more widely to natural language processing. Musical instrument recognition has long received comparatively little attention, yet people sometimes need to tell instrument types apart; in a complete piece of music in particular, non-professionals and even professionals can hardly distinguish which instruments are being played. A tool is therefore needed to judge the type of a sound.
ASRT is a deep-learning-based Chinese speech recognition system implemented with TensorFlow.
Disclosure of Invention
To address the problems in the prior art, the invention aims to provide a small sample data set musical instrument identification method based on the ASRT algorithm that can effectively identify the instrument types in audio data.
To achieve this purpose, the invention adopts the following technical scheme.
A small sample data set musical instrument identification method based on an ASRT algorithm comprises the following steps:
step 1, obtaining a sample set;
step 2, preprocessing the samples in the sample set, and training the musical instrument recognition model with the preprocessed samples to obtain a trained musical instrument recognition model;
and step 3, preprocessing the audio file whose instrument type is to be identified, and inputting the preprocessed audio file into the trained musical instrument recognition model to obtain the instrument types contained in the audio file.
Compared with the prior art, the invention has the following beneficial effects: the model structure borrows from VGG, the network configuration with the best performance in image recognition; it has strong expressive power, can take very long past and future context into account, and is more robust than an RNN (recurrent neural network). At the output end it can be combined with a CTC scheme to train the whole model end to end, transcribing the sound waveform signal directly into the instrument's waveform, from which the instrument type is finally judged and the predicted instrument type is output.
Drawings
The invention is described in further detail below with reference to the figures and the specific embodiments.
FIG. 1 is a schematic diagram of the instrument recognition model of the present invention;
FIG. 2 is a schematic structural diagram of the instrument recognition model during training.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to examples, but it will be understood by those skilled in the art that the following examples are only illustrative of the present invention and should not be construed as limiting the scope of the present invention.
A small sample data set musical instrument identification method based on an ASRT algorithm comprises the following steps:
step 1, obtaining a sample set;
specifically, a plurality of audio files are obtained from the existing database, and all the audio files are converted into 1600hz and wav formats; wherein, the audio file is an audio file with only single musical instrument sound or an audio file with a plurality of musical instrument sounds;
using a converted audio file and the types of instruments contained in the audio file as a set of samples; all samples are taken as a sample set.
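The patent discloses no conversion code; the following is a minimal sketch of step 1, assuming the librosa and soundfile libraries, with file paths and the label pairing chosen purely for illustration.

```python
# Hedged sketch of step 1: convert source audio to mono WAV at the sample
# rate stated in the patent (1600 Hz). Library choice, paths, and the label
# pairing are illustrative assumptions, not part of the original disclosure.
import librosa
import soundfile as sf

TARGET_SR = 1600  # sample rate as stated in the patent text

def convert_to_wav(src_path: str, dst_path: str) -> None:
    """Load any supported audio file, resample it, and write it as WAV."""
    audio, _ = librosa.load(src_path, sr=TARGET_SR, mono=True)
    sf.write(dst_path, audio, TARGET_SR)

# One sample = (converted file, instrument labels it contains), e.g.:
# samples = [("piano_001.wav", ["piano"]), ("duet_004.wav", ["violin", "cello"])]
```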
Step 2, preprocessing the samples in the sample set, and training the musical instrument recognition model with the preprocessed samples to obtain a trained musical instrument recognition model.
Substep 2.1, preprocessing the samples in the sample set: first, an audio file in a sample is parsed to obtain the data it contains; the data are then converted into a two-dimensional spectrum image, i.e. a spectrogram, through operations such as framing and windowing. A sketch of this substep follows.
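The patent does not fix the framing and windowing parameters; the sketch below is one plausible reading of substep 2.1, with the frame length, hop size, and Hamming window chosen as assumptions.

```python
# Hedged sketch of substep 2.1: frame the waveform, window each frame, and
# take the magnitude spectrum to form a 2-D spectrogram. Frame/hop lengths
# and the Hamming window are assumptions, not taken from the patent.
import numpy as np

def make_spectrogram(signal: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))  # magnitude spectrum per frame
    return np.log(spec + 1e-8)                  # log scale for numerical stability
```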
Substep 2.2, the spectrogram is used as the network input and fed into the musical instrument recognition model; the instrument type in the sample is used as the network label; the model outputs the instrument type contained in the audio file corresponding to the spectrogram.
specifically, referring to fig. 1, the musical instrument identification model includes a convolutional layer, a pooling layer, a convolutional layer, a recombination matrix format layer, a fully-connected layer, and an Activation function Activation, which are connected in sequence;
wherein the convolution kernel size of the first convolution layer is 1 x 3 x 32, the convolution kernel size of the second convolution layer is 32 x 3 x 32, the convolution kernel size of the third convolution layer is 64 x 3 x 32, the convolution kernel size of the fourth convolution layer is 128 x 3 x 32, the convolution kernel size of the first convolution layer is 256 x 3 x 32, and the convolution kernel size of the first convolution layer is 512 x 3 x 32; after each convolution, activation is performed using the RELU activation function.
Padding is carried out once during each data input, and the size of the matrix is increased by 2 dimensions in the four directions of the upper direction, the lower direction, the left direction and the right direction, and 0 is supplemented; the pooling size pool _ size during pooling is 2, and the padding function used is the 'same' function.
The convolutional network is based on the Keras and tensrflow framework, using this deep convolutional neural network referenced to VGG as the network model.
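As a concrete illustration, here is a minimal Keras sketch of such a VGG-style network. The 3 x 3 kernels, 'same' padding, ReLU activations, and pool_size of 2 follow the text above; the filter counts 32/32/64/128/256/512 are read from the stated kernel sizes, while the input shape, dense-layer width, and number of classes are assumptions.

```python
# Hedged Keras sketch of the model in FIG. 1; Flatten stands in for the
# Reshape (matrix-reformatting) layer. Input shape, dense width, and class
# count are illustrative assumptions, not taken from the patent.
from tensorflow.keras import layers, models

def build_model(input_shape=(200, 200, 1), n_classes=12):
    m = models.Sequential()
    m.add(layers.Input(shape=input_shape))
    for filters in (32, 32, 64, 128, 256, 512):
        m.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        m.add(layers.MaxPooling2D(pool_size=2))
    m.add(layers.Flatten())                # the Reshape / matrix-reformatting step
    m.add(layers.Dense(128, activation="relu"))
    m.add(layers.Dense(n_classes))
    m.add(layers.Activation("softmax"))    # the final Activation layer
    return m
```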
Substep 2.3, the parameters of the instrument recognition model are iteratively updated using the regression loss function, yielding the trained instrument recognition model.
Referring to FIG. 2, during training the instrument recognition model adds Dropout layers between the first and second convolutional layers, after each pooling layer, between the Reshape layer and the first fully-connected layer, and within each fully-connected layer; the Dropout layers randomly disable part of the network, preventing the instrument recognition model from overfitting during training. A training sketch follows.
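A hedged training sketch: the patent names only a "regression loss function", read here as mean squared error; the optimizer, Dropout rate, batch size, and epoch count are assumptions, and the Dropout placement is simplified relative to FIG. 2.

```python
# Hedged sketch of substep 2.3 with FIG. 2's training-time Dropout layers.
# Loss, optimizer, Dropout rate, and training hyperparameters are assumptions.
from tensorflow.keras import layers, models

def build_training_model(input_shape=(200, 200, 1), n_classes=12, rate=0.3):
    m = models.Sequential([layers.Input(shape=input_shape)])
    for filters in (32, 32, 64, 128, 256, 512):
        m.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        m.add(layers.MaxPooling2D(pool_size=2))
        m.add(layers.Dropout(rate))        # Dropout after each pooling layer
    m.add(layers.Flatten())
    m.add(layers.Dropout(rate))            # between the Reshape layer and first dense layer
    m.add(layers.Dense(128, activation="relu"))
    m.add(layers.Dropout(rate))            # within the fully-connected block
    m.add(layers.Dense(n_classes, activation="softmax"))
    return m

model = build_training_model()
model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])
# x_train: spectrograms shaped like the model input; y_train: one-hot labels
# model.fit(x_train, y_train, epochs=50, batch_size=16, validation_split=0.1)
```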
Step 3, preprocessing the audio file whose instrument type is to be identified, and inputting the preprocessed audio file into the trained musical instrument recognition model to obtain the instrument types contained in the audio file.
Specifically, the audio file to be identified is preprocessed as follows: first, it is converted to 1600 Hz, WAV format; the converted file is then parsed to obtain the data it contains; finally, the data are converted into a two-dimensional spectrum image, i.e. a spectrogram, through operations such as framing and windowing. An end-to-end prediction sketch is given below.
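The prediction sketch below reuses the helpers sketched earlier (convert_to_wav, make_spectrogram, TARGET_SR); the label codes follow the IRMAS convention cited in the experiment below, and in practice the spectrogram would need resizing or cropping to the model's fixed input shape.

```python
# Hedged sketch of step 3: convert the file, compute its spectrogram, and
# rank the predicted instrument probabilities. Helper names reuse the
# earlier sketches; the label list is an illustrative assumption.
import librosa
import numpy as np

IRMAS_LABELS = ["cel", "cla", "flu", "gac", "gel", "org",
                "pia", "sax", "tru", "vio", "voi", "other"]

def predict_instruments(path: str, model):
    convert_to_wav(path, "tmp.wav")
    audio, _ = librosa.load("tmp.wav", sr=TARGET_SR, mono=True)
    spec = make_spectrogram(audio)
    spec = spec[np.newaxis, :, :, np.newaxis]   # add batch and channel axes
    probs = model.predict(spec)[0]
    return sorted(zip(IRMAS_LABELS, probs), key=lambda kv: -kv[1])
```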
In addition, HTTP-based server software is provided: using Python's basic HTTP protocol server package, a speech-recognition API over the network HTTP protocol is exposed, and an API server can easily be set up, so that client software can send API requests over the network to perform musical instrument identification.
HTTP API interface:
The project uses the HTTP server package built into Python to implement a basic API server over the HTTP protocol. With this server program a simple API server can be realized directly, and data exchange between the user and the server takes place via POST requests.
The POST parameters are listed in the original publication in a table reproduced only as an image (Figure BDA0003521584040000041); a minimal server sketch follows.
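The sketch below uses only the HTTP server package built into Python, as the text describes; the port, request handling, and JSON response format are assumptions, since the original parameter table is not reproduced here.

```python
# Hedged sketch of the HTTP API server using only Python's built-in
# http.server. Port, request parsing, and response format are assumptions.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class RecognitionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)      # raw POSTed audio or form data
        # ... decode the audio, run the trained model, rank probabilities ...
        result = {"instrument": "voi", "probability": 0.533}  # placeholder
        payload = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 20000), RecognitionHandler).serve_forever()
```

A client could then exercise the API with, for example, curl -X POST --data-binary @test.wav http://localhost:20000/.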
Simulation experiment
The trained instrument identification model is used to identify 100 pieces of audio data obtained from the public IRMAS website, predicting the probability of each instrument type contained in each piece of audio.
Results of the experiment
For one randomly chosen piece of the 100 audio files, the prediction result is: a 0.0115% probability of containing flute, 21.4380% of containing gel (electric guitar), 6.0032% of containing piano, 0.0000767% of containing violin, 0.0000923% of containing trumpet, 0.7711% of containing gac (acoustic guitar), 0.0475% of containing saxophone, 18.4390% of containing organ, 0.0002497% of containing cello, 53.2890% of containing voi (human voice), 0.0000048% of containing clarinet, and 0.00000005% of containing other instruments.
Over the 100 audio files, the accuracy of the top-1 prediction (the single most likely instrument being the actual instrument) is 74%;
over the 100 audio files, the accuracy of the top-5 prediction (the actual instruments appearing among the five most likely) is 92.5%;
over the 100 audio files, the overall prediction accuracy reaches 99%, showing that the musical instrument identification method can accurately identify the instruments in audio.
Although the present invention has been described in detail in this specification with reference to specific embodiments and illustrative embodiments, it will be apparent to those skilled in the art that modifications and improvements can be made thereto based on the present invention. Accordingly, it is intended that all such modifications and alterations be included within the scope of this invention as defined in the appended claims.

Claims (4)

1. A small sample data set musical instrument identification method based on an ASRT algorithm is characterized by comprising the following steps:
step 1, obtaining a sample set;
step 2, preprocessing the samples in the sample set, and training the musical instrument recognition model with the preprocessed samples to obtain a trained musical instrument recognition model;
and step 3, preprocessing the audio file whose instrument type is to be identified, and inputting the preprocessed audio file into the trained musical instrument recognition model to obtain the instrument types contained in the audio file.
2. The ASRT algorithm-based small sample dataset musical instrument recognition method according to claim 1, wherein step 1 specifically comprises: obtaining a plurality of audio files from an existing database and converting all of them to 1600 Hz, WAV format, each audio file containing either the sound of a single musical instrument or the sounds of several musical instruments; using a converted audio file together with the instrument types it contains as one sample; and taking all samples as the sample set.
3. The ASRT algorithm-based small sample dataset musical instrument identification method according to claim 1, characterized in that the substeps of step 2 are as follows:
substep 2.1, preprocessing the samples in the sample set: first parsing an audio file in a sample to obtain the data it contains, then converting the data into a two-dimensional spectrum image, i.e. a spectrogram, through operations such as framing and windowing;
substep 2.2, using the spectrogram as the network input to the musical instrument recognition model and the instrument type in the sample as the network label, the model outputting the instrument type contained in the audio file corresponding to the spectrogram;
and substep 2.3, iteratively updating the parameters of the instrument recognition model with the regression loss function to obtain the trained instrument recognition model.
4. The ASRT algorithm-based small sample data set musical instrument identification method according to claim 1, wherein the musical instrument identification model in step 2 comprises convolutional layers, pooling layers, a Reshape (matrix-reformatting) layer, fully-connected layers, and an Activation layer connected in sequence;
wherein the convolution kernel size of the first convolutional layer is 1 x 3 x 32, of the second 32 x 3 x 32, of the third 64 x 3 x 32, of the fourth 128 x 3 x 32, of the fifth 256 x 3 x 32, and of the sixth 512 x 3 x 32; after each convolution, activation is performed using the ReLU activation function.
CN202210182234.7A 2022-02-25 2022-02-25 Small sample data set musical instrument identification method based on ASRT algorithm Pending CN114627885A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210182234.7A CN114627885A (en) 2022-02-25 2022-02-25 Small sample data set musical instrument identification method based on ASRT algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210182234.7A CN114627885A (en) 2022-02-25 2022-02-25 Small sample data set musical instrument identification method based on ASRT algorithm

Publications (1)

Publication Number Publication Date
CN114627885A true CN114627885A (en) 2022-06-14

Family

ID=81901000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210182234.7A Pending CN114627885A (en) 2022-02-25 2022-02-25 Small sample data set musical instrument identification method based on ASRT algorithm

Country Status (1)

Country Link
CN (1) CN114627885A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024048492A1 (en) * 2022-08-30 2024-03-07 ヤマハ株式会社 Musical instrument identifying method, musical instrument identifying device, and musical instrument identifying program

Similar Documents

Publication Publication Date Title
WO2021232725A1 (en) Voice interaction-based information verification method and apparatus, and device and computer storage medium
CN111933129B (en) Audio processing method, language model training method and device and computer equipment
CN110211565B (en) Dialect identification method and device and computer readable storage medium
CN1169115C (en) Prosodic databases holding fundamental frequency templates for use in speech synthesis
CN112712813B (en) Voice processing method, device, equipment and storage medium
WO2019109787A1 (en) Audio classification method and apparatus, intelligent device, and storage medium
CN113836277A (en) Machine learning system for digital assistant
WO2020006898A1 (en) Method and device for recognizing audio data of instrument, electronic apparatus, and storage medium
US20020173956A1 (en) Method and system for speech recognition using phonetically similar word alternatives
JP7266683B2 (en) Information verification method, apparatus, device, computer storage medium, and computer program based on voice interaction
US11011160B1 (en) Computerized system for transforming recorded speech into a derived expression of intent from the recorded speech
WO2023245389A1 (en) Song generation method, apparatus, electronic device, and storage medium
WO2019137392A1 (en) File classification processing method and apparatus, terminal, server, and storage medium
CN106295717A (en) A kind of western musical instrument sorting technique based on rarefaction representation and machine learning
Mahanta et al. Deep neural network for musical instrument recognition using MFCCs
CN114627885A (en) Small sample data set musical instrument identification method based on ASRT algorithm
US20220093089A1 (en) Model constructing method for audio recognition
JP2010009446A (en) System, method and program for retrieving voice file
JP7376896B2 (en) Learning device, learning method, learning program, generation device, generation method, and generation program
Qiu et al. A Voice Cloning Method Based on the Improved HiFi‐GAN Model
CN115273826A (en) Singing voice recognition model training method, singing voice recognition method and related device
CN115662465A (en) Voice recognition algorithm and device suitable for national stringed instruments
CN115359775A (en) End-to-end tone and emotion migration Chinese voice cloning method
CN113470612A (en) Music data generation method, device, equipment and storage medium
JP7376895B2 (en) Learning device, learning method, learning program, generation device, generation method, and generation program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination