LU103103B1 - Method for electronic music classification model construction based on machine learning and deep learning - Google Patents

Method for electronic music classification model construction based on machine learning and deep learning

Info

Publication number
LU103103B1
LU103103B1
Authority
LU
Luxembourg
Prior art keywords
music
electronic music
classification
electronic
classification model
Prior art date
Application number
LU103103A
Other languages
German (de)
Inventor
Yaping Tang
Original Assignee
Univ Hunan Humanities Sci & Tech
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Hunan Humanities Sci & Tech filed Critical Univ Hunan Humanities Sci & Tech
Priority to LU103103A priority Critical patent/LU103103B1/en
Application granted granted Critical
Publication of LU103103B1 publication Critical patent/LU103103B1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0499 Feedforward networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0008 Associated control or indicating means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/036 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal of musical genre, i.e. analysing the style of musical pieces, usually for selection, filtering or classification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/075 Musical metadata derived from musical analysis or for use in electrophonic musical instruments
    • G10H2240/081 Genre classification, i.e. descriptive metadata for classification or selection of musical pieces according to style
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/311 Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a music genre classification model that takes a spectrogram as input, providing a new approach to audio classification and recognition. The model is used to perform classification simulation experiments on various electronic music signals, and it reduces the time required for constructing an electronic music classifier.

Description

METHOD FOR ELECTRONIC MUSIC CLASSIFICATION MODEL CONSTRUCTION BASED ON MACHINE LEARNING AND DEEP LEARNING
FIELD OF TECHNOLOGY
[0001] The present invention relates to the field of artificial intelligence, more particularly, to a method for electronic music classification model construction based on machine learning and deep learning.
BACKGROUND
[0002] With the development of information technology and storage technology, digital music has become increasingly popular, and major music companies have shifted their product focus to digital albums; physical albums such as tapes and CDs are increasingly rare. Different music has different styles, accompaniment instruments and other components, and music with different hierarchical structures and characteristics can be grouped into different genres. Because most people like listening to music, there are many kinds of electronic music, and everyone likes different types. If the types of electronic music signals are classified and identified in advance, listeners can choose the electronic music they want to listen to from the signal labels, which can greatly improve the management of electronic music; the classification and identification of electronic music signals has therefore become an important research direction in the field of artificial intelligence. Electronic music is music made using electronic musical instruments and related technologies; the instruments exchange music data through corresponding digital interfaces, synthesizers, sequencers and computers. With the development of computer technology, more detailed and in-depth research has been conducted on computer audio-visual information processing, and artificial intelligence technology can enable computers to understand music. Deep learning can be applied to create feature recognition modules, perform adaptive feature fusion on electronic music, and adjust the fusion with adaptive mechanisms. The fused feature factors are taken up by a neural network (NN), a distribution structure is introduced for multi-layer perceptual classification, and the special frequency effect of electronic music is used to construct an electronic music classification model. A single electronic music feature provides limited information, making it difficult to accurately describe the specific content of electronic music and to classify it correctly.
Electronic music has many features, such as short-term energy features, time-domain features, and frequency-domain features, through which the detailed content of electronic music can be described. Subsequently, artificial intelligence methods for electronic music signal identification appeared, such as linear discriminant analysis, artificial neural networks and support vector machines, which obtained better identification results than manual methods. However, in practical application these methods have shortcomings: the accuracy of electronic music signal identification by linear discriminant analysis is low, because it assumes a fixed linear relationship between feature vectors and electronic music signal types and therefore cannot fully reflect electronic music signals.
Overall, applying traditional machine learning methods to music genre classification requires manual feature design and relies on professional knowledge and experience in audio signal analysis; the steps are cumbersome, and there are bottlenecks in improving accuracy. Deep learning methods, such as the NNs popular in recent years, can provide new ideas for music genre classification modeling. In this invention, the classification and recognition of music genres is taken as the research direction: one-dimensional audio files are processed by short-time Fourier transform, Mel transform and constant-Q transform respectively to generate spectrograms and related data. Using a convolutional neural network, acoustic features such as rhythm, pitch and chord are automatically learned and extracted from the images, and a music genre classification model is constructed. To verify the effectiveness of the design, a simulation experiment is carried out in a simulated use environment. The experimental results show that the designed electronic music classification model can classify by feature fusion, and the classification result is very accurate.
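As an illustration only (not part of the patent text), the three time-frequency transforms named above can be sketched with the librosa library; the file path and transform parameters below are assumptions:

```python
import numpy as np
import librosa

# Load an audio file as a mono waveform (the path is a placeholder).
y, sr = librosa.load("track.wav", sr=22050)

# Short-time Fourier transform -> linear-frequency spectrogram.
stft = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))

# Mel spectrogram -> perceptually scaled frequency bands.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)

# Constant-Q transform -> log-frequency representation suited to pitch and chords.
cqt = np.abs(librosa.cqt(y, sr=sr))

# Log-compress so a CNN sees decibel-scaled inputs.
stft_db = librosa.amplitude_to_db(stft, ref=np.max)
mel_db = librosa.power_to_db(mel, ref=np.max)
cqt_db = librosa.amplitude_to_db(cqt, ref=np.max)
```

Any of the three decibel-scaled arrays can then serve as the image-like spectrogram input to the convolutional network.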
SUMMARY
[0003] In order to address such a technical problem in the prior art, one aspect of the invention provides a method for electronic music classification model construction based on machine learning and deep learning, the method comprising:
[0004] using an interface to transmit the received audio signal to an audio processing module, and processing the audio signal through analog-digital conversion and signal amplification to form music data;
[0005] constructing a music classification model with a spectrogram as input, based on learning and the structural characteristics of a NN;
[0006] obtaining dynamic parameters of the electronic music classification model by using the steganographic analysis algorithm of weight distribution to model the music data;
[0007] classifying modeled music data using the constructed music classification model.
[0008] According to an embodiment of the invention, a complementary processing is carried out according to the size of the differential gradient in electronic music.
[0009] According to an embodiment of the invention, adaptive docking is used for complementary processing when the characteristics of electronic audio are not obvious.
[0010] According to an embodiment of the invention, a NN multilayer perceptron (MLP) is used to divide the classification process into three layers: an import layer, a classification layer, and an output layer.
[0011] According to an embodiment of the invention, after various types of original ecological electronic music data are collected, the collected electronic music data is denoised.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] Fig. 1 shows a music classification flowchart provided by the embodiment of the present invention;
[0013] Fig. 2 shows a system hardware structure diagram;
[0014] Fig. 3 shows a classification framework of electronic music model;
[0015] Fig. 4 shows classification accuracy corresponding to different time periods.
DETAILED DESCRIPTION
[0016] The invention will now be described in greater detail with reference to the figures.
[0017] With respect to FIG. 1, a method for electronic music classification model construction based on machine learning and deep learning is described, the method comprising the following steps:
[0018] Step S101, using an interface to transmit the received audio signal to an audio processing module, and processing the audio signal through analog-digital conversion and signal amplification to form music data;
[0019] Step S102, constructing a music classification model with a spectrogram as input, based on learning and the structural characteristics of a NN;
[0020] Step S103, obtaining dynamic parameters of the electronic music classification model by using the steganographic analysis algorithm of weight distribution to model the music data;
[0021] Step S104, classifying modeled music data using the constructed music classification model.
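Purely as an orientation aid (not part of the patent text), steps S101-S104 can be summarized as the following Python skeleton; every function name here is hypothetical:

```python
# Hypothetical skeleton of steps S101-S104; names are illustrative only.

def acquire_music_data(raw_audio):
    """S101: A/D conversion and signal amplification -> music data."""
    raise NotImplementedError

def build_classification_model():
    """S102: spectrogram-input classification model built on a NN."""
    raise NotImplementedError

def model_music_data(model, music_data):
    """S103: obtain dynamic parameters via weight-distribution analysis."""
    raise NotImplementedError

def classify(model, modeled_data):
    """S104: classify the modeled music data."""
    raise NotImplementedError

def run_pipeline(raw_audio):
    music_data = acquire_music_data(raw_audio)       # S101
    model = build_classification_model()             # S102
    modeled = model_music_data(model, music_data)    # S103
    return classify(model, modeled)                  # S104
```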
[0022] Furthermore, steps S101-S104 may contain the following content:
[0023] The adaptive multi-feature fusion process of electronic music is in effect a screening process. When the background rhythm frequency of electronic music changes drastically, the features of electronic sound effects change in a continuous form; the feature results obtained by a single multi-feature fusion are too uniform and cannot be used for classification. Currently, electronic music is typically modeled and analyzed using a single feature, and the amount of information extracted from a single feature is limited, making it difficult to fully describe the type of electronic music. Therefore, this invention extracts multiple features for electronic music classification. Firstly, electronic music signals are collected. As electronic music signals are continuous, it is necessary to perform frame division processing on them in order to better extract electronic music classification features (a sketch of this step follows below). The hardware part mainly consists of an audio acquisition module, an audio processing module, a storage module, and a power module. The overall hardware structure is shown in Figure 2.
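A minimal sketch of the frame-division step mentioned above; the frame length and hop size are assumptions, not values given in the patent:

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Split a continuous 1-D signal into overlapping frames; the overlap
    (frame_len - hop samples) keeps transitions between frames smooth."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

frames = frame_signal(np.random.randn(22050))  # one second at 22.05 kHz
print(frames.shape)                            # -> (42, 1024)
```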
[0024] The interface is used to transmit the received signal to the audio processing module; the audio signal is processed through analog-digital conversion, signal amplification and other processes, stored in the memory module after processing, and at the same time transmitted to the computer through the audio device interface, where the electronic music is identified by the software part.
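A toy model of the analog-digital conversion and amplification just described; the gain and bit depth below are assumed values, not taken from the patent:

```python
import numpy as np

def simulate_adc(analog, gain=2.0, n_bits=16):
    """Amplify, clip to the converter's input range, and quantize."""
    amplified = np.clip(analog * gain, -1.0, 1.0)
    full_scale = 2 ** (n_bits - 1) - 1
    return np.round(amplified * full_scale).astype(np.int16)

t = np.linspace(0.0, 1.0, 8000, endpoint=False)
pcm = simulate_adc(0.3 * np.sin(2 * np.pi * 440 * t))  # a 440 Hz test tone
```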
[0025] If there is a differential gradient in electronic music, complementary processing is carried out according to the size of the differential gradient; if the characteristics of the electronic audio are not very obvious, adaptive docking is used for complementary processing. The complementary features are recorded each time, so that the features of electronic music can be expressed and integrated from different aspects. Treble and bass have an obvious influence on the process of multi-feature fusion, and the fusion features presented by different audio in the treble range also differ. Collecting and extracting the features of electronic music effects at different levels ensures the adequacy of the fusion method. Considering the actual functional requirements of the system, the TLV320AIC23 chip introduced by TI, also known as the AIC23 chip, is selected. It is a chip that supports MIC and LINE-IN input modes and has programmable gain adjustment for audio input and output. Program-controlled gain lets users better control and adjust audio signals and modify parameters online. The interaction between users and the program gain function is realized by a coding switch: users can adjust the switch to the appropriate gear according to their own needs, so that the gain ratio they select is more accurate and meets more precise control requirements.
[0026] In the process of multi-layer perceptual feature classification, a NN multilayer perceptron (MLP) is used to divide the classification process into three layers: the import layer, the classification layer (one or more layers), and the output layer. The network classification framework of NNs includes neurons, which can take up feature fusion factors and solve classification problems that are not linearly separable and cannot be solved by single-layer perceptual classification. It can not only classify using multiple features, but also reflect multiple classification paths. The frequency cepstrum coefficient is inspired by human auditory characteristics: the human ear's sensation is influenced by actual changes in loudness and amplitude. After the amplitude spectrum is logarithmized, the coefficients can be divided into several frequency bands according to frequency. As the obtained frequency vector has highly recognizable characteristics and complex correlations, in order to remove the correlation between loudness and amplitude, the Fourier-transformed audio characteristics are further processed. The extracted music features are then processed, and the basic framework of the electronic music classification model is shown in Figure 3.
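A sketch of the cepstral-coefficient extraction just described, assuming the librosa and scipy libraries are available; the 60-band choice mirrors the 60-dimensional feature vector mentioned later in the description and is otherwise an assumption:

```python
import numpy as np
import librosa
from scipy.fftpack import dct

y, sr = librosa.load("track.wav", sr=22050)  # placeholder file path

# Amplitude spectrum grouped into frequency bands (mel filter bank).
mel_power = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=60)

# Logarithmize, mirroring the ear's sensitivity to loudness.
log_mel = np.log(mel_power + 1e-10)

# A DCT removes the correlation between the band energies,
# yielding 60 cepstral coefficients per frame.
cepstra = dct(log_mel, axis=0, norm="ortho")
print(cepstra.shape)  # (60, n_frames)
```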
[0027] As can be seen from Figure 3, when constructing the electronic music classification model, various types of original ecological (raw) electronic music data should first be collected, and the collected data denoised. The denoised electronic music is then processed by framing and endpoint detection, yielding effective electronic music signals.
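The patent does not name a specific denoising algorithm; one plausible choice, shown here purely as a sketch, is spectral subtraction with the noise floor estimated from a leading silent segment:

```python
import numpy as np
import librosa

def denoise(y, sr, noise_seconds=0.5):
    """Crude spectral subtraction using the first noise_seconds as noise."""
    S = librosa.stft(y)                       # default n_fft=2048, hop=512
    mag, phase = np.abs(S), np.angle(S)
    n_noise = max(1, int(noise_seconds * sr / 512))
    noise_floor = mag[:, :n_noise].mean(axis=1, keepdims=True)
    cleaned = np.maximum(mag - noise_floor, 0.0)
    return librosa.istft(cleaned * np.exp(1j * phase))
```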
[0028] Using this method can reduce computational complexity and improve classification speed. The specific factors carry out multiple classifications in the classification layer and are allocated to different neurons according to their characteristics. Assuming that each neuron can accept only one characteristic factor, during the call process of the output layer the weights of the neurons are called, but what is output are the characteristic factors carried by the neurons. The NN multilayer perceptron is used for classification of the same feature factor. When the number of imported neurons equals the number of output neurons, the output feature factors will be separated by the MLP. Each neuron in the classification layer is an independent individual, but the connection paths differ, which can effectively eliminate bidirectional feature-factor classification.
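A minimal three-layer MLP of the kind described (import layer, one classification layer, output layer), sketched with scikit-learn on placeholder data; all sizes are assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.random.randn(1000, 60)        # placeholder fused feature factors
y = np.random.randint(0, 10, 1000)   # placeholder genre labels (10 types)

# Import layer = 60 inputs, one hidden classification layer, output layer = 10.
mlp = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500)
mlp.fit(X, y)
print(mlp.predict(X[:5]))
```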
[0029] Electronic music signals can also be identified with machine learning algorithms. At present, the NN learns according to the principle of empirical risk minimization, while the support vector machine is based on the principle of structural risk minimization; the learning effect of the NN is obviously lower than that of the support vector machine. The support vector machine is described in detail below.
[0030] The least squares support vector machine (LS-SVM) is a recently popular machine learning algorithm with a faster learning speed and better learning performance than a NN. It was therefore chosen to establish the electronic music signal identification model. Let the training sample set consisting of electronic music signal identification features and signal types be $\{(x_i, y_i)\},\ i = 1, 2, \dots, n,\ x_i \in R^m,\ y_i \in R$, where $x_i$ and $y_i$ are the identification features and types of the electronic music signals, respectively; the decision function is shown in equation (1).
[0031] $f(x) = \omega^{T}\varphi(x) + b$ (1)
[0032] Equation (1) is transformed and solved, as shown in equation (2):
[0033] $\min \; \frac{1}{2}\|\omega\|^{2} + \frac{\gamma}{2}\sum_{i=1}^{n}\xi_i^{2}$ (2)
[0034] s.t.
[0035] $y_i = \omega^{T}\varphi(x_i) + b + \xi_i$ (3)
[0036] where $\gamma$ represents the regularization parameter of the least squares support vector machine. Because the calculation process of formula (3) is very complicated, its equivalent Lagrangian form is established, as shown in formula (4):
[0037] $L(\omega, b, \xi, a) = \frac{1}{2}\|\omega\|^{2} + \frac{\gamma}{2}\sum_{i=1}^{n}\xi_i^{2} - \sum_{i=1}^{n} a_i\big(\omega^{T}\varphi(x_i) + b + \xi_i - y_i\big)$ (4)
[0038] According to optimization theory, the radial basis kernel function is adopted, as shown in equation (5):
[0039] $K(x, x_i) = \exp\!\left(-\frac{\|x - x_i\|^{2}}{2\sigma^{2}}\right)$ (5)
[0040] In the formula, $\sigma$ is the radial basis width.
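Training an LS-SVM under equations (1)-(5) reduces to solving a single linear system in the dual variables, which is why the text credits it with a fast learning speed. A minimal numpy sketch follows; the kernel width and regularization values are assumptions:

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Radial basis kernel of equation (5)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM dual system [[0, 1^T], [1, K + I/gamma]] [b; a] = [0; y]."""
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]  # dual coefficients a, bias b

def lssvm_predict(X_train, a, b, X_new, sigma=1.0):
    """f(x) = sum_i a_i K(x, x_i) + b, the kernel form of equation (1)."""
    return rbf_kernel(X_new, X_train, sigma) @ a + b
```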
[0041] After digital filtering, a movable Hamming window is used to perform windowing and framing, so that the audio characteristics remain stable. The framing adopts overlapping between adjacent frames, with the step between successive frames being the frame shift, which makes the transition between frames smooth and maintains continuity. After the smooth transition between frames is ensured and the truncation effect of the audio is reduced, the endpoint detection step is entered. Endpoint detection is the key to electronic music signal identification and has a great influence on subsequent feature extraction: accurately finding the starting and ending points of a single tone in noisy audio suppresses the noise interference of silent segments and reduces the amount of data, the computation, and the processing time. After the classifier is determined, the tones in the training set are input into it. Each time a 60-dimensional feature vector of tone data is input, the probability of each tone computed by the hidden layer and the output layer is obtained; the value lies between 0 and 1, and the output result is the maximum value. It is compared with the note corresponding to the input MFCC feature to determine whether they are the same, and the final result is output to complete the electronic music signal recognition.
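A short-time-energy endpoint detector over Hamming-windowed frames, as a minimal sketch of the step described above; the threshold ratio is an assumption:

```python
import numpy as np

def detect_endpoints(x, frame_len=1024, hop=512, energy_ratio=0.1):
    """Return (start, end) sample indices of the effective tone, found by
    thresholding the short-time energy of Hamming-windowed frames."""
    x = np.pad(x, (0, max(0, frame_len - len(x))))  # guard very short inputs
    win = np.hamming(frame_len)
    n = 1 + (len(x) - frame_len) // hop
    energy = np.array([np.sum((x[i * hop : i * hop + frame_len] * win) ** 2)
                       for i in range(n)])
    active = np.where(energy > energy_ratio * energy.max())[0]
    if active.size == 0:            # all silence: keep the whole signal
        return 0, len(x)
    return active[0] * hop, min(len(x), active[-1] * hop + frame_len)
```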
[0042] The following experiments were conducted to verify the rationality of the intelligent classification model design for electronic music based on reasonable weight allocation. The experiment included 10 types of music: Blues, Classical, Country, Disco, Hiphop, Jazz, Metal, Pop, Reggae, and Rock. Each type contained 100 pieces, and music features were extracted to obtain music fragments. This test is mainly aimed at electronic music with excessive modulation. The designed deep learning electronic music signal recognition system is used to test the decoding time of audio files; at the same time, a traditional electronic music signal recognition system is used to obtain test results for comparative analysis. The number of samples for each type of electronic music is shown in Table 1.
Table 1 Sample distribution of the ten electronic music categories
(The table lists, for each numbered electronic music type, its name and sample size; the category names recoverable from the source include Popular bel canto, HIP-HOP music, Folk rhyme, Rock and roll, Film music, and World Music.)
[0043] For these 10 music types, the lowest accuracy of the traditional classification method is 100, while the lowest accuracy of this classification method is 300; the highest accuracy of the traditional classification method is 20, while the highest accuracy of this classification method is 200. It can be seen that the intelligent electronic music classification model based on reasonable weight distribution can improve classification accuracy and gives accurate classification performance. The classification of music differs across training periods. Rock music, which has the highest classification accuracy in this invention, is selected as the experimental object to test whether the correct classification rate is affected as the training period increases. The result is shown in Figure 4.
[0044] From Figure 4, it can be seen that rock music has the highest classification accuracy at a segment time of 8 seconds, and as time increases further, the accuracy shows a decreasing trend. This indicates that a longer segment does not yield higher classification accuracy, nor does a shorter segment necessarily yield lower accuracy; rather, each moment of the music segment affects the extraction of classification information, so classification is accurate only at a specific time. The classification output rate is an output value that indirectly reflects the electronic music classification process: data that has not undergone multi-feature classification processing cannot be output, and data that is not accurately classified is isolated and not output.
[0045] The deep learning electronic music signal identification system plays a very important role in the development of electronic music. The system can not only assist professional grade examinations but is also suitable for non-professionals learning music. Under the same conditions, an identification simulation experiment is carried out with the classical method for comparison. The accuracy of electronic music signal identification by the machine learning algorithm far exceeds the requirements of practical application, and the signal identification error is lower than that of the classical method. According to the dynamic characteristics of music, the traditional classification method is improved, and an intelligent electronic music classification model based on reasonable weight distribution is proposed. The dynamic parameters of the model are obtained using the steganographic analysis algorithm of weight distribution; the music is then modeled, the obtained vectors are used as the sequence model, and the classification results are obtained. Comparative tests prove that this system overcomes the shortcomings of the traditional identification system, shortens the decoding time for excessively modulated files, is suitable for application in real life, makes it convenient for all walks of life to learn about and understand electronic music, and contributes to the development of electronic music.
[0046] The embodiments of the present disclosure are meant to cover all substitutes, modifications, and variations which fall within the scope of the appended claims. Therefore, within the spirit and principle of the present disclosure, any omission, modification, equivalent substitution, or variation should fall within the protection scope of the present disclosure.

Claims (5)

CLAIMS
1. A method for electronic music classification model construction based on machine learning and deep learning, the method comprising:
using an interface to transmit a received audio signal to an audio processing module, and processing the audio signal through analog-digital conversion and signal amplification to form music data;
constructing a music classification model with a spectrogram as input, based on learning and the structural characteristics of a NN;
obtaining dynamic parameters of the electronic music classification model by using the steganographic analysis algorithm of weight distribution to model the music data; and
classifying the modeled music data using the constructed music classification model.
2. The method according to claim 1, wherein a complementary processing is carried out according to the size of the differential gradient in electronic music.
3. The method according to claim 2, wherein adaptive docking is used for complementary processing when the characteristics of electronic audio are not obvious.
4. The method according to claim 3, wherein a NN multilayer perceptron (MLP) is used to divide the classification process into three layers: an import layer, a classification layer, and an output layer.
5. The method according to claim 4, wherein after various types of original ecological electronic music data are collected, the collected electronic music data is denoised.
LU103103A 2023-04-26 2023-04-26 Method for electronic music classification model construction based on machine learning and deep learning LU103103B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
LU103103A LU103103B1 (en) 2023-04-26 2023-04-26 Method for electronic music classification model construction based on machine learning and deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
LU103103A LU103103B1 (en) 2023-04-26 2023-04-26 Method for electronic music classification model construction based on machine learning and deep learning

Publications (1)

Publication Number Publication Date
LU103103B1 true LU103103B1 (en) 2023-11-30

Family

ID=88925379

Family Applications (1)

Application Number Title Priority Date Filing Date
LU103103A LU103103B1 (en) 2023-04-26 2023-04-26 Method for electronic music classification model construction based on machine learning and deep learning

Country Status (1)

Country Link
LU (1) LU103103B1 (en)

Similar Documents

Publication Publication Date Title
US20230056955A1 (en) Deep Learning Based Method and System for Processing Sound Quality Characteristics
AU2013361099B2 (en) Audio decoding with supplemental semantic audio recognition and report generation
WO2019109787A1 (en) Audio classification method and apparatus, intelligent device, and storage medium
US20070291958A1 (en) Creating Music by Listening
CN111369982A (en) Training method of audio classification model, audio classification method, device and equipment
Zhang Music style classification algorithm based on music feature extraction and deep neural network
CN104900238B (en) A kind of audio real-time comparison method based on perception filtering
CN109408660B (en) Music automatic classification method based on audio features
CN104992713B (en) A kind of quick broadcast audio comparison method
CN109584904B (en) Video-song audio-song name recognition modeling method applied to basic music video-song education
Cheng et al. Convolutional neural networks approach for music genre classification
Elowsson et al. Predicting the perception of performed dynamics in music audio with ensemble learning
Reghunath et al. Transformer-based ensemble method for multiple predominant instruments recognition in polyphonic music
CN117294985A (en) TWS Bluetooth headset control method
Mounika et al. Music genre classification using deep learning
CN104900239B (en) A kind of audio real-time comparison method based on Walsh-Hadamard transform
LU103103B1 (en) Method for electronic music classification model construction based on machine learning and deep learning
Zhang Research on music classification technology based on deep learning
Cai et al. Music creation and emotional recognition using neural network analysis
US20220277040A1 (en) Accompaniment classification method and apparatus
CN110739006A (en) Audio processing method and device, storage medium and electronic equipment
Rituerto-González et al. End-to-end recurrent denoising autoencoder embeddings for speaker identification
CN113781989A (en) Audio animation playing and rhythm stuck point identification method and related device
Li et al. Construction of Electronic Music Classification Model Based on Machine Learning and Deep Learning Algorithm
CN114550675A (en) Piano transcription method based on CNN-Bi-LSTM network

Legal Events

Date Code Title Description
FG Patent granted

Effective date: 20231130