CN118364134A - Music genre classification method, device, equipment and storage medium


Info

Publication number: CN118364134A
Application number: CN202211707416.8A (filed 2022-12-27; published 2024-07-19)
Legal status: Pending
Applicant/Assignee: Shenzhen Tianhao Photoelectric Co., Ltd.
Inventors: 蒋刚 (Jiang Gang), 肖鑫龙 (Xiao Xinlong), 王宁 (Wang Ning), 宋华东 (Song Huadong), 孟国防 (Meng Guofang)
Original language: Chinese (zh)
Prior art keywords: music, classification, model, information, training
Classification landscape: Information Retrieval, Db Structures and Fs Structures Therefor


Abstract

The invention belongs to the technical field of intelligent classification and discloses a music genre classification method, device, equipment and storage medium. The method comprises the following steps: acquiring music file information of prerecorded music; training an initial model multiple times according to the music file information to obtain multiple groups of music feature information; performing parameter optimization according to the music feature information to obtain optimal training parameters; modifying the initial model according to the optimal training parameters to obtain a target classification model; and inputting received music to be classified into the target classification model so as to classify the music genre according to the result output by the target classification model. In this way, music genres are modeled and classified in an end-to-end manner, feature extraction and classification are unified in a single model, the complexity of classification is reduced, and efficiency is improved.

Description

Music genre classification method, device, equipment and storage medium
Technical Field
The present invention relates to the field of intelligent classification technologies, and in particular, to a method, an apparatus, a device, and a storage medium for classifying music genres.
Background
With the development of music multimedia and the internet, millions of music tracks are played by users online, and classifying music by genre gives users a better experience, allowing them to choose what to play according to their preferred genres. However, music of different genres shares similarities that make the genres difficult to distinguish. Professionals often rely on their own listening experience to determine the type of a piece of music, an approach that is strongly influenced by personal subjective perception and is inefficient. Alternatively, music features such as timbre, pitch, rhythm, lyrics and vocals are extracted manually, and the feature data are then input into a support vector machine for classification.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The invention mainly aims to provide a music genre classification method, device, equipment and storage medium, to solve the technical problems that existing classification methods require manual extraction of music features, are highly complex, and have low classification accuracy.
To achieve the above object, the present invention provides a music genre classification method comprising the following steps:
Acquiring music file information of prerecorded music;
Training an initial model multiple times according to the music file information to obtain multiple groups of music feature information;
Performing parameter optimization according to the music feature information to obtain optimal training parameters;
Modifying the initial model according to the optimal training parameters to obtain a target classification model; and
Inputting received music to be classified into the target classification model so as to classify the music genre according to the result output by the target classification model.
Optionally, training the initial model multiple times according to the music file information to obtain multiple groups of music feature information includes:
Preprocessing the music file information to obtain music time-domain data; and
Inputting the music time-domain data into a convolution layer of the initial model for multiple training runs to obtain multiple groups of music feature information.
Optionally, inputting the music time-domain data into the convolution layer of the initial model for multiple training runs to obtain multiple groups of music feature information includes:
Determining tag information according to the music file information;
Determining tag type data according to the music file information and the tag information; and
Inputting the tag type data into the convolution layer of the initial model for multiple training runs to obtain multiple groups of music feature information, wherein the first convolution layer is replaced by a target large convolution kernel, and a target activation function is added after the convolution layer.
Optionally, performing parameter optimization according to the music feature information to obtain optimal training parameters includes:
Inputting the music feature information into a pooling layer of the initial model for data dimension reduction, obtaining multiple groups of screened music feature information with redundant features removed;
Determining the target screened music features corresponding to the last pooling layer according to the screened music feature information;
Inputting the target screened music features into a fully connected layer of the initial model to obtain multiple classification results; and
Performing parameter optimization according to the classification results to obtain the optimal training parameters.
Optionally, inputting the music feature information into the pooling layer of the initial model for data dimension reduction to obtain multiple groups of screened music feature information with redundant features removed includes:
Inputting the music feature information into the pooling layer of the initial model for data dimension reduction to obtain the maximum value of each convolution operation, wherein the pooling layer is set to max pooling; and
Determining multiple groups of screened music feature information according to the maximum value of each convolution operation.
Optionally, performing parameter optimization according to the classification results to obtain optimal training parameters includes:
Determining training accuracy information corresponding to the multiple training runs according to the classification results; and
Determining, according to the training accuracy information, the training parameters corresponding to the model with the highest classification accuracy as the optimal training parameters.
Optionally, after the received music to be classified is input into the target classification model to classify the music genre according to the result output by the target classification model, the method further includes:
Obtaining, when a classification error instruction is received, the erroneous music information of the misclassified music; and
Retraining and outputting the target classification model according to the erroneous music information so as to enhance the generalization capability of the target classification model.
In addition, to achieve the above object, the present invention also provides a music genre classification device, including:
an information acquisition module, used to acquire music file information of prerecorded music;
a model training module, used to train an initial model multiple times according to the music file information to obtain multiple groups of music feature information;
a parameter optimization module, used to perform parameter optimization according to the music feature information to obtain optimal training parameters;
a model optimization module, used to modify the initial model according to the optimal training parameters to obtain a target classification model; and
a model classification module, used to input received music to be classified into the target classification model so as to classify the music genre according to the result output by the target classification model.
In addition, to achieve the above object, the present invention also proposes a music genre classification apparatus including: a memory, a processor, and a music genre classification program stored on the memory and executable on the processor, the music genre classification program configured to implement the steps of the music genre classification method as described above.
In addition, in order to achieve the above object, the present invention also proposes a storage medium having stored thereon a music genre classification program which, when executed by a processor, implements the steps of the music genre classification method as described above.
The method comprises: acquiring music file information of prerecorded music; training an initial model multiple times according to the music file information to obtain multiple groups of music feature information; performing parameter optimization according to the music feature information to obtain optimal training parameters; modifying the initial model according to the optimal training parameters to obtain a target classification model; and inputting received music to be classified into the target classification model so as to classify the music genre according to the result output by the target classification model. In this way, music feature information is obtained by training on the prerecorded music files multiple times, optimal parameters are selected from the feature information obtained across training runs, and the final target classification model is trained, so that features are selected automatically for training and classification and the music genre is classified automatically. Music genres are modeled and classified in an end-to-end manner, feature extraction and classification are unified in a single model, the complexity of classification is reduced, and efficiency is improved.
Drawings
Fig. 1 is a schematic structural diagram of a music genre classification apparatus of a hardware running environment according to an embodiment of the present invention;
FIG. 2 is a flowchart of a music genre classification method according to a first embodiment of the present invention;
FIG. 3 is a diagram illustrating music time domain data collection according to an embodiment of the present invention;
FIG. 4 is a first-layer convolution diagram of an embodiment of a music genre classification method according to the present invention;
FIG. 5 is a statistical diagram of the accuracy of music genre prediction according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating a music genre classification method according to a second embodiment of the present invention;
FIG. 7 is a schematic diagram illustrating the back propagation of the convolutional neural network according to an embodiment of the music genre classification method of the present invention;
FIG. 8 is a diagram showing model accuracy statistics in one embodiment of a music genre classification method according to the present invention;
FIG. 9 is a flowchart of a music genre classification method according to an embodiment of the present invention;
fig. 10 is a block diagram illustrating a first embodiment of a music genre classification apparatus according to the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a music genre classification device in a hardware running environment according to an embodiment of the present invention.
As shown in fig. 1, the music genre classification apparatus may include: a processor 1001, such as a central processing unit (CPU); a communication bus 1002; a user interface 1003; a network interface 1004; and a memory 1005. The communication bus 1002 is used to enable communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as a disk memory. The memory 1005 may optionally also be a storage device separate from the processor 1001.
It will be appreciated by those skilled in the art that the structure shown in fig. 1 does not constitute a limitation of the music genre classification apparatus, and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
As shown in fig. 1, an operating system, a network communication module, a user interface module, and a music genre classification program may be included in the memory 1005 as one storage medium.
In the music genre classification apparatus shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user; the apparatus calls the music genre classification program stored in the memory 1005 through the processor 1001 and performs the music genre classification method provided by the embodiments of the present invention.
An embodiment of the invention provides a music genre classification method, referring to fig. 2, fig. 2 is a flowchart of a first embodiment of a music genre classification method according to the invention.
In this embodiment, the music genre classification method includes the following steps:
step S10: music file information of prerecorded music is acquired.
It should be noted that the execution subject of this embodiment is an intelligent terminal, which may be a smartphone, a computer, a tablet computer, or a server; in short, any device capable of implementing the music genre classification method, which this embodiment does not limit.
It should be understood that, at present, analysis and classification of music genres or styles depends either on the hearing of professionals or on manually extracted features, and both existing approaches suffer from subjective bias or from the complexity of manual feature extraction. In this embodiment, music genres are modeled and classified in an end-to-end manner, feature extraction and classification are unified in a single model, the complexity of classification is reduced, and efficiency is improved.
In a specific implementation, the music file information refers to the file-related information of prerecorded music, which may be prerecorded music or songs, or any type of song file prerecorded by the user.
Step S20: and training for multiple times through an initial model according to the music file information to obtain multiple groups of music characteristic information.
It should be noted that, firstly, preprocessing is performed according to the music file information to obtain the music time domain data, then the music time domain data is input into the initial model for training for a plurality of times, the training times are times greater than 10 times, and finally, a plurality of groups of music characteristic information can be obtained. The initial model refers to a preset convolutional neural network model, and each group of music characteristic information corresponds to a model training result.
Further, in order to obtain the music feature information, step S20 includes: preprocessing the music file information to obtain music time-domain data; and inputting the music time-domain data into a convolution layer of the initial model for multiple training runs to obtain multiple groups of music feature information.
It should be understood that preprocessing the music file information to obtain music time-domain data refers to the following: data preprocessing is first performed in Python on the recorded or imported music files to obtain a time-domain data graph, from which training and prediction samples are made; samples are drawn from the 30-second segment between 5 seconds and 35 seconds of each track, each sample is 4410 points long, and each sample covers 100 ms of audio. To increase the effective size of the data set, as shown in fig. 3, the time-domain data is sampled with overlap, each window being shifted by a step of 100 data points, finally yielding the music time-domain data.
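The following is a minimal sketch of this overlapped-sampling step, assuming 44.1 kHz audio (so that 4410 points span 100 ms) and using librosa for loading; the library choice and the function and variable names are illustrative assumptions, not taken from the patent.

```python
import numpy as np
import librosa  # assumed audio-loading library, not named in the patent

def make_samples(path, sr=44100, start_s=5.0, end_s=35.0,
                 sample_len=4410, step=100):
    # Load only the 5 s - 35 s segment of the track as mono time-domain data.
    audio, _ = librosa.load(path, sr=sr, mono=True,
                            offset=start_s, duration=end_s - start_s)
    # Overlapped sampling: each 4410-point (100 ms) window is shifted by 100 points.
    windows = [audio[i:i + sample_len]
               for i in range(0, len(audio) - sample_len + 1, step)]
    return np.stack(windows)  # shape: (num_windows, 4410)
```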
In a specific implementation, inputting the music time-domain data into the convolution layer of the initial model for multiple training runs to obtain multiple groups of music feature information refers to the following: the label of the music file information is determined first, then the label type, and the label type and the music file are input together into the convolution layer of the initial model for multiple training runs, from which the music feature information is obtained.
In this way, the music file information is preprocessed into time-domain data and the time-domain data is input into the model for training, so that the music feature information can be obtained.
Further, in order to accurately obtain the music feature information, the step of inputting the music time-domain data into the convolution layer of the initial model for multiple training runs to obtain multiple groups of music feature information includes: determining tag information according to the music file information; determining tag type data according to the music file information and the tag information; and inputting the tag type data into the convolution layer of the initial model for multiple training runs to obtain multiple groups of music feature information, wherein the first convolution layer is replaced by a target large convolution kernel and a target activation function is added after the convolution layer.
It should be noted that a music-type label given to each piece of music in advance is first determined as the tag information according to the music file information, and the tag information and the music file information are then combined as the tag type data and input to the convolution layer of the initial model for training.
It should be understood that the prepared music genre data (tag type data) is labeled and input to the convolution layer of the convolutional neural network of the initial model to extract music features; the kernel size of the first convolution layer, i.e. the target large convolution kernel, is 64×1 as shown in fig. 4, and this large first-layer kernel is chosen to enlarge the receptive field of the convolution so that more music genre information is extracted.
In a specific implementation, adding the target activation function after the convolution layer refers to the following: the music features obtained after convolution are linear and too simple to apply to complex music genre classification, so a nonlinear ReLU activation function is added after the convolution layer.
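As a hedged illustration, this first layer could look as follows in Keras; the framework and the "same" padding mode are assumptions, while the 64×1 kernel, 16 filters and 8×1 stride follow the layer specification given later in this description.

```python
import tensorflow as tf

first_layer = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4410, 1)),  # one 100 ms time-domain sample
    # target large convolution kernel: 64x1, widening the receptive field
    tf.keras.layers.Conv1D(filters=16, kernel_size=64, strides=8, padding="same"),
    tf.keras.layers.ReLU(),  # target activation function added after the convolution
])
```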
In this way, a deep convolutional-neural-network structure is realized and the classification accuracy of the model is improved.
Step S30: Performing parameter optimization according to the music feature information to obtain optimal training parameters.
It should be noted that parameter optimization refers to inputting the music feature information into the pooling layer of the initial model, screening it to obtain classification results, and finally selecting the optimal training parameters according to the classification results of the multiple training runs.
Step S40: Modifying the initial model according to the optimal training parameters to obtain a target classification model.
It should be understood that when the optimal training parameters are obtained, the initial model is modified by the optimal training parameters, and the modified model is the target classification model.
Step S50: and inputting the received music to be classified into the target classification model to classify the music genre according to the result output by the target classification model.
In a specific implementation, inputting music to be predicted in a genre (music to be classified) into a trained convolutional neural network model (target classification model), and outputting a result by using a classifier; and the prediction result is displayed by the classification probability of each music genre, and each sample is subjected to a model to obtain the corresponding music genre with the highest output probability. When classifying target music, the scheme of the embodiment can output various music genre classification results with probability of 1 by only inputting a small section of sample data into a trained model, and as shown in fig. 5, the accuracy of the music genre prediction results is basically above 90%.
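A hedged sketch of this prediction step follows; the genre list and the `model` object are illustrative assumptions, and the input shape matches the 4410-point windows produced in preprocessing.

```python
import numpy as np

GENRES = ["blues", "classical", "jazz", "pop", "rock"]  # hypothetical label set

def predict_genre(model, window):
    # window: one 4410-point time-domain sample; add batch and channel axes.
    probs = model.predict(window[np.newaxis, :, np.newaxis], verbose=0)[0]
    best = int(np.argmax(probs))  # genre with the highest softmax probability
    return GENRES[best], float(probs[best])
```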
Further, in order to continuously optimize the accuracy of the model, after step S50 the method further includes: obtaining, when a classification error instruction is received, the erroneous music information of the misclassified music; and retraining and outputting the target classification model according to the erroneous music information so as to enhance the generalization capability of the target classification model.
The classification error instruction is an instruction through which the user reports a misclassified music genre while listening to music or browsing the genre classification list. When a classification error instruction is received, the files and related information of the misclassified music or songs reported by the user are first obtained as the erroneous music information.
It should be understood that retraining and outputting the target classification model based on the erroneous music information refers to the following: the misclassified music is added to the model training set, the classification results are output again, and the model is optimized and modified according to the results fed back by the user, so as to improve the generalization capability of the target classification model.
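A minimal sketch of this feedback loop, under the assumption that misclassified samples and their user-corrected labels are collected as arrays; the epoch count and names are illustrative, not specified in the patent.

```python
import numpy as np

def retrain_on_feedback(model, x_train, y_train, x_wrong, y_corrected):
    # Append the misclassified samples (with corrected labels) to the training set.
    x_new = np.concatenate([x_train, x_wrong])
    y_new = np.concatenate([y_train, y_corrected])
    # Continue training from the current weights rather than from scratch.
    model.fit(x_new, y_new, epochs=5, shuffle=True, verbose=0)
    return model
```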
In this way, a user feedback mechanism is added after the classification result, so that the accuracy of the model is continuously improved and users get a better experience.
This embodiment acquires music file information of prerecorded music; trains an initial model multiple times according to the music file information to obtain multiple groups of music feature information; performs parameter optimization according to the music feature information to obtain optimal training parameters; modifies the initial model according to the optimal training parameters to obtain a target classification model; and inputs received music to be classified into the target classification model so as to classify the music genre according to the result output by the target classification model. In this way, music feature information is obtained by training on the prerecorded music files multiple times, optimal parameters are selected from the feature information obtained across training runs, and the final target classification model is trained, so that features are selected automatically for training and classification and the music genre is classified automatically. Music genres are modeled and classified in an end-to-end manner, feature extraction and classification are unified in a single model, the complexity of classification is reduced, and efficiency is improved.
Referring to fig. 6, fig. 6 is a flowchart illustrating a music genre classification method according to a second embodiment of the present invention.
Based on the first embodiment described above, in the music genre classification method of this embodiment, step S30 includes:
Step S301: Inputting the music feature information into a pooling layer of the initial model for data dimension reduction, obtaining multiple groups of screened music feature information with redundant features removed.
It should be noted that inputting the music feature information into the pooling layer of the initial model for data dimension reduction to obtain multiple groups of screened music feature information with redundant features removed refers to the following: the groups of music feature information are each input into the pooling layer of the initial model to reduce the data dimension, and the maximum value of each convolution operation is then taken, so that redundant features are removed and the screened music feature information is obtained.
Further, in order to perform data dimension reduction, step S301 includes: inputting the music feature information into the pooling layer of the initial model for data dimension reduction to obtain the maximum value of each convolution operation, wherein the pooling layer is set to max pooling; and determining multiple groups of screened music feature information according to the maximum value of each convolution operation.
It should be understood that the music features produced by the convolution and activation functions are reduced in dimension through the pooling layer, decreasing the amount of feature data and the redundant information; max pooling is selected for the pooling layer, taking the maximum value of each convolution operation.
In a specific implementation, after the maximum value of each convolution operation is obtained, each training run yields its own set of maxima, from which the multiple groups of screened music feature information are determined.
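A toy numpy illustration of the max-pooling operation described here, with pool size 2 and stride 2 as used in the model; this is a sketch only, since the real pooling runs inside the network layers.

```python
import numpy as np

def max_pool_1d(x, pool=2, stride=2):
    # Keep only the maximum of each window, halving the feature length.
    return np.array([x[i:i + pool].max()
                     for i in range(0, len(x) - pool + 1, stride)])

print(max_pool_1d(np.array([1.0, 3.0, 2.0, 5.0])))  # -> [3. 5.]
```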
In this way, data dimension reduction is accurately realized through the pooling layer, the amount of data and redundant information are reduced, and the computational accuracy is improved.
Step S302: Determining the target screened music features corresponding to the last pooling layer according to the screened music feature information.
After the screened music feature information is determined, the output of the last pooling layer is flattened through the fully connected layer and connected to a hidden layer with 200 neurons, whose outputs are fed into a ReLU activation function, giving the target screened music features.
Step S303: Inputting the target screened music features into a fully connected layer of the initial model to obtain multiple classification results.
It should be understood that, as shown by the forward propagation and network structure of the convolutional neural network in fig. 4, the convolutional-neural-network-based music genre classification method uses a 4-layer network structure in which each convolution layer is followed by a max pooling layer; finally, the music features are flattened through a fully connected layer and the music genre type is output through a Softmax classifier. In the first layer, the convolution kernel size is 64×1 with 16 kernels and a stride of 8×1, and the max pooling window is 2×1 with a stride of 2×1. In the second layer, the convolution kernel size is 4×1 with 32 kernels and a stride of 2×1, and the max pooling window is 2×1 with a stride of 2×1. In the third layer, the convolution kernel size is 4×1 with 64 kernels and a stride of 2×1, and the max pooling window is 2×1 with a stride of 2×1. In the fourth layer, the convolution kernel size is 4×1 with 128 kernels and a stride of 2×1. After the 4 layers of convolution and pooling, 30% of the obtained music features are randomly discarded (dropout) to prevent the model from overfitting. The objective function of the convolutional neural network is the preset accuracy: when it is reached, training ends and the result is output; if the objective value is not reached, the weights and biases of the model are re-obtained through back propagation. The back propagation process is shown in fig. 7: the derivatives of the weights and biases are solved starting from the fully connected layer, and the derivative values of the weights and biases of the pooling and convolution layers are then solved for reassignment and further training. The objective function is minimized with the Adam optimization algorithm to solve for the optimal weights and biases, with the formula:
θ* = argmin_θ L(f(x_i; θ))
where L(·) is the objective (loss) function and f(·) the model output; θ denotes all parameters of the convolutional neural network; θ* denotes the optimal parameters of the convolutional neural network; and x_i is the input to the convolutional neural network. After multiple iterations (30 iterations yield the optimal hyper-parameters), the classification result is finally output by the Softmax classifier.
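Putting the layer specification above together, a hedged Keras sketch of the full network could look as follows; the framework choice, "same" padding mode, loss function and number of genre classes are assumptions, while the kernel sizes, filter counts, strides, 30% dropout, 200-neuron hidden layer, Adam optimizer and Softmax output follow the text.

```python
import tensorflow as tf

NUM_GENRES = 10  # hypothetical number of genre classes

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4410, 1)),
    # layer 1: 64x1 kernel, 16 filters, stride 8, then 2x1 max pooling, stride 2
    tf.keras.layers.Conv1D(16, 64, strides=8, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2, strides=2),
    # layer 2: 4x1 kernel, 32 filters, stride 2, then 2x1 max pooling, stride 2
    tf.keras.layers.Conv1D(32, 4, strides=2, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2, strides=2),
    # layer 3: 4x1 kernel, 64 filters, stride 2, then 2x1 max pooling, stride 2
    tf.keras.layers.Conv1D(64, 4, strides=2, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2, strides=2),
    # layer 4: 4x1 kernel, 128 filters, stride 2 (no pooling)
    tf.keras.layers.Conv1D(128, 4, strides=2, padding="same", activation="relu"),
    tf.keras.layers.Dropout(0.3),                    # randomly discard 30% of features
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(200, activation="relu"),   # 200-neuron hidden layer
    tf.keras.layers.Dense(NUM_GENRES, activation="softmax"),  # Softmax classifier
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```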
Step S304: and carrying out parameter optimization according to the classification result to obtain the optimal training parameters.
It should be noted that, after the classification results of multiple training are obtained, the classification results are ranked accurately, so that the optimal training parameters are selected from the classification results.
Further, in order to select the optimal training parameters, step S304 includes: determining training accuracy information corresponding to the multiple training runs according to the classification results; and determining, according to the training accuracy information, the training parameters corresponding to the model with the highest classification accuracy as the optimal training parameters.
It should be understood that the training accuracy information is determined from the classification result of each training run; since the initial hyper-parameters of the convolutional neural network are random, training is repeated at least 10 times. As shown in fig. 8, the average accuracy of the model is above 95%, and the model with the best classification performance is taken as the final model. Therefore, the training accuracy of each run is determined first, and the training parameters of the most accurate model are selected as the optimal training parameters, so that the target classification model finally obtained is the best one.
It should be noted that the complete flow of this embodiment is shown in fig. 9. With this flow, only a small segment of sample data needs to be input into the trained model to output classification probabilities over the music genres (summing to 1), so that feature extraction and genre classification can be performed automatically on unprocessed music files.
In this embodiment, the music feature information is input into the pooling layer of the initial model for data dimension reduction, obtaining multiple groups of screened music feature information with redundant features removed; the target screened music features corresponding to the last pooling layer are determined according to the screened music feature information; the target screened music features are input into the fully connected layer of the initial model to obtain multiple classification results; and parameter optimization is performed according to the classification results to obtain the optimal training parameters. In this way, the complexity of classification is reduced, efficiency is improved, and the deep network structure of the convolutional neural network improves the classification accuracy of the model.
In addition, the embodiment of the invention also provides a storage medium, wherein the storage medium is stored with a music genre classification program, and the music genre classification program realizes the steps of the music genre classification method when being executed by a processor.
The storage medium adopts all the technical solutions of all the embodiments, so that the storage medium has at least all the beneficial effects brought by the technical solutions of the embodiments, and is not described in detail herein.
Referring to fig. 10, fig. 10 is a block diagram showing the construction of a first embodiment of the music genre classification apparatus according to the present invention.
As shown in fig. 10, a music genre classification device according to an embodiment of the present invention includes:
The information acquisition module 10 is used to acquire music file information of prerecorded music.
The model training module 20 is used to train an initial model multiple times according to the music file information to obtain multiple groups of music feature information.
The parameter optimization module 30 is used to perform parameter optimization according to the music feature information to obtain optimal training parameters.
The model optimization module 40 is used to modify the initial model according to the optimal training parameters to obtain a target classification model.
The model classification module 50 is used to input received music to be classified into the target classification model so as to classify the music genre according to the result output by the target classification model.
This embodiment acquires music file information of prerecorded music; trains an initial model multiple times according to the music file information to obtain multiple groups of music feature information; performs parameter optimization according to the music feature information to obtain optimal training parameters; modifies the initial model according to the optimal training parameters to obtain a target classification model; and inputs received music to be classified into the target classification model so as to classify the music genre according to the result output by the target classification model. In this way, music feature information is obtained by training on the prerecorded music files multiple times, optimal parameters are selected from the feature information obtained across training runs, and the final target classification model is trained, so that features are selected automatically for training and classification and the music genre is classified automatically. Music genres are modeled and classified in an end-to-end manner, feature extraction and classification are unified in a single model, the complexity of classification is reduced, and efficiency is improved.
In an embodiment, the model training module 20 is further configured to preprocess the music file information to obtain music time-domain data, and to input the music time-domain data into a convolution layer of the initial model for multiple training runs to obtain multiple groups of music feature information.
In an embodiment, the model training module 20 is further configured to determine tag information according to the music file information; determine tag type data according to the music file information and the tag information; and input the tag type data into the convolution layer of the initial model for multiple training runs to obtain multiple groups of music feature information, wherein the first convolution layer is replaced by a target large convolution kernel and a target activation function is added after the convolution layer.
In an embodiment, the parameter optimization module 30 is further configured to input the music feature information into a pooling layer of the initial model for data dimension reduction, obtaining multiple groups of screened music feature information with redundant features removed; determine the target screened music features corresponding to the last pooling layer according to the screened music feature information; input the target screened music features into a fully connected layer of the initial model to obtain multiple classification results; and perform parameter optimization according to the classification results to obtain the optimal training parameters.
In an embodiment, the parameter optimization module 30 is further configured to input the music feature information into the pooling layer of the initial model for data dimension reduction to obtain the maximum value of each convolution operation, wherein the pooling layer is set to max pooling, and to determine multiple groups of screened music feature information according to the maximum value of each convolution operation.
In an embodiment, the parameter optimization module 30 is further configured to determine training accuracy information corresponding to the multiple training runs according to the classification results, and to determine, according to the training accuracy information, the training parameters corresponding to the model with the highest classification accuracy as the optimal training parameters.
In an embodiment, the model classification module 50 is further configured to obtain erroneous music information of the misclassified music when a classification error instruction is received, and to retrain and output the target classification model according to the erroneous music information so as to enhance the generalization capability of the target classification model.
It should be understood that the foregoing is illustrative only and is not limiting, and that in specific applications, those skilled in the art may set the invention as desired, and the invention is not limited thereto.
It should be noted that the above-described working procedure is merely illustrative, and does not limit the scope of the present invention, and in practical application, a person skilled in the art may select part or all of them according to actual needs to achieve the purpose of the embodiment, which is not limited herein.
In addition, technical details not described in detail in the present embodiment may refer to the music genre classification method provided in any embodiment of the present invention, and are not described herein.
Furthermore, it should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is preferred. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., read-only memory (ROM)/RAM, magnetic disk, or optical disk) and including several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the methods according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (10)

1. A music genre classification method, characterized in that the music genre classification method comprises:
acquiring music file information of prerecorded music;
training an initial model multiple times according to the music file information to obtain multiple groups of music feature information;
performing parameter optimization according to the music feature information to obtain optimal training parameters;
modifying the initial model according to the optimal training parameters to obtain a target classification model; and
inputting received music to be classified into the target classification model so as to classify the music genre according to the result output by the target classification model.
2. The method of claim 1, wherein training the initial model multiple times according to the music file information to obtain multiple groups of music feature information comprises:
preprocessing the music file information to obtain music time-domain data; and
inputting the music time-domain data into a convolution layer of the initial model for multiple training runs to obtain multiple groups of music feature information.
3. The method of claim 2, wherein inputting the music time-domain data into the convolution layer of the initial model for multiple training runs to obtain multiple groups of music feature information comprises:
determining tag information according to the music file information;
determining tag type data according to the music file information and the tag information; and
inputting the tag type data into the convolution layer of the initial model for multiple training runs to obtain multiple groups of music feature information, wherein the first convolution layer is replaced by a target large convolution kernel, and a target activation function is added after the convolution layer.
4. The method of claim 1, wherein performing parameter optimization according to the music feature information to obtain the optimal training parameters comprises:
inputting the music feature information into a pooling layer of the initial model for data dimension reduction, obtaining multiple groups of screened music feature information with redundant features removed;
determining the target screened music features corresponding to the last pooling layer according to the screened music feature information;
inputting the target screened music features into a fully connected layer of the initial model to obtain multiple classification results; and
performing parameter optimization according to the classification results to obtain the optimal training parameters.
5. The method of claim 4, wherein inputting the music feature information into the pooling layer of the initial model for data dimension reduction to obtain multiple groups of screened music feature information with redundant features removed comprises:
inputting the music feature information into the pooling layer of the initial model for data dimension reduction to obtain the maximum value of each convolution operation, wherein the pooling layer is set to max pooling; and
determining multiple groups of screened music feature information according to the maximum value of each convolution operation.
6. The method of claim 4, wherein performing parameter optimization according to the classification results to obtain the optimal training parameters comprises:
determining training accuracy information corresponding to the multiple training runs according to the classification results; and
determining, according to the training accuracy information, the training parameters corresponding to the model with the highest classification accuracy as the optimal training parameters.
7. The method according to any one of claims 1 to 6, further comprising, after inputting the received music to be classified into the target classification model to classify the music genre according to the result output by the target classification model:
obtaining, when a classification error instruction is received, erroneous music information of the misclassified music; and
retraining and outputting the target classification model according to the erroneous music information so as to enhance the generalization capability of the target classification model.
8. A music genre classification device, characterized in that the music genre classification device comprises:
an information acquisition module, used to acquire music file information of prerecorded music;
a model training module, used to train an initial model multiple times according to the music file information to obtain multiple groups of music feature information;
a parameter optimization module, used to perform parameter optimization according to the music feature information to obtain optimal training parameters;
a model optimization module, used to modify the initial model according to the optimal training parameters to obtain a target classification model; and
a model classification module, used to input received music to be classified into the target classification model so as to classify the music genre according to the result output by the target classification model.
9. A music genre classification apparatus, the apparatus comprising: a memory, a processor, and a music genre classification program stored on the memory and executable on the processor, the music genre classification program configured to implement the music genre classification method of any of claims 1 to 7.
10. A storage medium having stored thereon a music genre classification program which, when executed by a processor, implements the music genre classification method according to any one of claims 1 to 7.


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination