WO2020024556A1 - Music quality evaluation method and apparatus, and computer device and storage medium

Music quality evaluation method and apparatus, and computer device and storage medium

Info

Publication number
WO2020024556A1
Authority
WO
WIPO (PCT)
Prior art keywords
music
audio
evaluated
audio information
frequency
Application number
PCT/CN2018/125449
Other languages
French (fr)
Chinese (zh)
Inventor
梅亚琦
刘奡智
王义文
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2020024556A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G10L25/45 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window

Definitions

  • the embodiments of the present application relate to the field of computers, and in particular, to a method, a device, a computer device, and a storage medium for evaluating music quality.
  • Digital music, as its name implies, is music stored as digital signals in databases and transmitted through the network; it can be downloaded and deleted quickly, according to people's needs. Digital music does not rely on traditional music carriers, such as magnetic tapes or CDs, which avoids wear and tear and preserves music quality.
  • The embodiment of the present application provides a method for evaluating, with a sound quality evaluation model, the frequency map obtained by converting the audio information of the music to be evaluated.
  • A technical solution adopted in the embodiment created by the present application is to provide a music quality evaluation method, which includes the following steps: acquiring audio information of the music to be evaluated; converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; and inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.
  • An embodiment of the present application further provides a music quality evaluation device, including: an acquisition module, configured to acquire audio information of the music to be evaluated; a processing module, configured to convert the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; and an execution module, configured to input the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.
  • An embodiment of the present application further provides a computer device including a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps of a music quality evaluation method: acquiring audio information of the music to be evaluated; converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; and inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.
  • An embodiment of the present application further provides a storage medium storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps of the music quality evaluation method: obtaining audio information of the music to be evaluated; converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; and inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.
  • In the embodiments of the present application, the audio information of the music to be evaluated is converted into a frequency map, and the frequency map is evaluated with a sound quality evaluation model trained from a convolutional neural network model to obtain evaluation information for each piece of music.
  • FIG. 1 is a schematic flowchart of a music quality evaluation method according to an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a method for converting audio information of music to be evaluated into a frequency map using frequency as a limiting condition according to an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of a method for training a music quality evaluation model according to an embodiment of the present application
  • FIG. 4 is a schematic flowchart of a method for evaluating a Mel frequency cepstrum coefficient diagram of audio of music to be evaluated using a sound quality evaluation model according to an embodiment of the present application;
  • FIG. 5 is a schematic flowchart of an audio playing method according to an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of another audio playing method according to an embodiment of the present application.
  • FIG. 7 is a block diagram of a basic structure of an audio quality evaluation device according to an embodiment of the present application.
  • FIG. 8 is a block diagram of a basic structure of a computer device according to an embodiment of the present application.
  • The terms "terminal" and "terminal equipment" used here cover both devices that have only a wireless signal receiver without any transmitting capability and devices with receiving and transmitting hardware capable of two-way communication over a two-way communication link.
  • Such equipment may include: a cellular or other communication device with a single-line display, a multi-line display, or no multi-line display; a PCS (Personal Communications Service) device that can combine voice, data processing, fax and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio frequency receiver, a pager, Internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; and a conventional laptop and/or palmtop computer or other device that has and/or includes a radio frequency receiver.
  • A terminal may be portable, transportable, installed in a vehicle (air, sea and/or land), or suitable and/or configured to operate locally and/or in a distributed fashion at any other location on Earth and/or in space.
  • The "terminal" and "terminal equipment" used herein may also be a communication terminal, an Internet terminal, or a music/video playback terminal, for example a PDA, a MID (Mobile Internet Device) and/or a mobile phone with music/video playback functions, or a device such as a smart TV or set-top box.
  • the client terminal in this embodiment is the terminal described above.
  • Specifically, FIG. 1 is a schematic flowchart of the music quality evaluation method according to this embodiment.
  • As shown in FIG. 1, the music quality evaluation method includes the following steps:
  • the audio information of the music to be evaluated includes the audio of the music to be evaluated, which may be a digital audio file generated from a digital signal, an audio file created by a musical instrument, an audio file spread on the Internet, or an audio file extracted from a video file.
  • The formats of these audio files include MP3, WAVE, WMA, VQF, MIDI, AIFF, MPEG, and so on.
  • a method for obtaining audio information of music to be evaluated includes directly obtaining audio information of music to be evaluated from a network or a local file, or obtaining audio information of music to be evaluated by extracting an audio file from a video file.
  • S1200 Convert the audio information of the music to be evaluated into a frequency map with the frequency as a limiting condition
  • The audio information of the music to be evaluated can be converted into a frequency map by spectrum application software, for example PC Sound Spectrum software, FFT spectrum analysis software, or SmaartLive software.
  • In practice, in order to make the frequencies in the frequency map continuous and clear, the audio of the music to be evaluated is usually pre-emphasized, windowed, and Fourier-transformed in the process of generating the frequency map.
  • In an embodiment of the present application, the audio information of the music to be evaluated is converted into a Mel frequency cepstrum coefficient map with frequency as the limiting condition.
  • The Mel frequency cepstrum coefficient map can be obtained from the frequency map produced by the above spectrum application software.
  • It should be noted that Mel-Frequency Cepstral Coefficients (MFCCs) form a map composed of the coefficients of the Mel frequency cepstrum. They are derived from the cepstrum of an audio clip, in which the frequency bands of the Mel frequency cepstrum are spaced equally on the Mel scale; this approximates the human auditory system more closely than the linearly spaced bands of the normal log cepstrum (the frequency map obtained with the application software above), so this frequency warping (the bending of the curves in the Mel frequency cepstrum coefficient map) represents sound better. Consequently, for audio with smooth sound, the coefficient curves in the Mel frequency cepstrum coefficient map follow the human auditory system closely, whereas for noise the coefficient changes do not.
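  • For illustration only, the following minimal sketch shows how such a Mel frequency cepstrum coefficient map could be extracted from an audio file; it assumes the Python library librosa is available, and the file path and the number of coefficients (20) are placeholder choices, not values specified by this application.

```python
# Hedged sketch: extract an MFCC map from an audio file with librosa.
# The path and n_mfcc=20 are illustrative assumptions only.
import librosa

def extract_mfcc_map(path, n_mfcc=20):
    # Load the audio at its native sampling rate.
    y, sr = librosa.load(path, sr=None)
    # One row per coefficient, one column per analysis frame.
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

# Example usage:
# mfcc_map = extract_mfcc_map("song_to_evaluate.mp3")
# print(mfcc_map.shape)  # (n_mfcc, number_of_frames)
```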
  • the sound quality evaluation model is a convolutional neural network model that is pre-trained to convergence, and may be, for example, a CNN convolutional neural network model, a VGG convolutional neural network model, and the like.
  • In an embodiment of the present application, when the sound quality evaluation model is trained, the training data are Mel frequency cepstrum coefficient maps converted from audio with smooth sound, so the resulting sound quality evaluation model conforms to the human auditory system and the evaluation information it produces is more accurate.
  • At the same time, to ensure accurate evaluation, the frequency map of the audio information of the music to be evaluated that is input to the model is a Mel frequency cepstrum coefficient map.
  • To solve the problems described in this application, an embodiment of the application provides a music quality evaluation method: the audio information of the music to be evaluated is converted into a frequency map, and the frequency map is evaluated by a sound quality evaluation model trained from a convolutional neural network model to obtain evaluation information for each piece of music. This makes it convenient for users to filter music according to the evaluation information, avoids the interference of low-quality music to users, and purifies the network environment.
  • a Mel frequency cepstrum coefficient map of the audio of the music to be evaluated may be used.
  • An embodiment of the present application provides a method for converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition. FIG. 2 is a schematic flowchart of the basic process of this method.
  • step S1200 includes:
  • The audio information of the music to be evaluated is converted into a frequency map by spectrum application software, for example PC Sound Spectrum software, FFT spectrum analysis software, or SmaartLive software. In the process of converting to the logarithmic frequency map, the audio of the music to be evaluated is pre-emphasized, framed, and windowed, and the frequency of each frame of the signal is obtained by the Fourier transform.
  • The frame length can be set according to the actual situation, preferably 32 ms (milliseconds), and the windowing can be performed with a Hamming window.
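  • A minimal NumPy sketch of the pre-processing described above (pre-emphasis, framing into 32 ms frames, Hamming windowing, and a per-frame Fourier transform). The pre-emphasis coefficient of 0.97 and the 50% frame overlap are assumptions made for illustration; they are not prescribed by this application.

```python
import numpy as np

def frame_spectra(signal, sample_rate, frame_ms=32, pre_emphasis=0.97):
    # signal: 1-D NumPy array of audio samples.
    # Pre-emphasis: boost high frequencies to flatten the spectrum.
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = frame_len // 2  # assumed 50% overlap between frames
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    window = np.hamming(frame_len)  # Hamming window, as in the text
    spectra = []
    for i in range(n_frames):
        frame = emphasized[i * hop:i * hop + frame_len]
        # Magnitude spectrum of the windowed frame via the real FFT.
        spectra.append(np.abs(np.fft.rfft(frame * window)))
    return np.array(spectra)  # shape: (n_frames, frame_len // 2 + 1)
```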
  • The Mel frequency f_mel is calculated with the Mel frequency conversion formula f_mel = 2595 * log10(1 + f / 700), where f is the frequency of each frame; a map of the Mel frequencies is obtained by calculating the Mel frequency of each frame.
  • Assuming the Mel spectrum is X[k] = H[k]E[k], where H[k] is the Mel frequency cepstrum coefficient and E[k] is the high-frequency spectrum, taking the logarithm gives log X[k] = log H[k] + log E[k], and applying the inverse discrete cosine transform gives X[k] = H[k] + E[k], that is, H[k] = X[k] - E[k]. Since E[k] is the high-frequency spectrum, the Mel frequency cepstrum H[k] can be obtained with a low-pass filter, giving the Mel frequency cepstrum chart. The change trend of the cepstral frequencies is then extracted from the Mel frequency cepstrum chart to obtain the Mel frequency cepstrum coefficient map.
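  • For reference, the Mel-scale conversion used in the step above is the standard textbook mapping; a one-line helper (not code taken from this application) is:

```python
import numpy as np

def hz_to_mel(f_hz):
    # Standard Mel-scale mapping: f_mel = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

# Example: hz_to_mel(1000.0) is approximately 1000 mel.
```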
  • the method of this embodiment further includes a training method of a sound quality evaluation model.
  • FIG. 3 is a schematic flowchart of a training method of a sound quality evaluation model according to an embodiment of the present application.
  • the training sample set includes multiple Mel frequency cepstrum coefficient maps extracted from multiple pieces of smooth audio.
  • In an embodiment of the present application, 6000 short audio clips, each 5 seconds long, are extracted from 2000 clear and smooth recordings as the training data source. Any number of short audio clips are taken from the training data source as training data, and a Mel frequency cepstrum coefficient map is extracted from each clip to obtain the training sample set; a sketch of this preparation step is given below.
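  • A hedged sketch of how such a training set could be assembled: each recording is cut into 5-second clips and each clip is converted into a Mel frequency cepstrum coefficient map. The directory layout, the use of librosa, and n_mfcc=20 are assumptions made for illustration.

```python
import os
import librosa
import numpy as np

def build_training_set(recording_dir, clip_seconds=5, n_mfcc=20):
    # Assumes every recording in recording_dir uses the same sampling rate.
    samples = []
    for name in sorted(os.listdir(recording_dir)):
        y, sr = librosa.load(os.path.join(recording_dir, name), sr=None)
        clip_len = int(clip_seconds * sr)
        # Slice the recording into non-overlapping 5-second clips.
        for start in range(0, len(y) - clip_len + 1, clip_len):
            clip = y[start:start + clip_len]
            samples.append(librosa.feature.mfcc(y=clip, sr=sr, n_mfcc=n_mfcc))
    return np.array(samples)  # (num_clips, n_mfcc, frames_per_clip)
```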
  • step S1312 includes the following steps:
  • Step 1: Input the multiple Mel frequency cepstrum coefficient maps into the preset convolutional neural network model in sequence, and obtain the output value of each Mel frequency cepstrum coefficient map;
  • Step 2: Sort the output values by numerical value;
  • Step 3: Confirm that the output value at the middle position of the sorted result is the expected output value of the multiple Mel frequency cepstrum coefficient maps.
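  • Read this way, the expected output value is simply the median of the per-map outputs; a minimal sketch under that reading:

```python
import numpy as np

def expected_output(output_values):
    # Sort the per-map outputs and take the value at the middle position.
    ordered = np.sort(np.asarray(output_values, dtype=float))
    return ordered[len(ordered) // 2]

# Example: expected_output([0.91, 0.80, 0.97, 0.86, 0.93]) returns 0.91.
```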
  • S1313 Input the training sample set into the convolutional neural network model, and obtain the excitation value of the convolutional neural network model;
  • The Mel frequency cepstrum coefficient maps of the training sample set are input into the neural network model in sequence, and the neural network model extracts features from each map.
  • In an embodiment of the present application, the convolutional neural network includes four double convolutional layers, four pooling layers, and a fully connected layer.
  • The convolution kernels in the convolutional layers extract features from the training samples to obtain the weight of each unit in the convolution.
  • The preset activation function is used to limit the range of the output values.
  • In the pooling layers, the features extracted by the convolutional layers are used to down-sample the Mel frequency cepstrum coefficient map, and, to make the model more stable and less dependent on the training data, the output values of the pooling layers can be randomly discarded according to a preset drop probability.
  • The fully connected layer outputs the final values to the classifier, where they are normalized to obtain the excitation value.
  • A Mel cepstrum map is input to the first convolutional layer, features are extracted with 32 filters having a 3x3 receptive field and a stride of 1, and the result is output to the first pooling layer.
  • In the pooling layers, output values are randomly dropped with a preset drop probability of 0.25. It should be noted that, after the output of the fourth pooling layer, because the fully connected layer is prone to overfitting, output values are dropped with a probability of 0.5 before the fully connected layer, and the remaining output values are then passed through the fully connected layer to the classifier.
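  • One way to read the architecture described above (four double-convolution blocks, each followed by pooling and dropout with probability 0.25, then dropout with probability 0.5 before a fully connected classifier) is the PyTorch sketch below. The channel counts after the first block, the global pooling before the fully connected layer, and the two-class softmax output are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

def double_conv_block(in_ch, out_ch):
    # A "double convolutional layer": two 3x3 convolutions with stride 1,
    # each followed by ReLU, then 2x2 max pooling and dropout (p=0.25).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Dropout(0.25),
    )

class SoundQualityNet(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        # 32 filters in the first block as stated in the text; the later
        # channel counts (64, 128, 128) are assumptions.
        self.features = nn.Sequential(
            double_conv_block(1, 32),
            double_conv_block(32, 64),
            double_conv_block(64, 128),
            double_conv_block(128, 128),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)  # simplification before the FC layer
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),        # drop probability 0.5 before the FC layer
            nn.Linear(128, n_classes),
            nn.Softmax(dim=1),      # normalization in the classifier
        )

    def forward(self, x):  # x: (batch, 1, height, width) MFCC maps
        return self.classifier(self.pool(self.features(x)))
```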
  • the excitation value is the excitation data output by the convolutional neural network model according to the input Mel frequency cepstrum coefficient graph.
  • At the beginning of training, the excitation value has a large dispersion; as training approaches convergence, the excitation value becomes relatively stable data.
  • A loss function is used to determine whether the excitation value output by the fully connected layer of the neural network model is consistent with the set expected classification value. When the results are not consistent, the back-propagation algorithm is used to adjust the weights in the convolutional neural network model.
  • The loss function determines whether the excitation value is consistent with the set expected value by calculating the distance (for example, the Euclidean distance or a spatial distance) between the excitation value and the set expected value against a preset first threshold (for example, 0.05). When the distance between the excitation value and the set expected classification value is less than or equal to the first threshold, the excitation value is judged to be consistent with the set expected value; otherwise, it is not.
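  • A sketch of the convergence check described above, assuming that the distance is the Euclidean distance between the model output and the expected value and that the weights are adjusted by ordinary gradient back-propagation; the optimizer, learning rate, and iteration cap are illustrative assumptions.

```python
import torch

def train_until_converged(model, samples, expected, threshold=0.05,
                          lr=1e-3, max_iters=10000):
    # samples: MFCC maps, shape (N, 1, H, W); expected: target values (N, n_classes).
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(max_iters):
        output = model(samples)
        # Euclidean distance between excitation values and expected values.
        distance = torch.norm(output - expected, dim=1).mean()
        if distance <= threshold:
            break                      # consistent with the expected value: stop training
        optimizer.zero_grad()
        distance.backward()            # back-propagation adjusts the weights
        optimizer.step()
    return model
```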
  • a Mel frequency cepstrum coefficient map of the audio of the music to be evaluated is input into a preset sound quality evaluation model to obtain evaluation information of audio information of the music to be evaluated.
  • the embodiment of the present application provides a method for evaluating a Mel frequency cepstrum coefficient map of audio of music to be evaluated by using a sound quality evaluation model.
  • FIG. 4 shows a schematic flowchart of a method for evaluating a Mel frequency cepstrum coefficient map of audio of music to be evaluated using a sound quality evaluation model according to an embodiment of the present application.
  • step S1300 includes:
  • The Mel cepstrum coefficient map of the audio of the music to be evaluated is input into the sound quality evaluation model for calculation, and the output value of the sound quality evaluation model is obtained. Because the sound quality evaluation model is trained on audio with smooth sound, the output indicates the probability of belonging to audio with smooth sound. Therefore, the larger the output value, the smoother the sound of the music to be evaluated and the higher its quality; the smaller the output value, the lower the audio quality of the music to be evaluated.
  • The evaluation index is an index that measures the audio quality of the music to be evaluated. It can be customized: it may be expressed with letters, for example A, B, C, D, E, F indicating quality from high to low in turn, or as a score, where a higher score indicates higher audio quality of the music to be evaluated.
  • the evaluation list is a list showing the mapping relationship between the output value of the sound quality evaluation model and the evaluation index. Using the output value, the corresponding evaluation index can be found through the evaluation list.
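  • A minimal sketch of such an evaluation list, here as output-value bands mapped to letter grades; the concrete band boundaries and letters are illustrative assumptions, not values specified by this application.

```python
def lookup_evaluation_index(output_value):
    # Evaluation list: maps the model's output value (0..1) to a letter grade.
    evaluation_list = [
        (0.9, "A"), (0.8, "B"), (0.7, "C"),
        (0.6, "D"), (0.5, "E"),
    ]
    for lower_bound, grade in evaluation_list:
        if output_value >= lower_bound:
            return grade
    return "F"

# Example: lookup_evaluation_index(0.83) returns "B".
```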
  • FIG. 5 is a schematic flowchart of the audio playback method.
  • the method further includes:
  • A play instruction is the instruction issued by the user to play the audio to be played.
  • the playback instruction can be triggered by clicking the audio to be played.
  • After the terminal obtains the playback instruction, it obtains the quality index of the audio to be played according to the playback instruction. It should be noted that the quality index may be pre-stored in the information of each audio to be played and retrieved directly after the playback instruction is obtained; alternatively, according to the acquired playback instruction, the terminal may evaluate the audio to be played with the sound quality evaluation model in real time to obtain the quality index.
  • the terminal sets an index threshold for audio playback in advance. For example, the audio can only be played when the quality index of the audio is greater than 95 points.
  • The terminal compares the quality index of the audio to be played with the index threshold and plays the audio when its quality index exceeds the threshold. In this way, the terminal filters the audio in the application software through the sound quality evaluation model, which on the one hand improves the user's listening experience and on the other hand saves the user selection time.
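  • The playback gate described above reduces to a comparison against the pre-set index threshold; a hedged sketch (the threshold of 95 comes from the example in the text, the player callback is an assumption):

```python
def maybe_play(audio_id, quality_index, play_callback, index_threshold=95):
    # Play the audio only when its quality index reaches the pre-set threshold.
    if quality_index >= index_threshold:
        play_callback(audio_id)
        return True
    return False  # low-quality audio is filtered out instead of played
```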
  • FIG. 6 is a schematic flowchart of the audio playback method.
  • After step S1332, the method further includes:
  • When the audio information is displayed, it can be shown in descending order of quality index to facilitate user selection and further improve the user experience.
  • an embodiment of the present application further provides a music quality evaluation device.
  • FIG. 7 is a block diagram of the basic structure of the music quality evaluation device of this embodiment.
  • a music quality evaluation device includes an acquisition module 2100, a processing module 2200, and an execution module 2300.
  • an obtaining module is used to obtain audio information of the music to be evaluated;
  • a processing module is used to convert audio information of the music to be evaluated into a frequency map with frequency as a limiting condition;
  • An execution module is used to input the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain the evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.
  • The music quality evaluation device converts the audio information of the music to be evaluated into a frequency map and evaluates the frequency map through a sound quality evaluation model trained from a convolutional neural network model to obtain evaluation information for each piece of music. In this way, users can conveniently filter music based on the evaluation information, the interference of low-quality music to users is avoided, and the network environment is purified.
  • The processing module in the music quality evaluation device includes: a first acquisition sub-module for acquiring the Mel frequency of the audio information of the music to be evaluated; a first processing sub-module for obtaining the Mel frequency cepstrum from the map of the Mel frequency; and a first execution sub-module configured to extract the Mel frequency cepstrum coefficient map from the Mel frequency cepstrum.
  • The execution module specifically includes: a second acquisition sub-module for acquiring the output value of the sound quality evaluation model; and a second execution sub-module for finding, in the evaluation list, the evaluation index that has a mapping relationship with the output value.
  • In an embodiment, when the user searches for target audio, the music quality evaluation device further includes: a third acquisition sub-module for acquiring a playback instruction; a second processing sub-module for acquiring, according to the playback instruction, the evaluation index of the audio to be played and comparing it with a preset index threshold; and a third execution sub-module configured to play the audio to be played when its evaluation index is greater than or equal to the index threshold.
  • The playback instruction includes a keyword of the audio to be played. The music quality evaluation device further includes: a third processing sub-module configured to, when the evaluation index of the audio to be played is smaller than the index threshold, search a preset database for audio information matching the keyword of the audio to be played; and a fourth execution sub-module configured to display the audio information.
  • In an embodiment, the music quality evaluation device further includes: a fourth acquisition sub-module configured to acquire a training sample set, where the training sample set includes multiple Mel frequency cepstrum coefficient maps extracted from multiple pieces of audio with smooth sound quality;
  • a fourth processing sub-module configured to obtain the expected values of the multiple Mel frequency cepstrum coefficient maps from the preset convolutional neural network model;
  • a fifth processing sub-module configured to input the training sample set into the convolutional neural network model and obtain the excitation value of the convolutional neural network model; and
  • a fifth execution sub-module configured to compare whether the distance between the expected value and the excitation value is less than or equal to a preset first threshold; when the distance is greater than the first threshold, the weights in the convolutional neural network model are updated by back-propagation through repeated loop iterations, and the process ends when the distance between the expected value and the excitation value is less than or equal to the preset first threshold.
  • The fourth processing sub-module specifically includes: a sixth acquisition sub-module for sequentially inputting the multiple Mel frequency cepstrum coefficient maps into the preset convolutional neural network model and acquiring the output value of each map; a sixth processing sub-module for sorting the output values by numerical value; and a sixth execution sub-module for confirming that the output value at the middle position of the sorted results is the expected output value of the multiple Mel frequency cepstrum coefficient maps.
  • FIG. 8 is a block diagram of the basic structure of the computer device of this embodiment.
  • the computer device includes a processor, a nonvolatile storage medium, a memory, and a network interface connected through a system bus.
  • the non-volatile storage medium of the computer device stores an operating system, a database, and computer-readable instructions.
  • the database may store control information sequences.
  • When the computer-readable instructions are executed by the processor, the processor may implement a music quality evaluation method.
  • the processor of the computer equipment is used to provide computing and control capabilities to support the operation of the entire computer equipment.
  • The memory of the computer device may store computer-readable instructions. When the computer-readable instructions are executed by the processor, they may cause the processor to perform a method for evaluating music quality.
  • the network interface of the computer equipment is used to connect and communicate with the terminal.
  • FIG. 8 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer parts than shown in the figure, combine certain parts, or arrange the parts differently.
  • the processor is configured to execute the specific content of the acquisition module 2100, the processing module 2200, and the execution module 2300 in FIG. 7, and the memory stores program codes and various types of data required to execute the modules.
  • the network interface is used for data transmission to user terminals or servers.
  • The memory in this embodiment stores the program code and data required for executing all the sub-modules of the music quality evaluation method, and the server can call this program code and data to perform the functions of all the sub-modules.
  • The computer device converts the audio information of the music to be evaluated into a frequency map and evaluates the frequency map through a sound quality evaluation model trained from a convolutional neural network model to obtain evaluation information for each piece of music. In this way, users can conveniently filter music according to the evaluation information, the interference of low-quality music to users is avoided, and the network environment is purified.
  • The present application also provides a storage medium storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the music quality evaluation method according to any one of the foregoing embodiments.
  • the computer program may be stored in a computer-readable storage medium.
  • the foregoing storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (Random Access Memory, RAM).
  • It should be understood that, although the steps in the flowcharts of the drawings are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; their execution order is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided are a music quality evaluation method and apparatus, a computer device, and a storage medium. The method comprises the following steps: acquiring audio information of music to be evaluated (S1100); converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition (S1200); and inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated (S1300), wherein the sound quality evaluation model is a convolutional neural network model pre-trained to convergence. The audio information of the music to be evaluated is converted into a frequency map, and the frequency map is evaluated by a sound quality evaluation model trained from a convolutional neural network model to obtain evaluation information for each piece of music, so that users can screen music according to the evaluation information, are not bothered by low-quality music, and the network environment is purified.

Description

Music quality evaluation method, device, computer equipment and storage medium

This application claims priority from a Chinese patent application filed with the Chinese Patent Office on August 2, 2018, with application number 201810873498.0 and the invention title "Music Quality Evaluation Method, Device, Computer Equipment, and Storage Medium", the entire contents of which are incorporated into this application by reference.

Technical field

The embodiments of the present application relate to the field of computers, and in particular to a music quality evaluation method, device, computer equipment, and storage medium.

Background

Digital music, as its name implies, is music that is stored as digital signals in databases and transmitted through the network; it is fast, and it can be downloaded and deleted according to people's needs. Digital music does not rely on traditional music carriers such as magnetic tapes or CDs, which avoids wear and tear and preserves music quality.

In recent years, with the development of digital music, the number of musical works has exploded, but a large amount of computer-generated and randomly generated music has also appeared. The inventor found that most of this music is atonal, with disordered beats, excessive repetition, continuously discordant harmony, and confused or suddenly interrupted melodies, and therefore belongs to low-quality music.

The spread of low-quality music on the Internet interferes with network users and degrades their online experience.

Summary of the invention
The embodiment of the present application provides a method for evaluating, with a sound quality evaluation model, the frequency map obtained by converting the audio information of the music to be evaluated.
To solve the above technical problem, a technical solution adopted in an embodiment of the present application is to provide a music quality evaluation method, which includes the following steps: acquiring audio information of the music to be evaluated; converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; and inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.

To solve the above technical problem, an embodiment of the present application further provides a music quality evaluation device, including: an acquisition module, configured to acquire audio information of the music to be evaluated; a processing module, configured to convert the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; and an execution module, configured to input the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.

To solve the above technical problem, an embodiment of the present application further provides a computer device including a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps of a music quality evaluation method: acquiring audio information of the music to be evaluated; converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; and inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.

To solve the above technical problem, an embodiment of the present application further provides a storage medium storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps of the music quality evaluation method: obtaining audio information of the music to be evaluated; converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; and inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.

In the embodiments of the present application, the audio information of the music to be evaluated is converted into a frequency map, and the frequency map is evaluated by a sound quality evaluation model trained from a convolutional neural network model to obtain evaluation information for each piece of music. This makes it convenient for users to filter music according to the evaluation information, avoids the interference of low-quality music to users, and purifies the network environment.
Brief description of the drawings

In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1 is a schematic flowchart of a music quality evaluation method according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of a method for converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition according to an embodiment of the present application;

FIG. 3 is a schematic flowchart of a method for training a music quality evaluation model according to an embodiment of the present application;

FIG. 4 is a schematic flowchart of a method for evaluating the Mel frequency cepstrum coefficient map of the audio of the music to be evaluated with a sound quality evaluation model according to an embodiment of the present application;

FIG. 5 is a schematic flowchart of an audio playing method according to an embodiment of the present application;

FIG. 6 is a schematic flowchart of another audio playing method according to an embodiment of the present application;

FIG. 7 is a block diagram of the basic structure of an audio quality evaluation device according to an embodiment of the present application;

FIG. 8 is a block diagram of the basic structure of a computer device according to an embodiment of the present application.
Detailed description

In order to enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application.

Some of the processes described in the specification and claims of this application and in the above drawings contain operations that appear in a particular order, but it should be clearly understood that these operations may be performed out of the order in which they appear herein or in parallel. Operation numbers such as 101 and 102 are only used to distinguish different operations; the numbers themselves do not represent any execution order. In addition, these processes may include more or fewer operations, and these operations may be performed sequentially or in parallel. It should be noted that descriptions such as "first" and "second" herein are used to distinguish different messages, devices, modules and so on; they do not represent a sequence, nor do they limit "first" and "second" to being of different types.

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present application.

Embodiments

Those skilled in the art will understand that the terms "terminal" and "terminal equipment" used herein cover both devices that have only a wireless signal receiver without any transmitting capability and devices with receiving and transmitting hardware capable of two-way communication over a two-way communication link. Such equipment may include: a cellular or other communication device with a single-line display, a multi-line display, or no multi-line display; a PCS (Personal Communications Service) device that can combine voice, data processing, fax and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio frequency receiver, a pager, Internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; and a conventional laptop and/or palmtop computer or other device that has and/or includes a radio frequency receiver. A "terminal" or "terminal equipment" may be portable, transportable, installed in a vehicle (air, sea and/or land), or suitable and/or configured to operate locally and/or in a distributed fashion at any other location on Earth and/or in space. The "terminal" and "terminal equipment" used herein may also be a communication terminal, an Internet terminal, or a music/video playback terminal, for example a PDA, a MID (Mobile Internet Device) and/or a mobile phone with music/video playback functions, or a device such as a smart TV or set-top box.

The client terminal in this embodiment is the terminal described above.
Specifically, please refer to FIG. 1, which is a schematic flowchart of the music quality evaluation method of this embodiment.

As shown in FIG. 1, the music quality evaluation method includes the following steps:
S1100. Acquire audio information of the music to be evaluated.

The audio information of the music to be evaluated includes the audio of the music to be evaluated, which may be a digital audio file generated from digital signals, an audio file created with a musical instrument, an audio file spread on the Internet, or an audio file extracted from a video file. The formats of these audio files include MP3, WAVE, WMA, VQF, MIDI, AIFF, MPEG, and so on.

In practical applications, methods for obtaining the audio information of the music to be evaluated include obtaining it directly from the network or from local files, or extracting an audio file from a video file.

S1200. Convert the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition.

The audio information of the music to be evaluated can be converted into a frequency map by spectrum application software, for example PC Sound Spectrum software, FFT spectrum analysis software, or SmaartLive software. In practice, in order to make the frequencies in the frequency map continuous and clear, the audio of the music to be evaluated is usually pre-emphasized, windowed, and Fourier-transformed in the process of generating the frequency map.

In an embodiment of the present application, the audio information of the music to be evaluated is converted into a Mel frequency cepstrum coefficient map with frequency as the limiting condition. The Mel frequency cepstrum coefficient map can be obtained from the frequency map produced by the above spectrum application software.

It should be noted that Mel-Frequency Cepstral Coefficients (MFCCs) form a map composed of the coefficients of the Mel frequency cepstrum. They are derived from the cepstrum of an audio clip, in which the frequency bands of the Mel frequency cepstrum are spaced equally on the Mel scale; this approximates the human auditory system more closely than the linearly spaced bands of the normal log cepstrum (the frequency map obtained with the application software above), so this frequency warping (the bending of the curves in the Mel frequency cepstrum coefficient map) represents sound better. Consequently, for audio with smooth sound, the coefficient curves in the Mel frequency cepstrum coefficient map follow the human auditory system closely, whereas for noise the coefficient changes do not.
S1300. Input the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated.

The sound quality evaluation model is a convolutional neural network model pre-trained to convergence, for example a CNN convolutional neural network model or a VGG convolutional neural network model.

In an embodiment of the present application, when the sound quality evaluation model is trained, the training data are all Mel frequency cepstrum coefficient maps converted from audio with smooth sound, so the resulting sound quality evaluation model conforms to the human auditory system and the evaluation information it produces is more accurate. At the same time, to ensure accurate evaluation, the frequency map of the audio information of the music to be evaluated that is input to the model is a Mel frequency cepstrum coefficient map.

To solve the problems in this application, an embodiment of the application provides a music quality evaluation method: the audio information of the music to be evaluated is converted into a frequency map, and the frequency map is evaluated by a sound quality evaluation model trained from a convolutional neural network model to obtain evaluation information for each piece of music. This makes it convenient for users to filter music according to the evaluation information, avoids the interference of low-quality music to users, and purifies the network environment.

In the above embodiment, for accurate evaluation, the Mel frequency cepstrum coefficient map of the audio of the music to be evaluated may be used. An embodiment of the present application provides a method for converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; FIG. 2 is a schematic flowchart of the basic process of this method.

As shown in FIG. 2, step S1200 includes:

S1210. Obtain the Mel frequency of the audio information of the music to be evaluated.

The audio information of the music to be evaluated is converted into a frequency map by spectrum application software, for example PC Sound Spectrum software, FFT spectrum analysis software, or SmaartLive software. In the process of converting to the logarithmic frequency map, the audio of the music to be evaluated is pre-emphasized, framed, and windowed, and the frequency of each frame of the signal is obtained by the Fourier transform. The frame length can be set according to the actual situation, preferably 32 ms (milliseconds), and the windowing can be performed with a Hamming window.
The Mel frequency f_mel is calculated with the Mel frequency conversion formula:

f_mel = 2595 * log10(1 + f / 700)

where f is the frequency of each frame. A map of the Mel frequencies is obtained by calculating the Mel frequency of each frame.
S1220、根据梅尔频率的图谱获取梅尔频率倒谱;S1220. Obtain a Mel frequency cepstrum according to the map of the Mel frequency;
假设梅尔频谱为X[k],Assuming the Mel spectrum is X [k],
X[k]=H[k]E[k]X [k] = H [k] E [k]
其中,H[k]为梅尔频率倒谱系数,E[k]为高频谱。Among them, H [k] is a Mel frequency cepstrum coefficient, and E [k] is a high frequency spectrum.
对公式X[k]取对数,得到Take the logarithm of the formula X [k] to get
log X[k]=log H[k]+log E[k]log X [k] = log H [k] + log E [k]
再通过反离散余弦进行逆变换得到Then inverse transform by inverse discrete cosine
X[k]=H[k]+E[k]X [k] = H [k] + E [k]
即梅尔频率倒谱系数H[k],That is, the Mel frequency cepstrum coefficient H [k],
H[k]=X[k]-E[k]H [k] = X [k] -E [k]
由于E[k]为高频谱,利用低通滤波器即可得到梅尔频率倒谱,进而得到梅尔频率倒谱图。Since E [k] is a high frequency spectrum, the Mel frequency cepstrum can be obtained by using a low-pass filter, and then the Mel frequency cepstrum chart can be obtained.
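A minimal sketch of this step, assuming a matrix of per-frame Mel spectra (frames × Mel bands): take the logarithm, apply the inverse transform (implemented here, as is common, with a type-II DCT), and keep only the low-quefrency coefficients — the low-pass "lifter" that isolates H[k]. The number of retained coefficients (13) is an illustrative assumption.

```python
import numpy as np
from scipy.fftpack import dct

def mel_cepstrum(mel_spectra, n_keep=13):
    """Log Mel spectrum -> inverse transform -> low-quefrency coefficients."""
    log_mel = np.log(mel_spectra + 1e-10)                   # log X[k] = log H[k] + log E[k]
    cepstrum = dct(log_mel, type=2, axis=1, norm='ortho')   # product becomes a sum in the cepstral domain
    return cepstrum[:, :n_keep]                              # low-pass lifter keeps the envelope part H[k]
```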
S1230、从梅尔频率倒谱中提取梅尔频率倒谱系数图。S1230. Extract a Mel frequency cepstrum coefficient map from the Mel frequency cepstrum.
由梅尔频率倒谱图中提取倒谱频率的变化趋势,从而得到梅尔频率倒谱系数图。The change trend of cepstrum frequency is extracted from the Mel frequency cepstrum chart, thereby obtaining a Mel frequency cepstrum coefficient chart.
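In practice, steps S1210 to S1230 can be reproduced with an audio library; the following sketch assumes the librosa package, and the number of Mel coefficients and the hop length are illustrative choices rather than values fixed by this embodiment.

```python
import librosa

def mfcc_map(path, sr=22050, frame_ms=32):
    """Load an audio file and compute its Mel frequency cepstrum coefficient map."""
    y, sr = librosa.load(path, sr=sr)
    n_fft = int(sr * frame_ms / 1000)            # 32 ms frames as above
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20,
                                n_fft=n_fft, hop_length=n_fft // 2,
                                window='hamming')
    return mfcc                                   # coefficients over time: the MFCC map
```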
本实施例方式中,还包括音质评价模型的训练方法,具体请参阅图3,图3为本申请实施例音质评价模型的训练方法的基本流程示意图。The method of this embodiment further includes a training method of a sound quality evaluation model. Please refer to FIG. 3 for details. FIG. 3 is a schematic flowchart of a training method of a sound quality evaluation model according to an embodiment of the present application.
如图3所示,包括如下步骤:As shown in Figure 3, it includes the following steps:
S1311、获取训练样本集;S1311. Obtain a training sample set;
训练样本集包括从多段音质流畅的音频中提取的多张梅尔频率倒谱系数图。本申请的一个实施例，从2000首清晰流畅的录音中提取6000个时长为5秒的短音频作为训练数据源。从训练数据源中提取任意多个短音频作为训练数据，从训练数据的每个音频中提取各自的梅尔频率倒谱系数图，得到训练样本集。其中，从训练数据的每个音频中提取各自的梅尔频率倒谱系数图的方法请参照上述实施例，在此不再赘述。The training sample set includes multiple Mel frequency cepstrum coefficient maps extracted from multiple pieces of audio with smooth sound quality. In an embodiment of the present application, 6000 short audio clips of 5 seconds each are extracted from 2000 clear and fluent recordings as the training data source. An arbitrary number of short clips are taken from the training data source as training data, and a Mel frequency cepstrum coefficient map is extracted from each clip, which yields the training sample set. For the method of extracting the Mel frequency cepstrum coefficient map from each clip, refer to the foregoing embodiment; details are not repeated here.
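A sketch of assembling such a training sample set, assuming a list of file paths to clear, fluent recordings and the librosa package; the 5-second clip length follows this embodiment, while the file handling and the number of coefficients are illustrative.

```python
import librosa
import numpy as np

def build_training_set(recording_paths, clip_seconds=5, sr=22050):
    """Cut each recording into 5-second clips and extract an MFCC map per clip."""
    samples = []
    for path in recording_paths:
        y, _ = librosa.load(path, sr=sr)
        clip_len = clip_seconds * sr
        for start in range(0, len(y) - clip_len + 1, clip_len):
            clip = y[start:start + clip_len]                     # one 5-second short audio clip
            samples.append(librosa.feature.mfcc(y=clip, sr=sr, n_mfcc=20))
    return np.stack(samples)                                     # the training sample set
```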
S1312、由预设的卷积神经网络模型获取多张梅尔频率倒谱系数图的期望值;S1312. Obtain the expected values of multiple Mel frequency cepstrum coefficient graphs by a preset convolutional neural network model;
具体地，获取梅尔频率倒谱系数图的期望值的方法，即步骤S1312包括如下步骤：Specifically, the method for obtaining the expected values of the Mel frequency cepstrum coefficient maps, that is, step S1312, includes the following steps:
步骤一、将多张梅尔频率倒谱系数图依次输入到预设的卷积神经网络模型中，分别获取多张梅尔频率倒谱系数图的输出值；Step 1: input the multiple Mel frequency cepstrum coefficient maps into the preset convolutional neural network model in sequence, and obtain the output value of each map;
步骤二、以数值为限定条件对输出值进行排序;Step 2: Sort the output values by using the numerical value as a limiting condition;
步骤三、确认排序结果中处于中间位置的输出值为多张梅尔频率倒谱系数图的期望输出值。Step 3: Confirm that the output value at the middle position in the ranking result is the expected output value of multiple Mel frequency cepstrum coefficient graphs.
需要说明的是，梅尔频率倒谱系数图的选取个数可以自定义设置，个数越多，评价模型的评价指数越准确。It should be noted that the number of Mel frequency cepstrum coefficient maps selected can be customized; the larger the number, the more accurate the evaluation index of the evaluation model.
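Steps 1 to 3 amount to taking the median of the model outputs over the selected coefficient maps; a minimal sketch, assuming the model is any callable that maps one coefficient map to a scalar output value:

```python
import numpy as np

def expected_value(model, mfcc_maps):
    """Steps 1-3: run each map through the model, sort the outputs, take the middle value."""
    outputs = np.array([float(model(m)) for m in mfcc_maps])  # step 1: output values
    outputs.sort()                                             # step 2: sort by numerical value
    return outputs[len(outputs) // 2]                          # step 3: middle (median) value
```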
S1313、将训练样本集输入到卷积神经网络模型中,获取卷积神经网络模型的激励值;S1313: Input the training sample set into the convolutional neural network model, and obtain the excitation value of the convolutional neural network model;
将训练样本集的梅尔频率倒谱系数图依次输入到神经网络模型中,神经网络模型对梅尔频率倒谱系数图进行特征提取。The Mel frequency cepstrum coefficient map of the training sample set is sequentially input into the neural network model, and the neural network model performs feature extraction on the Mel frequency cepstrum coefficient map.
需要说明的是，本实施例中，卷积层神经网络包括四层双卷积层、四层池化层以及全连接层，在特征提取过程中，卷积层中的卷积核从训练样本集中提取特征，以此得到卷积中每个单元的权重。为了使模型更加准确，利用预设的激活函数限定输出值的范围。在池化层中，利用卷积层提取的权重对梅尔频率倒谱系数图降低像素，并为了使模型更加稳定不依赖于训练数据可以按照预设的丢弃概率随机丢弃池化层的输出值。全连接层用于将最后得到的值输出到分类器，在分类器中进行归一化处理，得到激励值。It should be noted that, in this embodiment, the convolutional neural network includes four double convolutional layers, four pooling layers and a fully connected layer. During feature extraction, the convolution kernels in the convolutional layers extract features from the training sample set, thereby obtaining the weight of each unit in the convolution. To make the model more accurate, a preset activation function is used to limit the range of the output values. In the pooling layers, the weights extracted by the convolutional layers are used to downsample the Mel frequency cepstrum coefficient map, and, to make the model more stable and less dependent on the training data, the outputs of the pooling layer may be randomly dropped according to a preset dropout probability. The fully connected layer outputs the final values to the classifier, where they are normalized to obtain the excitation value.
本申请的一个实施方式，在第一卷积层中输入梅尔倒谱图，采用32个感受野为3*3，步长为1的滤波器提取特征，并在第一池化层输出，按照预设的丢弃概率0.25随机丢弃池化层的输出值。需要说明的是，在第四层的池化层输出后，由于全连接层容易出现过度拟合，因此，在全连接层按照0.5的丢弃概率随机丢弃输出值，然后由全连接层将池化层剩余的输出值输出至分类器。In an embodiment of the present application, the Mel cepstrum map is input to the first convolutional layer, features are extracted using 32 filters with a 3×3 receptive field and a stride of 1, and the result is output at the first pooling layer, where the outputs of the pooling layer are randomly dropped with a preset dropout probability of 0.25. It should be noted that, after the output of the fourth pooling layer, because the fully connected layer is prone to overfitting, outputs are randomly dropped at the fully connected layer with a dropout probability of 0.5, and the fully connected layer then passes the remaining pooling-layer outputs to the classifier.
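The architecture described above (four blocks of double convolutional layers followed by pooling, dropout 0.25 after each pooling layer, dropout 0.5 before the classifier, 32 filters of 3×3 receptive field and stride 1 in the first block) can be sketched as follows. This is a minimal illustration assuming Keras; the filter counts of the later blocks, the input map size and the sigmoid output are assumptions not specified in this embodiment.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_quality_model(input_shape=(20, 216, 1)):
    model = keras.Sequential()
    model.add(keras.Input(shape=input_shape))
    for filters in [32, 64, 128, 256]:                  # only the first block's 32 filters are stated above
        # Double convolutional layer: 3x3 receptive field, stride 1.
        model.add(layers.Conv2D(filters, 3, strides=1, padding='same', activation='relu'))
        model.add(layers.Conv2D(filters, 3, strides=1, padding='same', activation='relu'))
        model.add(layers.MaxPooling2D(pool_size=2, padding='same'))
        model.add(layers.Dropout(0.25))                 # drop pooling outputs with probability 0.25
    model.add(layers.Flatten())
    model.add(layers.Dropout(0.5))                      # drop with probability 0.5 before the classifier
    model.add(layers.Dense(1, activation='sigmoid'))    # normalized score: probability of fluent audio
    return model
```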
其中，激励值是卷积神经网络模型根据输入的梅尔频率倒谱系数图输出的激励数据，在神经网络模型未被训练至收敛之前，激励值为离散性较大的数值，当神经网络模型被训练至收敛之后，激励值为相对稳定的数据。The excitation value is the excitation data output by the convolutional neural network model for the input Mel frequency cepstrum coefficient map. Before the neural network model has been trained to convergence, the excitation values are highly dispersed; after the model has been trained to convergence, the excitation values are relatively stable.
S1314、比对期望值与激励值之间的距离是否小于或等于预设的第一阈值，并当期望值与激励值之间的距离大于第一阈值时，反复循环迭代的通过反向算法更新卷积神经网络模型中的权重，至期望值与激励值之间的距离小于或等于预设的第一阈值时结束。S1314. Compare whether the distance between the expected value and the excitation value is less than or equal to a preset first threshold; when the distance between the expected value and the excitation value is greater than the first threshold, iteratively update the weights in the convolutional neural network model through a back-propagation algorithm, and stop when the distance between the expected value and the excitation value is less than or equal to the preset first threshold.
通过损失函数判断神经网络模型全连接层输出的激励值与设定的期望分类值是否一致，当结果不一致时，需要通过反向传播算法对第一通道内的权重进行调整。A loss function is used to judge whether the excitation value output by the fully connected layer of the neural network model is consistent with the set expected value; when they are not consistent, the back-propagation algorithm is used to adjust the weights in the network accordingly.
在一些实施方式中，损失函数通过计算激励值与设定的期望值之间的距离(欧氏距离或者空间距离)，来确定激励值与设定的期望值是否一致，设定第一阈值(例如，0.05)，当激励值与设定的期望分类值之间的距离小于或等于第一阈值时，则确定激励值与设定的期望值一致，否则，则激励值与设定的期望值不一致。In some implementations, the loss function determines whether the excitation value is consistent with the set expected value by calculating the distance (Euclidean or other spatial distance) between them, and a first threshold is set (for example, 0.05). When the distance between the excitation value and the set expected value is less than or equal to the first threshold, the excitation value is determined to be consistent with the set expected value; otherwise, it is inconsistent.
当神经网络模型的激励值与设定的期望值不一致时，需要采用随机梯度下降算法对神经网络模型中的权重进行校正，以使卷积神经网络模型的输出结果与分类判断信息的期望结果相同。通过若干训练样本集(在一些实施方式中，训练时将所有训练样本集内的图片打乱进行训练，以增加模型的抗干扰能力，增强输出的稳定性。)的反复的训练与校正，当神经网络模型输出值与各训练样本的参照信息比对达到(不限于)99.5%时，训练结束。When the excitation value of the neural network model is inconsistent with the set expected value, a stochastic gradient descent algorithm is used to correct the weights in the neural network model so that the output of the convolutional neural network model matches the expected result of the classification judgment information. Through repeated training and correction over several training sample sets (in some implementations, the images in all training sample sets are shuffled during training to increase the model's robustness to interference and enhance the stability of the output), training ends when the agreement between the model's outputs and the reference information of the training samples reaches, for example (but not limited to), 99.5%.
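A compact sketch of the training procedure in S1313–S1314 under the stated values (first threshold 0.05, stochastic gradient descent, shuffled samples), assuming the Keras-style model above; the learning rate, batch size and epoch limit are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras

def train_until_converged(model, x_train, y_expected, threshold=0.05, max_epochs=200):
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
                  loss='mse')                          # distance between excitation and expected values
    for _ in range(max_epochs):
        idx = np.random.permutation(len(x_train))      # shuffle samples for robustness
        model.fit(x_train[idx], y_expected[idx], batch_size=32, epochs=1, verbose=0)
        excitation = model.predict(x_train, verbose=0).ravel()
        distance = np.mean(np.abs(excitation - y_expected))
        if distance <= threshold:                      # first threshold reached: stop training
            break
    return model
```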
为了评价准确,将待评价音乐的音频的梅尔频率倒谱系数图输入到预设的音质评价模型中,得到待评价音乐的音频信息的评价信息。具体本申请实施例提供一种利用音质评价模型对待评价音乐的音频的梅尔频率倒谱系数图进行评价的方法。如图4所示,图4示出了本申请实施例利用音质评价模型对待评价音乐的音频的梅尔频率倒谱系数图进行评价的方法的基本流程示意图。In order to evaluate accurately, a Mel frequency cepstrum coefficient map of the audio of the music to be evaluated is input into a preset sound quality evaluation model to obtain evaluation information of audio information of the music to be evaluated. Specifically, the embodiment of the present application provides a method for evaluating a Mel frequency cepstrum coefficient map of audio of music to be evaluated by using a sound quality evaluation model. As shown in FIG. 4, FIG. 4 shows a schematic flowchart of a method for evaluating a Mel frequency cepstrum coefficient map of audio of music to be evaluated using a sound quality evaluation model according to an embodiment of the present application.
如图4所示,步骤S1300包括:As shown in FIG. 4, step S1300 includes:
S1321、获取音质评价模型的输出值;S1321. Obtain the output value of the sound quality evaluation model.
将待评价音乐的音频的梅尔倒谱系数图输入到音质评价模型中进行计算，得到音质评价模型的输出值。由于音质评价模型是由语音流畅的音频训练得到的，其输出的结果表示属于语音流畅的音频的概率。因此，其输出值越大表示待评价语音越流畅，质量越高，输出值越小表示待评价音乐的音频的质量越低。The Mel cepstrum coefficient map of the audio of the music to be evaluated is input into the sound quality evaluation model for calculation, and the output value of the model is obtained. Because the sound quality evaluation model is trained on audio with fluent speech, its output represents the probability that the input belongs to fluent audio. Therefore, a larger output value indicates smoother, higher-quality audio, while a smaller output value indicates lower quality of the audio of the music to be evaluated.
S1322、在评价列表中查找与输出值具有映射关系的评价指数。S1322. Find an evaluation index having a mapping relationship with the output value in the evaluation list.
评价指数为衡量待评价音乐的音频质量的指数，可以进行自定义设置，可以采用字母表示，例如，ABCDEF依次表示质量由高到低；也可以用分数表示，例如，满分100分，分数越高，待评价音乐的音频的质量越高。The evaluation index is an index that measures the audio quality of the music to be evaluated. It can be customized and may be expressed with letters, for example, A, B, C, D, E, F indicating quality from high to low; it may also be expressed as a score, for example, out of 100 points, where a higher score indicates higher audio quality of the music to be evaluated.
评价列表为表示音质评价模型的输出值与评价指数的映射关系的列表,利用输出值可以通过评价列表查找对应的评价指数。The evaluation list is a list showing the mapping relationship between the output value of the sound quality evaluation model and the evaluation index. Using the output value, the corresponding evaluation index can be found through the evaluation list.
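The evaluation list can be as simple as a threshold table mapping the model's output value to a grade; the sketch below assumes the letter-grade scheme mentioned above, with boundary values chosen purely for illustration.

```python
# Hypothetical evaluation list: lower bound of the output value -> evaluation index.
EVALUATION_LIST = [(0.95, 'A'), (0.85, 'B'), (0.70, 'C'),
                   (0.50, 'D'), (0.30, 'E'), (0.0, 'F')]

def evaluation_index(output_value):
    """Look up the evaluation index that has a mapping relationship with the output value."""
    for lower_bound, grade in EVALUATION_LIST:
        if output_value >= lower_bound:
            return grade
    return 'F'
```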
本申请实施例的一个应用场景，用户在音乐播放的应用软件中搜索目标音频以进行播放。由于目标音频的版本众多，同时商家为了流量，网络上还有很多与目标音频的关键词相同的低质量音频，因此，用户在音乐播放软件中输入目标音频的关键词后，会出现大量与关键词匹配的音频，使得用户无从选择。本申请实施例提供一种音频播放方法，如图5所示，图5为音频播放方法的基本流程示意图。In an application scenario of the embodiments of the present application, a user searches for target audio in a music playback application in order to play it. Because there are many versions of the target audio, and because merchants seeking traffic upload many low-quality audio files carrying the same keywords as the target audio, a large number of keyword-matched audio files appear after the user enters the keywords of the target audio in the music playback software, leaving the user unable to choose. An embodiment of the present application therefore provides an audio playback method, as shown in FIG. 5, which is a schematic flowchart of the basic process of the audio playback method.
如图5所示,步骤S1300之后,还包括:As shown in FIG. 5, after step S1300, the method further includes:
S1331、获取播放指令;S1331. Obtain a playback instruction.
播放指令为使待播放音频进行播放的指令，播放指令可以通过单击待播放音频触发。The playback instruction is an instruction that causes the audio to be played to start playing; it can be triggered by clicking the audio to be played.
S1332、根据播放指令获取待播放音频的评价指数,并与预设的指数阈值进行比较;S1332. Obtain the evaluation index of the audio to be played according to the playback instruction, and compare it with a preset index threshold;
终端获取播放指令后，根据播放指令获取待播放音频的质量指数。需要说明的是，质量指数可以预存于每个待播放音频的信息中，在获取到播放指令后直接调取质量指数；也可以是终端根据获取的播放指令实时的利用音质评价模型对待播放音频进行评价，以得到质量指数。After the terminal obtains the playback instruction, it obtains the quality index of the audio to be played according to the instruction. It should be noted that the quality index may be pre-stored in the information of each audio item to be played and retrieved directly once the playback instruction is received; alternatively, the terminal may, according to the received playback instruction, evaluate the audio to be played in real time using the sound quality evaluation model to obtain the quality index.
S1333、当待播放音频的评价指数大于或等于指数阈值时,播放待播放音频。S1333. When the evaluation index of the audio to be played is greater than or equal to the index threshold, play the audio to be played.
终端预先设置关于音频播放的指数阈值，例如，当音频的质量指数大于95分才可以播放。终端将待播放音频的质量指数与指数阈值进行比较，当大于指数阈值时播放待播放音频，如此，终端通过音质评价模型对应用软件中的音频质量进行筛选，一方面可以提高用户的听觉体验，另一方面为用户挑选节省了时间。The terminal presets an index threshold for audio playback; for example, audio may be played only when its quality index is greater than 95 points. The terminal compares the quality index of the audio to be played with the index threshold and plays the audio when the index exceeds the threshold. In this way, the terminal uses the sound quality evaluation model to filter audio quality within the application software, which on the one hand improves the user's listening experience and on the other hand saves the user time in making a selection.
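Steps S1331 to S1333 amount to a gate before playback; a minimal sketch, assuming each track carries a quality index out of 100 and using the example threshold of 95 (the track and player objects are hypothetical):

```python
INDEX_THRESHOLD = 95  # example threshold from the embodiment above

def handle_play_request(track, player):
    """Play the requested audio only if its quality index meets the threshold."""
    if track['quality_index'] >= INDEX_THRESHOLD:
        player.play(track)
        return True
    return False  # fall back to the keyword search described in the next embodiment
```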
本申请实施例提供了另一种音频播放方法，如图6所示，图6为音频播放方法的基本流程示意图。An embodiment of the present application provides another audio playback method, as shown in FIG. 6, which is a schematic flowchart of the basic process of the audio playback method.
如图6所示,步骤S1332之后,还包括:As shown in FIG. 6, after step S1332, the method further includes:
S1334、当待评价音乐的音频的评价指数小于指数阈值时,根据待播放音频的关键词在预设的数据库中查找与关键词匹配的音频信息;S1334. When the evaluation index of the audio of the music to be evaluated is less than the index threshold, search for a preset database of audio information matching the keywords according to the keywords of the audio to be played;
S1335、显示音频信息。S1335: Display audio information.
当显示音频信息时，可以按照质量指数由高到低排列显示，以便于用户挑选，进一步提高用户体验。When the audio information is displayed, it can be shown in descending order of quality index, so as to facilitate user selection and further improve the user experience.
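A minimal sketch of the fallback behaviour in S1334–S1335, assuming a simple in-memory database whose records carry hypothetical 'keywords' and 'quality_index' fields; matches are displayed in descending order of quality index as described above.

```python
def find_alternatives(keyword, database):
    """Find audio matching the keyword, ordered by quality index from high to low."""
    matches = [a for a in database if keyword in a['keywords']]
    return sorted(matches, key=lambda a: a['quality_index'], reverse=True)

# Example with a tiny hypothetical database.
db = [{'title': 'Song v1', 'keywords': 'song', 'quality_index': 97},
      {'title': 'Song v2', 'keywords': 'song', 'quality_index': 62}]
print([a['title'] for a in find_alternatives('song', db)])  # ['Song v1', 'Song v2']
```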
为解决上述技术问题本申请实施例还提供一种音乐质量评价装置。具体请参阅图7,图7为本实施例音乐质量评价装置基本结构框图。In order to solve the above technical problems, an embodiment of the present application further provides a music quality evaluation device. For details, please refer to FIG. 7, which is a block diagram of the basic structure of the music quality evaluation device of this embodiment.
如图7所示,一种音乐质量评价装置,包括:获取模块2100、处理模块2200和执行模块2300。其中,获取模块,用于获取待评价音乐的音频信息;处理模块,用于以频率为限定条件将所述待评价音乐的音频信息转化为频率图谱;执行模块,用于将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中,得到所述待评价音乐的音频信息的评价信息,其中,所述音质评价模型为预先训练至收敛的卷积神经网络模型。As shown in FIG. 7, a music quality evaluation device includes an acquisition module 2100, a processing module 2200, and an execution module 2300. Wherein, an obtaining module is used to obtain audio information of the music to be evaluated; a processing module is used to convert audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; an executing module is used to convert the music to be evaluated The frequency spectrum of the audio information is input into a preset sound quality evaluation model to obtain the evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.
音乐质量评价装置将待评价音乐的音频信息转化为频率图谱，并通过由卷积神经网络模型训练得到的音质评价模型对该频率图谱进行评价，得到每段音乐的评价信息，如此，可以便于用户根据评价信息来筛选音乐，避免了低质量音乐对用户的干扰，净化了网络环境。The music quality evaluation device converts the audio information of the music to be evaluated into a frequency map and evaluates the frequency map with a sound quality evaluation model obtained by training a convolutional neural network, obtaining evaluation information for each piece of music. In this way, users can conveniently filter music according to the evaluation information, which avoids the interference of low-quality music with users and purifies the network environment.
在一些实施方式中，音乐质量评价装置中的处理模块包括：第一获取子模块，用于获取所述待评价音乐的音频信息的梅尔频率；第一处理子模块，用于根据所述梅尔频率的图谱获取梅尔频率倒谱；第一执行子模块，用于从所述梅尔频率倒谱中提取梅尔频率倒谱系数图。In some implementations, the processing module of the music quality evaluation device includes: a first acquisition sub-module, configured to obtain the Mel frequency of the audio information of the music to be evaluated; a first processing sub-module, configured to obtain a Mel frequency cepstrum according to the map of the Mel frequency; and a first execution sub-module, configured to extract a Mel frequency cepstrum coefficient map from the Mel frequency cepstrum.
在一些实施方式中，所述执行模块具体包括：第二获取子模块，用于获取所述音质评价模型的输出值；第二执行子模块，用于在评价列表中查找与所述输出值具有映射关系的评价指数。In some implementations, the execution module specifically includes: a second acquisition sub-module, configured to obtain the output value of the sound quality evaluation model; and a second execution sub-module, configured to find, in the evaluation list, the evaluation index having a mapping relationship with the output value.
在一些实施方式中，当用户搜索目标音频时，所述音乐质量评价装置还包括：第三获取子模块，用于获取播放指令；第二处理子模块，用于根据所述播放指令获取待播放音频的评价指数，并与预设的指数阈值进行比较；第三执行子模块，用于当所述待播放音频的评价指数大于或等于所述指数阈值时，播放所述待播放音频。In some implementations, when the user searches for target audio, the music quality evaluation device further includes: a third acquisition sub-module, configured to obtain a playback instruction; a second processing sub-module, configured to obtain the evaluation index of the audio to be played according to the playback instruction and compare it with a preset index threshold; and a third execution sub-module, configured to play the audio to be played when its evaluation index is greater than or equal to the index threshold.
在一些实施方式中，所述播放指令包括：待播放音频的关键词；所述音乐质量评价装置还包括：第三处理子模块，用于当所述待评价音乐的音频的评价指数小于所述指数阈值时，根据所述待播放音频的关键词在预设的数据库中查找与所述关键词匹配的音频信息；第四执行子模块，用于显示所述音频信息。In some implementations, the playback instruction includes a keyword of the audio to be played, and the music quality evaluation device further includes: a third processing sub-module, configured to search a preset database for audio information matching the keyword of the audio to be played when the evaluation index of the audio of the music to be evaluated is less than the index threshold; and a fourth execution sub-module, configured to display the audio information.
在一些实施方式中，音乐质量评价装置还包括：第四获取子模块，用于获取训练样本集，所述训练样本集包括从多段音质流畅的音频中提取的多张梅尔频率倒谱系数图；第四处理子模块，用于由预设的所述卷积神经网络模型获取所述多张梅尔频率倒谱系数图的期望值；第五处理子模块，用于将所述训练样本集输入到所述卷积神经网络模型中，获取所述卷积神经网络模型的激励值；第五执行子模块，用于比对所述期望值与所述激励值之间的距离是否小于或等于预设的第一阈值，并当所述期望值与所述激励值之间的距离大于所述第一阈值时，反复循环迭代的通过反向算法更新所述卷积神经网络模型中的权重，至所述期望值与所述激励值之间的距离小于或等于预设的第一阈值时结束。In some implementations, the music quality evaluation device further includes: a fourth acquisition sub-module, configured to obtain a training sample set, the training sample set including multiple Mel frequency cepstrum coefficient maps extracted from multiple pieces of audio with smooth sound quality; a fourth processing sub-module, configured to obtain the expected values of the multiple Mel frequency cepstrum coefficient maps from the preset convolutional neural network model; a fifth processing sub-module, configured to input the training sample set into the convolutional neural network model and obtain the excitation value of the convolutional neural network model; and a fifth execution sub-module, configured to compare whether the distance between the expected value and the excitation value is less than or equal to a preset first threshold, and, when the distance is greater than the first threshold, to iteratively update the weights in the convolutional neural network model through a back-propagation algorithm until the distance between the expected value and the excitation value is less than or equal to the preset first threshold.
在一些实施方式中，第四处理子模块，具体包括：第六获取子模块，用于将所述多张梅尔频率倒谱系数图依次输入到预设的卷积神经网络模型中，分别获取所述多张梅尔频率倒谱系数图的输出值；第六处理子模块，用于以数值为限定条件对所述输出值进行排序；第六执行子模块，用于确认排序结果中处于中间位置的输出值为所述多张梅尔频率倒谱系数图的期望输出值。In some implementations, the fourth processing sub-module specifically includes: a sixth acquisition sub-module, configured to input the multiple Mel frequency cepstrum coefficient maps into the preset convolutional neural network model in sequence and obtain the output value of each map; a sixth processing sub-module, configured to sort the output values by numerical value; and a sixth execution sub-module, configured to take the output value in the middle position of the sorted result as the expected output value of the multiple Mel frequency cepstrum coefficient maps.
为解决上述技术问题,本申请实施例还提供计算机设备。具体请参阅图8,图8为本实施例计算机设备基本结构框图。In order to solve the above technical problems, embodiments of the present application further provide computer equipment. For details, please refer to FIG. 8, which is a block diagram of the basic structure of the computer device of this embodiment.
如图8所示，计算机设备的内部结构示意图。如图8所示，该计算机设备包括通过系统总线连接的处理器、非易失性存储介质、存储器和网络接口。其中，该计算机设备的非易失性存储介质存储有操作系统、数据库和计算机可读指令，数据库中可存储有控件信息序列，该计算机可读指令被处理器执行时，可使得处理器实现一种音乐质量评价方法。该计算机设备的处理器用于提供计算和控制能力，支撑整个计算机设备的运行。该计算机设备的存储器中可存储有计算机可读指令，该计算机可读指令被处理器执行时，可使得处理器执行一种音乐质量评价方法。该计算机设备的网络接口用于与终端连接通信。本领域技术人员可以理解，图8中示出的结构，仅仅是与本申请方案相关的部分结构的框图，并不构成对本申请方案所应用于其上的计算机设备的限定，具体的计算机设备可以包括比图中所示更多或更少的部件，或者组合某些部件，或者具有不同的部件布置。FIG. 8 is a schematic diagram of the internal structure of the computer device. As shown in FIG. 8, the computer device includes a processor, a non-volatile storage medium, a memory and a network interface connected through a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer-readable instructions; the database may store sequences of control information, and when the computer-readable instructions are executed by the processor, they cause the processor to implement a music quality evaluation method. The processor of the computer device provides computing and control capabilities and supports the operation of the entire device. The memory of the computer device may store computer-readable instructions that, when executed by the processor, cause the processor to perform a music quality evaluation method. The network interface of the computer device is used to connect and communicate with a terminal. Those skilled in the art will understand that the structure shown in FIG. 8 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
本实施方式中处理器用于执行图7中获取模块2100、处理模块2200和执行模块2300的具体内容，存储器存储有执行上述模块所需的程序代码和各类数据。网络接口用于向用户终端或服务器之间的数据传输。本实施方式中的存储器存储有音乐质量评价方法中执行所有子模块所需的程序代码及数据，服务器能够调用服务器的程序代码及数据执行所有子模块的功能。In this implementation, the processor is configured to execute the specific content of the acquisition module 2100, the processing module 2200 and the execution module 2300 in FIG. 7, and the memory stores the program code and data required to run these modules. The network interface is used for data transmission with user terminals or servers. The memory in this implementation stores the program code and data required to execute all sub-modules of the music quality evaluation method, and the server can invoke this program code and data to perform the functions of all sub-modules.
计算机设备将待评价音乐的音频信息转化为频率图谱，并通过由卷积神经网络模型训练得到的音质评价模型对该频率图谱进行评价，得到每段音乐的评价信息，如此，可以便于用户根据评价信息来筛选音乐，避免了低质量音乐对用户的干扰，净化了网络环境。The computer device converts the audio information of the music to be evaluated into a frequency map and evaluates the frequency map with a sound quality evaluation model obtained by training a convolutional neural network, obtaining evaluation information for each piece of music. In this way, users can conveniently filter music according to the evaluation information, which avoids the interference of low-quality music with users and purifies the network environment.
本申请还提供一种存储有计算机可读指令的存储介质，所述计算机可读指令被一个或多个处理器执行时，使得一个或多个处理器执行上述任一实施例所述音乐质量评价方法的步骤。The present application also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the music quality evaluation method described in any of the foregoing embodiments.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,该计算机程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,前述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)等非易失性存储介质,或随机存储记忆体(Random Access Memory,RAM)等。A person of ordinary skill in the art may understand that all or part of the processes in the methods of the foregoing embodiments may be implemented by using a computer program to instruct related hardware. The computer program may be stored in a computer-readable storage medium. When executed, the processes of the embodiments of the methods described above may be included. The foregoing storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (Random Access Memory, RAM).
应该理解的是,虽然附图的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,其可以以其他的顺序执行。而且,附图的流程图中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,其执行顺序也不必然是依次进行,而是可以与其他步骤或者其他步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。It should be understood that although the steps in the flowchart of the drawings are sequentially displayed in accordance with the directions of the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited, and they can be performed in other orders. Moreover, at least a part of the steps in the flowchart of the drawing may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily performed at the same time, but may be performed at different times. The execution order is also It is not necessarily performed sequentially, but may be performed in turn or alternately with other steps or at least a part of the sub-steps or stages of other steps.
以上所述仅是本申请的部分实施方式，应当指出，对于本技术领域的普通技术人员来说，在不脱离本申请原理的前提下，还可以做出若干改进和润饰，这些改进和润饰也应视为本申请的保护范围。The above is only a partial implementation of the present application. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the principles of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of this application.

Claims (20)

  1. 一种音乐质量评价方法,包括下述步骤:A music quality evaluation method includes the following steps:
    获取待评价音乐的音频信息;Obtain audio information of the music to be evaluated;
    以频率为限定条件将所述待评价音乐的音频信息转化为频率图谱;Convert the audio information of the music to be evaluated into a frequency map with the frequency as a limiting condition;
    将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中，得到所述待评价音乐的音频信息的评价信息，其中，所述音质评价模型为预先训练至收敛的卷积神经网络模型。Inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model pre-trained to convergence.
  2. 根据权利要求1所述的音乐质量评价方法,所述以频率为限定条件将所述待评价音乐的音频信息转化为频率图谱,具体包括:The method for evaluating music quality according to claim 1, wherein the converting the audio information of the music to be evaluated into a frequency map with the frequency as a limiting condition specifically includes:
    获取所述待评价音乐的音频信息的梅尔频率;Acquiring a Mel frequency of the audio information of the music to be evaluated;
    根据所述梅尔频率的图谱获取梅尔频率倒谱;Obtaining cepstrum of Mel frequency according to the map of Mel frequency;
    从所述梅尔频率倒谱中提取梅尔频率倒谱系数图。A Mel frequency cepstrum coefficient map is extracted from the Mel frequency cepstrum.
  3. 根据权利要求1所述的音乐质量评价方法,所述将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中,得到所述待评价音乐的音频信息的评价信息,具体包括:The music quality evaluation method according to claim 1, wherein the frequency map of audio information of the music to be evaluated is input into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, specifically include:
    获取所述音质评价模型的输出值;Obtaining an output value of the sound quality evaluation model;
    在评价列表中查找与所述输出值具有映射关系的评价指数。Find an evaluation index having a mapping relationship with the output value in the evaluation list.
  4. 根据权利要求1所述的音乐质量评价方法，当用户搜索目标音频时，所述将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中，得到所述待评价音乐的音频信息的评价信息之后，还包括：The music quality evaluation method according to claim 1, wherein when a user searches for target audio, after the frequency map of the audio information of the music to be evaluated is input into the preset sound quality evaluation model and the evaluation information of the audio information of the music to be evaluated is obtained, the method further comprises:
    获取播放指令;Obtain playback instructions;
    根据所述播放指令获取待播放音频的评价指数,并与预设的指数阈值进行比较;Obtaining an evaluation index of the audio to be played according to the playback instruction, and comparing the evaluation index with a preset index threshold;
    当所述待播放音频的评价指数大于或等于所述指数阈值时,播放所述待播放音频。When the evaluation index of the audio to be played is greater than or equal to the index threshold, the audio to be played is played.
  5. 根据权利要求4所述的音乐质量评价方法，所述播放指令包括：待播放音频的关键词；根据所述播放指令获取待播放音频的评价指数，并与预设的指数阈值进行比较之后，还包括：The music quality evaluation method according to claim 4, wherein the playback instruction comprises a keyword of the audio to be played, and after the evaluation index of the audio to be played is obtained according to the playback instruction and compared with the preset index threshold, the method further comprises:
    当所述待评价音乐的音频的评价指数小于所述指数阈值时,根据所述待播放音频的关键词在预设的数据库中查找与所述关键词匹配的音频信息;When the evaluation index of the audio of the music to be evaluated is less than the index threshold, searching for audio information matching the keywords in a preset database according to the keywords of the audio to be played;
    显示所述音频信息。Displaying the audio information.
  6. 根据权利要求1~4任一项所述的音乐质量评价方法,所述音质评价模型的训练方法包括:The music quality evaluation method according to any one of claims 1 to 4, the training method of the sound quality evaluation model comprises:
    获取训练样本集,所述训练样本集包括从多段音质流畅的音频中提取的多张梅尔频率倒谱系数图;Acquiring a training sample set, where the training sample set includes multiple Mel frequency cepstrum coefficient maps extracted from multiple pieces of audio with smooth sound quality;
    由预设的所述卷积神经网络模型获取所述多张梅尔频率倒谱系数图的期望值;Obtaining an expected value of the plurality of Mel frequency cepstrum coefficient maps by using the preset convolutional neural network model;
    将所述训练样本集输入到所述卷积神经网络模型中,获取所述卷积神经网络模型的激励值;Input the training sample set into the convolutional neural network model, and obtain an excitation value of the convolutional neural network model;
    比对所述期望值与所述激励值之间的距离是否小于或等于预设的第一阈值，并当所述期望值与所述激励值之间的距离大于所述第一阈值时，反复循环迭代的通过反向算法更新所述卷积神经网络模型中的权重，至所述期望值与所述激励值之间的距离小于或等于预设的第一阈值时结束。Comparing whether the distance between the expected value and the excitation value is less than or equal to a preset first threshold, and when the distance between the expected value and the excitation value is greater than the first threshold, iteratively updating the weights in the convolutional neural network model through a back-propagation algorithm until the distance between the expected value and the excitation value is less than or equal to the preset first threshold.
  7. 根据权利要求6所述的音乐质量评价方法,所述由预设的所述卷积神经网络模型获取所述多张梅尔频率倒谱系数图的期望值,具体包括:The method for evaluating music quality according to claim 6, wherein the obtaining the expected values of the plurality of Mel frequency cepstrum coefficient maps by the preset convolutional neural network model specifically comprises:
    将所述多张梅尔频率倒谱系数图依次输入到预设的卷积神经网络模型中,分别获取所述多张梅尔频率倒谱系数图的输出值;Inputting the multiple Mel frequency cepstrum coefficient maps into a preset convolutional neural network model in turn, and respectively obtaining output values of the multiple Mel frequency cepstrum coefficient maps;
    以数值为限定条件对所述输出值进行排序;Sort the output values with a numerical value as a limiting condition;
    确认排序结果中处于中间位置的输出值为所述多张梅尔频率倒谱系数图的期望输出值。It is confirmed that the output value in the middle position in the ranking result is an expected output value of the multiple Mel frequency cepstrum coefficient graphs.
  8. 一种音乐质量评价装置,包括:A music quality evaluation device includes:
    获取模块,用于获取待评价音乐的音频信息;An acquisition module for acquiring audio information of the music to be evaluated;
    处理模块,用于以频率为限定条件将所述音频信息转化为频率图谱;A processing module, configured to convert the audio information into a frequency map with a frequency as a limiting condition;
    执行模块，用于将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中，得到所述待评价音乐的音频信息的评价信息，其中，所述音质评价模型为预先训练至收敛的卷积神经网络模型。an execution module, configured to input the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model pre-trained to convergence.
  9. 一种计算机设备，包括存储器和处理器，所述存储器中存储有计算机可读指令，所述计算机可读指令被所述处理器执行时，使得所述处理器执行一种音乐质量评价方法的下述步骤：A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps of a music quality evaluation method:
    获取待评价音乐的音频信息;Obtain audio information of the music to be evaluated;
    以频率为限定条件将所述待评价音乐的音频信息转化为频率图谱;Convert the audio information of the music to be evaluated into a frequency map with the frequency as a limiting condition;
    将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中，得到所述待评价音乐的音频信息的评价信息，其中，所述音质评价模型为预先训练至收敛的卷积神经网络模型。Inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model pre-trained to convergence.
  10. 根据权利要求9所述的计算机设备，所述以频率为限定条件将所述待评价音乐的音频信息转化为频率图谱，具体包括：The computer device according to claim 9, wherein the converting the audio information of the music to be evaluated into a frequency map with the frequency as a limiting condition specifically comprises:
    获取所述待评价音乐的音频信息的梅尔频率;Acquiring a Mel frequency of the audio information of the music to be evaluated;
    根据所述梅尔频率的图谱获取梅尔频率倒谱;Obtaining cepstrum of Mel frequency according to the map of Mel frequency;
    从所述梅尔频率倒谱中提取梅尔频率倒谱系数图。A Mel frequency cepstrum coefficient map is extracted from the Mel frequency cepstrum.
  11. 根据权利要求9所述的计算机设备,所述将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中,得到所述待评价音乐的音频信息的评价信息,具体包括:The computer device according to claim 9, wherein the inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain the evaluation information of the audio information of the music to be evaluated specifically comprises:
    获取所述音质评价模型的输出值;Obtaining an output value of the sound quality evaluation model;
    在评价列表中查找与所述输出值具有映射关系的评价指数。Find an evaluation index having a mapping relationship with the output value in the evaluation list.
  12. 根据权利要求9所述的计算机设备，当用户搜索目标音频时，所述将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中，得到所述待评价音乐的音频信息的评价信息之后，还包括：The computer device according to claim 9, wherein when a user searches for target audio, after the frequency map of the audio information of the music to be evaluated is input into the preset sound quality evaluation model and the evaluation information of the audio information of the music to be evaluated is obtained, the method further comprises:
    获取播放指令;Obtain playback instructions;
    根据所述播放指令获取待播放音频的评价指数,并与预设的指数阈值进行比较;Obtaining an evaluation index of the audio to be played according to the playback instruction, and comparing the evaluation index with a preset index threshold;
    当所述待播放音频的评价指数大于或等于所述指数阈值时,播放所述待播放音频。When the evaluation index of the audio to be played is greater than or equal to the index threshold, the audio to be played is played.
  13. 根据权利要求12所述的计算机设备,所述播放指令包括:待播放音频的关键词;根据所述播放指令获取待播放音频的评价指数,并与预设的指数阈值进行比较之后,还包括:The computer device according to claim 12, wherein the playback instruction comprises: a keyword of the audio to be played; after obtaining the evaluation index of the audio to be played according to the playback instruction and comparing it with a preset index threshold, further comprising:
    当所述待评价音乐的音频的评价指数小于所述指数阈值时,根据所述待播放音频的关键词在预设的数据库中查找与所述关键词匹配的音频信息;When the evaluation index of the audio of the music to be evaluated is less than the index threshold, searching for audio information matching the keywords in a preset database according to the keywords of the audio to be played;
    显示所述音频信息。Displaying the audio information.
  14. 根据权利要求9~12任一项所述的计算机设备,所述音质评价模型的训练方法包括:The computer device according to any one of claims 9 to 12, wherein the method for training the sound quality evaluation model comprises:
    获取训练样本集,所述训练样本集包括从多段音质流畅的音频中提取的多张梅尔频率倒谱系数图;Acquiring a training sample set, where the training sample set includes multiple Mel frequency cepstrum coefficient maps extracted from multiple pieces of audio with smooth sound quality;
    由预设的所述卷积神经网络模型获取所述多张梅尔频率倒谱系数图的期望值;Obtaining an expected value of the plurality of Mel frequency cepstrum coefficient maps by using the preset convolutional neural network model;
    将所述训练样本集输入到所述卷积神经网络模型中,获取所述卷积神经网络模型的激励值;Input the training sample set into the convolutional neural network model, and obtain an excitation value of the convolutional neural network model;
    比对所述期望值与所述激励值之间的距离是否小于或等于预设的第一阈值，并当所述期望值与所述激励值之间的距离大于所述第一阈值时，反复循环迭代的通过反向算法更新所述卷积神经网络模型中的权重，至所述期望值与所述激励值之间的距离小于或等于预设的第一阈值时结束。Comparing whether the distance between the expected value and the excitation value is less than or equal to a preset first threshold, and when the distance between the expected value and the excitation value is greater than the first threshold, iteratively updating the weights in the convolutional neural network model through a back-propagation algorithm until the distance between the expected value and the excitation value is less than or equal to the preset first threshold.
  15. 根据权利要求14所述的计算机设备,所述由预设的所述卷积神经网络模型获取所述多张梅尔频率倒谱系数图的期望值,具体包括:The computer device according to claim 14, wherein the obtaining the expected values of the plurality of Mel frequency cepstrum coefficient maps by the preset convolutional neural network model specifically comprises:
    将所述多张梅尔频率倒谱系数图依次输入到预设的卷积神经网络模型中,分别获取所述多张梅尔频率倒谱系数图的输出值;Inputting the multiple Mel frequency cepstrum coefficient maps into a preset convolutional neural network model in turn, and respectively obtaining output values of the multiple Mel frequency cepstrum coefficient maps;
    以数值为限定条件对所述输出值进行排序;Sort the output values with a numerical value as a limiting condition;
    确认排序结果中处于中间位置的输出值为所述多张梅尔频率倒谱系数图的期望输出值。It is confirmed that the output value in the middle position in the ranking result is an expected output value of the multiple Mel frequency cepstrum coefficient graphs.
  16. 一种存储有计算机可读指令的非易失性存储介质,所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行一种音乐质量评价方法的下述步骤:A non-volatile storage medium storing computer-readable instructions, when the computer-readable instructions are executed by one or more processors, cause the one or more processors to perform the following steps of a method for evaluating music quality :
    获取待评价音乐的音频信息;Obtain audio information of the music to be evaluated;
    以频率为限定条件将所述待评价音乐的音频信息转化为频率图谱;Convert the audio information of the music to be evaluated into a frequency map with the frequency as a limiting condition;
    将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中，得到所述待评价音乐的音频信息的评价信息，其中，所述音质评价模型为预先训练至收敛的卷积神经网络模型。Inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model pre-trained to convergence.
  17. 根据权利要求16所述的非易失性存储介质,所述以频率为限定条件将所述待评价音乐的音频信息转化为频率图谱,具体包括:The non-volatile storage medium according to claim 16, wherein the converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition, specifically comprising:
    获取所述待评价音乐的音频信息的梅尔频率;Acquiring a Mel frequency of the audio information of the music to be evaluated;
    根据所述梅尔频率的图谱获取梅尔频率倒谱;Obtaining cepstrum of Mel frequency according to the map of Mel frequency;
    从所述梅尔频率倒谱中提取梅尔频率倒谱系数图。A Mel frequency cepstrum coefficient map is extracted from the Mel frequency cepstrum.
  18. 根据权利要求16所述的非易失性存储介质，所述将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中，得到所述待评价音乐的音频信息的评价信息，具体包括：The non-volatile storage medium according to claim 16, wherein the inputting of the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated specifically comprises:
    获取所述音质评价模型的输出值;Obtaining an output value of the sound quality evaluation model;
    在评价列表中查找与所述输出值具有映射关系的评价指数。Find an evaluation index having a mapping relationship with the output value in the evaluation list.
  19. 根据权利要求16所述的非易失性存储介质，当用户搜索目标音频时，所述将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中，得到所述待评价音乐的音频信息的评价信息之后，还包括：The non-volatile storage medium according to claim 16, wherein when a user searches for target audio, after the frequency map of the audio information of the music to be evaluated is input into the preset sound quality evaluation model and the evaluation information of the audio information of the music to be evaluated is obtained, the method further comprises:
    获取播放指令;Obtain playback instructions;
    根据所述播放指令获取待播放音频的评价指数,并与预设的指数阈值进行比较;Obtaining an evaluation index of the audio to be played according to the playback instruction, and comparing the evaluation index with a preset index threshold;
    当所述待播放音频的评价指数大于或等于所述指数阈值时,播放所述待播放音频。When the evaluation index of the audio to be played is greater than or equal to the index threshold, the audio to be played is played.
  20. 根据权利要求19所述的非易失性存储介质，所述播放指令包括：待播放音频的关键词；根据所述播放指令获取待播放音频的评价指数，并与预设的指数阈值进行比较之后，还包括：The non-volatile storage medium according to claim 19, wherein the playback instruction comprises a keyword of the audio to be played, and after the evaluation index of the audio to be played is obtained according to the playback instruction and compared with the preset index threshold, the method further comprises:
    当所述待评价音乐的音频的评价指数小于所述指数阈值时,根据所述待播放音频的关键词在预设的数据库中查找与所述关键词匹配的音频信息;When the evaluation index of the audio of the music to be evaluated is less than the index threshold, searching for audio information matching the keywords in a preset database according to the keywords of the audio to be played;
    显示所述音频信息。Displaying the audio information.
PCT/CN2018/125449 2018-08-02 2018-12-29 Music quality evaluation method and apparatus, and computer device and storage medium WO2020024556A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810873498.0A CN109308913A (en) 2018-08-02 2018-08-02 Sound quality evaluation method, device, computer equipment and storage medium
CN201810873498.0 2018-08-02

Publications (1)

Publication Number Publication Date
WO2020024556A1 true WO2020024556A1 (en) 2020-02-06

Family

ID=65226059

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/125449 WO2020024556A1 (en) 2018-08-02 2018-12-29 Music quality evaluation method and apparatus, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN109308913A (en)
WO (1) WO2020024556A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488485A (en) * 2020-04-16 2020-08-04 北京雷石天地电子技术有限公司 Music recommendation method based on convolutional neural network, storage medium and electronic device

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109961802B (en) * 2019-03-26 2021-05-18 北京达佳互联信息技术有限公司 Sound quality comparison method, device, electronic equipment and storage medium
CN110197447B (en) * 2019-04-17 2022-09-30 哈尔滨沥海佳源科技发展有限公司 Communication index based online education method and device, electronic equipment and storage medium
CN110189771A (en) 2019-05-31 2019-08-30 腾讯音乐娱乐科技(深圳)有限公司 With the sound quality detection method, device and storage medium of source audio
CN110322894B (en) * 2019-06-27 2022-02-11 电子科技大学 Sound-based oscillogram generation and panda detection method
CN110675879B (en) * 2019-09-04 2023-06-23 平安科技(深圳)有限公司 Audio evaluation method, system, equipment and storage medium based on big data
CN110728966B (en) * 2019-09-12 2023-05-23 上海麦克风文化传媒有限公司 Audio album content quality evaluation method and system
CN112559794A (en) * 2019-09-25 2021-03-26 北京达佳互联信息技术有限公司 Song quality identification method, device, equipment and storage medium
CN110909202A (en) * 2019-10-28 2020-03-24 广州荔支网络技术有限公司 Audio value evaluation method and device and readable storage medium
CN111161759B (en) * 2019-12-09 2022-12-06 科大讯飞股份有限公司 Audio quality evaluation method and device, electronic equipment and computer storage medium
CN113593607A (en) * 2020-04-30 2021-11-02 北京破壁者科技有限公司 Audio processing method and device and electronic equipment
CN111768801A (en) * 2020-06-12 2020-10-13 瑞声科技(新加坡)有限公司 Airflow noise eliminating method and device, computer equipment and storage medium
CN114171062A (en) * 2020-09-10 2022-03-11 安克创新科技股份有限公司 Sound quality evaluation method, device and computer storage medium
CN112017986A (en) * 2020-10-21 2020-12-01 季华实验室 Semiconductor product defect detection method and device, electronic equipment and storage medium
CN112634928B (en) * 2020-12-08 2023-09-29 北京有竹居网络技术有限公司 Sound signal processing method and device and electronic equipment
CN113077815B (en) * 2021-03-29 2024-05-14 腾讯音乐娱乐科技(深圳)有限公司 Audio evaluation method and assembly
CN113192536B (en) * 2021-04-28 2023-07-28 北京达佳互联信息技术有限公司 Training method of voice quality detection model, voice quality detection method and device
CN113436644B (en) * 2021-07-16 2023-09-01 北京达佳互联信息技术有限公司 Sound quality evaluation method, device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0722164A1 (en) * 1995-01-10 1996-07-17 AT&T Corp. Method and apparatus for characterizing an input signal
CN104581758A (en) * 2013-10-25 2015-04-29 中国移动通信集团广东有限公司 Voice quality estimation method and device as well as electronic equipment
CN104992705A (en) * 2015-05-20 2015-10-21 普强信息技术(北京)有限公司 English oral automatic grading method and system
CN106531190A (en) * 2016-10-12 2017-03-22 科大讯飞股份有限公司 Speech quality evaluation method and device
CN108206027A (en) * 2016-12-20 2018-06-26 北京酷我科技有限公司 A kind of audio quality evaluation method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106816158B (en) * 2015-11-30 2020-08-07 华为技术有限公司 Voice quality assessment method, device and equipment
CN106558308B (en) * 2016-12-02 2020-05-15 深圳撒哈拉数据科技有限公司 Internet audio data quality automatic scoring system and method
CN106919662B (en) * 2017-02-14 2021-08-31 复旦大学 Music identification method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0722164A1 (en) * 1995-01-10 1996-07-17 AT&T Corp. Method and apparatus for characterizing an input signal
CN104581758A (en) * 2013-10-25 2015-04-29 中国移动通信集团广东有限公司 Voice quality estimation method and device as well as electronic equipment
CN104992705A (en) * 2015-05-20 2015-10-21 普强信息技术(北京)有限公司 English oral automatic grading method and system
CN106531190A (en) * 2016-10-12 2017-03-22 科大讯飞股份有限公司 Speech quality evaluation method and device
CN108206027A (en) * 2016-12-20 2018-06-26 北京酷我科技有限公司 A kind of audio quality evaluation method and system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488485A (en) * 2020-04-16 2020-08-04 北京雷石天地电子技术有限公司 Music recommendation method based on convolutional neural network, storage medium and electronic device
CN111488485B (en) * 2020-04-16 2023-11-17 北京雷石天地电子技术有限公司 Music recommendation method based on convolutional neural network, storage medium and electronic device

Also Published As

Publication number Publication date
CN109308913A (en) 2019-02-05

Similar Documents

Publication Publication Date Title
WO2020024556A1 (en) Music quality evaluation method and apparatus, and computer device and storage medium
US11875807B2 (en) Deep learning-based audio equalization
EP2612261B1 (en) Internet search related methods and apparatus
US8666963B2 (en) Method and apparatus for processing spoken search queries
US8990182B2 (en) Methods and apparatus for searching the Internet
US9679257B2 (en) Method and apparatus for adapting a context model at least partially based upon a context-related search criterion
Yang et al. Revisiting the problem of audio-based hit song prediction using convolutional neural networks
US20120060113A1 (en) Methods and apparatus for displaying content
US20120059658A1 (en) Methods and apparatus for performing an internet search
US20140201276A1 (en) Accumulation of real-time crowd sourced data for inferring metadata about entities
CN107705805B (en) Audio duplicate checking method and device
CN114443891B (en) Encoder generation method, fingerprint extraction method, medium, and electronic device
CN113257283B (en) Audio signal processing method and device, electronic equipment and storage medium
CN110287788A (en) A kind of video classification methods and device
Yang et al. Semi-supervised feature selection for audio classification based on constraint compensated Laplacian score
CN109360072B (en) Insurance product recommendation method and device, computer equipment and storage medium
CN111460215B (en) Audio data processing method and device, computer equipment and storage medium
CN111859008A (en) Music recommending method and terminal
CN115116469A (en) Feature representation extraction method, feature representation extraction device, feature representation extraction apparatus, feature representation extraction medium, and program product
CN114023289A (en) Music identification method and device and training method and device of music feature extraction model
CN113987258A (en) Audio identification method and device, readable medium and electronic equipment
WO2023160515A1 (en) Video processing method and apparatus, device and medium
Ramli et al. Bio-acoustic signal identification based on sparse representation classifier
CN114722234A (en) Music recommendation method, device and storage medium based on artificial intelligence
CN115881067A (en) Music genre classification method, system and medium based on Resnet101

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18928210

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18928210

Country of ref document: EP

Kind code of ref document: A1