WO2020024556A1 - Music quality evaluation method and apparatus, and computer device and storage medium

Music quality evaluation method and apparatus, and computer device and storage medium

Info

Publication number
WO2020024556A1
Authority
WO
WIPO (PCT)
Prior art keywords
music
audio
evaluated
audio information
frequency
Application number
PCT/CN2018/125449
Other languages
French (fr)
Chinese (zh)
Inventor
梅亚琦
刘奡智
王义文
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2020024556A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G10L25/45 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window

Definitions

  • the embodiments of the present application relate to the field of computers, and in particular, to a method, a device, a computer device, and a storage medium for evaluating music quality.
  • Digital music, as its name implies, is music stored as digital signals in databases and transmitted through the network; it can be downloaded and deleted quickly, according to people's needs. Digital music does not rely on traditional music carriers, such as magnetic tapes or CDs, which avoids wear and tear and preserves music quality.
  • The embodiment of the present application provides a method for evaluating, with a sound quality evaluation model, the frequency map obtained by converting the audio information of the music to be evaluated.
  • A technical solution adopted in the embodiment created by the present application is to provide a music quality evaluation method, which includes the following steps: acquiring audio information of the music to be evaluated; converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; and inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.
  • An embodiment of the present application further provides a music quality evaluation device, including: an acquisition module, configured to acquire audio information of the music to be evaluated; a processing module, configured to convert the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; and an execution module, configured to input the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.
  • An embodiment of the present application further provides a computer device including a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps of a music quality evaluation method: acquiring audio information of the music to be evaluated; converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; and inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.
  • An embodiment of the present application further provides a storage medium storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps of the music quality evaluation method: obtaining audio information of the music to be evaluated; converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; and inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.
  • In the embodiments of the present application, the audio information of the music to be evaluated is converted into a frequency map, and the frequency map is evaluated with a sound quality evaluation model trained from a convolutional neural network model to obtain evaluation information for each piece of music.
  • FIG. 1 is a schematic flowchart of a music quality evaluation method according to an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a method for converting audio information of music to be evaluated into a frequency map using frequency as a limiting condition according to an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of a method for training a music quality evaluation model according to an embodiment of the present application
  • FIG. 4 is a schematic flowchart of a method for evaluating a Mel frequency cepstrum coefficient diagram of audio of music to be evaluated using a sound quality evaluation model according to an embodiment of the present application;
  • FIG. 5 is a schematic flowchart of an audio playing method according to an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of another audio playing method according to an embodiment of the present application.
  • FIG. 7 is a block diagram of a basic structure of an audio quality evaluation device according to an embodiment of the present application.
  • FIG. 8 is a block diagram of a basic structure of a computer device according to an embodiment of the present application.
  • The terms "terminal" and "terminal equipment" used here cover both devices that have only a wireless signal receiver without any transmitting capability and devices with receiving and transmitting hardware capable of two-way communication over a two-way communication link.
  • Such equipment may include: a cellular or other communication device with a single-line display, a multi-line display, or no multi-line display; a PCS (Personal Communications Service) device that can combine voice, data processing, fax and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio frequency receiver, a pager, Internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; and a conventional laptop and/or palmtop computer or other device that has and/or includes a radio frequency receiver.
  • A terminal may be portable, transportable, installed in a vehicle (air, sea and/or land), or suitable and/or configured to operate locally and/or in a distributed fashion at any other location on Earth and/or in space.
  • The "terminal" and "terminal equipment" used herein may also be a communication terminal, an Internet terminal, or a music/video playback terminal, for example a PDA, a MID (Mobile Internet Device) and/or a mobile phone with music/video playback functions, or a device such as a smart TV or set-top box.
  • the client terminal in this embodiment is the terminal described above.
  • Specifically, FIG. 1 is a schematic flowchart of the music quality evaluation method according to this embodiment.
  • As shown in FIG. 1, the music quality evaluation method includes the following steps:
  • the audio information of the music to be evaluated includes the audio of the music to be evaluated, which may be a digital audio file generated from a digital signal, an audio file created by a musical instrument, an audio file spread on the Internet, or an audio file extracted from a video file.
  • The formats of these audio files include MP3, WAVE, WMA, VQF, MIDI, AIFF, MPEG, and so on.
  • a method for obtaining audio information of music to be evaluated includes directly obtaining audio information of music to be evaluated from a network or a local file, or obtaining audio information of music to be evaluated by extracting an audio file from a video file.
  • S1200 Convert the audio information of the music to be evaluated into a frequency map with the frequency as a limiting condition
  • The audio information of the music to be evaluated can be converted into a frequency map by spectrum application software, for example PC Sound Spectrum software, FFT spectrum analysis software, or SmaartLive software.
  • In practice, in order to make the frequencies in the frequency map continuous and clear, the audio of the music to be evaluated is usually pre-emphasized, windowed, and Fourier-transformed in the process of generating the frequency map.
  • In an embodiment of the present application, the audio information of the music to be evaluated is converted into a Mel frequency cepstrum coefficient map with frequency as the limiting condition.
  • The Mel frequency cepstrum coefficient map can be obtained from the frequency map produced by the above spectrum application software.
  • It should be noted that Mel-Frequency Cepstral Coefficients (MFCCs) form a map composed of the coefficients of the Mel frequency cepstrum. They are derived from the cepstrum of an audio clip, in which the frequency bands of the Mel frequency cepstrum are spaced equally on the Mel scale; this approximates the human auditory system more closely than the linearly spaced bands of the normal log cepstrum (the frequency map obtained with the application software above), so this frequency warping (the bending of the curves in the Mel frequency cepstrum coefficient map) represents sound better. Consequently, for audio with smooth sound, the coefficient curves in the Mel frequency cepstrum coefficient map follow the human auditory system closely, whereas for noise the coefficient changes do not.
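  • For illustration only, the following minimal sketch shows how such a Mel frequency cepstrum coefficient map could be extracted from an audio file; it assumes the Python library librosa is available, and the file path and the number of coefficients (20) are placeholder choices, not values specified by this application.

```python
# Hedged sketch: extract an MFCC map from an audio file with librosa.
# The path and n_mfcc=20 are illustrative assumptions only.
import librosa

def extract_mfcc_map(path, n_mfcc=20):
    # Load the audio at its native sampling rate.
    y, sr = librosa.load(path, sr=None)
    # One row per coefficient, one column per analysis frame.
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

# Example usage:
# mfcc_map = extract_mfcc_map("song_to_evaluate.mp3")
# print(mfcc_map.shape)  # (n_mfcc, number_of_frames)
```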
  • the sound quality evaluation model is a convolutional neural network model that is pre-trained to convergence, and may be, for example, a CNN convolutional neural network model, a VGG convolutional neural network model, and the like.
  • In an embodiment of the present application, when the sound quality evaluation model is trained, the training data are Mel frequency cepstrum coefficient maps converted from audio with smooth sound, so the resulting sound quality evaluation model conforms to the human auditory system and the evaluation information it produces is more accurate.
  • At the same time, to ensure accurate evaluation, the frequency map of the audio information of the music to be evaluated that is input to the model is a Mel frequency cepstrum coefficient map.
  • To solve the problems described in this application, an embodiment of the application provides a music quality evaluation method: the audio information of the music to be evaluated is converted into a frequency map, and the frequency map is evaluated by a sound quality evaluation model trained from a convolutional neural network model to obtain evaluation information for each piece of music. This makes it convenient for users to filter music according to the evaluation information, avoids the interference of low-quality music to users, and purifies the network environment.
  • a Mel frequency cepstrum coefficient map of the audio of the music to be evaluated may be used.
  • An embodiment of the present application provides a method for converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition. FIG. 2 is a schematic flowchart of the basic process of this method.
  • step S1200 includes:
  • The audio information of the music to be evaluated is converted into a frequency map by spectrum application software, for example PC Sound Spectrum software, FFT spectrum analysis software, or SmaartLive software. In the process of converting to the logarithmic frequency map, the audio of the music to be evaluated is pre-emphasized, framed, and windowed, and the frequency of each frame of the signal is obtained by the Fourier transform.
  • The frame length can be set according to the actual situation, preferably 32 ms (milliseconds), and the windowing can be performed with a Hamming window.
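  • A minimal NumPy sketch of the pre-processing described above (pre-emphasis, framing into 32 ms frames, Hamming windowing, and a per-frame Fourier transform). The pre-emphasis coefficient of 0.97 and the 50% frame overlap are assumptions made for illustration; they are not prescribed by this application.

```python
import numpy as np

def frame_spectra(signal, sample_rate, frame_ms=32, pre_emphasis=0.97):
    # signal: 1-D NumPy array of audio samples.
    # Pre-emphasis: boost high frequencies to flatten the spectrum.
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = frame_len // 2  # assumed 50% overlap between frames
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    window = np.hamming(frame_len)  # Hamming window, as in the text
    spectra = []
    for i in range(n_frames):
        frame = emphasized[i * hop:i * hop + frame_len]
        # Magnitude spectrum of the windowed frame via the real FFT.
        spectra.append(np.abs(np.fft.rfft(frame * window)))
    return np.array(spectra)  # shape: (n_frames, frame_len // 2 + 1)
```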
  • The Mel frequency f_mel is calculated with the Mel frequency conversion formula f_mel = 2595 * log10(1 + f / 700), where f is the frequency of each frame; a map of the Mel frequencies is obtained by calculating the Mel frequency of each frame.
  • Assuming the Mel spectrum is X[k] = H[k]E[k], where H[k] is the Mel frequency cepstrum coefficient and E[k] is the high-frequency spectrum, taking the logarithm gives log X[k] = log H[k] + log E[k], and applying the inverse discrete cosine transform gives X[k] = H[k] + E[k], that is, H[k] = X[k] - E[k]. Since E[k] is the high-frequency spectrum, the Mel frequency cepstrum H[k] can be obtained with a low-pass filter, giving the Mel frequency cepstrum chart. The change trend of the cepstral frequencies is then extracted from the Mel frequency cepstrum chart to obtain the Mel frequency cepstrum coefficient map.
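  • For reference, the Mel-scale conversion used in the step above is the standard textbook mapping; a one-line helper (not code taken from this application) is:

```python
import numpy as np

def hz_to_mel(f_hz):
    # Standard Mel-scale mapping: f_mel = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

# Example: hz_to_mel(1000.0) is approximately 1000 mel.
```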
  • the method of this embodiment further includes a training method of a sound quality evaluation model.
  • FIG. 3 is a schematic flowchart of a training method of a sound quality evaluation model according to an embodiment of the present application.
  • the training sample set includes multiple Mel frequency cepstrum coefficient maps extracted from multiple pieces of smooth audio.
  • In an embodiment of the present application, 6000 short audio clips, each 5 seconds long, are extracted from 2000 clear and smooth recordings as the training data source. Any number of short audio clips are taken from the training data source as training data, and a Mel frequency cepstrum coefficient map is extracted from each clip to obtain the training sample set; a sketch of this preparation step is given below.
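  • A hedged sketch of how such a training set could be assembled: each recording is cut into 5-second clips and each clip is converted into a Mel frequency cepstrum coefficient map. The directory layout, the use of librosa, and n_mfcc=20 are assumptions made for illustration.

```python
import os
import librosa
import numpy as np

def build_training_set(recording_dir, clip_seconds=5, n_mfcc=20):
    # Assumes every recording in recording_dir uses the same sampling rate.
    samples = []
    for name in sorted(os.listdir(recording_dir)):
        y, sr = librosa.load(os.path.join(recording_dir, name), sr=None)
        clip_len = int(clip_seconds * sr)
        # Slice the recording into non-overlapping 5-second clips.
        for start in range(0, len(y) - clip_len + 1, clip_len):
            clip = y[start:start + clip_len]
            samples.append(librosa.feature.mfcc(y=clip, sr=sr, n_mfcc=n_mfcc))
    return np.array(samples)  # (num_clips, n_mfcc, frames_per_clip)
```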
  • step S1312 includes the following steps:
  • Step 1: Input the multiple Mel frequency cepstrum coefficient maps into the preset convolutional neural network model in sequence, and obtain the output value of each Mel frequency cepstrum coefficient map;
  • Step 2: Sort the output values by numerical value;
  • Step 3: Confirm that the output value at the middle position of the sorted result is the expected output value of the multiple Mel frequency cepstrum coefficient maps.
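  • Read this way, the expected output value is simply the median of the per-map outputs; a minimal sketch under that reading:

```python
import numpy as np

def expected_output(output_values):
    # Sort the per-map outputs and take the value at the middle position.
    ordered = np.sort(np.asarray(output_values, dtype=float))
    return ordered[len(ordered) // 2]

# Example: expected_output([0.91, 0.80, 0.97, 0.86, 0.93]) returns 0.91.
```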
  • S1313 Input the training sample set into the convolutional neural network model, and obtain the excitation value of the convolutional neural network model;
  • The Mel frequency cepstrum coefficient maps of the training sample set are input into the neural network model in sequence, and the neural network model extracts features from each map.
  • In an embodiment of the present application, the convolutional neural network includes four double convolutional layers, four pooling layers, and a fully connected layer.
  • The convolution kernels in the convolutional layers extract features from the training samples to obtain the weight of each unit in the convolution.
  • The preset activation function is used to limit the range of the output values.
  • In the pooling layers, the features extracted by the convolutional layers are used to down-sample the Mel frequency cepstrum coefficient map, and, to make the model more stable and less dependent on the training data, the output values of the pooling layers can be randomly discarded according to a preset drop probability.
  • The fully connected layer outputs the final values to the classifier, where they are normalized to obtain the excitation value.
  • A Mel cepstrum map is input to the first convolutional layer, features are extracted with 32 filters having a 3x3 receptive field and a stride of 1, and the result is output to the first pooling layer.
  • In the pooling layers, output values are randomly dropped with a preset drop probability of 0.25. It should be noted that, after the output of the fourth pooling layer, because the fully connected layer is prone to overfitting, output values are dropped with a probability of 0.5 before the fully connected layer, and the remaining output values are then passed through the fully connected layer to the classifier.
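  • One way to read the architecture described above (four double-convolution blocks, each followed by pooling and dropout with probability 0.25, then dropout with probability 0.5 before a fully connected classifier) is the PyTorch sketch below. The channel counts after the first block, the global pooling before the fully connected layer, and the two-class softmax output are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

def double_conv_block(in_ch, out_ch):
    # A "double convolutional layer": two 3x3 convolutions with stride 1,
    # each followed by ReLU, then 2x2 max pooling and dropout (p=0.25).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Dropout(0.25),
    )

class SoundQualityNet(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        # 32 filters in the first block as stated in the text; the later
        # channel counts (64, 128, 128) are assumptions.
        self.features = nn.Sequential(
            double_conv_block(1, 32),
            double_conv_block(32, 64),
            double_conv_block(64, 128),
            double_conv_block(128, 128),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)  # simplification before the FC layer
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),        # drop probability 0.5 before the FC layer
            nn.Linear(128, n_classes),
            nn.Softmax(dim=1),      # normalization in the classifier
        )

    def forward(self, x):  # x: (batch, 1, height, width) MFCC maps
        return self.classifier(self.pool(self.features(x)))
```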
  • the excitation value is the excitation data output by the convolutional neural network model according to the input Mel frequency cepstrum coefficient graph.
  • At the beginning of training, the excitation value has a large dispersion; as training approaches convergence, the excitation value becomes relatively stable data.
  • A loss function is used to determine whether the excitation value output by the fully connected layer of the neural network model is consistent with the set expected classification value. When the results are not consistent, the back-propagation algorithm is used to adjust the weights in the convolutional neural network model.
  • The loss function determines whether the excitation value is consistent with the set expected value by calculating the distance (for example, the Euclidean distance or a spatial distance) between the excitation value and the set expected value against a preset first threshold (for example, 0.05). When the distance between the excitation value and the set expected classification value is less than or equal to the first threshold, the excitation value is judged to be consistent with the set expected value; otherwise, it is not.
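  • A sketch of the convergence check described above, assuming that the distance is the Euclidean distance between the model output and the expected value and that the weights are adjusted by ordinary gradient back-propagation; the optimizer, learning rate, and iteration cap are illustrative assumptions.

```python
import torch

def train_until_converged(model, samples, expected, threshold=0.05,
                          lr=1e-3, max_iters=10000):
    # samples: MFCC maps, shape (N, 1, H, W); expected: target values (N, n_classes).
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(max_iters):
        output = model(samples)
        # Euclidean distance between excitation values and expected values.
        distance = torch.norm(output - expected, dim=1).mean()
        if distance <= threshold:
            break                      # consistent with the expected value: stop training
        optimizer.zero_grad()
        distance.backward()            # back-propagation adjusts the weights
        optimizer.step()
    return model
```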
  • a Mel frequency cepstrum coefficient map of the audio of the music to be evaluated is input into a preset sound quality evaluation model to obtain evaluation information of audio information of the music to be evaluated.
  • the embodiment of the present application provides a method for evaluating a Mel frequency cepstrum coefficient map of audio of music to be evaluated by using a sound quality evaluation model.
  • FIG. 4 shows a schematic flowchart of a method for evaluating a Mel frequency cepstrum coefficient map of audio of music to be evaluated using a sound quality evaluation model according to an embodiment of the present application.
  • step S1300 includes:
  • The Mel cepstrum coefficient map of the audio of the music to be evaluated is input into the sound quality evaluation model for calculation, and the output value of the sound quality evaluation model is obtained. Because the sound quality evaluation model is trained on audio with smooth sound, the output indicates the probability of belonging to audio with smooth sound. Therefore, the larger the output value, the smoother the sound of the music to be evaluated and the higher its quality; the smaller the output value, the lower the audio quality of the music to be evaluated.
  • The evaluation index is an index that measures the audio quality of the music to be evaluated. It can be customized: it may be expressed with letters, for example A, B, C, D, E, F indicating quality from high to low in turn, or as a score, where a higher score indicates higher audio quality of the music to be evaluated.
  • the evaluation list is a list showing the mapping relationship between the output value of the sound quality evaluation model and the evaluation index. Using the output value, the corresponding evaluation index can be found through the evaluation list.
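  • A minimal sketch of such an evaluation list, here as output-value bands mapped to letter grades; the concrete band boundaries and letters are illustrative assumptions, not values specified by this application.

```python
def lookup_evaluation_index(output_value):
    # Evaluation list: maps the model's output value (0..1) to a letter grade.
    evaluation_list = [
        (0.9, "A"), (0.8, "B"), (0.7, "C"),
        (0.6, "D"), (0.5, "E"),
    ]
    for lower_bound, grade in evaluation_list:
        if output_value >= lower_bound:
            return grade
    return "F"

# Example: lookup_evaluation_index(0.83) returns "B".
```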
  • FIG. 5 is a schematic flowchart of the audio playback method.
  • the method further includes:
  • A play instruction is the instruction issued by the user to play the audio to be played.
  • the playback instruction can be triggered by clicking the audio to be played.
  • After the terminal obtains the playback instruction, it obtains the quality index of the audio to be played according to the playback instruction. It should be noted that the quality index may be pre-stored in the information of each audio to be played and retrieved directly after the playback instruction is obtained; alternatively, according to the acquired playback instruction, the terminal may evaluate the audio to be played with the sound quality evaluation model in real time to obtain the quality index.
  • the terminal sets an index threshold for audio playback in advance. For example, the audio can only be played when the quality index of the audio is greater than 95 points.
  • The terminal compares the quality index of the audio to be played with the index threshold and plays the audio when its quality index exceeds the threshold. In this way, the terminal filters the audio in the application software through the sound quality evaluation model, which on the one hand improves the user's listening experience and on the other hand saves the user selection time.
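  • The playback gate described above reduces to a comparison against the pre-set index threshold; a hedged sketch (the threshold of 95 comes from the example in the text, the player callback is an assumption):

```python
def maybe_play(audio_id, quality_index, play_callback, index_threshold=95):
    # Play the audio only when its quality index reaches the pre-set threshold.
    if quality_index >= index_threshold:
        play_callback(audio_id)
        return True
    return False  # low-quality audio is filtered out instead of played
```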
  • FIG. 6 is a schematic flowchart of the audio playback method.
  • After step S1332, the method further includes:
  • When the audio information is displayed, it can be shown in descending order of quality index to facilitate user selection and further improve the user experience.
  • an embodiment of the present application further provides a music quality evaluation device.
  • FIG. 7 is a block diagram of the basic structure of the music quality evaluation device of this embodiment.
  • a music quality evaluation device includes an acquisition module 2100, a processing module 2200, and an execution module 2300.
  • an obtaining module is used to obtain audio information of the music to be evaluated;
  • a processing module is used to convert audio information of the music to be evaluated into a frequency map with frequency as a limiting condition;
  • An execution module is used to input the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain the evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.
  • The music quality evaluation device converts the audio information of the music to be evaluated into a frequency map and evaluates the frequency map through a sound quality evaluation model trained from a convolutional neural network model to obtain evaluation information for each piece of music. In this way, users can conveniently filter music based on the evaluation information, the interference of low-quality music to users is avoided, and the network environment is purified.
  • The processing module in the music quality evaluation device includes: a first acquisition sub-module for acquiring the Mel frequency of the audio information of the music to be evaluated; a first processing sub-module for obtaining the Mel frequency cepstrum from the map of the Mel frequency; and a first execution sub-module configured to extract the Mel frequency cepstrum coefficient map from the Mel frequency cepstrum.
  • The execution module specifically includes: a second acquisition sub-module for acquiring the output value of the sound quality evaluation model; and a second execution sub-module for finding, in the evaluation list, the evaluation index that has a mapping relationship with the output value.
  • In an embodiment, when the user searches for target audio, the music quality evaluation device further includes: a third acquisition sub-module for acquiring a playback instruction; a second processing sub-module for acquiring, according to the playback instruction, the evaluation index of the audio to be played and comparing it with a preset index threshold; and a third execution sub-module configured to play the audio to be played when its evaluation index is greater than or equal to the index threshold.
  • The playback instruction includes a keyword of the audio to be played. The music quality evaluation device further includes: a third processing sub-module configured to, when the evaluation index of the audio to be played is smaller than the index threshold, search a preset database for audio information matching the keyword of the audio to be played; and a fourth execution sub-module configured to display the audio information.
  • In an embodiment, the music quality evaluation device further includes: a fourth acquisition sub-module configured to acquire a training sample set, where the training sample set includes multiple Mel frequency cepstrum coefficient maps extracted from multiple pieces of audio with smooth sound quality;
  • a fourth processing sub-module configured to obtain the expected values of the multiple Mel frequency cepstrum coefficient maps from the preset convolutional neural network model;
  • a fifth processing sub-module configured to input the training sample set into the convolutional neural network model and obtain the excitation value of the convolutional neural network model; and
  • a fifth execution sub-module configured to compare whether the distance between the expected value and the excitation value is less than or equal to a preset first threshold; when the distance is greater than the first threshold, the weights in the convolutional neural network model are updated by back-propagation through repeated loop iterations, and the process ends when the distance between the expected value and the excitation value is less than or equal to the preset first threshold.
  • The fourth processing sub-module specifically includes: a sixth acquisition sub-module for sequentially inputting the multiple Mel frequency cepstrum coefficient maps into the preset convolutional neural network model and acquiring the output value of each map; a sixth processing sub-module for sorting the output values by numerical value; and a sixth execution sub-module for confirming that the output value at the middle position of the sorted results is the expected output value of the multiple Mel frequency cepstrum coefficient maps.
  • FIG. 8 is a block diagram of the basic structure of the computer device of this embodiment.
  • the computer device includes a processor, a nonvolatile storage medium, a memory, and a network interface connected through a system bus.
  • the non-volatile storage medium of the computer device stores an operating system, a database, and computer-readable instructions.
  • the database may store control information sequences.
  • When the computer-readable instructions are executed by the processor, the processor may implement a music quality evaluation method.
  • the processor of the computer equipment is used to provide computing and control capabilities to support the operation of the entire computer equipment.
  • The memory of the computer device may store computer-readable instructions. When the computer-readable instructions are executed by the processor, they may cause the processor to perform a method for evaluating music quality.
  • the network interface of the computer equipment is used to connect and communicate with the terminal.
  • FIG. 8 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer parts than shown in the figure, combine certain parts, or arrange the parts differently.
  • the processor is configured to execute the specific content of the acquisition module 2100, the processing module 2200, and the execution module 2300 in FIG. 7, and the memory stores program codes and various types of data required to execute the modules.
  • the network interface is used for data transmission to user terminals or servers.
  • The memory in this embodiment stores the program code and data required for executing all the sub-modules of the music quality evaluation method, and the server can call this program code and data to perform the functions of all the sub-modules.
  • The computer device converts the audio information of the music to be evaluated into a frequency map and evaluates the frequency map through a sound quality evaluation model trained from a convolutional neural network model to obtain evaluation information for each piece of music. In this way, users can conveniently filter music according to the evaluation information, the interference of low-quality music to users is avoided, and the network environment is purified.
  • The present application also provides a storage medium storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the music quality evaluation method according to any one of the foregoing embodiments.
  • the computer program may be stored in a computer-readable storage medium.
  • the foregoing storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (Random Access Memory, RAM).
  • It should be understood that, although the steps in the flowcharts of the drawings are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; their execution order is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided are a music quality evaluation method and apparatus, a computer device, and a storage medium. The method comprises the following steps: acquiring audio information of music to be evaluated (S1100); converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition (S1200); and inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated (S1300), wherein the sound quality evaluation model is a convolutional neural network model pre-trained to convergence. The audio information of the music to be evaluated is converted into a frequency map, and the frequency map is evaluated by a sound quality evaluation model trained from a convolutional neural network model to obtain evaluation information for each piece of music, so that users can screen music according to the evaluation information, are not bothered by low-quality music, and the network environment is purified.

Description

Music quality evaluation method, device, computer equipment and storage medium

This application claims priority from a Chinese patent application filed with the Chinese Patent Office on August 2, 2018, with application number 201810873498.0 and the invention title "Music Quality Evaluation Method, Device, Computer Equipment, and Storage Medium", the entire contents of which are incorporated into this application by reference.

Technical field

The embodiments of the present application relate to the field of computers, and in particular to a music quality evaluation method, device, computer equipment, and storage medium.

Background

Digital music, as its name implies, is music that is stored as digital signals in databases and transmitted through the network; it is fast, and it can be downloaded and deleted according to people's needs. Digital music does not rely on traditional music carriers such as magnetic tapes or CDs, which avoids wear and tear and preserves music quality.

In recent years, with the development of digital music, the number of musical works has exploded, but a large amount of computer-generated and randomly generated music has also appeared. The inventor found that most of this music is atonal, with disordered beats, excessive repetition, continuously discordant harmony, and confused or suddenly interrupted melodies, and therefore belongs to low-quality music.

The spread of low-quality music on the Internet interferes with network users and degrades their online experience.

Summary of the invention
The embodiment of the present application provides a method for evaluating, with a sound quality evaluation model, the frequency map obtained by converting the audio information of the music to be evaluated.
To solve the above technical problem, a technical solution adopted in an embodiment of the present application is to provide a music quality evaluation method, which includes the following steps: acquiring audio information of the music to be evaluated; converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; and inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.

To solve the above technical problem, an embodiment of the present application further provides a music quality evaluation device, including: an acquisition module, configured to acquire audio information of the music to be evaluated; a processing module, configured to convert the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; and an execution module, configured to input the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.

To solve the above technical problem, an embodiment of the present application further provides a computer device including a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps of a music quality evaluation method: acquiring audio information of the music to be evaluated; converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; and inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.

To solve the above technical problem, an embodiment of the present application further provides a storage medium storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps of the music quality evaluation method: obtaining audio information of the music to be evaluated; converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; and inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.

In the embodiments of the present application, the audio information of the music to be evaluated is converted into a frequency map, and the frequency map is evaluated by a sound quality evaluation model trained from a convolutional neural network model to obtain evaluation information for each piece of music. This makes it convenient for users to filter music according to the evaluation information, avoids the interference of low-quality music to users, and purifies the network environment.
Brief description of the drawings

In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1 is a schematic flowchart of a music quality evaluation method according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of a method for converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition according to an embodiment of the present application;

FIG. 3 is a schematic flowchart of a method for training a music quality evaluation model according to an embodiment of the present application;

FIG. 4 is a schematic flowchart of a method for evaluating the Mel frequency cepstrum coefficient map of the audio of the music to be evaluated with a sound quality evaluation model according to an embodiment of the present application;

FIG. 5 is a schematic flowchart of an audio playing method according to an embodiment of the present application;

FIG. 6 is a schematic flowchart of another audio playing method according to an embodiment of the present application;

FIG. 7 is a block diagram of the basic structure of an audio quality evaluation device according to an embodiment of the present application;

FIG. 8 is a block diagram of the basic structure of a computer device according to an embodiment of the present application.
Detailed description

In order to enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application.

Some of the processes described in the specification and claims of this application and in the above drawings contain operations that appear in a particular order, but it should be clearly understood that these operations may be performed out of the order in which they appear herein or in parallel. Operation numbers such as 101 and 102 are only used to distinguish different operations; the numbers themselves do not represent any execution order. In addition, these processes may include more or fewer operations, and these operations may be performed sequentially or in parallel. It should be noted that descriptions such as "first" and "second" herein are used to distinguish different messages, devices, modules and so on; they do not represent a sequence, nor do they limit "first" and "second" to being of different types.

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present application.

Embodiments

Those skilled in the art will understand that the terms "terminal" and "terminal equipment" used herein cover both devices that have only a wireless signal receiver without any transmitting capability and devices with receiving and transmitting hardware capable of two-way communication over a two-way communication link. Such equipment may include: a cellular or other communication device with a single-line display, a multi-line display, or no multi-line display; a PCS (Personal Communications Service) device that can combine voice, data processing, fax and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio frequency receiver, a pager, Internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; and a conventional laptop and/or palmtop computer or other device that has and/or includes a radio frequency receiver. A "terminal" or "terminal equipment" may be portable, transportable, installed in a vehicle (air, sea and/or land), or suitable and/or configured to operate locally and/or in a distributed fashion at any other location on Earth and/or in space. The "terminal" and "terminal equipment" used herein may also be a communication terminal, an Internet terminal, or a music/video playback terminal, for example a PDA, a MID (Mobile Internet Device) and/or a mobile phone with music/video playback functions, or a device such as a smart TV or set-top box.

The client terminal in this embodiment is the terminal described above.
Specifically, please refer to FIG. 1, which is a schematic flowchart of the music quality evaluation method of this embodiment.

As shown in FIG. 1, the music quality evaluation method includes the following steps:
S1100. Acquire audio information of the music to be evaluated.

The audio information of the music to be evaluated includes the audio of the music to be evaluated, which may be a digital audio file generated from digital signals, an audio file created with a musical instrument, an audio file spread on the Internet, or an audio file extracted from a video file. The formats of these audio files include MP3, WAVE, WMA, VQF, MIDI, AIFF, MPEG, and so on.

In practical applications, methods for obtaining the audio information of the music to be evaluated include obtaining it directly from the network or from local files, or extracting an audio file from a video file.

S1200. Convert the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition.

The audio information of the music to be evaluated can be converted into a frequency map by spectrum application software, for example PC Sound Spectrum software, FFT spectrum analysis software, or SmaartLive software. In practice, in order to make the frequencies in the frequency map continuous and clear, the audio of the music to be evaluated is usually pre-emphasized, windowed, and Fourier-transformed in the process of generating the frequency map.

In an embodiment of the present application, the audio information of the music to be evaluated is converted into a Mel frequency cepstrum coefficient map with frequency as the limiting condition. The Mel frequency cepstrum coefficient map can be obtained from the frequency map produced by the above spectrum application software.

It should be noted that Mel-Frequency Cepstral Coefficients (MFCCs) form a map composed of the coefficients of the Mel frequency cepstrum. They are derived from the cepstrum of an audio clip, in which the frequency bands of the Mel frequency cepstrum are spaced equally on the Mel scale; this approximates the human auditory system more closely than the linearly spaced bands of the normal log cepstrum (the frequency map obtained with the application software above), so this frequency warping (the bending of the curves in the Mel frequency cepstrum coefficient map) represents sound better. Consequently, for audio with smooth sound, the coefficient curves in the Mel frequency cepstrum coefficient map follow the human auditory system closely, whereas for noise the coefficient changes do not.
S1300. Input the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated.

The sound quality evaluation model is a convolutional neural network model pre-trained to convergence, for example a CNN convolutional neural network model or a VGG convolutional neural network model.

In an embodiment of the present application, when the sound quality evaluation model is trained, the training data are all Mel frequency cepstrum coefficient maps converted from audio with smooth sound, so the resulting sound quality evaluation model conforms to the human auditory system and the evaluation information it produces is more accurate. At the same time, to ensure accurate evaluation, the frequency map of the audio information of the music to be evaluated that is input to the model is a Mel frequency cepstrum coefficient map.

To solve the problems in this application, an embodiment of the application provides a music quality evaluation method: the audio information of the music to be evaluated is converted into a frequency map, and the frequency map is evaluated by a sound quality evaluation model trained from a convolutional neural network model to obtain evaluation information for each piece of music. This makes it convenient for users to filter music according to the evaluation information, avoids the interference of low-quality music to users, and purifies the network environment.

In the above embodiment, for accurate evaluation, the Mel frequency cepstrum coefficient map of the audio of the music to be evaluated may be used. An embodiment of the present application provides a method for converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; FIG. 2 is a schematic flowchart of the basic process of this method.

As shown in FIG. 2, step S1200 includes:

S1210. Obtain the Mel frequency of the audio information of the music to be evaluated.

The audio information of the music to be evaluated is converted into a frequency map by spectrum application software, for example PC Sound Spectrum software, FFT spectrum analysis software, or SmaartLive software. In the process of converting to the logarithmic frequency map, the audio of the music to be evaluated is pre-emphasized, framed, and windowed, and the frequency of each frame of the signal is obtained by the Fourier transform. The frame length can be set according to the actual situation, preferably 32 ms (milliseconds), and the windowing can be performed with a Hamming window.
The Mel frequency f_mel is calculated with the Mel frequency conversion formula:

f_mel = 2595 * log10(1 + f / 700)

where f is the frequency of each frame. A map of the Mel frequencies is obtained by calculating the Mel frequency of each frame.
S1220、根据梅尔频率的图谱获取梅尔频率倒谱;S1220. Obtain a Mel frequency cepstrum according to the map of the Mel frequency;
假设梅尔频谱为X[k],Assuming the Mel spectrum is X [k],
X[k]=H[k]E[k]X [k] = H [k] E [k]
其中,H[k]为梅尔频率倒谱系数,E[k]为高频谱。Among them, H [k] is a Mel frequency cepstrum coefficient, and E [k] is a high frequency spectrum.
对公式X[k]取对数,得到Take the logarithm of the formula X [k] to get
log X[k]=log H[k]+log E[k]log X [k] = log H [k] + log E [k]
再通过反离散余弦进行逆变换得到Then inverse transform by inverse discrete cosine
X[k]=H[k]+E[k]X [k] = H [k] + E [k]
即梅尔频率倒谱系数H[k],That is, the Mel frequency cepstrum coefficient H [k],
H[k]=X[k]-E[k]H [k] = X [k] -E [k]
由于E[k]为高频谱,利用低通滤波器即可得到梅尔频率倒谱,进而得到梅尔频率倒谱图。Since E [k] is a high frequency spectrum, the Mel frequency cepstrum can be obtained by using a low-pass filter, and then the Mel frequency cepstrum chart can be obtained.
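A minimal sketch of this step, assuming a matrix of per-frame Mel spectra (frames × Mel bands): take the logarithm, apply the inverse transform (implemented here, as is common, with a type-II DCT), and keep only the low-quefrency coefficients — the low-pass "lifter" that isolates H[k]. The number of retained coefficients (13) is an illustrative assumption.

```python
import numpy as np
from scipy.fftpack import dct

def mel_cepstrum(mel_spectra, n_keep=13):
    """Log Mel spectrum -> inverse transform -> low-quefrency coefficients."""
    log_mel = np.log(mel_spectra + 1e-10)                   # log X[k] = log H[k] + log E[k]
    cepstrum = dct(log_mel, type=2, axis=1, norm='ortho')   # product becomes a sum in the cepstral domain
    return cepstrum[:, :n_keep]                              # low-pass lifter keeps the envelope part H[k]
```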
S1230、从梅尔频率倒谱中提取梅尔频率倒谱系数图。S1230. Extract a Mel frequency cepstrum coefficient map from the Mel frequency cepstrum.
由梅尔频率倒谱图中提取倒谱频率的变化趋势,从而得到梅尔频率倒谱系数图。The change trend of cepstrum frequency is extracted from the Mel frequency cepstrum chart, thereby obtaining a Mel frequency cepstrum coefficient chart.
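In practice, steps S1210 to S1230 can be reproduced with an audio library; the following sketch assumes the librosa package, and the number of Mel coefficients and the hop length are illustrative choices rather than values fixed by this embodiment.

```python
import librosa

def mfcc_map(path, sr=22050, frame_ms=32):
    """Load an audio file and compute its Mel frequency cepstrum coefficient map."""
    y, sr = librosa.load(path, sr=sr)
    n_fft = int(sr * frame_ms / 1000)            # 32 ms frames as above
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20,
                                n_fft=n_fft, hop_length=n_fft // 2,
                                window='hamming')
    return mfcc                                   # coefficients over time: the MFCC map
```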
本实施例方式中,还包括音质评价模型的训练方法,具体请参阅图3,图3为本申请实施例音质评价模型的训练方法的基本流程示意图。The method of this embodiment further includes a training method of a sound quality evaluation model. Please refer to FIG. 3 for details. FIG. 3 is a schematic flowchart of a training method of a sound quality evaluation model according to an embodiment of the present application.
如图3所示,包括如下步骤:As shown in Figure 3, it includes the following steps:
S1311、获取训练样本集;S1311. Obtain a training sample set;
训练样本集包括从多段音质流畅的音频中提取的多张梅尔频率倒谱系数图。本申请的一个实施例，从2000首清晰流畅的录音中提取6000个时长为5秒的短音频作为训练数据源。从训练数据源中提取任意多个短音频作为训练数据，从训练数据的每个音频中提取各自的梅尔频率倒谱系数图，得到训练样本集。其中，从训练数据的每个音频中提取各自的梅尔频率倒谱系数图的方法请参照上述实施例，在此不再赘述。The training sample set includes multiple Mel frequency cepstrum coefficient maps extracted from multiple pieces of audio with smooth sound quality. In an embodiment of the present application, 6000 short audio clips of 5 seconds each are extracted from 2000 clear and fluent recordings as the training data source. An arbitrary number of short clips are taken from the training data source as training data, and a Mel frequency cepstrum coefficient map is extracted from each clip, which yields the training sample set. For the method of extracting the Mel frequency cepstrum coefficient map from each clip, refer to the foregoing embodiment; details are not repeated here.
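A sketch of assembling such a training sample set, assuming a list of file paths to clear, fluent recordings and the librosa package; the 5-second clip length follows this embodiment, while the file handling and the number of coefficients are illustrative.

```python
import librosa
import numpy as np

def build_training_set(recording_paths, clip_seconds=5, sr=22050):
    """Cut each recording into 5-second clips and extract an MFCC map per clip."""
    samples = []
    for path in recording_paths:
        y, _ = librosa.load(path, sr=sr)
        clip_len = clip_seconds * sr
        for start in range(0, len(y) - clip_len + 1, clip_len):
            clip = y[start:start + clip_len]                     # one 5-second short audio clip
            samples.append(librosa.feature.mfcc(y=clip, sr=sr, n_mfcc=20))
    return np.stack(samples)                                     # the training sample set
```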
S1312、由预设的卷积神经网络模型获取多张梅尔频率倒谱系数图的期望值;S1312. Obtain the expected values of multiple Mel frequency cepstrum coefficient graphs by a preset convolutional neural network model;
具体地，获取梅尔频率倒谱系数图的期望值的方法，即步骤S1312包括如下步骤：Specifically, the method for obtaining the expected values of the Mel frequency cepstrum coefficient maps, that is, step S1312, includes the following steps:
步骤一、将多张梅尔频率倒谱系数图依次输入到预设的卷积神经网络模型中，分别获取多张梅尔频率倒谱系数图的输出值；Step 1: input the multiple Mel frequency cepstrum coefficient maps into the preset convolutional neural network model in sequence, and obtain the output value of each map;
步骤二、以数值为限定条件对输出值进行排序;Step 2: Sort the output values by using the numerical value as a limiting condition;
步骤三、确认排序结果中处于中间位置的输出值为多张梅尔频率倒谱系数图的期望输出值。Step 3: Confirm that the output value at the middle position in the ranking result is the expected output value of multiple Mel frequency cepstrum coefficient graphs.
需要说明的是，梅尔频率倒谱系数图的选取个数可以自定义设置，个数越多，评价模型的评价指数越准确。It should be noted that the number of Mel frequency cepstrum coefficient maps selected can be customized; the larger the number, the more accurate the evaluation index of the evaluation model.
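Steps 1 to 3 amount to taking the median of the model outputs over the selected coefficient maps; a minimal sketch, assuming the model is any callable that maps one coefficient map to a scalar output value:

```python
import numpy as np

def expected_value(model, mfcc_maps):
    """Steps 1-3: run each map through the model, sort the outputs, take the middle value."""
    outputs = np.array([float(model(m)) for m in mfcc_maps])  # step 1: output values
    outputs.sort()                                             # step 2: sort by numerical value
    return outputs[len(outputs) // 2]                          # step 3: middle (median) value
```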
S1313、将训练样本集输入到卷积神经网络模型中,获取卷积神经网络模型的激励值;S1313: Input the training sample set into the convolutional neural network model, and obtain the excitation value of the convolutional neural network model;
将训练样本集的梅尔频率倒谱系数图依次输入到神经网络模型中,神经网络模型对梅尔频率倒谱系数图进行特征提取。The Mel frequency cepstrum coefficient map of the training sample set is sequentially input into the neural network model, and the neural network model performs feature extraction on the Mel frequency cepstrum coefficient map.
需要说明的是，本实施例中，卷积层神经网络包括四层双卷积层、四层池化层以及全连接层，在特征提取过程中，卷积层中的卷积核从训练样本集中提取特征，以此得到卷积中每个单元的权重。为了使模型更加准确，利用预设的激活函数限定输出值的范围。在池化层中，利用卷积层提取的权重对梅尔频率倒谱系数图降低像素，并为了使模型更加稳定不依赖于训练数据可以按照预设的丢弃概率随机丢弃池化层的输出值。全连接层用于将最后得到的值输出到分类器，在分类器中进行归一化处理，得到激励值。It should be noted that, in this embodiment, the convolutional neural network includes four double convolutional layers, four pooling layers and a fully connected layer. During feature extraction, the convolution kernels in the convolutional layers extract features from the training sample set, thereby obtaining the weight of each unit in the convolution. To make the model more accurate, a preset activation function is used to limit the range of the output values. In the pooling layers, the weights extracted by the convolutional layers are used to downsample the Mel frequency cepstrum coefficient map, and, to make the model more stable and less dependent on the training data, the outputs of the pooling layer may be randomly dropped according to a preset dropout probability. The fully connected layer outputs the final values to the classifier, where they are normalized to obtain the excitation value.
本申请的一个实施方式，在第一卷积层中输入梅尔倒谱图，采用32个感受野为3*3，步长为1的滤波器提取特征，并在第一池化层输出，按照预设的丢弃概率0.25随机丢弃池化层的输出值。需要说明的是，在第四层的池化层输出后，由于全连接层容易出现过度拟合，因此，在全连接层按照0.5的丢弃概率随机丢弃输出值，然后由全连接层将池化层剩余的输出值输出至分类器。In an embodiment of the present application, the Mel cepstrum map is input to the first convolutional layer, features are extracted using 32 filters with a 3×3 receptive field and a stride of 1, and the result is output at the first pooling layer, where the outputs of the pooling layer are randomly dropped with a preset dropout probability of 0.25. It should be noted that, after the output of the fourth pooling layer, because the fully connected layer is prone to overfitting, outputs are randomly dropped at the fully connected layer with a dropout probability of 0.5, and the fully connected layer then passes the remaining pooling-layer outputs to the classifier.
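The architecture described above (four blocks of double convolutional layers followed by pooling, dropout 0.25 after each pooling layer, dropout 0.5 before the classifier, 32 filters of 3×3 receptive field and stride 1 in the first block) can be sketched as follows. This is a minimal illustration assuming Keras; the filter counts of the later blocks, the input map size and the sigmoid output are assumptions not specified in this embodiment.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_quality_model(input_shape=(20, 216, 1)):
    model = keras.Sequential()
    model.add(keras.Input(shape=input_shape))
    for filters in [32, 64, 128, 256]:                  # only the first block's 32 filters are stated above
        # Double convolutional layer: 3x3 receptive field, stride 1.
        model.add(layers.Conv2D(filters, 3, strides=1, padding='same', activation='relu'))
        model.add(layers.Conv2D(filters, 3, strides=1, padding='same', activation='relu'))
        model.add(layers.MaxPooling2D(pool_size=2, padding='same'))
        model.add(layers.Dropout(0.25))                 # drop pooling outputs with probability 0.25
    model.add(layers.Flatten())
    model.add(layers.Dropout(0.5))                      # drop with probability 0.5 before the classifier
    model.add(layers.Dense(1, activation='sigmoid'))    # normalized score: probability of fluent audio
    return model
```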
其中，激励值是卷积神经网络模型根据输入的梅尔频率倒谱系数图输出的激励数据，在神经网络模型未被训练至收敛之前，激励值为离散性较大的数值，当神经网络模型被训练至收敛之后，激励值为相对稳定的数据。The excitation value is the excitation data output by the convolutional neural network model for the input Mel frequency cepstrum coefficient map. Before the neural network model has been trained to convergence, the excitation values are highly dispersed; after the model has been trained to convergence, the excitation values are relatively stable.
S1314、比对期望值与激励值之间的距离是否小于或等于预设的第一阈值，并当期望值与激励值之间的距离大于第一阈值时，反复循环迭代的通过反向算法更新卷积神经网络模型中的权重，至期望值与激励值之间的距离小于或等于预设的第一阈值时结束。S1314. Compare whether the distance between the expected value and the excitation value is less than or equal to a preset first threshold; when the distance between the expected value and the excitation value is greater than the first threshold, iteratively update the weights in the convolutional neural network model through a back-propagation algorithm, and stop when the distance between the expected value and the excitation value is less than or equal to the preset first threshold.
通过损失函数判断神经网络模型全连接层输出的激励值与设定的期望分类值是否一致，当结果不一致时，需要通过反向传播算法对第一通道内的权重进行调整。A loss function is used to judge whether the excitation value output by the fully connected layer of the neural network model is consistent with the set expected value; when they are not consistent, the back-propagation algorithm is used to adjust the weights in the network accordingly.
在一些实施方式中，损失函数通过计算激励值与设定的期望值之间的距离(欧氏距离或者空间距离)，来确定激励值与设定的期望值是否一致，设定第一阈值(例如，0.05)，当激励值与设定的期望分类值之间的距离小于或等于第一阈值时，则确定激励值与设定的期望值一致，否则，则激励值与设定的期望值不一致。In some implementations, the loss function determines whether the excitation value is consistent with the set expected value by calculating the distance (Euclidean or other spatial distance) between them, and a first threshold is set (for example, 0.05). When the distance between the excitation value and the set expected value is less than or equal to the first threshold, the excitation value is determined to be consistent with the set expected value; otherwise, it is inconsistent.
当神经网络模型的激励值与设定的期望值不一致时，需要采用随机梯度下降算法对神经网络模型中的权重进行校正，以使卷积神经网络模型的输出结果与分类判断信息的期望结果相同。通过若干训练样本集(在一些实施方式中，训练时将所有训练样本集内的图片打乱进行训练，以增加模型的抗干扰能力，增强输出的稳定性。)的反复的训练与校正，当神经网络模型输出值与各训练样本的参照信息比对达到(不限于)99.5%时，训练结束。When the excitation value of the neural network model is inconsistent with the set expected value, a stochastic gradient descent algorithm is used to correct the weights in the neural network model so that the output of the convolutional neural network model matches the expected result of the classification judgment information. Through repeated training and correction over several training sample sets (in some implementations, the images in all training sample sets are shuffled during training to increase the model's robustness to interference and enhance the stability of the output), training ends when the agreement between the model's outputs and the reference information of the training samples reaches, for example (but not limited to), 99.5%.
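A compact sketch of the training procedure in S1313–S1314 under the stated values (first threshold 0.05, stochastic gradient descent, shuffled samples), assuming the Keras-style model above; the learning rate, batch size and epoch limit are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras

def train_until_converged(model, x_train, y_expected, threshold=0.05, max_epochs=200):
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
                  loss='mse')                          # distance between excitation and expected values
    for _ in range(max_epochs):
        idx = np.random.permutation(len(x_train))      # shuffle samples for robustness
        model.fit(x_train[idx], y_expected[idx], batch_size=32, epochs=1, verbose=0)
        excitation = model.predict(x_train, verbose=0).ravel()
        distance = np.mean(np.abs(excitation - y_expected))
        if distance <= threshold:                      # first threshold reached: stop training
            break
    return model
```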
为了评价准确,将待评价音乐的音频的梅尔频率倒谱系数图输入到预设的音质评价模型中,得到待评价音乐的音频信息的评价信息。具体本申请实施例提供一种利用音质评价模型对待评价音乐的音频的梅尔频率倒谱系数图进行评价的方法。如图4所示,图4示出了本申请实施例利用音质评价模型对待评价音乐的音频的梅尔频率倒谱系数图进行评价的方法的基本流程示意图。In order to evaluate accurately, a Mel frequency cepstrum coefficient map of the audio of the music to be evaluated is input into a preset sound quality evaluation model to obtain evaluation information of audio information of the music to be evaluated. Specifically, the embodiment of the present application provides a method for evaluating a Mel frequency cepstrum coefficient map of audio of music to be evaluated by using a sound quality evaluation model. As shown in FIG. 4, FIG. 4 shows a schematic flowchart of a method for evaluating a Mel frequency cepstrum coefficient map of audio of music to be evaluated using a sound quality evaluation model according to an embodiment of the present application.
如图4所示,步骤S1300包括:As shown in FIG. 4, step S1300 includes:
S1321、获取音质评价模型的输出值;S1321. Obtain the output value of the sound quality evaluation model.
将待评价音乐的音频的梅尔倒谱系数图输入到音质评价模型中进行计算，得到音质评价模型的输出值。由于音质评价模型是由语音流畅的音频训练得到的，其输出的结果表示属于语音流畅的音频的概率。因此，其输出值越大表示待评价语音越流畅，质量越高，输出值越小表示待评价音乐的音频的质量越低。The Mel cepstrum coefficient map of the audio of the music to be evaluated is input into the sound quality evaluation model for calculation, and the output value of the model is obtained. Because the sound quality evaluation model is trained on audio with fluent speech, its output represents the probability that the input belongs to fluent audio. Therefore, a larger output value indicates smoother, higher-quality audio, while a smaller output value indicates lower quality of the audio of the music to be evaluated.
S1322、在评价列表中查找与输出值具有映射关系的评价指数。S1322. Find an evaluation index having a mapping relationship with the output value in the evaluation list.
评价指数为衡量待评价音乐的音频质量的指数，可以进行自定义设置，可以采用字母表示，例如，ABCDEF依次表示质量由高到低；也可以用分数表示，例如，满分100分，分数越高，待评价音乐的音频的质量越高。The evaluation index is an index that measures the audio quality of the music to be evaluated. It can be customized and may be expressed with letters, for example, A, B, C, D, E, F indicating quality from high to low; it may also be expressed as a score, for example, out of 100 points, where a higher score indicates higher audio quality of the music to be evaluated.
评价列表为表示音质评价模型的输出值与评价指数的映射关系的列表,利用输出值可以通过评价列表查找对应的评价指数。The evaluation list is a list showing the mapping relationship between the output value of the sound quality evaluation model and the evaluation index. Using the output value, the corresponding evaluation index can be found through the evaluation list.
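The evaluation list can be as simple as a threshold table mapping the model's output value to a grade; the sketch below assumes the letter-grade scheme mentioned above, with boundary values chosen purely for illustration.

```python
# Hypothetical evaluation list: lower bound of the output value -> evaluation index.
EVALUATION_LIST = [(0.95, 'A'), (0.85, 'B'), (0.70, 'C'),
                   (0.50, 'D'), (0.30, 'E'), (0.0, 'F')]

def evaluation_index(output_value):
    """Look up the evaluation index that has a mapping relationship with the output value."""
    for lower_bound, grade in EVALUATION_LIST:
        if output_value >= lower_bound:
            return grade
    return 'F'
```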
本申请实施例的一个应用场景，用户在音乐播放的应用软件中搜索目标音频以进行播放。由于目标音频的版本众多，同时商家为了流量，网络上还有很多与目标音频的关键词相同的低质量音频，因此，用户在音乐播放软件中输入目标音频的关键词后，会出现大量与关键词匹配的音频，使得用户无从选择。本申请实施例提供一种音频播放方法，如图5所示，图5为音频播放方法的基本流程示意图。In an application scenario of the embodiments of the present application, a user searches for target audio in a music playback application in order to play it. Because there are many versions of the target audio, and because merchants seeking traffic upload many low-quality audio files carrying the same keywords as the target audio, a large number of keyword-matched audio files appear after the user enters the keywords of the target audio in the music playback software, leaving the user unable to choose. An embodiment of the present application therefore provides an audio playback method, as shown in FIG. 5, which is a schematic flowchart of the basic process of the audio playback method.
如图5所示,步骤S1300之后,还包括:As shown in FIG. 5, after step S1300, the method further includes:
S1331、获取播放指令;S1331. Obtain a playback instruction.
播放指令为使待播放音频进行播放的指令，播放指令可以通过单击待播放音频触发。The playback instruction is an instruction that causes the audio to be played to start playing; it can be triggered by clicking the audio to be played.
S1332、根据播放指令获取待播放音频的评价指数,并与预设的指数阈值进行比较;S1332. Obtain the evaluation index of the audio to be played according to the playback instruction, and compare it with a preset index threshold;
终端获取播放指令后，根据播放指令获取待播放音频的质量指数。需要说明的是，质量指数可以预存于每个待播放音频的信息中，在获取到播放指令后直接调取质量指数；也可以是终端根据获取的播放指令实时的利用音质评价模型对待播放音频进行评价，以得到质量指数。After the terminal obtains the playback instruction, it obtains the quality index of the audio to be played according to the instruction. It should be noted that the quality index may be pre-stored in the information of each audio item to be played and retrieved directly once the playback instruction is received; alternatively, the terminal may, according to the received playback instruction, evaluate the audio to be played in real time using the sound quality evaluation model to obtain the quality index.
S1333、当待播放音频的评价指数大于或等于指数阈值时,播放待播放音频。S1333. When the evaluation index of the audio to be played is greater than or equal to the index threshold, play the audio to be played.
终端预先设置关于音频播放的指数阈值，例如，当音频的质量指数大于95分才可以播放。终端将待播放音频的质量指数与指数阈值进行比较，当大于指数阈值时播放待播放音频，如此，终端通过音质评价模型对应用软件中的音频质量进行筛选，一方面可以提高用户的听觉体验，另一方面为用户挑选节省了时间。The terminal presets an index threshold for audio playback; for example, audio may be played only when its quality index is greater than 95 points. The terminal compares the quality index of the audio to be played with the index threshold and plays the audio when the index exceeds the threshold. In this way, the terminal uses the sound quality evaluation model to filter audio quality within the application software, which on the one hand improves the user's listening experience and on the other hand saves the user time in making a selection.
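Steps S1331 to S1333 amount to a gate before playback; a minimal sketch, assuming each track carries a quality index out of 100 and using the example threshold of 95 (the track and player objects are hypothetical):

```python
INDEX_THRESHOLD = 95  # example threshold from the embodiment above

def handle_play_request(track, player):
    """Play the requested audio only if its quality index meets the threshold."""
    if track['quality_index'] >= INDEX_THRESHOLD:
        player.play(track)
        return True
    return False  # fall back to the keyword search described in the next embodiment
```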
本申请实施例提供了另一种音频播放方法，如图6所示，图6为音频播放方法的基本流程示意图。An embodiment of the present application provides another audio playback method, as shown in FIG. 6, which is a schematic flowchart of the basic process of the audio playback method.
如图6所示,步骤S1332之后,还包括:As shown in FIG. 6, after step S1332, the method further includes:
S1334、当待评价音乐的音频的评价指数小于指数阈值时,根据待播放音频的关键词在预设的数据库中查找与关键词匹配的音频信息;S1334. When the evaluation index of the audio of the music to be evaluated is less than the index threshold, search for a preset database of audio information matching the keywords according to the keywords of the audio to be played;
S1335、显示音频信息。S1335: Display audio information.
当显示音频信息时，可以按照质量指数由高到低排列显示，以便于用户挑选，进一步提高用户体验。When the audio information is displayed, it can be shown in descending order of quality index, so as to facilitate user selection and further improve the user experience.
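A minimal sketch of the fallback behaviour in S1334–S1335, assuming a simple in-memory database whose records carry hypothetical 'keywords' and 'quality_index' fields; matches are displayed in descending order of quality index as described above.

```python
def find_alternatives(keyword, database):
    """Find audio matching the keyword, ordered by quality index from high to low."""
    matches = [a for a in database if keyword in a['keywords']]
    return sorted(matches, key=lambda a: a['quality_index'], reverse=True)

# Example with a tiny hypothetical database.
db = [{'title': 'Song v1', 'keywords': 'song', 'quality_index': 97},
      {'title': 'Song v2', 'keywords': 'song', 'quality_index': 62}]
print([a['title'] for a in find_alternatives('song', db)])  # ['Song v1', 'Song v2']
```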
为解决上述技术问题本申请实施例还提供一种音乐质量评价装置。具体请参阅图7,图7为本实施例音乐质量评价装置基本结构框图。In order to solve the above technical problems, an embodiment of the present application further provides a music quality evaluation device. For details, please refer to FIG. 7, which is a block diagram of the basic structure of the music quality evaluation device of this embodiment.
如图7所示,一种音乐质量评价装置,包括:获取模块2100、处理模块2200和执行模块2300。其中,获取模块,用于获取待评价音乐的音频信息;处理模块,用于以频率为限定条件将所述待评价音乐的音频信息转化为频率图谱;执行模块,用于将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中,得到所述待评价音乐的音频信息的评价信息,其中,所述音质评价模型为预先训练至收敛的卷积神经网络模型。As shown in FIG. 7, a music quality evaluation device includes an acquisition module 2100, a processing module 2200, and an execution module 2300. Wherein, an obtaining module is used to obtain audio information of the music to be evaluated; a processing module is used to convert audio information of the music to be evaluated into a frequency map with frequency as a limiting condition; an executing module is used to convert the music to be evaluated The frequency spectrum of the audio information is input into a preset sound quality evaluation model to obtain the evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model trained in advance to convergence.
音乐质量评价装置将待评价音乐的音频信息转化为频率图谱，并通过由卷积神经网络模型训练得到的音质评价模型对该频率图谱进行评价，得到每段音乐的评价信息，如此，可以便于用户根据评价信息来筛选音乐，避免了低质量音乐对用户的干扰，净化了网络环境。The music quality evaluation device converts the audio information of the music to be evaluated into a frequency map and evaluates the frequency map with a sound quality evaluation model obtained by training a convolutional neural network, obtaining evaluation information for each piece of music. In this way, users can conveniently filter music according to the evaluation information, which avoids the interference of low-quality music with users and purifies the network environment.
在一些实施方式中，音乐质量评价装置中的处理模块包括：第一获取子模块，用于获取所述待评价音乐的音频信息的梅尔频率；第一处理子模块，用于根据所述梅尔频率的图谱获取梅尔频率倒谱；第一执行子模块，用于从所述梅尔频率倒谱中提取梅尔频率倒谱系数图。In some implementations, the processing module of the music quality evaluation device includes: a first acquisition sub-module, configured to obtain the Mel frequency of the audio information of the music to be evaluated; a first processing sub-module, configured to obtain a Mel frequency cepstrum according to the map of the Mel frequency; and a first execution sub-module, configured to extract a Mel frequency cepstrum coefficient map from the Mel frequency cepstrum.
在一些实施方式中，所述执行模块具体包括：第二获取子模块，用于获取所述音质评价模型的输出值；第二执行子模块，用于在评价列表中查找与所述输出值具有映射关系的评价指数。In some implementations, the execution module specifically includes: a second acquisition sub-module, configured to obtain the output value of the sound quality evaluation model; and a second execution sub-module, configured to find, in the evaluation list, the evaluation index having a mapping relationship with the output value.
在一些实施方式中，当用户搜索目标音频时，所述音乐质量评价装置还包括：第三获取子模块，用于获取播放指令；第二处理子模块，用于根据所述播放指令获取待播放音频的评价指数，并与预设的指数阈值进行比较；第三执行子模块，用于当所述待播放音频的评价指数大于或等于所述指数阈值时，播放所述待播放音频。In some implementations, when the user searches for target audio, the music quality evaluation device further includes: a third acquisition sub-module, configured to obtain a playback instruction; a second processing sub-module, configured to obtain the evaluation index of the audio to be played according to the playback instruction and compare it with a preset index threshold; and a third execution sub-module, configured to play the audio to be played when its evaluation index is greater than or equal to the index threshold.
在一些实施方式中，所述播放指令包括：待播放音频的关键词；所述音乐质量评价装置还包括：第三处理子模块，用于当所述待评价音乐的音频的评价指数小于所述指数阈值时，根据所述待播放音频的关键词在预设的数据库中查找与所述关键词匹配的音频信息；第四执行子模块，用于显示所述音频信息。In some implementations, the playback instruction includes a keyword of the audio to be played, and the music quality evaluation device further includes: a third processing sub-module, configured to search a preset database for audio information matching the keyword of the audio to be played when the evaluation index of the audio of the music to be evaluated is less than the index threshold; and a fourth execution sub-module, configured to display the audio information.
在一些实施方式中，音乐质量评价装置还包括：第四获取子模块，用于获取训练样本集，所述训练样本集包括从多段音质流畅的音频中提取的多张梅尔频率倒谱系数图；第四处理子模块，用于由预设的所述卷积神经网络模型获取所述多张梅尔频率倒谱系数图的期望值；第五处理子模块，用于将所述训练样本集输入到所述卷积神经网络模型中，获取所述卷积神经网络模型的激励值；第五执行子模块，用于比对所述期望值与所述激励值之间的距离是否小于或等于预设的第一阈值，并当所述期望值与所述激励值之间的距离大于所述第一阈值时，反复循环迭代的通过反向算法更新所述卷积神经网络模型中的权重，至所述期望值与所述激励值之间的距离小于或等于预设的第一阈值时结束。In some implementations, the music quality evaluation device further includes: a fourth acquisition sub-module, configured to obtain a training sample set, the training sample set including multiple Mel frequency cepstrum coefficient maps extracted from multiple pieces of audio with smooth sound quality; a fourth processing sub-module, configured to obtain the expected values of the multiple Mel frequency cepstrum coefficient maps from the preset convolutional neural network model; a fifth processing sub-module, configured to input the training sample set into the convolutional neural network model and obtain the excitation value of the convolutional neural network model; and a fifth execution sub-module, configured to compare whether the distance between the expected value and the excitation value is less than or equal to a preset first threshold, and, when the distance is greater than the first threshold, to iteratively update the weights in the convolutional neural network model through a back-propagation algorithm until the distance between the expected value and the excitation value is less than or equal to the preset first threshold.
在一些实施方式中，第四处理子模块，具体包括：第六获取子模块，用于将所述多张梅尔频率倒谱系数图依次输入到预设的卷积神经网络模型中，分别获取所述多张梅尔频率倒谱系数图的输出值；第六处理子模块，用于以数值为限定条件对所述输出值进行排序；第六执行子模块，用于确认排序结果中处于中间位置的输出值为所述多张梅尔频率倒谱系数图的期望输出值。In some implementations, the fourth processing sub-module specifically includes: a sixth acquisition sub-module, configured to input the multiple Mel frequency cepstrum coefficient maps into the preset convolutional neural network model in sequence and obtain the output value of each map; a sixth processing sub-module, configured to sort the output values by numerical value; and a sixth execution sub-module, configured to take the output value in the middle position of the sorted result as the expected output value of the multiple Mel frequency cepstrum coefficient maps.
为解决上述技术问题,本申请实施例还提供计算机设备。具体请参阅图8,图8为本实施例计算机设备基本结构框图。In order to solve the above technical problems, embodiments of the present application further provide computer equipment. For details, please refer to FIG. 8, which is a block diagram of the basic structure of the computer device of this embodiment.
如图8所示，计算机设备的内部结构示意图。如图8所示，该计算机设备包括通过系统总线连接的处理器、非易失性存储介质、存储器和网络接口。其中，该计算机设备的非易失性存储介质存储有操作系统、数据库和计算机可读指令，数据库中可存储有控件信息序列，该计算机可读指令被处理器执行时，可使得处理器实现一种音乐质量评价方法。该计算机设备的处理器用于提供计算和控制能力，支撑整个计算机设备的运行。该计算机设备的存储器中可存储有计算机可读指令，该计算机可读指令被处理器执行时，可使得处理器执行一种音乐质量评价方法。该计算机设备的网络接口用于与终端连接通信。本领域技术人员可以理解，图8中示出的结构，仅仅是与本申请方案相关的部分结构的框图，并不构成对本申请方案所应用于其上的计算机设备的限定，具体的计算机设备可以包括比图中所示更多或更少的部件，或者组合某些部件，或者具有不同的部件布置。FIG. 8 is a schematic diagram of the internal structure of the computer device. As shown in FIG. 8, the computer device includes a processor, a non-volatile storage medium, a memory and a network interface connected through a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer-readable instructions; the database may store sequences of control information, and when the computer-readable instructions are executed by the processor, they cause the processor to implement a music quality evaluation method. The processor of the computer device provides computing and control capabilities and supports the operation of the entire device. The memory of the computer device may store computer-readable instructions that, when executed by the processor, cause the processor to perform a music quality evaluation method. The network interface of the computer device is used to connect and communicate with a terminal. Those skilled in the art will understand that the structure shown in FIG. 8 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
本实施方式中处理器用于执行图7中获取模块2100、处理模块2200和执行模块2300的具体内容，存储器存储有执行上述模块所需的程序代码和各类数据。网络接口用于向用户终端或服务器之间的数据传输。本实施方式中的存储器存储有音乐质量评价方法中执行所有子模块所需的程序代码及数据，服务器能够调用服务器的程序代码及数据执行所有子模块的功能。In this implementation, the processor is configured to execute the specific content of the acquisition module 2100, the processing module 2200 and the execution module 2300 in FIG. 7, and the memory stores the program code and data required to run these modules. The network interface is used for data transmission with user terminals or servers. The memory in this implementation stores the program code and data required to execute all sub-modules of the music quality evaluation method, and the server can invoke this program code and data to perform the functions of all sub-modules.
计算机设备将待评价音乐的音频信息转化为频率图谱，并通过由卷积神经网络模型训练得到的音质评价模型对该频率图谱进行评价，得到每段音乐的评价信息，如此，可以便于用户根据评价信息来筛选音乐，避免了低质量音乐对用户的干扰，净化了网络环境。The computer device converts the audio information of the music to be evaluated into a frequency map and evaluates the frequency map with a sound quality evaluation model obtained by training a convolutional neural network, obtaining evaluation information for each piece of music. In this way, users can conveniently filter music according to the evaluation information, which avoids the interference of low-quality music with users and purifies the network environment.
本申请还提供一种存储有计算机可读指令的存储介质，所述计算机可读指令被一个或多个处理器执行时，使得一个或多个处理器执行上述任一实施例所述音乐质量评价方法的步骤。The present application also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the music quality evaluation method described in any of the foregoing embodiments.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,该计算机程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,前述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)等非易失性存储介质,或随机存储记忆体(Random Access Memory,RAM)等。A person of ordinary skill in the art may understand that all or part of the processes in the methods of the foregoing embodiments may be implemented by using a computer program to instruct related hardware. The computer program may be stored in a computer-readable storage medium. When executed, the processes of the embodiments of the methods described above may be included. The foregoing storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (Random Access Memory, RAM).
应该理解的是,虽然附图的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,其可以以其他的顺序执行。而且,附图的流程图中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,其执行顺序也不必然是依次进行,而是可以与其他步骤或者其他步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。It should be understood that although the steps in the flowchart of the drawings are sequentially displayed in accordance with the directions of the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited, and they can be performed in other orders. Moreover, at least a part of the steps in the flowchart of the drawing may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily performed at the same time, but may be performed at different times. The execution order is also It is not necessarily performed sequentially, but may be performed in turn or alternately with other steps or at least a part of the sub-steps or stages of other steps.
以上所述仅是本申请的部分实施方式，应当指出，对于本技术领域的普通技术人员来说，在不脱离本申请原理的前提下，还可以做出若干改进和润饰，这些改进和润饰也应视为本申请的保护范围。The above is only a partial implementation of the present application. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the principles of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of this application.

Claims (20)

  1. 一种音乐质量评价方法,包括下述步骤:A music quality evaluation method includes the following steps:
    获取待评价音乐的音频信息;Obtain audio information of the music to be evaluated;
    以频率为限定条件将所述待评价音乐的音频信息转化为频率图谱;Convert the audio information of the music to be evaluated into a frequency map with the frequency as a limiting condition;
    将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中，得到所述待评价音乐的音频信息的评价信息，其中，所述音质评价模型为预先训练至收敛的卷积神经网络模型。Inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model pre-trained to convergence.
  2. 根据权利要求1所述的音乐质量评价方法,所述以频率为限定条件将所述待评价音乐的音频信息转化为频率图谱,具体包括:The method for evaluating music quality according to claim 1, wherein the converting the audio information of the music to be evaluated into a frequency map with the frequency as a limiting condition specifically includes:
    获取所述待评价音乐的音频信息的梅尔频率;Acquiring a Mel frequency of the audio information of the music to be evaluated;
    根据所述梅尔频率的图谱获取梅尔频率倒谱;Obtaining cepstrum of Mel frequency according to the map of Mel frequency;
    从所述梅尔频率倒谱中提取梅尔频率倒谱系数图。A Mel frequency cepstrum coefficient map is extracted from the Mel frequency cepstrum.
  3. 根据权利要求1所述的音乐质量评价方法,所述将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中,得到所述待评价音乐的音频信息的评价信息,具体包括:The music quality evaluation method according to claim 1, wherein the frequency map of audio information of the music to be evaluated is input into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, specifically include:
    获取所述音质评价模型的输出值;Obtaining an output value of the sound quality evaluation model;
    在评价列表中查找与所述输出值具有映射关系的评价指数。Find an evaluation index having a mapping relationship with the output value in the evaluation list.
  4. 根据权利要求1所述的音乐质量评价方法，当用户搜索目标音频时，所述将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中，得到所述待评价音乐的音频信息的评价信息之后，还包括：The music quality evaluation method according to claim 1, wherein when a user searches for target audio, after the frequency map of the audio information of the music to be evaluated is input into the preset sound quality evaluation model and the evaluation information of the audio information of the music to be evaluated is obtained, the method further comprises:
    获取播放指令;Obtain playback instructions;
    根据所述播放指令获取待播放音频的评价指数,并与预设的指数阈值进行比较;Obtaining an evaluation index of the audio to be played according to the playback instruction, and comparing the evaluation index with a preset index threshold;
    当所述待播放音频的评价指数大于或等于所述指数阈值时,播放所述待播放音频。When the evaluation index of the audio to be played is greater than or equal to the index threshold, the audio to be played is played.
  5. 根据权利要求4所述的音乐质量评价方法，所述播放指令包括：待播放音频的关键词；根据所述播放指令获取待播放音频的评价指数，并与预设的指数阈值进行比较之后，还包括：The music quality evaluation method according to claim 4, wherein the playback instruction comprises a keyword of the audio to be played, and after the evaluation index of the audio to be played is obtained according to the playback instruction and compared with the preset index threshold, the method further comprises:
    当所述待评价音乐的音频的评价指数小于所述指数阈值时,根据所述待播放音频的关键词在预设的数据库中查找与所述关键词匹配的音频信息;When the evaluation index of the audio of the music to be evaluated is less than the index threshold, searching for audio information matching the keywords in a preset database according to the keywords of the audio to be played;
    显示所述音频信息。Displaying the audio information.
  6. 根据权利要求1~4任一项所述的音乐质量评价方法,所述音质评价模型的训练方法包括:The music quality evaluation method according to any one of claims 1 to 4, the training method of the sound quality evaluation model comprises:
    获取训练样本集,所述训练样本集包括从多段音质流畅的音频中提取的多张梅尔频率倒谱系数图;Acquiring a training sample set, where the training sample set includes multiple Mel frequency cepstrum coefficient maps extracted from multiple pieces of audio with smooth sound quality;
    由预设的所述卷积神经网络模型获取所述多张梅尔频率倒谱系数图的期望值;Obtaining an expected value of the plurality of Mel frequency cepstrum coefficient maps by using the preset convolutional neural network model;
    将所述训练样本集输入到所述卷积神经网络模型中,获取所述卷积神经网络模型的激励值;Input the training sample set into the convolutional neural network model, and obtain an excitation value of the convolutional neural network model;
    比对所述期望值与所述激励值之间的距离是否小于或等于预设的第一阈值，并当所述期望值与所述激励值之间的距离大于所述第一阈值时，反复循环迭代的通过反向算法更新所述卷积神经网络模型中的权重，至所述期望值与所述激励值之间的距离小于或等于预设的第一阈值时结束。Comparing whether the distance between the expected value and the excitation value is less than or equal to a preset first threshold, and when the distance between the expected value and the excitation value is greater than the first threshold, iteratively updating the weights in the convolutional neural network model through a back-propagation algorithm until the distance between the expected value and the excitation value is less than or equal to the preset first threshold.
  7. 根据权利要求6所述的音乐质量评价方法,所述由预设的所述卷积神经网络模型获取所述多张梅尔频率倒谱系数图的期望值,具体包括:The method for evaluating music quality according to claim 6, wherein the obtaining the expected values of the plurality of Mel frequency cepstrum coefficient maps by the preset convolutional neural network model specifically comprises:
    将所述多张梅尔频率倒谱系数图依次输入到预设的卷积神经网络模型中,分别获取所述多张梅尔频率倒谱系数图的输出值;Inputting the multiple Mel frequency cepstrum coefficient maps into a preset convolutional neural network model in turn, and respectively obtaining output values of the multiple Mel frequency cepstrum coefficient maps;
    以数值为限定条件对所述输出值进行排序;Sort the output values with a numerical value as a limiting condition;
    确认排序结果中处于中间位置的输出值为所述多张梅尔频率倒谱系数图的期望输出值。It is confirmed that the output value in the middle position in the ranking result is an expected output value of the multiple Mel frequency cepstrum coefficient graphs.
  8. 一种音乐质量评价装置,包括:A music quality evaluation device includes:
    获取模块,用于获取待评价音乐的音频信息;An acquisition module for acquiring audio information of the music to be evaluated;
    处理模块,用于以频率为限定条件将所述音频信息转化为频率图谱;A processing module, configured to convert the audio information into a frequency map with a frequency as a limiting condition;
    执行模块，用于将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中，得到所述待评价音乐的音频信息的评价信息，其中，所述音质评价模型为预先训练至收敛的卷积神经网络模型。an execution module, configured to input the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model pre-trained to convergence.
  9. 一种计算机设备，包括存储器和处理器，所述存储器中存储有计算机可读指令，所述计算机可读指令被所述处理器执行时，使得所述处理器执行一种音乐质量评价方法的下述步骤：A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps of a music quality evaluation method:
    获取待评价音乐的音频信息;Obtain audio information of the music to be evaluated;
    以频率为限定条件将所述待评价音乐的音频信息转化为频率图谱;Convert the audio information of the music to be evaluated into a frequency map with the frequency as a limiting condition;
    将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中，得到所述待评价音乐的音频信息的评价信息，其中，所述音质评价模型为预先训练至收敛的卷积神经网络模型。Inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model pre-trained to convergence.
  10. 根据权利要求9所述的计算机设备，所述以频率为限定条件将所述待评价音乐的音频信息转化为频率图谱，具体包括：The computer device according to claim 9, wherein the converting the audio information of the music to be evaluated into a frequency map with the frequency as a limiting condition specifically comprises:
    获取所述待评价音乐的音频信息的梅尔频率;Acquiring a Mel frequency of the audio information of the music to be evaluated;
    根据所述梅尔频率的图谱获取梅尔频率倒谱;Obtaining cepstrum of Mel frequency according to the map of Mel frequency;
    从所述梅尔频率倒谱中提取梅尔频率倒谱系数图。A Mel frequency cepstrum coefficient map is extracted from the Mel frequency cepstrum.
  11. 根据权利要求9所述的计算机设备,所述将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中,得到所述待评价音乐的音频信息的评价信息,具体包括:The computer device according to claim 9, wherein the inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain the evaluation information of the audio information of the music to be evaluated specifically comprises:
    获取所述音质评价模型的输出值;Obtaining an output value of the sound quality evaluation model;
    在评价列表中查找与所述输出值具有映射关系的评价指数。Find an evaluation index having a mapping relationship with the output value in the evaluation list.
  12. 根据权利要求9所述的计算机设备，当用户搜索目标音频时，所述将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中，得到所述待评价音乐的音频信息的评价信息之后，还包括：The computer device according to claim 9, wherein when a user searches for target audio, after the frequency map of the audio information of the music to be evaluated is input into the preset sound quality evaluation model and the evaluation information of the audio information of the music to be evaluated is obtained, the method further comprises:
    获取播放指令;Obtain playback instructions;
    根据所述播放指令获取待播放音频的评价指数,并与预设的指数阈值进行比较;Obtaining an evaluation index of the audio to be played according to the playback instruction, and comparing the evaluation index with a preset index threshold;
    当所述待播放音频的评价指数大于或等于所述指数阈值时,播放所述待播放音频。When the evaluation index of the audio to be played is greater than or equal to the index threshold, the audio to be played is played.
  13. 根据权利要求12所述的计算机设备,所述播放指令包括:待播放音频的关键词;根据所述播放指令获取待播放音频的评价指数,并与预设的指数阈值进行比较之后,还包括:The computer device according to claim 12, wherein the playback instruction comprises: a keyword of the audio to be played; after obtaining the evaluation index of the audio to be played according to the playback instruction and comparing it with a preset index threshold, further comprising:
    当所述待评价音乐的音频的评价指数小于所述指数阈值时,根据所述待播放音频的关键词在预设的数据库中查找与所述关键词匹配的音频信息;When the evaluation index of the audio of the music to be evaluated is less than the index threshold, searching for audio information matching the keywords in a preset database according to the keywords of the audio to be played;
    显示所述音频信息。Displaying the audio information.
  14. 根据权利要求9~12任一项所述的计算机设备,所述音质评价模型的训练方法包括:The computer device according to any one of claims 9 to 12, wherein the method for training the sound quality evaluation model comprises:
    获取训练样本集,所述训练样本集包括从多段音质流畅的音频中提取的多张梅尔频率倒谱系数图;Acquiring a training sample set, where the training sample set includes multiple Mel frequency cepstrum coefficient maps extracted from multiple pieces of audio with smooth sound quality;
    由预设的所述卷积神经网络模型获取所述多张梅尔频率倒谱系数图的期望值;Obtaining an expected value of the plurality of Mel frequency cepstrum coefficient maps by using the preset convolutional neural network model;
    将所述训练样本集输入到所述卷积神经网络模型中,获取所述卷积神经网络模型的激励值;Input the training sample set into the convolutional neural network model, and obtain an excitation value of the convolutional neural network model;
    比对所述期望值与所述激励值之间的距离是否小于或等于预设的第一阈值，并当所述期望值与所述激励值之间的距离大于所述第一阈值时，反复循环迭代的通过反向算法更新所述卷积神经网络模型中的权重，至所述期望值与所述激励值之间的距离小于或等于预设的第一阈值时结束。Comparing whether the distance between the expected value and the excitation value is less than or equal to a preset first threshold, and when the distance between the expected value and the excitation value is greater than the first threshold, iteratively updating the weights in the convolutional neural network model through a back-propagation algorithm until the distance between the expected value and the excitation value is less than or equal to the preset first threshold.
  15. 根据权利要求14所述的计算机设备,所述由预设的所述卷积神经网络模型获取所述多张梅尔频率倒谱系数图的期望值,具体包括:The computer device according to claim 14, wherein the obtaining the expected values of the plurality of Mel frequency cepstrum coefficient maps by the preset convolutional neural network model specifically comprises:
    将所述多张梅尔频率倒谱系数图依次输入到预设的卷积神经网络模型中,分别获取所述多张梅尔频率倒谱系数图的输出值;Inputting the multiple Mel frequency cepstrum coefficient maps into a preset convolutional neural network model in turn, and respectively obtaining output values of the multiple Mel frequency cepstrum coefficient maps;
    以数值为限定条件对所述输出值进行排序;Sort the output values with a numerical value as a limiting condition;
    确认排序结果中处于中间位置的输出值为所述多张梅尔频率倒谱系数图的期望输出值。It is confirmed that the output value in the middle position in the ranking result is an expected output value of the multiple Mel frequency cepstrum coefficient graphs.
  16. 一种存储有计算机可读指令的非易失性存储介质,所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行一种音乐质量评价方法的下述步骤:A non-volatile storage medium storing computer-readable instructions, when the computer-readable instructions are executed by one or more processors, cause the one or more processors to perform the following steps of a method for evaluating music quality :
    获取待评价音乐的音频信息;Obtain audio information of the music to be evaluated;
    以频率为限定条件将所述待评价音乐的音频信息转化为频率图谱;Convert the audio information of the music to be evaluated into a frequency map with the frequency as a limiting condition;
    将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中，得到所述待评价音乐的音频信息的评价信息，其中，所述音质评价模型为预先训练至收敛的卷积神经网络模型。Inputting the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated, wherein the sound quality evaluation model is a convolutional neural network model pre-trained to convergence.
  17. 根据权利要求16所述的非易失性存储介质,所述以频率为限定条件将所述待评价音乐的音频信息转化为频率图谱,具体包括:The non-volatile storage medium according to claim 16, wherein the converting the audio information of the music to be evaluated into a frequency map with frequency as a limiting condition, specifically comprising:
    获取所述待评价音乐的音频信息的梅尔频率;Acquiring a Mel frequency of the audio information of the music to be evaluated;
    根据所述梅尔频率的图谱获取梅尔频率倒谱;Obtaining cepstrum of Mel frequency according to the map of Mel frequency;
    从所述梅尔频率倒谱中提取梅尔频率倒谱系数图。A Mel frequency cepstrum coefficient map is extracted from the Mel frequency cepstrum.
  18. 根据权利要求16所述的非易失性存储介质，所述将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中，得到所述待评价音乐的音频信息的评价信息，具体包括：The non-volatile storage medium according to claim 16, wherein the inputting of the frequency map of the audio information of the music to be evaluated into a preset sound quality evaluation model to obtain evaluation information of the audio information of the music to be evaluated specifically comprises:
    获取所述音质评价模型的输出值;Obtaining an output value of the sound quality evaluation model;
    在评价列表中查找与所述输出值具有映射关系的评价指数。Find an evaluation index having a mapping relationship with the output value in the evaluation list.
  19. 根据权利要求16所述的非易失性存储介质，当用户搜索目标音频时，所述将所述待评价音乐的音频信息的频率图谱输入到预设的音质评价模型中，得到所述待评价音乐的音频信息的评价信息之后，还包括：The non-volatile storage medium according to claim 16, wherein when a user searches for target audio, after the frequency map of the audio information of the music to be evaluated is input into the preset sound quality evaluation model and the evaluation information of the audio information of the music to be evaluated is obtained, the method further comprises:
    获取播放指令;Obtain playback instructions;
    根据所述播放指令获取待播放音频的评价指数,并与预设的指数阈值进行比较;Obtaining an evaluation index of the audio to be played according to the playback instruction, and comparing the evaluation index with a preset index threshold;
    当所述待播放音频的评价指数大于或等于所述指数阈值时,播放所述待播放音频。When the evaluation index of the audio to be played is greater than or equal to the index threshold, the audio to be played is played.
  20. 根据权利要求19所述的非易失性存储介质，所述播放指令包括：待播放音频的关键词；根据所述播放指令获取待播放音频的评价指数，并与预设的指数阈值进行比较之后，还包括：The non-volatile storage medium according to claim 19, wherein the playback instruction comprises a keyword of the audio to be played, and after the evaluation index of the audio to be played is obtained according to the playback instruction and compared with the preset index threshold, the method further comprises:
    当所述待评价音乐的音频的评价指数小于所述指数阈值时,根据所述待播放音频的关键词在预设的数据库中查找与所述关键词匹配的音频信息;When the evaluation index of the audio of the music to be evaluated is less than the index threshold, searching for audio information matching the keywords in a preset database according to the keywords of the audio to be played;
    显示所述音频信息。Displaying the audio information.
PCT/CN2018/125449 2018-08-02 2018-12-29 Music quality evaluation method and apparatus, and computer device and storage medium WO2020024556A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810873498.0A CN109308913A (en) 2018-08-02 2018-08-02 Sound quality evaluation method, device, computer equipment and storage medium
CN201810873498.0 2018-08-02

Publications (1)

Publication Number Publication Date
WO2020024556A1 true WO2020024556A1 (en) 2020-02-06

Family

ID=65226059

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/125449 WO2020024556A1 (en) 2018-08-02 2018-12-29 Music quality evaluation method and apparatus, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN109308913A (en)
WO (1) WO2020024556A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488485A (en) * 2020-04-16 2020-08-04 北京雷石天地电子技术有限公司 Music recommendation method based on convolutional neural network, storage medium and electronic device

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109961802B (en) * 2019-03-26 2021-05-18 北京达佳互联信息技术有限公司 Sound quality comparison method, device, electronic equipment and storage medium
CN110197447B (en) * 2019-04-17 2022-09-30 哈尔滨沥海佳源科技发展有限公司 Communication index based online education method and device, electronic equipment and storage medium
CN110189771A (en) 2019-05-31 2019-08-30 腾讯音乐娱乐科技(深圳)有限公司 With the sound quality detection method, device and storage medium of source audio
CN110322894B (en) * 2019-06-27 2022-02-11 电子科技大学 Sound-based oscillogram generation and panda detection method
CN110675879B (en) * 2019-09-04 2023-06-23 平安科技(深圳)有限公司 Audio evaluation method, system, equipment and storage medium based on big data
CN110728966B (en) * 2019-09-12 2023-05-23 上海麦克风文化传媒有限公司 Audio album content quality evaluation method and system
CN112559794A (en) * 2019-09-25 2021-03-26 北京达佳互联信息技术有限公司 Song quality identification method, device, equipment and storage medium
CN110909202A (en) * 2019-10-28 2020-03-24 广州荔支网络技术有限公司 Audio value evaluation method and device and readable storage medium
CN111161759B (en) * 2019-12-09 2022-12-06 科大讯飞股份有限公司 Audio quality evaluation method and device, electronic equipment and computer storage medium
CN113593607A (en) * 2020-04-30 2021-11-02 北京破壁者科技有限公司 Audio processing method and device and electronic equipment
CN111768801A (en) * 2020-06-12 2020-10-13 瑞声科技(新加坡)有限公司 Airflow noise eliminating method and device, computer equipment and storage medium
CN114171062A (en) * 2020-09-10 2022-03-11 安克创新科技股份有限公司 Sound quality evaluation method, device and computer storage medium
CN112017986A (en) * 2020-10-21 2020-12-01 季华实验室 Semiconductor product defect detection method and device, electronic equipment and storage medium
CN112634928B (en) * 2020-12-08 2023-09-29 北京有竹居网络技术有限公司 Sound signal processing method and device and electronic equipment
CN113077815B (en) * 2021-03-29 2024-05-14 腾讯音乐娱乐科技(深圳)有限公司 Audio evaluation method and assembly
CN113192536B (en) * 2021-04-28 2023-07-28 北京达佳互联信息技术有限公司 Training method of voice quality detection model, voice quality detection method and device
CN113436644B (en) * 2021-07-16 2023-09-01 北京达佳互联信息技术有限公司 Sound quality evaluation method, device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0722164A1 (en) * 1995-01-10 1996-07-17 AT&T Corp. Method and apparatus for characterizing an input signal
CN104581758A (en) * 2013-10-25 2015-04-29 中国移动通信集团广东有限公司 Voice quality estimation method and device as well as electronic equipment
CN104992705A (en) * 2015-05-20 2015-10-21 普强信息技术(北京)有限公司 English oral automatic grading method and system
CN106531190A (en) * 2016-10-12 2017-03-22 科大讯飞股份有限公司 Speech quality evaluation method and device
CN108206027A (en) * 2016-12-20 2018-06-26 北京酷我科技有限公司 A kind of audio quality evaluation method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106816158B (en) * 2015-11-30 2020-08-07 华为技术有限公司 Voice quality assessment method, device and equipment
CN106558308B (en) * 2016-12-02 2020-05-15 深圳撒哈拉数据科技有限公司 Internet audio data quality automatic scoring system and method
CN106919662B (en) * 2017-02-14 2021-08-31 复旦大学 Music identification method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0722164A1 (en) * 1995-01-10 1996-07-17 AT&T Corp. Method and apparatus for characterizing an input signal
CN104581758A (en) * 2013-10-25 2015-04-29 中国移动通信集团广东有限公司 Voice quality estimation method and device as well as electronic equipment
CN104992705A (en) * 2015-05-20 2015-10-21 普强信息技术(北京)有限公司 English oral automatic grading method and system
CN106531190A (en) * 2016-10-12 2017-03-22 科大讯飞股份有限公司 Speech quality evaluation method and device
CN108206027A (en) * 2016-12-20 2018-06-26 北京酷我科技有限公司 A kind of audio quality evaluation method and system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488485A (en) * 2020-04-16 2020-08-04 北京雷石天地电子技术有限公司 Music recommendation method based on convolutional neural network, storage medium and electronic device
CN111488485B (en) * 2020-04-16 2023-11-17 北京雷石天地电子技术有限公司 Music recommendation method based on convolutional neural network, storage medium and electronic device

Also Published As

Publication number Publication date
CN109308913A (en) 2019-02-05

Similar Documents

Publication Publication Date Title
WO2020024556A1 (en) Music quality evaluation method and apparatus, and computer device and storage medium
US11875807B2 (en) Deep learning-based audio equalization
EP2612261B1 (en) Internet search related methods and apparatus
US8666963B2 (en) Method and apparatus for processing spoken search queries
US8990182B2 (en) Methods and apparatus for searching the Internet
US9679257B2 (en) Method and apparatus for adapting a context model at least partially based upon a context-related search criterion
Yang et al. Revisiting the problem of audio-based hit song prediction using convolutional neural networks
US20120060113A1 (en) Methods and apparatus for displaying content
US20120059658A1 (en) Methods and apparatus for performing an internet search
US20140201276A1 (en) Accumulation of real-time crowd sourced data for inferring metadata about entities
CN107705805B (en) Audio duplicate checking method and device
CN114443891B (en) Encoder generation method, fingerprint extraction method, medium, and electronic device
CN113257283B (en) Audio signal processing method and device, electronic equipment and storage medium
CN110287788A (en) A kind of video classification methods and device
Yang et al. Semi-supervised feature selection for audio classification based on constraint compensated Laplacian score
CN109360072B (en) Insurance product recommendation method and device, computer equipment and storage medium
CN111460215B (en) Audio data processing method and device, computer equipment and storage medium
CN111859008A (en) Music recommending method and terminal
CN115116469A (en) Feature representation extraction method, feature representation extraction device, feature representation extraction apparatus, feature representation extraction medium, and program product
CN114023289A (en) Music identification method and device and training method and device of music feature extraction model
CN113987258A (en) Audio identification method and device, readable medium and electronic equipment
WO2023160515A1 (en) Video processing method and apparatus, device and medium
Ramli et al. Bio-acoustic signal identification based on sparse representation classifier
CN114722234A (en) Music recommendation method, device and storage medium based on artificial intelligence
CN115881067A (en) Music genre classification method, system and medium based on Resnet101

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18928210

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18928210

Country of ref document: EP

Kind code of ref document: A1