CN113808618A - Audio quality evaluation method and device, equipment, medium and product thereof

Audio quality evaluation method and device, equipment, medium and product thereof

Info

Publication number
CN113808618A
Authority
CN
China
Prior art keywords
data
frequency
audio data
audio
sampling
Prior art date
Legal status
Granted
Application number
CN202111040485.3A
Other languages
Chinese (zh)
Other versions
CN113808618B (en)
Inventor
张金华
黄裕佳
张舒婷
Current Assignee
Guangzhou Shiyinlian Software Technology Co ltd
Original Assignee
Guangzhou Shiyinlian Software Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Shiyinlian Software Technology Co ltd filed Critical Guangzhou Shiyinlian Software Technology Co ltd
Priority to CN202111040485.3A
Publication of CN113808618A
Application granted
Publication of CN113808618B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals

Abstract

The application relates to the field of audio processing and discloses an audio quality evaluation method together with a corresponding device, equipment, medium and product. The method comprises the following steps: acquiring audio data, the audio data being sampled with a predetermined sampling bit number; determining a cut-off frequency of the audio data according to the power spectral density of the audio data; determining the sampling precision level of the audio data according to the data distribution characteristics, in the time domain of the audio data, of the low-order data corresponding to the low-order part of the predetermined sampling bit number; and determining evaluation information of the audio data according to a preset evaluation rule, the evaluation rule being configured to determine the evaluation information from the sampling precision level and the cut-off frequency. The application analyzes and evaluates the sound quality of the audio data in both the time domain and the frequency domain, and determines the evaluation information corresponding to the sound quality by combining the time-domain and frequency-domain analysis results, making the sound quality evaluation more accurate and efficient.

Description

Audio quality evaluation method and device, equipment, medium and product thereof
Technical Field
The present application relates to the field of audio processing technologies, and in particular, to an audio quality assessment method and a corresponding apparatus, computer device, computer-readable storage medium, and computer program product.
Background
With the development of Internet online music services, song works have become more and more abundant, but their quality is uneven. As users' requirements for the sound quality of musical works rise, and as music libraries must be curated to discard inferior copies and retain genuine ones, detecting the sound quality of the audio used to build a music library or to create musical works becomes increasingly necessary.
In the prior art, methods for evaluating the quality of musical compositions such as songs mostly rely on analyzing the power spectral density data of the composition's audio data. The power spectral density is a physical quantity representing the relationship between the power energy of a signal and frequency; the sound quality of the audio data is evaluated by performing frequency-domain analysis on the power spectral density data, and the sound quality of the musical composition is determined accordingly.
The prior art therefore analyzes the sound quality of a musical composition only in the frequency domain and ignores the behaviour of the audio data in the time domain, so that audio data with poor sound quality cannot be analyzed effectively. This impairs the evaluation result and also affects the normal operation of related services.
Disclosure of Invention
A primary object of the present application is to solve at least one of the above problems and provide an audio quality assessment method and corresponding apparatus, computer device, computer readable storage medium, and computer program product, so as to facilitate music creation.
In order to meet various purposes of the application, the following technical scheme is adopted in the application:
an audio quality assessment method adapted to one of the objects of the present application is provided, including the steps of:
acquiring audio data, wherein the audio data is sampled by a preset sampling bit number;
determining a cut-off frequency of the audio data according to the power spectral density of the audio data;
determining the sampling precision grade of the audio data according to the data distribution characteristics of the low-order data corresponding to the low-order part of the preset sampling digit in the audio data in the time domain of the audio data;
determining evaluation information of the audio data according to a preset evaluation rule, wherein the evaluation rule is configured to determine the evaluation information according to the sampling precision level and the cut-off frequency.
In a deepened embodiment, the method for obtaining audio data, which is sampled by a predetermined sampling bit number, includes the following steps:
acquiring a local lossless audio file submitted by a user or a lossless audio file in a music library specified by the user;
and converting the lossless audio file into audio data in a pulse code modulation format by using a preset sampling bit number.
In a further embodiment, determining the cut-off frequency of the audio data based on the power spectral density of the audio data comprises the steps of:
converting metadata corresponding to the spectrogram according to the audio data;
converting the metadata into first power spectral density data, determining a first candidate cutoff frequency from the first power spectral density data;
converting the metadata into second power spectral density data after binarization, and determining a second candidate cut-off frequency according to the second power spectral density data;
and selecting the minimum one of the first candidate cutoff frequency and the second candidate cutoff frequency as the cutoff frequency.
In an embodiment, converting the metadata into first power spectral density data and determining a first candidate cutoff frequency from the first power spectral density data includes the following steps:
determining a first frequency sequence corresponding to the first power spectral density data according to the metadata, wherein the first frequency sequence comprises a total power value corresponding to a plurality of frequencies, and the total power value is the sum of a plurality of power values distributed along the audio data time domain along the corresponding frequency;
and determining the frequency corresponding to the maximum change of the slope of the curve as a first candidate cut-off frequency according to the smooth curve data synthesized by the total power values in the first frequency sequence.
In an embodiment, the binarizing the metadata to convert the metadata into second power spectral density data, and determining a second candidate cut-off frequency according to the second power spectral density data includes the following steps:
carrying out binarization conversion on the metadata to obtain a binarization data sequence;
determining a second frequency sequence corresponding to second power spectrum density data according to the binarization data sequence, wherein the second frequency sequence comprises binarization accumulated values corresponding to a plurality of frequencies, and the binarization accumulated values are the accumulated sum of a plurality of binarization data distributed along the audio data time domain along the corresponding frequencies;
and determining the frequency corresponding to the maximum change of the slope of the curve as a second candidate cut-off frequency according to the smooth curve data synthesized by each binary accumulated value in the second frequency sequence.
In a further embodiment, determining a sampling precision level of the audio data according to a data distribution characteristic of lower data corresponding to a lower part of the predetermined sampling number of bits in the audio data in a time domain of the audio data includes:
converting the audio data into absolute value form;
acquiring low-order data corresponding to a low-order part in the audio data, clustering all the low-order data into a plurality of numerical tags corresponding to classifications, wherein the bit length of the low-order part is a preset value;
counting the frequency of the numerical value label in the audio data to form a frequency data sequence;
and identifying the data distribution characteristics of the frequency data sequence, and judging the corresponding sampling precision level according to the data distribution characteristics.
In an embodiment, the step of identifying a data distribution characteristic of the frequency data sequence and determining a corresponding sampling precision level according to the data distribution characteristic includes:
when the data distribution characteristics show all-zero data, a decrease along the sequence, an increase along the sequence, or frequencies of odd and even value labels alternating up and down along the sequence, the sampling precision level of the audio data is judged to be a level lower than that of the predetermined sampling bit number.
In a preferred embodiment, the preset bit length of the lower portion is 8 bits.
In a further embodiment, determining the evaluation information of the audio data according to a preset evaluation rule, where the evaluation rule is configured to determine the evaluation information according to the sampling precision level and the cut-off frequency, includes:
determining reference scores corresponding to two adjacent sampling frequencies of the cut-off frequency under the sampling precision level;
applying the evaluation rule, taking the benchmark score corresponding to the lower sampling frequency as the initial score and accumulating onto it the normalized difference between the two benchmark scores, so as to determine a quantitative score as the evaluation information;
and outputting the evaluation information.
An audio quality evaluation apparatus adapted to one of the objects of the present application includes: the device comprises a data acquisition module, a frequency domain analysis module, a time domain analysis module and a comprehensive evaluation module, wherein the data acquisition module is used for acquiring audio data, and the audio data is formed by sampling with a preset sampling digit; the frequency domain analysis module is used for determining the cut-off frequency of the audio data according to the power spectral density of the audio data; the time domain analysis module is used for determining the sampling precision level of the audio data according to the data distribution characteristics of the low-order data corresponding to the low-order part of the preset sampling digit in the audio data in the time domain of the audio data; the comprehensive evaluation module is used for determining evaluation information of the audio data according to a preset evaluation rule, and the evaluation rule is configured to determine the evaluation information according to the sampling precision level and the cut-off frequency.
In a further embodiment, the data acquisition module comprises: the file acquisition submodule is used for acquiring a local lossless audio file submitted by a user or a lossless audio file in a music library specified by the user; and the sampling conversion sub-module is used for converting the lossless audio file into audio data in a pulse code modulation format by using a preset sampling bit number.
In a further embodiment, the frequency domain analysis module comprises: the frequency domain conversion sub-module is used for converting metadata corresponding to the spectrogram according to the audio data; a first frequency sub-module, configured to convert the metadata into first power spectral density data, and determine a first candidate cutoff frequency according to the first power spectral density data; the second frequency sub-module is used for converting the metadata into second power spectral density data after binarization, and determining a second candidate cut-off frequency according to the second power spectral density data; and the frequency preference submodule is used for selecting the minimum one of the first candidate cut-off frequency and the second candidate cut-off frequency as the cut-off frequency.
In a specific embodiment, the first frequency sub-module includes: a power vector unit, configured to determine, according to the metadata, a first frequency sequence corresponding to the first power spectral density data, where the first frequency sequence includes a total power value corresponding to a plurality of frequencies, and the total power value is a sum of a plurality of power values distributed along a time domain of the audio data along the corresponding frequency; and the power calculation unit is used for determining the frequency corresponding to the maximum change of the slope of the curve as the first candidate cut-off frequency according to the smooth curve data which is synthesized by the total power values in the first frequency sequence.
In a specific embodiment, the second frequency sub-module comprises: a binary conversion unit, configured to perform binary conversion on the metadata to obtain a binary data sequence; a binary vector unit, configured to determine, according to the binary data sequence, a second frequency sequence corresponding to second power spectral density data, where the second frequency sequence includes a binary accumulated value corresponding to multiple frequencies, and the binary accumulated value is an accumulated sum of multiple binary data distributed along a time domain of audio data along a frequency corresponding to the binary accumulated value; and the binary calculation unit is used for determining the frequency corresponding to the maximum change of the slope of the curve as the second candidate cut-off frequency according to the smooth curve data synthesized by the binary accumulated values in the second frequency sequence.
In a further embodiment, the time domain analysis module comprises: the coordinate conversion submodule is used for converting the audio data into an absolute value form; the low-order extraction submodule is used for acquiring low-order data corresponding to a low-order part in the audio data, clustering all the low-order data into a plurality of numerical value labels corresponding to the classification, and the bit length of the low-order part is a preset value; the frequency counting submodule is used for counting the frequency of the numerical value label in the audio data to form a frequency data sequence; and the precision judging submodule is used for identifying the data distribution characteristics of the frequency data sequence and judging the corresponding sampling precision grade according to the data distribution characteristics.
In a specific embodiment, the accuracy determination sub-module is configured to determine the sampling precision level as follows: when the data distribution characteristics show all-zero data, a decrease along the sequence, an increase along the sequence, or frequencies of odd and even value labels alternating up and down along the sequence, the sampling precision level of the audio data is judged to be a level lower than that of the predetermined sampling bit number.
In a preferred embodiment, the preset bit length of the lower portion is 8 bits.
In a further embodiment, the comprehensive evaluation module comprises: the reference retrieval submodule is used for determining reference scores corresponding to two adjacent sampling frequencies of the cut-off frequency under the sampling precision level; the evaluation quantization submodule is used for applying an evaluation rule, and determining a quantization score by accumulating a normalized difference value between two reference scores by taking the reference score corresponding to the lower sampling frequency as an initial score to serve as the evaluation information; and the information output submodule is used for outputting the evaluation information.
A computer device adapted to one of the objects of the present application is provided, comprising a central processing unit and a memory, the central processing unit being adapted to invoke execution of a computer program stored in the memory to perform the steps of the audio quality assessment method described herein.
A computer-readable storage medium is provided, which stores, in the form of computer-readable instructions, a computer program implementing the described audio quality assessment method; when invoked by a computer, the program performs the steps comprised by the method.
A computer program product, provided to meet another object of the present application, comprises computer programs/instructions which, when executed by a processor, implement the steps of the audio quality assessment method described in any of the embodiments of the present application.
Compared with the prior art, the application has the following advantages:
Firstly, the present application on the one hand performs frequency-domain analysis of the audio data, using the power spectral density data to determine the cut-off frequency of the audio data, and on the other hand performs time-domain analysis, using the low-order data corresponding to the low-order part of the sampling bit number of the audio data to determine its actual sampling precision level. Evaluation information is finally determined from the cut-off frequency and the sampling precision level according to a preset evaluation rule. Quality factors of the audio data in both the frequency-domain and time-domain dimensions are thus considered together, so that more accurate evaluation information can be provided for the sound quality evaluation of the audio data and the reliability of the evaluation result is improved.
Secondly, the present application applies a preset evaluation rule that converts the sound quality of the audio data into evaluation information according to the correspondence between the cut-off frequency obtained by frequency-domain analysis and the sampling precision level obtained by time-domain analysis. This realizes a normalized representation of sound quality evaluation, makes it easy to unify sound quality evaluation standards, and improves the efficiency of labelling, screening and processing audio data according to the evaluation information.
In addition, the normalized evaluation information is readable and easy to understand, which improves user experience during human-computer interaction.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart diagram of an exemplary embodiment of an audio quality assessment method of the present application;
FIG. 2 is a detailed flow chart illustrating the determination of a cutoff frequency for audio data according to an embodiment of the present application;
fig. 3 is an exemplary reference graph cited in calculating the cut-off frequency according to an embodiment of the present application, wherein a is a spectrogram made from exemplary audio data, b is a power spectrum graph made from the first power spectral density data converted from the spectrogram, and c is a schematic representation of the slope calculated from a smooth curve fitted to the first power spectral density data;
FIG. 4 is a schematic flow chart illustrating the determination of candidate cut-off frequencies based on binarized data of audio data according to an embodiment of the present application;
FIG. 5 is a detailed flow chart illustrating the determination of the sampling accuracy level for audio data according to an embodiment of the present application;
FIG. 6 is an exemplary reference diagram referenced in calculating the sampling accuracy level according to an embodiment of the present application, where a is a spectrogram of exemplary audio data; b is the spectrogram after the metadata corresponding to the spectrogram have been converted to absolute values; c is a spectrogram constructed from the low-order data of a single channel of the audio data; and d is a histogram of the frequency statistics of the numerical labels corresponding to the low-order data;
fig. 7 is a schematic specific flowchart for determining evaluation information according to an embodiment of the present application;
fig. 8 is a diagram illustrating various notification interface effects generated during human-computer interaction on a graphical user interface of a client performing sound quality assessment, where a indicates that sound quality identification is in progress, and b and c respectively display different evaluation information in an embodiment of the present application;
fig. 9 is a functional block diagram of an exemplary embodiment of an audio quality estimation apparatus of the present application;
fig. 10 is a schematic structural diagram of a computer device used in the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As will be appreciated by those skilled in the art, "client," "terminal," and "terminal device" as used herein include both devices that are wireless signal receivers, which are devices having only wireless signal receivers without transmit capability, and devices that are receive and transmit hardware, which have receive and transmit hardware capable of two-way communication over a two-way communication link. Such a device may include: cellular or other communication devices such as personal computers, tablets, etc. having single or multi-line displays or cellular or other communication devices without multi-line displays; PCS (Personal Communications Service), which may combine voice, data processing, facsimile and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio frequency receiver, a pager, internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; a conventional laptop and/or palmtop computer or other device having and/or including a radio frequency receiver. As used herein, a "client," "terminal device" can be portable, transportable, installed in a vehicle (aeronautical, maritime, and/or land-based), or situated and/or configured to operate locally and/or in a distributed fashion at any other location(s) on earth and/or in space. The "client", "terminal Device" used herein may also be a communication terminal, a web terminal, a music/video playing terminal, such as a PDA, an MID (Mobile Internet Device) and/or a Mobile phone with music/video playing function, and may also be a smart tv, a set-top box, and the like.
The hardware referred to by the names "server", "client", "service node", etc. is essentially an electronic device with the performance of a personal computer: a hardware device having the necessary components disclosed by the von Neumann principle, such as a central processing unit (including an arithmetic unit and a controller), a memory, an input device and an output device. A computer program is stored in the memory, and the central processing unit calls the program stored in external memory into internal memory for running, executes the instructions in the program, and interacts with the input and output devices to complete a specific function.
It should be noted that the concept of "server" as referred to in this application can be extended to the case of a server cluster. According to the network deployment principle understood by those skilled in the art, the servers should be logically divided, and in physical space, the servers may be independent from each other but can be called through an interface, or may be integrated into one physical computer or a set of computer clusters. Those skilled in the art will appreciate this variation and should not be so limited as to restrict the implementation of the network deployment of the present application.
One or more technical features of the present application, unless expressly specified otherwise, may be deployed on a server and accessed by a client remotely invoking an online service interface provided by that server, or may be deployed and run directly on the client for access.
The neural network models referenced or potentially referenced in this application may, unless clearly specified otherwise, be deployed on a remote server and remotely invoked at the client, or invoked directly at a client whose device capability allows it. Those skilled in the art will appreciate that, as long as its running resources are suitable, a device can serve as the model training device and the model running device corresponding to the neural network model. In some embodiments, when running on the client, the corresponding intelligence can be obtained through transfer learning, so that the demand on the client's hardware running resources is reduced and excessive occupation of those resources is avoided.
Various data referred to in the present application may, unless clearly specified otherwise, be stored either on a remote server or in a local terminal device, as long as the data is suitable for being called by the technical solution of the present application.
Those skilled in the art will appreciate that, although the various methods of the present application are described based on the same concept so that they share common features, they may be performed independently unless otherwise specified. Likewise, each embodiment disclosed in the present application is proposed based on the same inventive concept; therefore, concepts expressed identically, and concepts whose expressions differ but have been adjusted merely for convenience, should be understood equivalently.
Unless a mutual-exclusion relationship between related technical features is clearly stated, embodiments can be flexibly constructed by cross-combining the related technical features of the embodiments disclosed herein, as long as the combination does not depart from the inventive spirit of the present application and can meet the needs of the prior art or remedy its deficiencies. Those skilled in the art will appreciate such variations.
The audio quality evaluation method of the present application can be programmed into a computer program product and deployed to run in a terminal device and/or a server, so that, after the computer program product is run as a web page program or an application program, a client can access its open user interface to carry out human-computer interaction.
Referring to fig. 1, in an exemplary embodiment, the method includes the steps of:
step S1100, obtaining audio data, wherein the audio data is sampled by a preset sampling bit number:
the audio data of the tone quality to be detected can be either audio data created by a user or pre-stored audio data. The audio data may be data stored locally by the client device or may belong to a library of songs stored by a remote server. In an exemplary application scenario of the application, the audio data stores music data corresponding to songs and music, and may be simple accompaniment data or song data carrying a vocal singing part; either purely instrumental performance data or multi-channel reverberation data.
The audio data is typically stored in its raw form as a lossless audio file, usually a music file. To facilitate the digital processing of the present application, the audio data is converted into PCM (pulse code modulation) format; during this conversion the original audio data is sampled with the sampling bit number corresponding to a preset sampling precision level to obtain audio data in PCM format. Different sampling precision levels indicate the quality obtained by sampling with different sampling precisions, the sampling precision of the audio data being expressed by word lengths of different numbers of bits. Commonly used sampling precision levels are 8 bits, 16 bits, 24 bits, 32 bits and so on; in keeping with how computers store data, sampling precision levels are basically defined in units of bytes, i.e. 8-bit word lengths. In the exemplary application of the present application, the original audio data is by default sampled with a sampling precision of 24 bits and converted into audio data in PCM format; those skilled in the art can choose the sampling precision flexibly.
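The following is a minimal sketch (not the patented implementation) of this acquisition and PCM-conversion step, assuming Python with the soundfile package; the file name and the 24-bit word length are illustrative.

    import numpy as np
    import soundfile as sf

    def load_pcm(path, target_bits=24):
        """Decode a lossless audio file into integer PCM samples of the desired word length."""
        # libsndfile delivers 24-bit material left-justified in int32 containers,
        # so shifting right by (32 - target_bits) recovers the nominal sample values.
        data, sample_rate = sf.read(path, dtype='int32', always_2d=True)
        return data >> (32 - target_bits), sample_rate

    samples, sr = load_pcm('example.flac')   # samples.shape == (num_frames, num_channels)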
Step S1200, determining a cut-off frequency of the audio data according to the power spectral density of the audio data:
the Power Spectral Density (PSD), called Power spectrum for short, is a physical quantity that characterizes the relationship between the Power energy and the frequency of a signal, and is often used to study stochastic vibration signals, and the PSD is usually normalized according to the frequency resolution. The power spectral density is defined as the signal power within a unit frequency band. It shows the variation of signal power with frequency, i.e. the distribution of signal power in frequency domain.
The data corresponding to the power spectral density can be obtained by converting the spectral information of the audio data in PCM format. The spectral information provides the metadata required for calculating the power spectral density, and a spectrogram can be constructed directly from the metadata. The metadata comprise the power values corresponding to the various frequencies, counted along the time domain of the audio data; when the metadata are converted into a spectrogram, a two-dimensional coordinate system can be established with the time domain and the frequency domain as dimensions, and the power values corresponding to different frequencies at the same moment can be represented by colour saturation or a colour-gradient relationship. The metadata can be further processed and converted into various data, including the first power spectral density data needed to represent the power spectrum and the second power spectral density data obtained after binarizing the metadata. In theory, each kind of data converted in this way can, to some extent, effectively characterize the relative relationship between the different frequencies of the audio data and their power values. The bandwidth occupied by each frequency can be determined according to the sampling frequency or by other methods, and can be decided flexibly by those skilled in the art.
According to the characteristics presented by the data corresponding to the power spectral density of the audio data, in the power spectrum the power energy distribution exhibits a drop at a certain frequency, so that the power energy of some frequency points is concentrated below that frequency. The frequency at which this falling feature occurs may be referred to as the cut-off frequency of the audio data. Calculating the cut-off frequency accurately can play a very important role in scenarios such as spectrum extension of audio data, elimination of clipping noise, and speech recognition.
Therefore, according to the characteristics of the power spectral density, the corresponding cut-off frequency of the audio data can be determined by examining the regularities presented by the power spectral density data and, in combination with the characteristics of the cut-off frequency, applying an algorithm well known to those skilled in the art. In addition, the algorithm disclosed in the embodiments that follow can also be used to determine the cut-off frequency of the audio data from the power spectral density; it is not expanded upon here.
The cut-off frequencies determined from the different types of power spectral density data, converted from the metadata in different ways, are not necessarily completely consistent. The cut-off frequency may be determined by one of the methods alone, or candidate cut-off frequencies may be determined in several ways and one of them then preferred as the final cut-off frequency. In general, it is appropriate to adopt the candidate cut-off frequency with the smaller frequency value as the finally determined cut-off frequency.
In this embodiment, the cut-off frequency determined by the present application is used for sound quality evaluation, and the evaluation result itself can be applied to various scenarios, such as sound quality evaluation of the audio files in a music library or of individual audio files; in such scenarios the cut-off frequency serves as a basis for judgment. For example, 44.1 kHz/16 bit can be defined as the lowest standard for the quality of audio data; by the Nyquist theorem, the highest frequency achievable for 44.1 kHz audio is 22.05 kHz, and most actual audio data rolls off around 20 kHz. This embodiment can use 20 kHz as a reference threshold: the cut-off frequency of lossless audio files in the song library whose nominal parameters are higher than 44.1 kHz/16 bit is detected, and if the cut-off frequency is smaller than the threshold given above, the audio is considered unable to meet the expected sound quality standard. The benchmark score used to determine the evaluation information can then be set accordingly, or the song library can be curated accordingly for the purpose of discarding inferior files. In short, once the cut-off frequency is determined, it can be used for the sound quality evaluation of the audio data.
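As a small illustration of this reference-threshold check (the 20 kHz figure and the 44.1 kHz/16 bit baseline are the examples given above, not fixed parameters of the method):

    REFERENCE_CUTOFF_HZ = 20_000   # empirical threshold for files nominally above 44.1 kHz/16 bit

    def meets_expected_standard(detected_cutoff_hz):
        # 44.1 kHz material can carry content up to 22.05 kHz (Nyquist); real
        # recordings usually roll off near 20 kHz, so a detected cut-off below
        # that threshold suggests the file does not reach the sound quality its
        # nominal parameters promise.
        return detected_cutoff_hz >= REFERENCE_CUTOFF_HZ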
Step S1300, determining a sampling precision level of the audio data according to a data distribution characteristic of low-order data corresponding to the low-order part of the predetermined sampling number of bits in the audio data in a time domain of the audio data:
as mentioned above, the audio data is first sampled by a predetermined number of sampling bits and converted into PCM format, and for example, the audio data is expressed by a 24-bit word length including three bytes. In consideration of the relation of conversion corresponding formats with different sampling precision, the application determines the low-order data corresponding to the last word length of the low-order bits to examine and judge the sampling precision level. For example, the present application exemplarily takes 8 bits of single byte length as a unit, and takes the lower 8 bits of each data in the corresponding data sequence of the audio data as the lower data for examination. Of course, the word length of the lower part is determined by the diversity of sampling precision levels and the investigation precision, and can be flexibly adjusted by a person skilled in the art as required.
Combining this with the principles of digital audio processing, suppose the original audio data is expected to have been sampled with the predetermined sampling bit number, so that the expected sampling precision in this example is 24 bits. If the original audio data was in fact sampled with fewer bits than the predetermined sampling bit number, or lost sampling precision after being processed by audio editing software, the low-order data will inevitably present certain regularities. These regularities appear in the low-order data sequence as a data distribution characteristic, from which it can be determined that the audio data was sampled with a precision lower than expected; the original audio data can therefore be judged to have a sampling precision level lower than the 24-bit level, that is, a 16-bit sampling precision level. Conversely, when no such regularity is found, the sampling precision level of the original audio data can be determined to be the expected sampling precision level.
In an alternative embodiment, applying the same principle with the word length of the lower portion set to 16 bits makes it possible to determine whether the sampling precision level of the original audio data is lower still.
In a further alternative embodiment, the predetermined number of sampling bits at the time of the first determination may be set to the highest sampling precision, for example, a word length of 32 bits, and then the foregoing principle is applied to determine the level of sampling precision of the original audio data from top to bottom in stages.
From the above principle it can be seen that, when determining the sampling precision level of the audio data according to the data distribution characteristics, in the time domain of the audio data, of the low-order data corresponding to the low-order part of the predetermined sampling bit number, the sampling precision level corresponding to the sound quality of the audio data can specifically be determined by analyzing the comparison between the low-order data and the full data of the audio data, examining the low-order data within the full data, and finding the data distribution characteristics.
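A minimal sketch of the simplest such tell-tale pattern, reusing the int32-shifted samples array from the earlier loading sketch: 16-bit material padded into a 24-bit container leaves the low byte of every sample at zero. The fuller distribution analysis (a histogram of the low-byte value labels) is given in the detailed steps below.

    import numpy as np

    low_bytes = np.abs(samples) & 0xFF      # low 8 bits of every nominally 24-bit sample
    if np.all(low_bytes == 0):
        precision_level = 16                # effectively below the nominal 24 bits
    else:
        precision_level = 24                # provisional; refined by the distribution analysis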
Step S1400, determining evaluation information of the audio data according to a preset evaluation rule, where the evaluation rule is configured to determine the evaluation information according to the sampling precision level and the cut-off frequency:
In order to provide evaluation information for the original audio data based on the determined cut-off frequency and sampling precision level, a preset evaluation rule is used for the evaluation. The evaluation rule may determine the evaluation information by judging the cut-off frequency and the sampling precision level jointly, or by fusing them. Those skilled in the art know numerous ways of quantifying two parameters into corresponding evaluation information with a quantitative effect and can implement this accordingly. In addition, the embodiments disclosed later in this application provide a more innovative implementation, which is not expanded upon here.
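One possible reading of such an evaluation rule, following the benchmark-score interpolation outlined in the summary above, is sketched below; the benchmark table and the use of Nyquist limits as interpolation bounds are illustrative assumptions, not values taken from this disclosure.

    import bisect

    # Illustrative benchmark scores per sampling precision level (bits); the real
    # values are a design choice of the evaluation rule and are not specified here.
    BENCHMARKS = {
        16: [(22050, 60.0), (44100, 80.0), (48000, 85.0), (96000, 95.0)],
        24: [(22050, 65.0), (44100, 85.0), (48000, 90.0), (96000, 100.0)],
    }

    def evaluate(cutoff_hz, precision_bits):
        table = BENCHMARKS[precision_bits]
        # Treat "two adjacent sampling frequencies of the cut-off frequency" as the
        # pair whose Nyquist limits (sampling frequency / 2) bracket the cut-off.
        nyquists = [f / 2.0 for f, _ in table]
        i = min(max(bisect.bisect_right(nyquists, cutoff_hz), 1), len(table) - 1)
        (f_lo, s_lo), (f_hi, s_hi) = table[i - 1], table[i]
        # Start from the benchmark score of the lower sampling frequency and add
        # the normalised difference between the two benchmark scores.
        t = (cutoff_hz - f_lo / 2.0) / (f_hi / 2.0 - f_lo / 2.0)
        return s_lo + min(max(t, 0.0), 1.0) * (s_hi - s_lo)

    print(evaluate(20000.0, 24))   # e.g. a 20 kHz cut-off detected in 24-bit material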
In the present application, step S1200 of determining the cut-off frequency of the audio data and step S1300 of determining the sampling precision level of the audio data may be performed in parallel, since neither depends on the other; those skilled in the art should understand this variation.
According to the disclosure of the exemplary embodiment of the audio quality evaluation method of the present application, it can be seen that the following advantages are provided:
Firstly, the present application on the one hand performs frequency-domain analysis of the audio data, using the power spectral density data to determine the cut-off frequency of the audio data, and on the other hand performs time-domain analysis, using the low-order data corresponding to the low-order part of the sampling bit number of the audio data to determine its actual sampling precision level. Evaluation information is finally determined from the cut-off frequency and the sampling precision level according to a preset evaluation rule. Quality factors of the audio data in both the frequency-domain and time-domain dimensions are thus considered together, so that more accurate evaluation information can be provided for the sound quality evaluation of the audio data and the reliability of the evaluation result is improved.
Secondly, the present application applies a preset evaluation rule that converts the sound quality of the audio data into evaluation information according to the correspondence between the cut-off frequency obtained by frequency-domain analysis and the sampling precision level obtained by time-domain analysis. This realizes a normalized representation of sound quality evaluation, makes it easy to unify sound quality evaluation standards, and improves the efficiency of labelling, screening and processing audio data according to the evaluation information.
In addition, the normalized evaluation information is readable and easy to understand, which improves user experience during human-computer interaction.
In a deepened embodiment, the step S1100 of obtaining audio data, the audio data being sampled by a predetermined sampling bit number, includes the following steps:
step S1110, acquiring a local lossless audio file submitted by the user or a lossless audio file in the music library specified by the user:
In this embodiment, the audio file may be specified by the client user and is preferably a lossless audio file in FLAC format. In an alternative embodiment, the corresponding lossless audio file is obtained by providing a guide interface in the client that allows the user to import a local file. In another alternative embodiment, a list interface in the client lists a number of songs stored in the song library in the cloud, and after the user selects one song, the lossless audio file corresponding to that song is acquired.
Step S1120, converting the lossless audio file into audio data in a pulse code modulation format with a predetermined number of sampling bits:
In this embodiment, the predetermined sampling bit number is set to 24 bits, which fixes the expected sampling precision of the foregoing embodiment. The predetermined sampling bit number of 24 bits is an empirical setting based on typical audio sampling precision and can be adjusted flexibly by those skilled in the art.
The lossless audio file is transcoded into pulse code modulation format, i.e. PCM format, with the predetermined sampling bit number, so that the subsequent processing of the present application can be performed.
According to this embodiment, the client guides the user to submit or designate a lossless audio file so that the user can have the designated target audio data evaluated; in substance this opens the audio quality evaluation interface implemented by the present application to the user, allowing the user to invoke it more conveniently to evaluate the sound quality of local or designated audio data. This also facilitates subsequent deeper interaction, such as assisted music creation, and improves user experience.
In a further embodiment, as shown in fig. 2, the step S1200 of determining the cutoff frequency of the audio data according to the power spectral density of the audio data includes the following steps:
step S1210, converting metadata corresponding to the spectrogram according to the audio data:
metadata corresponding to the power spectral density is converted from the audio data, and specifically, the power energy, i.e., the power value, corresponding to each frequency of the audio data may be counted along the time domain according to the audio data. Referring to the spectrogram in fig. 3(a), the metadata is a vector sequence distributed along the time domain, and a vector in the vector sequence includes a frequency corresponding to the vector and a power value corresponding to the frequency at a corresponding time.
In an alternative embodiment, to improve the computational efficiency, the audio data may be pre-processed in advance and converted into a floating point number between 0 and 1, so as to facilitate the conversion of the metadata.
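A minimal sketch of this conversion step, assuming SciPy is available; samples and sr come from the earlier loading sketch, and the STFT window parameters are illustrative.

    import numpy as np
    from scipy import signal

    x = samples[:, 0] / float(2 ** 23)   # one channel, scaled to floats (24-bit full scale)

    # Sxx has shape (n_freqs, n_times): for each frequency bin, the power value in
    # each time frame -- this is the metadata from which the spectrogram is drawn.
    freqs, times, Sxx = signal.spectrogram(x, fs=sr, nperseg=2048, noverlap=1024)
    power_db = 10.0 * np.log10(Sxx + 1e-12)   # the same metadata on a dB scale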
Step S1220, converting the metadata into first power spectral density data, and determining a first candidate cutoff frequency according to the first power spectral density data:
As described above, when the metadata sequence corresponding to the spectrogram is converted into power spectral density data, it represents, for each frequency, the accumulation of the power values distributed in the time domain, i.e. the total power value. The power spectral density data determined in this way is the first power spectral density data; referring to the power spectrum shown in fig. 3(b), it comprises the total power value corresponding to each frequency, the total power value being the sum of the power values of that frequency distributed over the time domain. The frequency at which the total power exhibits its falling characteristic is then determined from the first power spectral density data and taken as the first candidate cut-off frequency.
Step S1230, converting the metadata into second power spectral density data after binarization, and determining a second candidate cut-off frequency according to the second power spectral density data:
In order to ensure the accuracy of the cut-off frequency, the metadata sequence is further binarized in addition to determining the first candidate cut-off frequency. This may be implemented with an empirically determined power threshold, for example 120 dB: when the power value corresponding to a certain frequency is higher than the power threshold, it is denoted by 1, and otherwise by 0. Processing in this manner yields a binarized data sequence.
In the previous step, regarding the manner of constructing the first power spectral density data, the binarized data sequence may be converted into second power spectral density data, which contains the binarized accumulated values corresponding to the respective frequencies, the binarized accumulated values being the sum of the respective binarized data whose corresponding frequencies are distributed along the time domain. Thus, the frequency corresponding to the time when the binarized accumulated value falling characteristic appears can be determined from the second power spectral density data, and the frequency is determined as the first candidate cutoff frequency.
Step S1240, selecting the minimum of the first candidate cutoff frequency and the second candidate cutoff frequency as the cutoff frequency:
As described above, the first candidate cut-off frequency and the second candidate cut-off frequency are obtained from the differently converted forms of the metadata, and a deviation often exists between the two. The two are therefore compared, and the one with the smaller frequency value is selected as the cut-off frequency finally determined for the audio data.
In the embodiment, different forms of conversion are performed on metadata corresponding to the audio data spectrogram by adopting multiple modes to obtain different power spectral density data, multiple candidate cut-off frequencies are calculated on the basis of the different power spectral density data, and then the minimum value is determined to be the finally selected cut-off frequency, so that the condition that evaluation is inaccurate due to the fact that the cut-off frequency calculation is performed by adopting a single mode is avoided, and the accuracy of the cut-off frequency evaluation is improved.
In an embodiment, the step S1220 of converting the metadata into first power spectral density data, and determining a first candidate cutoff frequency according to the first power spectral density data includes the steps of:
step S1221, determining a first frequency sequence corresponding to the first power spectral density data according to the metadata, where the first frequency sequence includes a total power value corresponding to a plurality of frequencies, and the total power value is a sum of a plurality of power values distributed along a time domain of the audio data by the corresponding frequency:
As mentioned above, on the basis of the metadata sequence, the power values distributed in the time domain may be summed for each frequency to determine the total power value of that frequency, thereby establishing the correspondence between the different frequencies and their total power values; this may be represented as a power spectrum and correspondingly yields the first power spectral density data. The first power spectral density data is a first frequency sequence comprising the total power values corresponding to a plurality of frequencies, the total power value being the sum of the power values of the corresponding frequency distributed along the time domain of the audio data. At the program level, the first frequency sequence may be represented by an array.
Step S1222 determines, according to the smooth curve data synthesized by the total power values in the first frequency sequence, that the frequency corresponding to the maximum slope change of the curve is the first candidate cut-off frequency:
It can be understood that when the total power values of the first frequency sequence are mapped onto a two-dimensional coordinate system, a non-smooth curve may be obtained. The curve is therefore further fitted with a fitting technique to form a smooth curve; correspondingly, the total power values of the first frequency sequence are also fitted and fine-tuned so that they in fact become the data corresponding to the smooth curve, that is, the smooth curve data.
On the basis of the smooth curve data, the slope of the curve can be calculated by differentiation (an image constructed from the slope is shown in fig. 3(c)), and the frequency corresponding to the point where the slope of the curve changes the most is then determined as the first candidate cut-off frequency.
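A hedged sketch of steps S1221-S1222, reusing the spectrogram metadata from the previous sketch; the Savitzky-Golay smoothing and the "largest change in slope" criterion are one straightforward reading of the curve-fitting step described above.

    import numpy as np
    from scipy.signal import savgol_filter

    def candidate_cutoff(freqs, curve):
        # Fit a smooth curve to a per-frequency series, differentiate it, and return
        # the frequency where the slope changes the most.
        smooth = savgol_filter(curve, window_length=31, polyorder=3)
        slope = np.gradient(smooth, freqs)
        return freqs[np.argmax(np.abs(np.gradient(slope, freqs)))]

    total_power = Sxx.sum(axis=1)   # total power value per frequency (summed over the time domain)
    first_candidate = candidate_cutoff(freqs, 10.0 * np.log10(total_power + 1e-12))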
The embodiment further determines the corresponding first candidate cut-off frequency according to the first power spectral density data in a data fitting mode, the algorithm is simple, convenient and easy to implement, the calculation efficiency is high, the response is quick, and the improvement of the tone quality evaluation efficiency is facilitated.
In an embodiment, as shown in fig. 4, the step S1230, converting the metadata into the second power spectral density data after binarization, and determining the second candidate cut-off frequency according to the second power spectral density data, includes the following steps:
step S1231, performing binarization conversion on the metadata to obtain a binarization data sequence:
As previously mentioned, this can be achieved with an empirically determined power threshold, for example 120 dB: when the power value corresponding to a certain frequency is higher than the power threshold it is denoted by 1, and otherwise by 0. Processing in this manner yields a binarized data sequence.
Step S1232, determining a second frequency sequence corresponding to the second power spectral density data according to the binarized data sequence, where the second frequency sequence includes binarized accumulated values corresponding to multiple frequencies, and the binarized accumulated value is an accumulated sum of multiple binarized data distributed along the audio data time domain along the corresponding frequency:
On the basis of the binarized data sequence, the binarized data distributed in the time domain are summed for each frequency to determine the binarized accumulated value of that frequency, thereby establishing the correspondence between the different frequencies and their binarized accumulated values and correspondingly obtaining the second power spectral density data. The second power spectral density data is a second frequency sequence comprising the binarized accumulated values corresponding to a plurality of frequencies, the binarized accumulated value being the sum of the binarized data of the corresponding frequency distributed along the time domain of the audio data. At the program level, the second frequency sequence may be represented by an array.
Step S1233, determining, according to the smooth curve data synthesized by fitting the respective binarized accumulated values in the second frequency sequence, that the frequency corresponding to the maximum slope change of the curve is the second candidate cut-off frequency:
it can be understood that mapping the binarized accumulated values of the second frequency sequence onto a two-dimensional coordinate system yields a non-smooth curve. A fitting technique is therefore applied to form a smooth curve; in the process, the binarized accumulated values of the second frequency sequence are fine-tuned into the data corresponding to that smooth curve, that is, the smooth curve data.
On the basis of the smooth curve data, the slope of the curve can be calculated by differentiation, and the frequency at which the slope of the curve changes most is then determined as the second candidate cut-off frequency.
In this embodiment, the second candidate cut-off frequency is determined from the second power spectral density data by data fitting. The algorithm is simple and easy to implement, computationally efficient and fast to respond, which helps improve the efficiency of sound quality evaluation.
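Putting the two branches together, a sketch of the overall frequency-domain analysis could look like the following; it reuses the illustrative helpers above, and the frequency preference step keeps the smaller of the two candidates, as described for the apparatus below. The function name and parameters are assumptions.

```python
from scipy import signal

def estimate_cutoff(samples, sample_rate, power_threshold, n_fft=2048):
    """Compute both candidate cut-off frequencies and keep the smaller one."""
    freqs, _, sxx = signal.spectrogram(samples, fs=sample_rate, nperseg=n_fft)
    first = candidate_cutoff(freqs, sxx.sum(axis=1))
    second = candidate_cutoff(
        freqs, second_frequency_sequence(binarize_metadata(sxx, power_threshold)))
    return min(first, second)
```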
In a further embodiment, referring to fig. 5, step S1300 — determining the sampling precision level of the audio data from the data distribution characteristics, in the time domain, of the low-order data corresponding to the low-order part of the predetermined sampling bit number — includes the following steps:
step S1310, converting the audio data into an absolute value form:
for ease of calculation, the absolute value of the audio data is taken first, so that the waveforms of all audio data are normalized to the same reference baseline on one reference axis, as shown in fig. 6(b).
Step S1320, obtaining low-order data corresponding to a low-order portion in the audio data, clustering all the low-order data into a plurality of numerical labels corresponding to categories, where the bit length of the low-order portion is a preset value:
in this embodiment, taking a lossless audio file sampled with 24-bit precision as an example, the low-order data to be extracted is set to the lower 8 bits; accordingly, the lower 8 bits of all samples of one channel of the absolute-valued audio data are extracted. An exemplary image obtained from the lower eight bits of a single channel is shown in fig. 6(c). As mentioned above, the bit length of the low-order part is usually a preset value and can be flexibly adjusted by those skilled in the art. Since a byte is 8 bits long, taking the lower 8 bits is more efficient and therefore preferred.
For ease of calculation, all the extracted low-order data are converted into decimal positive integers. Since the bit length is 8, there are 2^8 = 256 possible values, so every low-order datum is converted to an integer N in [0, 255]. Each integer can be treated as a numerical label, whereby up to M numerical labels are obtained, with both M and N taking values in [0, 255]. On this basis, all low-order data converted into positive integers can be clustered to determine the corresponding numerical labels, which are then used for the subsequent statistical classification.
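As an illustrative sketch (assuming the samples of one channel have already been converted to absolute values and loaded as integers; the function name is an assumption), the lower-8-bit labels could be obtained as follows:

```python
import numpy as np

def lower_byte_labels(abs_samples):
    """Keep only the lowest 8 bits of every sample and treat the resulting
    decimal values 0..255 as numerical labels."""
    return np.asarray(abs_samples, dtype=np.int64) & 0xFF
```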
Step S1330, counting the frequency with which each numerical label appears in the audio data to form a frequency data sequence:
after the M numerical labels occupied by the low-order data are determined, the frequency with which each label value N appears among all samples of the audio data (counted in decimal) can be counted, giving M statistical values. The frequency data sequence formed by ordering these M statistical values can be rendered as a histogram, as shown in fig. 6(d), to aid human inspection and deepen understanding of this embodiment.
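A sketch of the counting step, again only illustrative (the function name is an assumption); the result can be plotted directly as the histogram of fig. 6(d).

```python
import numpy as np

def frequency_data_sequence(labels):
    """Count how many times each numerical label 0..255 occurs; the result is
    the frequency data sequence that can be drawn as a histogram."""
    return np.bincount(labels, minlength=256)
```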
Step S1340, identifying the data distribution characteristics of the frequency data sequence, and judging the corresponding sampling precision level according to the data distribution characteristics:
furthermore, the data distribution characteristics exhibited by the frequency data sequence can be analyzed. The analysis looks for regular patterns hidden in the frequency data sequence; the possible patterns are hard to enumerate exhaustively, so a person skilled in the art can summarize them from prior knowledge, express the patterns that suffice to indicate abnormal quality of the audio data as data distribution characteristics, and implement them as program code. When the program runs, these patterns are searched for in the frequency data sequence, and the sampling precision level is judged accordingly.
For example, in terms of the histogram constructed from the frequency data sequence, if the histogram shows a very obvious pattern — such as a range of frequencies that are all 0, both ends high with a hollow middle, frequencies that monotonically decrease or increase along the ordering direction of the sequence, or frequencies at odd and even labels alternating high and low — then, combined with prior knowledge, the audio data can be considered defective in sampling precision: the file may have been converted from audio with 16-bit sampling precision, or precision may have been lost after processing by editing software. The sampling precision level of the audio data can then be judged to be 16 bits or lower. In the example of fig. 6(d), the histogram is high at both ends and low in the middle, and the counts at numerical labels from about 30 to 220 are all 0, indicating a defect in the sampling precision.
Corresponding to the above examples, at the level of data identification over the frequency data sequence, the following features may be considered:
when the data distribution characteristics show all-zero data, a defect is identified, so the sampling precision level of the audio data can be downgraded;
when the data distribution characteristics show values that monotonically decrease or increase along the ordered frequency data sequence, a defect is identified, so the sampling precision level of the audio data can be downgraded;
when the data distribution characteristics show the frequencies at odd and even labels alternating high and low along the ordering direction of the frequency data sequence, a defect is identified, so the sampling precision level of the audio data can be downgraded.
In any of the above cases, the sampling precision level of the audio data can be determined to be lower than the predetermined sampling bit number. For example, the present application exemplarily samples a lossless audio file with 24 bits; if any of the above cases occurs, the actual sampling precision level of the lossless audio file can be judged to be 16 bits or lower.
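A sketch of such checks is given below. The concrete criteria — which label range counts as the "hollow middle", how strictly monotone the counts must be, and how the odd/even alternation is tested — are illustrative interpretations of the rules above, not the exact criteria of the application, and a person skilled in the art would tune them from prior knowledge.

```python
import numpy as np

def sampling_precision_defective(freq_seq):
    """Illustrative checks for the patterns listed above; any hit suggests the
    real sampling precision is below the nominal bit depth."""
    freq_seq = np.asarray(freq_seq, dtype=np.int64)
    # A wide interior range of labels that never occurs (e.g. labels ~30..220).
    hollow_middle = np.all(freq_seq[32:224] == 0)
    # Counts that only ever decrease, or only ever increase, along the sequence.
    diffs = np.diff(freq_seq)
    monotone = np.all(diffs <= 0) or np.all(diffs >= 0)
    # Counts at odd labels consistently above (or below) those at even labels.
    n = len(freq_seq) // 2
    odd, even = freq_seq[1:2 * n:2], freq_seq[0:2 * n:2]
    alternating = np.all(odd > even) or np.all(odd < even)
    return bool(hollow_middle or monotone or alternating)
```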
In this embodiment, the values of the low-order data in the audio data are converted into numerical labels, and the frequency of each numerical label in the audio data is then counted. The distribution characteristics of the audio data in the time domain are thus fully exploited, and defects related to sampling precision are easily exposed, which makes it convenient to re-determine the actual sampling precision level of the original audio data, to label its sound quality, and to carry out the corresponding subsequent processing. The method is easy to understand, the algorithm is simple to implement, the computation is light and efficient, and the sampling precision analysis can be completed quickly.
In a further embodiment, as shown in fig. 7, the step S1400 of determining the evaluation information of the audio data according to a preset evaluation rule, where the evaluation rule is configured to determine the evaluation information according to the sampling precision level and the cut-off frequency, includes the following steps:
step S1410, determining reference scores corresponding to two adjacent sampling frequencies of the cutoff frequency under the sampling precision level:
in this embodiment, in order to quantify the evaluation information, the preset evaluation rule is configured to comprehensively determine the corresponding evaluation information according to the sampling precision level and the cut-off frequency corresponding to the audio data determined in the present application.
Therefore, a two-dimensional mapping table can be constructed with the sampling precision level and the sampling frequency as its two dimensions, so that every combination of a sampling precision level and a sampling frequency has a corresponding reference score. With the actual sampling precision level of the audio data determined as described in this application, the reference scores for the various sampling frequencies can be looked up, and with the determined cut-off frequency, the reference scores of the two sampling frequencies adjacent to the cut-off frequency can be found. In particular, if the cut-off frequency is exactly equal to one of the sampling frequencies, its reference score may be used directly as the evaluation information. It can be understood that, in general, for the same sampling precision level, a higher sampling frequency corresponds to a higher reference score, characterizing better sound quality; similarly, for the same sampling frequency, a higher sampling precision corresponds to a higher reference score. The evaluation information is typically expressed against an evaluation standard, for example a score between 0 and 100.
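A toy sketch of such a two-dimensional mapping table follows; every number in it is invented purely for illustration and is not taken from the application, and the function name is an assumption.

```python
# Rows: sampling precision level (bits); columns: sampling frequency (Hz).
# All scores are invented placeholders on a 0-100 evaluation scale.
BENCHMARK_SCORES = {
    16: {22050: 40, 44100: 60, 48000: 65, 96000: 75},
    24: {22050: 50, 44100: 80, 48000: 85, 96000: 95},
}

def adjacent_reference_scores(precision_bits, cutoff_hz):
    """Return the two table sampling frequencies bracketing the cut-off
    frequency, together with their reference scores."""
    row = BENCHMARK_SCORES[precision_bits]
    freqs = sorted(row)
    lower = max((f for f in freqs if f <= cutoff_hz), default=freqs[0])
    upper = min((f for f in freqs if f >= cutoff_hz), default=freqs[-1])
    return (lower, row[lower]), (upper, row[upper])
```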
Step S1420, applying the evaluation rule: taking the reference score corresponding to the lower sampling frequency as the initial score and adding a normalized difference between the two reference scores to determine a quantization score, which serves as the evaluation information:
according to the preset evaluation rule, the lower of the two retrieved reference scores is used as the initial score. The relative difference of the cut-off frequency with respect to its two adjacent sampling frequencies is then normalized to a value within the score range specified by the evaluation standard, and this normalized difference is added to the initial score, producing a valid quantization score that complies with the evaluation standard. The quantization score can be used as the evaluation information.
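One possible reading of this rule, sketched below, is a linear interpolation between the two reference scores; the application does not spell out the exact normalization, so this is an assumption, and it reuses the illustrative table above.

```python
def quantization_score(precision_bits, cutoff_hz):
    """Initial score = reference score of the lower sampling frequency; add the
    normalized position of the cut-off frequency between its two neighbours,
    scaled to the gap between the two reference scores."""
    (f_lo, s_lo), (f_hi, s_hi) = adjacent_reference_scores(precision_bits, cutoff_hz)
    if f_hi == f_lo:                        # cut-off coincides with a sampling frequency
        return float(s_lo)
    ratio = (cutoff_hz - f_lo) / (f_hi - f_lo)   # normalized relative difference
    return s_lo + ratio * (s_hi - s_lo)          # stays within the 0-100 scale
```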
Step S1430, outputting the evaluation information:
after the quantization score is determined as the evaluation information, it may be output to a graphical user interface or stored in a relational database. In an alternative embodiment, the evaluation information may be converted, according to the quantization score, into information characterizing a sound quality grade; a person skilled in the art can handle this flexibly.
In a flexible embodiment, a computer program product implemented according to the present application runs on a terminal device, in an application scenario invoked in a client according to the various embodiments of the present application. It analyzes audio data submitted or designated by the user in both the frequency domain and the time domain, integrates the analysis results into the evaluation information, and outputs the evaluation information to the user's graphical user interface for display.
As shown in fig. 8(a), after the user submits or designates audio data, the graphical user interface displays first notification information indicating that sound quality evaluation of the audio data is in progress, during which the user may be allowed to cancel the evaluation. As shown in fig. 8(b) and 8(c), an interface notification message corresponding to the evaluation information obtained for the audio data may then be output according to the quantization score or the grade indicated by the evaluation information. In this way, once the user's audio data is determined, the sound quality evaluation result for the designated or submitted song can be obtained quickly.
In another alternative embodiment, the application scenario invoked at the client is implemented at the server according to the various embodiments of the present application. The user enters a keyword in the client's graphical user interface to search for related songs; the server searches the song library for lossless audio files of matching songs according to the keyword and then, for the audio data of each matching lossless audio file, determines the corresponding evaluation information according to the technical solution of the present application. The server then constructs, for each song, mapping relationship data consisting of the song's name identifier, the access address of its lossless audio file, and the evaluation information of that file, assembles the mapping relationship data of all matched songs into a search result list, and pushes the list to the client. After parsing the list, the client displays each song's name identifier and evaluation information on its graphical user interface, so the user can see the results at a glance, select the target song based on the evaluation information given for it, and play the music by retrieving the lossless audio file through the song's access address.
In another alternative embodiment, the application program of the present application runs in an application scenario implemented and invoked at the server: sound quality evaluation is performed on the lossless audio files in the song library maintained by the server, the quality evaluation information of each lossless audio file is determined, and a labeling information list is constructed for the lossless audio files and stored in the database. The labeling information may include the ID of the lossless audio file together with the cut-off frequency, sampling precision level, quantization score, and so on obtained by applying the evaluation scheme of the present application. An administrator can then filter the labeling information list to batch-process lossless audio files by sound quality, improving the maintenance efficiency of the music library.
This embodiment provides an exemplary concrete evaluation scheme: evaluation information on the sound quality of the audio data is produced from the cut-off frequency and sampling precision level determined by the present application, a quantization standard for sound quality evaluation is provided, and the evaluation information is made visible to the user, improving the human-computer interaction experience. Because the evaluation information combines the cut-off frequency and the sampling precision level, the sound quality of the audio data is evaluated more scientifically, accurately and effectively.
The embodiments above also disclose various application scenarios of the technical solution of the present application; it can be seen that, as a basic technology, the technical solutions of the present application have very wide application scenarios.
Referring to fig. 9, an audio quality assessment apparatus adapted to the audio quality assessment method of the present application for functional deployment includes: the device comprises a data acquisition module 1100, a frequency domain analysis module 1200, a time domain analysis module 1300 and a comprehensive evaluation module 1400, wherein the data acquisition module 1100 is used for acquiring audio data, and the audio data is sampled by preset sampling bits; the frequency domain analysis module 1200 is configured to determine a cut-off frequency of the audio data according to a power spectral density of the audio data; the time domain analysis module 1300 is configured to determine a sampling precision level of the audio data according to a data distribution characteristic of low-order data corresponding to a low-order part of the predetermined sampling number of bits in the audio data in a time domain of the audio data; the comprehensive evaluation module 1400 is configured to determine evaluation information of the audio data according to a preset evaluation rule, where the evaluation rule is configured to determine the evaluation information according to the sampling precision level and the cut-off frequency.
In a further embodiment, the data acquisition module 1100 includes: the file acquisition submodule is used for acquiring a local lossless audio file submitted by a user or a lossless audio file in a music library specified by the user; and the sampling conversion sub-module is used for converting the lossless audio file into audio data in a pulse code modulation format by using a preset sampling bit number.
In a further embodiment, the frequency domain analysis module 1200 comprises: the frequency domain conversion sub-module is used for converting metadata corresponding to the spectrogram according to the audio data; a first frequency sub-module, configured to convert the metadata into first power spectral density data, and determine a first candidate cutoff frequency according to the first power spectral density data; the second frequency sub-module is used for converting the metadata into second power spectral density data after binarization, and determining a second candidate cut-off frequency according to the second power spectral density data; and the frequency preference submodule is used for selecting the minimum one of the first candidate cut-off frequency and the second candidate cut-off frequency as the cut-off frequency.
In a specific embodiment, the first frequency sub-module includes: a power vector unit, configured to determine, according to the metadata, a first frequency sequence corresponding to the first power spectral density data, where the first frequency sequence includes a total power value corresponding to a plurality of frequencies, and the total power value is a sum of a plurality of power values distributed along a time domain of the audio data along the corresponding frequency; and the power calculation unit is used for determining the frequency corresponding to the maximum change of the slope of the curve as the first candidate cut-off frequency according to the smooth curve data which is synthesized by the total power values in the first frequency sequence.
In a specific embodiment, the second frequency sub-module comprises: a binary conversion unit, configured to perform binary conversion on the metadata to obtain a binary data sequence; a binary vector unit, configured to determine, according to the binary data sequence, a second frequency sequence corresponding to second power spectral density data, where the second frequency sequence includes a binary accumulated value corresponding to multiple frequencies, and the binary accumulated value is an accumulated sum of multiple binary data distributed along a time domain of audio data along a frequency corresponding to the binary accumulated value; and the binary calculation unit is used for determining the frequency corresponding to the maximum change of the slope of the curve as the second candidate cut-off frequency according to the smooth curve data synthesized by the binary accumulated values in the second frequency sequence.
In a further embodiment, the time domain analysis module 1300 comprises: the coordinate conversion submodule is used for converting the audio data into an absolute value form; the low-order extraction submodule is used for acquiring low-order data corresponding to a low-order part in the audio data, clustering all the low-order data into a plurality of numerical value labels corresponding to the classification, and the bit length of the low-order part is a preset value; the frequency counting submodule is used for counting the frequency of the numerical value label in the audio data to form a frequency data sequence; and the precision judging submodule is used for identifying the data distribution characteristics of the frequency data sequence and judging the corresponding sampling precision grade according to the data distribution characteristics.
In a specific embodiment, the precision determination sub-module is configured to determine the sampling precision level in any one of the following cases: when the data distribution characteristics show all-zero data, values decreasing along the sequence, values increasing along the sequence, or frequencies at odd and even labels alternating high and low along the sequence, the sampling precision level of the audio data is judged to be lower than the predetermined sampling bit number.
In a preferred embodiment, the predetermined value of the lower portion is lower 8 bits.
In a further embodiment, the comprehensive evaluation module 1400 comprises: the reference retrieval submodule is used for determining reference scores corresponding to two adjacent sampling frequencies of the cut-off frequency under the sampling precision level; the evaluation quantization submodule is used for applying an evaluation rule, and determining a quantization score by accumulating a normalized difference value between two reference scores by taking the reference score corresponding to the lower sampling frequency as an initial score to serve as the evaluation information; and the information output submodule is used for outputting the evaluation information.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device. Fig. 10 schematically shows the internal structure of the computer device. The computer device includes a processor, a computer-readable storage medium, a memory, and a network interface connected by a system bus. The computer-readable storage medium stores an operating system, a database and computer-readable instructions; the database may store control information sequences, and the computer-readable instructions, when executed by the processor, cause the processor to implement an audio quality assessment method. The processor provides computation and control capability and supports the operation of the whole computer device. The memory may store computer-readable instructions that, when executed by the processor, cause the processor to perform the audio quality assessment method of the present application. The network interface is used to connect and communicate with terminals. Those skilled in the art will appreciate that the structure shown in fig. 10 is merely a block diagram of the parts of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
In this embodiment, the processor is configured to execute specific functions of each module and its sub-module in fig. 9, and the memory stores program codes and various data required for executing the modules or the sub-modules. The network interface is used for data transmission to and from a user terminal or a server. The memory in this embodiment stores program codes and data required for executing all modules/sub-modules in the audio quality evaluation apparatus of the present application, and the server can call the program codes and data of the server to execute the functions of all sub-modules.
The present application also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the audio quality assessment method of any of the embodiments of the present application.
The present application also provides a computer program product comprising computer programs/instructions which, when executed by one or more processors, implement the steps of the audio quality assessment method described in any of the embodiments of the present application.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments of the present application can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when the computer program is executed, the processes of the embodiments of the methods can be included. The storage medium may be a computer-readable storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
To sum up, the present application analyzes and evaluates the sound quality of audio data in both the time domain and the frequency domain, and combines the analysis results of the two domains to determine the evaluation information corresponding to the sound quality, making sound quality evaluation more accurate and efficient.
Those of skill in the art will appreciate that the various operations, methods, steps in the processes, acts, or solutions discussed in this application can be interchanged, modified, combined, or eliminated. Further, other steps, measures, or schemes in various operations, methods, or flows that have been discussed in this application can be alternated, altered, rearranged, broken down, combined, or deleted. Further, steps, measures, schemes in the prior art having various operations, methods, procedures disclosed in the present application may also be alternated, modified, rearranged, decomposed, combined, or deleted.
The foregoing describes only some embodiments of the present application. It should be noted that, for those skilled in the art, several improvements and refinements can be made without departing from the principle of the present application, and these improvements and refinements should also be regarded as falling within the protection scope of the present application.

Claims (12)

1. An audio quality assessment method, comprising the steps of:
acquiring audio data, wherein the audio data is sampled by a preset sampling bit number;
determining a cut-off frequency of the audio data according to the power spectral density of the audio data;
determining the sampling precision grade of the audio data according to the data distribution characteristics of the low-order data corresponding to the low-order part of the preset sampling digit in the audio data in the time domain of the audio data;
determining evaluation information of the audio data according to a preset evaluation rule, wherein the evaluation rule is configured to determine the evaluation information according to the sampling precision level and the cut-off frequency.
2. The audio quality evaluation method according to claim 1, wherein audio data sampled with a predetermined number of sampling bits is acquired, comprising the steps of:
acquiring a local lossless audio file submitted by a user or a lossless audio file in a music library specified by the user;
and converting the lossless audio file into audio data in a pulse code modulation format by using a preset sampling bit number.
3. The audio quality assessment method according to claim 1, wherein determining a cut-off frequency of the audio data based on the power spectral density of the audio data comprises the steps of:
converting metadata corresponding to the spectrogram according to the audio data;
converting the metadata into first power spectral density data, determining a first candidate cutoff frequency from the first power spectral density data;
converting the metadata into second power spectral density data after binarization, and determining a second candidate cut-off frequency according to the second power spectral density data;
and selecting the minimum one of the first candidate cutoff frequency and the second candidate cutoff frequency as the cutoff frequency.
4. The audio quality assessment method according to claim 3, wherein converting said metadata into first power spectral density data, determining a first candidate cut-off frequency from the first power spectral density data, comprises the steps of:
determining a first frequency sequence corresponding to the first power spectral density data according to the metadata, wherein the first frequency sequence comprises a total power value corresponding to a plurality of frequencies, and the total power value is the sum of a plurality of power values distributed along the audio data time domain along the corresponding frequency;
and determining the frequency corresponding to the maximum change of the slope of the curve as a first candidate cut-off frequency according to the smooth curve data synthesized by the total power values in the first frequency sequence.
5. The audio quality assessment method according to claim 3, wherein converting the metadata into second power spectral density data after binarization, and determining a second candidate cut-off frequency according to the second power spectral density data comprises the following steps:
carrying out binarization conversion on the metadata to obtain a binarization data sequence;
determining a second frequency sequence corresponding to second power spectrum density data according to the binarization data sequence, wherein the second frequency sequence comprises binarization accumulated values corresponding to a plurality of frequencies, and the binarization accumulated values are the accumulated sum of a plurality of binarization data distributed along the audio data time domain along the corresponding frequencies;
and determining the frequency corresponding to the maximum change of the slope of the curve as a second candidate cut-off frequency according to the smooth curve data synthesized by each binary accumulated value in the second frequency sequence.
6. The audio quality assessment method according to claim 1, wherein the step of determining the sampling precision level of the audio data according to the data distribution characteristics of the lower data corresponding to the lower part of the predetermined sampling number of bits in the audio data in the time domain of the audio data comprises the steps of:
converting the audio data into absolute value form;
acquiring low-order data corresponding to a low-order part in the audio data, clustering all the low-order data into a plurality of numerical tags corresponding to classifications, wherein the bit length of the low-order part is a preset value;
counting the frequency of the numerical value label in the audio data to form a frequency data sequence;
and identifying the data distribution characteristics of the frequency data sequence, and judging the corresponding sampling precision level according to the data distribution characteristics.
7. The audio quality assessment method according to claim 6, wherein the step of identifying the data distribution characteristics of the frequency data sequence and determining the corresponding sampling accuracy level based on the data distribution characteristics comprises:
and when the data distribution characteristics show the characteristics of all-zero data, the characteristics decrease along the sequence, the characteristics increase along the sequence or the characteristics with the frequency of odd and even numbers alternating up and down in the sequence, judging that the sampling precision level of the audio data is a level lower than the preset sampling digit number.
8. The audio quality estimation method according to claim 6, wherein the predetermined value of the lower portion is lower 8 bits.
9. The audio quality assessment method according to claim 1, wherein the evaluation information of the audio data is determined according to a preset evaluation rule configured to determine the evaluation information according to the sampling precision level and the cut-off frequency, comprising the steps of:
determining reference scores corresponding to two adjacent sampling frequencies of the cut-off frequency under the sampling precision level;
applying an evaluation rule, and accumulating a normalized difference value between two benchmark scores by taking the benchmark score corresponding to the lower sampling frequency as an initial score to determine a quantitative score as the evaluation information;
and outputting the evaluation information.
10. A computer device comprising a central processor and a memory, characterized in that the central processor is adapted to invoke execution of a computer program stored in the memory to perform the steps of the method according to any one of claims 1 to 9.
11. A computer-readable storage medium, characterized in that it stores, in the form of computer-readable instructions, a computer program implemented according to the method of any one of claims 1 to 9, which, when invoked by a computer, performs the steps comprised by the corresponding method.
12. A computer program product comprising computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method as claimed in any one of claims 1 to 9.
CN202111040485.3A 2021-09-06 2021-09-06 Audio quality assessment method and device, equipment, medium and product thereof Active CN113808618B (en)




