[ summary of the invention ]
Aspects of the present invention provide a method and an apparatus for recognizing sound quality of an audio file, so as to realize sound quality recognition of the audio file.
In one aspect of the present invention, a method for identifying a sound quality of an audio file is provided, including:
acquiring a target audio file to be identified;
obtaining, according to the target audio file, at least one of a time-domain waveform characteristic of the target audio file and a frequency-domain spectral line characteristic of the target audio file;
identifying, according to at least one of the time-domain waveform characteristic and the frequency-domain spectral line characteristic, the sound quality of the target audio file as a first sound quality or a second sound quality, wherein the first sound quality is higher than the second sound quality.
The above aspect and any possible implementation manner further provide an implementation manner, where the obtaining, according to the target audio file, at least one of a time-domain waveform characteristic of the target audio file and a frequency-domain spectral line characteristic of the target audio file includes:
determining the number of channels of the target audio file;
decoding the data blocks of the target audio file to obtain original audio data;
and obtaining channel audio data corresponding to each channel according to the number of channels and the original audio data.
The above-mentioned aspect and any possible implementation manner further provide an implementation manner, where the identifying, according to at least one of the time-domain waveform feature and the frequency-domain spectral line feature, the sound quality of the target audio file as a first sound quality or a second sound quality includes:
if the number of channels is greater than or equal to 2, obtaining first channel audio data and second channel audio data corresponding to at least two channels according to the channel audio data corresponding to each channel;
adding the first channel audio data and the second channel audio data to obtain mixed channel audio data;
if the mixed channel audio data is greater than or equal to the first channel audio data/N or the second channel audio data/M, identifying the sound quality of the target audio file as the first sound quality;
if the mixed channel audio data is smaller than the first channel audio data/N or the second channel audio data/M, identifying the sound quality of the target audio file as the second sound quality; wherein
N is a number greater than 1, and M is a number greater than 1.
The above-mentioned aspect and any possible implementation manner further provide an implementation manner, where the identifying, according to at least one of the time-domain waveform feature and the frequency-domain spectral line feature, the sound quality of the target audio file as a first sound quality or a second sound quality includes:
if, among a specified number of consecutive values of target channel audio data, the difference between every two values is smaller than or equal to a first amplitude threshold, identifying the sound quality of the target audio file as the second sound quality, wherein the target channel audio data comprises the channel audio data corresponding to any one of the channels; or
if the difference between two consecutive values of the target channel audio data is greater than or equal to a second amplitude threshold and the two values have opposite signs, identifying the sound quality of the target audio file as the second sound quality, wherein the target channel audio data comprises the channel audio data corresponding to any one of the channels.
The above-mentioned aspect and any possible implementation manner further provide an implementation manner, where after obtaining the channel audio data corresponding to each channel according to the number of channels and the original audio data, the method further includes:
performing frame processing on target channel audio data to obtain at least one frame of audio data, wherein the target channel audio data comprises channel audio data corresponding to any channel in the channel audio data corresponding to each channel;
and performing frequency domain transformation processing on the at least one frame of audio data to obtain frequency domain data corresponding to each frame of audio data.
The above-mentioned aspect and any possible implementation manner further provide an implementation manner, where the identifying, according to at least one of the time-domain waveform feature and the frequency-domain spectral line feature, the sound quality of the target audio file as a first sound quality or a second sound quality includes:
according to the frequency domain data corresponding to each frame of audio data, obtaining the energy component of the frequency domain data corresponding to each frame of audio data at each frequency point;
and if, at at least one same frequency point, the difference between every two of the energy components of the frequency domain data corresponding to the frames of audio data is smaller than or equal to an energy threshold, identifying the sound quality of the target audio file as the second sound quality.
The above-described aspect and any possible implementation manner further provide an implementation manner, before the obtaining of the target audio file to be identified, further including:
acquiring format parameters of the candidate audio files;
determining the candidate audio file as the target audio file according to the format parameter; or identifying the tone quality of the candidate audio file as the second tone quality.
The above aspect and any possible implementation further provide an implementation, wherein the format parameter includes at least one of a compression format, a sampling rate, a sampling depth, and a code rate.
In another aspect of the present invention, there is provided an apparatus for recognizing a sound quality of an audio file, including:
the acquisition unit is used for acquiring a target audio file to be identified;
the characteristic unit is used for acquiring at least one of the time domain waveform characteristic of the target audio file and the frequency domain spectral line characteristic of the target audio file according to the target audio file;
the identification unit is used for identifying the tone quality of the target audio file as first tone quality or second tone quality according to at least one of the time domain waveform characteristics and the frequency domain spectral line characteristics, and the first tone quality is higher than the second tone quality.
The above aspect and any possible implementation manner further provide an implementation manner, where the feature unit is specifically configured to:
determine the number of channels of the target audio file;
decode the data blocks of the target audio file to obtain original audio data; and
obtain channel audio data corresponding to each channel according to the number of channels and the original audio data.
The above aspect and any possible implementation manner further provide an implementation manner, where the identification unit is specifically configured to:
if the number of channels is greater than or equal to 2, obtain first channel audio data and second channel audio data corresponding to at least two channels according to the channel audio data corresponding to each channel;
add the first channel audio data and the second channel audio data to obtain mixed channel audio data;
if the mixed channel audio data is greater than or equal to the first channel audio data/N or the second channel audio data/M, identify the sound quality of the target audio file as the first sound quality; and
if the mixed channel audio data is smaller than the first channel audio data/N or the second channel audio data/M, identify the sound quality of the target audio file as the second sound quality; wherein
N is a number greater than 1, and M is a number greater than 1.
The above aspect and any possible implementation manner further provide an implementation manner, where the identification unit is specifically configured to:
if, among a specified number of consecutive values of target channel audio data, the difference between every two values is smaller than or equal to a first amplitude threshold, identify the sound quality of the target audio file as the second sound quality, wherein the target channel audio data comprises the channel audio data corresponding to any one of the channels; or
if the difference between two consecutive values of the target channel audio data is greater than or equal to a second amplitude threshold and the two values have opposite signs, identify the sound quality of the target audio file as the second sound quality, wherein the target channel audio data comprises the channel audio data corresponding to any one of the channels.
The above aspect and any possible implementation manner further provide an implementation manner, where the feature unit is further configured to:
perform framing processing on target channel audio data to obtain at least one frame of audio data, wherein the target channel audio data comprises the channel audio data corresponding to any one of the channels; and
perform frequency domain transformation processing on the at least one frame of audio data to obtain frequency domain data corresponding to each frame of audio data.
The above aspect and any possible implementation manner further provide an implementation manner, where the identification unit is specifically configured to:
obtain, according to the frequency domain data corresponding to each frame of audio data, the energy component of that frequency domain data at each frequency point; and
if, at at least one same frequency point, the difference between every two of the energy components of the frequency domain data corresponding to the frames of audio data is smaller than or equal to an energy threshold, identify the sound quality of the target audio file as the second sound quality.
The above aspect and any possible implementation manner further provide an implementation manner, where the identification unit is further configured to:
acquire format parameters of a candidate audio file; and
determine the candidate audio file as the target audio file according to the format parameters, or identify the sound quality of the candidate audio file as the second sound quality.
The above aspect and any possible implementation further provide an implementation, wherein the format parameter includes at least one of a compression format, a sampling rate, a sampling depth, and a code rate.
According to the foregoing technical solution, a target audio file to be identified is acquired, and at least one of the time-domain waveform characteristic and the frequency-domain spectral line characteristic of the target audio file is obtained from it, so that the sound quality of the target audio file can be identified as the first sound quality or the second sound quality according to the obtained characteristic, the first sound quality being higher than the second sound quality.
In addition, the technical scheme provided by the invention is simple to operate, and can effectively improve the efficiency of tone quality identification of the audio file.
[ detailed description of embodiments ]
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
It should be noted that the terminal according to the embodiment of the present invention may include, but is not limited to, a mobile phone, a Personal Digital Assistant (PDA), a wireless handheld device, a wireless netbook, a portable Computer, a Personal Computer (PC), an MP3 player, an MP4 player, and the like.
In addition, the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Fig. 1 is a flowchart illustrating a method for recognizing a sound quality of an audio file according to an embodiment of the present invention, as shown in fig. 1.
101. Acquire a target audio file to be identified.
The target audio file may include audio files in various encoding formats in the prior art, such as a Moving Picture Experts Group (MPEG) Audio Layer-3 (MP3) format audio file, a Windows Media Audio (WMA) format audio file, an Advanced Audio Coding (AAC) format audio file, or an audio file in a lossless format such as Free Lossless Audio Codec (FLAC) or APE, which is not particularly limited in this embodiment.
102. Obtain, according to the target audio file, at least one of the time-domain waveform characteristic of the target audio file and the frequency-domain spectral line characteristic of the target audio file.
The time-domain waveform characteristics of the target audio file may include, but are not limited to, amplitude information of the original audio data.
The original audio data is a digital signal converted from an audio signal; for example, the audio signal is sampled, quantized, and encoded to obtain Pulse Code Modulation (PCM) data. The original audio data can be obtained by parsing the data blocks of the target audio file.
The frequency-domain spectral line characteristics of the target audio file may include, but are not limited to, spectral information of the original audio data.
103. Identify, according to at least one of the time-domain waveform characteristic and the frequency-domain spectral line characteristic, the sound quality of the target audio file as a first sound quality or a second sound quality, wherein the first sound quality is higher than the second sound quality.
The executing entity of 101 to 103 may be a processing device, which may be located in a local application (App), for example, Baidu Music, or in a server on the network side, or partially in the local application and partially in the server on the network side.
It should be understood that the application may be a native application (native app) installed on the terminal, or a web application (web app) running in a browser on the terminal; any objectively existing form capable of processing the audio data is acceptable, which is not limited in this embodiment.
In this way, a target audio file to be identified is acquired, and at least one of the time-domain waveform characteristic and the frequency-domain spectral line characteristic of the target audio file is obtained from it, so that the sound quality of the target audio file can be identified as the first sound quality or the second sound quality, the first sound quality being higher than the second sound quality. Audio files of genuinely high sound quality can therefore be provided to the user.
Optionally, in a possible implementation manner of this embodiment, before 101, the processing device may further obtain format parameters of a candidate audio file. Then, according to the format parameters, the processing device may either determine the candidate audio file as the target audio file or identify the sound quality of the candidate audio file as the second sound quality.
Wherein the format parameter may include, but is not limited to, at least one of a compression format, a sampling rate, a sampling depth, and a code rate.
The compression format is a compression method in which original audio data is compressed by a program, such as MP3 format, WMA format, AAC format, FLAC format, APE format, or the like.
The sampling rate, also referred to as sampling speed or sampling frequency, defines the number of samples per second that are extracted from a continuous signal and constitute a discrete signal, which is expressed in hertz (Hz).
The sampling depth is the number of bits used to represent the value of each sample point, for example, 8 bits, 16 bits, or 24 bits.
The code rate is the number of bits processed per unit time, and the unit is bits per second (bps).
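As a worked illustration of the code rate definition, the following minimal sketch computes the code rate of uncompressed PCM data as sampling rate × sampling depth × number of channels. The function name `pcm_bit_rate_bps` and the example values (44100 Hz, 16 bits, 2 channels) are assumptions chosen for illustration and are not taken from the embodiment.

```python
def pcm_bit_rate_bps(sample_rate_hz: int, sample_depth_bits: int, channels: int) -> int:
    """Code rate of uncompressed PCM: bits produced per second of audio."""
    return sample_rate_hz * sample_depth_bits * channels


# Hypothetical example: 44100 Hz, 16-bit, two-channel PCM.
print(pcm_bit_rate_bps(44100, 16, 2))  # 1411200 bps, i.e. about 1411 kbps
```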
Specifically, the processing device may parse a frame header of the candidate audio file to obtain the format parameters of the candidate audio file.
For example, if the sampling depth is 8 bits, the tone quality of the candidate audio file is identified as the second tone quality; and if the sampling depth is 16 bits, determining the candidate audio file as the target audio file.
Or, for another example, if the sampling rate is less than 44100Hz, identifying the tone quality of the candidate audio file as the second tone quality; and if the sampling rate is greater than or equal to 44100Hz, determining the candidate audio file as the target audio file.
Or, for another example, if the compression format is MP3 and the code rate is less than 320 kilobits per second (kbps), the sound quality of the candidate audio file is identified as the second sound quality; if the compression format is MP3 and the code rate is greater than or equal to 320 kbps, the candidate audio file is determined as the target audio file.
Therefore, by acquiring the format parameters of the candidate audio file, the tone quality of the candidate audio file can be identified as the second tone quality in advance according to the format parameters, so that the candidate audio file does not need to be used as a target audio file for further identification, and the efficiency of tone quality identification of the audio file can be effectively improved.
In addition, since the candidate audio file does not need to be decoded, and the format parameters of the candidate audio file can be obtained only by analyzing the frame header, the efficiency of sound quality identification of the audio file can be further improved.
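The pre-screening described above can be sketched as follows. This is a minimal sketch under stated assumptions: the `FormatParams` container and the `prescreen` function are hypothetical names, the header parsing itself is omitted, and the three example rules from the text (which the embodiment presents as alternatives) are combined here into a single check purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Example thresholds taken from the text; a real system might choose others.
MIN_SAMPLE_DEPTH_BITS = 16
MIN_SAMPLE_RATE_HZ = 44100
MIN_MP3_BIT_RATE_KBPS = 320


@dataclass
class FormatParams:
    """Format parameters assumed to have been read from a frame header."""
    compression_format: str                  # e.g. "MP3", "FLAC"
    sample_rate_hz: int
    sample_depth_bits: Optional[int] = None  # None if not reported
    bit_rate_kbps: Optional[int] = None      # None if not reported


def prescreen(params: FormatParams) -> str:
    """Return "second" to reject early, or "target" to continue with 101."""
    depth = params.sample_depth_bits
    if depth is not None and depth < MIN_SAMPLE_DEPTH_BITS:
        return "second"
    if params.sample_rate_hz < MIN_SAMPLE_RATE_HZ:
        return "second"
    rate = params.bit_rate_kbps
    is_mp3 = params.compression_format.upper() == "MP3"
    if is_mp3 and rate is not None and rate < MIN_MP3_BIT_RATE_KBPS:
        return "second"
    return "target"


print(prescreen(FormatParams("MP3", 44100, 16, 128)))  # second
```

Because only header fields are inspected, no decoding is needed before a candidate is rejected.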
Optionally, in a possible implementation manner of this embodiment, in 102, the processing device may specifically determine the number of channels of the target audio file, and decode the data blocks of the target audio file to obtain the original audio data. Then, the processing device may obtain channel audio data corresponding to each channel according to the number of channels and the original audio data. For a detailed description of the parsing method and the decoding method, reference may be made to related contents in the prior art, and details are not repeated here.
For example, the processing device may specifically parse a frame header of the target audio file to determine the number of channels of the target audio file.
Or for another example, the processing device specifically parses the file header of the target audio file to determine the number of channels of the target audio file.
For another example, the processing device may further parse other portions of the target audio file to determine the number of channels of the target audio file, which is not particularly limited in this embodiment.
Or for another example, the processing device may specifically obtain the number of channels of the target audio file from the configuration file.
It is to be understood that the two steps of "determining the number of channels of the target audio file" and "decoding the data blocks of the target audio file to obtain the original audio data" have no fixed order: the step of determining the number of channels may be performed first and the decoding step second, the decoding step may be performed first and the step of determining the number of channels second, or the two steps may be performed simultaneously, which is not particularly limited in this embodiment.
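One way to obtain the channel audio data for each channel from the number of channels and the original audio data is sketched below. It assumes that the decoded PCM samples are interleaved (one sample per channel in turn), which is a common layout but is not stated in the embodiment; the function name `split_channels` is hypothetical.

```python
from typing import List, Sequence


def split_channels(samples: Sequence[int], num_channels: int) -> List[List[int]]:
    """Split interleaved PCM samples into one list of samples per channel."""
    if num_channels < 1:
        raise ValueError("num_channels must be at least 1")
    if len(samples) % num_channels != 0:
        raise ValueError("sample count is not a multiple of the channel count")
    # Channel c owns samples c, c + num_channels, c + 2 * num_channels, ...
    return [list(samples[c::num_channels]) for c in range(num_channels)]


left, right = split_channels([1, -1, 2, -2, 3, -3], 2)
print(left, right)  # [1, 2, 3] [-1, -2, -3]
```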
Accordingly, in a possible implementation manner of this embodiment, in 103, if the number of channels is greater than or equal to 2, the processing device may obtain, according to the channel audio data corresponding to each channel, first channel audio data and second channel audio data corresponding to at least two channels, and further add the first channel audio data and the second channel audio data to obtain mixed channel audio data.
If the mixed channel audio data is greater than or equal to the first channel audio data/N or the second channel audio data/M, the processing device may identify the sound quality of the target audio file as the first sound quality.
If the mixed channel audio data is smaller than the first channel audio data/N or the second channel audio data/M, the processing device may identify the sound quality of the target audio file as the second sound quality. Here, N is a number greater than 1, and M is a number greater than 1.
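The mixed-channel comparison can be sketched as follows. The embodiment does not say how "audio data" is reduced to a comparable quantity, so this sketch assumes the mean absolute sample value as the level of each signal. It also assumes one reading of the two overlapping "or" conditions: the file is treated as second sound quality when the mixed level falls below either scaled channel level, and as first sound quality otherwise. The names `mean_abs` and `classify_by_mix` and the default values N = M = 2 are hypothetical.

```python
from typing import Sequence


def mean_abs(samples: Sequence[float]) -> float:
    """Assumed level measure: mean absolute sample value."""
    return sum(abs(s) for s in samples) / len(samples) if samples else 0.0


def classify_by_mix(left: Sequence[float], right: Sequence[float],
                    n: float = 2.0, m: float = 2.0) -> str:
    """Compare the level of left + right against left / N and right / M."""
    if n <= 1 or m <= 1:
        raise ValueError("N and M must both be greater than 1")
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    mixed = [l + r for l, r in zip(left, right)]
    mix_level = mean_abs(mixed)
    if mix_level < mean_abs(left) / n or mix_level < mean_abs(right) / m:
        return "second"
    return "first"


# Channels that cancel each other when added yield a near-silent mix.
print(classify_by_mix([100, -200, 300], [-100, 200, -300]))  # second
```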
Accordingly, in a possible implementation manner of this embodiment, in 103, if, among a specified number (e.g., 3) of consecutive values of the target channel audio data, the difference between every two values is smaller than or equal to the first amplitude threshold (the corresponding waveform may be as shown in fig. 2), the processing device may identify the sound quality of the target audio file as the second sound quality. The target channel audio data may be the channel audio data corresponding to any one channel, which is not particularly limited in this embodiment. In fig. 2, the abscissa represents time and the ordinate represents amplitude.
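The check for a run of nearly equal consecutive values can be sketched as below. Requiring every pairwise difference within a window to be at most the threshold is equivalent to requiring the window's maximum minus its minimum to be at most the threshold, which is what the sketch tests. The function name `has_flat_run` and the default parameter values are hypothetical.

```python
from typing import Sequence


def has_flat_run(samples: Sequence[float], run_length: int = 3,
                 amplitude_threshold: float = 0.0) -> bool:
    """True if some run of `run_length` consecutive samples is nearly constant."""
    if run_length < 2:
        raise ValueError("run_length must be at least 2")
    for start in range(len(samples) - run_length + 1):
        window = samples[start:start + run_length]
        # All pairwise differences <= threshold  <=>  max - min <= threshold.
        if max(window) - min(window) <= amplitude_threshold:
            return True
    return False


print(has_flat_run([0, 5, 9, 9, 9, 4]))  # True: three consecutive 9s
```

A digitally silent passage also satisfies this test, so a practical detector would likely need an additional condition; the embodiment does not specify one.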
Accordingly, in a possible implementation manner of this embodiment, in 103, if the difference between two consecutive values of the target channel audio data is greater than or equal to the second amplitude threshold and the two values have opposite signs (the corresponding waveform may be as shown in fig. 3), the processing device may identify the sound quality of the target audio file as the second sound quality. The target channel audio data may be the channel audio data corresponding to any one channel, which is not particularly limited in this embodiment. In fig. 3, the abscissa represents time and the ordinate represents amplitude.
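The check for an abrupt jump between two consecutive values of opposite sign can be sketched as follows. The function name `has_sign_flip_jump` and the threshold used in the example are hypothetical; the embodiment does not give a value for the second amplitude threshold.

```python
from typing import Sequence


def has_sign_flip_jump(samples: Sequence[float], amplitude_threshold: float) -> bool:
    """True if two consecutive samples have opposite signs and differ by at
    least `amplitude_threshold`."""
    for prev, cur in zip(samples, samples[1:]):
        opposite_signs = (prev > 0 and cur < 0) or (prev < 0 and cur > 0)
        if opposite_signs and abs(cur - prev) >= amplitude_threshold:
            return True
    return False


# Hypothetical 16-bit samples: a jump from +30000 to -30000 in one step.
print(has_sign_flip_jump([100, 200, 30000, -30000, -100], 40000))  # True
```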
Optionally, in a possible implementation manner of this embodiment, in 102, after obtaining the channel audio data corresponding to each channel, the processing device may further perform framing processing on target channel audio data to obtain at least one frame of audio data. Then, the processing device may perform frequency domain transformation processing on the at least one frame of audio data to obtain frequency domain data corresponding to each frame of audio data. The target channel audio data may be the channel audio data corresponding to any one channel, which is not particularly limited in this embodiment.
In particular, the frequency domain Transform process may include, but is not limited to, a Fast Fourier Transform (FFT).
For example, the processing device may perform framing processing on the target channel audio data at intervals of 20 ms, with 50% data overlap between adjacent frames, to obtain at least one frame of audio data. Then, the processing device may perform FFT processing on the at least one frame of audio data to obtain the frequency domain data corresponding to each frame of audio data, denoted as $A_{i,j}$, where $i$ represents the frequency point number, $j$ represents the frame number, and $A_{i,j}$ represents the frequency domain data of the $j$th frame at the $i$th frequency point.
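The framing and FFT steps can be sketched with the standard library alone. This is a minimal sketch under several assumptions: the 20 ms interval is taken to be the frame length, each frame is zero-padded to the next power of two so that a simple recursive radix-2 FFT can be used, and only the non-negative frequency points are kept. The names `frame_signal`, `fft`, and `spectra` are hypothetical, and the result is indexed as `spec[j][i]` (frame first), the transpose of the $A_{i,j}$ notation.

```python
import cmath
from typing import List, Sequence


def frame_signal(samples: Sequence[float], sample_rate_hz: int,
                 frame_ms: float = 20.0, overlap: float = 0.5) -> List[List[float]]:
    """Cut the signal into frames of `frame_ms` with the given overlap ratio."""
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    if frame_len < 1:
        raise ValueError("frame length is shorter than one sample")
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    return [list(samples[s:s + frame_len])
            for s in range(0, len(samples) - frame_len + 1, hop)]


def fft(values: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectra(frames: Sequence[Sequence[float]]) -> List[List[complex]]:
    """Return spec[j][i]: frequency domain data of frame j at frequency point i."""
    result = []
    for frame in frames:
        size = 1
        while size < len(frame):
            size *= 2
        padded = list(frame) + [0.0] * (size - len(frame))
        result.append(fft(padded)[: size // 2 + 1])
    return result


# Hypothetical test signal: a 200 Hz sine sampled at 800 Hz.
frames = frame_signal([0.0, 1.0, 0.0, -1.0] * 16, 800)
spec = spectra(frames)
print(len(frames), len(spec[0]))  # 7 frames, 9 frequency points per frame
```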
Accordingly, in a possible implementation manner of this embodiment, in 103, the processing device may obtain, according to the frequency domain data corresponding to each frame of audio data, the energy component of that frequency domain data at each frequency point. If, at at least one same frequency point, the difference between every two of the energy components of the frames is less than or equal to the energy threshold (the corresponding energy spectrum may be as shown in fig. 4), the processing device may identify the sound quality of the target audio file as the second sound quality. In fig. 4, the abscissa represents time, the ordinate represents frequency, and the color of each point represents energy.
For example, from the frequency domain data $A_{i,j}$ corresponding to each frame of audio data, the processing device obtains the energy component $E_{i,j}$ of that frequency domain data at each frequency point, where $i$ represents the frequency point number, $j$ represents the frame number, and $E_{i,j}$ represents the energy component of the $j$th frame at the $i$th frequency point.
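The energy comparison across frames can be sketched as follows. The embodiment does not define the energy component; this sketch assumes the squared magnitude of the frequency domain data, a common choice. As with the time-domain run check, requiring all pairwise differences at one frequency point to be at most the threshold is equivalent to requiring the maximum minus the minimum at that point to be at most the threshold. The names `energy_components` and `has_constant_spectral_line` are hypothetical, and the data is indexed as `[j][i]` (frame first).

```python
from typing import List, Sequence


def energy_components(spectra: Sequence[Sequence[complex]]) -> List[List[float]]:
    """energies[j][i] = |A|^2 for frame j at frequency point i (assumed definition)."""
    return [[abs(a) ** 2 for a in frame] for frame in spectra]


def has_constant_spectral_line(energies: Sequence[Sequence[float]],
                               energy_threshold: float) -> bool:
    """True if, at some frequency point, the energy is nearly the same in every frame."""
    if len(energies) < 2:
        return False  # nothing to compare across frames
    for i in range(len(energies[0])):
        column = [frame[i] for frame in energies]
        # All pairwise differences <= threshold  <=>  max - min <= threshold.
        if max(column) - min(column) <= energy_threshold:
            return True
    return False


# Hypothetical spectra for three frames and three frequency points.
spec = [[1 + 0j, 2 + 0j, 3j], [2 + 0j, 2 + 0j, 1 + 0j], [0j, 2j, 5 + 0j]]
print(has_constant_spectral_line(energy_components(spec), 0.0))  # True (point 1)
```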
In this embodiment, a target audio file to be identified is acquired, and at least one of the time-domain waveform characteristic and the frequency-domain spectral line characteristic of the target audio file is obtained from it, so that the sound quality of the target audio file can be identified as the first sound quality or the second sound quality, the first sound quality being higher than the second sound quality. Audio files of genuinely high sound quality can therefore be provided to the user.
In addition, the technical scheme provided by the invention is simple to operate, and can effectively improve the efficiency of tone quality identification of the audio file.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
Fig. 5 is a schematic structural diagram of a sound quality recognition apparatus for audio files according to another embodiment of the present invention. As shown in fig. 5, the sound quality recognition apparatus of this embodiment may include an acquisition unit 51, a feature unit 52, and an identification unit 53, wherein:
The acquisition unit 51 is configured to acquire a target audio file to be identified.
The target audio file may include audio files in various encoding formats in the prior art, such as a Moving Picture Experts Group (MPEG) Audio Layer-3 (MP3) format audio file, a Windows Media Audio (WMA) format audio file, an Advanced Audio Coding (AAC) format audio file, or an audio file in a lossless format such as Free Lossless Audio Codec (FLAC) or APE, which is not particularly limited in this embodiment.
The feature unit 52 is configured to obtain, according to the target audio file, at least one of a time-domain waveform characteristic of the target audio file and a frequency-domain spectral line characteristic of the target audio file.
The time-domain waveform characteristics of the target audio file may include, but are not limited to, amplitude information of the original audio data.
The original audio data is a digital signal converted from an audio signal; for example, the audio signal is sampled, quantized, and encoded to obtain Pulse Code Modulation (PCM) data. The original audio data can be obtained by parsing the data blocks of the target audio file.
The frequency-domain spectral line characteristics of the target audio file may include, but are not limited to, spectral information of the original audio data.
And the identifying unit 53 is used for identifying that the tone quality of the target audio file is a first tone quality or a second tone quality according to at least one of the time domain waveform characteristics and the frequency domain spectral line characteristics, wherein the first tone quality is higher than the second tone quality.
It should be noted that the sound quality recognition device for the audio file provided in this embodiment may be a processing device, may be located in a local Application (App), for example, hundredth music, or may also be located in a server on the network side, or may also be located in a part of the local Application, and another part is located in the server on the network side.
It should be understood that the application may be an application installed on the terminal (native app), or may also be a web page of a browser on the terminal (webAPP), as long as an objective existence form of processing of the audio data can be implemented, which is not limited in this embodiment.
Like this, acquire the target audio file of treating discernment through the acquisition element, and then by the characteristic cell basis the target audio file obtains the time domain waveform characteristics of target audio file with at least one item in the frequency domain spectral line characteristic of target audio file for the identification element can be according to time domain waveform characteristics with at least one item in the frequency domain spectral line characteristic is discerned the tone quality of target audio file is first tone quality or second tone quality, first tone quality is higher than second tone quality, like this, just can provide the audio file of real high tone quality to the user, makes the user can appreciate the audio file of real high tone quality.
Optionally, in a possible implementation manner of this embodiment, the identifying unit 53 may be further configured to obtain a format parameter of a candidate audio file and, according to the format parameter, either determine the candidate audio file as the target audio file or identify the tone quality of the candidate audio file as the second tone quality.
Wherein the format parameter may include, but is not limited to, at least one of a compression format, a sampling rate, a sampling depth, and a code rate.
The compression format is a compression method in which original audio data is compressed by a program, such as MP3 format, WMA format, AAC format, FLAC format, APE format, or the like.
The sampling rate, also referred to as sampling speed or sampling frequency, defines the number of samples per second that are extracted from a continuous signal and constitute a discrete signal, which is expressed in hertz (Hz).
The sampling depth is the number of bits used to represent the value of each sample point, for example, 8 bits, 16 bits, or 24 bits.
The code rate is the number of bits processed per unit time, and the unit is bits per second (bps).
Specifically, the identifying unit 53 may parse a frame header of the candidate audio file to obtain the format parameter of the candidate audio file.
For example, if the sampling depth is 8 bits, the tone quality of the candidate audio file is identified as the second tone quality; if the sampling depth is 16 bits, the candidate audio file is determined as the target audio file.
Or, for another example, if the sampling rate is less than 44100 Hz, the tone quality of the candidate audio file is identified as the second tone quality; if the sampling rate is greater than or equal to 44100 Hz, the candidate audio file is determined as the target audio file.
Or, for another example, if the compression format is MP3 and the code rate is less than 320 kilobits per second (kbps), the tone quality of the candidate audio file is identified as the second tone quality; if the compression format is MP3 and the code rate is greater than or equal to 320 kbps, the candidate audio file is determined as the target audio file.
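The three examples above can be sketched as a single pre-filter. This is only an illustrative sketch: the names FormatParams, prefilter, SECOND_QUALITY, and NEEDS_ANALYSIS are hypothetical, and chaining the three checks one after another is an assumption, since the embodiment presents them as alternative examples.

```python
from dataclasses import dataclass
from typing import Optional

SECOND_QUALITY = "second tone quality"
NEEDS_ANALYSIS = "target audio file"  # passed on for waveform/spectrum analysis


@dataclass
class FormatParams:
    """Format parameters read from a frame header (hypothetical container)."""
    compression_format: str                  # e.g. "MP3", "FLAC"
    sampling_rate_hz: int
    sampling_depth_bits: Optional[int] = None
    code_rate_kbps: Optional[float] = None


def prefilter(p: FormatParams) -> str:
    """Classify a candidate file from its format parameters alone."""
    # Sampling depth of 8 bits (or fewer, an assumption) -> second tone quality.
    if p.sampling_depth_bits is not None and p.sampling_depth_bits <= 8:
        return SECOND_QUALITY
    # Sampling rate below 44100 Hz -> second tone quality.
    if p.sampling_rate_hz < 44100:
        return SECOND_QUALITY
    # MP3 with a code rate below 320 kbps -> second tone quality.
    if (p.compression_format.upper() == "MP3"
            and p.code_rate_kbps is not None
            and p.code_rate_kbps < 320):
        return SECOND_QUALITY
    return NEEDS_ANALYSIS
```

Because every check uses header fields only, a candidate audio file rejected here never has to be decoded.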
Therefore, the format parameters of the candidate audio files are obtained through the identification unit, and then the tone quality of the candidate audio files can be identified as the second tone quality in advance according to the format parameters, so that the candidate audio files do not need to be used as target audio files for further identification, and the efficiency of tone quality identification of the audio files can be effectively improved.
In addition, since the candidate audio file does not need to be decoded, and the format parameters of the candidate audio file can be obtained only by analyzing the frame header, the efficiency of sound quality identification of the audio file can be further improved.
Optionally, in a possible implementation manner of this embodiment, the feature unit 52 may be specifically configured to determine the number of channels of the target audio file; decode the data blocks of the target audio file to obtain original audio data; and obtain the channel audio data corresponding to each channel according to the number of channels and the original audio data. For a detailed description of the parsing method and the decoding method, reference may be made to related contents in the prior art, and details are not repeated here.
For example, the feature unit 52 may specifically parse a frame header of the target audio file to determine the number of channels of the target audio file.
Or for another example, the feature unit 52 specifically parses the file header of the target audio file to determine the number of channels of the target audio file.
For another example, the feature unit 52 may further parse other portions of the target audio file to determine the number of channels of the target audio file, which is not particularly limited in this embodiment.
Or for another example, the feature unit 52 may specifically obtain the number of channels of the target audio file from a configuration file.
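As an illustration of reading the number of channels from a file header and splitting the original audio data into per-channel audio data, the following minimal sketch uses uncompressed 16-bit WAV data and the Python standard library wave module. The choice of WAV is an assumption made to keep the sketch short; a compressed format would first need a decoder. The function names read_channels and make_wav are hypothetical.

```python
import io
import struct
import wave


def read_channels(wav_bytes: bytes) -> list[list[int]]:
    """Return one list of 16-bit samples per channel from WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        n_channels = wf.getnchannels()       # taken from the file header
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch handles 16-bit PCM only")
        raw = wf.readframes(wf.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # PCM frames are interleaved: ch0, ch1, ..., ch0, ch1, ...
    return [list(samples[c::n_channels]) for c in range(n_channels)]


def make_wav(interleaved: list[int], n_channels: int, rate: int = 44100) -> bytes:
    """Build a small in-memory WAV file (helper for trying the sketch)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(n_channels)
        wf.setsampwidth(2)                   # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(struct.pack("<%dh" % len(interleaved), *interleaved))
    return buf.getvalue()
```

For a two-channel file, read_channels returns two lists that can serve as the first channel audio data and the second channel audio data in the checks described next.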
Accordingly, in a possible implementation manner of this embodiment, the identifying unit 53 may be specifically configured to: if the number of channels is greater than or equal to 2, obtain, according to the channel audio data corresponding to each channel, first channel audio data and second channel audio data corresponding to at least two channels; add the first channel audio data and the second channel audio data to obtain mixed channel audio data; if the mixed channel audio data is greater than or equal to the first channel audio data/N or the second channel audio data/M, identify the tone quality of the target audio file as the first tone quality; and if the mixed channel audio data is smaller than the first channel audio data/N or the second channel audio data/M, identify the tone quality of the target audio file as the second tone quality; where N is a number greater than 1 and M is a number greater than 1.
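The mixed-channel comparison can be sketched as follows. The embodiment does not state how a whole channel is reduced to a single comparable value, so this sketch assumes the mean absolute amplitude; that measure, the default values N = M = 4, and the names mean_abs and classify_by_mix are all hypothetical. The sketch also adopts one possible reading of the condition, in which the first tone quality is identified when the mixed level reaches either quotient.

```python
def mean_abs(samples):
    """Mean absolute amplitude of a channel (assumed level measure)."""
    return sum(abs(v) for v in samples) / len(samples)


def classify_by_mix(first, second, n=4.0, m=4.0):
    """Compare the mixed channel against each channel scaled by 1/N and 1/M."""
    # Sample-wise sum of the two channels gives the mixed channel audio data.
    mixed = [a + b for a, b in zip(first, second)]
    mixed_level = mean_abs(mixed)
    if mixed_level >= mean_abs(first) / n or mixed_level >= mean_abs(second) / m:
        return "first"
    # The two channels largely cancel each other when added.
    return "second"
```

With this reading, two channels that are inverted copies of each other sum to nearly zero and are identified as the second tone quality, while two similar channels reinforce each other and pass.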
Accordingly, in a possible implementation manner of this embodiment, the identifying unit 53 may be specifically configured to identify the sound quality of the target audio file as the second sound quality if the difference between every two values among a specified number (e.g., 3) of target channel audio data values is smaller than or equal to a first amplitude threshold. The corresponding waveform for this case can be as shown in fig. 2. The target channel audio data may be the channel audio data corresponding to any one channel, which is not particularly limited in this embodiment.
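A minimal sketch of this check is given below. It assumes that the specified number of values are consecutive samples, which the embodiment does not state explicitly; the function name has_flat_run and the default threshold are hypothetical.

```python
def has_flat_run(samples, run_length=3, amp_threshold=0):
    """True if some run of run_length consecutive samples is nearly constant."""
    for start in range(len(samples) - run_length + 1):
        window = samples[start:start + run_length]
        # All pairwise differences are within the threshold exactly when
        # the spread (max - min) of the window is within the threshold.
        if max(window) - min(window) <= amp_threshold:
            return True
    return False
```

A waveform pinned at full scale, such as 32767 repeated three times in 16-bit data, satisfies the condition. Digital silence satisfies it as well, so a practical implementation would probably need to decide how to treat silent passages.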
Accordingly, in a possible implementation manner of this embodiment, the identifying unit 53 may be specifically configured to identify the sound quality of the target audio file as the second sound quality if the difference between the values of two consecutive target channel audio data is greater than or equal to a second amplitude threshold and the signs of the two values are opposite. The corresponding waveform for this case can be as shown in fig. 3. The target channel audio data may be the channel audio data corresponding to any one channel, which is not particularly limited in this embodiment.
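This condition can be sketched as a scan over adjacent sample pairs. The function name has_sign_flip_jump and the default threshold of 30000 (chosen with 16-bit samples in mind) are hypothetical.

```python
def has_sign_flip_jump(samples, amp_threshold=30000):
    """True if two consecutive samples have opposite signs and a large gap."""
    for prev, cur in zip(samples, samples[1:]):
        opposite_signs = (prev > 0 > cur) or (prev < 0 < cur)
        if opposite_signs and abs(cur - prev) >= amp_threshold:
            return True
    return False
```

A sample equal to zero is treated as having no sign, so a jump from 0 to a large value does not satisfy the condition in this sketch.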
Optionally, in a possible implementation manner of this embodiment, the feature unit 52 may be further configured to perform framing processing on target channel audio data to obtain at least one frame of audio data, and to perform frequency domain transformation processing on the at least one frame of audio data to obtain frequency domain data corresponding to each frame of audio data. The target channel audio data may be the channel audio data corresponding to any one channel, which is not particularly limited in this embodiment.
In particular, the frequency domain Transform process may include, but is not limited to, a Fast Fourier Transform (FFT).
For example, the feature unit 52 may perform framing processing on the target channel audio data at intervals of 20 ms, with 50% data overlap between adjacent frames, to obtain at least one frame of audio data. Then, the feature unit 52 may perform an FFT on the at least one frame of audio data to obtain the frequency domain data corresponding to each frame of audio data, denoted as $A_{i,j}$, where i represents the index of the frequency point, j represents the index of the frame, and $A_{i,j}$ represents the frequency domain data of the jth frame at the ith frequency point.
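The framing and transform steps can be sketched with the standard library alone. The sketch assumes that 20 ms is the frame length, so that 50% overlap gives a 10 ms hop, and it zero-pads each frame to a power of two so that a simple radix-2 FFT applies; both choices, and the names frame_signal, fft, and spectrum, are assumptions rather than part of the embodiment.

```python
import cmath


def frame_signal(samples, sample_rate, frame_ms=20, overlap=0.5):
    """Split samples into frames of frame_ms with the given overlap."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = max(1, int(frame_len * (1 - overlap)))
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def spectrum(frame):
    """Zero-pad one frame to the next power of two and transform it."""
    size = 1 << (len(frame) - 1).bit_length()
    return fft(list(frame) + [0] * (size - len(frame)))
```

Calling spectrum on every frame yields a nested list A in which A[j][i] plays the role of $A_{i,j}$: the frequency domain data of the jth frame at the ith frequency point.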
Accordingly, in a possible implementation manner of this embodiment, the identifying unit 53 may be specifically configured to obtain, according to the frequency domain data corresponding to each frame of audio data, the energy component of that frequency domain data at each frequency point; and, if at at least one same frequency point the difference between the energy components of every two frames of audio data is smaller than or equal to an energy threshold, identify the tone quality of the target audio file as the second tone quality. The corresponding energy spectrum for this case can be seen in fig. 4.
For example, the identifying unit 53 may obtain, from the frequency domain data $A_{i,j}$ corresponding to each frame of audio data, the energy component $E_{i,j}$ of that frequency domain data at each frequency point, where i represents the index of the frequency point, j represents the index of the frame, and $E_{i,j}$ represents the energy component of the jth frame at the ith frequency point.
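The energy check can be sketched as follows. The embodiment does not define the energy component, so the squared magnitude of the frequency domain data is assumed here; the function names and the nested-list layout (one inner list per frame) are hypothetical.

```python
def energy_components(spectra):
    """E[j][i] = |A[j][i]|**2 for frame j and frequency point i (assumed)."""
    return [[abs(a) ** 2 for a in frame] for frame in spectra]


def has_constant_spectral_line(energies, energy_threshold):
    """True if some frequency point has nearly equal energy in all frames."""
    n_points = len(energies[0])
    for i in range(n_points):
        column = [frame[i] for frame in energies]
        # Pairwise differences across frames are all within the threshold
        # exactly when the spread (max - min) is within the threshold.
        if max(column) - min(column) <= energy_threshold:
            return True
    return False
```

A result of True corresponds to identifying the tone quality as the second tone quality. The sketch expects at least one frame, and a frequency point whose energy is zero in every frame also satisfies the condition.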
In this embodiment, the acquisition unit acquires the target audio file to be identified, and the feature unit obtains, according to the target audio file, at least one of the time-domain waveform characteristics and the frequency-domain spectral line characteristics of the target audio file. The identifying unit can then identify, according to at least one of these characteristics, the tone quality of the target audio file as the first tone quality or the second tone quality, where the first tone quality is higher than the second tone quality. Audio files of genuinely high tone quality can therefore be provided to the user.
In addition, the technical scheme provided by the invention is simple to operate, and can effectively improve the efficiency of tone quality identification of the audio file.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, an audio processing engine, or a network device) or a processor to execute some steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.