CN104715756A - Audio data processing method and device - Google Patents

Audio data processing method and device

Info

Publication number
CN104715756A
Authority
CN
China
Prior art keywords
audio data
frequency range
acoustic feature
original
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510069567.9A
Other languages
Chinese (zh)
Inventor
田彪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yinzhibang Culture Technology Co ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201510069567.9A priority Critical patent/CN104715756A/en
Publication of CN104715756A publication Critical patent/CN104715756A/en
Pending legal-status Critical Current

Abstract

The invention provides an audio data processing method and device. Because high-band audio data, i.e. extended audio data, is added to the original audio data, the obtained target audio data contains not only the low-band audio data, i.e. the original audio data, but also the high-band audio data. A genuinely high-quality audio file can therefore be provided to the user, and the user can enjoy genuinely high-quality audio.

Description

Audio data processing method and device
[Technical field]
The present invention relates to audio signal processing technology, and in particular to an audio data processing method and device.
[Background art]
The sound quality of an audio file refers to the fidelity of the original audio data after compression. A high-quality audio file can recover the original audio data completely, without any distortion; a low-quality audio file cannot recover the original audio data completely and introduces partial distortion. Conversion technologies currently exist that can turn a low-quality audio file into a pseudo-high-quality one. In fact, the sound quality of such a pseudo-high-quality file is the same as that of the file before conversion; it is not genuinely high quality. When users obtain such pseudo-high-quality files through music applications, they cannot enjoy genuinely high-quality audio, which damages the brand image of those applications and may even give rise to legal disputes.
Therefore, providing users with genuinely high-quality audio files, so that they can enjoy genuinely high-quality audio, is a problem demanding a prompt solution.
[Summary of the invention]
Various aspects of the present invention provide an audio data processing method and device for improving the sound quality of audio files.
In one aspect of the present invention, an audio data processing method is provided, comprising:
obtaining original audio data to be processed, wherein the frequency range of the audio signal corresponding to the original audio data is a first signal frequency range;
obtaining extended audio data according to the original audio data, wherein the frequency range of the audio signal corresponding to the extended audio data is a second signal frequency range, and the second signal frequency range is higher than the first signal frequency range; and
obtaining target audio data according to the original audio data and the extended audio data.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the first signal frequency range is greater than or equal to 0 and less than or equal to a first frequency threshold, and the second signal frequency range is greater than the first frequency threshold and less than or equal to a second frequency threshold.
In the aspect above and any possible implementation thereof, an implementation is further provided in which obtaining the extended audio data according to the original audio data comprises:
obtaining an original acoustic feature of the original audio data according to the original audio data;
obtaining an extended acoustic feature according to the original acoustic feature; and
obtaining the extended audio data according to the extended acoustic feature.
In the aspect above and any possible implementation thereof, an implementation is further provided in which obtaining the extended acoustic feature according to the original acoustic feature comprises:
obtaining the extended acoustic feature according to the original acoustic feature by using a conversion relation between original acoustic features and extended acoustic features.
In the aspect above and any possible implementation thereof, an implementation is further provided in which, after the extended acoustic feature is obtained according to the original acoustic feature by using the conversion relation between original acoustic features and extended acoustic features, the method further comprises:
obtaining at least one piece of sample audio data;
obtaining, according to each piece of sample audio data in the at least one piece of sample audio data, first audio data and second audio data of that sample, wherein the frequency range of the audio signal corresponding to the first audio data is the first signal frequency range, and the frequency range of the audio signal corresponding to the second audio data is the second signal frequency range;
obtaining a first acoustic feature of each sample according to its first audio data;
obtaining a second acoustic feature of each sample according to its second audio data; and
obtaining the conversion relation according to the first acoustic feature and the second acoustic feature of each sample by using a deep learning algorithm.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the acoustic features comprise linear prediction coding (LPC) coefficients, linear prediction cepstrum coefficients (LPCC), Mel-frequency cepstrum coefficients (MFCC), or perceptual linear prediction (PLP) coefficients.
In another aspect of the present invention, an audio data processing device is provided, comprising:
an acquiring unit, configured to obtain original audio data to be processed, wherein the frequency range of the audio signal corresponding to the original audio data is a first signal frequency range;
a feature unit, configured to obtain extended audio data according to the original audio data, wherein the frequency range of the audio signal corresponding to the extended audio data is a second signal frequency range, and the second signal frequency range is higher than the first signal frequency range; and
a processing unit, configured to obtain target audio data according to the original audio data and the extended audio data.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the first signal frequency range is greater than or equal to 0 and less than or equal to a first frequency threshold, and the second signal frequency range is greater than the first frequency threshold and less than or equal to a second frequency threshold.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the feature unit is specifically configured to:
obtain the original acoustic feature of the original audio data according to the original audio data;
obtain the extended acoustic feature according to the original acoustic feature; and
obtain the extended audio data according to the extended acoustic feature.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the feature unit is specifically configured to obtain the extended acoustic feature according to the original acoustic feature by using the conversion relation between original acoustic features and extended acoustic features.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the feature unit is further configured to:
obtain at least one piece of sample audio data;
obtain, according to each piece of sample audio data in the at least one piece of sample audio data, the first audio data and second audio data of that sample, wherein the frequency range of the audio signal corresponding to the first audio data is the first signal frequency range, and the frequency range of the audio signal corresponding to the second audio data is the second signal frequency range;
obtain the first acoustic feature of each sample according to its first audio data;
obtain the second acoustic feature of each sample according to its second audio data; and
obtain the conversion relation according to the first acoustic feature and the second acoustic feature of each sample by using a deep learning algorithm.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the acoustic features comprise linear prediction coding (LPC) coefficients, linear prediction cepstrum coefficients (LPCC), Mel-frequency cepstrum coefficients (MFCC), or perceptual linear prediction (PLP) coefficients.
As can be seen from the above technical solutions, in the embodiments of the present invention extended audio data is obtained according to the original audio data to be processed, the frequency range of the audio signal corresponding to the extended audio data being the second signal frequency range, so that target audio data can be obtained according to the original audio data and the extended audio data. Because high-band audio data, i.e. the extended audio data, is added to the low-band audio data, i.e. the original audio data, the obtained target audio data no longer contains only low-band audio data but also high-band audio data. In this way, a genuinely high-quality audio file can be provided to the user, and the user can enjoy genuinely high-quality audio.
In addition, the technical solution provided by the present invention is simple to operate and can effectively improve the efficiency of audio data processing.
[Brief description of the drawings]
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required in the embodiments or the prior-art description are briefly introduced below. Clearly, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an audio data processing method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an audio data processing device according to another embodiment of the present invention.
[Detailed description of the embodiments]
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Clearly, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be noted that the terminals involved in the embodiments of the present invention may include, but are not limited to, mobile phones, personal digital assistants (PDA), wireless handheld devices, tablet computers, personal computers (PC), MP3 players, MP4 players, and wearable devices (for example, smart glasses, smart watches, and smart wristbands).
In addition, the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A alone, both A and B, and B alone. The character "/" herein generally indicates an "or" relationship between the associated objects.
Fig. 1 is a schematic flowchart of an audio data processing method according to an embodiment of the present invention, as shown in Fig. 1.
101: Obtain original audio data to be processed; the frequency range of the audio signal corresponding to the original audio data is a first signal frequency range.
102: Obtain extended audio data according to the original audio data; the frequency range of the audio signal corresponding to the extended audio data is a second signal frequency range, which is higher than the first signal frequency range.
The first signal frequency range may be greater than or equal to 0 and less than or equal to a first frequency threshold, for example 22050 hertz (Hz), i.e. [0, 22050 Hz]; the second signal frequency range may be greater than the first frequency threshold and less than or equal to a second frequency threshold, for example 48000 Hz, i.e. (22050 Hz, 48000 Hz].
103: Obtain target audio data according to the original audio data and the extended audio data.
At this point, the frequency range of the audio signal corresponding to the obtained target audio data comprises both the first signal frequency range and the second signal frequency range. In this embodiment, because the second signal frequency range is higher than the first signal frequency range, the first signal frequency range may be called the low band and the second signal frequency range the high band.
After the target audio data is obtained, it may be stored as a complete audio file, or it may be transmitted directly to a playback device for real-time playback; this embodiment imposes no particular limitation.
In particular, the audio file may be stored in a storage device of a terminal.
In one specific implementation, the storage device of the terminal may be a slow storage device, for example the hard disk of a computer system, or the non-working memory or physical memory of a mobile phone, such as read-only memory (ROM) or a random-access-memory (RAM) card; this embodiment imposes no particular limitation.
In another specific implementation, the storage device of the terminal may be a fast storage device, for example the memory of a computer system, or the working memory or system memory of a mobile phone, such as random access memory (RAM); this embodiment imposes no particular limitation.
It should be noted that the executor of steps 101 to 103 may be, in whole or in part, an application located at a local terminal, a functional unit such as a plug-in or software development kit (SDK) within an application located at a local terminal, a processing engine located in a network-side server, or a distributed system located on the network side; this embodiment imposes no particular limitation.
It can be understood that the application may be a native program (native app) installed on the terminal, or a web page program (web app) of a browser on the terminal; this embodiment imposes no particular limitation.
In this way, because high-band audio data, i.e. the extended audio data, is added to the low-band audio data, i.e. the original audio data, the obtained target audio data no longer contains only low-band audio data but also high-band audio data. A genuinely high-quality audio file can thus be provided to the user, who can enjoy genuinely high-quality audio.
Any of the acoustic features involved in the present invention, namely the original acoustic feature, the extended acoustic feature, the first acoustic feature, and the second acoustic feature, may include but is not limited to linear prediction coding (LPC) coefficients, linear prediction cepstrum coefficients (LPCC), Mel-frequency cepstrum coefficients (MFCC), or perceptual linear prediction (PLP) coefficients; this embodiment imposes no particular limitation.
Optionally, in one possible implementation of this embodiment, in step 101 the original audio data may be obtained by decoding the data blocks of the audio file to be processed. The original audio data is the digital signal converted from the audio signal; for example, the audio signal is sampled, quantized, and encoded to obtain pulse code modulation (PCM) data. For a detailed description of the decoding process, see the related prior art; it is not repeated here.
The audio file to be processed may be in any of various existing coding formats, for example a Moving Picture Experts Group (MPEG) Layer-3 (MP3) audio file, a Windows Media Audio (WMA) audio file, an Advanced Audio Coding (AAC) audio file, a Free Lossless Audio Codec (FLAC) audio file, or an APE audio file; this embodiment imposes no particular limitation.
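As an illustration of step 101, the following minimal Python sketch obtains PCM samples from an uncompressed WAV file using only the standard library; decoding a compressed format such as MP3 or AAC would require an external decoder and is omitted here. The file name, the 16-bit sample assumption, and the helper name `read_pcm` are hypothetical, not taken from the patent.

```python
import wave

import numpy as np

def read_pcm(path):
    """Read 16-bit PCM samples from a WAV file, one array per channel."""
    with wave.open(path, "rb") as wf:
        n_channels = wf.getnchannels()   # channel count, cf. the step below
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    # 16-bit signed PCM is assumed here for simplicity
    pcm = np.frombuffer(raw, dtype=np.int16).astype(np.float64)
    # samples are interleaved; split them so each channel can be processed
    # independently, as the embodiment does for multi-channel files
    return [pcm[c::n_channels] for c in range(n_channels)], sample_rate

channels, sr = read_pcm("example.wav")  # hypothetical input file
```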
In this embodiment, the original audio data obtained by performing step 101 may be the original audio data corresponding to one channel. If the audio file has multiple channels, the subsequent processing flow, steps 102 to 103, may be performed separately on the original audio data corresponding to each channel.
In one specific implementation, the number of channels of the audio file may be determined, and the data blocks of the audio file may be decoded to obtain the original audio data. Then, according to the number of channels and the original audio data, the original audio data corresponding to each channel can be obtained.
For example, the frame headers of the audio file may be parsed to determine its number of channels.
As another example, the file header of the audio file may be parsed to determine its number of channels.
As another example, other parts of the audio file may be parsed to determine its number of channels; this embodiment imposes no particular limitation.
As another example, the number of channels of the audio file may be obtained from a configuration file.
It can be understood that the two steps, determining the number of channels of the audio file and decoding the data blocks of the audio file to obtain the original audio data, have no fixed order: the processing device may perform either step before the other, or perform both simultaneously; this embodiment imposes no particular limitation.
Optionally, in one possible implementation of this embodiment, in step 102 the original acoustic feature of the original audio data may be obtained according to the original audio data; the extended acoustic feature may then be obtained according to the original acoustic feature; and finally the extended audio data may be obtained according to the extended acoustic feature.
In one specific technical solution, the original audio data may be divided into frames to obtain at least one frame of data, and acoustic analysis may then be performed on each of the frames to obtain the original acoustic feature of each frame.
For example, the original audio data may be divided into frames at a preset interval, for example 20 ms, with partial overlap between adjacent frames, for example 50% overlap; in this way, at least one frame of the original audio data is obtained, as in the sketch below.
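A minimal framing sketch matching the example just given (20 ms frames, 50% overlap); the helper name `frame_signal` is an assumption, and the signal is assumed to be at least one frame long.

```python
import numpy as np

def frame_signal(x, sample_rate, frame_ms=20, overlap=0.5):
    """Split a 1-D signal into overlapping frames."""
    frame_len = int(sample_rate * frame_ms / 1000)   # e.g. 20 ms of samples
    hop = int(frame_len * (1.0 - overlap))           # 50% overlap -> half-frame hop
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
```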
Taking LPC analysis as an example, the specific implementation is described in detail below. LPC analysis starts from the human sound-production mechanism: based on the study of a cascaded short-tube model of the vocal tract, the system transfer function is assumed to take the form of an all-pole digital filter, so that the signal at time $n$ ($n$ being a number greater than 0) can be estimated from a linear combination of the signals at several previous times. The LPC coefficients are obtained by minimizing the mean square error between the actual audio signal samples and the linearly predicted samples.
For example, if $P$ previous samples are used for the prediction, it is called $P$-th order linear prediction. Suppose the current sample $s(n)$ of the audio signal is predicted as a weighted sum of the samples at the previous $P$ times, $\{s(n-1), s(n-2), \ldots, s(n-P)\}$; the predicted signal $\hat{s}(n)$ is then
$$\hat{s}(n) = \sum_{k=1}^{P} a_k \, s(n-k),$$
where the weighting coefficients $a_k$ are called the LPC coefficients.
The prediction error $e(n)$ is
$$e(n) = s(n) - \hat{s}(n) = s(n) - \sum_{k=1}^{P} a_k \, s(n-k).$$
For the prediction to be best, the mean square error $\varepsilon$ between the short-term audio samples and the linearly predicted samples must reach its minimum:
$$\varepsilon = E\left[e^2(n)\right] = \min, \qquad \frac{\partial E\left[e^2(n)\right]}{\partial a_k} = 0, \quad 1 \le k \le P,$$
where $E[e^2(n)]$ is the mathematical expectation of $e^2(n)$.
Letting $\varphi(i,k) = E[s(n-i)\, s(n-k)]$, the minimum error $\varepsilon_{\min}$ can be expressed as
$$\varepsilon_{\min} = \varphi(0,0) - \sum_{k=1}^{P} a_k \, \varphi(0,k).$$
The linear prediction is most accurate at the minimum mean square error, and the LPC coefficients can be computed from these conditions.
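The minimum-mean-square-error condition above leads to the normal (Yule-Walker) equations, which the following sketch solves by the autocorrelation method. This is one standard way to compute LPC coefficients consistent with the derivation, not code from the patent itself; the helper name and the regularization term are assumptions.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Solve for the LPC coefficients a_k that minimize E[e^2(n)]."""
    # autocorrelation estimates of phi(0, k) from the frame, k = 0..order
    r = np.array([np.dot(frame[: len(frame) - k], frame[k:])
                  for k in range(order + 1)])
    # Toeplitz system R a = r[1:], i.e. the conditions d(eps)/d(a_k) = 0
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-8 * np.eye(order)  # guard against a singular R on silent frames
    a = np.linalg.solve(R, r[1 : order + 1])
    return a  # prediction: s_hat(n) = sum_k a[k] * s(n - k - 1)
```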
In another specific technical solution, the extended acoustic feature may be obtained according to the original acoustic feature by using the conversion relation between original acoustic features and extended acoustic features.
In one specific implementation, for each frame of data, the extended acoustic feature may be obtained according to the original acoustic feature of that frame by using the conversion relation; in this way, an extended acoustic feature is obtained for each frame.
In another specific implementation, the method may further include an operation of obtaining the conversion relation. Specifically, at least one piece of sample audio data may be obtained; then, according to each piece of sample audio data, the first audio data and the second audio data of that sample may be obtained, wherein the frequency range of the audio signal corresponding to the first audio data is the first signal frequency range, and the frequency range of the audio signal corresponding to the second audio data is the second signal frequency range. Next, the first acoustic feature of each sample may be obtained according to its first audio data, and the second acoustic feature of each sample according to its second audio data. Finally, the conversion relation may be obtained according to the first and second acoustic features of each sample by using a deep learning algorithm.
Specifically, a filter may be applied to each piece of sample audio data to obtain the first audio data and the second audio data of that sample; a sketch of such a split follows the examples below.
For example, a band-pass filter whose passband is the first signal frequency range, for example 0 to 22050 Hz, may be applied to each sample to obtain its low-band audio data, i.e. the first audio data.
As another example, a band-pass filter whose passband is the second signal frequency range, for example 22050 to 48000 Hz, may be applied to each sample to obtain its high-band audio data, i.e. the second audio data.
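A sketch of the filter-based split, assuming material sampled above 44.1 kHz so that the 22050 Hz cutoff lies below the Nyquist frequency (for example 96 kHz material, whose Nyquist frequency is 48 kHz). Butterworth low-pass and high-pass filters stand in for the band-pass filters of the embodiment, since the two passbands start at 0 Hz and end at the Nyquist limit respectively; the patent does not specify the filter type.

```python
from scipy.signal import butter, lfilter

def split_bands(x, fs, cutoff=22050.0):
    """Split sample audio into first (low-band) and second (high-band) data."""
    b_lo, a_lo = butter(6, cutoff, btype="lowpass", fs=fs)
    b_hi, a_hi = butter(6, cutoff, btype="highpass", fs=fs)
    return lfilter(b_lo, a_lo, x), lfilter(b_hi, a_hi, x)
```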
Alternatively, each piece of sample audio data may be divided into frames to obtain at least one frame of data. A frequency-domain transform may then be applied to each frame to obtain the frequency-domain data corresponding to that frame, and the first audio data and second audio data corresponding to each frame may be obtained from the frequency-domain data. The frequency-domain transform may include, but is not limited to, the fast Fourier transform (FFT); this embodiment imposes no particular limitation.
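For the frequency-domain variant just described, each frame can be split by zeroing the FFT bins on either side of the cutoff; a minimal sketch, with the cutoff value assumed as in the examples above:

```python
import numpy as np

def split_bands_fft(frame, fs, cutoff=22050.0):
    """Per-frame split: bins at or below the cutoff form the low band,
    bins above it form the high band; transform each part back."""
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    low = np.fft.irfft(spectrum * (freqs <= cutoff), n=len(frame))
    high = np.fft.irfft(spectrum * (freqs > cutoff), n=len(frame))
    return low, high
```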
Specifically, acoustic analysis may be performed on the first audio data to obtain the first acoustic feature of each sample, and on the second audio data to obtain the second acoustic feature of each sample.
The acoustic analysis performed here is the same processing as the acoustic analysis described above. For a detailed description, see the related prior art; it is not repeated here.
It can be understood that the two steps, performing acoustic analysis on the first audio data to obtain the first acoustic feature of each sample and performing acoustic analysis on the second audio data to obtain the second acoustic feature of each sample, have no fixed order: the processing device may perform either step before the other, or perform both simultaneously; this embodiment imposes no particular limitation.
So-called deep learning originates from research on artificial neural networks; a multilayer perceptron with many hidden layers is one deep learning structure. Deep learning forms more abstract high-level representations of attribute categories or features by combining low-level features, in order to discover distributed feature representations of data.
Like machine learning algorithms in general, deep learning algorithms divide into supervised and unsupervised learning, and the learning models built under different learning frameworks differ greatly. For example, convolutional neural networks (CNNs) are a deep machine learning model under supervised learning, while deep belief networks (DBNs) are a machine learning model under unsupervised learning.
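The patent does not fix a particular network, so as one hedged illustration a multilayer perceptron regressor can learn the conversion relation from paired first (low-band) and second (high-band) acoustic features. The feature dimension and the random placeholder matrices below are assumptions; real training would use the per-frame features extracted from the sample audio data as described above.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# placeholder feature matrices, one row per frame of the sample audio data;
# 16-dimensional features are an assumption for illustration only
X = np.random.randn(10000, 16)  # first acoustic features (low band)
Y = np.random.randn(10000, 16)  # second acoustic features (high band)

# a multi-hidden-layer perceptron standing in for the deep learning model
model = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=200)
model.fit(X, Y)  # the fitted model embodies the conversion relation

extended = model.predict(X[:1])  # original acoustic feature -> extended feature
```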
In another specific technical solution, the extended audio data corresponding to each frame may be obtained according to the extended acoustic feature of that frame; the per-frame extended audio data can then be merged back into complete extended audio data.
Specifically, the inverse of the acoustic analysis may be applied to the extended acoustic feature of each frame to obtain the corresponding extended audio data. For a detailed description, see the related prior art; it is not repeated here.
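For LPC features, the inverse of the analysis is all-pole synthesis filtering. The sketch below regenerates a frame from extended LPC coefficients and an excitation signal; the choice of excitation (for example, white noise or a residual taken from the low band) is left open by the patent and is an assumption here.

```python
import numpy as np
from scipy.signal import lfilter

def lpc_synthesize(a, excitation):
    """All-pole synthesis, the inverse of LPC analysis:
    y(n) = excitation(n) + sum_k a[k] * y(n - k - 1),
    i.e. filtering by 1 / (1 - sum_k a_k z^-k)."""
    return lfilter([1.0], np.concatenate(([1.0], -np.asarray(a))), excitation)
```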
Optionally, in one possible implementation of this embodiment, in step 103 the original audio data and the extended audio data may be added together to obtain the target audio data, as in the sketch below.
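A minimal sketch of the addition in step 103, assuming the original (low-band) data has already been resampled to the target rate so the two signals are sample-aligned; the helper name is hypothetical.

```python
import numpy as np

def combine(original, extended):
    """Step 103: target audio data = original + extended, sample by sample."""
    n = min(len(original), len(extended))
    return np.asarray(original[:n]) + np.asarray(extended[:n])
```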
In this embodiment, extended audio data is obtained according to the obtained original audio data to be processed, the frequency range of the audio signal corresponding to the extended audio data being the second signal frequency range, so that target audio data can be obtained according to the original audio data and the extended audio data. Because high-band audio data, i.e. the extended audio data, is added to the low-band audio data, i.e. the original audio data, the obtained target audio data no longer contains only low-band audio data but also high-band audio data. In this way, a genuinely high-quality audio file can be provided to the user, and the user can enjoy genuinely high-quality audio.
In addition, the technical solution provided by the present invention is simple to operate and can effectively improve the efficiency of audio data processing.
It should be noted that, for brevity of description, each of the foregoing method embodiments is expressed as a series of action combinations. However, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, see the related descriptions of other embodiments.
Fig. 2 is a schematic structural diagram of an audio data processing device according to another embodiment of the present invention, as shown in Fig. 2. The audio data processing device of this embodiment may comprise an acquiring unit 21, a feature unit 22, and a processing unit 23. The acquiring unit 21 is configured to obtain original audio data to be processed, wherein the frequency range of the audio signal corresponding to the original audio data is the first signal frequency range. The feature unit 22 is configured to obtain extended audio data according to the original audio data, wherein the frequency range of the audio signal corresponding to the extended audio data is the second signal frequency range, which is higher than the first signal frequency range. The processing unit 23 is configured to obtain target audio data according to the original audio data and the extended audio data.
The first signal frequency range may be greater than or equal to 0 and less than or equal to a first frequency threshold, for example 22050 hertz (Hz), i.e. [0, 22050 Hz]; the second signal frequency range may be greater than the first frequency threshold and less than or equal to a second frequency threshold, for example 48000 Hz, i.e. (22050 Hz, 48000 Hz].
It should be noted that the audio data processing device provided in this embodiment may be, in whole or in part, an application located at a local terminal, a functional unit such as a plug-in or software development kit (SDK) within an application located at a local terminal, a processing engine located in a network-side server, or a distributed system located on the network side; this embodiment imposes no particular limitation.
It can be understood that the application may be a native program (native app) installed on the terminal, or a web page program (web app) of a browser on the terminal; this embodiment imposes no particular limitation.
Optionally, in one possible implementation of this embodiment, the feature unit 22 may specifically be configured to obtain the original acoustic feature of the original audio data according to the original audio data; obtain the extended acoustic feature according to the original acoustic feature; and obtain the extended audio data according to the extended acoustic feature.
Specifically, the feature unit 22 may be configured to obtain the extended acoustic feature according to the original acoustic feature by using the conversion relation between original acoustic features and extended acoustic features.
Specifically, the feature unit 22 may further be configured to obtain at least one piece of sample audio data; obtain, according to each piece of sample audio data, the first audio data and second audio data of that sample, wherein the frequency range of the audio signal corresponding to the first audio data is the first signal frequency range and the frequency range of the audio signal corresponding to the second audio data is the second signal frequency range; obtain the first acoustic feature of each sample according to its first audio data; obtain the second acoustic feature of each sample according to its second audio data; and obtain the conversion relation according to the first and second acoustic features of each sample by using a deep learning algorithm.
It should be noted that the method in the embodiment corresponding to Fig. 1 may be implemented by the audio data processing device provided in this embodiment. For a detailed description, see the related content of the embodiment corresponding to Fig. 1; it is not repeated here.
In this embodiment, the feature unit obtains extended audio data according to the original audio data to be processed obtained by the acquiring unit, the frequency range of the audio signal corresponding to the extended audio data being the second signal frequency range, so that the processing unit can obtain target audio data according to the original audio data and the extended audio data. Because high-band audio data, i.e. the extended audio data, is added to the low-band audio data, i.e. the original audio data, the obtained target audio data no longer contains only low-band audio data but also high-band audio data. In this way, a genuinely high-quality audio file can be provided to the user, and the user can enjoy genuinely high-quality audio.
In addition, the technical solution provided by the present invention is simple to operate and can effectively improve the efficiency of audio data processing.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely schematic; the division of the units is only a logical functional division, and there may be other divisions in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, an audio processing engine, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media capable of storing program code, such as a USB flash drive, a portable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some technical features therein, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. An audio data processing method, characterized by comprising:
obtaining original audio data to be processed, wherein the frequency range of the audio signal corresponding to the original audio data is a first signal frequency range;
obtaining extended audio data according to the original audio data, wherein the frequency range of the audio signal corresponding to the extended audio data is a second signal frequency range, and the second signal frequency range is higher than the first signal frequency range; and
obtaining target audio data according to the original audio data and the extended audio data.
2. The method according to claim 1, characterized in that the first signal frequency range is greater than or equal to 0 and less than or equal to a first frequency threshold, and the second signal frequency range is greater than the first frequency threshold and less than or equal to a second frequency threshold.
3. The method according to claim 1, characterized in that obtaining the extended audio data according to the original audio data comprises:
obtaining an original acoustic feature of the original audio data according to the original audio data;
obtaining an extended acoustic feature according to the original acoustic feature; and
obtaining the extended audio data according to the extended acoustic feature.
4. The method according to claim 3, characterized in that obtaining the extended acoustic feature according to the original acoustic feature comprises:
obtaining the extended acoustic feature according to the original acoustic feature by using a conversion relation between original acoustic features and extended acoustic features.
5. The method according to claim 4, characterized in that, after the extended acoustic feature is obtained according to the original acoustic feature by using the conversion relation between original acoustic features and extended acoustic features, the method further comprises:
obtaining at least one piece of sample audio data;
obtaining, according to each piece of sample audio data in the at least one piece of sample audio data, first audio data and second audio data of that sample, wherein the frequency range of the audio signal corresponding to the first audio data is the first signal frequency range, and the frequency range of the audio signal corresponding to the second audio data is the second signal frequency range;
obtaining a first acoustic feature of each sample according to its first audio data;
obtaining a second acoustic feature of each sample according to its second audio data; and
obtaining the conversion relation according to the first acoustic feature and the second acoustic feature of each sample by using a deep learning algorithm.
6. The method according to any one of claims 3 to 5, characterized in that the acoustic features comprise linear prediction coding (LPC) coefficients, linear prediction cepstrum coefficients (LPCC), Mel-frequency cepstrum coefficients (MFCC), or perceptual linear prediction (PLP) coefficients.
7. An audio data processing device, characterized by comprising:
an acquiring unit, configured to obtain original audio data to be processed, wherein the frequency range of the audio signal corresponding to the original audio data is a first signal frequency range;
a feature unit, configured to obtain extended audio data according to the original audio data, wherein the frequency range of the audio signal corresponding to the extended audio data is a second signal frequency range, and the second signal frequency range is higher than the first signal frequency range; and
a processing unit, configured to obtain target audio data according to the original audio data and the extended audio data.
8. The device according to claim 7, characterized in that the first signal frequency range is greater than or equal to 0 and less than or equal to a first frequency threshold, and the second signal frequency range is greater than the first frequency threshold and less than or equal to a second frequency threshold.
9. The device according to claim 7, characterized in that the feature unit is specifically configured to:
obtain the original acoustic feature of the original audio data according to the original audio data;
obtain the extended acoustic feature according to the original acoustic feature; and
obtain the extended audio data according to the extended acoustic feature.
10. The device according to claim 9, characterized in that the feature unit is specifically configured to:
obtain the extended acoustic feature according to the original acoustic feature by using the conversion relation between original acoustic features and extended acoustic features.
11. The device according to claim 10, characterized in that the feature unit is further configured to:
obtain at least one piece of sample audio data;
obtain, according to each piece of sample audio data in the at least one piece of sample audio data, the first audio data and second audio data of that sample, wherein the frequency range of the audio signal corresponding to the first audio data is the first signal frequency range, and the frequency range of the audio signal corresponding to the second audio data is the second signal frequency range;
obtain the first acoustic feature of each sample according to its first audio data;
obtain the second acoustic feature of each sample according to its second audio data; and
obtain the conversion relation according to the first and second acoustic features of each sample by using a deep learning algorithm.
12. The device according to any one of claims 7 to 11, characterized in that the acoustic features comprise linear prediction coding (LPC) coefficients, linear prediction cepstrum coefficients (LPCC), Mel-frequency cepstrum coefficients (MFCC), or perceptual linear prediction (PLP) coefficients.
CN201510069567.9A 2015-02-10 2015-02-10 Audio data processing method and device Pending CN104715756A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510069567.9A CN104715756A (en) 2015-02-10 2015-02-10 Audio data processing method and device


Publications (1)

Publication Number Publication Date
CN104715756A true CN104715756A (en) 2015-06-17

Family

ID=53415018

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510069567.9A Pending CN104715756A (en) 2015-02-10 2015-02-10 Audio data processing method and device

Country Status (1)

Country Link
CN (1) CN104715756A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020116698A1 (en) * 2000-05-05 2002-08-22 Marc Lurie Method for distributing, integrating, and hosting a software platform
CN101162584A (en) * 2006-09-18 2008-04-16 三星电子株式会社 Method and apparatus to encode and decode audio signal by using bandwidth extension technique
CN101789239A (en) * 2009-01-23 2010-07-28 奥迪康有限公司 Audio processing in a portable listening device
US20110257980A1 (en) * 2010-04-14 2011-10-20 Huawei Technologies Co., Ltd. Bandwidth Extension System and Approach
CN102637436A (en) * 2011-02-09 2012-08-15 索尼公司 Sound signal processing apparatus, sound signal processing method, and program
CN102543089A (en) * 2012-01-17 2012-07-04 大连理工大学 Conversion device for converting narrowband code streams into broadband code streams and conversion method thereof
CN103093757A (en) * 2012-01-17 2013-05-08 大连理工大学 Conversion method for conversion from narrow-band code stream to wide-band code stream

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106057220A (en) * 2016-05-19 2016-10-26 Tcl集团股份有限公司 Audio signal high frequency expansion method and audio frequency player
CN106057220B (en) * 2016-05-19 2020-01-03 Tcl集团股份有限公司 High-frequency extension method of audio signal and audio player
CN109791772A (en) * 2016-09-27 2019-05-21 松下知识产权经营株式会社 Audio-signal processing apparatus, audio signal processing method and control program
CN109791772B (en) * 2016-09-27 2023-07-04 松下知识产权经营株式会社 Sound signal processing device, sound signal processing method, and recording medium
CN111863027A (en) * 2019-04-24 2020-10-30 北京京东尚科信息技术有限公司 Method, device and system for processing audio


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20160321

Address after: 100027 Haidian District, Qinghe Qinghe East Road, No. 23, building two, floor 2108, No., No. 18

Applicant after: BEIJING YINZHIBANG CULTURE TECHNOLOGY Co.,Ltd.

Address before: 100085 Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing

Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20150617

RJ01 Rejection of invention patent application after publication