CN106960672A - Bandwidth extension method and device for stereo audio - Google Patents
- Publication number
- CN106960672A (application CN201710203054.1A)
- Authority
- CN
- China
- Prior art keywords
- sound wave
- sound
- short
- spectrum
- direct sound
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/038—Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
- G10L21/0388—Details of processing therefor
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
- H04S7/303—Tracking of listener position or orientation
Abstract
The invention discloses a bandwidth extension method and device for stereo audio. The method comprises: decomposing a stereo signal into direct sound and diffuse sound; performing bandwidth extension on the diffuse sound according to a preset band extension method; separating the direct sound into multiple point sources at different azimuths and performing bandwidth extension on each point source separately, to obtain multiple bandwidth-extended point sources; re-mixing the bandwidth-extended point sources according to pre-estimated azimuth information, to obtain the bandwidth-extended direct sound; and reconstructing a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound. This technical solution addresses a problem in the prior art, in which signal bandwidth is extended based only on the subjective quality of each channel's reconstructed signal, without considering the correlation of signal energy and phase between the two channels, so that the reconstructed stereo signal seriously impairs the listener's judgment of source position and distance.
Description
Technical field
The present invention relates to the field of network applications, and in particular to a bandwidth extension method and device for stereo audio.
Background
In digital audio signal processing, an audio signal covering the whole 20 Hz to 20 kHz range perceivable by the human ear is generally called full-band audio; such signals are mainly used for high-fidelity reproduction of music. At present, real-time audio communication systems cannot provide sufficient network transmission speed and terminal processing capacity, so the effective bandwidth of the reconstructed signal is inevitably limited and the low-frequency components of the audio signal are quantized and encoded first, which improves the coding efficiency of the audio communication system.
Traditional telephone voice communication systems generally transmit narrowband signals, with frequencies in the range 300 to 3400 Hz and a sampling rate of 8 kHz. Subjective listening tests show that narrowband speech retains 91% syllable intelligibility and 99% sentence intelligibility. Compared with real speech, however, the naturalness and subjective quality of narrowband signals in actual calls are clearly degraded. Because high-frequency components are missing, narrowband speech cannot clearly distinguish unvoiced sounds or plosives, and its ability to convey speaker characteristics is weakened. To overcome these shortcomings, wideband audio has been widely applied in voice communication; its effective bandwidth extends to 50 Hz to 7 kHz, which covers most of the spectrum that characterizes the key properties of speech and gives a sound quality close to AM broadcasting. However, owing to historical, economic and technical constraints, a fairly long transition period is still needed before traditional fixed and mobile communication fully moves from narrowband to wideband audio.
As an effective audio enhancement technique, band extension analyzes the time-frequency characteristics of the original audio signal and, without changing the source coding or network transmission of the narrowband signal, artificially restores at the receiver the high-frequency components discarded by the encoder, thereby improving the perceived quality of the reconstructed audio. For hearing-impaired listeners, band extension can further improve phoneme and semantic discrimination. Over the past ten years or so, many research institutions and researchers have proposed solutions for band extension of monophonic speech signals. These methods usually synthesize the high-frequency components in two parts, spectral envelope extension and spectral detail extension, as shown in Fig. 1. First, time-frequency features are extracted from the narrowband signal according to the principles of human auditory perception. Next, the spectral envelope and energy of the high-frequency components are estimated from side information or from a mapping between low- and high-frequency features described by prior knowledge. At the same time, a suitable spectral patching method is selected to extend the spectral detail. Finally, the extended spectral envelope and spectral detail are combined to reconstruct the high-frequency components of the wideband audio signal.
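The envelope-plus-detail scheme described above can be sketched for a single frame as follows. This is only an illustrative sketch under stated assumptions, not the method of any particular prior-art system: the function name `extend_frame`, the use of a level-normalized copy of the low band as the spectral detail, and the externally supplied `high_band_gains` (standing in for the estimated high-band envelope) are all hypothetical.

```python
import numpy as np

def extend_frame(low_mag, high_band_gains):
    """Sketch of envelope + detail band extension for one frame.

    low_mag:          (n_low,) magnitude spectrum of the narrowband frame.
    high_band_gains:  (n_low,) assumed envelope estimate for the high band;
                      in a real system it would come from a learned mapping.
    Returns a (2 * n_low,) wideband magnitude spectrum.
    """
    low_mag = np.asarray(low_mag, dtype=float)
    # Spectral detail: copy of the low band, normalized to unit mean level.
    detail = low_mag / (np.mean(low_mag) + 1e-12)
    # Shape the copied detail with the estimated high-band envelope.
    high_mag = np.asarray(high_band_gains, dtype=float) * detail
    # The low band is kept unchanged; the synthesized high band is appended.
    return np.concatenate([low_mag, high_mag])
```

The low band passes through untouched, so only the appended bins depend on the envelope estimate.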
For stereo audio, traditional band extension methods mostly reconstruct the high-frequency components of the two channels independently. Such methods extend the signal bandwidth based only on the subjective quality of each channel's reconstructed signal and do not consider the correlation of signal energy and phase between the two channels, so the reconstructed stereo signal seriously impairs the listener's judgment of source position and distance.
Summary of the invention
In view of the above problems, the invention provides a bandwidth extension method and device for stereo audio.
The bandwidth extension method for stereo audio provided by the invention comprises the following steps:
decomposing a stereo signal into direct sound and diffuse sound;
performing bandwidth extension on the diffuse sound according to a preset band extension method;
separating the direct sound into multiple point sources at different azimuths, and performing bandwidth extension on each point source separately, to obtain multiple bandwidth-extended point sources;
re-mixing the bandwidth-extended point sources according to pre-estimated azimuth information, to obtain the bandwidth-extended direct sound;
reconstructing a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.
The invention also provides a bandwidth extension device for stereo audio, comprising: a decomposition module, a diffuse sound extension module, a direct sound separation and extension module, and a reconstruction module;
the decomposition module is configured to decompose a stereo signal into direct sound and diffuse sound;
the diffuse sound extension module is configured to perform bandwidth extension on the diffuse sound according to a preset band extension method;
the direct sound separation and extension module is configured to separate the direct sound into multiple point sources at different azimuths and perform bandwidth extension on each point source separately, to obtain multiple bandwidth-extended point sources;
the reconstruction module is configured to re-mix the bandwidth-extended point sources according to pre-estimated azimuth information, to obtain the bandwidth-extended direct sound, and to reconstruct a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.
The beneficial effects of the invention are as follows:
Embodiments of the invention first use the spectral correlation between channels to decompose the input stereo signal into two components, direct sound and diffuse sound. The diffuse component is extended directly with a traditional band extension method. The direct sound is separated into multiple point sources at different azimuths based on the sparsity of different sources in the time-frequency domain, and each point source is bandwidth-extended separately. Finally, the extended point sources are re-mixed according to their azimuth information in the original stereo signal and combined with the bandwidth-extended diffuse component to reconstruct a wideband stereo audio signal. The invention thus addresses the prior-art problem that signal bandwidth is extended based only on the subjective quality of each channel's reconstructed signal, without considering the correlation of signal energy and phase between the two channels, so that the reconstructed stereo signal seriously impairs the listener's judgment of source position and distance.
Brief description of the drawings
Fig. 1 is a basic flow chart of a prior-art band extension method for monophonic speech signals;
Fig. 2 is a flow chart of the bandwidth extension method for stereo audio according to the method embodiment of the invention;
Fig. 3 is a structural diagram of the bandwidth extension device for stereo audio according to the device embodiment of the invention;
Fig. 4 is a block diagram of the bandwidth extension method for stereo audio in Example 1 of the invention;
Fig. 5 is a block diagram of the state-space model based on deep neural networks in Example 1 of the invention.
Detailed description of the embodiments
Exemplary embodiments of the disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope fully conveyed to those skilled in the art.
In the prior art, signal bandwidth is extended based only on the subjective quality of each channel's reconstructed signal, without considering the correlation of signal energy and phase between the two channels, so that the reconstructed stereo signal seriously impairs the listener's judgment of source position and distance. To address this problem, the invention provides a bandwidth extension method and device for stereo audio, described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it.
According to a method embodiment of the invention, a bandwidth extension method for stereo audio is provided. Fig. 2 is a flow chart of the bandwidth extension method for stereo audio according to the method embodiment of the invention. As shown in Fig. 2, the method comprises the following processing:
Step 201: decompose the stereo signal into direct sound and diffuse sound.
Specifically, step 201 comprises the following steps:
splitting the stereo signal into a left channel and a right channel;
applying a time-frequency transform to the framed left and right channels respectively, to obtain the left-channel and right-channel short-time spectral components of the stereo signal;
from the left-channel and right-channel short-time spectral components, obtaining the sum P_sum of the left and right channel energy spectra, the difference P_diff of the left and right channel energy spectra, and the cross-correlation P_cc between the left and right channel spectra;
obtaining a direct sound matrix from P_sum, P_diff and P_cc by the least squares method;
separating the direct sound from the stereo signal using the direct sound matrix;
subtracting the direct sound from the stereo signal to obtain the diffuse sound.
More specifically, obtaining the sum P_sum, the difference P_diff and the cross-correlation P_cc from the left-channel and right-channel short-time spectral components comprises:
computing the sum of the left and right channel energy spectra from the left-channel short-time spectral component S_L(t,f) and the right-channel short-time spectral component S_R(t,f) according to P_sum = |S_L(t,f)|^2 + |S_R(t,f)|^2;
computing the difference of the left and right channel energy spectra from S_L(t,f) and S_R(t,f) according to P_diff = |S_L(t,f)|^2 - |S_R(t,f)|^2;
computing the cross-correlation between the left and right channel spectra from S_L(t,f) and S_R(t,f) according to P_cc = R{S_L(t,f) S_R*(t,f)}, where R{·} takes the real part and * denotes complex conjugation.
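The three quantities defined above can be computed per time-frequency point as in the following minimal sketch. The function name `energy_spectra` and the array layout (one complex array per channel, any shape) are assumptions made for illustration; the formulas themselves are the ones given in the text.

```python
import numpy as np

def energy_spectra(S_L, S_R):
    """Return (P_sum, P_diff, P_cc) for complex short-time spectra of equal shape."""
    p_l = np.abs(S_L) ** 2               # left-channel energy spectrum
    p_r = np.abs(S_R) ** 2               # right-channel energy spectrum
    P_sum = p_l + p_r                    # P_sum  = |S_L|^2 + |S_R|^2
    P_diff = p_l - p_r                   # P_diff = |S_L|^2 - |S_R|^2
    P_cc = np.real(S_L * np.conj(S_R))   # P_cc   = R{ S_L * conj(S_R) }
    return P_sum, P_diff, P_cc
```

For identical channels P_diff is zero and P_cc equals the energy of either channel, which is the fully correlated case that the direct sound is expected to approach.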
More specifically, separating the direct sound from the stereo signal using the direct sound matrix comprises:
using the direct sound matrix M_D(t,f) to separate the direct sound S'(t,f) from the stereo signal S(t,f) according to formula 1:
S'(t,f) = M_D(t,f) [S_L(t,f) S_R(t,f)]^T    (formula 1)
Step 202: perform bandwidth extension on the diffuse sound according to a preset band extension method.
Specifically, step 202 applies a traditional band extension method directly to the diffuse sound; the details are not repeated here.
Step 203: separate the direct sound into multiple point sources at different azimuths, and perform bandwidth extension on each point source separately, to obtain multiple bandwidth-extended point sources.
Specifically, separating the direct sound into multiple point sources at different azimuths in step 203 comprises:
computing the direction information of the direct sound at each time-frequency point, and clustering the direction information of all time-frequency points to obtain cluster centers, each of which corresponds to the direction information of one point source;
obtaining a masking matrix from the direction information of the direct sound at a given time-frequency point and the cluster centers of the direction information;
separating the direct sound using the masking matrix, to obtain multiple point sources at different azimuths.
Specifically, performing bandwidth extension on each point source separately comprises:
feeding each point source into a preset state-space model that fits the mapping between the short-time spectrum of the narrowband signal and the short-time spectrum of the wideband signal; estimating the spectral envelope of the high-frequency components of the wideband short-time spectrum according to a preset error criterion; and combining the low-frequency spectral envelope with spectral detail extended by a suitable spectral patching method, to obtain multiple bandwidth-extended point sources.
More specifically, fitting the mapping between the short-time spectrum of the narrowband signal and the short-time spectrum of the wideband signal in the state-space model, and estimating the spectral envelope of the high-frequency components according to the preset error criterion, comprises:
obtaining the hidden state vector of the state-space model from the hidden state vector at the previous time instant and the short-time spectrum of the narrowband signal at the previous time instant;
obtaining the short-time spectrum of the wideband signal from the hidden state vector of the state-space model and the short-time spectrum of the narrowband signal at the current time instant.
Step 204: re-mix the bandwidth-extended point sources according to the pre-estimated azimuth information to obtain the bandwidth-extended direct sound, and reconstruct a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.
Specifically, the pre-estimated azimuth information is estimated from the cluster centers of the direction information; the estimation method is a conventional technique in the art and is not described further here.
Specifically, formula 2 is used to reconstruct the wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound;
[Formula 2 is missing from the source text]
In formula 2, the three terms represent, respectively, the short-time spectrum of the stereo signal after bandwidth extension, the short-time spectrum of the direct sound after bandwidth extension, and the short-time spectrum of the diffuse sound after bandwidth extension.
Corresponding to the method embodiment of the invention, a bandwidth extension device for stereo audio is provided. Fig. 3 is a structural diagram of the bandwidth extension device for stereo audio according to the device embodiment of the invention. As shown in Fig. 3, the device comprises: a decomposition module 30, a diffuse sound extension module 32, a direct sound separation and extension module 34, and a reconstruction module 36. Each module of the embodiment is described in detail below.
Specifically, the decomposition module 30 is configured to decompose a stereo signal into direct sound and diffuse sound;
the diffuse sound extension module 32 is configured to perform bandwidth extension on the diffuse sound according to a preset band extension method;
the direct sound separation and extension module 34 is configured to separate the direct sound into multiple point sources at different azimuths and perform bandwidth extension on each point source separately, to obtain multiple bandwidth-extended point sources;
the reconstruction module 36 is configured to re-mix the bandwidth-extended point sources according to the pre-estimated azimuth information, to obtain the bandwidth-extended direct sound, and to reconstruct a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.
The decomposition module 30 is specifically configured to:
split the stereo signal into a left channel and a right channel;
apply a time-frequency transform to the framed left and right channels respectively, to obtain the left-channel and right-channel short-time spectral components of the stereo signal;
from the left-channel and right-channel short-time spectral components, obtain the sum P_sum of the left and right channel energy spectra, the difference P_diff of the left and right channel energy spectra, and the cross-correlation P_cc between the left and right channel spectra;
obtain a direct sound matrix from P_sum, P_diff and P_cc by the least squares method;
separate the direct sound from the stereo signal using the direct sound matrix;
subtract the direct sound from the stereo signal to obtain the diffuse sound.
The direct sound separation and extension module 34 is specifically configured to:
compute the direction information of the direct sound at each time-frequency point, and cluster the direction information of all time-frequency points to obtain cluster centers, each of which corresponds to the direction information of one point source;
obtain a masking matrix from the direction information of the direct sound at a given time-frequency point and the cluster centers of the direction information;
separate the direct sound using the masking matrix, to obtain multiple point sources at different azimuths.
The direct sound separation and extension module 34 is further specifically configured to:
feed each point source into a preset state-space model that fits the mapping between the short-time spectrum of the narrowband signal and the short-time spectrum of the wideband signal; estimate the spectral envelope of the high-frequency components of the wideband short-time spectrum according to a preset error criterion; and combine the low-frequency spectral envelope with spectral detail extended by a suitable spectral patching method, finally obtaining the bandwidth-extended direct sound.
To explain the technical solution of the invention in more detail, Example 1 is given. Fig. 4 is a block diagram of the bandwidth extension method for stereo audio in Example 1. As shown in Fig. 4, a bandwidth extension method for stereo audio comprises the following steps:
1. Direct/diffuse sound separation
The proposed stereo bandwidth extension system uses the discrete Fourier transform or a quadrature mirror filter bank to transform the framed left and right channel audio signals into the frequency domain, and divides the spectrum into multiple uniform subbands or critical bands according to the principles of human auditory perception. The short-time spectrum S(t,f) of the input stereo signal can then be expressed as
S(t,f) = [S_L(t,f) S_R(t,f)]^T
where t and f denote the time frame and subband index of the signal, and S_L(t,f) and S_R(t,f) denote the left-channel and right-channel short-time spectral components of the stereo signal, respectively.
To separate direct and diffuse sound effectively, the system also computes the sum P_sum and the difference P_diff of the left and right channel energy spectra, and the cross-correlation P_cc of the two channels:
P_sum = |S_L(t,f)|^2 + |S_R(t,f)|^2
P_diff = |S_L(t,f)|^2 - |S_R(t,f)|^2
P_cc = R{S_L(t,f) S_R*(t,f)}
where R{·} takes the real part. To improve the stability of the separation algorithm, the computed P_sum, P_diff and P_cc are each smoothed over time.
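The text does not say which smoother is used. One common choice is first-order recursive averaging along the frame axis, sketched below as an assumption; the function name `smooth_over_time` and the forgetting factor `alpha` are hypothetical and not taken from the patent.

```python
import numpy as np

def smooth_over_time(P, alpha=0.8):
    """First-order recursive smoothing along the time axis (axis 0).

    P:     (frames, bands) array, e.g. P_sum, P_diff or P_cc.
    alpha: assumed forgetting factor in [0, 1); larger means smoother.
    """
    P = np.asarray(P, dtype=float)
    out = np.empty_like(P)
    out[0] = P[0]                        # first frame has no history
    for t in range(1, P.shape[0]):
        # Blend the previous smoothed frame with the current raw frame.
        out[t] = alpha * out[t - 1] + (1.0 - alpha) * P[t]
    return out
```

A larger `alpha` gives steadier estimates at the cost of slower tracking when sources move or change level.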
The direct sound components in the left and right stereo channels are highly correlated and can be represented as point source signals arriving from particular directions. Accordingly, the proposed system uses a direct sound matrix to separate the direct component S'(t,f) directly from the original two-channel stereo signal S(t,f), as follows:
S'(t,f) = [S_L'(t,f) S_R'(t,f)]^T = M_D(t,f) [S_L(t,f) S_R(t,f)]^T = M_D(t,f) S(t,f)
where S_L'(t,f) and S_R'(t,f) denote the left-channel and right-channel short-time spectral components of the direct sound, and M_D(t,f) is the direct sound matrix. As described in [M. Vinton, D. McGrath, C. Robinson, P. Brown, "Next generation surround decoding and upmixing for consumer and professional applications", AES 57th International Conference, USA, 2015], the direct sound matrix M_D(t,f) can be obtained by the least squares method, so that the expected squared error between the estimated direct component and the true component is minimized.
[The least squares criterion and the resulting expression for M_D(t,f) are missing from the source text]
The diffuse component S''(t,f) can then be expressed as the difference between the original stereo signal and the direct component:
S''(t,f) = S(t,f) - S'(t,f)
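Given a direct sound matrix per time-frequency point, the separation above reduces to a 2x2 matrix product followed by a subtraction. The sketch below takes M_D as an input, because its closed-form expression is not available in this text; the function name `split_direct_diffuse` and the array layout are assumptions.

```python
import numpy as np

def split_direct_diffuse(S, M_D):
    """Apply S' = M_D S and S'' = S - S' at every time-frequency point.

    S:   (frames, bands, 2) complex stereo spectrum, last axis = [S_L, S_R].
    M_D: (frames, bands, 2, 2) direct sound matrix (computed elsewhere).
    Returns (S_direct, S_diffuse), each of shape (frames, bands, 2).
    """
    # Per-point matrix-vector product over the last axes.
    S_direct = np.einsum("tfij,tfj->tfi", M_D, S)
    # The diffuse component is the residual of the original signal.
    S_diffuse = S - S_direct
    return S_direct, S_diffuse
```

By construction the two outputs always sum back to the input, so the decomposition itself loses no signal energy.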
2. Source separation of the direct component
From S'(t,f) = [S_L'(t,f) S_R'(t,f)]^T, the direction information θ(t,f) of the direct sound S'(t,f) at a given time-frequency point is obtained using formula 3; the direction information θ(t,f) of the direct sound at that time-frequency point is the same as the direction information θ_i of the corresponding point source;
[Formula 3 is missing from the source text]
the direction information θ(t,f) of all time-frequency points is clustered to obtain the cluster centers C_i, i = 1, 2, ..., N; these cluster centers correspond to the direction information θ_1, θ_2, θ_3, ..., θ_N of the point sources S_1(t,f), S_2(t,f), S_3(t,f), ..., S_N(t,f), respectively;
the masking matrix m_i(t,f) is obtained from the direction information θ(t,f) of the direct sound at a given time-frequency point and the cluster centers C_i;
using the masking matrix m_i(t,f), the direct sound S'(t,f) is separated according to formula 4 to obtain the direct point sources.
[Formula 4 is missing from the source text]
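The cluster-then-mask procedure above can be sketched as follows. Because the formulas for the direction estimate and the mask are not available in this text, the sketch substitutes its own choices, all of which are assumptions: the direction is taken as a panning angle `arctan2(|S_R'|, |S_L'|)`, clustering is a plain 1-D k-means, and the masks are binary (each time-frequency point is assigned to its nearest cluster center). All function names are hypothetical.

```python
import numpy as np

def pan_angle(S_direct, eps=1e-12):
    # Stand-in direction estimate (assumption): 0 = hard left, pi/2 = hard right.
    return np.arctan2(np.abs(S_direct[..., 1]), np.abs(S_direct[..., 0]) + eps)

def kmeans_1d(values, n_sources, n_iter=50):
    """Plain 1-D k-means; centres start at evenly spaced quantiles."""
    v = np.ravel(values)
    centres = np.quantile(v, (np.arange(n_sources) + 0.5) / n_sources)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(v[:, None] - centres[None, :]), axis=1)
        for i in range(n_sources):
            if np.any(labels == i):          # keep the old centre if a cluster is empty
                centres[i] = v[labels == i].mean()
    return centres

def separate_point_sources(S_direct, n_sources):
    """S_direct: (frames, bands, 2) complex direct-sound spectrum.
    Returns (centres, masks, sources); each source keeps the stereo layout."""
    theta = pan_angle(S_direct)
    centres = kmeans_1d(theta, n_sources)
    nearest = np.argmin(np.abs(theta[..., None] - centres), axis=-1)
    masks = [(nearest == i).astype(float) for i in range(n_sources)]
    sources = [m[..., None] * S_direct for m in masks]
    return centres, masks, sources
```

Binary masks rely on the time-frequency sparsity assumption mentioned in the text, namely that at most one point source dominates each time-frequency point; soft masks would be an alternative when sources overlap.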
3. Bandwidth extension
Using the method described above, the diffuse component S''(t,f) and the direct component S'(t,f) are separated from the stereo signal, and the direct component S'(t,f) is further separated into multiple point sources using time-frequency sparsity. The diffuse sound S''(t,f) and each direct point source can then be bandwidth-extended independently with a monophonic band extension method.
Here a state-space model is used to fit the mapping between narrowband and wideband spectral parameters directly; in the actual extension, the spectral envelope of the high-frequency components is estimated according to a given error criterion:
S_Y(t,f) = F[S_X(t,f)]
where S_X(t,f) and S_Y(t,f) denote the short-time spectra of the narrowband and wideband signals, respectively, and F[·] denotes the mapping (or estimation) function.
In the state-space model, the mapping function F[·] is described by two processes, a state evolution function F_state[·] and an observation function F_obs[·], as follows:
S_hidden(t,f) = F_state[S_hidden(t-1,f), S_X(t-1,f), N_1(t,f)]
S_Y(t,f) = F_obs[S_hidden(t,f), S_X(t,f), N_2(t,f)]
where S_hidden(t,f) is the hidden state vector of the model, and N_1(t,f) and N_2(t,f) describe the errors of the state evolution function F_state and the observation function F_obs, respectively. In this model, the hidden state vector S_hidden(t,f) at the current time is determined by the hidden state vector S_hidden(t-1,f) and the narrowband short-time spectrum S_X(t-1,f) at the previous time, while the wideband short-time spectrum S_Y(t,f) at the current time is determined by the current hidden state vector S_hidden(t,f) and the current narrowband short-time spectrum S_X(t,f). The recursive hidden-state structure of the state-space model allows the complex mapping between narrowband and wideband spectral parameters to be fitted more accurately. The model can be implemented with a generalized Kalman filtering method, or with two separate deep neural networks. The basic principle of the state-space model based on deep neural networks is shown in Fig. 5. Here the state evolution function F_state and the observation function F_obs can be implemented with various feed-forward and recurrent deep neural networks, such as stacked autoencoders, multilayer perceptrons, time-delay recurrent networks and long short-term memory networks.
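The recursion defined by the state evolution and observation equations can be sketched as below. This shows only the data flow, under assumptions: F_state is reduced to a single tanh layer, F_obs to a linear map, the error terms N_1 and N_2 are omitted, and the weight matrices `W_h`, `W_x`, `V_h`, `V_x` are hypothetical parameters that would have to be trained; none of these choices come from the patent.

```python
import numpy as np

def run_state_space(S_X, W_h, W_x, V_h, V_x, h0=None):
    """Run the hidden-state recursion over a sequence of narrowband frames.

    S_X: (frames, n_narrow) narrowband features, e.g. log-magnitude spectra.
    State evolution:  h[t]   = tanh(W_h @ h[t-1] + W_x @ S_X[t-1])
    Observation:      S_Y[t] = V_h @ h[t] + V_x @ S_X[t]
    Returns S_Y with shape (frames, n_wide).
    """
    n_frames = S_X.shape[0]
    h = np.zeros(W_h.shape[0]) if h0 is None else h0
    x_prev = np.zeros(S_X.shape[1])      # no narrowband frame exists before t = 0
    S_Y = np.empty((n_frames, V_h.shape[0]))
    for t in range(n_frames):
        # Hidden state depends on the previous state and previous narrowband frame.
        h = np.tanh(W_h @ h + W_x @ x_prev)
        # Wideband estimate depends on the current state and current narrowband frame.
        S_Y[t] = V_h @ h + V_x @ S_X[t]
        x_prev = S_X[t]
    return S_Y
```

Replacing the two single-layer maps with deeper networks keeps the same loop structure; only the bodies of the state and observation steps change.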
4. Stereo signal synthesis
Using a monophonic band extension method, the diffuse sound S''(t,f) and each direct point source i = 1, 2, ..., N are extended separately to obtain the corresponding wideband spectra S_Y(t,f). Next, the direction information θ_i of each point source is used to reproduce the wideband direct sound from the extended wideband spectra of the point sources.
[The re-mixing formula is missing from the source text]
Finally, the wideband stereo signal is reproduced by combining the extended wideband direct sound with the extended wideband diffuse sound.
[The synthesis formula is missing from the source text]
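The synthesis step can be sketched as follows. The re-mixing rule is not available in this text, so the sketch assumes constant-power panning: each extended point source is treated as a mono wideband spectrum and placed with gains (cos θ_i, sin θ_i), where θ_i = 0 means hard left and π/2 means hard right. The function name `remix_wideband` and this angle convention are hypothetical.

```python
import numpy as np

def remix_wideband(point_sources, thetas, diffuse_wide):
    """Re-pan extended point sources and add the extended diffuse sound.

    point_sources: list of (frames, bands) complex wideband spectra.
    thetas:        pan angle per source in radians (assumed convention:
                   0 = left, pi/2 = right).
    diffuse_wide:  (frames, bands, 2) extended diffuse spectrum.
    Returns (direct_wide, stereo_wide), both of shape (frames, bands, 2).
    """
    direct = np.zeros_like(diffuse_wide)
    for Y_i, theta_i in zip(point_sources, thetas):
        # Constant-power left/right gains for this source (assumption).
        gains = np.array([np.cos(theta_i), np.sin(theta_i)])
        direct = direct + Y_i[..., None] * gains
    # Wideband stereo = wideband direct sound + wideband diffuse sound.
    return direct, direct + diffuse_wide
```

Because every source is re-panned with its own angle, the low-band and extended high-band content of a source share one direction, which is the property the method aims to preserve.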
Embodiments of the invention first use the spectral correlation between channels to decompose the input stereo signal into two components, direct sound and diffuse sound. The diffuse component is extended directly with a traditional band extension method. The direct sound is separated into multiple point sources at different azimuths based on the sparsity of different sources in the time-frequency domain, and each point source is bandwidth-extended separately. Finally, the extended point sources are re-mixed according to their azimuth information in the original stereo signal and combined with the bandwidth-extended diffuse component to reconstruct a wideband stereo audio signal. The invention thus addresses the prior-art problem that signal bandwidth is extended based only on the subjective quality of each channel's reconstructed signal, without considering the correlation of signal energy and phase between the two channels, so that the reconstructed stereo signal seriously impairs the listener's judgment of source position and distance.
The above are only embodiments of the invention and are not intended to limit it; for those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the scope of the claims of the invention.
Claims (10)
1. A bandwidth extension method for stereo audio, characterized by comprising:
decomposing a stereo signal into direct sound and diffuse sound;
performing bandwidth extension on the diffuse sound according to a preset bandwidth extension method;
separating the direct sound into a plurality of point sources at different azimuths, and performing bandwidth extension on each of the point sources to obtain a plurality of bandwidth-extended point sources;
remixing the bandwidth-extended point sources according to pre-estimated azimuth information to obtain bandwidth-extended direct sound, and reconstructing a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.
2. The bandwidth extension method for stereo audio according to claim 1, characterized in that decomposing the stereo signal into direct sound and diffuse sound comprises:
splitting the stereo signal into a left channel and a right channel;
performing a time-frequency transform on the framed left channel and the framed right channel respectively, to obtain the left-channel short-time spectral component and the right-channel short-time spectral component of the stereo signal;
obtaining, from the left-channel and right-channel short-time spectral components, the sum P_sum of the left and right channel energy spectra, the difference P_diff of the left and right channel energy spectra, and the cross-correlation P_cc between the left and right channel energy spectra;
obtaining a direct sound matrix from P_sum, P_diff and P_cc by the least squares method;
separating the direct sound from the stereo signal using the direct sound matrix;
removing the direct sound from the stereo signal to obtain the diffuse sound.
3. The bandwidth extension method for stereo audio according to claim 1, characterized in that separating the direct sound into a plurality of point sources at different azimuths comprises:
calculating direction information of the direct sound at each time-frequency point, and clustering the direction information of all time-frequency points to obtain cluster centers of the direction information, each cluster center corresponding to the direction information of one point source;
obtaining a masking matrix from the direction information of the direct sound at a given time-frequency point and the cluster centers of the direction information;
separating the direct sound using the masking matrix to obtain the plurality of point sources at different azimuths.
4. The bandwidth extension method for stereo audio according to claim 1, characterized in that performing bandwidth extension on each of the point sources comprises:
inputting each point source into a preset state-space model that fits the mapping between the short-time spectrum of a narrowband signal and the short-time spectrum of a wideband signal, estimating the spectral envelope of the high-frequency component of the wideband short-time spectrum according to a preset error criterion, and combining the low-frequency spectral envelope with the spectral detail extended by a suitable spectral patching method, to obtain the plurality of bandwidth-extended point sources.
5. The bandwidth extension method for stereo audio according to claim 4, characterized in that fitting, in the state-space model, the mapping between the short-time spectrum of the narrowband signal and the short-time spectrum of the wideband signal, and estimating the spectral envelope of the high-frequency component according to the preset error criterion, comprises:
obtaining the hidden state vector of the preset state-space model from the hidden state vector at the previous time and the short-time spectrum of the narrowband signal at the previous time;
obtaining the short-time spectrum of the wideband signal from the hidden state vector of the preset state-space model and the short-time spectrum of the narrowband signal at the current time.
6. A bandwidth extension device for stereo audio, characterized by comprising: a decomposition module, a diffuse sound extension module, a direct sound separation and extension module, and a reconstruction module;
the decomposition module being configured to decompose a stereo signal into direct sound and diffuse sound;
the diffuse sound extension module being configured to perform bandwidth extension on the diffuse sound according to a preset bandwidth extension method;
the direct sound separation and extension module being configured to separate the direct sound into a plurality of point sources at different azimuths, and to perform bandwidth extension on each of the point sources to obtain a plurality of bandwidth-extended point sources;
the reconstruction module being configured to remix the bandwidth-extended point sources according to pre-estimated azimuth information to obtain bandwidth-extended direct sound, and to reconstruct a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.
7. The bandwidth extension device for stereo audio according to claim 6, characterized in that the decomposition module is specifically configured to:
split the stereo signal into a left channel and a right channel;
perform a time-frequency transform on the framed left channel and the framed right channel respectively, to obtain the left-channel short-time spectral component and the right-channel short-time spectral component of the stereo signal;
obtain, from the left-channel and right-channel short-time spectral components, the sum P_sum of the left and right channel energy spectra, the difference P_diff of the left and right channel energy spectra, and the cross-correlation P_cc between the left and right channel energy spectra;
obtain a direct sound matrix from P_sum, P_diff and P_cc by the least squares method;
separate the direct sound from the stereo signal using the direct sound matrix;
remove the direct sound from the stereo signal to obtain the diffuse sound.
8. The bandwidth extension device for stereo audio according to claim 6, characterized in that the direct sound separation and extension module is specifically configured to:
calculate direction information of the direct sound at each time-frequency point, and cluster the direction information of all time-frequency points to obtain cluster centers of the direction information, each cluster center corresponding to the direction information of one point source;
obtain a masking matrix from the direction information of the direct sound at a given time-frequency point and the cluster centers of the direction information;
separate the direct sound using the masking matrix to obtain the plurality of point sources at different azimuths.
9. The bandwidth extension device for stereo audio according to claim 6, characterized in that the direct sound separation and extension module is specifically configured to:
input each point source into a preset state-space model that fits the mapping between the short-time spectrum of a narrowband signal and the short-time spectrum of a wideband signal, estimate the spectral envelope of the high-frequency component of the wideband short-time spectrum according to a preset error criterion, and combine the low-frequency spectral envelope with the spectral detail extended by a suitable spectral patching method, to obtain the bandwidth-extended direct sound.
10. The bandwidth extension device for stereo audio according to claim 9, characterized in that the direct sound separation and extension module is specifically configured to:
obtain the hidden state vector of the preset state-space model from the hidden state vector at the previous time and the short-time spectrum of the narrowband signal at the previous time;
obtain the short-time spectrum of the wideband signal from the hidden state vector of the preset state-space model and the short-time spectrum of the narrowband signal at the current time.
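The two-step recursion recited in claims 5 and 10 can be sketched as follows. The claims do not state the functional form of the state-space model, so this sketch assumes a linear model, h_t = A h_{t-1} + B x_{t-1} and y_t = C h_t + D x_t, with a zero initial state and a zero "previous" narrowband frame at t = 0. The matrices A, B, C, D and the function name are hypothetical; in practice they would be fitted to training data under the preset error criterion, which is not shown here.

```python
import numpy as np

def run_state_space(narrow_frames, A, B, C, D, h0=None):
    """Map narrowband short-time spectra to wideband ones with an assumed linear model.

    narrow_frames: array of shape (T, n_narrow), one spectral feature vector per frame.
    A: (n_state, n_state), B: (n_state, n_narrow),
    C: (n_wide, n_state),  D: (n_wide, n_narrow).
    Returns an array of shape (T, n_wide).
    """
    narrow_frames = np.asarray(narrow_frames, dtype=float)
    A, B, C, D = (np.asarray(m, dtype=float) for m in (A, B, C, D))
    h = np.zeros(A.shape[0]) if h0 is None else np.asarray(h0, dtype=float)
    x_prev = np.zeros(narrow_frames.shape[1])  # assumed zero frame before t = 0
    wide = np.empty((narrow_frames.shape[0], C.shape[0]))
    for t, x_t in enumerate(narrow_frames):
        # Hidden state from the previous hidden state and previous narrowband frame.
        h = A @ h + B @ x_prev
        # Wideband frame from the hidden state and the current narrowband frame.
        wide[t] = C @ h + D @ x_t
        x_prev = x_t
    return wide
```

The hidden state carries information from earlier frames into the current estimate, which is what distinguishes this mapping from a frame-by-frame (memoryless) narrowband-to-wideband regression.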
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710203054.1A CN106960672B (en) | 2017-03-30 | 2017-03-30 | Bandwidth extension method and device for stereo audio |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106960672A (en) | 2017-07-18 |
CN106960672B (en) | 2020-08-21 |
Family ID: 59470575
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710203054.1A Expired - Fee Related CN106960672B (en) | 2017-03-30 | 2017-03-30 | Bandwidth extension method and device for stereo audio |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106960672B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108152788A (en) * | 2017-12-22 | 2018-06-12 | 西安Tcl软件开发有限公司 | Sound-source follow-up method, sound-source follow-up equipment and computer readable storage medium |
WO2019085914A1 (en) * | 2017-10-30 | 2019-05-09 | 捷开通讯(深圳)有限公司 | Terminal, voice command optimization method therefor and storage apparatus |
CN109975762A (en) * | 2017-12-28 | 2019-07-05 | 中国科学院声学研究所 | A kind of underwater sound source localization method |
CN110751956A (en) * | 2019-09-17 | 2020-02-04 | 北京时代拓灵科技有限公司 | Immersive audio rendering method and system |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5222059A (en) * | 1988-01-06 | 1993-06-22 | Lucasfilm Ltd. | Surround-sound system with motion picture soundtrack timbre correction, surround sound channel timbre correction, defined loudspeaker directionality, and reduced comb-filter effects |
CN101518083A (en) * | 2006-09-22 | 2009-08-26 | 三星电子株式会社 | Method, medium, and system encoding and/or decoding audio signals by using bandwidth extension and stereo coding |
CN102572676A (en) * | 2012-01-16 | 2012-07-11 | 华南理工大学 | Real-time rendering method for virtual auditory environment |
CN102859590A (en) * | 2010-02-24 | 2013-01-02 | 弗劳恩霍夫应用研究促进协会 | Apparatus for generating an enhanced downmix signal, method for generating an enhanced downmix signal and computer program |
EP2645748A1 (en) * | 2012-03-28 | 2013-10-02 | Thomson Licensing | Method and apparatus for decoding stereo loudspeaker signals from a higher-order Ambisonics audio signal |
EP2782094A1 (en) * | 2013-03-22 | 2014-09-24 | Thomson Licensing | Method and apparatus for enhancing directivity of a 1st order Ambisonics signal |
EP2884491A1 (en) * | 2013-12-11 | 2015-06-17 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Extraction of reverberant sound using microphone arrays |
CN104781880A (en) * | 2012-09-03 | 2015-07-15 | 弗兰霍菲尔运输应用研究公司 | Apparatus and method for providing informed multichannel speech presence probability estimation |
CN104919822A (en) * | 2012-11-15 | 2015-09-16 | 弗兰霍菲尔运输应用研究公司 | Segment-wise adjustment of spatial audio signal to different playback loudspeaker setup |
CN106531179A (en) * | 2015-09-10 | 2017-03-22 | 中国科学院声学研究所 | Multi-channel speech enhancement method based on semantic prior selective attention |
Non-Patent Citations (3)
Title |
---|
DANIEL ET AL.: "Evolving views on HOA: From technological to pragmatic concerns", 《AMBISONICS SYMPOSIUM》 * |
SUGAR ET AL.: "Radio propagation by reflection from meteor trails", 《PROCEEDINGS OF THE IEEE》 * |
许春冬 等: "两扬声器配置下的串声消除系统参数优化设置", 《计算机应用》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019085914A1 (en) * | 2017-10-30 | 2019-05-09 | 捷开通讯(深圳)有限公司 | Terminal, voice command optimization method therefor and storage apparatus |
CN108152788A (en) * | 2017-12-22 | 2018-06-12 | 西安Tcl软件开发有限公司 | Sound-source follow-up method, sound-source follow-up equipment and computer readable storage medium |
CN109975762A (en) * | 2017-12-28 | 2019-07-05 | 中国科学院声学研究所 | A kind of underwater sound source localization method |
CN109975762B (en) * | 2017-12-28 | 2021-05-18 | 中国科学院声学研究所 | Underwater sound source positioning method |
CN110751956A (en) * | 2019-09-17 | 2020-02-04 | 北京时代拓灵科技有限公司 | Immersive audio rendering method and system |
Also Published As
Publication number | Publication date |
---|---|
CN106960672B (en) | 2020-08-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101390443B (en) | Audio encoding and decoding | |
CN110085245B (en) | Voice definition enhancing method based on acoustic feature conversion | |
RU2409911C2 (en) | Decoding binaural audio signals | |
EP1783745B1 (en) | Multichannel signal decoding | |
EP4011099A1 (en) | System and method for assisting selective hearing | |
JP2956548B2 (en) | Voice band expansion device | |
CN106960672A (en) | The bandwidth expanding method and device of a kind of stereo audio | |
CN102157156B (en) | Single-channel voice enhancement method and system | |
CN106205623B (en) | A kind of sound converting method and device | |
JPH10509256A (en) | Audio signal conversion method using pitch controller | |
EP2559026A1 (en) | Audio communication device, method for outputting an audio signal, and communication system | |
CN107564538A (en) | The definition enhancing method and system of a kind of real-time speech communicating | |
JPWO2006080358A1 (en) | Speech coding apparatus and speech coding method | |
JP4927264B2 (en) | Method for encoding an audio signal | |
Dadvar et al. | Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target | |
CN113593601A (en) | Audio-visual multi-modal voice separation method based on deep learning | |
CN101715643B (en) | Multi-point connection device, signal analysis and device, method, and program | |
JPH0946233A (en) | Sound encoding method/device and sound decoding method/ device | |
Saeki et al. | Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU. | |
EP2489036B1 (en) | Method, apparatus and computer program for processing multi-channel audio signals | |
CN106034274A (en) | 3D sound device based on sound field wave synthesis and synthetic method | |
Gil-Pita et al. | Enhancing the energy efficiency of wireless-communicated binaural hearing aids for speech separation driven by soft-computing algorithms | |
Estreder et al. | On perceptual audio equalization for multiple users in presence of ambient noise | |
CN116110424A (en) | Voice bandwidth expansion method and related device | |
Koduri | Discrete cosine transform-based data hiding for speech bandwidth extension |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20200821; Termination date: 20210330 |