CN106960672A - Bandwidth extension method and device for stereo audio - Google Patents

Bandwidth extension method and device for stereo audio


Publication number
CN106960672A
Authority
CN
China
Prior art keywords
sound wave
sound
short
spectrum
direct sound
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710203054.1A
Other languages
Chinese (zh)
Other versions
CN106960672B (en)
Inventor
高昕
颜永红
邹潇湘
白海钏
舒敏
云晓春
王锟
张震
计哲
董琳
金暐
王中华
李海灵
李佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
National Computer Network and Information Security Management Center
Original Assignee
Institute of Acoustics CAS
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS, National Computer Network and Information Security Management Center filed Critical Institute of Acoustics CAS
Priority to CN201710203054.1A priority Critical patent/CN106960672B/en
Publication of CN106960672A publication Critical patent/CN106960672A/en
Application granted granted Critical
Publication of CN106960672B publication Critical patent/CN106960672B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038 Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
    • G10L21/0388 Details of processing therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303 Tracking of listener position or orientation

Abstract

The invention discloses a bandwidth extension method and device for stereo audio. The method comprises: decomposing a stereo signal into direct sound and diffuse sound; performing bandwidth extension on the diffuse sound according to a preset band extension method; separating the direct sound into multiple point sources at different azimuths and performing bandwidth extension on each point source separately, to obtain multiple bandwidth-extended point sources; remixing the bandwidth-extended point sources according to pre-estimated azimuth information, to obtain the bandwidth-extended direct sound; and reconstructing a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound. The technical solution addresses a problem of the prior art, in which bandwidth is extended solely on the basis of the subjective quality of each individually reconstructed channel, without considering the correlation of signal energy and phase between the two channels, so that the reconstructed stereo signal seriously impairs the listener's judgment of source position and distance.

Description

Bandwidth extension method and device for stereo audio
Technical field
The present invention relates to the field of network applications, and in particular to a bandwidth extension method and device for stereo audio.
Background
In digital audio signal processing, an audio signal covering the entire 20 Hz to 20 kHz range perceivable by the human ear is generally called full-band audio; such signals are mainly used for high-fidelity reproduction of music. Current real-time audio communication systems cannot provide sufficient network transmission speed and terminal processing capacity, so they inevitably limit the effective bandwidth of the reconstructed signal and give priority to quantizing and encoding the low-frequency components of the audio signal, thereby improving the coding efficiency of the audio communication system.
Traditional telephone voice communication systems generally transmit narrowband signals, whose frequency content lies within 300 to 3400 Hz at a sampling rate of 8 kHz. Subjective listening tests show that narrowband speech retains 91% syllable intelligibility and 99% sentence intelligibility. Compared with natural speech, however, the naturalness and subjective quality of narrowband signals in actual calls are markedly lower. Because high-frequency components are missing, narrowband speech cannot reliably distinguish some unvoiced sounds or plosives, and its ability to convey speaker characteristics is weakened. To overcome these shortcomings of narrowband audio, wideband audio has been widely adopted in voice communication; its effective bandwidth extends to 50 Hz to 7 kHz, which better covers most of the spectrum that characterizes the key properties of speech and achieves sound quality close to AM broadcasting. Owing to historical, economic, technical and other constraints, however, traditional fixed and mobile communications still need a considerably long transition period before the move from narrowband to wideband audio is complete.
As an effective audio enhancement technique, band extension can, without changing the source coding or network transmission of the narrowband signal, analyze the time-frequency characteristics of the original audio signal and artificially restore at the receiving end the high-frequency components that were cut off at the encoder, thereby enhancing the perceived quality of the reconstructed audio. For hearing-impaired listeners, band extension can further improve phoneme and semantic discrimination. Over the past decade, many research institutions and researchers have proposed solutions for band extension of monophonic speech signals. These methods usually synthesize the high-frequency components by extending two aspects separately, the spectral envelope and the spectral fine structure; the principle is shown in Fig. 1. First, time-frequency features are extracted from the narrowband signal according to the principles of human auditory perception. Next, the spectral envelope and energy of the high-frequency components are estimated from the mapping between low- and high-frequency features described by side information or prior knowledge. At the same time, a suitable spectral patching method is selected to extend the spectral fine structure. Finally, the extended spectral envelope and spectral fine structure are combined to reconstruct the high-frequency components of the wideband audio signal.
For stereo audio, traditional band extension methods mostly reconstruct the high-frequency components of the two channels independently. Such methods extend the bandwidth solely on the basis of the subjective quality of each individually reconstructed channel and do not consider the correlation of signal energy and phase between the two channels, so the reconstructed stereo signal seriously impairs the listener's judgment of source position and distance.
Summary of the invention
In view of the above problems, the invention provides a bandwidth extension method and device for stereo audio.
The bandwidth extension method for stereo audio provided by the invention comprises the following steps:
decomposing a stereo signal into direct sound and diffuse sound;
performing bandwidth extension on the diffuse sound according to a preset band extension method;
separating the direct sound into multiple point sources at different azimuths, and performing bandwidth extension on each point source separately, to obtain multiple bandwidth-extended point sources;
remixing the bandwidth-extended point sources according to pre-estimated azimuth information, to obtain the bandwidth-extended direct sound;
reconstructing a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.
The invention also provides a bandwidth extension device for stereo audio, comprising: a decomposition module, a diffuse sound extension module, a direct sound separation and extension module, and a reconstruction module;
the decomposition module is configured to decompose a stereo signal into direct sound and diffuse sound;
the diffuse sound extension module is configured to perform bandwidth extension on the diffuse sound according to a preset band extension method;
the direct sound separation and extension module is configured to separate the direct sound into multiple point sources at different azimuths and to perform bandwidth extension on each point source separately, to obtain multiple bandwidth-extended point sources;
the reconstruction module is configured to remix the bandwidth-extended point sources according to pre-estimated azimuth information to obtain the bandwidth-extended direct sound, and to reconstruct a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.
The beneficial effects of the invention are as follows:
The embodiments of the invention first use the spectral correlation between channels to decompose the input stereo signal into two components, direct sound and diffuse sound. The diffuse sound component is extended directly with a traditional band extension method. The direct sound is separated into multiple point sources at different azimuths according to the sparsity of different sources in the time-frequency structure, and each point source is bandwidth-extended separately. Finally, the extended point sources are remixed according to their azimuth information in the original stereo signal and combined with the bandwidth-extended diffuse sound component to reconstruct a wideband stereo audio signal. The invention addresses a problem of the prior art, in which bandwidth is extended solely on the basis of the subjective quality of each individually reconstructed channel, without considering the correlation of signal energy and phase between the two channels, so that the reconstructed stereo signal seriously impairs the listener's judgment of source position and distance.
Brief description of the drawings
Fig. 1 is a basic flow chart of a prior-art band extension method for monophonic speech signals;
Fig. 2 is a flow chart of the bandwidth extension method for stereo audio according to the method embodiment of the invention;
Fig. 3 is a structural diagram of the bandwidth extension device for stereo audio according to the device embodiment of the invention;
Fig. 4 is a block diagram of the bandwidth extension method for stereo audio of Example 1 of the invention;
Fig. 5 is a block diagram of the state-space model based on deep neural networks in Example 1 of the invention.
Detailed description
Exemplary embodiments of the disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
In the prior art, bandwidth is extended solely on the basis of the subjective quality of each individually reconstructed channel, without considering the correlation of signal energy and phase between the two channels, so that the reconstructed stereo signal seriously impairs the listener's judgment of source position and distance. To solve this problem, the invention provides a bandwidth extension method and device for stereo audio, described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
According to the method embodiment of the invention, a bandwidth extension method for stereo audio is provided. Fig. 2 is a flow chart of the bandwidth extension method for stereo audio of the method embodiment. As shown in Fig. 2, the method comprises the following processing:
Step 201: decompose the stereo signal into direct sound and diffuse sound.
Specifically, step 201 comprises the following steps:
splitting the stereo signal into a left channel and a right channel;
applying a time-frequency transform to the framed left channel and right channel respectively, to obtain the left-channel short-time spectral component and the right-channel short-time spectral component of the stereo signal;
from the left-channel and right-channel short-time spectral components, obtaining the sum $P_{sum}$ of the left- and right-channel signal energy spectra, the difference $P_{diff}$ between the left- and right-channel signal energy spectra, and the cross-correlation $P_{cc}$ between the left- and right-channel signals;
obtaining the direct-sound matrix from $P_{sum}$, $P_{diff}$ and $P_{cc}$ by the least-squares method;
separating the direct sound from the stereo signal using the direct-sound matrix;
subtracting the direct sound from the stereo signal to obtain the diffuse sound.
More specifically, obtaining the sum $P_{sum}$, the difference $P_{diff}$ and the cross-correlation $P_{cc}$ from the left-channel and right-channel short-time spectral components comprises:
using the left-channel short-time spectral component $S_L(t,f)$ and the right-channel short-time spectral component $S_R(t,f)$ to calculate the sum of the left- and right-channel signal energy spectra according to $P_{sum} = |S_L(t,f)|^2 + |S_R(t,f)|^2$;
using $S_L(t,f)$ and $S_R(t,f)$ to calculate the difference between the left- and right-channel signal energy spectra according to $P_{diff} = |S_L(t,f)|^2 - |S_R(t,f)|^2$;
using $S_L(t,f)$ and $S_R(t,f)$ to calculate the cross-correlation between the left- and right-channel signals according to $P_{cc} = R\{S_L(t,f)\,S_R^*(t,f)\}$, where $R\{\cdot\}$ takes the real part.
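The three statistics above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `channel_statistics` and the toy spectra are hypothetical and not part of the patent, and the spectra are assumed to be stored as complex arrays indexed by (frame, subband).

```python
import numpy as np

def channel_statistics(S_L, S_R):
    """Per time-frequency point: energy sum, energy difference, cross-correlation.

    S_L, S_R: complex short-time spectra of the left/right channel,
    assumed here to have shape (frames, subbands).
    """
    P_sum = np.abs(S_L) ** 2 + np.abs(S_R) ** 2
    P_diff = np.abs(S_L) ** 2 - np.abs(S_R) ** 2
    P_cc = np.real(S_L * np.conj(S_R))  # R{ S_L * conj(S_R) }
    return P_sum, P_diff, P_cc

# Toy spectra (hypothetical values): 2 frames x 3 subbands.
S_L = np.array([[1 + 1j, 2 + 0j, 0 + 1j], [0.5 + 0j, 1 - 1j, 3 + 0j]])
S_R = np.array([[1 - 1j, 0 + 2j, 0 + 1j], [0.5 + 0j, 1 + 1j, 0 + 0j]])
P_sum, P_diff, P_cc = channel_statistics(S_L, S_R)
```

Identical channels give $P_{diff} = 0$ and $P_{cc} = |S_L|^2$, while a channel that is silent in one bin gives $P_{cc} = 0$ there.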
More specifically, separating the direct sound from the stereo signal using the direct-sound matrix comprises:
using the direct-sound matrix $M_D(t,f)$ to separate the direct sound $S'(t,f)$ from the stereo signal $S(t,f)$ according to Formula 1;
$S'(t,f) = M_D(t,f)\,[S_L(t,f)\ \ S_R(t,f)]^T$ (Formula 1).
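Formula 1, together with the subtraction step that yields the diffuse sound, can be sketched as a per-bin matrix product. The sketch assumes a 2x2 direct-sound matrix is already available for every time-frequency point; how that matrix is estimated by least squares is not shown. The array layout and the function name `split_direct_diffuse` are assumptions made for illustration.

```python
import numpy as np

def split_direct_diffuse(S, M_D):
    """Apply Formula 1 per time-frequency point, then subtract.

    S:   complex array, shape (frames, subbands, 2), last axis = [S_L, S_R]
    M_D: array, shape (frames, subbands, 2, 2), one matrix per point
    Returns (direct, diffuse), both shaped like S.
    """
    direct = np.einsum("tfij,tfj->tfi", M_D, S)  # S' = M_D S
    diffuse = S - direct                          # stereo minus direct sound
    return direct, diffuse

# Hypothetical example: a matrix that passes half of each channel as "direct".
T, F = 3, 4
rng = np.random.default_rng(0)
S = rng.standard_normal((T, F, 2)) + 1j * rng.standard_normal((T, F, 2))
M_D = np.broadcast_to(0.5 * np.eye(2), (T, F, 2, 2))
direct, diffuse = split_direct_diffuse(S, M_D)
```

By construction the two components always add back to the input spectrum, whatever matrix is used.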
Step 202: perform bandwidth extension on the diffuse sound according to a preset band extension method.
Specifically, step 202 applies a traditional band extension method directly to the diffuse sound; the details are not repeated here.
Step 203: separate the direct sound into multiple point sources at different azimuths, and perform bandwidth extension on each point source separately, to obtain multiple bandwidth-extended point sources.
Specifically, separating the direct sound into multiple point sources at different azimuths in step 203 comprises:
calculating the directional information of the direct sound at each time-frequency point, and clustering the directional information of all time-frequency points to obtain the cluster centers of the directional information, where the cluster centers correspond respectively to the directional information of the individual point sources;
obtaining a masking matrix from the directional information of the direct sound at a given time-frequency point and the cluster centers of the directional information;
separating the direct sound using the masking matrix, to obtain multiple point sources at different azimuths.
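The clustering and masking steps can be sketched as follows. Several choices here are assumptions, not the patent's formulas: the directional measure is taken to be a panning angle computed from the channel magnitudes, the clustering is a plain 1-D k-means, and the mask is binary (each time-frequency point is assigned to its nearest cluster center). All function names are hypothetical.

```python
import numpy as np

def panning_angle(D_L, D_R):
    # Assumed direction measure: 0 = fully left, pi/2 = fully right.
    return np.arctan2(np.abs(D_R), np.abs(D_L))

def kmeans_1d(values, n_clusters, n_iter=50):
    # Initialize the centers at evenly spaced quantiles of the data.
    centers = np.quantile(values, (np.arange(n_clusters) + 0.5) / n_clusters)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for i in range(n_clusters):
            if np.any(labels == i):
                centers[i] = values[labels == i].mean()
    return centers

def separate_point_sources(D_L, D_R, n_sources):
    theta = panning_angle(D_L, D_R)                       # (frames, subbands)
    centers = kmeans_1d(theta.ravel(), n_sources)
    nearest = np.argmin(np.abs(theta[..., None] - centers), axis=-1)
    masks = np.stack([nearest == i for i in range(n_sources)]).astype(float)
    return centers, masks, masks * D_L, masks * D_R       # masks: (N, frames, subbands)

# Toy direct sound: two sources panned at 0.3 rad and 1.2 rad on disjoint bins.
T, F = 4, 6
rng = np.random.default_rng(1)
amp = rng.uniform(0.5, 1.5, (T, F)) * np.exp(1j * rng.uniform(0, 2 * np.pi, (T, F)))
checker = (np.add.outer(np.arange(T), np.arange(F)) % 2) == 0
theta_true = np.where(checker, 0.3, 1.2)
D_L, D_R = amp * np.cos(theta_true), amp * np.sin(theta_true)
centers, masks, src_L, src_R = separate_point_sources(D_L, D_R, n_sources=2)
```

The binary mask relies on the time-frequency sparsity argument: each point is assumed to be dominated by a single source, so the masked spectra sum back to the direct sound.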
Specifically, performing bandwidth extension on each point source separately comprises:
inputting each point source into a preset state-space model, which fits the mapping between the short-time spectrum of the narrowband signal and the short-time spectrum of the wideband signal; estimating the spectral envelope of the high-frequency components of the wideband short-time spectrum according to a preset error criterion; and combining the low-frequency spectral envelope with the spectral fine structure extended by a suitable spectral patching method, to obtain the bandwidth-extended point sources.
More specifically, fitting the mapping between the short-time spectrum of the narrowband signal and the short-time spectrum of the wideband signal in the state-space model, and estimating the spectral envelope of the high-frequency components according to the preset error criterion, comprises:
obtaining the hidden state vector of the state-space model from the hidden state vector at the previous time step and the short-time spectrum of the narrowband signal at the previous time step;
obtaining the short-time spectrum of the wideband signal from the hidden state vector of the state-space model and the short-time spectrum of the narrowband signal at the current time step.
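The two-step recursion above can be sketched with two small single-layer networks standing in for the state update and the observation mapping. Everything concrete here is an assumption for illustration: the layer sizes, the `tanh` nonlinearity, the zero initial state, and the untrained random weights. A real system would train both mappings against the preset error criterion.

```python
import numpy as np

def make_layer(rng, n_in, n_out):
    # Placeholder weights; a trained model would supply these.
    return rng.standard_normal((n_out, n_in)) * 0.1, np.zeros(n_out)

def f_state(h_prev, x_prev, params):
    # Hidden state from the previous hidden state and previous narrowband frame.
    W, b = params
    return np.tanh(W @ np.concatenate([h_prev, x_prev]) + b)

def f_obs(h, x, params):
    # Wideband frame from the current hidden state and current narrowband frame.
    W, b = params
    return W @ np.concatenate([h, x]) + b

def extend_bandwidth(X, state_params, obs_params, n_hidden):
    """X: narrowband features, shape (frames, n_nb). Returns (frames, n_wb)."""
    h = np.zeros(n_hidden)
    x_prev = np.zeros(X.shape[1])
    Y = []
    for x in X:
        h = f_state(h, x_prev, state_params)
        Y.append(f_obs(h, x, obs_params))
        x_prev = x
    return np.stack(Y)

rng = np.random.default_rng(2)
n_nb, n_wb, n_hidden = 8, 16, 12
state_params = make_layer(rng, n_hidden + n_nb, n_hidden)
obs_params = make_layer(rng, n_hidden + n_nb, n_wb)
X = rng.standard_normal((10, n_nb))
Y = extend_bandwidth(X, state_params, obs_params, n_hidden)
```

Because the hidden state only looks backwards, the output for a frame depends on that frame and earlier ones, which suits frame-by-frame processing at the receiving end.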
Step 204: remix the bandwidth-extended point sources according to the pre-estimated azimuth information to obtain the bandwidth-extended direct sound, and reconstruct a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.
Specifically, the pre-estimated azimuth information is estimated from the cluster centers of the directional information; the estimation method is a conventional technique in the art and is not described further here.
Specifically, Formula 2 is used to reconstruct the wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound;
In Formula 2, the three terms denote, respectively, the short-time spectrum of the stereo signal after bandwidth extension, the short-time spectrum of the direct sound after bandwidth extension, and the short-time spectrum of the diffuse sound after bandwidth extension.
Corresponding to the method embodiment of the invention, a bandwidth extension device for stereo audio is provided. Fig. 3 is a structural diagram of the bandwidth extension device for stereo audio of the device embodiment. As shown in Fig. 3, the device comprises: a decomposition module 30, a diffuse sound extension module 32, a direct sound separation and extension module 34, and a reconstruction module 36. Each module of the embodiment is described in detail below.
Specifically, the decomposition module 30 is configured to decompose a stereo signal into direct sound and diffuse sound;
the diffuse sound extension module 32 is configured to perform bandwidth extension on the diffuse sound according to a preset band extension method;
the direct sound separation and extension module 34 is configured to separate the direct sound into multiple point sources at different azimuths and to perform bandwidth extension on each point source separately, to obtain multiple bandwidth-extended point sources;
the reconstruction module 36 is configured to remix the bandwidth-extended point sources according to the pre-estimated azimuth information to obtain the bandwidth-extended direct sound, and to reconstruct a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.
The decomposition module 30 is specifically configured to:
split the stereo signal into a left channel and a right channel;
apply a time-frequency transform to the framed left channel and right channel respectively, to obtain the left-channel short-time spectral component and the right-channel short-time spectral component of the stereo signal;
from the left-channel and right-channel short-time spectral components, obtain the sum $P_{sum}$ of the left- and right-channel signal energy spectra, the difference $P_{diff}$ between the left- and right-channel signal energy spectra, and the cross-correlation $P_{cc}$ between the left- and right-channel signals;
obtain the direct-sound matrix from $P_{sum}$, $P_{diff}$ and $P_{cc}$ by the least-squares method;
separate the direct sound from the stereo signal using the direct-sound matrix;
subtract the direct sound from the stereo signal to obtain the diffuse sound.
The direct sound separation and extension module 34 is specifically configured to:
calculate the directional information of the direct sound at each time-frequency point, and cluster the directional information of all time-frequency points to obtain the cluster centers of the directional information, where the cluster centers correspond respectively to the directional information of the individual point sources;
obtain a masking matrix from the directional information of the direct sound at a given time-frequency point and the cluster centers of the directional information;
separate the direct sound using the masking matrix, to obtain multiple point sources at different azimuths.
The direct sound separation and extension module 34 is further specifically configured to:
input each point source into a preset state-space model, which fits the mapping between the short-time spectrum of the narrowband signal and the short-time spectrum of the wideband signal; estimate the spectral envelope of the high-frequency components of the wideband short-time spectrum according to a preset error criterion; and combine the low-frequency spectral envelope with the spectral fine structure extended by a suitable spectral patching method, finally obtaining the bandwidth-extended direct sound.
To explain the technical solution of the invention in more detail, Example 1 is given. Fig. 4 is a block diagram of the bandwidth extension method for stereo audio of Example 1. As shown in Fig. 4, a bandwidth extension method for stereo audio comprises the following steps:
1. Direct sound / diffuse sound separation
The proposed stereo bandwidth extension system uses the discrete Fourier transform or a quadrature mirror filter bank to transform the framed left- and right-channel audio signals into the frequency domain, and divides them into multiple uniform subbands or critical bands according to the principles of human auditory perception. The short-time spectrum $S(t,f)$ of the input stereo signal can then be expressed as $S(t,f) = [S_L(t,f)\ \ S_R(t,f)]^T$,
where $t$ and $f$ denote the time frame and subband index of the signal, and $S_L(t,f)$ and $S_R(t,f)$ denote the left- and right-channel short-time spectral components of the stereo signal.
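The framing and transform step can be sketched with a windowed DFT. The frame length, hop size, Hann window and test tone are assumptions chosen for illustration; the grouping of DFT bins into uniform subbands or critical bands is omitted, so here $f$ is simply the DFT bin index.

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Frame a mono signal, window each frame, and take a real-input DFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return np.fft.rfft(frames * window, axis=1)  # shape (frames, frame_len // 2 + 1)

# Hypothetical stereo input: a 1 kHz tone, quieter in the right channel.
fs = 16000
time = np.arange(fs) / fs
left = np.sin(2 * np.pi * 1000 * time)
right = 0.5 * left

S_L, S_R = stft(left), stft(right)
S = np.stack([S_L, S_R], axis=-1)  # S(t, f) = [S_L(t, f), S_R(t, f)]^T
```

With these settings each bin spans 31.25 Hz, so the 1 kHz tone peaks at bin 32 in both channels.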
To separate the direct sound and the diffuse sound effectively, the system also calculates the sum $P_{sum}$ and the difference $P_{diff}$ of the left- and right-channel signal energy spectra and the cross-correlation $P_{cc}$ of the two channels:
$P_{sum} = |S_L(t,f)|^2 + |S_R(t,f)|^2$
$P_{diff} = |S_L(t,f)|^2 - |S_R(t,f)|^2$
$P_{cc} = R\{S_L(t,f)\,S_R^*(t,f)\}$
where $R\{\cdot\}$ takes the real part. To improve the stability of the separation algorithm, the calculated $P_{sum}$, $P_{diff}$ and $P_{cc}$ are each smoothed over time.
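The text does not specify the smoother. One common choice, shown here purely as an assumption, is first-order recursive averaging along the frame axis with a forgetting factor `alpha`; the function name and the value of `alpha` are hypothetical.

```python
import numpy as np

def smooth_over_time(P, alpha=0.8):
    """First-order recursive smoothing along axis 0 (time frames).

    out[t] = alpha * out[t-1] + (1 - alpha) * P[t], with out[0] = P[0].
    """
    out = np.empty_like(P, dtype=float)
    out[0] = P[0]
    for t in range(1, P.shape[0]):
        out[t] = alpha * out[t - 1] + (1 - alpha) * P[t]
    return out

# A step in one subband is followed gradually rather than instantly.
P_step = np.array([[0.0], [1.0], [1.0]])
P_smoothed = smooth_over_time(P_step, alpha=0.5)
```

A larger `alpha` gives steadier statistics at the cost of slower tracking when the sound scene changes.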
The direct sound components in the left and right stereo channels are highly correlated and can be represented as a point-source signal arriving from a single direction. Accordingly, the proposed system uses a direct-sound matrix to separate the direct sound component $S'(t,f)$ directly from the original two-channel stereo signal $S(t,f)$, as shown below:
$S'(t,f) = [S_L'(t,f)\ \ S_R'(t,f)]^T = M_D(t,f)\,[S_L(t,f)\ \ S_R(t,f)]^T = M_D(t,f)\,S(t,f)$
where $S_L'(t,f)$ and $S_R'(t,f)$ denote the left- and right-channel short-time spectral components of the direct sound, and $M_D(t,f)$ is the direct-sound matrix. As described in [M. Vinton, D. McGrath, C. Robinson, P. Brown, "Next generation surround decoding and upmixing for consumer and professional applications", AES 57th International Conference, USA, 2015], the direct-sound matrix $M_D(t,f)$ can be obtained by the least-squares method, so that the expected squared error between the estimated direct sound component and the true component is minimized, i.e.,
The direct-sound matrix $M_D(t,f)$ can then be calculated by the following formula,
The diffuse sound component $S''(t,f)$ can then be expressed as the difference between the original stereo signal and the direct sound component:
$S''(t,f) = S(t,f) - S'(t,f)$
2. Source separation of the direct sound component
From $S'(t,f) = [S_L'(t,f)\ \ S_R'(t,f)]^T$, Formula 3 gives the directional information $\theta(t,f)$ of the direct sound $S'(t,f)$ at a given time-frequency point; at that point, $\theta(t,f)$ is the same as the directional information $\theta_i$ of the corresponding point source;
the directional information $\theta(t,f)$ of all time-frequency points is clustered to obtain the cluster centers $C_i$, $i = 1, 2, \ldots, N$; these cluster centers correspond respectively to the directional information $\theta_1, \theta_2, \theta_3, \ldots, \theta_N$ of the point sources $S_1(t,f), S_2(t,f), S_3(t,f), \ldots, S_N(t,f)$;
the masking matrix $m_i(t,f)$ is obtained from the directional information $\theta(t,f)$ of the direct sound $S'(t,f)$ at a given time-frequency point and the cluster centers $C_i$;
using the masking matrix $m_i(t,f)$, the direct sound $S'(t,f)$ is separated according to Formula 4 to obtain the direct point sources.
3. Bandwidth extension
With the method described above, the diffuse sound component $S''(t,f)$ and the direct sound component $S'(t,f)$ are separated from the stereo signal, and time-frequency sparsity is used to further separate the direct sound component $S'(t,f)$ into multiple point sources. Next, a monophonic band extension method can be applied independently to the diffuse sound $S''(t,f)$ and to each direct point source.
Here a state-space model is adopted to fit the mapping between narrowband and wideband spectral parameters directly; in the actual extension, the spectral envelope of the high-frequency components is estimated according to a given error criterion,
$S_Y(t,f) = F[S_X(t,f)]$
where $S_X(t,f)$ and $S_Y(t,f)$ denote the short-time spectra of the narrowband and wideband signals respectively, and $F[\cdot]$ denotes the mapping (or estimation) function.
According to the state-space model, the mapping function $F[\cdot]$ can be described by two processes, a state evolution function $F_{state}[\cdot]$ and an observation function $F_{obs}[\cdot]$, as shown below:
$S_{hidden}(t,f) = F_{state}[S_{hidden}(t-1,f),\ S_X(t-1,f),\ N_1(t,f)]$
$S_Y(t,f) = F_{obs}[S_{hidden}(t,f),\ S_X(t,f),\ N_2(t,f)]$
where $S_{hidden}(t,f)$ is the hidden state vector of the model, and $N_1(t,f)$ and $N_2(t,f)$ describe the errors of the state evolution function $F_{state}$ and the observation function $F_{obs}$ respectively. In this model, the hidden state vector at the current time, $S_{hidden}(t,f)$, is determined by the hidden state vector at the previous time, $S_{hidden}(t-1,f)$, and the short-time spectrum of the narrowband signal at the previous time, $S_X(t-1,f)$; the short-time spectrum of the wideband signal at the current time, $S_Y(t,f)$, is in turn determined by the current hidden state vector $S_{hidden}(t,f)$ and the short-time spectrum of the narrowband signal at the current time, $S_X(t,f)$. The recursive hidden-state structure of the state-space model allows the complex mapping between narrowband and wideband spectral parameters to be fitted more accurately. The model can be implemented with a generalized Kalman filtering method, or with two mutually independent deep neural networks. The basic principle of the state-space model based on deep neural networks is shown in Fig. 5. Here the state evolution function $F_{state}$ and the observation function $F_{obs}$ can be implemented with various feedforward and recurrent deep neural networks, such as stacked autoencoders, multilayer perceptrons, time-delay recurrent networks, and long short-term memory networks.
4. Stereo signal synthesis
Using a monophonic band extension method, the diffuse sound $S''(t,f)$ and each direct point source $S_i(t,f)$, $i = 1, 2, \ldots, N$, can be extended separately to obtain the corresponding wideband spectrum $S_Y(t,f)$. Next, the directional information $\theta_i$ of each point source can be used to reproduce the wideband direct sound.
Here the terms are the wideband spectra of the point sources after extension and the short-time spectrum of the wideband direct sound after extension. Finally, combining this with the extended wideband diffuse sound reproduces the wideband stereo signal.
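The remixing and final combination can be sketched as follows. The panning law is an assumption: each extended point source is treated as a mono spectrum and placed with constant-power gains $(\cos\theta_i, \sin\theta_i)$, which may differ from the patent's own remixing formula. The final combination is taken to be a sum, mirroring the earlier decomposition of the stereo signal into direct plus diffuse sound. The function name `remix` and the array layout are hypothetical.

```python
import numpy as np

def remix(point_sources_wb, thetas, diffuse_wb):
    """Pan extended point sources back to stereo and add the diffuse part.

    point_sources_wb: complex array (N, frames, bins), mono wideband spectra
    thetas:           array (N,), assumed panning angle per source in [0, pi/2]
    diffuse_wb:       complex array (frames, bins, 2), extended diffuse sound
    Returns the wideband stereo spectrum, shape (frames, bins, 2).
    """
    thetas = np.asarray(thetas, dtype=float)
    gains = np.stack([np.cos(thetas), np.sin(thetas)], axis=-1)       # (N, 2)
    direct_wb = np.einsum("ntf,nc->tfc", point_sources_wb, gains)     # sum over sources
    return direct_wb + diffuse_wb

# Hypothetical check: one source hard left, one hard right, no diffuse sound.
sources = np.stack([np.full((1, 1), 2 + 0j), np.full((1, 1), 3 + 0j)])
stereo_wb = remix(sources, [0.0, np.pi / 2], np.zeros((1, 1, 2), dtype=complex))
```

Reusing the estimated directions for the extended sources is what keeps the new high-frequency content at the same apparent positions as the original low-frequency content.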
The embodiment of the present invention first uses the spectral correlation between channels to decompose the input stereo signal into two components, direct sound and diffuse sound. The diffuse sound component is extended directly with a conventional bandwidth extension method. The direct sound is separated into multiple point sources at different azimuths according to the sparsity of different sound sources in the time-frequency structure, and each point source is bandwidth-extended separately. Finally, the extended point sources are remixed according to their azimuth information in the original stereo signal and combined with the bandwidth-extended diffuse sound component to reconstruct the wideband stereo audio signal. The present invention thereby solves the problem in the prior art that the signal bandwidth is extended only according to the subjective quality of the signal reconstructed for a single channel, without considering the correlation of signal energy and phase between the two channels, so that the reconstructed stereo signal seriously impairs the listener's judgment of sound source position and distance.
The foregoing is only an embodiment of the present invention and is not intended to limit the invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of the claims of the present invention.
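As a rough illustration of the sparsity-based separation summarized above, the following Python sketch clusters per-bin direction estimates and builds one binary mask per cluster. It assumes that a direction estimate `theta` is already available for every time-frequency bin, uses plain one-dimensional k-means as the clustering method, and uses hard binary masks; the patent does not fix these choices, and the names `cluster_directions`, `binary_masks` and `separate` are hypothetical.

```python
import numpy as np

def cluster_directions(theta, n_sources, n_iter=20):
    """Plain 1-D k-means over per-bin direction estimates (assumed clustering method)."""
    flat = theta.ravel()
    # Initialise the centres at evenly spaced quantiles of the estimates.
    centres = np.quantile(flat, (np.arange(n_sources) + 0.5) / n_sources)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(flat[:, None] - centres[None, :]), axis=1)
        for k in range(n_sources):
            if np.any(labels == k):
                centres[k] = flat[labels == k].mean()
    return centres

def binary_masks(theta, centres):
    """One boolean mask per centre; each bin belongs to its nearest centre."""
    labels = np.argmin(np.abs(theta[..., None] - centres), axis=-1)
    return np.stack([labels == k for k in range(len(centres))])

def separate(direct_spec, theta, n_sources):
    """Split the direct-sound spectrum into point sources at different azimuths."""
    centres = cluster_directions(theta, n_sources)
    masks = binary_masks(theta, centres)
    return centres, masks * direct_spec

# Toy input: bins alternate between two directions.
theta = np.where(np.arange(60).reshape(6, 10) % 2 == 0, -0.5, 0.5)
direct = np.ones((6, 10), dtype=complex)
centres, sources = separate(direct, theta, n_sources=2)
```

Because every bin is assigned to exactly one cluster, the separated point sources sum back to the original direct sound, which keeps the later remixing step consistent with the input.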

Claims (10)

1. A bandwidth extension method for stereo audio, characterized by comprising:
decomposing a stereo signal into direct sound and diffuse sound;
performing bandwidth extension on the diffuse sound according to a preset bandwidth extension method;
separating the direct sound into multiple point sources at different azimuths, and performing bandwidth extension on each of the point sources to obtain multiple bandwidth-extended point sources;
remixing the multiple bandwidth-extended point sources according to pre-estimated azimuth information to obtain bandwidth-extended direct sound, and reconstructing a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.
2. The bandwidth extension method for stereo audio according to claim 1, characterized in that decomposing the stereo signal into direct sound and diffuse sound comprises:
decomposing the stereo signal into a left channel and a right channel;
performing a time-frequency transform on the framed left channel and right channel respectively, to obtain the left-channel short-term spectral component and the right-channel short-term spectral component of the stereo signal;
obtaining, from the left-channel short-term spectral component and the right-channel short-term spectral component, the sum P_sum of the left and right channel signal energy spectra, the difference P_diff of the left and right channel signal energy spectra, and the cross-correlation P_cc of the left and right channel signal energy spectra;
obtaining a direct sound matrix from P_sum, P_diff and P_cc by the least-squares method;
separating the direct sound from the stereo signal using the direct sound matrix;
removing the direct sound from the stereo signal to obtain the diffuse sound.
3. The bandwidth extension method for stereo audio according to claim 1, characterized in that
separating the direct sound into multiple point sources at different azimuths comprises:
calculating the direction information of the direct sound at each time-frequency point, and clustering the direction information of all time-frequency points to obtain cluster centers of the direction information, the cluster centers corresponding respectively to the direction information of each point source;
obtaining a masking matrix according to the direction information of the direct sound at a given time-frequency point and the cluster centers of the direction information;
separating the direct sound using the masking matrix to obtain the multiple point sources at different azimuths.
4. The bandwidth extension method for stereo audio according to claim 1, characterized in that
performing bandwidth extension on each of the point sources comprises:
inputting the multiple point sources separately into a preset state-space model to fit the mapping between the short-term spectrum of the narrowband signal and the short-term spectrum of the wideband signal, estimating the spectral envelope of the high-frequency components of the wideband short-term spectrum according to a preset error criterion, and, combining the low-frequency spectral envelope with the extended spectral details obtained by a suitable spectrum patching method, obtaining the multiple bandwidth-extended point sources.
5. The bandwidth extension method for stereo audio according to claim 4, characterized in that
fitting, in the state-space model, the mapping between the short-term spectrum of the narrowband signal and the short-term spectrum of the wideband signal, and estimating the spectral envelope of the high-frequency components according to the preset error criterion, comprises:
obtaining the hidden state vector in the preset state-space model from the hidden state vector of the previous moment and the short-term spectrum of the narrowband signal at the previous moment;
obtaining the short-term spectrum of the wideband signal from the hidden state vector in the preset state-space model and the short-term spectrum of the narrowband signal at the current moment.
6. A bandwidth extension device for stereo audio, characterized by comprising: a decomposition module, a diffuse sound extension module, a direct sound separation and extension module, and a reconstruction module;
the decomposition module is configured to decompose a stereo signal into direct sound and diffuse sound;
the diffuse sound extension module is configured to perform bandwidth extension on the diffuse sound according to a preset bandwidth extension method;
the direct sound separation and extension module is configured to separate the direct sound into multiple point sources at different azimuths and to perform bandwidth extension on each of the point sources, obtaining multiple bandwidth-extended point sources;
the reconstruction module is configured to remix the multiple bandwidth-extended point sources according to pre-estimated azimuth information to obtain bandwidth-extended direct sound, and to reconstruct a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.
7. The bandwidth extension device for stereo audio according to claim 6, characterized in that the decomposition module is specifically configured to:
decompose the stereo signal into a left channel and a right channel;
perform a time-frequency transform on the framed left channel and right channel respectively, to obtain the left-channel short-term spectral component and the right-channel short-term spectral component of the stereo signal;
obtain, from the left-channel short-term spectral component and the right-channel short-term spectral component, the sum P_sum of the left and right channel signal energy spectra, the difference P_diff of the left and right channel signal energy spectra, and the cross-correlation P_cc of the left and right channel signal energy spectra;
obtain a direct sound matrix from P_sum, P_diff and P_cc by the least-squares method;
separate the direct sound from the stereo signal using the direct sound matrix;
remove the direct sound from the stereo signal to obtain the diffuse sound.
8. The bandwidth extension device for stereo audio according to claim 6, characterized in that the direct sound separation and extension module is specifically configured to:
calculate the direction information of the direct sound at each time-frequency point, and cluster the direction information of all time-frequency points to obtain cluster centers of the direction information, the cluster centers corresponding respectively to the direction information of each point source;
obtain a masking matrix according to the direction information of the direct sound at a given time-frequency point and the cluster centers of the direction information;
separate the direct sound using the masking matrix to obtain the multiple point sources at different azimuths.
9. The bandwidth extension device for stereo audio according to claim 6, characterized in that the direct sound separation and extension module is specifically configured to:
input the multiple point sources separately into a preset state-space model to fit the mapping between the short-term spectrum of the narrowband signal and the short-term spectrum of the wideband signal, estimate the spectral envelope of the high-frequency components of the wideband short-term spectrum according to a preset error criterion, and, combining the low-frequency spectral envelope with the extended spectral details obtained by a suitable spectrum patching method, obtain the bandwidth-extended direct sound.
10. The bandwidth extension device for stereo audio according to claim 9, characterized in that the direct sound separation and extension module is specifically configured to:
obtain the hidden state vector in the preset state-space model from the hidden state vector of the previous moment and the short-term spectrum of the narrowband signal at the previous moment;
obtain the short-term spectrum of the wideband signal from the hidden state vector in the preset state-space model and the short-term spectrum of the narrowband signal at the current moment.
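For illustration, the channel statistics named in claims 2 and 7 can be sketched in Python with NumPy. The exact definitions of P_sum, P_diff and P_cc are given by formulas that are not reproduced in this part of the text, so the definitions below (sum and difference of the per-bin channel energies, and the real part of the cross-spectrum) are assumptions, as are the Hann window, the frame length and the hop size. The least-squares estimation of the direct sound matrix is not attempted. The names `stft` and `channel_statistics` are hypothetical.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Frame the signal, apply a Hann window and take a real FFT per frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def channel_statistics(left, right):
    """Return assumed versions of P_sum, P_diff and P_cc for every time-frequency bin."""
    s_left, s_right = stft(left), stft(right)
    p_left = np.abs(s_left) ** 2
    p_right = np.abs(s_right) ** 2
    p_sum = p_left + p_right                       # total energy of both channels
    p_diff = p_left - p_right                      # energy imbalance between channels
    p_cc = np.real(s_left * np.conj(s_right))      # in-phase cross-spectrum (assumed)
    return p_sum, p_diff, p_cc
```

For two identical channels these definitions give P_diff = 0 and P_cc = P_sum / 2, which corresponds to a fully correlated, centered source; uncorrelated diffuse content pulls P_cc toward zero.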
CN201710203054.1A 2017-03-30 2017-03-30 Bandwidth extension method and device for stereo audio Expired - Fee Related CN106960672B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710203054.1A CN106960672B (en) 2017-03-30 2017-03-30 Bandwidth extension method and device for stereo audio

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710203054.1A CN106960672B (en) 2017-03-30 2017-03-30 Bandwidth extension method and device for stereo audio

Publications (2)

Publication Number Publication Date
CN106960672A (en) 2017-07-18
CN106960672B (en) 2020-08-21

Family

ID=59470575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710203054.1A Expired - Fee Related CN106960672B (en) 2017-03-30 2017-03-30 Bandwidth extension method and device for stereo audio

Country Status (1)

Country Link
CN (1) CN106960672B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5222059A (en) * 1988-01-06 1993-06-22 Lucasfilm Ltd. Surround-sound system with motion picture soundtrack timbre correction, surround sound channel timbre correction, defined loudspeaker directionality, and reduced comb-filter effects
CN101518083A (en) * 2006-09-22 2009-08-26 三星电子株式会社 Method, medium, and system encoding and/or decoding audio signals by using bandwidth extension and stereo coding
CN102572676A (en) * 2012-01-16 2012-07-11 华南理工大学 Real-time rendering method for virtual auditory environment
CN102859590A (en) * 2010-02-24 2013-01-02 弗劳恩霍夫应用研究促进协会 Apparatus for generating an enhanced downmix signal, method for generating an enhanced downmix signal and computer program
EP2645748A1 (en) * 2012-03-28 2013-10-02 Thomson Licensing Method and apparatus for decoding stereo loudspeaker signals from a higher-order Ambisonics audio signal
EP2782094A1 (en) * 2013-03-22 2014-09-24 Thomson Licensing Method and apparatus for enhancing directivity of a 1st order Ambisonics signal
EP2884491A1 (en) * 2013-12-11 2015-06-17 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Extraction of reverberant sound using microphone arrays
CN104781880A (en) * 2012-09-03 2015-07-15 弗兰霍菲尔运输应用研究公司 Apparatus and method for providing informed multichannel speech presence probability estimation
CN104919822A (en) * 2012-11-15 2015-09-16 弗兰霍菲尔运输应用研究公司 Segment-wise adjustment of spatial audio signal to different playback loudspeaker setup
CN106531179A (en) * 2015-09-10 2017-03-22 中国科学院声学研究所 Multi-channel speech enhancement method based on semantic prior selective attention

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DANIEL ET AL.: "Evolving views on HOA: From technological to pragmatic concerns", 《AMBISONICS SYMPOSIUM》 *
SUGAR ET AL.: "Radio propagation by reflection from meteor trails", 《PROCEEDINGS OF THE IEEE》 *
许春冬 等: "两扬声器配置下的串声消除系统参数优化设置", 《计算机应用》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019085914A1 (en) * 2017-10-30 2019-05-09 捷开通讯(深圳)有限公司 Terminal, voice command optimization method therefor and storage apparatus
CN108152788A (en) * 2017-12-22 2018-06-12 西安Tcl软件开发有限公司 Sound-source follow-up method, sound-source follow-up equipment and computer readable storage medium
CN109975762A (en) * 2017-12-28 2019-07-05 中国科学院声学研究所 A kind of underwater sound source localization method
CN109975762B (en) * 2017-12-28 2021-05-18 中国科学院声学研究所 Underwater sound source positioning method
CN110751956A (en) * 2019-09-17 2020-02-04 北京时代拓灵科技有限公司 Immersive audio rendering method and system

Also Published As

Publication number Publication date
CN106960672B (en) 2020-08-21

Similar Documents

Publication Publication Date Title
CN101390443B (en) Audio encoding and decoding
CN110085245B (en) Voice definition enhancing method based on acoustic feature conversion
RU2409911C2 (en) Decoding binaural audio signals
EP1783745B1 (en) Multichannel signal decoding
EP4011099A1 (en) System and method for assisting selective hearing
JP2956548B2 (en) Voice band expansion device
CN106960672A (en) The bandwidth expanding method and device of a kind of stereo audio
CN102157156B (en) Single-channel voice enhancement method and system
CN106205623B (en) A kind of sound converting method and device
JPH10509256A (en) Audio signal conversion method using pitch controller
EP2559026A1 (en) Audio communication device, method for outputting an audio signal, and communication system
CN107564538A (en) The definition enhancing method and system of a kind of real-time speech communicating
JPWO2006080358A1 (en) Speech coding apparatus and speech coding method
JP4927264B2 (en) Method for encoding an audio signal
Dadvar et al. Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target
CN113593601A (en) Audio-visual multi-modal voice separation method based on deep learning
CN101715643B (en) Multi-point connection device, signal analysis and device, method, and program
JPH0946233A (en) Sound encoding method/device and sound decoding method/ device
Saeki et al. Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU.
EP2489036B1 (en) Method, apparatus and computer program for processing multi-channel audio signals
CN106034274A (en) 3D sound device based on sound field wave synthesis and synthetic method
Gil-Pita et al. Enhancing the energy efficiency of wireless-communicated binaural hearing aids for speech separation driven by soft-computing algorithms
Estreder et al. On perceptual audio equalization for multiple users in presence of ambient noise
CN116110424A (en) Voice bandwidth expansion method and related device
Koduri Discrete cosine transform-based data hiding for speech bandwidth extension

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200821

Termination date: 20210330