WO2017193550A1 - Encoding method and encoder for multi-channel signal - Google Patents

Encoding method and encoder for multi-channel signal

Info

Publication number
WO2017193550A1
WO2017193550A1 PCT/CN2016/103594 CN2016103594W WO2017193550A1 WO 2017193550 A1 WO2017193550 A1 WO 2017193550A1 CN 2016103594 W CN2016103594 W CN 2016103594W WO 2017193550 A1 WO2017193550 A1 WO 2017193550A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
frequency domain
domain signal
target
channel
Prior art date
Application number
PCT/CN2016/103594
Other languages
English (en)
French (fr)
Inventor
刘泽新
张兴涛
苗磊
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2017193550A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing

Definitions

  • Embodiments of the present invention relate to the field of audio coding, and more particularly, to an encoding method and an encoder for a multi-channel signal.
  • Stereo audio conveys the orientation and distribution of each sound source, which improves the clarity, intelligibility and presence of sound, and is therefore widely favored.
  • Stereo processing techniques mainly include Mid/Side (MS) encoding, Intensity Stereo (IS) encoding, and Parametric Stereo (PS) encoding.
  • MS Mid/Side
  • IS Intensity Stereo
  • PS Parametric Stereo
  • MS coding applies a sum/difference transform to the two channel signals based on the inter-channel correlation.
  • The energy of the channels is mainly concentrated in the sum channel, so that the inter-channel redundancy is removed.
  • The bit-rate saving depends on the correlation of the input signals.
  • When the correlation between the left and right channel signals is poor, the left channel signal and the right channel signal need to be transmitted separately.
  • IS coding is based on the fact that the human auditory system is insensitive to the fine structure of the phase difference of the high frequency components of the channels (for example, components above 2 kHz), and therefore simplifies the high frequency components of the left and right signals.
  • PS coding is based on the binaural auditory model. At the encoding end it converts the stereo signal into a mono signal and a small number of spatial parameters (or spatial perceptual parameters) describing the spatial sound field, as shown in Figure 1 (in Figure 1, x_L is the left channel time domain signal and x_R is the right channel time domain signal).
  • After the decoder receives the mono signal, it combines it with the spatial parameters to restore the stereo signal, as shown in Figure 2.
  • the PS coding compression ratio is high. Under the premise of maintaining good sound quality, higher coding gain can be obtained, and it can work in the full audio bandwidth, which can well restore the spatial sensing effect of stereo sound.
  • The spatial parameters include Inter-channel Coherence (IC), Inter-channel Level Difference (ILD), Inter-channel Time Difference (ITD), and Inter-channel Phase Difference (IPD).
  • IC Inter-channel Coherence
  • ILD Inter-channel Level Difference
  • ITD Inter-channel Time Difference
  • IPD Inter-channel Phase Difference
  • the IC describes the cross-correlation or coherence between channels, which determines the perception of the sound field range and improves the spatial and acoustic stability of the audio signal.
  • ILD is used to distinguish the horizontal direction of the stereo source and describes the difference in intensity between the channels, which will affect the frequency content of the entire spectrum.
  • ITD and IPD are spatial parameters that represent the horizontal orientation of the sound source. They describe the time and phase differences between the channels. These parameters mainly affect the frequency components below 2 kHz.
  • ILD, ITD and IPD can determine the human ear's perception of the sound source position, can effectively determine the sound field position, and play an important role in the recovery of stereo signals.
  • Stereo phase parameters include ITD parameters and IPD parameters.
  • the ITD parameter can represent the time delay between the left and right channel signals of the stereo
  • the IPD parameter can represent the waveform similarity of the stereo left and right channel signals after time alignment.
  • FIG. 3 is a coding flow chart of phase parameters of stereo in the prior art.
  • the extraction of the ITD parameter and the IPD parameter is implemented based on the frequency domain signal, and mainly includes the following steps:
  • Step 1: Perform a time-frequency transform on the left and right channel input time domain signals to obtain the frequency domain signals of the left and right channels.
  • The time-frequency transform can be performed by using the following formula:
  • x_L(n) and x_R(n) are the time domain signals of the left and right channels, respectively
  • Length is the frame length or subframe length
  • L is the length of the time-frequency transform.
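  • The transform formula for Step 1 does not appear in the text above. As a minimal sketch, assuming it is an ordinary length-L discrete Fourier transform of a frame of Length samples (zero-padded when Length is smaller than L), the step could be written in Python with NumPy as follows. The function name `time_to_freq` and the sample frames are illustrative assumptions, not part of the source.

```python
import numpy as np

def time_to_freq(x_left, x_right, L):
    """Transform one frame of each channel to the frequency domain.

    Assumption: a plain length-L DFT; frames shorter than L are zero-padded.
    """
    X_left = np.fft.fft(x_left, n=L)
    X_right = np.fft.fft(x_right, n=L)
    return X_left, X_right

# Example: frames of Length = 4 samples transformed with L = 8.
frame_l = np.array([1.0, 0.0, 0.0, 0.0])
frame_r = np.array([0.0, 1.0, 0.0, 0.0])   # same impulse, one sample later
X_l, X_r = time_to_freq(frame_l, frame_r, L=8)
```

  • In this example the one-sample delay of the second frame shows up only as a phase rotation of its spectrum, which is the property the phase parameters rely on.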
  • Step 2: Extract phase parameters based on the frequency domain signals of the left and right channels.
  • Step 2 can be subdivided into the following steps:
  • Step 2.1: Based on formula (3), calculate the IPD parameter of each frequency point (frequency bin) in the preset range [k1, k2]:
  • $\mathrm{IPD}(k) = \angle\left(L(k)\,R^{*}(k)\right),\quad k_1 \le k \le k_2 \qquad (3)$
  • L(k) and R(k) are the k-th frequency point values of the left and right channel frequency domain signals, respectively; each frequency point value includes a real part and an imaginary part.
  • R*(k) represents the conjugate of the k-th frequency point value of the right channel frequency domain signal. The real and imaginary parts of L(k) and R(k) may be constructed based on X_L(k) and X_R(k), as described in the prior art.
  • Step 2.2: Calculate the inter-channel time difference of each frequency point based on formula (4):
  • L is the time-frequency transform length used when converting the time domain signals of the left and right channels into the frequency domain signals of the left and right channels, and π is the mathematical constant pi.
  • Step 2.3: Perform statistical processing on ITD(k) to obtain the ITD parameter.
  • The number N_pos of positive ITD(k) values and the number N_neg of negative ITD(k) values can be counted. The mean M_pos and variance V_pos of the positive ITD(k) values and the mean M_neg and variance V_neg of the negative ITD(k) values are then calculated separately. Finally, the ITD parameter of the current frame/subframe is obtained according to N_pos, N_neg, M_pos, M_neg, V_pos and V_neg. For example, when N_pos > N_neg, if V_pos ≤ V_neg, the ITD parameter is M_pos rounded up.
  • Step 2.4: Perform statistical processing on IPD(k) to obtain the IPD parameter.
  • The mean of IPD(k) over the range [k1, k2] can be calculated using the following formula:
  • The average of the IPD means of 6 consecutive frames, including the current frame, can then be calculated as the IPD parameter of the current frame:
  • The terms of this average are the IPD mean of the current frame, the IPD mean of the frame immediately preceding the current frame, the IPD mean of the frame before that, and so on.
  • Step 3: Perform quantization processing on the extracted phase parameters.
  • To reduce the bit rate, the ITD parameter is quantized when the ITD parameter is not 0, and the IPD parameter is quantized when the ITD parameter is 0.
  • the decoder can recover the stereo phase information by combining the mono signal and the decoded phase parameters.
  • The prior art calculates the ITD based on the IPD. However, for a signal with a large delay, the IPD exceeds the range of 2π. If the ITD parameter is still extracted in the prior-art manner, the calculated phase parameters are inaccurate, which degrades the decoded audio quality.
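  • This limitation can be reproduced numerically. The sketch below assumes that formula (4) has the form ITD(k) = IPD(k)·L/(2πk), which follows from the phase of a pure delay but is not spelled out in the text above; the transform length, the random test signal and the helper name `per_bin_itd` are likewise illustrative assumptions.

```python
import numpy as np

L = 64                                   # transform length (assumed value)
rng = np.random.default_rng(0)
x_left = rng.standard_normal(L)

def per_bin_itd(delay):
    """Per-bin ITD from the wrapped IPD when the right channel lags by `delay` samples."""
    x_right = np.roll(x_left, delay)     # circular delay keeps the example exact
    XL, XR = np.fft.fft(x_left), np.fft.fft(x_right)
    k = np.arange(1, L // 2)             # skip k = 0 to avoid dividing by zero
    ipd = np.angle(XL[k] * np.conj(XR[k]))   # formula (3), wrapped to (-pi, pi]
    return ipd * L / (2 * np.pi * k)     # assumed form of formula (4)

small = per_bin_itd(1)    # the phase never wraps for a one-sample delay
large = per_bin_itd(20)   # the phase wraps at all but the lowest frequency point
```

  • With a delay of 1 sample every frequency point yields the same ITD, whereas with a delay of 20 samples only the lowest frequency point stays within the unwrapped range and the remaining values are no longer consistent with the true delay.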
  • the present application provides an encoding method and an encoder for a multi-channel signal to accurately extract phase parameters of a multi-channel signal and improve encoding quality of the multi-channel signal.
  • In a first aspect, a method for encoding a multi-channel signal is provided, including: acquiring a multi-channel signal; generating a first target frequency domain signal according to the multi-channel signal, wherein a phase of the first target frequency domain signal is linearly related to the IPD of the multi-channel signal; performing a frequency-time transform on the first target frequency domain signal to obtain a first target time domain signal; determining an ITD parameter of the multi-channel signal according to the first target time domain signal and a preset time domain signal peak condition; and encoding the ITD parameter of the multi-channel signal.
  • Because the phase of the first target frequency domain signal is linearly related to the IPD, the maximum value of the first target time domain signal is located at the ITD, so the ITD parameter obtained from the first target time domain signal is not affected by whether the IPD of the multi-channel signal exceeds the 2π range.
  • Generating the first target frequency domain signal according to the multi-channel signal comprises: acquiring a first frequency domain signal from the multi-channel signal, wherein the first frequency domain signal is the signal of the multi-channel signal within a first frequency domain range; and generating the first target frequency domain signal according to the first frequency domain signal.
  • Determining the ITD parameter of the multi-channel signal according to the first target time domain signal and the preset time domain signal peak condition includes: if the first target time domain signal satisfies the peak condition, determining the ITD parameter of the multi-channel signal according to the first target time domain signal; if the first target time domain signal does not satisfy the peak condition, acquiring a second frequency domain signal from the multi-channel signal, wherein the second frequency domain signal is the signal of the multi-channel signal within a second frequency domain range, the second frequency domain range being different from the first frequency domain range, and determining the ITD parameter of the multi-channel signal according to the second frequency domain signal.
  • the solution flexibly selects the ITD parameter determination mode of the multi-channel signal according to the peak characteristic of the first target time domain signal.
  • Determining the ITD parameter of the multi-channel signal according to the second frequency domain signal includes: generating a second target frequency domain signal according to the second frequency domain signal, wherein a phase of the second target frequency domain signal is linearly related to the IPD of the multi-channel signal; performing a frequency-time transform on the second target frequency domain signal to obtain a second target time domain signal; and determining the ITD parameter of the multi-channel signal according to the second target time domain signal.
  • Performing the frequency-time transform on the second target frequency domain signal to obtain the second target time domain signal includes: performing a frequency-time transform on the part of the second target frequency domain signal that lies outside the first frequency domain range to obtain a third target time domain signal, where the second frequency domain range includes the first frequency domain range; and superimposing the first target time domain signal and the third target time domain signal to obtain the second target time domain signal.
  • Calculating the second target time domain signal by reusing the already calculated first target time domain signal saves computation and improves coding efficiency.
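  • The superposition works because the frequency-time transform is linear. The following sketch checks the identity with NumPy; the transform length, the random spectrum and the choice of the lowest quarter of the bins as the first frequency domain range are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 32
target_freq = rng.standard_normal(L) + 1j * rng.standard_normal(L)

# Assumed split: the first range is the lowest quarter of the bins,
# and the second range is the full band (so it includes the first).
in_first_range = np.zeros(L, dtype=bool)
in_first_range[: L // 4] = True

first_time = np.fft.ifft(np.where(in_first_range, target_freq, 0))  # computed earlier
third_time = np.fft.ifft(np.where(in_first_range, 0, target_freq))  # remaining bins only
second_time = first_time + third_time                               # superposition

full_time = np.fft.ifft(target_freq)   # direct transform of the whole second range
```

  • The sketch only verifies that the superposed signal equals the directly transformed one. Any actual saving depends on a transform implementation that exploits the zeroed bins, which `np.fft.ifft` does not do here.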
  • Determining the ITD parameter of the multi-channel signal according to the first target time domain signal includes: selecting a target sampling point from the N sampling points of the first target time domain signal, where the target sampling point is the sampling point with the largest sample value among the N sampling points, and N represents the number of sampling points of the first target time domain signal; and determining the ITD parameter of the multi-channel signal according to the index value corresponding to the target sampling point, wherein the index value indicates the position of the target sampling point among the N sampling points.
  • Determining the ITD parameter of the multi-channel signal according to the index value corresponding to the target sampling point includes: determining the index value corresponding to the target sampling point as the ITD parameter of the multi-channel signal.
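  • The steps of the first aspect can be sketched as follows. This is a minimal illustration rather than the claimed implementation: it assumes the target frequency domain signal is the amplitude-normalized product of one channel's spectrum with the conjugate of the other's, it omits the peak condition and quantization, and the function name `estimate_itd`, the mapping of upper-half indices to negative lags and the sign convention are assumptions of the sketch.

```python
import numpy as np

def estimate_itd(x_first, x_second):
    """Estimate the ITD in samples from the peak of the target time domain signal."""
    L = len(x_first)
    X1, X2 = np.fft.fft(x_first), np.fft.fft(x_second)
    cross = X1 * np.conj(X2)                     # phase of each bin equals IPD(k)
    target_freq = cross / np.maximum(np.abs(cross), 1e-12)   # unit amplitude
    target_time = np.real(np.fft.ifft(target_freq))          # frequency-time transform
    peak = int(np.argmax(target_time))           # index of the largest sample
    lag = peak if peak < L // 2 else peak - L    # assumption: upper half = negative lags
    return -lag   # sign chosen so that a positive value means x_second lags x_first

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
itd = estimate_itd(x, np.roll(x, 20))   # second channel lags by 20 samples
```

  • Unlike a per-frequency-point estimate derived from the wrapped IPD, the peak position here recovers the 20-sample delay, because the wrapping of individual phases does not move the peak of the time domain signal.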
  • In a second aspect, a method for encoding a multi-channel signal is provided, including: acquiring a multi-channel signal; generating a first target frequency domain signal according to the multi-channel signal, wherein the first target frequency domain signal is located in a first frequency domain range, and the phase of the first target frequency domain signal is linearly related to the IPD of the multi-channel signal; performing a frequency-time transform on the first target frequency domain signal to obtain a first target time domain signal; determining, according to the first target time domain signal, whether the multi-channel signal includes an inverted signal; in the case that the multi-channel signal does not include an inverted signal, generating a second target frequency domain signal according to the multi-channel signal, wherein the second target frequency domain signal is located in a second frequency domain range, the second frequency domain range is different from the first frequency domain range, and a phase of the second target frequency domain signal is linearly related to the IPD of the multi-channel signal; performing a frequency-time transform on the second target frequency domain signal to obtain a second target time domain signal; and determining the ITD parameter of the multi-channel signal according to the second target time domain signal.
  • Because the phase of the first target frequency domain signal is linearly related to the IPD, the maximum value of the first target time domain signal is located at the ITD, so the ITD parameter obtained from the first target time domain signal is not affected by whether the IPD of the multi-channel signal exceeds the 2π range.
  • Performing the frequency-time transform on the second target frequency domain signal to obtain the second target time domain signal includes: performing a frequency-time transform on the part of the second target frequency domain signal that lies outside the first frequency domain range to obtain a third target time domain signal, where the second frequency domain range includes the first frequency domain range; and superimposing the first target time domain signal and the third target time domain signal to obtain the second target time domain signal.
  • The method further includes: in a case where the multi-channel signal includes an inverted signal, determining an IPD parameter of the multi-channel signal; and encoding the IPD parameter.
  • an encoder comprising means for performing the steps of the encoding method of the multi-channel signal in the first aspect.
  • an encoder comprising means capable of performing the respective steps of the encoding method of the multi-channel signal in the second aspect.
  • an encoder comprising a memory and a processor, where the memory is used to store a program and the processor is used to execute the program; when the program is executed, the processor performs the method of the first aspect.
  • an encoder comprising a memory and a processor, where the memory is used to store a program and the processor is used to execute the program; when the program is executed, the processor performs the method of the second aspect.
  • Generating the first or second target frequency domain signal according to the multi-channel signal comprises: determining the amplitude of the first or second target frequency domain signal according to the multi-channel signal; determining the IPD parameter of the multi-channel signal according to the multi-channel signal; and generating the first or second target frequency domain signal according to the amplitude of the first or second target frequency domain signal and the IPD parameter of the multi-channel signal.
  • Determining the amplitude of the first or second target frequency domain signal according to the multi-channel signal includes: determining the amplitude by a formula in which A_M(k) represents the amplitude of the first or second target frequency domain signal, A_1(k) and A_2(k) respectively represent the amplitudes of the frequency domain signals of any two channels of the multi-channel signal, k represents the frequency point, 0 ≤ k ≤ L/2, and L represents the time-frequency transform length used when transforming the multi-channel signal from the time domain to the frequency domain.
  • Generating the first or second target frequency domain signal according to the amplitude of the first or second target frequency domain signal and the IPD parameter of the multi-channel signal includes: determining the first or second target frequency domain signal by a formula in which A_M(k) represents the amplitude of the first or second target frequency domain signal, X_M_real(k) represents the real part of the first or second target frequency domain signal, X_M_image(k) represents the imaginary part of the first or second target frequency domain signal, IPD(k) represents the IPD parameter of the multi-channel signal, k represents the frequency point, 0 ≤ k ≤ L/2, and L represents the time-frequency transform length used when transforming the multi-channel signal from the time domain to the frequency domain.
  • X_1(k) represents the frequency domain signal of a first channel of the multi-channel signal, X_2^*(k) represents the conjugate of the frequency domain signal of a second channel of the multi-channel signal, and k represents the frequency point; the amplitude of the frequency domain signal X_M(k) is normalized to obtain the first or second target frequency domain signal.
  • Before the ITD parameter of the multi-channel signal is determined according to the first or second target time domain signal, the method further comprises: smoothing the amplitude of the first or second target time domain signal.
  • the first or second target frequency domain signal can be a cross-correlation signal of the multi-channel signal.
  • the phase of the first or second target frequency domain signal is a multi-channel signal IPD.
  • the frequency domain signal may be represented by a complex number
  • the complex number may be represented by amplitude and phase
  • the phase of the target frequency domain signal may refer to the phase of the complex numbers that represent the target frequency domain signal.
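  • The two ways of generating the target frequency domain signal can be compared in a short sketch. The formulas themselves are not reproduced in the text above, so the sketch assumes that the real part is A_M(k)·cos(IPD(k)), that the imaginary part is A_M(k)·sin(IPD(k)), and that X_M(k) is the product X_1(k)·X_2^*(k); the function names and the random test spectra are hypothetical.

```python
import numpy as np

def target_from_amplitude_and_ipd(amplitude, ipd):
    """Assemble the real and imaginary parts from A_M(k) and IPD(k) (assumed form)."""
    real = amplitude * np.cos(ipd)
    imag = amplitude * np.sin(ipd)
    return real + 1j * imag

def target_from_cross_spectrum(X1, X2):
    """Normalize X1(k) * conj(X2(k)) to unit amplitude (assumed form of X_M(k))."""
    cross = X1 * np.conj(X2)
    return cross / np.maximum(np.abs(cross), 1e-12)

rng = np.random.default_rng(2)
X1 = np.fft.fft(rng.standard_normal(16))
X2 = np.fft.fft(rng.standard_normal(16))
ipd = np.angle(X1 * np.conj(X2))

a = target_from_amplitude_and_ipd(np.ones(16), ipd)   # unit amplitude chosen for comparison
b = target_from_cross_spectrum(X1, X2)
```

  • Under these assumptions both constructions give the same complex signal when the amplitude is set to 1: in each case the phase of every frequency point is the IPD, and only the choice of amplitude differs.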
  • FIG. 3 is a flow chart of encoding of stereo phase parameters in the prior art.
  • FIG. 4 is a schematic flowchart of a method for encoding a multi-channel signal according to an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of a method for encoding a multi-channel signal according to an embodiment of the present invention.
  • FIG. 6 is a schematic flowchart of a method for encoding a multi-channel signal according to an embodiment of the present invention.
  • Figure 7 is a schematic diagram of time domain signal synthesis.
  • FIG. 8 is a schematic flowchart of a method for encoding a multi-channel signal according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of an encoder according to an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of an encoder according to an embodiment of the present invention.
  • Figure 11 is a schematic structural diagram of an encoder according to an embodiment of the present invention.
  • Figure 12 is a schematic structural diagram of an encoder according to an embodiment of the present invention.
  • The following takes as an example the case where the signal picked up by the first microphone is the first channel signal and the signal picked up by the second microphone is the second channel signal:
  • The ILD describes the intensity difference between the first channel signal and the second channel signal. If the ILD is greater than 0, the energy of the first channel signal is higher than the energy of the second channel signal; if the ILD is equal to 0, the energy of the first channel signal is equal to the energy of the second channel signal; if the ILD is less than 0, the energy of the first channel signal is lower than the energy of the second channel signal;
  • The ITD describes the time difference between the first channel signal and the second channel signal, that is, the difference between the times at which the sound from the source reaches the first microphone and the second microphone. If the ITD is greater than 0, the sound reaches the first microphone earlier than the second microphone; if the ITD is equal to 0, the sound reaches the first microphone and the second microphone at the same time; if the ITD is less than 0, the sound reaches the first microphone later than the second microphone;
  • the IPD describes the phase difference between the first channel signal and the second channel signal, which is usually combined with the ITD parameter so that the decoder recovers the phase information of the multi-channel signal.
  • the ITD parameter and the IPD parameter in the embodiment of the present invention may be a Group Inter-channel Time Difference (G_ITD) and a Group Inter-channel Phase Difference (G_IPD), wherein G_ITD may also be referred to as group delay, and G_IPD may also be referred to as group phase.
  • G_ITD Group Inter-channel Time Difference
  • G_IPD Group Inter-channel Phase Difference
  • FIG. 4 is a schematic flowchart of a method for encoding a multi-channel signal according to an embodiment of the present invention.
  • the method of Figure 4 includes:
  • the multi-channel signal can include the signal of the first channel and the signal of the second channel; in some embodiments, the signal of the first channel can be the signal of the left channel, the second channel The signal can be the signal of the right channel.
  • the multi-channel signal can be a multi-channel time domain signal or a multi-channel frequency domain signal.
  • the first target frequency domain signal can be a cross-correlation signal of a multi-channel frequency domain signal.
  • The phase of the first target frequency domain signal is linearly related to the IPD of the multi-channel signal; in some embodiments, the phase of the first target frequency domain signal is the IPD of the multi-channel signal, that is, the linear scale factor is 1.
  • the implementation manner of the step 420 is not limited in the embodiment of the present invention, and will be described in detail later with reference to specific embodiments.
  • The first target frequency domain signal may be frequency-time transformed as a whole to obtain the first target time domain signal; in some embodiments, only a portion of the first target frequency domain signal may be frequency-time transformed to obtain the first target time domain signal, which can reduce the amount of calculation and improve the coding efficiency.
  • the manner of selecting a part of the frequency domain signal in the target frequency domain signal is not specifically limited.
  • The spectral range of the target frequency domain signal may be [0, F].
  • The selected partial frequency domain signal may be a low frequency portion of the target frequency domain signal, such as [0, F/2], [3, F/4] or [F/4, F/2] of the target frequency domain signal. This is because, for a stable signal, the result (that is, the ITD parameter of the multi-channel signal) obtained from the low frequency portion of the signal differs little from the result obtained from the entire spectrum of the signal.
  • Step 440 may include: if the first target time domain signal satisfies the peak condition, determining the ITD parameter of the multi-channel signal according to the first target time domain signal; if the first target time domain signal does not satisfy the peak condition, acquiring a second frequency domain signal from the multi-channel signal, wherein the second frequency domain signal is the signal of the multi-channel signal within a second frequency domain range, and the second frequency domain range is different from the first frequency domain range (for example, the second frequency domain range may include the first frequency domain range); and determining the ITD parameter of the multi-channel signal according to the second frequency domain signal.
  • the value range of the first frequency domain range and the second frequency domain range is not specifically limited in the embodiment of the present invention.
  • The first frequency domain range may be [0, F/2], that is, the first frequency domain range includes the low frequency band portion of the multi-channel signal; the second frequency domain range may be [0, F], that is, the second frequency domain range includes the entire frequency band of the multi-channel signal.
  • The peak condition may be that the highest peak of the first target time domain signal is greater than a preset threshold. In some embodiments, the peak condition may be that the difference between the highest peak and the second highest peak of the first target time domain signal is greater than a preset threshold.
  • Through the peak condition, it can be determined whether the ITD parameter of the multi-channel signal obtained based on the first target time domain signal is accurate. If it is accurate, the ITD parameter of the multi-channel signal can be determined according to the first target time domain signal; if it is not accurate, the ITD parameter of the multi-channel signal can be determined using the second target time domain signal in the second frequency domain range.
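  • Both variants of the peak condition can be expressed in a few lines. The sketch below is illustrative only: the function name `satisfies_peak_condition`, the `use_gap` switch and the threshold values are assumptions, and for simplicity the two largest sample values stand in for the highest and second highest peaks.

```python
import numpy as np

def satisfies_peak_condition(target_time, threshold, use_gap=False):
    """Return True if the target time domain signal has a sufficiently clear peak.

    use_gap=False: the highest value must exceed `threshold`.
    use_gap=True:  the gap between the two highest values must exceed `threshold`.
    Simplification: the two largest samples stand in for the two highest peaks.
    """
    ordered = np.sort(np.asarray(target_time, dtype=float))
    highest, second = ordered[-1], ordered[-2]
    if use_gap:
        return bool(highest - second > threshold)
    return bool(highest > threshold)

clear = [0.1, 0.9, 0.2]       # one dominant sample
ambiguous = [0.1, 0.9, 0.8]   # two samples of similar height
```

  • With a threshold of 0.5, `clear` passes both variants, while `ambiguous` passes the absolute test but fails the gap test, which is the case in which the method falls back to the second frequency domain range.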
  • the ITD parameters of a multi-channel signal can be quantized.
  • the method of FIG. 4 may further include transmitting an ITD parameter of the encoded multi-channel signal to the decoding end.
  • Because the phase of the first target frequency domain signal is linearly related to the IPD, the maximum value of the first target time domain signal is located at the ITD, so the ITD parameter obtained from the first target time domain signal is not affected by whether the IPD of the multi-channel signal exceeds the 2π range.
  • FIG. 5 is a schematic flowchart of a method for encoding a multi-channel signal according to an embodiment of the present invention.
  • the method of Figure 5 includes:
  • the first target frequency domain signal may be located within the first frequency domain.
  • the first target time domain signal can be a cross-correlation signal of a signal of the multi-channel signal in the first frequency domain.
  • the phase of the first target frequency domain signal can be linearly related to the IPD of the multi-channel signal.
  • the phase of the first target frequency domain signal can be the IPD of the multi-channel signal.
  • the first target frequency domain signal may be subjected to frequency-time transform as a whole; or part of the frequency domain signal in the first target frequency domain signal may be frequency-time-transformed, thereby saving computation and improving coding efficiency.
  • If the phase difference between two signals is 180 degrees, the two signals may be referred to as inverted signals.
  • Whether the multi-channel signal includes an inverted signal in step 540 may refer to whether the multi-channel signal contains two signals with a phase difference of 180 degrees.
  • the manner of determining the inversion signal may be various, which is not specifically limited in the embodiment of the present invention.
  • Step 540 may include: determining an initial ITD parameter of the multi-channel signal according to the index value corresponding to the target sampling point of the first target time domain signal, where the target sampling point is the sampling point with the largest sample value among the sampling points of the first target time domain signal; determining that the multi-channel signal includes an inverted signal if the initial ITD parameter is less than a preset threshold; and determining that the multi-channel signal does not include an inverted signal if the initial ITD parameter is greater than the preset threshold.
  • Determining the initial ITD parameter of the multi-channel signal according to the index value corresponding to the target sampling point of the first target time domain signal may include: determining the index value corresponding to the target sampling point of the first target time domain signal as the initial ITD parameter of the multi-channel signal.
  • If the multi-channel signal does not include an inverted signal, a second target frequency domain signal is generated according to the multi-channel signal, where the second target frequency domain signal is located in the second frequency domain range, and the second frequency domain range is different from the first frequency domain range (for example, the second frequency domain range may include the first frequency domain range).
  • Step 550 can include: extracting a frequency domain signal in the second frequency domain range from the multi-channel signal; and generating the second target frequency domain signal according to the multi-channel frequency domain signal in the second frequency domain range (for example, obtaining a cross-correlation signal of the signals of the multi-channel signal in the second frequency domain range as the second target frequency domain signal).
  • The second target frequency domain signal may be subjected to a frequency-time transform as a whole to obtain the second target time domain signal; alternatively, only part of the second target frequency domain signal may be subjected to a frequency-time transform to obtain the second target time domain signal, which can reduce computational complexity and improve coding efficiency.
  • the amplitude of the second target time domain signal may be smoothed prior to performing step 570.
  • The ITD parameter of the multi-channel signal may be determined according to the index value corresponding to the target sampling point of the second target time domain signal, where the target sampling point of the second target time domain signal is the sampling point with the largest sample value in the second target time domain signal. For example, the index value corresponding to the target sampling point of the second target time domain signal may be determined as the ITD parameter of the multi-channel signal.
  • the specific manner of determining the IPD parameter of the multi-channel signal is not limited in the embodiment of the present invention, and may be determined, for example, in the manner described in the formula (3).
  • The following takes the case where the multi-channel signal consists of a left channel signal and a right channel signal as an example, but the embodiment of the present invention is not limited thereto.
  • Embodiments of the present invention can be used to process any two or more channels, and the left and right channels below can be any two of those channels.
  • In the following, whether the multi-channel signal includes an inverted signal is determined by comparing the initial ITD parameter T_1, obtained based on the first target time domain signal, with the preset threshold TH_1 (the preset threshold may take a value in the range [1, 4], for example 3), but the embodiment of the present invention is not limited thereto.
  • In practice, any prior-art manner of detecting an inverted signal may be used to determine whether the multi-channel signal includes an inverted signal.
  • FIG. 6 is a schematic flowchart of a method for encoding a multi-channel signal according to an embodiment of the present invention.
  • the initial ITD parameter T 1 of the multi-channel signal is extracted in the first frequency domain range based on the mixed domain; when T 1 ⁇ TH1, the ITD parameter of the multi-channel signal is further calculated in the second frequency domain range based on the mixed domain.
  • the embodiment of the present invention does not specifically limit the relationship between the second frequency domain range and the first frequency domain range. For example, the two may be separated from each other, may overlap, or may be included in each other.
  • FIG. 6 is described by taking the case where the second frequency domain range includes the first frequency domain range as an example. It should be understood that the processing steps or operations illustrated in FIG. 6 are merely examples, and that embodiments of the present invention may also perform other operations or variations of the operations in FIG. 6. Moreover, the steps in FIG. 6 may be performed in a different order than that presented in FIG. 6, and not all operations in FIG. 6 necessarily have to be performed.
  • Figure 6 mainly includes the following steps:
  • the FFT transformation can be performed using the following formula:
  • x L (n) and x R (n) are the time domain signals of the left and right channels, respectively, k represents the frequency point, Length represents the frame length or subframe length, and L represents the length of the time-frequency transform.
  • the frequency domain signal obtained after the FFT is a complex signal with a real part and an imaginary part. For the frequency domain signal of the left channel, the real part is X L_real (k) and the imaginary part is X L_image (k); for the frequency domain signal of the right channel, the real part is X R_real (k) and the imaginary part is X R_image (k).
  • the values of the real part and the imaginary part can be calculated as follows:
  • the obtained frequency domain signal includes 256 frequency points, where the 256th frequency point corresponds to the 8 kHz spectrum, the 128th frequency point corresponds to the 4 kHz spectrum, and so on.
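The mapping from frequency point to frequency described here is linear. A minimal sketch, assuming the 256 frequency points span 0 to 8 kHz; the function name `bin_to_hz` and its default arguments are illustrative and do not come from the source:

```python
def bin_to_hz(k, num_bins=256, max_hz=8000.0):
    """Map frequency point k to Hz, assuming point num_bins sits at max_hz."""
    return k * max_hz / num_bins

print(bin_to_hz(256))  # 8000.0
print(bin_to_hz(128))  # 4000.0
```

Under the same assumption, a frequency range such as [k3, k4] can be converted to Hz by applying the function to both endpoints.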
  • the amplitude of the first target frequency domain signal and the IPD of the left and right channel signals may be calculated first, and the first target frequency domain signal is then constructed from that amplitude and that IPD.
  • the amplitude of the first target frequency domain signal A M (k) may be calculated in the first frequency domain range [k3, k4] using the following formula, wherein k3 and k4 may be between 0 and L/2 :
  • the amplitude of the left channel frequency domain signal can be calculated by the following formula:
  • the amplitude of the right channel frequency domain signal can be calculated by the following formula:
  • the IPD of the left and right channel signals can be calculated using the following formula:
  • after the amplitude of the first target frequency domain signal and the IPD of the left and right channel signals are calculated, the first target frequency domain signal can be constructed using the following formula:
  • one of the frequency domain signals of the left and right channels may be directly multiplied by the conjugate of the other frequency domain signal to obtain a first target frequency domain signal.
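As a hedged illustration of the two constructions, the sketch below builds one frequency point of the target signal from an amplitude and an IPD, and compares it with the direct conjugate product. The amplitude formula is not reproduced in this text, so the choice A_M(k) = |X_L(k)| · |X_R(k)| is an assumption made only so that the two routes coincide; the function names are hypothetical.

```python
import cmath

def ipd(x_left, x_right):
    """Inter-channel phase difference at one frequency point: angle(L * conj(R))."""
    return cmath.phase(x_left * x_right.conjugate())

def target_from_amp_and_ipd(amp, phase):
    """Complex value with real part amp*cos(phase) and imaginary part amp*sin(phase)."""
    return cmath.rect(amp, phase)

x_left = cmath.rect(2.0, 0.9)    # example left-channel value at frequency point k
x_right = cmath.rect(0.5, 0.2)   # example right-channel value at frequency point k

amp = abs(x_left) * abs(x_right)             # assumed amplitude rule, not from the source
via_polar = target_from_amp_and_ipd(amp, ipd(x_left, x_right))
via_product = x_left * x_right.conjugate()   # direct conjugate product
print(abs(via_polar - via_product) < 1e-12)  # True
```

Building amplitude and phase separately keeps the phase equal to the IPD whatever amplitude rule is chosen, which is the property the later peak search depends on.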
  • the amplitude of the first target frequency domain signal may also be smoothed. This calculation method separately constructs the amplitude and phase of the first target frequency domain signal, which is relatively simple.
  • Step 630 may perform the frequency-time transform by using the Inverse Discrete Fourier Transform (IDFT) or the Inverse Fast Fourier Transform (IFFT), which is not specifically limited in the embodiment of the present invention.
  • the first target frequency domain signal may be windowed first:
  • k is the frequency point, 0 ≤ k ≤ L/2
  • L is the time-frequency transform length used when converting the time domain signals of the left and right channels into the frequency domain signals of the left and right channels.
  • IDFT transform is performed on the windowed signal to obtain a first target time domain signal:
  • n is the index value of the sampling point, 0 ⁇ n ⁇ L/2.
  • the obtained amplitude of the first target time domain signal can also be smoothed.
  • the amplitude of the first target time domain signal can be expressed by:
  • w 1 and w 2 are smoothing factors, which can be set as constants or can vary with the relative magnitudes of A(n).
  • the initial ITD parameter T 1 is compared with a preset threshold TH 1.
  • step 660 can be performed. It should be noted that the implementation of T 1 ⁇ TH 1 is not specifically limited in the embodiment of the present invention.
  • the IPD parameters may be extracted as shown in step 690, the ITD parameters may be extracted according to the prior art, or no processing may be performed.
  • Steps 660 to 670 are similar to steps 620 to 630; for details, refer to the processing of steps 620 to 630. The difference is that steps 660 to 670 are used to extract the ITD parameter of the multi-channel signal in the second frequency domain range, whereas steps 620 to 630 are used to extract the ITD parameter of the multi-channel signal in the first frequency domain range.
  • the first frequency domain range may be within the second frequency domain range, such as the first frequency domain range is [k3, k4], and the second frequency domain range is [k5, k6], where k5 ⁇ k3 , k6>k4.
  • the first frequency domain range can be [0, F/2], [0, F/4] or [F/4, F/2], that is, the first frequency domain range includes the low frequency band portion of the multi-channel signal; the second frequency domain range may be [0, F], that is, the second frequency domain range includes the entire frequency band of the multi-channel signal.
  • the first frequency domain range [k3, k4] includes n frequency points, and the second frequency domain range includes n+m+p frequency points, where m is the number of frequency points before the first frequency domain range and p is the number of frequency points after the first frequency domain range.
  • the calculation result for the first frequency domain range can be reused in the calculation for the second frequency domain range (i.e., in calculating the waveform of the second target time domain signal). That is, when calculating the second target time domain signal corresponding to the second frequency domain range, the time domain waveform corresponding to the first frequency domain range need not be recalculated; only the time domain waveform corresponding to the frequency domain ranges other than the first frequency domain range (i.e., the waveform of the third target time domain signal) needs to be calculated, and it is then superimposed on the first target time domain signal (the amplitudes of the time domain signals can be superimposed).
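This reuse relies on the linearity of the inverse transform: the time domain waveform of a wide band is the sum of the waveforms of its disjoint sub-bands. A minimal sketch, assuming a plain IDFT restricted to a set of frequency points; the helper `idft_over` and the example ranges are hypothetical:

```python
import cmath

def idft_over(spectrum, bins, length):
    """IDFT restricted to the given frequency points; returns complex samples."""
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / length) for k in bins) / length
        for n in range(length)
    ]

L = 16
spectrum = [cmath.rect(1.0 + 0.1 * k, 0.3 * k) for k in range(L // 2 + 1)]

first_range = range(2, 5)     # [k3, k4] = [2, 4]
second_range = range(0, 8)    # [k5, k6] = [0, 7], contains the first range
rest = [k for k in second_range if k not in first_range]

first = idft_over(spectrum, first_range, L)   # waveform that was already computed
third = idft_over(spectrum, rest, L)          # only the new frequency points
second = [a + b for a, b in zip(first, third)]

direct = idft_over(spectrum, second_range, L)
print(max(abs(a - b) for a, b in zip(second, direct)) < 1e-12)  # True
```

In this toy setup the saving is the n frequency points of the first range that do not have to be transformed again.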
  • the step 680 may specifically include: determining an index value corresponding to a sampling point where the sampling value of the second target time domain signal is the largest as an ITD parameter of the multi-channel signal.
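Putting the steps together, a small end-to-end sketch: form the conjugate product of the two channel spectra, transform it back to the time domain, and take the index of the largest sample as the ITD. This is an illustration under assumptions, not the embodiment itself: it uses a naive full-length DFT, a circular delay, the product order X_R(k) · conj(X_L(k)) so that a lag of the right channel gives a positive index, and no windowing or smoothing.

```python
import cmath
import random

def dft(x):
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

def idft_real(spec):
    n_pts = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * cmath.pi * k * n / n_pts)
                 for k in range(n_pts)) / n_pts).real
            for n in range(n_pts)]

def estimate_itd(left, right):
    """Index of the largest sample of IDFT{X_R(k) * conj(X_L(k))}."""
    x_l, x_r = dft(left), dft(right)
    target = [r * l.conjugate() for l, r in zip(x_l, x_r)]   # phase equals the IPD
    time_sig = idft_real(target)
    return max(range(len(time_sig)), key=lambda n: time_sig[n])

rng = random.Random(0)
left = [rng.uniform(-1.0, 1.0) for _ in range(64)]
delay = 5
right = left[-delay:] + left[:-delay]   # right = left delayed by 5 samples (circular)
print(estimate_itd(left, right))        # 5
```

With this circular formulation an index above half the length would correspond to a negative lag; how the embodiment maps indices to signed ITD values is not stated in this text.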
  • the multi-channel IPD parameters can be extracted using the IPD parameter extraction method described in FIG.
  • FIG. 8 is a schematic flowchart of a method for encoding a multi-channel signal according to an embodiment of the present invention. It should be understood that the processing steps or operations illustrated in FIG. 8 are merely examples, and that other operations of the present invention or variations of the various operations in FIG. 8 may be performed. Moreover, the various steps in FIG. 8 may be performed in a different order than that presented in FIG. 8, and it is possible that not all operations of FIG. 8 are to be performed.
  • Steps 810-850 are similar to steps 610-650. To avoid repetition, details are not described in detail. It should be understood that, in the embodiment of the present invention, step 820 may construct the first target frequency domain signal in all or part of the frequency domain of the left and right channel frequency domain signals, and is not limited to the first frequency domain range described in step 620. Further, in step 850, when T 1 ⁇ TH 1 , the initial ITD parameter T 1 can be directly determined as the ITD parameter of the multi-channel signal.
  • Steps 860 and 870 are similar to steps 690 and 695 in FIG. 6, respectively, and are not described in detail herein to avoid repetition.
  • FIG. 9 is a schematic structural diagram of an encoder according to an embodiment of the present invention.
  • the encoder 900 of FIG. 9 is capable of performing the various steps in FIG. 4, and to avoid repetition, it will not be described in detail herein.
  • Encoder 900 includes:
  • an obtaining unit 910, configured to acquire a multi-channel signal;
  • a generating unit 920, configured to generate, according to the multi-channel signal, a first target frequency domain signal, wherein the phase of the first target frequency domain signal is linearly related to the inter-channel phase difference (IPD) of the multi-channel signal;
  • a frequency-time transform unit 930, configured to perform frequency-time transform on the first target frequency domain signal to obtain a first target time domain signal;
  • a determining unit 940, configured to determine an inter-channel time difference (ITD) parameter of the multi-channel signal according to the first target time domain signal and a preset peak condition of the time domain signal;
  • an encoding unit 950, configured to encode the ITD parameter of the multi-channel signal.
  • the generating unit 920 is specifically configured to: acquire a first frequency domain signal from the multi-channel signal, where the first frequency domain signal is the signal of the multi-channel signal located in a first frequency domain range; and generate the first target frequency domain signal according to the first frequency domain signal. The determining unit 940 is specifically configured to: in a case where the first target time domain signal satisfies the peak condition, determine the ITD parameter of the multi-channel signal according to the first target time domain signal; in a case where the first target time domain signal does not satisfy the peak condition, acquire a second frequency domain signal from the multi-channel signal, where the second frequency domain signal is the signal of the multi-channel signal located in a second frequency domain range, and the second frequency domain range is different from the first frequency domain range; and determine the ITD parameter of the multi-channel signal according to the second frequency domain signal.
  • the determining unit 940 is specifically configured to: generate a second target frequency domain signal according to the second frequency domain signal, where the phase of the second target frequency domain signal is linearly related to the IPD of the multi-channel signal; perform frequency-time transform on the second target frequency domain signal to obtain a second target time domain signal; and determine the ITD parameter of the multi-channel signal according to the second target time domain signal.
  • the determining unit 940 is specifically configured to: perform frequency-time transform on the frequency domain signal in the second target frequency domain signal other than that in the first frequency domain range to obtain a third target time domain signal, where the second frequency domain range includes the first frequency domain range; and superimpose the first target time domain signal and the third target time domain signal to obtain the second target time domain signal.
  • the determining unit 940 is specifically configured to: select a target sampling point from the N sampling points of the first target time domain signal, where the target sampling point is the sampling point having the largest sampled value among the N sampling points, and N represents the number of sampling points of the first target time domain signal; and determine the ITD parameter of the multi-channel signal according to the index value corresponding to the target sampling point, where the index value indicates the position of the target sampling point among the N sampling points.
  • the determining unit 940 is specifically configured to determine the index value corresponding to the target sampling point as the ITD parameter of the multi-channel signal.
  • the generating unit 920 is specifically configured to: determine the amplitude of the first target frequency domain signal according to the multi-channel signal; determine the IPD parameter of the multi-channel signal according to the multi-channel signal; and generate the first target frequency domain signal according to the amplitude of the first target frequency domain signal and the IPD parameter of the multi-channel signal.
  • the generating unit 920 is specifically configured to determine the amplitude of the first target frequency domain signal according to a formula in which A M (k) represents the amplitude of the first target frequency domain signal, A 1 (k) and A 2 (k) respectively represent the amplitudes of the frequency domain signals of any two channels in the multi-channel signal, and k represents the frequency point.
  • the generating unit 920 is specifically configured to generate the first target frequency domain signal according to a formula in which A M (k) represents the amplitude of the first target frequency domain signal, X M_real (k) represents the real part of the first target frequency domain signal, X M_iamge (k) represents the imaginary part of the first target frequency domain signal, IPD(k) represents the IPD parameter of the multi-channel signal, and k represents the frequency point.
  • Normalizing the amplitude of the frequency domain signal may include: selecting a maximum amplitude from a magnitude of a frequency point of the frequency domain signal; and then dividing the amplitude of each frequency point of the frequency domain signal by the maximum amplitude, The amplitude after normalization of each frequency point is obtained.
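A minimal sketch of this normalization, assuming the frequency domain signal is held as a list of complex values; the function name and the handling of an all-zero input are assumptions, not taken from the source:

```python
def normalize_amplitude(spectrum):
    """Divide every frequency point by the largest amplitude; phases are unchanged."""
    peak = max(abs(x) for x in spectrum)
    if peak == 0.0:
        return list(spectrum)  # all-zero input: nothing to scale (assumed behaviour)
    return [x / peak for x in spectrum]

spec = [3 + 4j, 1 - 1j, -2j]
norm = normalize_amplitude(spec)
print([round(abs(x), 3) for x in norm])  # [1.0, 0.283, 0.4]
```

Dividing by a positive real number scales the amplitude only, so the phase of each frequency point, and therefore the location of the time domain peak, is preserved.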
  • FIG. 10 is a schematic structural diagram of an encoder according to an embodiment of the present invention.
  • the encoder 1000 of FIG. 10 is capable of performing the various steps in FIG. 4, and to avoid repetition, it will not be described in detail herein.
  • the encoder 1000 includes:
  • a memory 1010 configured to store a program
  • a processor 1020, configured to execute the program in the memory 1010; when the program is executed, the processor 1020: acquires a multi-channel signal; generates a first target frequency domain signal according to the multi-channel signal, where the phase of the first target frequency domain signal is linearly related to the inter-channel phase difference (IPD) of the multi-channel signal; performs frequency-time transform on the first target frequency domain signal to obtain a first target time domain signal; determines an inter-channel time difference (ITD) parameter of the multi-channel signal according to the first target time domain signal and a preset peak condition of the time domain signal; and encodes the ITD parameter of the multi-channel signal.
  • the processor 1020 is specifically configured to: acquire a first frequency domain signal from the multi-channel signal, where the first frequency domain signal is the signal of the multi-channel signal located in a first frequency domain range; generate the first target frequency domain signal according to the first frequency domain signal; in a case where the first target time domain signal satisfies the peak condition, determine the ITD parameter of the multi-channel signal according to the first target time domain signal; in a case where the first target time domain signal does not satisfy the peak condition, acquire a second frequency domain signal from the multi-channel signal, where the second frequency domain signal is located in a second frequency domain range, and the second frequency domain range is different from the first frequency domain range; and determine the ITD parameter of the multi-channel signal according to the second frequency domain signal.
  • the processor 1020 is specifically configured to: generate a second target frequency domain signal according to the second frequency domain signal, where the phase of the second target frequency domain signal is linearly related to the IPD of the multi-channel signal; perform frequency-time transform on the second target frequency domain signal to obtain a second target time domain signal; and determine the ITD parameter of the multi-channel signal according to the second target time domain signal.
  • the processor 1020 is specifically configured to: perform frequency-time transform on the frequency domain signal in the second target frequency domain signal other than that in the first frequency domain range to obtain a third target time domain signal, where the second frequency domain range includes the first frequency domain range; and superimpose the first target time domain signal and the third target time domain signal to obtain the second target time domain signal.
  • the processor 1020 is specifically configured to: select a target sampling point from the N sampling points of the first target time domain signal, where the target sampling point is the sampling point having the largest sampled value among the N sampling points, and N represents the number of sampling points of the first target time domain signal; and determine the ITD parameter of the multi-channel signal according to the index value corresponding to the target sampling point, where the index value indicates the position of the target sampling point among the N sampling points.
  • the processor 1020 is specifically configured to determine the index value corresponding to the target sampling point as the ITD parameter of the multi-channel signal.
  • the processor 1020 is specifically configured to: determine the amplitude of the first target frequency domain signal according to the multi-channel signal; determine the IPD parameter of the multi-channel signal according to the multi-channel signal; and generate the first target frequency domain signal according to the amplitude of the first target frequency domain signal and the IPD parameter of the multi-channel signal.
  • the processor 1020 is specifically configured to determine the amplitude of the first target frequency domain signal according to a formula in which A M (k) represents the amplitude of the first target frequency domain signal, A 1 (k) and A 2 (k) respectively represent the amplitudes of the frequency domain signals of any two channels in the multi-channel signal, and k represents the frequency point.
  • the processor 1020 is specifically configured to generate the first target frequency domain signal according to a formula in which A M (k) represents the amplitude of the first target frequency domain signal, X M_real (k) represents the real part of the first target frequency domain signal, X M_iamge (k) represents the imaginary part of the first target frequency domain signal, IPD(k) represents the IPD parameter of the multi-channel signal, and k represents the frequency point.
  • FIG. 11 is a schematic structural diagram of an encoder according to an embodiment of the present invention.
  • the encoder 1100 of FIG. 11 can implement the various steps in FIGS. 5 to 8. To avoid repetition, details are not described herein.
  • Encoder 1100 includes:
  • An obtaining unit 1110 configured to acquire a multi-channel signal
  • a first generating unit 1120 configured to generate, according to the multi-channel signal, a first target frequency domain signal, where the first target frequency domain signal is located in a first frequency domain range, and the phase of the first target frequency domain signal is linearly related to the inter-channel phase difference IPD of the multi-channel signal;
  • the first time-frequency transform unit 1130 is configured to perform frequency-time transform on the first target frequency domain signal to obtain a first target time domain signal;
  • a first determining unit 1140 configured to determine, according to the first target time domain signal, whether the multi-channel signal includes an inverted signal
  • a second generating unit 1150 configured to generate, in a case where the multi-channel signal does not include an inverted signal, a second target frequency domain signal according to the multi-channel signal, where the second target frequency domain signal is located in a second frequency domain range, the second frequency domain range is different from the first frequency domain range, and the phase of the second target frequency domain signal is linearly related to the IPD of the multi-channel signal;
  • the second time-frequency transform unit 1160 is configured to perform frequency-time transform on the second target frequency domain signal to obtain a second target time domain signal;
  • a second determining unit 1170 configured to determine an inter-channel time difference ITD parameter of the multi-channel signal according to the second target time domain signal
  • the first encoding unit 1180 is configured to encode an ITD parameter of the multi-channel signal.
  • a third determining unit 1190 configured to determine an IPD parameter of the multi-channel signal if the multi-channel signal includes an inverted signal
  • the second encoding unit 1195 is configured to encode an IPD parameter of the multi-channel signal.
  • the second time-frequency transform unit 1160 is specifically configured to perform frequency-time transform on the frequency domain signal in the second target frequency domain signal except the first frequency domain range, to obtain a third target time domain signal, wherein the second frequency domain range includes the first frequency domain range; superimposing the first target time domain signal and the third target time domain signal to obtain the second Target time domain signal.
  • Figure 12 is a schematic structural diagram of an encoder according to an embodiment of the present invention.
  • the encoder 1200 of FIG. 12 can implement the various steps in FIGS. 5-8, and to avoid repetition, it will not be described in detail herein.
  • the encoder 1200 includes:
  • a memory 1210 configured to store a program
  • a processor 1220 configured to execute the program in the memory 1210; when the program is executed, the processor 1220: acquires a multi-channel signal; generates a first target frequency domain signal according to the multi-channel signal, where the first target frequency domain signal is located in a first frequency domain range, and the phase of the first target frequency domain signal is linearly related to the inter-channel phase difference (IPD) of the multi-channel signal; performs frequency-time transform on the first target frequency domain signal to obtain a first target time domain signal; determines, according to the first target time domain signal, whether the multi-channel signal includes an inverted signal; in a case where the multi-channel signal does not include an inverted signal, generates a second target frequency domain signal according to the multi-channel signal, where the second target frequency domain signal is located in a second frequency domain range, the second frequency domain range is different from the first frequency domain range, and the phase of the second target frequency domain signal is linearly related to the IPD of the multi-channel signal; and performs frequency-time transform on the second target frequency domain signal to obtain a second target time domain signal.
  • the second time-frequency transform unit 1160 is specifically configured to perform frequency-time transform on the frequency domain signal in the second target frequency domain signal except the first frequency domain range, to obtain a third target time domain signal, wherein the second frequency domain range includes the first frequency domain range; superimposing the first target time domain signal and the third target time domain signal to obtain the second Target time domain signal.
  • the encoder 1100 further includes: a third determining unit, configured to determine an IPD parameter of the multi-channel signal if the multi-channel signal includes an inverted signal; and a second coding unit, configured to encode the IPD parameter.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division into units is only a division by logical function; in actual implementation there may be other division manners. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of software functional units and sold or used as a standalone product, they may be stored in a computer readable storage medium.
  • the technical solution of the present invention in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product; the software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • the storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Stereophonic System (AREA)

Abstract

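An encoding method and an encoder for a multi-channel signal. The method includes: acquiring a multi-channel signal (410); generating a first target frequency domain signal according to the multi-channel signal (420), where the phase of the first target frequency domain signal is linearly related to the IPD of the multi-channel signal; performing frequency-time transform on the first target frequency domain signal to obtain a first target time domain signal (430); determining an ITD parameter of the multi-channel signal according to the first target time domain signal and a preset peak condition of the time domain signal (440); and encoding the ITD parameter of the multi-channel signal (450). The method can improve the accuracy of multi-channel signal encoding.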

Description

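Encoding method and encoder for multi-channel signal
This application claims priority to Chinese Patent Application No. 201610304389.8, filed with the Chinese Patent Office on May 10, 2016 and entitled "Encoding method and encoder for multi-channel signal", which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of the present invention relate to the field of audio coding, and more particularly, to an encoding method and an encoder for a multi-channel signal.
Background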
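As the quality of life improves, the demand for high-quality audio keeps growing. Compared with mono audio, stereo audio conveys the orientation and distribution of each sound source, which improves the clarity, intelligibility, and sense of presence of the sound, and it is therefore widely favored.
Stereo processing techniques mainly include Mid/Side (MS) encoding, Intensity Stereo (IS) encoding, and Parametric Stereo (PS) encoding.
MS encoding applies a sum/difference transform to the two channel signals based on inter-channel correlation, so that the energy of the channels is concentrated mainly in the sum channel and inter-channel redundancy is removed. In MS encoding, the bit-rate saving depends on the correlation of the input signals; when the left and right channel signals are poorly correlated, the left channel signal and the right channel signal need to be transmitted separately. IS encoding simplifies the high-frequency components of the left and right signals, based on the property that the human auditory system is insensitive to the fine structure of phase differences in the high-frequency components of the channels (for example, components above 2 kHz). However, IS encoding is effective only for high-frequency components; extending it to low frequencies causes severe artifacts. PS encoding is based on a binaural hearing model. At the encoder, the stereo signal is converted into a mono signal and a small number of spatial parameters (or spatial perception parameters) that describe the spatial sound field, as shown in FIG. 1 (in FIG. 1, xL is the left channel time domain signal and xR is the right channel time domain signal). After obtaining the mono signal, the decoder further combines it with the spatial parameters to restore the stereo signal, as shown in FIG. 2. Compared with MS encoding, PS encoding has a high compression ratio, achieves a higher coding gain while maintaining good sound quality, can operate over the full audio bandwidth, and restores the spatial perception of stereo well.
In PS encoding, the spatial parameters include Inter-channel Coherence (IC), Inter-channel Level Difference (ILD), Inter-channel Time Difference (ITD), and Inter-channel Phase Difference (IPD). IC describes the cross-correlation or coherence between channels; it determines the perceived extent of the sound field and can improve the sense of space and the sound stability of the audio signal. ILD is used to distinguish the horizontal angle of a stereo source and describes the intensity difference between channels; it affects frequency components across the whole spectrum. ITD and IPD are spatial parameters that represent the horizontal position of a sound source and describe the time and phase differences between channels; they mainly affect frequency components below 2 kHz. ILD, ITD, and IPD determine the human ear's perception of the source position, can effectively determine the position of the sound field, and play an important role in restoring the stereo signal.
The phase parameters of stereo include the ITD parameter and the IPD parameter. For a two-channel signal, the ITD parameter can represent the time delay between the left and right channel signals of the stereo signal, and the IPD parameter can represent the waveform similarity of the left and right channel signals after time alignment.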
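FIG. 3 is a flowchart of encoding the phase parameters of stereo in the prior art. As can be seen from FIG. 3, in the prior art the ITD parameter and the IPD parameter are extracted from frequency domain signals, mainly through the following steps:
Step 1: Perform time-frequency transform on the input time domain signals of the left and right channels respectively to obtain the frequency domain signals of the left and right channels.
Specifically, the time-frequency transform can be performed using the following formulas: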
Figure PCTCN2016103594-appb-000001
Figure PCTCN2016103594-appb-000002
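where xL(n) and xR(n) are the time domain signals of the left and right channels respectively, Length is the frame length or subframe length, and L is the length of the time-frequency transform.
Step 2: Extract the phase parameters based on the frequency domain signals of the left and right channels.
Specifically, step 2 can be subdivided into the following steps:
Step 2.1: Based on formula (3), calculate the IPD parameter for each frequency bin within a preset range [k1, k2]:
IPD(k) = ∠(L(k) · R*(k)), k1 ≤ k ≤ k2    (3)
where k denotes the frequency bin; L(k) and R(k) are the values of the k-th frequency bin of the left and right channel frequency domain signals respectively, each value having a real part and an imaginary part; R*(k) denotes the conjugate of the value of the k-th frequency bin of the right channel frequency domain signal; and the real and imaginary parts of L(k) and R(k) can be constructed from XL(k) and XR(k), for which see the prior art.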
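Formula (3) can be sketched directly with complex arithmetic. The example below is illustrative only: the function name, the list-based spectra, and the synthetic test values are assumptions, and the result is the principal value returned by `cmath.phase`.

```python
import cmath

def ipd_per_bin(left_spec, right_spec, k1, k2):
    """IPD(k) = angle(L(k) * conj(R(k))) for k1 <= k <= k2, in radians."""
    return {k: cmath.phase(left_spec[k] * right_spec[k].conjugate())
            for k in range(k1, k2 + 1)}

# Synthetic spectra whose phase difference grows linearly with k (0.15 rad per bin).
left_spec = [cmath.rect(1.0, 0.10 * k) for k in range(8)]
right_spec = [cmath.rect(2.0, -0.05 * k) for k in range(8)]

ipd = ipd_per_bin(left_spec, right_spec, 2, 4)
print({k: round(v, 2) for k, v in ipd.items()})  # {2: 0.3, 3: 0.45, 4: 0.6}
```

The amplitudes of the two channels do not affect the result; only the phase of the conjugate product matters.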
Step 2.2: Based on formula (4), calculate the inter-channel time difference at each frequency bin:
Figure PCTCN2016103594-appb-000003
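where L is the time-frequency transform length used when transforming the time domain signals of the left and right channels into the frequency domain signals of the left and right channels, and π is the ratio of a circle's circumference to its diameter.
Step 2.3: Perform statistical processing on ITD(k) to obtain the ITD parameter.
Specifically, after ITD(k) is obtained over the range [k1, k2], the number Npos of positive ITD(k) values and the number Nneg of negative ITD(k) values can be counted, and the mean Mpos and variance Vpos of the positive values and the mean Mneg and variance Vneg of the negative values can then be calculated. Finally, the ITD parameter of the current frame/subframe is obtained from Npos, Nneg, Mpos, Mneg, Vpos, and Vneg. For example, when Npos > Nneg, if Vpos < Vneg, the ITD parameter is Mpos rounded up.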
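The statistics in step 2.3 can be sketched as follows. Only the single example rule quoted in the text is implemented; the remaining cases are not spelled out there, so the sketch returns `None` for them. The use of the population variance and the function names are assumptions.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def _variance(values):
    m = _mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)  # population variance (assumption)

def itd_from_statistics(itd_per_bin):
    """Apply the one rule quoted in the text; return None when it does not apply."""
    pos = [v for v in itd_per_bin if v > 0]
    neg = [v for v in itd_per_bin if v < 0]
    if pos and neg and len(pos) > len(neg) and _variance(pos) < _variance(neg):
        return math.ceil(_mean(pos))   # Mpos rounded up
    return None  # the remaining cases are not described in the text

print(itd_from_statistics([2.2, 2.4, 2.1, -5.0, -1.0]))  # 3
print(itd_from_statistics([2.0, -3.0]))                  # None
```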
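Step 2.4: Perform statistical processing on IPD(k) to obtain the IPD parameter.
First, the mean of IPD(k) over the range from k1 to k2 can be calculated using the following formula: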
Figure PCTCN2016103594-appb-000004
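Then, the mean of the IPD parameters over 6 consecutive frames, including the current frame, can be calculated as the IPD parameter of the current frame: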
Figure PCTCN2016103594-appb-000005
where
Figure PCTCN2016103594-appb-000006
is the mean of the IPD parameters of the frame immediately preceding the current frame,
Figure PCTCN2016103594-appb-000007
is the mean of the IPD parameters of the frame two frames before the current frame, and so on.
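The 6-frame averaging can be sketched with a fixed-length history. The class name and the start-up behaviour (averaging whatever frames are available before 6 have been seen) are assumptions; the text only describes the steady-state case.

```python
from collections import deque

class IpdAverager:
    """Mean of the per-frame IPD means over the most recent `frames` frames."""

    def __init__(self, frames=6):
        self._history = deque(maxlen=frames)  # oldest value is dropped automatically

    def update(self, frame_ipd_mean):
        self._history.append(frame_ipd_mean)
        # Fewer than `frames` values exist at start-up; averaging what is
        # available is an assumption, the text does not cover this case.
        return sum(self._history) / len(self._history)

avg = IpdAverager()
outputs = [avg.update(v) for v in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]]
print(outputs[5], outputs[6])  # 3.5 4.5
```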
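Step 3: Quantize the extracted phase parameters.
In the existing algorithm, to reduce the bit rate, the ITD parameter is quantized when it is not 0, and the IPD parameter is quantized when the ITD parameter is 0.
The decoder can combine the mono signal with the decoded phase parameters to restore the stereo phase information.
As can be seen from formula (4), the prior art calculates the ITD from the IPD. However, for a signal with a large delay, the IPD exceeds the range of 2π; if the ITD parameter is still extracted in the prior-art manner, the calculated phase parameters are inaccurate, which in turn degrades the quality of the decoded audio.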
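The failure mode can be illustrated numerically. Formula (4) is not reproduced in this text, so the sketch assumes the common relation ITD(k) = IPD(k) · L / (2πk) with a positive sign convention, and models the measured IPD as the principal value of the true phase; both are assumptions made for illustration.

```python
import math

def itd_from_wrapped_ipd(k, delay, length):
    """ITD recovered at frequency point k when only the wrapped phase is observable."""
    true_phase = 2.0 * math.pi * k * delay / length
    wrapped = math.atan2(math.sin(true_phase), math.cos(true_phase))  # principal value
    return wrapped * length / (2.0 * math.pi * k)

print(round(itd_from_wrapped_ipd(k=4, delay=10, length=512), 6))   # 10.0
print(round(itd_from_wrapped_ipd(k=40, delay=10, length=512), 6))  # -2.8
```

At k = 4 the true phase stays inside the principal interval and the 10-sample delay is recovered; at k = 40 the phase has wrapped and the per-bin estimate is wrong in both sign and size, which is the inaccuracy described for large delays.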
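Summary
This application provides an encoding method and an encoder for a multi-channel signal, so as to accurately extract the phase parameters of the multi-channel signal and improve the encoding quality of the multi-channel signal.
According to a first aspect, an encoding method for a multi-channel signal is provided, including: acquiring a multi-channel signal; generating a first target frequency domain signal according to the multi-channel signal, where the phase of the first target frequency domain signal is linearly related to the IPD of the multi-channel signal; performing frequency-time transform on the first target frequency domain signal to obtain a first target time domain signal; determining an ITD parameter of the multi-channel signal according to the first target time domain signal and a preset peak condition of the time domain signal; and encoding the ITD parameter of the multi-channel signal.
Because the phase of the constructed first target frequency domain signal is linearly related to the IPD of the multi-channel signal, the maximum of the first target time domain signal is located at the ITD. The ITD parameter obtained from the first target time domain signal is therefore unaffected by whether the IPD of the multi-channel signal exceeds the 2π range, and is relatively accurate.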
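A short worked derivation may make this argument concrete. It is a sketch under assumptions that are not stated in the source: one channel is a pure delay of the other by d samples, the sign convention is IPD(k) = -2πkd/L, the amplitudes A_M(k) are non-negative, and the frequency-time transform is an IDFT with a positive exponent.

```latex
X_M(k) = A_M(k)\, e^{\,j\,\mathrm{IPD}(k)}, \qquad
\mathrm{IPD}(k) = -\frac{2\pi k d}{L}
\;\;\Longrightarrow\;\;
x_M(n) = \frac{1}{L}\sum_{k} A_M(k)\, e^{\,j 2\pi k (n-d)/L},
\qquad
\lvert x_M(n)\rvert \;\le\; \frac{1}{L}\sum_{k} A_M(k) \;=\; x_M(d).
```

All terms add in phase only at n = d, so the largest sample sits at the delay. Because the complex exponential is 2π-periodic, the bound holds whether or not 2πkd/L leaves the principal phase interval, which is why the peak position is insensitive to phase wrapping.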
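With reference to the first aspect, in a first implementation of the first aspect, the generating a first target frequency domain signal according to the multi-channel signal includes: acquiring a first frequency domain signal from the multi-channel signal, where the first frequency domain signal is the signal of the multi-channel signal located in a first frequency domain range; and generating the first target time domain signal according to the first frequency domain signal. The determining an ITD parameter of the multi-channel signal according to the first target time domain signal and a preset peak condition of the time domain signal includes: in a case where the first target time domain signal satisfies the peak condition, determining the ITD parameter of the multi-channel signal according to the first target time domain signal; in a case where the first target time domain signal does not satisfy the peak condition, acquiring a second frequency domain signal from the multi-channel signal, where the second frequency domain signal is the signal of the multi-channel signal located in a second frequency domain range, and the second frequency domain range is different from the first frequency domain range; and determining the ITD parameter of the multi-channel signal according to the second frequency domain signal.
This solution flexibly selects the manner of determining the ITD parameter of the multi-channel signal according to the peak characteristics of the first target time domain signal.
With reference to the first implementation of the first aspect, in a second implementation of the first aspect, the determining the ITD parameter of the multi-channel signal according to the second frequency domain signal includes: generating a second target frequency domain signal according to the second frequency domain signal, where the phase of the second target frequency domain signal is linearly related to the IPD of the multi-channel signal; performing frequency-time transform on the second target frequency domain signal to obtain a second target time domain signal; and determining the ITD parameter of the multi-channel signal according to the second target time domain signal.
With reference to the first or second implementation of the first aspect, in a third implementation of the first aspect, the performing frequency-time transform on the second target frequency domain signal to obtain a second target time domain signal includes: performing frequency-time transform on the frequency domain signal in the second target frequency domain signal other than that in the first frequency domain range to obtain a third target time domain signal, where the second frequency domain range includes the first frequency domain range; and superimposing the first target time domain signal and the third target time domain signal to obtain the second target time domain signal.
Using the already calculated first target time domain signal to calculate the third target time domain signal saves computation and improves coding efficiency.
With reference to any one of the first to third implementations of the first aspect, in a fourth implementation of the first aspect, the determining the ITD parameter of the multi-channel signal according to the first target time domain signal includes: selecting a target sampling point from the N sampling points of the first target time domain signal, where the target sampling point is the sampling point having the largest sampled value among the N sampling points, and N represents the number of sampling points of the first target time domain signal; and determining the ITD parameter of the multi-channel signal according to the index value corresponding to the target sampling point, where the index value indicates the position of the target sampling point among the N sampling points.
With reference to the fourth implementation of the first aspect, in a fifth implementation of the first aspect, the determining the ITD parameter of the multi-channel signal according to the index value corresponding to the target sampling point includes: determining the index value corresponding to the target sampling point as the ITD parameter of the multi-channel signal.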
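According to a second aspect, an encoding method for a multi-channel signal is provided, including: acquiring a multi-channel signal; generating a first target frequency domain signal according to the multi-channel signal, where the first target frequency domain signal is located within a first frequency domain range and its phase is linearly related to the IPD of the multi-channel signal; performing a frequency-to-time transform on the first target frequency domain signal to obtain a first target time domain signal; determining, according to the first target time domain signal, whether the multi-channel signal includes an anti-phase signal; when the multi-channel signal does not include an anti-phase signal, generating a second target frequency domain signal according to the multi-channel signal, where the second target frequency domain signal is located within a second frequency domain range, the second frequency domain range differs from the first frequency domain range, and the phase of the second target frequency domain signal is linearly related to the IPD of the multi-channel signal; performing a frequency-to-time transform on the second target frequency domain signal to obtain a second target time domain signal; determining the ITD parameter of the multi-channel signal according to the second target time domain signal; encoding the ITD parameter of the multi-channel signal; when the multi-channel signal includes an anti-phase signal, extracting the IPD parameter of the multi-channel signal; and encoding the IPD parameter of the multi-channel signal.
Because the phase of the constructed first target frequency domain signal is linearly related to the IPD of the multi-channel signal, the maximum of the first target time domain signal is located at the ITD. The ITD parameter obtained from the first target time domain signal is therefore unaffected by whether the IPD of the multi-channel signal exceeds the range of 2π, and is comparatively accurate.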
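With reference to the second aspect, in a first implementation of the second aspect, the performing a frequency-to-time transform on the second target frequency domain signal to obtain a second target time domain signal includes: performing a frequency-to-time transform on the part of the second target frequency domain signal that lies outside the first frequency domain range to obtain a third target time domain signal, where the second frequency domain range includes the first frequency domain range; and superimposing the first target time domain signal and the third target time domain signal to obtain the second target time domain signal.
With reference to the second aspect or the first implementation of the second aspect, in a second implementation of the second aspect, the method further includes: when the multi-channel signal includes an anti-phase signal, determining the IPD parameter of the multi-channel signal; and encoding the IPD parameter.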
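According to a third aspect, an encoder is provided, including units capable of performing the steps of the encoding method for a multi-channel signal in the first aspect.
According to a fourth aspect, an encoder is provided, including units capable of performing the steps of the encoding method for a multi-channel signal in the second aspect.
According to a fifth aspect, an encoder is provided, including a memory and a processor, where the memory is configured to store a program and the processor is configured to execute the program; when the program is executed, the processor performs the method in the first aspect.
According to a sixth aspect, an encoder is provided, including a memory and a processor, where the memory is configured to store a program and the processor is configured to execute the program; when the program is executed, the processor performs the method in the second aspect.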
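In some implementations, the generating a first or second target frequency domain signal according to the multi-channel signal includes: determining the amplitude of the first or second target frequency domain signal according to the multi-channel signal; determining the IPD parameter of the multi-channel signal according to the multi-channel signal; and generating the first or second target frequency domain signal according to that amplitude and the IPD parameter of the multi-channel signal.
In some implementations, the determining the amplitude of the first or second target frequency domain signal according to the multi-channel signal includes: determining, according to
Figure PCTCN2016103594-appb-000008
the amplitude of the first or second target frequency domain signal, where AM(k) denotes the amplitude of the first or second target frequency domain signal, A1(k) and A2(k) respectively denote the amplitudes of the frequency domain signals of any two channels of the multi-channel signal, k denotes the frequency bin, 0≤k≤L/2, and L denotes the time-frequency transform length used to transform the multi-channel signal from the time domain to the frequency domain.
In some implementations, the generating the first or second target frequency domain signal according to its amplitude and the IPD parameter of the multi-channel signal includes: determining, according to
Figure PCTCN2016103594-appb-000009
the first or second target frequency domain signal, where AM(k) denotes the amplitude of the first or second target frequency domain signal, XM_real(k) denotes its real part, XM_image(k) denotes its imaginary part, IPD(k) denotes the IPD parameter of the multi-channel signal, k denotes the frequency bin, 0≤k≤L/2, and L denotes the time-frequency transform length used to transform the multi-channel signal from the time domain to the frequency domain.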
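In some implementations, the generating a first or second target frequency domain signal according to the multi-channel signal includes: generating a frequency domain signal XM(k) according to XM(k) = X1(k)·X2*(k), where X1(k) denotes the frequency domain signal of the first channel of the multi-channel signal, X2*(k) denotes the conjugate of the frequency domain signal of the second channel of the multi-channel signal, and k denotes the frequency bin; and normalizing the amplitude of the frequency domain signal XM(k) to obtain the first or second target frequency domain signal.
In some implementations, the generating the first or second target frequency domain signal according to its amplitude and the IPD parameter of the multi-channel signal includes: generating the first or second target frequency domain signal according to XM(k) = X1(k)·X2*(k), where XM(k) denotes the first or second target frequency domain signal, X1(k) denotes the frequency domain signal of the first channel of the multi-channel signal, X2*(k) denotes the conjugate of the frequency domain signal of the second channel of the multi-channel signal, and k denotes the frequency bin.
In some implementations, before the determining the ITD parameter of the multi-channel signal according to the first or second target time domain signal, the method further includes: smoothing the amplitude of the first or second target time domain signal.
In some implementations, the first or second target frequency domain signal may be a cross-correlation signal of the multi-channel signal.
In some implementations, the phase of the first or second target frequency domain signal is the IPD of the multi-channel signal. It should be understood that a frequency domain signal can be represented by complex numbers, and a complex number can be represented by an amplitude and a phase; the phase of a target frequency domain signal may refer to the phase of the complex numbers that make up that target frequency domain signal.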
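Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings required for the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of PS encoding in the prior art.
FIG. 2 is a flowchart of PS decoding in the prior art.
FIG. 3 is a flowchart of encoding stereo phase parameters in the prior art.
FIG. 4 is a schematic flowchart of an encoding method for a multi-channel signal according to an embodiment of the present invention.
FIG. 5 is a schematic flowchart of an encoding method for a multi-channel signal according to an embodiment of the present invention.
FIG. 6 is a schematic flowchart of an encoding method for a multi-channel signal according to an embodiment of the present invention.
FIG. 7 is a schematic diagram of time domain signal synthesis.
FIG. 8 is a schematic flowchart of an encoding method for a multi-channel signal according to an embodiment of the present invention.
FIG. 9 is a schematic structural diagram of an encoder according to an embodiment of the present invention.
FIG. 10 is a schematic structural diagram of an encoder according to an embodiment of the present invention.
FIG. 11 is a schematic structural diagram of an encoder according to an embodiment of the present invention.
FIG. 12 is a schematic structural diagram of an encoder according to an embodiment of the present invention.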
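Detailed Description
To facilitate understanding, the meanings of the multi-channel ILD, ITD, and IPD are first briefly introduced. Take the signal picked up by the first microphone as the first channel signal and the signal picked up by the second microphone as the second channel signal:
The ILD describes the intensity difference between the first channel signal and the second channel signal. If the ILD is greater than 0, the energy of the first channel signal is higher than that of the second channel signal; if the ILD equals 0, the two energies are equal; if the ILD is less than 0, the energy of the first channel signal is lower than that of the second channel signal.
The ITD describes the time difference between the first channel signal and the second channel signal, that is, the difference between the times at which the sound source reaches the first microphone and the second microphone. If the ITD is greater than 0, the sound source reaches the first microphone earlier than the second microphone; if the ITD equals 0, the sound source reaches both microphones at the same time; if the ITD is less than 0, the sound source reaches the first microphone later than the second microphone.
The IPD describes the phase difference between the first channel signal and the second channel signal. This parameter is usually combined with the ITD parameter so that the decoder can restore the phase information of the multi-channel signal.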
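The sign convention of the ILD can be sketched as follows. The text does not give a formula for the ILD, so the definition used here (ratio of channel energies in decibels) is an assumption chosen only because it reproduces the three cases described above; the function name and the small guard constant `eps` are hypothetical.

```python
import math

def ild_db(ch1, ch2, eps=1e-12):
    """Assumed ILD: 10*log10(E1/E2), where E is the sum of squared samples.

    eps only guards against a zero denominator or log of zero for silent input.
    """
    e1 = sum(v * v for v in ch1)
    e2 = sum(v * v for v in ch2)
    return 10.0 * math.log10((e1 + eps) / (e2 + eps))

louder_first = ild_db([2.0, 2.0], [1.0, 1.0])   # first channel has more energy
equal = ild_db([1.0, 1.0], [1.0, 1.0])          # equal energies
louder_second = ild_db([1.0, 1.0], [2.0, 2.0])  # second channel has more energy
```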
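It should be understood that the ITD parameter and the IPD parameter in the embodiments of the present invention may be a group inter-channel time difference (Group Inter-channel Time Difference, G_ITD) and a group inter-channel phase difference (Group Inter-channel Phase Difference, G_IPD), where G_ITD may also be called the group delay and G_IPD may also be called the group phase.
FIG. 4 is a schematic flowchart of an encoding method for a multi-channel signal according to an embodiment of the present invention. The method of FIG. 4 includes:
410. Acquire a multi-channel signal.
In some embodiments, the multi-channel signal may include a signal of a first channel and a signal of a second channel; in some embodiments, the signal of the first channel may be a left channel signal and the signal of the second channel may be a right channel signal. The multi-channel signal may be a multi-channel time domain signal or a multi-channel frequency domain signal.
420. Generate a first target frequency domain signal according to the multi-channel signal.
In some implementations, the first target frequency domain signal may be a cross-correlation signal of the multi-channel frequency domain signals. In some embodiments, the phase of the first target frequency domain signal is linearly related to the IPD of the multi-channel signal; in some embodiments, the phase of the first target frequency domain signal is the IPD of the multi-channel signal, that is, the linear scale factor is 1. In addition, the embodiments of the present invention do not limit how step 420 is implemented; details are given later with reference to specific embodiments.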
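430. Perform a frequency-to-time transform on the first target frequency domain signal to obtain a first target time domain signal.
In some embodiments, the first target frequency domain signal may be transformed as a whole to obtain the first target time domain signal; in some embodiments, only part of the first target frequency domain signal may be transformed to obtain the first target time domain signal, which reduces the amount of computation and improves encoding efficiency.
It should be noted that the embodiments of the present invention do not specifically limit how the part of the target frequency domain signal is selected. In some embodiments, assuming that the spectrum of the target frequency domain signal covers [0,F], the selected part may be the low-frequency part of the target frequency domain signal, for example the [0,F/2], [0,F/4], or [F/4,F/2] part. This is because, for a stable signal, the result (that is, the multi-channel ITD parameter) obtained from the low-frequency part of the signal differs little from the result obtained from its entire spectrum.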
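440. Determine the ITD parameter of the multi-channel signal according to the first target time domain signal and a preset peak condition for the time domain signal.
In some embodiments, step 440 may include: when the first target time domain signal satisfies the peak condition, determining the ITD parameter of the multi-channel signal according to the first target time domain signal; when the first target time domain signal does not satisfy the peak condition, obtaining a second frequency domain signal from the multi-channel signal, where the second frequency domain signal is the part of the multi-channel signal located within a second frequency domain range, and the second frequency domain range differs from the first frequency domain range (for example, the second frequency domain range may include the first frequency domain range); and determining the ITD parameter of the multi-channel signal according to the second frequency domain signal.
The embodiments of the present invention do not specifically limit the first and second frequency domain ranges. For example, assuming that the entire band of the multi-channel signal is [0,F], the first frequency domain range may be [0,F/2], that is, it contains the low-band part of the multi-channel signal; the second frequency domain range may be [0,F], that is, it contains the entire band of the multi-channel signal.
It should be understood that the embodiments of the present invention do not limit the specific form of the peak condition. In some embodiments, the peak condition may be that the highest peak of the first target time domain signal is greater than a preset threshold. In some embodiments, the peak condition may be that the difference between the highest peak and the second-highest peak of the first target time domain signal is greater than a preset threshold. In short, the peak condition makes it possible to judge whether an ITD parameter determined from the first target time domain signal would be accurate. If so, the ITD parameter of the multi-channel signal may be determined from the first target time domain signal; if not, the ITD parameter may be determined within the second frequency domain range by using the second target time domain signal.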
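The two example peak conditions above can be sketched as follows. This is an illustration only: the text does not define how peaks are located, so the local-maximum rule, the function names, and the choice of 0.0 as the second peak when only one peak exists are all assumptions.

```python
def local_peaks(samples):
    """Values strictly above the left neighbour and not below the right one."""
    peaks = []
    last = len(samples) - 1
    for i, value in enumerate(samples):
        left = samples[i - 1] if i > 0 else float("-inf")
        right = samples[i + 1] if i < last else float("-inf")
        if value > left and value >= right:
            peaks.append(value)
    return peaks

def peak_condition_met(samples, abs_threshold=None, gap_threshold=None):
    """Check the highest-peak and/or peak-gap condition, whichever is given."""
    peaks = sorted(local_peaks(samples), reverse=True)
    if not peaks:
        return False
    highest = peaks[0]
    # Assumption: with a single peak, the second-highest peak is taken as 0.0.
    second = peaks[1] if len(peaks) > 1 else 0.0
    if abs_threshold is not None and not highest > abs_threshold:
        return False
    if gap_threshold is not None and not (highest - second) > gap_threshold:
        return False
    return True

amplitudes = [0.1, 0.2, 0.9, 0.3, 0.1, 0.4, 0.2]   # peaks at 0.9 and 0.4
by_height = peak_condition_met(amplitudes, abs_threshold=0.5)
by_gap_strict = peak_condition_met(amplitudes, gap_threshold=0.6)
by_gap_loose = peak_condition_met(amplitudes, gap_threshold=0.4)
```

When the condition fails, an encoder following step 440 would fall back to the second frequency domain range rather than trust the first estimate.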
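450. Encode the ITD parameter of the multi-channel signal.
For example, the ITD parameter of the multi-channel signal may be quantized. In addition, the method of FIG. 4 may further include: sending the encoded ITD parameter of the multi-channel signal to the decoder.
Because the phase of the constructed first target frequency domain signal is linearly related to the IPD of the multi-channel signal, the maximum of the first target time domain signal is located at the ITD. The ITD parameter obtained from the first target time domain signal is therefore unaffected by whether the IPD of the multi-channel signal exceeds the range of 2π, and is comparatively accurate.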
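Steps 420 to 440 can be sketched end to end for two short channels. This is a minimal illustration under stated assumptions, not the claimed implementation: the target frequency domain signal is taken to be the product of one channel's spectrum with the conjugate of the other's, the peak condition is omitted, a plain O(N²) DFT is used, and all names are hypothetical. With this sign convention the peak index equals the number of samples by which the first channel lags the second (modulo N).

```python
import cmath

def dft(x):
    """Forward DFT, O(N^2), adequate for a short illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT matching dft() above."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]

def estimate_itd(ch1, ch2):
    """Index of the largest amplitude of IDFT(X1 * conj(X2))."""
    x1, x2 = dft(ch1), dft(ch2)
    target = [a * b.conjugate() for a, b in zip(x1, x2)]  # phase equals the IPD
    amplitude = [abs(v) for v in idft(target)]
    return max(range(len(amplitude)), key=amplitude.__getitem__)

N = 32
DELAY = 10                      # large enough for the IPD to wrap at high bins
ch2 = [0.0] * N
ch2[3], ch2[4] = 1.0, 0.5
ch1 = [ch2[(t - DELAY) % N] for t in range(N)]   # ch1 is ch2 delayed circularly
itd = estimate_itd(ch1, ch2)
```

Although the per-bin phase wraps several times for this delay, the peak of the time domain signal still falls at the delay, which is the property relied on above.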
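FIG. 5 is a schematic flowchart of an encoding method for a multi-channel signal according to an embodiment of the present invention. The method of FIG. 5 includes:
510. Acquire a multi-channel signal.
520. Generate a first target frequency domain signal according to the multi-channel signal.
The first target frequency domain signal may be located within a first frequency domain range. In some embodiments, the first target frequency domain signal may be a cross-correlation signal of the signals of the multi-channel signal within the first frequency domain range. In some embodiments, the phase of the first target frequency domain signal may be linearly related to the IPD of the multi-channel signal. In some embodiments, the phase of the first target frequency domain signal may be the IPD of the multi-channel signal.
530. Perform a frequency-to-time transform on the first target frequency domain signal to obtain a first target time domain signal.
Specifically, the first target frequency domain signal may be transformed as a whole, or only part of it may be transformed, which saves computation and improves encoding efficiency.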
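540. Determine, according to the first target time domain signal, whether the multi-channel signal includes an anti-phase signal.
It should be understood that if the phases of two signals differ by 180 degrees, the two signals may be called anti-phase signals. Whether the multi-channel signal includes an anti-phase signal in step 540 may refer to whether the multi-channel signal contains two signals whose phases differ by 180 degrees.
It should be understood that an anti-phase signal may be detected in several ways, which the embodiments of the present invention do not specifically limit. For example, step 540 may include: determining an initial ITD parameter of the multi-channel signal according to the index value corresponding to a target sample of the first target time domain signal, where the target sample is the sample with the largest sample value among the samples of the first target time domain signal; when the initial ITD parameter is less than a preset threshold, determining that the multi-channel signal includes an anti-phase signal; and when the initial ITD parameter is greater than the preset threshold, determining that the multi-channel signal does not include an anti-phase signal.
In addition, in some embodiments, the determining an initial ITD parameter of the multi-channel signal according to the index value corresponding to the target sample of the first target time domain signal may include: determining the index value corresponding to the target sample of the first target time domain signal as the initial ITD parameter of the multi-channel signal.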
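The example decision rule of step 540 can be sketched as below. The function names are hypothetical. The default threshold of 3 follows the example value given later in the text; treating an initial ITD equal to the threshold as "no anti-phase signal" is an assumption that matches the T1≥TH1 branch of FIG. 6, since the paragraph above leaves the equality case open.

```python
def initial_itd(time_signal):
    """Index of the sample with the largest value (first one on ties)."""
    return max(range(len(time_signal)), key=lambda n: time_signal[n])

def includes_antiphase(time_signal, threshold=3):
    """True when the initial ITD falls below the preset threshold."""
    return initial_itd(time_signal) < threshold

near_zero_peak = [0.1, 0.9, 0.2, 0.3, 0.1, 0.1]   # peak at index 1
later_peak = [0.1, 0.2, 0.1, 0.2, 0.8, 0.1]       # peak at index 4
flag_a = includes_antiphase(near_zero_peak)
flag_b = includes_antiphase(later_peak)
```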
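550. When the multi-channel signal does not include an anti-phase signal, generate a second target frequency domain signal according to the multi-channel signal, where the second target frequency domain signal is located within a second frequency domain range, and the second frequency domain range differs from the first frequency domain range (for example, the second frequency domain range may contain the first frequency domain range).
For example, step 550 may include: extracting the frequency domain signals within the second frequency domain range from the multi-channel signal; and generating the second target frequency domain signal according to the frequency domain signals of the multiple channels within the second frequency domain range (for example, computing the cross-correlation signal of the signals of the multi-channel signal within the second frequency domain range to obtain the second target frequency domain signal).
560. Perform a frequency-to-time transform on the second target frequency domain signal to obtain a second target time domain signal.
Specifically, the second target frequency domain signal may be transformed as a whole to obtain the second target time domain signal, or only part of it may be transformed to obtain the second target time domain signal, which reduces computational complexity and improves encoding efficiency.
In some embodiments, the amplitude of the second target time domain signal may be smoothed before step 570 is performed.
570. Determine the ITD parameter of the multi-channel signal according to the second target time domain signal.
In some embodiments, the ITD parameter of the multi-channel signal may be determined according to the index value corresponding to a target sample of the second target time domain signal, where the target sample is the sample with the largest sample value in the second target time domain signal. For example, the index value corresponding to the target sample of the second target time domain signal may be determined as the ITD parameter of the multi-channel signal.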
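580. Encode the ITD parameter of the multi-channel signal.
590. When the multi-channel signal includes an anti-phase signal, determine the IPD parameter of the multi-channel signal.
The embodiments of the present invention do not limit the specific way of determining the IPD parameter of the multi-channel signal; for example, it may be determined in the manner described by formula (3).
595. Encode the IPD parameter of the multi-channel signal.
For ease of understanding, the following description takes a multi-channel signal consisting of a left channel signal and a right channel signal as an example, but the embodiments of the present invention are not limited thereto. In practice, the embodiments may be used to process any two-channel or multi-channel signal, and the left and right channels below may be any two channels of a two-channel or multi-channel signal. In addition, the following description determines whether the multi-channel signal contains an anti-phase signal by comparing the initial ITD parameter T1, obtained from the first target time domain signal, with a preset threshold TH1 (the preset threshold may lie in the range [1,4], and may for example be 3), but the embodiments of the present invention are not limited thereto. In practice, any prior-art way of detecting an anti-phase signal may be used to determine whether the multi-channel signal contains an anti-phase signal.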
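FIG. 6 is a schematic flowchart of an encoding method for a multi-channel signal according to an embodiment of the present invention. In the embodiment of FIG. 6, the initial ITD parameter T1 of the multi-channel signal is extracted in the mixed domain within the first frequency domain range; when T1≥TH1, the ITD parameter of the multi-channel signal is further calculated in the mixed domain within the second frequency domain range. The embodiments of the present invention do not specifically limit the relationship between the second and first frequency domain ranges: the two may be separate from each other, may overlap, or one may contain the other. FIG. 6 uses the case where the second frequency domain range contains the first frequency domain range as an example. It should be understood that the processing steps or operations shown in FIG. 6 are merely examples; the embodiments of the present invention may also perform other operations or variations of the operations in FIG. 6. In addition, the steps in FIG. 6 may be performed in an order different from that presented in FIG. 6, and not all operations in FIG. 6 necessarily have to be performed. FIG. 6 mainly includes the following steps: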
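610. Perform a time-frequency transform on the time domain signals of the left and right channels.
Specifically, the FFT may be performed by using the following formulas:
Figure PCTCN2016103594-appb-000010
Figure PCTCN2016103594-appb-000011
where xL(n) and xR(n) are the time domain signals of the left and right channels respectively, k denotes the frequency bin, Length denotes the frame length or subframe length, and L denotes the length of the time-frequency transform.
The frequency domain signal obtained after the FFT is a complex signal containing a real part and an imaginary part. For the frequency domain signal of the left channel, the real part is XL_real(k) and the imaginary part is XL_image(k); for the frequency domain signal of the right channel, the real part is XR_real(k) and the imaginary part is XR_image(k), where
Figure PCTCN2016103594-appb-000012
Specifically, taking the frequency domain signal of the left channel as an example, its real and imaginary parts may be calculated as follows:
XL_real(0)=XL(0),XL_image(0)=0    (9)
Figure PCTCN2016103594-appb-000013
Figure PCTCN2016103594-appb-000014
Alternatively,
XL_real(0)=XL(0),XL_image(0)=0    (12)
Figure PCTCN2016103594-appb-000015
Figure PCTCN2016103594-appb-000016
Note that, after the time-frequency transform of a wideband (WideBand, WB) signal, if the time-frequency transform length is 512, the resulting frequency domain signal includes 256 frequency bins, where the 256th bin corresponds to the spectrum at 8 kHz, the 128th bin corresponds to the spectrum at 4 kHz, and so on.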
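The bin-to-frequency correspondence stated above follows from the usual relation f = k·fs/L. The short sketch below reproduces the two figures in the text, assuming the 16 kHz sampling rate that is customary for wideband signals (the sampling rate is not stated in the text); the function name is hypothetical.

```python
def bin_to_hz(k, transform_length, sample_rate_hz):
    """Centre frequency in Hz of bin k for the given transform length."""
    return k * sample_rate_hz / transform_length

WB_SAMPLE_RATE_HZ = 16000   # assumed wideband sampling rate
TRANSFORM_LENGTH = 512      # transform length from the text

f_256 = bin_to_hz(256, TRANSFORM_LENGTH, WB_SAMPLE_RATE_HZ)
f_128 = bin_to_hz(128, TRANSFORM_LENGTH, WB_SAMPLE_RATE_HZ)
```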
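620. Construct the first target frequency domain signal within the first frequency domain range.
In some embodiments, the amplitude of the first target frequency domain signal and the IPD of the left and right channel signals may be calculated first, and the first target frequency domain signal may then be constructed from that amplitude and that IPD.
Specifically, the amplitude AM(k) of the first target frequency domain signal may be calculated within the first frequency domain range [k3,k4] by using the following formula, where k3 and k4 may lie between 0 and L/2:
Figure PCTCN2016103594-appb-000017
where the amplitude of the left channel frequency domain signal may be calculated by using the following formula:
Figure PCTCN2016103594-appb-000018
The amplitude of the right channel frequency domain signal may be calculated by using the following formula:
Figure PCTCN2016103594-appb-000019
The IPD of the left and right channel signals may be calculated by using the following formula:
Figure PCTCN2016103594-appb-000020
After the amplitude of the first target frequency domain signal and the IPD of the left and right channel signals have been calculated, the first target frequency domain signal may be constructed by using the following formula:
Figure PCTCN2016103594-appb-000021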
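Because the formula images of step 620 are not reproduced in this text, the following sketch only illustrates the construction under explicit assumptions: the channel amplitude is taken as the modulus of the complex bin, the IPD as the phase of the left bin multiplied by the conjugate of the right bin, and the target signal as real part A·cos(IPD) and imaginary part A·sin(IPD). The target amplitude is passed in rather than computed, since its formula is not recoverable here. All names are hypothetical.

```python
import cmath
import math

def amplitude(real, imag):
    """Assumed channel amplitude: modulus of the complex bin."""
    return math.hypot(real, imag)

def ipd(x_left, x_right):
    """Assumed IPD: phase of X_L(k) multiplied by the conjugate of X_R(k)."""
    return cmath.phase(x_left * x_right.conjugate())

def build_target(amplitudes, ipds):
    """Assumed construction: real = A*cos(IPD), imag = A*sin(IPD)."""
    return [complex(a * math.cos(p), a * math.sin(p))
            for a, p in zip(amplitudes, ipds)]

x_left, x_right = complex(1.0, 1.0), complex(0.0, 2.0)
phase = ipd(x_left, x_right)                 # (1+1j)*(-2j) = 2-2j
target = build_target([1.0], [phase])[0]     # unit amplitude, phase = IPD
```

By construction the phase of each target bin equals the IPD of that bin, which is the linear relation (scale factor 1) required of the first target frequency domain signal.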
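In other embodiments, one of the frequency domain signals of the left and right channels may be directly multiplied by the conjugate of the other to obtain the first target frequency domain signal. Further, in this embodiment, the amplitude of the first target frequency domain signal may also be smoothed. This way of calculating does not require the amplitude and the phase of the first target frequency domain signal to be constructed separately, and is therefore relatively simple.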
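630. Perform a frequency-to-time transform on the first target frequency domain signal to obtain a first target time domain signal.
Step 630 may use an inverse discrete Fourier transform (Inverse Discrete Fourier Transform, IDFT) or an inverse fast Fourier transform (Inverse Fast Fourier Transform, IFFT) for the frequency-to-time transform; the embodiments of the present invention do not specifically limit this.
Specifically, the first target frequency domain signal may first be windowed:
Figure PCTCN2016103594-appb-000022
Figure PCTCN2016103594-appb-000023
where k is the frequency bin, 0≤k≤L/2, and L is the time-frequency transform length used to transform the time domain signals of the left and right channels into their frequency domain signals.
Then, an IDFT is performed on the windowed signal to obtain the first target time domain signal:
Figure PCTCN2016103594-appb-000024
Figure PCTCN2016103594-appb-000025
where n is the index value of the sample, 0≤n<L/2.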
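In addition, the amplitude of the obtained first target time domain signal may be smoothed.
Specifically, the amplitude of the first target time domain signal may be expressed as:
Figure PCTCN2016103594-appb-000026
The amplitude of the first target time domain signal is smoothed to obtain the smoothed amplitude Asm(n):
Figure PCTCN2016103594-appb-000027
where
Figure PCTCN2016103594-appb-000028
is the smoothed amplitude at point n of the frame/subframe preceding the current frame; w1 and w2 are smoothing factors, which may be set to constants or may vary with the relative magnitudes of
Figure PCTCN2016103594-appb-000029
and A(n). w1 and w2 satisfy w1+w2=1; for example, w1=0.75 and w2=0.25, or w1=0.8 and w2=0.2, or w1=0.9 and w2=0.1, or
Figure PCTCN2016103594-appb-000030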
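The smoothing step can be sketched as a first-order recursion across frames. The smoothing formula itself is an image that is not reproduced here, so the form Asm(n) = w1·Asm_prev(n) + w2·A(n), and in particular the assignment of w1 to the previous frame and w2 to the current frame, is an assumption consistent with the constraint w1+w2=1 and the example factors above. The function name is hypothetical.

```python
def smooth_amplitude(current, previous_smoothed, w1=0.75, w2=0.25):
    """Assumed recursion: Asm(n) = w1 * Asm_prev(n) + w2 * A(n)."""
    if abs(w1 + w2 - 1.0) > 1e-12:
        raise ValueError("smoothing factors must satisfy w1 + w2 = 1")
    if len(current) != len(previous_smoothed):
        raise ValueError("frames must have the same number of samples")
    return [w1 * prev + w2 * cur
            for prev, cur in zip(previous_smoothed, current)]

previous = [0.0, 4.0]           # smoothed amplitudes of the preceding frame
current = [4.0, 0.0]            # amplitudes A(n) of the current frame
smoothed = smooth_amplitude(current, previous)
```

A larger w1 makes the peak position change more slowly from frame to frame, at the cost of reacting later to a genuine change of the delay.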
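640. Determine the initial ITD parameter T1 of the multi-channel signal according to the first target time domain signal.
Specifically, the index value of the sample with the largest sample value in the first target time domain signal, index=argmax(Asm(n)), is searched for to obtain the initial ITD parameter T1, for example T1=index.
650. Compare the initial ITD parameter with the preset threshold TH1.
Specifically, if T1>TH1, step 660 may be performed. It should be noted that the embodiments of the present invention do not specifically limit the handling of the case T1<TH1; for example, the IPD parameter may be extracted as shown in step 690, the ITD parameter may be extracted in the prior-art manner, or no processing may be performed.
660. Construct the second target frequency domain signal within the second frequency domain range.
670. Perform a frequency-to-time transform on the second target frequency domain signal to obtain a second target time domain signal.
Steps 660 to 670 are processed similarly to steps 620 to 630, to which reference may be made. The difference is that steps 660 to 670 serve to extract the ITD parameter of the multi-channel signal within the second frequency domain range, whereas steps 620 to 630 serve to extract it within the first frequency domain range.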
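In one example, the first frequency domain range may lie within the second frequency domain range; for example, the first frequency domain range is [k3,k4] and the second frequency domain range is [k5,k6], where k5<k3 and k6>k4. For example, assuming that the entire band of the multi-channel signal is [0,F], the first frequency domain range may be [0,F/2], [0,F/4], or [F/4,F/2], that is, it contains the low-band part of the multi-channel signal; the second frequency domain range may be [0,F], that is, it contains the entire band of the multi-channel signal. Referring to FIG. 7, the first frequency domain range [k3,k4] contains n frequency bins and the second frequency domain range contains n+m+p frequency bins, where m is the number of bins before the first frequency domain range and p is the number of bins after it. In this case, as shown in FIG. 7, the result computed for the first frequency domain range (the waveform of the first target time domain signal) can be reused in the computation for the second frequency domain range (that is, in computing the waveform of the second target time domain signal). In other words, when the second target time domain signal corresponding to the second frequency domain range is calculated, the time domain waveform corresponding to the first frequency domain range need not be recalculated. Only the time domain waveform corresponding to the remaining frequency domain range (that is, the waveform of the third target time domain signal) has to be calculated and then superimposed on the first target time domain signal (the amplitudes of the time domain signals may be superimposed) to obtain the second target time domain signal. This saves computation and improves encoding efficiency.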
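The reuse described above rests on the linearity of the inverse transform: the inverse transform over the second range equals the sum of the inverse transforms over the first range and over the remaining bins. The sketch below checks this for a toy spectrum. It adds the complex time domain samples, for which the identity is exact; the windowing of step 630 is omitted, and the function name, spectrum values and bin ranges are hypothetical.

```python
import cmath

def partial_idft(spectrum, bins, n_out):
    """IDFT restricted to `bins`; all other bins are treated as zero."""
    length = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / length)
                for k in bins) / length
            for n in range(n_out)]

LENGTH = 16
spectrum = [complex(k + 1, (k * 3) % 5) for k in range(LENGTH)]

first_bins = range(4, 8)                                # first range [k3, k4]
second_bins = range(2, 11)                              # second range [k5, k6]
rest_bins = list(range(2, 4)) + list(range(8, 11))      # m bins before, p after

first = partial_idft(spectrum, first_bins, LENGTH // 2)    # already available
third = partial_idft(spectrum, rest_bins, LENGTH // 2)     # only new work
second = partial_idft(spectrum, second_bins, LENGTH // 2)  # direct, for checking
combined = [a + b for a, b in zip(first, third)]
```

In this toy case the fallback path needs m+p bins per output sample instead of n+m+p, which is where the saving comes from.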
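680. Determine the ITD parameter of the multi-channel signal according to the second target time domain signal.
Step 680 may specifically include: determining the index value corresponding to the sample with the largest sample value in the second target time domain signal as the ITD parameter of the multi-channel signal.
690. Extract the IPD parameter of the multi-channel signal.
For example, the IPD parameter of the multi-channel signal may be extracted in the manner described for FIG. 3.
695. Quantize the obtained phase parameter (the ITD parameter or the IPD parameter of the multi-channel signal).
FIG. 8 is a schematic flowchart of an encoding method for a multi-channel signal according to an embodiment of the present invention. It should be understood that the processing steps or operations shown in FIG. 8 are merely examples; the embodiments of the present invention may also perform other operations or variations of the operations in FIG. 8. In addition, the steps in FIG. 8 may be performed in an order different from that presented in FIG. 8, and not all operations in FIG. 8 necessarily have to be performed.
Steps 810 to 850 are similar to steps 610 to 650 and, to avoid repetition, are not described in detail again. It should be understood that, in this embodiment, step 820 may construct the first target frequency domain signal over all or part of the frequency domain range of the left and right channel frequency domain signals, and is not limited to the first frequency domain range described for step 620. In addition, in step 850, when T1<TH1, the initial ITD parameter T1 may be directly determined as the ITD parameter of the multi-channel signal.
Steps 860 and 870 are similar to steps 690 and 695 in FIG. 6 respectively and, to avoid repetition, are not described in detail here.
The encoding method for a multi-channel signal according to the embodiments of the present invention has been described in detail above with reference to FIG. 4 to FIG. 8. The encoder according to the embodiments of the present invention is described in detail below with reference to FIG. 9 to FIG. 12.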
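FIG. 9 is a schematic structural diagram of an encoder according to an embodiment of the present invention. The encoder 900 of FIG. 9 can perform the steps in FIG. 4; to avoid repetition, they are not described in detail here. The encoder 900 includes:
an acquiring unit 910, configured to acquire a multi-channel signal;
a generating unit 920, configured to generate a first target frequency domain signal according to the multi-channel signal, where the phase of the first target frequency domain signal is linearly related to the inter-channel phase difference IPD of the multi-channel signal;
a frequency-to-time transform unit 930, configured to perform a frequency-to-time transform on the first target frequency domain signal to obtain a first target time domain signal;
a determining unit 940, configured to determine an inter-channel time difference ITD parameter of the multi-channel signal according to the first target time domain signal and a preset peak condition for the time domain signal; and
an encoding unit 950, configured to encode the ITD parameter of the multi-channel signal.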
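Optionally, in an embodiment, the generating unit 920 is specifically configured to: obtain a first frequency domain signal from the multi-channel signal, where the first frequency domain signal is the part of the multi-channel signal located within a first frequency domain range; and generate the first target frequency domain signal according to the first frequency domain signal. The determining unit 940 is specifically configured to: when the first target time domain signal satisfies the peak condition, determine the ITD parameter of the multi-channel signal according to the first target time domain signal; when the peak of the first target time domain signal does not satisfy the peak condition, obtain a second frequency domain signal from the multi-channel frequency domain signals, where the second frequency domain signal is the part of the multi-channel signal located within a second frequency domain range, and the second frequency domain range differs from the first frequency domain range; and determine the ITD parameter of the multi-channel signal according to the second frequency domain signal.
Optionally, in an embodiment, the determining unit 940 is specifically configured to: generate a second target frequency domain signal according to the second frequency domain signal, where the phase of the second target frequency domain signal is linearly related to the IPD of the multi-channel signal; perform a frequency-to-time transform on the second target frequency domain signal to obtain a second target time domain signal; and determine the ITD parameter of the multi-channel signal according to the second target time domain signal.
Optionally, in an embodiment, the determining unit 940 is specifically configured to: perform a frequency-to-time transform on the part of the second target frequency domain signal that lies outside the first frequency domain range to obtain a third target time domain signal, where the second frequency domain range includes the first frequency domain range; and superimpose the first target time domain signal and the third target time domain signal to obtain the second target time domain signal.
Optionally, in an embodiment, the determining unit 940 is specifically configured to: select a target sample from the N samples of the first target time domain signal, where the target sample is the sample with the largest sample value among the N samples, and N denotes the number of samples of the first target time domain signal; and determine the ITD parameter of the multi-channel signal according to the index value corresponding to the target sample, where the index value indicates the position of the target sample within the N samples.
Optionally, in an embodiment, the determining unit 940 is specifically configured to determine the index value corresponding to the target sample as the ITD parameter of the multi-channel signal.
Optionally, in an embodiment, the generating unit 920 is specifically configured to: determine the amplitude of the first target frequency domain signal according to the multi-channel signal; determine the IPD parameter of the multi-channel signal according to the multi-channel signal; and generate the first target frequency domain signal according to the amplitude of the first target frequency domain signal and the IPD parameter of the multi-channel signal.
Optionally, in an embodiment, the generating unit 920 is specifically configured to determine, according to
Figure PCTCN2016103594-appb-000031
the amplitude of the first target frequency domain signal, where AM(k) denotes the amplitude of the first target frequency domain signal, A1(k) and A2(k) respectively denote the amplitudes of the frequency domain signals of any two channels of the multi-channel signal, and k denotes the frequency bin.
Optionally, in an embodiment, the generating unit 920 is specifically configured to generate, according to
Figure PCTCN2016103594-appb-000032
the first target frequency domain signal, where AM(k) denotes the amplitude of the first target frequency domain signal, XM_real(k) denotes its real part, XM_image(k) denotes its imaginary part, IPD(k) denotes the IPD parameter of the multi-channel signal, and k denotes the frequency bin.
Optionally, in an embodiment, the generating unit 920 is specifically configured to: generate a frequency domain signal XM(k) according to XM(k) = X1(k)·X2*(k), where X1(k) denotes the frequency domain signal of the first channel of the multi-channel signal, X2*(k) denotes the conjugate of the frequency domain signal of the second channel of the multi-channel signal, and k denotes the frequency bin; and normalize the amplitude of the frequency domain signal XM(k) to obtain the first target frequency domain signal. Normalizing the amplitude of a frequency domain signal may include: selecting the largest amplitude among the amplitudes of the frequency bins of the frequency domain signal, and then dividing the amplitude of each frequency bin by that largest amplitude to obtain the normalized amplitude of each bin.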
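The conjugate product followed by the normalization described above can be sketched as follows. Dividing each complex bin by the largest amplitude scales the amplitudes into [0, 1] while leaving the phase, and hence the IPD, unchanged. The function name and the handling of an all-zero spectrum are assumptions.

```python
def normalized_cross_spectrum(x1, x2):
    """XM(k) = X1(k) * conj(X2(k)), then divide by the largest amplitude."""
    cross = [a * b.conjugate() for a, b in zip(x1, x2)]
    peak = max(abs(v) for v in cross)
    if peak == 0.0:
        return cross            # assumption: leave an all-zero spectrum as is
    return [v / peak for v in cross]

# Bin 0: (1+1j) * conj(2j) = 2-2j; bin 1: 2 * conj(1) = 2.
normalized = normalized_cross_spectrum([1 + 1j, 2 + 0j], [2j, 1 + 0j])
```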
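FIG. 10 is a schematic structural diagram of an encoder according to an embodiment of the present invention. The encoder 1000 of FIG. 10 can perform the steps in FIG. 4; to avoid repetition, they are not described in detail here. The encoder 1000 includes:
a memory 1010, configured to store a program; and
a processor 1020, configured to execute the program in the memory 1010. When the program is executed, the processor 1020 acquires a multi-channel signal; generates a first target frequency domain signal according to the multi-channel signal, where the phase of the first target frequency domain signal is linearly related to the inter-channel phase difference IPD of the multi-channel signal; performs a frequency-to-time transform on the first target frequency domain signal to obtain a first target time domain signal; determines an inter-channel time difference ITD parameter of the multi-channel signal according to the first target time domain signal and a preset peak condition for the time domain signal; and encodes the ITD parameter of the multi-channel signal.
Optionally, in an embodiment, the processor 1020 is specifically configured to: obtain a first frequency domain signal from the multi-channel signal, where the first frequency domain signal is the part of the multi-channel signal located within a first frequency domain range; generate the first target frequency domain signal according to the first frequency domain signal; when the first target time domain signal satisfies the peak condition, determine the ITD parameter of the multi-channel signal according to the first target time domain signal; when the first target time domain signal does not satisfy the peak condition, obtain a second frequency domain signal from the multi-channel signal, where the second frequency domain signal is located within a second frequency domain range, and the second frequency domain range differs from the first frequency domain range; and determine the ITD parameter of the multi-channel signal according to the second frequency domain signal.
Optionally, in an embodiment, the processor 1020 is specifically configured to: generate a second target frequency domain signal according to the second frequency domain signal, where the phase of the second target frequency domain signal is linearly related to the IPD of the multi-channel signal; perform a frequency-to-time transform on the second target frequency domain signal to obtain a second target time domain signal; and determine the ITD parameter of the multi-channel signal according to the second target time domain signal.
Optionally, in an embodiment, the processor 1020 is specifically configured to: perform a frequency-to-time transform on the part of the second target frequency domain signal that lies outside the first frequency domain range to obtain a third target time domain signal, where the second frequency domain range includes the first frequency domain range; and superimpose the first target time domain signal and the third target time domain signal to obtain the second target time domain signal.
Optionally, in an embodiment, the processor 1020 is specifically configured to: select a target sample from the N samples of the first target time domain signal, where the target sample is the sample with the largest sample value among the N samples, and N denotes the number of samples of the first target time domain signal; and determine the ITD parameter of the multi-channel signal according to the index value corresponding to the target sample, where the index value indicates the position of the target sample within the N samples.
Optionally, in an embodiment, the processor 1020 is specifically configured to determine the index value corresponding to the target sample as the ITD parameter of the multi-channel signal.
Optionally, in an embodiment, the processor 1020 is specifically configured to: determine the amplitude of the first target frequency domain signal according to the multi-channel signal; determine the IPD parameter of the multi-channel signal according to the multi-channel signal; and generate the first target frequency domain signal according to the amplitude of the first target frequency domain signal and the IPD parameter of the multi-channel signal.
Optionally, in an embodiment, the processor 1020 is specifically configured to determine, according to
Figure PCTCN2016103594-appb-000033
the amplitude of the first target frequency domain signal, where AM(k) denotes the amplitude of the first target frequency domain signal, A1(k) and A2(k) respectively denote the amplitudes of the frequency domain signals of any two channels of the multi-channel signal, and k denotes the frequency bin.
Optionally, in an embodiment, the processor 1020 is specifically configured to generate, according to
Figure PCTCN2016103594-appb-000034
the first target frequency domain signal, where AM(k) denotes the amplitude of the first target frequency domain signal, XM_real(k) denotes its real part, XM_image(k) denotes its imaginary part, IPD(k) denotes the IPD parameter of the multi-channel signal, and k denotes the frequency bin.
Optionally, in an embodiment, the processor 1020 is specifically configured to: generate a frequency domain signal XM(k) according to XM(k) = X1(k)·X2*(k), where X1(k) denotes the frequency domain signal of the first channel of the multi-channel signal, X2*(k) denotes the conjugate of the frequency domain signal of the second channel of the multi-channel signal, and k denotes the frequency bin; and normalize the amplitude of the frequency domain signal XM(k) to obtain the first target frequency domain signal.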
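FIG. 11 is a schematic structural diagram of an encoder according to an embodiment of the present invention. The encoder 1100 of FIG. 11 can implement the steps in FIG. 5 to FIG. 8; to avoid repetition, they are not described in detail here. The encoder 1100 includes:
an acquiring unit 1110, configured to acquire a multi-channel signal;
a first generating unit 1120, configured to generate a first target frequency domain signal according to the multi-channel signal, where the first target frequency domain signal is located within a first frequency domain range and its phase is linearly related to the inter-channel phase difference IPD of the multi-channel signal;
a first frequency-to-time transform unit 1130, configured to perform a frequency-to-time transform on the first target frequency domain signal to obtain a first target time domain signal;
a first determining unit 1140, configured to determine, according to the first target time domain signal, whether the multi-channel signal includes an anti-phase signal;
a second generating unit 1150, configured to: when the multi-channel signal does not include an anti-phase signal, generate a second target frequency domain signal according to the multi-channel signal, where the second target frequency domain signal is located within a second frequency domain range, the second frequency domain range differs from the first frequency domain range, and the phase of the second target frequency domain signal is linearly related to the IPD of the multi-channel signal;
a second frequency-to-time transform unit 1160, configured to perform a frequency-to-time transform on the second target frequency domain signal to obtain a second target time domain signal;
a second determining unit 1170, configured to determine an inter-channel time difference ITD parameter of the multi-channel signal according to the second target time domain signal;
a first encoding unit 1180, configured to encode the ITD parameter of the multi-channel signal;
a third determining unit 1190, configured to determine the IPD parameter of the multi-channel signal when the multi-channel signal includes an anti-phase signal; and
a second encoding unit 1195, configured to encode the IPD parameter of the multi-channel signal.
Optionally, in an embodiment, the second frequency-to-time transform unit 1160 is specifically configured to: perform a frequency-to-time transform on the part of the second target frequency domain signal that lies outside the first frequency domain range to obtain a third target time domain signal, where the second frequency domain range includes the first frequency domain range; and superimpose the first target time domain signal and the third target time domain signal to obtain the second target time domain signal.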
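FIG. 12 is a schematic structural diagram of an encoder according to an embodiment of the present invention. The encoder 1200 of FIG. 12 can implement the steps in FIG. 5 to FIG. 8; to avoid repetition, they are not described in detail here. The encoder 1200 includes:
a memory 1210, configured to store a program; and
a processor 1220, configured to execute the program in the memory 1210. When the program is executed, the processor 1220 acquires a multi-channel signal; generates a first target frequency domain signal according to the multi-channel signal, where the first target frequency domain signal is located within a first frequency domain range and its phase is linearly related to the inter-channel phase difference IPD of the multi-channel signal; performs a frequency-to-time transform on the first target frequency domain signal to obtain a first target time domain signal; determines, according to the first target time domain signal, whether the multi-channel signal includes an anti-phase signal; when the multi-channel signal does not include an anti-phase signal, generates a second target frequency domain signal according to the multi-channel signal, where the second target frequency domain signal is located within a second frequency domain range, the second frequency domain range differs from the first frequency domain range, and the phase of the second target frequency domain signal is linearly related to the IPD of the multi-channel signal; performs a frequency-to-time transform on the second target frequency domain signal to obtain a second target time domain signal; determines an inter-channel time difference ITD parameter of the multi-channel signal according to the second target time domain signal; and encodes the ITD parameter of the multi-channel signal.
Optionally, in an embodiment, the processor 1220 is specifically configured to: perform a frequency-to-time transform on the part of the second target frequency domain signal that lies outside the first frequency domain range to obtain a third target time domain signal, where the second frequency domain range includes the first frequency domain range; and superimpose the first target time domain signal and the third target time domain signal to obtain the second target time domain signal.
Optionally, in an embodiment, the processor 1220 is further configured to: when the multi-channel signal includes an anti-phase signal, determine the IPD parameter of the multi-channel signal; and encode the IPD parameter.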
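A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered to go beyond the scope of the present invention.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The foregoing are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.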

Claims (24)

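  1. An encoding method for a multi-channel signal, comprising:
    acquiring a multi-channel signal;
    generating a first target frequency domain signal according to the multi-channel signal, wherein the phase of the first target frequency domain signal is linearly related to the inter-channel phase difference IPD of the multi-channel signal;
    performing a frequency-to-time transform on the first target frequency domain signal to obtain a first target time domain signal;
    determining an inter-channel time difference ITD parameter of the multi-channel signal according to the first target time domain signal and a preset peak condition for the time domain signal; and
    encoding the ITD parameter of the multi-channel signal.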
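  2. The method according to claim 1, wherein the generating a first target frequency domain signal according to the multi-channel signal comprises:
    obtaining a first frequency domain signal from the multi-channel signal, wherein the first frequency domain signal is the part of the multi-channel signal located within a first frequency domain range; and
    generating the first target frequency domain signal according to the first frequency domain signal;
    and the determining an ITD parameter of the multi-channel signal according to the first target time domain signal and the preset peak condition for the time domain signal comprises:
    when the first target time domain signal satisfies the peak condition, determining the ITD parameter of the multi-channel signal according to the first target time domain signal;
    when the first target time domain signal does not satisfy the peak condition, obtaining a second frequency domain signal from the multi-channel signal, wherein the second frequency domain signal is the part of the multi-channel signal located within a second frequency domain range, and the second frequency domain range differs from the first frequency domain range; and
    determining the ITD parameter of the multi-channel signal according to the second frequency domain signal.
  3. The method according to claim 2, wherein the determining the ITD parameter of the multi-channel signal according to the second frequency domain signal comprises:
    generating a second target frequency domain signal according to the second frequency domain signal, wherein the phase of the second target frequency domain signal is linearly related to the IPD of the multi-channel signal;
    performing a frequency-to-time transform on the second target frequency domain signal to obtain a second target time domain signal; and
    determining the ITD parameter of the multi-channel signal according to the second target time domain signal.
  4. The method according to claim 3, wherein the performing a frequency-to-time transform on the second target frequency domain signal to obtain a second target time domain signal comprises:
    performing a frequency-to-time transform on the part of the second target frequency domain signal that lies outside the first frequency domain range to obtain a third target time domain signal, wherein the second frequency domain range includes the first frequency domain range; and
    superimposing the first target time domain signal and the third target time domain signal to obtain the second target time domain signal.
  5. The method according to any one of claims 2 to 4, wherein the determining the ITD parameter of the multi-channel signal according to the first target time domain signal comprises:
    determining the ITD parameter of the multi-channel signal according to the index value corresponding to the sample with the largest sample value in the first target time domain signal.
  6. The method according to claim 5, wherein the determining the ITD parameter of the multi-channel signal according to the index value corresponding to the sample with the largest sample value in the first target time domain signal comprises:
    determining the index value as the ITD parameter of the multi-channel signal.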
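  7. The method according to any one of claims 1 to 6, wherein the generating a first target frequency domain signal according to the multi-channel signal comprises:
    determining the amplitude of the first target frequency domain signal according to the multi-channel signal;
    determining the IPD parameter of the multi-channel signal according to the multi-channel signal; and
    generating the first target frequency domain signal according to the amplitude of the first target frequency domain signal and the IPD parameter of the multi-channel signal.
  8. The method according to claim 7, wherein the determining the amplitude of the first target frequency domain signal according to the multi-channel signal comprises:
    determining, according to
    Figure PCTCN2016103594-appb-100001
    the amplitude of the first target frequency domain signal, wherein AM(k) denotes the amplitude of the first target frequency domain signal, A1(k) and A2(k) respectively denote the amplitudes of the frequency domain signals of any two channels of the multi-channel signal, and k denotes the frequency bin.
  9. The method according to claim 7 or 8, wherein the generating the first target frequency domain signal according to the amplitude of the first target frequency domain signal and the IPD parameter of the multi-channel signal comprises:
    generating, according to
    Figure PCTCN2016103594-appb-100002
    the first target frequency domain signal, wherein AM(k) denotes the amplitude of the first target frequency domain signal, XM_real(k) denotes the real part of the first target frequency domain signal, XM_image(k) denotes the imaginary part of the first target frequency domain signal, IPD(k) denotes the IPD parameter of the multi-channel signal, and k denotes the frequency bin.
  10. The method according to any one of claims 1 to 6, wherein the generating a first target frequency domain signal according to the multi-channel signal comprises:
    generating a frequency domain signal XM(k) according to XM(k) = X1(k)·X2*(k), wherein X1(k) denotes the frequency domain signal of the first channel of the multi-channel signal, X2*(k) denotes the conjugate of the frequency domain signal of the second channel of the multi-channel signal, and k denotes the frequency bin; and
    normalizing the amplitude of the frequency domain signal XM(k) to obtain the first target frequency domain signal.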
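  11. An encoding method for a multi-channel signal, comprising:
    acquiring a multi-channel signal;
    generating a first target frequency domain signal according to the multi-channel signal, wherein the first target frequency domain signal is located within a first frequency domain range, and the phase of the first target frequency domain signal is linearly related to the inter-channel phase difference IPD of the multi-channel signal;
    performing a frequency-to-time transform on the first target frequency domain signal to obtain a first target time domain signal;
    determining, according to the first target time domain signal, whether the multi-channel signal includes an anti-phase signal;
    when the multi-channel signal does not include an anti-phase signal, generating a second target frequency domain signal according to the multi-channel signal, wherein the second target frequency domain signal is located within a second frequency domain range, the second frequency domain range differs from the first frequency domain range, and the phase of the second target frequency domain signal is linearly related to the IPD of the multi-channel signal;
    performing a frequency-to-time transform on the second target frequency domain signal to obtain a second target time domain signal;
    determining an inter-channel time difference ITD parameter of the multi-channel signal according to the second target time domain signal;
    encoding the ITD parameter of the multi-channel signal;
    when the multi-channel signal includes an anti-phase signal, determining the IPD parameter of the multi-channel signal; and
    encoding the IPD parameter of the multi-channel signal.
  12. The method according to claim 11, wherein the performing a frequency-to-time transform on the second target frequency domain signal to obtain a second target time domain signal comprises:
    performing a frequency-to-time transform on the part of the second target frequency domain signal that lies outside the first frequency domain range to obtain a third target time domain signal, wherein the second frequency domain range includes the first frequency domain range; and
    superimposing the first target time domain signal and the third target time domain signal to obtain the second target time domain signal.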
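  13. An encoder, comprising:
    an acquiring unit, configured to acquire a multi-channel signal;
    a generating unit, configured to generate a first target frequency domain signal according to the multi-channel signal, wherein the phase of the first target frequency domain signal is linearly related to the inter-channel phase difference IPD of the multi-channel signal;
    a frequency-to-time transform unit, configured to perform a frequency-to-time transform on the first target frequency domain signal to obtain a first target time domain signal;
    a determining unit, configured to determine an inter-channel time difference ITD parameter of the multi-channel signal according to the first target time domain signal and a preset peak condition for the time domain signal; and
    an encoding unit, configured to encode the ITD parameter of the multi-channel signal.
  14. The encoder according to claim 13, wherein the generating unit is specifically configured to: obtain a first frequency domain signal from the multi-channel signal, wherein the first frequency domain signal is the part of the multi-channel signal located within a first frequency domain range; and generate the first target frequency domain signal according to the first frequency domain signal;
    and the determining unit is specifically configured to: when the first target time domain signal satisfies the peak condition, determine the ITD parameter of the multi-channel signal according to the first target time domain signal; when the first target time domain signal does not satisfy the peak condition, obtain a second frequency domain signal from the multi-channel signal, wherein the second frequency domain signal is the part of the multi-channel signal located within a second frequency domain range, and the second frequency domain range differs from the first frequency domain range; and determine the ITD parameter of the multi-channel signal according to the second frequency domain signal.
  15. The encoder according to claim 14, wherein the determining unit is specifically configured to: generate a second target frequency domain signal according to the second frequency domain signal, wherein the phase of the second target frequency domain signal is linearly related to the IPD of the multi-channel signal; perform a frequency-to-time transform on the second target frequency domain signal to obtain a second target time domain signal; and determine the ITD parameter of the multi-channel signal according to the second target time domain signal.
  16. The encoder according to claim 14 or 15, wherein the determining unit is specifically configured to: perform a frequency-to-time transform on the part of the second target frequency domain signal that lies outside the first frequency domain range to obtain a third target time domain signal, wherein the second frequency domain range includes the first frequency domain range; and superimpose the first target time domain signal and the third target time domain signal to obtain the second target time domain signal.
  17. The encoder according to any one of claims 14 to 16, wherein the determining unit is specifically configured to determine the ITD parameter of the multi-channel signal according to the index value corresponding to the sample with the largest sample value in the first target time domain signal.
  18. The encoder according to claim 17, wherein the determining unit is specifically configured to determine the index value as the ITD parameter of the multi-channel signal.
  19. The encoder according to any one of claims 13 to 18, wherein the generating unit is specifically configured to: determine the amplitude of the first target frequency domain signal according to the multi-channel signal; determine the IPD parameter of the multi-channel signal according to the multi-channel signal; and generate the first target frequency domain signal according to the amplitude of the first target frequency domain signal and the IPD parameter of the multi-channel signal.
  20. The encoder according to claim 19, wherein the generating unit is specifically configured to determine, according to
    Figure PCTCN2016103594-appb-100003
    the amplitude of the first target frequency domain signal, wherein AM(k) denotes the amplitude of the first target frequency domain signal, A1(k) and A2(k) respectively denote the amplitudes of the frequency domain signals of any two channels of the multi-channel signal, and k denotes the frequency bin.
  21. The encoder according to claim 19 or 20, wherein the generating unit is specifically configured to generate, according to
    Figure PCTCN2016103594-appb-100004
    the first target frequency domain signal, wherein AM(k) denotes the amplitude of the first target frequency domain signal, XM_real(k) denotes the real part of the first target frequency domain signal, XM_image(k) denotes the imaginary part of the first target frequency domain signal, IPD(k) denotes the IPD parameter of the multi-channel signal, and k denotes the frequency bin.
  22. The encoder according to any one of claims 13 to 18, wherein the generating unit is specifically configured to: generate a frequency domain signal XM(k) according to XM(k) = X1(k)·X2*(k), wherein X1(k) denotes the frequency domain signal of the first channel of the multi-channel signal, X2*(k) denotes the conjugate of the frequency domain signal of the second channel of the multi-channel signal, and k denotes the frequency bin; and normalize the amplitude of the frequency domain signal XM(k) to obtain the first target frequency domain signal.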
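  23. An encoder, comprising:
    an acquiring unit, configured to acquire a multi-channel signal;
    a first generating unit, configured to generate a first target frequency domain signal according to the multi-channel signal, wherein the first target frequency domain signal is located within a first frequency domain range, and the phase of the first target frequency domain signal is linearly related to the inter-channel phase difference IPD of the multi-channel signal;
    a first frequency-to-time transform unit, configured to perform a frequency-to-time transform on the first target frequency domain signal to obtain a first target time domain signal;
    a first determining unit, configured to determine, according to the first target time domain signal, whether the multi-channel signal includes an anti-phase signal;
    a second generating unit, configured to: when the multi-channel signal does not include an anti-phase signal, generate a second target frequency domain signal according to the multi-channel signal, wherein the second target frequency domain signal is located within a second frequency domain range, the second frequency domain range differs from the first frequency domain range, and the phase of the second target frequency domain signal is linearly related to the IPD of the multi-channel signal;
    a second frequency-to-time transform unit, configured to perform a frequency-to-time transform on the second target frequency domain signal to obtain a second target time domain signal;
    a second determining unit, configured to determine an inter-channel time difference ITD parameter of the multi-channel signal according to the second target time domain signal;
    a first encoding unit, configured to encode the ITD parameter of the multi-channel signal;
    a third determining unit, configured to determine the IPD parameter of the multi-channel signal when the multi-channel signal includes an anti-phase signal; and
    a second encoding unit, configured to encode the IPD parameter of the multi-channel signal.
  24. The encoder according to claim 23, wherein the second frequency-to-time transform unit is specifically configured to: perform a frequency-to-time transform on the part of the second target frequency domain signal that lies outside the first frequency domain range to obtain a third target time domain signal, wherein the second frequency domain range includes the first frequency domain range; and superimpose the first target time domain signal and the third target time domain signal to obtain the second target time domain signal.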
PCT/CN2016/103594 2016-05-10 2016-10-27 Encoding method and encoder for multi-channel signal WO2017193550A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610304389.8A CN107358960B (zh) 2016-05-10 Encoding method and encoder for multi-channel signal
CN201610304389.8 2016-05-10

Publications (1)

Publication Number Publication Date
WO2017193550A1 true WO2017193550A1 (zh) 2017-11-16

Family

ID=60266133

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/103594 WO2017193550A1 (zh) 2016-05-10 2016-10-27 Encoding method and encoder for multi-channel signal

Country Status (2)

Country Link
CN (1) CN107358960B (zh)
WO (1) WO2017193550A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
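CN1647156A (zh) * 2002-04-22 2005-07-27 皇家飞利浦电子股份有限公司 Parametric multi-channel audio representation
CN1748247A (zh) * 2003-02-11 2006-03-15 皇家飞利浦电子股份有限公司 Audio coding
CN101556799A (zh) * 2009-05-14 2009-10-14 华为技术有限公司 Audio decoding method and audio decoder
CN103403800A (zh) * 2011-02-02 2013-11-20 瑞典爱立信有限公司 Determining the inter-channel time difference of a multi-channel audio signal
CN104205211A (zh) * 2012-04-05 2014-12-10 华为技术有限公司 Multi-channel audio encoder and method for encoding a multi-channel audio signal
CN104681029A (zh) * 2013-11-29 2015-06-03 华为技术有限公司 Method and apparatus for encoding stereo phase parameters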

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7116787B2 (en) * 2001-05-04 2006-10-03 Agere Systems Inc. Perceptual synthesis of auditory scenes
BR0305555A (pt) * 2002-07-16 2004-09-28 Koninkl Philips Electronics Nv Method and encoder for encoding an audio signal, apparatus for supplying an audio signal, encoded audio signal, storage medium, and method and decoder for decoding an encoded audio signal
KR20060090984A (ko) * 2003-09-29 2006-08-17 코닌클리케 필립스 일렉트로닉스 엔.브이. Method and apparatus for encoding audio signals
WO2009046223A2 (en) * 2007-10-03 2009-04-09 Creative Technology Ltd Spatial audio analysis and synthesis for binaural reproduction and format conversion
EP2702776B1 (en) * 2012-02-17 2015-09-23 Huawei Technologies Co., Ltd. Parametric encoder for encoding a multi-channel audio signal

Also Published As

Publication number Publication date
CN107358960B (zh) 2021-10-26
CN107358960A (zh) 2017-11-17

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16901505

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 16901505

Country of ref document: EP

Kind code of ref document: A1