WO2016141731A1 - Method and apparatus for determining inter-channel time difference parameter - Google Patents

Method and apparatus for determining inter-channel time difference parameter

Info

Publication number
WO2016141731A1
Authority
WO
WIPO (PCT)
Prior art keywords
channel
search
complexity
time domain
domain signal
Prior art date
Application number
PCT/CN2015/095090
Other languages
English (en)
French (fr)
Inventor
张兴涛
苗磊
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to MX2017011466A priority patent/MX2017011466A/es
Priority to KR1020177025506A priority patent/KR20170116132A/ko
Priority to CA2977843A priority patent/CA2977843A1/en
Priority to JP2017547578A priority patent/JP2018508047A/ja
Priority to SG11201706997PA priority patent/SG11201706997PA/en
Priority to EP15884409.2A priority patent/EP3255632B1/en
Priority to BR112017018819-8A priority patent/BR112017018819A2/zh
Priority to RU2017134756A priority patent/RU2682026C1/ru
Priority to AU2015385489A priority patent/AU2015385489B2/en
Publication of WO2016141731A1 patent/WO2016141731A1/zh
Priority to US15/696,716 priority patent/US10388288B2/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters

Definitions

  • the present invention relates to the field of audio processing and, more particularly, to a method and apparatus for determining inter-channel time difference parameters.
  • stereo audio has the sense of orientation and distribution of each source, which can improve the clarity and intelligibility of information, and is therefore favored by people.
  • a transmission technology for a stereo audio signal is known, and the encoding end converts a stereo signal into a mono audio signal and an Inter-Channel Time Difference (ITD) parameter, which are respectively encoded and transmitted.
  • at the decoding end, after the mono audio signal is obtained, the stereo signal is restored according to parameters such as the ITD, which enables low-bit-rate, high-quality transmission of the stereo signal.
  • the encoding end can determine the limit value T_max of the ITD parameter from the sampling rate of the input audio signal, and then search the input audio signal over the range [-T_max, T_max] with a specified step size to obtain the ITD parameter. The search range and the search step size are therefore the same regardless of the channel quality.
  • however, different channel qualities place different accuracy requirements on the ITD parameter. For example, when the channel quality is poor, the accuracy required of the ITD parameter is low; if the larger search range and smaller search step size described above are still used, computing resources are wasted and processing efficiency is seriously affected.
  • Embodiments of the present invention provide a method and apparatus for determining a time difference parameter between channels, which can adapt the accuracy of the determined ITD parameter to the channel quality.
  • a method for determining an inter-channel time difference parameter is provided, comprising: determining a target search complexity from at least two search complexities, wherein the at least two search complexities correspond one-to-one to at least two channel quality values; and performing search processing on the signal of the first channel and the signal of the second channel according to the target search complexity, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel.
  • determining the target search complexity from the at least two search complexities includes: acquiring an encoding parameter for a stereo signal, where the stereo signal is generated based on the signal of the first channel and the signal of the second channel, the encoding parameter is determined according to a current channel quality value, and the encoding parameter includes any one of the following: an encoding bit rate, a number of encoding bits, or a complexity control parameter used to indicate the search complexity; and determining the target search complexity from the at least two search complexities according to the encoding parameter.
  • the at least two search complexities correspond one-to-one to at least two search step sizes; the at least two search complexities include a first search complexity and a second search complexity, and the at least two search step sizes include a first search step size and a second search step size, wherein the first search step size, which corresponds to the first search complexity, is smaller than the second search step size, which corresponds to the second search complexity, and the first search complexity is higher than the second search complexity.
  • performing search processing on the signal of the first channel and the signal of the second channel according to the target search complexity includes: determining a target search step size corresponding to the target search complexity; and performing search processing on the signal of the first channel and the signal of the second channel according to the target search step size.
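The step-size-dependent search described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the use of a plain time-domain cross-correlation as the search criterion, the sign convention of the lag, and the `STEP_BY_COMPLEXITY` table are all assumptions introduced here.

```python
# Hypothetical table: a higher search complexity maps to a smaller step size.
STEP_BY_COMPLEXITY = {3: 1, 2: 2, 1: 4}


def candidate_lags(t_max, step):
    """Lags visited when [-t_max, t_max] is searched with the given step."""
    return list(range(-t_max, t_max + 1, step))


def cross_correlation(first, second, lag):
    """Sum of first[n] * second[n + lag] over the samples where both exist."""
    total = 0.0
    for n in range(len(first)):
        m = n + lag
        if 0 <= m < len(second):
            total += first[n] * second[m]
    return total


def search_itd(first, second, t_max, step):
    """Return the visited lag with the largest cross-correlation.

    Assumed convention: a positive result means the second channel lags
    the first channel by that many samples.
    """
    best_lag, best_value = 0, float("-inf")
    for lag in candidate_lags(t_max, step):
        value = cross_correlation(first, second, lag)
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag
```

With `t_max = 8`, a step size of 1 evaluates 17 candidate lags while a step size of 4 evaluates only 5, which is where the saving in computation comes from; the price is that a delay falling between two visited lags cannot be resolved exactly.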
  • the at least two search complexities correspond one-to-one to at least two search ranges; the at least two search complexities include a third search complexity and a fourth search complexity, and the at least two search ranges include a first search range and a second search range, wherein the first search range, which corresponds to the third search complexity, is larger than the second search range, which corresponds to the fourth search complexity, and the third search complexity is higher than the fourth search complexity; and performing search processing on the signal of the first channel and the signal of the second channel according to the target search complexity includes: determining a target search range corresponding to the target search complexity; and performing search processing on the signal of the first channel and the signal of the second channel within the target search range.
  • determining the target search range corresponding to the target search complexity includes: determining a reference parameter according to the time domain signal of the first channel and the time domain signal of the second channel, the reference parameter corresponding to the acquisition order of the time domain signal of the first channel and the time domain signal of the second channel, wherein the two time domain signals correspond to the same time period; and determining the target search range according to the target search complexity, the reference parameter, and a limit value T_max, wherein the limit value T_max is determined according to the sampling rate of the time domain signal of the first channel, and the target search range falls within [-T_max, 0] or within [0, T_max].
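A minimal sketch of how a one-sided target search range could be derived is given below. The text above only states that the range depends on the target search complexity, the reference parameter, and the limit value; the rule that a non-negative reference parameter selects the positive half, and the `fraction` argument standing in for the effect of the search complexity, are assumptions made for illustration.

```python
def target_search_range(reference, t_max, fraction=1.0):
    """Return (low, high) bounds of a one-sided search range.

    reference: assumed to be non-negative when the first channel leads.
    t_max:     limit value derived from the sampling rate.
    fraction:  hypothetical knob (0 < fraction <= 1) that shrinks the range
               for lower search complexities.
    """
    bound = int(t_max * fraction)
    if reference >= 0:
        return (0, bound)
    return (-bound, 0)
```

Searching only one half of [-T_max, T_max] already halves the number of candidate lags; shrinking the half further trades accuracy for computation.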
  • determining the reference parameter according to the time domain signal of the first channel and the time domain signal of the second channel includes: performing cross-correlation processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, wherein the first cross-correlation processing value is the maximum value, within a preset range, of the cross-correlation function of the time domain signal of the first channel relative to the time domain signal of the second channel, and the second cross-correlation processing value is the maximum value, within the preset range, of the cross-correlation function of the time domain signal of the second channel relative to the time domain signal of the first channel; and determining the reference parameter according to the magnitude relationship between the first cross-correlation processing value and the second cross-correlation processing value.
  • the reference parameter is the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the opposite of that index value.
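One possible reading of the cross-correlation comparison above is sketched here. The helper names are hypothetical, the preset range is taken to be the lags 0 to `max_lag`, and the choice to return the index value when the first direction wins and its opposite when the second direction wins is an assumption, since the text allows either the index value or its opposite. Both signals are assumed to have the same length.

```python
def directional_max(lead, follow, max_lag):
    """Maximum over lags 0..max_lag of sum(lead[n] * follow[n + lag]).

    Returns (maximum value, lag index at which it occurs).
    """
    best_value, best_index = float("-inf"), 0
    for lag in range(max_lag + 1):
        value = sum(lead[n] * follow[n + lag] for n in range(len(lead) - lag))
        if value > best_value:
            best_value, best_index = value, lag
    return best_value, best_index


def reference_parameter(first, second, max_lag):
    """Compare the two directional maxima and return a signed index."""
    first_value, first_index = directional_max(first, second, max_lag)
    second_value, second_index = directional_max(second, first, max_lag)
    if first_value >= second_value:
        return first_index       # first channel assumed to lead
    return -second_index         # second channel assumed to lead
```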
  • determining the reference parameter according to the time domain signal of the first channel and the time domain signal of the second channel includes: performing peak detection processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first index value and a second index value, wherein the first index value is the index value corresponding to the maximum amplitude value of the time domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time domain signal of the second channel within the preset range; and determining the reference parameter according to the magnitude relationship between the first index value and the second index value.
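The peak-detection variant can be sketched in the same spirit. The helper names, the use of the absolute sample value as the amplitude, and the encoding of the result as 1, 0, or -1 are assumptions made for illustration; the text only requires that the reference parameter follow from comparing the two index values.

```python
def peak_index(signal, limit):
    """Index of the largest absolute amplitude among the first `limit` samples."""
    window = signal[:limit]
    return max(range(len(window)), key=lambda n: abs(window[n]))


def reference_from_peaks(first, second, limit):
    """Return 1 if the first channel peaks earlier, -1 if later, 0 if equal."""
    first_index = peak_index(first, limit)
    second_index = peak_index(second, limit)
    if first_index < second_index:
        return 1
    if first_index > second_index:
        return -1
    return 0
```

Compared with the cross-correlation approach, peak detection needs only one pass over each signal, at the cost of being more sensitive to noise around the peak.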
  • the method further includes: performing smoothing processing on the first ITD parameter based on a second ITD parameter, where the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is the smoothed value of the ITD parameter of a second time period, and the second time period precedes the first time period.
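The smoothing step can be illustrated with a first-order recursive average. The text above does not give the smoothing formula, so the weighted sum and the default weight of 0.5 below are assumptions chosen only to show how the smoothed value of the earlier time period feeds into the current one.

```python
def smooth_itd(current_itd, previous_smoothed_itd, weight=0.5):
    """Blend the current ITD with the previous period's smoothed ITD.

    weight is a hypothetical smoothing factor in [0, 1]; a larger weight
    makes the output follow the previous smoothed value more closely.
    """
    return weight * previous_smoothed_itd + (1.0 - weight) * current_itd


def smooth_sequence(itd_values, weight=0.5):
    """Apply smooth_itd frame by frame, seeding with the first raw value."""
    smoothed = []
    previous = None
    for itd in itd_values:
        previous = itd if previous is None else smooth_itd(itd, previous, weight)
        smoothed.append(previous)
    return smoothed
```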
  • an apparatus for determining an inter-channel time difference parameter is provided, comprising: a determining unit, configured to determine a target search complexity from at least two search complexities, wherein the at least two search complexities correspond one-to-one to at least two channel quality values; and a processing unit, configured to perform search processing on the signal of the first channel and the signal of the second channel according to the target search complexity, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel.
  • the determining unit is specifically configured to: acquire an encoding parameter for a stereo signal, where the stereo signal is generated based on the signal of the first channel and the signal of the second channel, the encoding parameter is determined according to a current channel quality value, and the encoding parameter includes any one of the following: an encoding bit rate, a number of encoding bits, or a complexity control parameter used to indicate the search complexity; and determine the target search complexity from the at least two search complexities according to the encoding parameter.
  • the at least two search complexities correspond one-to-one to at least two search step sizes; the at least two search complexities include a first search complexity and a second search complexity, and the at least two search step sizes include a first search step size and a second search step size, wherein the first search step size, which corresponds to the first search complexity, is smaller than the second search step size, which corresponds to the second search complexity, and the first search complexity is higher than the second search complexity; and the processing unit is specifically configured to: determine a target search step size corresponding to the target search complexity; and perform search processing on the signal of the first channel and the signal of the second channel according to the target search step size.
  • the at least two search complexities correspond one-to-one to at least two search ranges; the at least two search complexities include a third search complexity and a fourth search complexity, and the at least two search ranges include a first search range and a second search range, wherein the first search range, which corresponds to the third search complexity, is larger than the second search range, which corresponds to the fourth search complexity, and the third search complexity is higher than the fourth search complexity.
  • the processing unit is specifically configured to: determine a target search range corresponding to the target search complexity; and perform search processing on the signal of the first channel and the signal of the second channel within the target search range.
  • the processing unit is specifically configured to: determine a reference parameter according to the time domain signal of the first channel and the time domain signal of the second channel, the reference parameter corresponding to the acquisition order of the time domain signal of the first channel and the time domain signal of the second channel, wherein the two time domain signals correspond to the same time period; and determine the target search range according to the target search complexity, the reference parameter, and a limit value T_max, wherein the limit value T_max is determined according to the sampling rate of the time domain signal of the first channel, and the target search range falls within [-T_max, 0] or within [0, T_max].
  • the processing unit is specifically configured to: perform cross-correlation processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, wherein the first cross-correlation processing value is the maximum value, within a preset range, of the cross-correlation function of the time domain signal of the first channel relative to the time domain signal of the second channel, and the second cross-correlation processing value is the maximum value, within the preset range, of the cross-correlation function of the time domain signal of the second channel relative to the time domain signal of the first channel; and determine the reference parameter according to the magnitude relationship between the first cross-correlation processing value and the second cross-correlation processing value.
  • the reference parameter is the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the opposite of that index value.
  • the processing unit is specifically configured to: perform peak detection processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first index value and a second index value, wherein the first index value is the index value corresponding to the maximum amplitude value of the time domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time domain signal of the second channel within the preset range; and determine the reference parameter according to the magnitude relationship between the first index value and the second index value.
  • the processing unit is further configured to perform smoothing processing on the first ITD parameter based on a second ITD parameter, where the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is the smoothed value of the ITD parameter of a second time period, and the second time period precedes the first time period.
  • according to the method and apparatus for determining an inter-channel time difference parameter in the embodiments of the present invention, a target search complexity corresponding to the current channel quality is determined from at least two search complexities, and search processing is performed on the signal of the first channel and the signal of the second channel according to the target search complexity, so that the accuracy of the determined ITD parameter is adapted to the channel quality. When the current channel quality is poor, the target search complexity can therefore be lowered, which reduces the complexity or amount of computation of the search processing, saves computing resources, and improves processing efficiency.
  • FIG. 1 is a schematic flow chart of a method of determining an inter-channel time difference parameter according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a search range determination process in accordance with an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a process of determining a target search range according to another embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a process of determining a target search range according to still another embodiment of the present invention.
  • FIG. 5 is a schematic block diagram of an apparatus for determining an inter-channel time difference parameter according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of an apparatus for determining an inter-channel time difference parameter according to an embodiment of the present invention.
  • the execution body of the method 100 may be an encoding end device for transmitting an audio signal (also referred to as a transmitting device). As shown in FIG. 1, the method 100 includes:
  • S110. Determine a target search complexity from at least two search complexities, where the at least two search complexities correspond one-to-one to at least two channel quality values.
  • the method 100 of determining an inter-channel time difference parameter of an embodiment of the present invention may be applied to an audio system having at least two channels, in which a stereo signal is synthesized from the mono signals of at least two channels (that is, including the first channel and the second channel), for example, from the mono signal of the left channel (an example of the first channel) and the mono signal of the right channel (an example of the second channel).
  • a parametric stereo (PS) technique can be cited as a method for transmitting the stereo signal.
  • the encoding end converts the stereo signal into a mono signal and spatial perception parameters and encodes them separately; after obtaining the mono audio, the decoding end restores the stereo signal according to the spatial parameters.
  • the inter-channel time difference (ITD) parameter is a spatial parameter indicating the horizontal orientation of the sound source and is an important component of the spatial parameter.
  • the embodiment of the present invention mainly relates to the process of determining the ITD parameter.
  • the process of encoding and decoding the stereo signal and the mono signal according to the ITD parameter is similar to the prior art, and a detailed description thereof is omitted herein to avoid redundancy.
  • the audio system may also have three or more channels, in which case the mono signals of any two channels can be combined into a stereo signal.
  • the following describes the process of applying the method 100 to an audio system having two channels (that is, a left channel and a right channel), with the left channel used as the first channel and the right channel used as the second channel.
  • the manner of acquiring the ITD parameter between the left and right channels differs for different search complexities, so the encoding end device may first determine the current search complexity before determining the ITD parameter.
  • different search complexities correspond to different manners of acquiring the ITD parameter (the specific relationship between the search complexity and the manner of acquiring the ITD parameter is described in detail below): the higher the search complexity, the higher the accuracy of the obtained ITD parameter; conversely, the lower the search complexity, the lower the accuracy of the obtained ITD parameter.
  • the encoding end device can make the accuracy of the obtained ITD parameter correspond to the current channel quality by selecting the search complexity corresponding to the current channel quality (ie, the target search complexity).
  • a plurality of (that is, at least two) channel qualities and a plurality of (that is, at least two) search complexities may be recorded in one-to-one correspondence in a mapping entry (denoted mapping entry #1 for ease of understanding and distinction) stored in the encoding end device, so that, after acquiring the current channel quality, the encoding end device can look up in mapping entry #1 the search complexity corresponding to the current channel quality and use it as the target search complexity.
  • the search complexity can be divided into M levels (that is, M search complexities, denoted M, M-1, ..., 1), and the M levels can correspond one-to-one to M channel qualities (denoted Q_M, Q_{M-1}, Q_{M-2}, ..., Q_1, where Q_M > Q_{M-1} > Q_{M-2} > ... > Q_1), that is:
  • the search complexity corresponding to the channel quality Q_M is M; that is, if the current channel quality is higher than or equal to Q_M, the target search complexity may be set to M.
  • the search complexity corresponding to the channel quality Q_{M-1} is M-1; that is, if the current channel quality is higher than or equal to Q_{M-1} and lower than Q_M, the target search complexity may be set to M-1.
  • the search complexity corresponding to the channel quality Q_{M-2} is M-2; that is, if the current channel quality is higher than or equal to Q_{M-2} and lower than Q_{M-1}, the target search complexity may be set to M-2.
  • the search complexity corresponding to the channel quality Q_2 is 2; that is, if the current channel quality is higher than or equal to Q_2 and lower than Q_3, the target search complexity may be set to 2.
  • the search complexity corresponding to the channel quality Q_1 is 1; that is, if the current channel quality is lower than Q_2, the target search complexity may be set to 1.
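The threshold rules listed above can be written as a short loop. This is a sketch under stated assumptions: the function name is hypothetical, channel quality is taken to be a plain number, and the thresholds are passed in ascending order as [Q_1, Q_2, ..., Q_M].

```python
def target_search_complexity(current_quality, thresholds):
    """Map a channel quality to a search complexity level in 1..M.

    thresholds: [Q_1, Q_2, ..., Q_M] in strictly ascending order.
    Level m (m >= 2) is chosen when Q_m <= quality < Q_{m+1}; level 1
    covers every quality below Q_2, so Q_1 itself is never compared.
    """
    level = 1
    for m in range(2, len(thresholds) + 1):
        if current_quality >= thresholds[m - 1]:
            level = m
    return level
```

Note that, as in the rules above, a quality below Q_1 still yields level 1, so the mapping is defined for every input.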
  • the channel quality refers to the quality of a channel between an encoding end and a decoding end for transmitting an audio signal and an ITD parameter to be described later.
  • determining the target search complexity from the at least two search complexities includes:
  • acquiring an encoding parameter for the stereo signal, where the encoding parameter is determined according to a current channel quality value and includes any one of the following: an encoding bit rate, a number of encoding bits, or a complexity control parameter used to indicate the search complexity; and
  • determining the target search complexity from the at least two search complexities according to the encoding parameter.
  • the better the channel quality, the higher the encoding bit rate and the larger the number of encoding bits; the worse the channel quality, the lower the encoding bit rate and the smaller the number of encoding bits.
  • a plurality of (that is, at least two) encoding bit rates and a plurality of (that is, at least two) search complexities may be recorded in one-to-one correspondence in a mapping entry (denoted mapping entry #2 for ease of understanding and distinction) stored in the encoding end device, so that, after obtaining the current encoding bit rate, the encoding end device can look up in mapping entry #2 the search complexity corresponding to the current encoding bit rate and use it as the target search complexity.
  • the method and process for the encoding end device to obtain the current encoding bit rate may be similar to the prior art, and a detailed description thereof is omitted in order to avoid redundancy.
  • the search complexity can be divided into M levels (that is, M search complexities, denoted M, M-1, ..., 1), and the M levels can correspond one-to-one to M encoding bit rates (denoted B_M, B_{M-1}, B_{M-2}, ..., B_1, where B_M > B_{M-1} > B_{M-2} > ... > B_1), that is:
  • the search complexity corresponding to the encoding bit rate B_M is M; that is, if the current encoding bit rate is higher than or equal to B_M, the target search complexity may be set to M.
  • the search complexity corresponding to the encoding bit rate B_{M-1} is M-1; that is, if the current encoding bit rate is higher than or equal to B_{M-1} and lower than B_M, the target search complexity may be set to M-1.
  • the search complexity corresponding to the encoding bit rate B_{M-2} is M-2; that is, if the current encoding bit rate is higher than or equal to B_{M-2} and lower than B_{M-1}, the target search complexity may be set to M-2.
  • the search complexity corresponding to the encoding bit rate B_2 is 2; that is, if the current encoding bit rate is higher than or equal to B_2 and lower than B_3, the target search complexity may be set to 2.
  • the search complexity corresponding to the encoding bit rate B_1 is 1; that is, if the current encoding bit rate is lower than B_2, the target search complexity may be set to 1.
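When the correspondence is stored as a table, as mapping entry #2 is, the lookup reduces to a binary search over the sorted thresholds. The sketch below uses the standard-library `bisect_right`; the table contents are illustrative values chosen here, not bit rates taken from the text.

```python
from bisect import bisect_right

# Hypothetical contents of "mapping entry #2": B_1 < B_2 < B_3 < B_4, in bit/s.
BIT_RATES = [13200, 16400, 24400, 32000]


def complexity_for_bit_rate(bit_rate, bit_rates=BIT_RATES):
    """Return the search complexity (1..M) for the current encoding bit rate.

    bisect_right counts how many of B_2..B_M are <= bit_rate; adding 1
    gives the level, so anything below B_2 falls back to level 1.
    """
    return 1 + bisect_right(bit_rates[1:], bit_rate)
```

The same function would serve for the number of encoding bits or for a complexity control parameter if the corresponding threshold list were passed as `bit_rates`.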
  • a plurality of (that is, at least two) numbers of encoding bits and a plurality of (that is, at least two) search complexities may be recorded in one-to-one correspondence in a mapping entry (denoted mapping entry #3 for ease of understanding and distinction) stored in the encoding end device, so that, after obtaining the current number of encoding bits, the encoding end device can look up in mapping entry #3 the search complexity corresponding to the current number of encoding bits and use it as the target search complexity.
  • the method and the process for the encoding end device to obtain the current number of encoded bits may be similar to the prior art, and a detailed description thereof will be omitted in order to avoid redundancy.
  • the search complexity can be divided into M levels (that is, M search complexities, denoted M, M-1, ..., 1), and the M levels can correspond one-to-one to M numbers of encoding bits (denoted C_M, C_{M-1}, C_{M-2}, ..., C_1, where C_M > C_{M-1} > C_{M-2} > ... > C_1), that is:
  • the search complexity corresponding to the number of encoding bits C_M is M; that is, if the current number of encoding bits is greater than or equal to C_M, the target search complexity may be set to M.
  • the search complexity corresponding to the number of encoding bits C_{M-1} is M-1; that is, if the current number of encoding bits is greater than or equal to C_{M-1} and less than C_M, the target search complexity may be set to M-1.
  • the search complexity corresponding to the number of encoding bits C_{M-2} is M-2; that is, if the current number of encoding bits is greater than or equal to C_{M-2} and less than C_{M-1}, the target search complexity may be set to M-2.
  • the search complexity corresponding to the number of encoding bits C_2 is 2; that is, if the current number of encoding bits is greater than or equal to C_2 and less than C_3, the target search complexity may be set to 2.
  • the search complexity corresponding to the number of encoding bits C_1 is 1; that is, if the current number of encoding bits is less than C_2, the target search complexity may be set to 1.
  • different complexity control parameters may be configured for different channel qualities, so that different complexity control parameter values correspond to different search complexities. Accordingly, a plurality of (that is, at least two) complexity control parameter values and a plurality of (that is, at least two) search complexities may be recorded in one-to-one correspondence in a mapping entry (denoted mapping entry #4 for ease of understanding and distinction) stored in the encoding end device, so that, after obtaining the current complexity control parameter value, the encoding end device can look up in mapping entry #4 the search complexity corresponding to that value and use it as the target search complexity.
  • the complexity control parameter value can be written into the command line in advance, so that the encoding end device can read the current complexity control parameter value in the command line.
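Reading a complexity control parameter from the command line could look like the following sketch, which uses Python's standard `argparse` module. The option name `--complexity`, its integer type, and the default of 1 are hypothetical choices made for illustration; the text does not specify how the parameter is spelled on the command line.

```python
import argparse


def read_complexity_parameter(argv):
    """Parse a hypothetical --complexity option from an argument list."""
    parser = argparse.ArgumentParser(
        description="Encoder front end (illustrative only)"
    )
    parser.add_argument(
        "--complexity",
        type=int,
        default=1,
        help="complexity control parameter value (assumed to be an integer)",
    )
    return parser.parse_args(argv).complexity
```

In a real program `argv` would be `sys.argv[1:]`; the returned value would then be mapped to a search complexity through mapping entry #4.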
  • the search complexity can be divided into M levels (that is, M search complexities, denoted M, M-1, ..., 1), and the M levels can correspond one-to-one to M complexity control parameters (denoted N_M, N_{M-1}, N_{M-2}, ..., N_1, where N_M > N_{M-1} > N_{M-2} > ... > N_1), that is:
  • the search complexity corresponding to the complexity control parameter N_M is M; that is, if the current complexity control parameter is higher than or equal to N_M, the target search complexity may be set to M.
  • the search complexity corresponding to the complexity control parameter N_{M-1} is M-1; that is, if the current complexity control parameter is higher than or equal to N_{M-1} and lower than N_M, the target search complexity may be set to M-1.
  • the search complexity corresponding to the complexity control parameter N_{M-2} is M-2; that is, if the current complexity control parameter is higher than or equal to N_{M-2} and lower than N_{M-1}, the target search complexity may be set to M-2.
  • the search complexity corresponding to the complexity control parameter N_2 is 2; that is, if the current complexity control parameter is higher than or equal to N_2 and lower than N_3, the target search complexity may be set to 2.
  • the search complexity corresponding to the complexity control parameter N_1 is 1; that is, if the current complexity control parameter is lower than N_2, the target search complexity may be set to 1.
  • the encoding bit rate, the number of encoding bits, and the complexity control parameter listed above as encoding parameters are merely examples, and the present invention is not limited thereto; any other information or parameter that is determined by the channel quality, or that can reflect the channel quality, falls within the protection scope of the present invention.
  • the encoding end device may perform a search process according to the target search complex to acquire the ITD parameter.
  • different search complexity may correspond to different search step sizes (ie, case 1), or different search complexity may correspond to different search ranges (ie, case 2), below, respectively
  • the encoder determines the ITD parameters based on the target search complexity. The process is described in detail.
  • The at least two search complexities are in one-to-one correspondence with at least two search step sizes. The at least two search complexities include a first search complexity and a second search complexity, and the at least two search step sizes include a first search step size and a second search step size, wherein the first search step size corresponding to the first search complexity is smaller than the second search step size corresponding to the second search complexity, and the first search complexity is higher than the second search complexity, and
  • performing search processing on the signal of the first channel and the signal of the second channel includes:
  • The M kinds of search complexity (i.e., M, M-1, ..., 1) may be in one-to-one correspondence with M search step sizes (denoted $L_M, L_{M-1}, L_{M-2}, \ldots, L_1$, where $L_M < L_{M-1} < L_{M-2} < \ldots < L_1$), namely:
  • the search step size $L_M$ corresponds to search complexity M: if the target search complexity determined as described above is M, the search step size $L_M$ corresponding to search complexity M can be set as the target search step size.
  • the search step size $L_{M-1}$ corresponds to search complexity M-1: if the target search complexity determined as described above is M-1, the search step size $L_{M-1}$ corresponding to search complexity M-1 can be set as the target search step size.
  • the search step size $L_{M-2}$ corresponds to search complexity M-2: if the target search complexity determined as described above is M-2, the search step size $L_{M-2}$ corresponding to search complexity M-2 can be set as the target search step size.
  • the search step size $L_2$ corresponds to search complexity 2: if the target search complexity determined as described above is 2, the search step size $L_2$ corresponding to search complexity 2 can be set as the target search step size.
  • the search step size $L_1$ corresponds to search complexity 1: if the target search complexity determined as described above is 1, the search step size $L_1$ corresponding to search complexity 1 can be set as the target search step size.
  • The specific values of the M search step sizes (i.e., $L_M, L_{M-1}, L_{M-2}, \ldots, L_1$) can be determined according to the following formula.
  • K is a preset value indicating the number of searches when the complexity is the lowest, and the rounding symbol in the formula indicates the rounding operation.
  • The left channel signal and the right channel signal may be searched according to the target search step size to determine the ITD parameter.
  • The above search processing may be performed in the time domain (mode 1) or in the frequency domain (mode 2), and the present invention is not particularly limited in this respect.
  • The encoding end device can acquire an audio signal corresponding to the left channel through an audio input device, such as a microphone, corresponding to the left channel, and sample the audio signal at a preset sampling rate (i.e., an example of the sampling rate of the time domain signal of the first channel) to generate a time domain signal of the left channel (i.e., an example of the time domain signal of the first channel; hereinafter referred to as time domain signal #L for ease of understanding and distinction).
  • The process of acquiring the time domain signal #L may be similar to the prior art, and a detailed description thereof is omitted here to avoid redundancy.
  • The sampling rate of the time domain signal of the first channel is the same as the sampling rate of the time domain signal of the second channel. Therefore, similarly, the encoding end device may acquire an audio signal corresponding to the right channel through an audio input device, such as a microphone, corresponding to the right channel, and sample the audio signal at the same sampling rate to generate a time domain signal of the right channel (i.e., an example of the time domain signal of the second channel; hereinafter referred to as time domain signal #R for ease of understanding and distinction).
  • The time domain signal #L and the time domain signal #R are time domain signals corresponding to the same time period (or time domain signals acquired in the same time period). For example, the time domain signal #L and the time domain signal #R may be time domain signals corresponding to the same frame (i.e., 20 ms). In this case, one ITD parameter corresponding to that frame can be obtained from the time domain signal #L and the time domain signal #R.
  • Alternatively, the time domain signal #L and the time domain signal #R may be time domain signals corresponding to the same subframe (e.g., 10 ms or 5 ms) within one frame. In this case, a plurality of ITD parameters corresponding to that frame can be obtained. For example, if the subframe corresponding to the time domain signal #L and the time domain signal #R is 10 ms, two ITD parameters can be obtained from one frame (i.e., 20 ms) of signal; if the subframe is 5 ms, four ITD parameters can be obtained from one frame (i.e., 20 ms) of signal.
  • The lengths of the time periods corresponding to the time domain signal #L and the time domain signal #R enumerated above are merely illustrative, and the present invention is not limited thereto; the length of the time period may be changed as needed.
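  • The frame and subframe arithmetic above can be checked with a short sketch. The helper names and the 16 kHz sampling rate in the usage note are hypothetical; the text only fixes the 20 ms frame and the 10 ms or 5 ms subframes.

```python
def itd_params_per_frame(frame_ms, subframe_ms):
    """Number of ITD parameters obtained when one is computed per subframe."""
    if subframe_ms <= 0 or frame_ms % subframe_ms != 0:
        raise ValueError("the subframe length must evenly divide the frame length")
    return frame_ms // subframe_ms


def samples_per_segment(sample_rate_hz, duration_ms):
    """Number of sampling points (Length) in a frame or subframe."""
    return sample_rate_hz * duration_ms // 1000
```

  • With a 20 ms frame, `itd_params_per_frame(20, 10)` gives 2 and `itd_params_per_frame(20, 5)` gives 4, matching the examples above. At an assumed sampling rate of 16 kHz, a 20 ms frame holds `samples_per_segment(16000, 20)` = 320 sampling points.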
  • The encoding end device may perform search processing on the time domain signal #L and the time domain signal #R according to the target search step size (i.e., $L_t$) determined as described above. That is,
  • the encoding end device may determine the cross-correlation function $c_n(i)$ of the time domain signal #L relative to the time domain signal #R according to Equation 1 below, and determine the cross-correlation function $c_p(i)$ of the time domain signal #R relative to the time domain signal #L according to Equation 2 below, where:
  • $x_R(j)$ represents the signal value of the time domain signal #R at the j-th sampling point;
  • $x_L(j+i)$ represents the signal value of the time domain signal #L at the (j+i)-th sampling point;
  • $x_L(j)$ represents the signal value of the time domain signal #L at the j-th sampling point;
  • $x_R(j+i)$ represents the signal value of the time domain signal #R at the (j+i)-th sampling point;
  • Length represents the total number of sampling points included in the time domain signal #R or the time domain signal #L, that is, the length of the time domain signal, which may be, for example, the length of one frame (i.e., 20 ms) or the length of one subframe (e.g., 10 ms or 5 ms);
  • $T_{\max}$ represents the limit value of the ITD parameter (in other words, the maximum value of the acquisition time difference between the time domain signal #L and the time domain signal #R). It may be determined according to the sampling rate described above, and the determination method may be similar to the prior art, so a detailed description is omitted here to avoid redundancy.
  • The encoding end device can calculate the maximum value of the cross-correlation function $c_n(i)$ of the time domain signal #L relative to the time domain signal #R obtained when the search processing is performed on the time domain signal #R and the time domain signal #L with the target search step size (i.e., $L_t$).
  • Likewise, the encoding end device can calculate the maximum value of the cross-correlation function $c_p(i)$ of the time domain signal #R relative to the time domain signal #L obtained when the search processing is performed on the time domain signal #R and the time domain signal #L with the target search step size (i.e., $L_t$).
  • The encoding end device can compare the maximum value of $c_n(i)$ with the maximum value of $c_p(i)$, and determine the ITD parameter based on the comparison result.
  • When the maximum value of $c_p(i)$ is the larger of the two, the encoding end device can take the index value corresponding to that maximum as the ITD parameter.
  • When the maximum value of $c_n(i)$ is the larger of the two, the encoding end device can take the opposite number of the index value corresponding to that maximum as the ITD parameter.
  • Here, $T_{\max}$ represents the limit value of the ITD parameter (in other words, the maximum value of the acquisition time difference between the time domain signal #L and the time domain signal #R). It may be determined according to the sampling rate described above, and the determination method may be similar to the prior art, so a detailed description is omitted here to avoid redundancy.
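  • The time domain search (mode 1) can be sketched as follows. Equations 1 and 2 are not reproduced in this text, so the sketch assumes the forms $c_n(i) = \sum_{j=0}^{Length-1-i} x_R(j)\,x_L(j+i)$ and $c_p(i) = \sum_{j=0}^{Length-1-i} x_L(j)\,x_R(j+i)$, which match the symbol definitions above but whose summation bounds are an assumption. The candidate shifts $0, L_t, 2L_t, \ldots \le T_{\max}$, the tie-breaking rule, and the function names are likewise assumptions of this sketch.

```python
def cross_correlation(lead, lag, shift):
    """Sum of lead[j] * lag[j + shift] over every j for which both samples exist."""
    return sum(lead[j] * lag[j + shift] for j in range(len(lead) - shift))


def search_itd_time_domain(x_left, x_right, t_max, step):
    """Search the shifts 0, step, 2*step, ... <= t_max and return a signed ITD."""
    if len(x_left) != len(x_right):
        raise ValueError("both signals must cover the same time period")
    if not 0 <= t_max < len(x_left) or step < 1:
        raise ValueError("need 0 <= t_max < Length and step >= 1")
    shifts = range(0, t_max + 1, step)
    # c_n(i): #L relative to #R, built from x_R(j) * x_L(j + i).
    c_n = {i: cross_correlation(x_right, x_left, i) for i in shifts}
    # c_p(i): #R relative to #L, built from x_L(j) * x_R(j + i).
    c_p = {i: cross_correlation(x_left, x_right, i) for i in shifts}
    best_n = max(c_n, key=c_n.get)
    best_p = max(c_p, key=c_p.get)
    # Assumption: a tie between the two maxima is resolved toward the
    # non-negative side.
    if c_p[best_p] >= c_n[best_n]:
        return best_p
    return -best_n
```

  • A larger step visits fewer shifts and therefore costs less, at the price of a coarser ITD estimate: with a step of 2, an actual delay of 3 sampling points can only be reported as a neighbouring even shift.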
  • The encoding end device may perform time-frequency transform processing on the time domain signal #L to obtain a frequency domain signal of the left channel (i.e., an example of the frequency domain signal of the first channel; hereinafter referred to as frequency domain signal #L for ease of understanding and distinction), and may perform time-frequency transform processing on the time domain signal #R to obtain a frequency domain signal of the right channel (i.e., an example of the frequency domain signal of the second channel; hereinafter referred to as frequency domain signal #R for ease of understanding and distinction).
  • For example, a fast Fourier transform (FFT) technique may be employed to perform the time-frequency transform processing based on Equation 3 below, where:
  • $X(k)$ represents the frequency domain signal, and FFT_LENGTH represents the time-frequency transform length;
  • $x(n)$ represents the time domain signal (i.e., time domain signal #L or time domain signal #R), and Length represents the total number of sampling points included in the time domain signal.
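  • As an illustration of the time-frequency transform, the sketch below evaluates the standard discrete Fourier transform $X(k) = \sum_{n=0}^{Length-1} x(n)\,e^{-j 2\pi n k / FFT\_LENGTH}$ for $0 \le k < FFT\_LENGTH$. Equation 3 is not reproduced in this text, so this form is an assumption, although it agrees with the symbols defined above. The sum is evaluated directly for clarity; an FFT computes the same values faster.

```python
import cmath


def dft(x, fft_length):
    """Direct DFT of x, implicitly zero-padded to fft_length points."""
    if fft_length < len(x):
        raise ValueError("FFT_LENGTH must be at least Length")
    return [
        sum(
            x[n] * cmath.exp(-2j * cmath.pi * n * k / fft_length)
            for n in range(len(x))
        )
        for k in range(fft_length)
    ]
```

  • For a constant signal such as `[1, 1, 1, 1]` with a transform length of 4, only the bin k = 0 is non-zero, which is a quick sanity check of the sign and scaling conventions.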
  • The encoding end device may perform search processing on the frequency domain signal #L and the frequency domain signal #R according to the target search step size (i.e., $L_t$) determined as described above. That is,
  • the encoding end device may divide the FFT_LENGTH frequency points of the frequency domain signal into $N_{subband}$ (for example, 1) subbands according to a preset bandwidth A, where the frequency points $b$ included in the k-th subband $A_k$ satisfy $A_{k-1} \le b \le A_k - 1$;
  • the encoding end device may then calculate the correlation function mag(j) of the frequency domain signal #L and the frequency domain signal #R according to Equation 4 below, where:
  • $X_L(b)$ represents the signal value of the frequency domain signal #L at the b-th frequency point;
  • $X_R(b)$ represents the signal value of the frequency domain signal #R at the b-th frequency point;
  • FFT_LENGTH represents the time-frequency transform length;
  • $T_{\max}$ represents the limit value of the ITD parameter (in other words, the maximum value of the acquisition time difference between the time domain signal #L and the time domain signal #R). It may be determined according to the sampling rate described above, and the determination method may be similar to the prior art, so a detailed description is omitted here to avoid redundancy.
  • The encoding end device can determine the ITD parameter value of the k-th subband as the index value corresponding to the maximum value of mag(j).
  • In this way, one or more ITD parameter values (corresponding to the number of subbands) can be obtained.
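  • The frequency domain search (mode 2) can be sketched for a single subband as follows. Equation 4 is not reproduced in this text, so the sketch substitutes a common cross-spectrum correlation, $mag(j) = \mathrm{Re}\sum_{b=A_{k-1}}^{A_k-1} X_L(b)\,\overline{X_R(b)}\,e^{-j 2\pi b j / FFT\_LENGTH}$. This form, the sign convention (a positive result means #L leads #R), and the function names are assumptions and may differ from the patent's own formula.

```python
import cmath


def dft(x, n_fft):
    """Direct DFT, used here only to build example spectra."""
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_fft) for n in range(len(x)))
        for k in range(n_fft)
    ]


def subband_itd(spec_left, spec_right, band_start, band_end, candidates):
    """Return the candidate j that maximizes the assumed mag(j) in one subband.

    band_start and band_end play the roles of A_{k-1} and A_k, so the
    frequency points used are band_start <= b <= band_end - 1.
    """
    n_fft = len(spec_left)

    def mag(j):
        total = sum(
            spec_left[b]
            * spec_right[b].conjugate()
            * cmath.exp(-2j * cmath.pi * b * j / n_fft)
            for b in range(band_start, band_end)
        )
        return total.real

    return max(candidates, key=mag)
```

  • The target search step size enters through `candidates`, for example `range(-t_max, t_max + 1, step)`: a larger step evaluates mag(j) at fewer values of j.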
  • Thereafter, the encoding end device may further perform quantization processing and the like on the ITD parameter value, and send the processed ITD parameter value and the mono signal (for example, the time domain signal #L, the time domain signal #R, the frequency domain signal #L, or the frequency domain signal #R) to the decoding end device (or the receiving end device).
  • the decoder device can recover the stereo audio signal based on the mono audio signal and the ITD parameter value.
  • The at least two search complexities are in one-to-one correspondence with at least two search ranges. The at least two search complexities include a third search complexity and a fourth search complexity, and the at least two search ranges include a first search range and a second search range, wherein the first search range corresponding to the third search complexity is larger than the second search range corresponding to the fourth search complexity, and the third search complexity is higher than the fourth search complexity, and
  • performing search processing on the signal of the first channel and the signal of the second channel includes:
  • The M kinds of search complexity (i.e., M, M-1, ..., 1) may be in one-to-one correspondence with M search ranges (denoted $F_M, F_{M-1}, F_{M-2}, \ldots, F_1$, where $F_M > F_{M-1} > F_{M-2} > \ldots > F_1$), namely:
  • the search range $F_M$ corresponds to search complexity M: if the target search complexity determined as described above is M, the search range $F_M$ corresponding to search complexity M can be set as the target search range.
  • the search range $F_{M-1}$ corresponds to search complexity M-1: if the target search complexity determined as described above is M-1, the search range $F_{M-1}$ corresponding to search complexity M-1 can be set as the target search range.
  • the search range $F_{M-2}$ corresponds to search complexity M-2: if the target search complexity determined as described above is M-2, the search range $F_{M-2}$ corresponding to search complexity M-2 can be set as the target search range.
  • the search range $F_2$ corresponds to search complexity 2: if the target search complexity determined as described above is 2, the search range $F_2$ corresponding to search complexity 2 can be set as the target search range.
  • the search range $F_1$ corresponds to search complexity 1: if the target search complexity determined as described above is 1, the search range $F_1$ corresponding to search complexity 1 can be set as the target search range.
  • The search ranges $F_M, F_{M-1}, F_{M-2}, \ldots, F_1$ may all be search ranges in the time domain, or may all be search ranges in the frequency domain, and the present invention is not particularly limited in this respect.
  • The search range $F_M$ in the frequency domain, corresponding to the highest search complexity, can be determined as $[-T_{\max}, T_{\max}]$.
  • Determining the target search range corresponding to the target search complexity includes:
  • determining the target search range according to the target search complexity, a reference parameter and the limit value $T_{\max}$, wherein the limit value $T_{\max}$ is determined according to the sampling rate of the time domain signal, and the target search range belongs to $[-T_{\max}, 0]$ or to $[0, T_{\max}]$.
  • The encoding end device can determine the reference parameter based on the time domain signal #L and the time domain signal #R.
  • The reference parameter may correspond to the acquisition order of the time domain signal #L and the time domain signal #R (for example, the order in which they are input to the audio input device). This correspondence is described in detail below together with the process of determining the reference parameter.
  • The reference parameter may be determined by performing cross-correlation processing on the time domain signal #L and the time domain signal #R (mode X), or by searching for the maximum amplitude values of the time domain signal #L and the time domain signal #R (mode Y).
  • Mode X and mode Y are described in detail below.
  • Determining the reference parameter according to the time domain signal of the first channel and the time domain signal of the second channel includes:
  • performing cross-correlation processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, wherein the first cross-correlation processing value is the maximum function value, within a preset range, of the cross-correlation function of the time domain signal of the first channel relative to the time domain signal of the second channel, and the second cross-correlation processing value is the maximum function value, within the preset range, of the cross-correlation function of the time domain signal of the second channel relative to the time domain signal of the first channel;
  • determining the reference parameter according to the magnitude relationship between the first cross-correlation processing value and the second cross-correlation processing value.
  • The encoding end device may determine the cross-correlation function $c_n(i)$ of the time domain signal #L relative to the time domain signal #R according to Equation 5 below, where:
  • $T_{\max}$ represents the limit value of the ITD parameter (in other words, the maximum value of the acquisition time difference between the time domain signal #L and the time domain signal #R). It may be determined according to the sampling rate described above, and the determination method may be similar to the prior art, so a detailed description is omitted here to avoid redundancy;
  • $x_R(j)$ represents the signal value of the time domain signal #R at the j-th sampling point;
  • $x_L(j+i)$ represents the signal value of the time domain signal #L at the (j+i)-th sampling point;
  • Length represents the total number of sampling points included in the time domain signal #R, that is, the length of the time domain signal #R, which may be, for example, the length of one frame (i.e., 20 ms) or the length of one subframe (e.g., 10 ms or 5 ms).
  • The encoding end device can then determine the maximum value of the cross-correlation function $c_n(i)$.
  • Similarly, the encoding end device can determine the cross-correlation function $c_p(i)$ of the time domain signal #R relative to the time domain signal #L according to Equation 6 below.
  • The encoding end device can then determine the maximum value of the cross-correlation function $c_p(i)$.
  • The encoding end device may determine the reference parameter according to the relationship between the maximum value of $c_n(i)$ and the maximum value of $c_p(i)$, using mode X1 or mode X2 below.
  • In mode X1, when the maximum value of $c_p(i)$ is the larger of the two maxima, the encoding end device can determine that the time domain signal #L is acquired before the time domain signal #R, that is, the ITD parameter between the left and right channels is a positive number. In this case, the reference parameter T can be set to 1.
  • The encoding end device can then determine that the reference parameter is greater than 0, and thereby determine that the search range is $[0, T_{\max}]$. That is, when the time domain signal #L is acquired before the time domain signal #R, the ITD parameter is a positive number and the search range is $[0, T_{\max}]$ (an example in which the search range belongs to $[0, T_{\max}]$).
  • When the maximum value of $c_n(i)$ is the larger of the two maxima, the encoding end device can determine that the time domain signal #L is acquired after the time domain signal #R, that is, the ITD parameter between the left and right channels is a negative number. In this case, the reference parameter T can be set to 0.
  • The encoding end device can then determine that the reference parameter is not greater than 0, and thereby determine that the search range is $[-T_{\max}, 0]$. That is, when the time domain signal #L is acquired after the time domain signal #R, the ITD parameter is a negative number and the search range is $[-T_{\max}, 0]$ (an example in which the search range belongs to $[-T_{\max}, 0]$).
  • In this case, the search range is $F_2$.
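  • Mode X1 reduces to a binary decision followed by a choice between two half ranges, which can be sketched as below. The function names are hypothetical, and the rule that the second maximum being at least as large as the first means #L leads #R (including the handling of a tie) is an assumption of this sketch.

```python
def reference_parameter_x1(max_c_n, max_c_p):
    """Return 1 if #L is taken to be acquired before #R, otherwise 0.

    max_c_n and max_c_p are the maxima of the two cross-correlation
    functions. Treating a tie as "#L first" is an assumption.
    """
    return 1 if max_c_p >= max_c_n else 0


def search_range_x1(reference, t_max):
    """Map the reference parameter to [0, T_max] or [-T_max, 0]."""
    return (0, t_max) if reference > 0 else (-t_max, 0)
```

  • Either outcome halves the full interval $[-T_{\max}, T_{\max}]$, so the subsequent search evaluates roughly half as many candidate values.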
  • In mode X2, the reference parameter is the index value, or the opposite number of the index value, corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value.
  • When the maximum value of $c_p(i)$ is the larger of the two maxima, the encoding end device can determine that the time domain signal #L is acquired before the time domain signal #R, that is, the ITD parameter between the left and right channels is a positive number.
  • In this case, the reference parameter T can be set to the index value corresponding to the maximum value of $c_p(i)$.
  • The encoding end device may further determine whether the reference parameter T is greater than or equal to $T_{\max}/2$, and determine the search range according to the result. For example, when $T \ge T_{\max}/2$, the search range is $[T_{\max}/2, T_{\max}]$ (an example in which the search range belongs to $[0, T_{\max}]$). When $T < T_{\max}/2$, the search range is $[0, T_{\max}/2]$ (another example in which the search range belongs to $[0, T_{\max}]$).
  • When the maximum value of $c_n(i)$ is the larger of the two maxima, the encoding end device can determine that the time domain signal #L is acquired after the time domain signal #R, that is, the ITD parameter between the left and right channels is a negative number.
  • In this case, the reference parameter T can be set to the opposite number of the index value corresponding to the maximum value of $c_n(i)$.
  • The encoding end device may further determine whether the reference parameter T is less than or equal to $-T_{\max}/2$, and determine the search range according to the result. For example, when $T \le -T_{\max}/2$, the search range is $[-T_{\max}, -T_{\max}/2]$ (an example in which the search range belongs to $[-T_{\max}, 0]$). When $T > -T_{\max}/2$, the search range is $[-T_{\max}/2, 0]$ (another example in which the search range belongs to $[-T_{\max}, 0]$).
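  • The quarter-range selection of mode X2 can be sketched as below. The function name and the `left_leads` flag are hypothetical; the flag stands for the decision, made beforehand, that the time domain signal #L is acquired before the time domain signal #R.

```python
def search_range_x2(reference, t_max, left_leads):
    """Pick one quarter of [-T_max, T_max] from the signed reference parameter T."""
    half = t_max / 2
    if left_leads:
        # ITD is positive: choose between the two halves of [0, T_max].
        return (half, t_max) if reference >= half else (0, half)
    # ITD is negative: choose between the two halves of [-T_max, 0].
    return (-t_max, -half) if reference <= -half else (-half, 0)
```

  • With a hypothetical $T_{\max}$ of 40, a reference parameter of 30 selects the range from 20 to 40, and a reference parameter of -5 selects the range from -20 to 0.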
  • When the search complexity comprises three or more levels, the search range can be selected from $[-T_{\max}, -T_{\max}/2]$, $[-T_{\max}/2, 0]$, $[0, T_{\max}/2]$ and [
  • Determining the reference parameter according to the time domain signal of the first channel and the time domain signal of the second channel includes:
  • determining a first index value and a second index value, wherein the first index value is the index value corresponding to the maximum amplitude value of the time domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time domain signal of the second channel within the preset range;
  • determining the reference parameter according to the magnitude relationship between the first index value and the second index value.
  • The encoding end device can detect the maximum value $\max(L(j))$, $j \in [0, Length-1]$, of the amplitude values of the time domain signal #L (denoted $L(j)$), and record the index value $p_{left}$ corresponding to $\max(L(j))$, where Length represents the total number of sampling points included in the time domain signal #L.
  • Similarly, the encoding end device can detect the maximum value $\max(R(j))$, $j \in [0, Length-1]$, of the amplitude values of the time domain signal #R (denoted $R(j)$), and record the index value $p_{right}$ corresponding to $\max(R(j))$, where Length represents the total number of sampling points included in the time domain signal #R.
  • The encoding end device can then determine the magnitude relationship between $p_{left}$ and $p_{right}$.
  • For one outcome of this comparison, the encoding end device can determine that the time domain signal #L is acquired before the time domain signal #R, that is, the ITD parameter between the left and right channels is a positive number. In this case, the reference parameter T can be set to 1.
  • The encoding end device can then determine that the reference parameter is greater than 0, and thereby determine that the search range is $[0, T_{\max}]$. That is, when the time domain signal #L is acquired before the time domain signal #R, the ITD parameter is a positive number and the search range is $[0, T_{\max}]$ (an example in which the search range belongs to $[0, T_{\max}]$).
  • For the other outcome, the encoding end device can determine that the time domain signal #L is acquired after the time domain signal #R, that is, the ITD parameter between the left and right channels is a negative number. In this case, the reference parameter T can be set to 0.
  • The encoding end device can then determine that the reference parameter is not greater than 0, and thereby determine that the search range is $[-T_{\max}, 0]$. That is, when the time domain signal #L is acquired after the time domain signal #R, the ITD parameter is a negative number and the search range is $[-T_{\max}, 0]$ (an example in which the search range belongs to $[-T_{\max}, 0]$).
  • In this case, the search range is $F_2$.
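  • Mode Y can be sketched as a peak comparison. The text does not preserve which outcome of the comparison between $p_{left}$ and $p_{right}$ corresponds to which acquisition order, so the sketch assumes that an earlier (or equal) peak position in #L means #L was acquired first. That rule, the use of absolute values as amplitudes, and the function names are assumptions.

```python
def peak_index(x):
    """Index of the sample with the largest amplitude (first one on ties)."""
    return max(range(len(x)), key=lambda j: abs(x[j]))


def reference_parameter_y(x_left, x_right):
    """Return 1 if #L is assumed to be acquired before #R, otherwise 0."""
    p_left = peak_index(x_left)
    p_right = peak_index(x_right)
    # Assumption: the channel whose peak occurs earlier was acquired first.
    return 1 if p_left <= p_right else 0
```

  • Compared with mode X, this needs only one pass over each signal and no cross-correlation, which is why it suits the lower search complexities.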
  • The encoding end device may perform time-frequency transform processing on the time domain signal #L to obtain a frequency domain signal of the left channel (i.e., an example of the frequency domain signal of the first channel; hereinafter referred to as frequency domain signal #L for ease of understanding and distinction), and may perform time-frequency transform processing on the time domain signal #R to obtain a frequency domain signal of the right channel (i.e., an example of the frequency domain signal of the second channel; hereinafter referred to as frequency domain signal #R for ease of understanding and distinction).
  • For example, a fast Fourier transform (FFT) technique may be employed to perform the time-frequency transform processing based on Equation 7 below, where:
  • $X(k)$ represents the frequency domain signal, and FFT_LENGTH represents the time-frequency transform length;
  • $x(n)$ represents the time domain signal (i.e., time domain signal #L or time domain signal #R), and Length represents the total number of sampling points included in the time domain signal.
  • The encoding end device can perform search processing on the frequency domain signal #L and the frequency domain signal #R within the search range determined as described above, to determine the ITD parameter between the left channel and the right channel. For example, the following process can be used:
  • the encoding end device may divide the FFT_LENGTH frequency points of the frequency domain signal into $N_{subband}$ (for example, 1) subbands according to a preset bandwidth A, where the frequency points $b$ included in the k-th subband $A_k$ satisfy $A_{k-1} \le b \le A_k - 1$;
  • the correlation function mag(j) of the frequency domain signal #L and the frequency domain signal #R is calculated according to Equation 8 below, where:
  • $X_L(b)$ represents the signal value of the frequency domain signal #L at the b-th frequency point;
  • $X_R(b)$ represents the signal value of the frequency domain signal #R at the b-th frequency point;
  • FFT_LENGTH represents the time-frequency transform length;
  • the range of values of j is the search range determined as described above. For ease of understanding and explanation, this search range is denoted [a, b].
  • The ITD parameter value of the k-th subband is the index value corresponding to the maximum value of mag(j).
  • Thereby, one or more ITD parameter values (corresponding to the number of subbands determined as described above) between the left channel and the right channel can be obtained.
  • Thereafter, the encoding end device may further perform quantization processing or the like on the ITD parameter value, and send the processed ITD parameter value, together with the mono signal obtained by, for example, downmixing the signals of the left and right channels, to the decoding end device (or the receiving end device).
  • the decoder device can recover the stereo audio signal based on the mono audio signal and the ITD parameter value.
  • The method further includes:
  • smoothing the first ITD parameter based on a second ITD parameter, wherein the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is a smoothed value of the ITD parameter of a second time period, and the second time period is before the first time period.
  • The encoding end device may further smooth the ITD parameter value determined as described above. By way of example and not limitation, the encoding end device can perform this smoothing according to Equation 5 below:
  • $$T_{sm}(k) = w_1 \cdot T_{sm}^{[-1]}(k) + w_2 \cdot T(k) \qquad \text{(Equation 5)}$$
  • where $T_{sm}(k)$ represents the smoothed ITD parameter value corresponding to the k-th frame or the k-th subframe;
  • $T_{sm}^{[-1]}(k)$ represents the smoothed ITD parameter value corresponding to the (k-1)-th frame or the (k-1)-th subframe;
  • $T(k)$ represents the unsmoothed ITD parameter value corresponding to the k-th frame or the k-th subframe;
  • $w_1$ and $w_2$ are smoothing factors;
  • $T_{sm}^{[-1]}(k)$ can be a preset value.
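  • The smoothing recursion can be sketched as below: each smoothed value is a weighted sum of the previous smoothed value and the current unsmoothed value. The function name, the preset start value of 0, and the weights in the usage note are hypothetical; the text does not specify values for the smoothing factors.

```python
def smooth_itd(raw_itd, w1, w2, initial=0.0):
    """Apply T_sm(k) = w1 * T_sm(k - 1) + w2 * T(k) over a sequence of frames.

    `initial` plays the role of the preset previous smoothed value used
    for the first frame.
    """
    smoothed = []
    previous = initial
    for value in raw_itd:
        previous = w1 * previous + w2 * value
        smoothed.append(previous)
    return smoothed
```

  • For example, with the hypothetical weights $w_1 = w_2 = 0.5$ and a preset start value of 0, the unsmoothed sequence 4, 4, 8 becomes 2.0, 3.0, 5.5, so an abrupt jump in the ITD parameter is spread over several frames.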
  • The foregoing smoothing process may be performed by the encoding end device or by the decoding end device, and the present invention is not particularly limited in this respect. That is, the encoding end device may also send the ITD parameter value obtained as described above directly to the decoding end device without performing the smoothing process, and the decoding end device then smooths the ITD parameter value. The method and process of smoothing performed by the decoding end device may be similar to the method and process of smoothing performed by the encoding end device described above; a detailed description is omitted here to avoid redundancy.
  • According to the method for determining an inter-channel time difference parameter, a target search complexity corresponding to the current channel quality is determined from at least two search complexities, and search processing is performed on the signal of the first channel and the signal of the second channel according to the target search complexity, so that the accuracy of the determined ITD parameter can be adapted to the channel quality. Thus, when the current channel quality is poor, the target search complexity can be used to reduce the complexity or amount of computation of the search processing, which saves computing resources and improves processing efficiency.
  • FIG. 5 shows a schematic block diagram of an apparatus 200 for determining an inter-channel time difference parameter in accordance with an embodiment of the present invention. As shown in FIG. 5, the apparatus 200 includes:
  • a determining unit 210, configured to determine a target search complexity from at least two search complexities, wherein the at least two search complexities are in one-to-one correspondence with at least two channel quality values;
  • a processing unit 220, configured to perform search processing on the signal of the first channel and the signal of the second channel according to the target search complexity, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel.
  • The determining unit 210 is specifically configured to acquire a coding parameter for the stereo signal, where the stereo signal is generated based on the signal of the first channel and the signal of the second channel, the coding parameter is determined based on the current channel quality, and the coding parameter includes any one of the following parameters: a coding bit rate, a number of coding bits, or a complexity control parameter used to indicate the search complexity; and to determine the target search complexity from the at least two search complexities according to the coding parameter.
  • The at least two search complexities are in one-to-one correspondence with at least two search step sizes. The at least two search complexities include a first search complexity and a second search complexity, and the at least two search step sizes include a first search step size and a second search step size, wherein the first search step size corresponding to the first search complexity is smaller than the second search step size corresponding to the second search complexity, and the first search complexity is higher than the second search complexity. The processing unit 220 is specifically configured to determine a target search step size corresponding to the target search complexity, and to perform search processing on the signal of the first channel and the signal of the second channel according to the target search step size.
  • The at least two search complexities are in one-to-one correspondence with at least two search ranges. The at least two search complexities include a third search complexity and a fourth search complexity, and the at least two search ranges include a first search range and a second search range, wherein the first search range corresponding to the third search complexity is larger than the second search range corresponding to the fourth search complexity, and the third search complexity is higher than the fourth search complexity. The processing unit 220 is specifically configured to determine a target search range corresponding to the target search complexity, and to perform search processing on the signal of the first channel and the signal of the second channel within the target search range.
  • the processing unit 220 is configured to determine a reference parameter according to the time domain signal of the first channel and the time domain signal of the second channel, where the reference parameter corresponds to the acquisition order of the time domain signal of the first channel and the time domain signal of the second channel, and the two time domain signals correspond to the same time period; and to determine the target search range according to the target search complexity, the reference parameter, and a limit value T_max, where the limit value T_max is determined according to the sampling rate of the time domain signal of the first channel, and the target search range falls within [-T_max, 0] or within [0, T_max].
  • the processing unit 220 is configured to perform cross-correlation processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, where the first cross-correlation processing value is the maximum value, within a preset range, of the cross-correlation function of the time domain signal of the first channel relative to the time domain signal of the second channel, and the second cross-correlation processing value is the maximum value, within the preset range, of the cross-correlation function of the time domain signal of the second channel relative to the time domain signal of the first channel; and to determine the reference parameter according to the magnitude relationship between the first cross-correlation processing value and the second cross-correlation processing value.
  • the reference parameter is the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the opposite number (negation) of that index value.
  • the processing unit 220 is configured to perform peak detection processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first index value and a second index value, where the first index value is the index value corresponding to the maximum amplitude value of the time domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time domain signal of the second channel within the preset range; and to determine the reference parameter according to the magnitude relationship between the first index value and the second index value.
  • the processing unit 220 is further configured to perform smoothing processing on the first ITD parameter based on a second ITD parameter, where the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is a smoothed value of the ITD parameter of a second time period, and the second time period precedes the first time period.
  • the apparatus 200 for determining the inter-channel time difference parameter according to the embodiment of the present invention may correspond to the encoder device in the method of the embodiment of the present invention, and the units and modules of the apparatus 200 and the other operations and/or functions described above respectively implement the corresponding processes of the method 100 in FIG. 1; for brevity, they are not described again here.
  • the apparatus for determining an inter-channel time difference parameter according to the embodiment of the present invention determines, from at least two search complexities, a target search complexity corresponding to the current channel quality, and performs search processing on the signal of the first channel and the signal of the second channel according to the target search complexity, so that the precision of the determined ITD parameter matches the channel quality. When the current channel quality is poor, the target search complexity therefore reduces the complexity or computation amount of the search processing, which saves computing resources and improves processing efficiency.
  • the method for determining an inter-channel time difference parameter according to an embodiment of the present invention is described in detail above with reference to FIGS. 1 through 4.
  • a device for determining an inter-channel time difference parameter according to an embodiment of the present invention is described in detail below with reference to FIG. 6.
  • FIG. 6 shows an illustration of an apparatus 300 for determining an inter-channel time difference parameter in accordance with an embodiment of the present invention.
  • the device 300 can include:
  • processor 320 connected to the bus
  • the processor 320 calls, by using the bus 310, a program stored in the memory 330, to determine a target search complexity from at least two search complexities, where the at least two search complexities are in one-to-one correspondence with at least two channel quality values;
  • the processor 320 is specifically configured to acquire an encoding parameter for a stereo signal, where the stereo signal is generated based on the signal of the first channel and the signal of the second channel, the encoding parameter is determined according to the current channel quality value, and the encoding parameter includes any one of the following parameters: an encoding bit rate, a number of encoding bits, or a complexity control parameter used to indicate the search complexity;
  • the at least two search complexities are in one-to-one correspondence with at least two search step sizes, the at least two search complexities include a first search complexity and a second search complexity, and the at least two search step sizes include a first search step size and a second search step size, where the first search step size corresponding to the first search complexity is smaller than the second search step size corresponding to the second search complexity, and the first search complexity is higher than the second search complexity; and the processor 320 is specifically configured to determine a target search step size corresponding to the target search complexity, and to perform search processing on the signal of the first channel and the signal of the second channel according to the target search step size.
  • the at least two search complexities are in one-to-one correspondence with at least two search ranges, the at least two search complexities include a third search complexity and a fourth search complexity, and the at least two search ranges include a first search range and a second search range, where the first search range corresponding to the third search complexity is larger than the second search range corresponding to the fourth search complexity, and the third search complexity is higher than the fourth search complexity; and the processor 320 is specifically configured to determine a target search range corresponding to the target search complexity, and to perform search processing on the signal of the first channel and the signal of the second channel over the target search range.
  • the processor 320 is configured to determine a reference parameter according to the time domain signal of the first channel and the time domain signal of the second channel, where the reference parameter corresponds to the acquisition order of the time domain signal of the first channel and the time domain signal of the second channel, and the two time domain signals correspond to the same time period; and to determine the target search range according to the target search complexity, the reference parameter, and a limit value T_max, where the limit value T_max is determined according to the sampling rate of the time domain signal of the first channel, and the target search range falls within [-T_max, 0] or within [0, T_max].
  • the processor 320 is configured to perform cross-correlation processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, where the first cross-correlation processing value is the maximum value, within a preset range, of the cross-correlation function of the time domain signal of the first channel relative to the time domain signal of the second channel, and the second cross-correlation processing value is the maximum value, within the preset range, of the cross-correlation function of the time domain signal of the second channel relative to the time domain signal of the first channel; and to determine the reference parameter according to the magnitude relationship between the first cross-correlation processing value and the second cross-correlation processing value.
  • the reference parameter is the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the opposite number (negation) of that index value.
  • the processor 320 is configured to perform peak detection processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first index value and a second index value, where the first index value is the index value corresponding to the maximum amplitude value of the time domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time domain signal of the second channel within the preset range; and to determine the reference parameter according to the magnitude relationship between the first index value and the second index value.
  • the processor 320 is further configured to perform smoothing processing on the first ITD parameter based on a second ITD parameter, where the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is a smoothed value of the ITD parameter of a second time period, and the second time period precedes the first time period.
  • the various components of the device 300 are coupled together by the bus 310, where the bus 310 includes a power bus, a control bus, and a status signal bus in addition to a data bus.
  • the various buses are all labeled as the bus 310 in the figure.
  • the processor 320 can implement or perform the steps and logic blocks disclosed in the method embodiments of the present invention.
  • Processor 320 can be a microprocessor or the processor can be any conventional processor, decoder or the like.
  • the steps of the method disclosed in the embodiments of the present invention may be directly implemented by the hardware processor, or may be performed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the storage medium is located in the memory 330, and the processor reads the information in the memory 330 and performs the steps of the above method in combination with its hardware.
  • the processor 320 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the memory 330 can include read only memory and random access memory and provides instructions and data to the processor 320. A portion of the memory 330 may also include a non-volatile random access memory. For example, the memory 330 can also store information of the device type.
  • each step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 320 or an instruction in a form of software.
  • the steps of the method disclosed in the embodiments of the present invention may be directly implemented as a hardware processor, or may be performed by a combination of hardware and software modules in the processor.
  • the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the device 300 for determining the inter-channel time difference parameter according to the embodiment of the present invention may correspond to the encoder device in the method of the embodiment of the present invention, and the units and modules of the device 300 and the other operations and/or functions described above respectively implement the corresponding processes of the method 100 in FIG. 1; for brevity, they are not described again here.
  • the device for determining an inter-channel time difference parameter according to the embodiment of the present invention determines, from at least two search complexities, a target search complexity corresponding to the current channel quality, and performs search processing on the signal of the first channel and the signal of the second channel according to the target search complexity, so that the precision of the determined ITD parameter matches the channel quality. When the current channel quality is poor, the target search complexity therefore reduces the complexity or computation amount of the search processing, which saves computing resources and improves processing efficiency.
  • the sequence numbers of the above processes do not imply an order of execution; the order of execution of each process should be determined by its function and internal logic, and the numbering should not constitute any limitation on the implementation of the embodiments of the present invention.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a division by logical function.
  • there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product. Based on such understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Abstract

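Provided are a method (100) and an apparatus (200) for determining an inter-channel time difference parameter, which enable the precision of the determined ITD parameter to match the channel quality. The method (100) includes: determining a target search complexity from at least two search complexities, where the at least two search complexities are in one-to-one correspondence with at least two channel quality values (S110); and performing, according to the target search complexity, search processing on a signal of a first channel and a signal of a second channel, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel (S120).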

Description

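Method and Apparatus for Determining an Inter-Channel Time Difference Parameter
This application claims priority to Chinese Patent Application No. 201510103379.3, filed with the Chinese Patent Office on March 9, 2015 and entitled "Method and Apparatus for Determining an Inter-Channel Time Difference Parameter", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of audio processing, and more specifically, to a method and an apparatus for determining an inter-channel time difference parameter.
Background
As quality of life improves, demand for high-quality audio keeps growing. Compared with mono audio, stereo audio conveys a sense of the direction and distribution of each sound source and improves the clarity and intelligibility of information, and is therefore widely favored.
Currently, a transmission technology for stereo audio signals is known in which the encoder converts a stereo signal into a mono audio signal and parameters such as an inter-channel time difference (ITD), encodes them separately, and transmits them to the decoder. After obtaining the mono audio signal, the decoder restores the stereo signal according to the ITD and other parameters, so that the stereo signal can be transmitted at a low bit rate with high quality.
In the foregoing technology, the encoder can determine, based on the sampling rate of the input audio signal, the limit value T_max of the ITD parameter at that sampling rate, and can then search within the range [-T_max, T_max] at a specified step size, based on the input audio signal, to obtain the ITD parameter. Consequently, the search range and the search step size are the same regardless of the channel quality.
However, the required precision of the ITD parameter varies with channel quality. For example, if the channel quality is poor, the required precision of the ITD parameter is low; continuing to use the large search range and small search step size described above then wastes computing resources and seriously reduces processing efficiency.
Therefore, a technology is desired that can match the precision of the determined ITD parameter to the channel quality.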
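Summary
Embodiments of the present invention provide a method and an apparatus for determining an inter-channel time difference parameter, which enable the precision of the determined ITD parameter to match the channel quality.
According to a first aspect, a method for determining an inter-channel time difference parameter is provided. The method includes: determining a target search complexity from at least two search complexities, where the at least two search complexities are in one-to-one correspondence with at least two channel quality values; and performing, according to the target search complexity, search processing on a signal of a first channel and a signal of a second channel, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel.
With reference to the first aspect, in a first implementation of the first aspect, the determining a target search complexity from at least two search complexities includes: obtaining an encoding parameter for a stereo signal, where the stereo signal is generated based on the signal of the first channel and the signal of the second channel, the encoding parameter is determined according to a current channel quality value, and the encoding parameter includes any one of the following parameters: an encoding bit rate, a number of encoding bits, or a complexity control parameter used to indicate the search complexity; and determining the target search complexity from the at least two search complexities according to the encoding parameter.
With reference to the first aspect and the foregoing implementation, in a second implementation of the first aspect, the at least two search complexities are in one-to-one correspondence with at least two search step sizes, the at least two search complexities include a first search complexity and a second search complexity, and the at least two search step sizes include a first search step size and a second search step size, where the first search step size corresponding to the first search complexity is smaller than the second search step size corresponding to the second search complexity, and the first search complexity is higher than the second search complexity; and the performing, according to the target search complexity, search processing on a signal of a first channel and a signal of a second channel includes: determining a target search step size corresponding to the target search complexity; and performing search processing on the signal of the first channel and the signal of the second channel according to the target search step size.
With reference to the first aspect and the foregoing implementations, in a third implementation of the first aspect, the at least two search complexities are in one-to-one correspondence with at least two search ranges, the at least two search complexities include a third search complexity and a fourth search complexity, and the at least two search ranges include a first search range and a second search range, where the first search range corresponding to the third search complexity is larger than the second search range corresponding to the fourth search complexity, and the third search complexity is higher than the fourth search complexity; and the performing, according to the target search complexity, search processing on a signal of a first channel and a signal of a second channel includes: determining a target search range corresponding to the target search complexity; and performing search processing on the signal of the first channel and the signal of the second channel over the target search range.
With reference to the first aspect and the foregoing implementations, in a fourth implementation of the first aspect, the determining a target search range corresponding to the target search complexity includes: determining a reference parameter according to the time-domain signal of the first channel and the time-domain signal of the second channel, where the reference parameter corresponds to the acquisition order of the time-domain signal of the first channel and the time-domain signal of the second channel, and the two time-domain signals correspond to the same time period; and determining the target search range according to the target search complexity, the reference parameter, and a limit value T_max, where the limit value T_max is determined according to the sampling rate of the time-domain signal of the first channel, and the target search range falls within [-T_max, 0] or within [0, T_max].
With reference to the first aspect and the foregoing implementations, in a fifth implementation of the first aspect, the determining a reference parameter according to the time-domain signal of the first channel and the time-domain signal of the second channel includes: performing cross-correlation processing on the time-domain signal of the first channel and the time-domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, where the first cross-correlation processing value is the maximum value, within a preset range, of the cross-correlation function of the time-domain signal of the first channel relative to the time-domain signal of the second channel, and the second cross-correlation processing value is the maximum value, within the preset range, of the cross-correlation function of the time-domain signal of the second channel relative to the time-domain signal of the first channel; and determining the reference parameter according to the magnitude relationship between the first cross-correlation processing value and the second cross-correlation processing value.
With reference to the first aspect and the foregoing implementations, in a sixth implementation of the first aspect, the reference parameter is the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the opposite number of that index value.
With reference to the first aspect and the foregoing implementations, in a seventh implementation of the first aspect, the determining a reference parameter according to the time-domain signal of the first channel and the time-domain signal of the second channel includes: performing peak detection processing on the time-domain signal of the first channel and the time-domain signal of the second channel to determine a first index value and a second index value, where the first index value is the index value corresponding to the maximum amplitude value of the time-domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time-domain signal of the second channel within the preset range; and determining the reference parameter according to the magnitude relationship between the first index value and the second index value.
With reference to the first aspect and the foregoing implementations, in an eighth implementation of the first aspect, the method further includes: performing smoothing processing on the first ITD parameter based on a second ITD parameter, where the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is a smoothed value of the ITD parameter of a second time period, and the second time period precedes the first time period.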
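According to a second aspect, an apparatus for determining an inter-channel time difference parameter is provided. The apparatus includes: a determining unit, configured to determine a target search complexity from at least two search complexities, where the at least two search complexities are in one-to-one correspondence with at least two channel quality values; and a processing unit, configured to perform, according to the target search complexity, search processing on a signal of a first channel and a signal of a second channel, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel.
With reference to the second aspect, in a first implementation of the second aspect, the determining unit is specifically configured to obtain an encoding parameter for a stereo signal, where the stereo signal is generated based on the signal of the first channel and the signal of the second channel, the encoding parameter is determined according to a current channel quality value, and the encoding parameter includes any one of the following parameters: an encoding bit rate, a number of encoding bits, or a complexity control parameter used to indicate the search complexity; and to determine the target search complexity from the at least two search complexities according to the encoding parameter.
With reference to the second aspect and the foregoing implementation, in a second implementation of the second aspect, the at least two search complexities are in one-to-one correspondence with at least two search step sizes, the at least two search complexities include a first search complexity and a second search complexity, and the at least two search step sizes include a first search step size and a second search step size, where the first search step size corresponding to the first search complexity is smaller than the second search step size corresponding to the second search complexity, and the first search complexity is higher than the second search complexity; and the processing unit is specifically configured to determine a target search step size corresponding to the target search complexity, and to perform search processing on the signal of the first channel and the signal of the second channel according to the target search step size.
With reference to the second aspect and the foregoing implementations, in a third implementation of the second aspect, the at least two search complexities are in one-to-one correspondence with at least two search ranges, the at least two search complexities include a third search complexity and a fourth search complexity, and the at least two search ranges include a first search range and a second search range, where the first search range corresponding to the third search complexity is larger than the second search range corresponding to the fourth search complexity, and the third search complexity is higher than the fourth search complexity; and the processing unit is specifically configured to determine a target search range corresponding to the target search complexity, and to perform search processing on the signal of the first channel and the signal of the second channel over the target search range.
With reference to the second aspect and the foregoing implementations, in a fourth implementation of the second aspect, the processing unit is specifically configured to determine a reference parameter according to the time-domain signal of the first channel and the time-domain signal of the second channel, where the reference parameter corresponds to the acquisition order of the time-domain signal of the first channel and the time-domain signal of the second channel, and the two time-domain signals correspond to the same time period; and to determine the target search range according to the target search complexity, the reference parameter, and a limit value T_max, where the limit value T_max is determined according to the sampling rate of the time-domain signal of the first channel, and the target search range falls within [-T_max, 0] or within [0, T_max].
With reference to the second aspect and the foregoing implementations, in a fifth implementation of the second aspect, the processing unit is specifically configured to perform cross-correlation processing on the time-domain signal of the first channel and the time-domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, where the first cross-correlation processing value is the maximum value, within a preset range, of the cross-correlation function of the time-domain signal of the first channel relative to the time-domain signal of the second channel, and the second cross-correlation processing value is the maximum value, within the preset range, of the cross-correlation function of the time-domain signal of the second channel relative to the time-domain signal of the first channel; and to determine the reference parameter according to the magnitude relationship between the first cross-correlation processing value and the second cross-correlation processing value.
With reference to the second aspect and the foregoing implementations, in a sixth implementation of the second aspect, the reference parameter is the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the opposite number of that index value.
With reference to the second aspect and the foregoing implementations, in a seventh implementation of the second aspect, the processing unit is specifically configured to perform peak detection processing on the time-domain signal of the first channel and the time-domain signal of the second channel to determine a first index value and a second index value, where the first index value is the index value corresponding to the maximum amplitude value of the time-domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time-domain signal of the second channel within the preset range; and to determine the reference parameter according to the magnitude relationship between the first index value and the second index value.
With reference to the second aspect and the foregoing implementations, in an eighth implementation of the second aspect, the processing unit is further configured to perform smoothing processing on the first ITD parameter based on a second ITD parameter, where the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is a smoothed value of the ITD parameter of a second time period, and the second time period precedes the first time period.
According to the method and apparatus for determining an inter-channel time difference parameter in the embodiments of the present invention, a target search complexity corresponding to the current channel quality is determined from at least two search complexities, and search processing is performed on the signal of the first channel and the signal of the second channel according to the target search complexity, so that the precision of the determined ITD parameter matches the channel quality. When the current channel quality is poor, the target search complexity therefore reduces the complexity or computation amount of the search processing, which saves computing resources and improves processing efficiency.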
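Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required for the embodiments. Apparently, the drawings described below show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for determining an inter-channel time difference parameter according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a process of determining a search range according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a process of determining a target search range according to another embodiment of the present invention.
FIG. 4 is a schematic diagram of a process of determining a target search range according to still another embodiment of the present invention.
FIG. 5 is a schematic block diagram of an apparatus for determining an inter-channel time difference parameter according to an embodiment of the present invention.
FIG. 6 is a schematic structural diagram of a device for determining an inter-channel time difference parameter according to an embodiment of the present invention.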
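Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
FIG. 1 is a schematic flowchart of a method 100 for determining an inter-channel time difference parameter according to an embodiment of the present invention. The method 100 may be performed by an encoder device (also referred to as a transmit-end device) that transmits audio signals. As shown in FIG. 1, the method 100 includes:
S110: Determine a target search complexity from at least two search complexities, where the at least two search complexities are in one-to-one correspondence with at least two channel quality values.
S120: Perform, according to the target search complexity, search processing on a signal of a first channel and a signal of a second channel, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel.
The method 100 can be applied to an audio system having at least two channels. In such a system, a stereo signal is synthesized from mono signals from at least two channels (that is, including a first channel and a second channel), for example, from a mono signal from a left channel (an example of the first channel) and a mono signal from a right channel (an example of the second channel).
One method for transmitting the stereo signal is the parametric stereo (PS) technique. Based on spatial perception characteristics, the encoder converts the stereo signal into a mono signal and spatial perception parameters and encodes them separately; after obtaining the mono audio, the decoder restores the stereo signal according to the spatial parameters. This technique enables low-bit-rate, high-quality transmission of stereo signals. The inter-channel time difference (ITD) parameter is a spatial parameter that represents the horizontal direction of a sound source and is an important component of the spatial parameters. The embodiments of the present invention mainly concern the process of determining the ITD parameter. The process of encoding and decoding the stereo signal and the mono signal according to the ITD parameter is similar to the prior art, and its detailed description is omitted here to avoid repetition.
It should be understood that the number of channels in the audio system described above is merely an example, and the present invention is not limited thereto. For example, the audio system may also have three or more channels, and a stereo signal can be synthesized from the mono signals of any two channels. For ease of understanding, the following describes the processing when the method 100 is applied to an audio system with two channels (a left channel and a right channel), and, for ease of distinction, takes the left channel as the first channel and the right channel as the second channel.
In the embodiments of the present invention, the method of obtaining the ITD parameter between the left and right channels differs for different search complexities, so the encoder device can first determine the current search complexity before determining the ITD parameter.
There is a mapping between search complexity and channel quality: the better the channel quality, the higher the encoding bit rate and the larger the number of encoding bits, and therefore the higher the precision required of the ITD parameter. Conversely, the poorer the channel quality, the lower the encoding bit rate and the smaller the number of encoding bits, and therefore the lower the precision required of the ITD parameter.
In the embodiments of the present invention, different search complexities correspond to different ways of obtaining the ITD parameter (the specific relationship between search complexity and the way of obtaining the ITD parameter is detailed later). The higher the search complexity, the higher the precision of the obtained ITD parameter; conversely, the lower the search complexity, the lower the precision of the obtained ITD parameter.
Therefore, by selecting the search complexity corresponding to the current channel quality (that is, the target search complexity), the encoder device can make the precision of the obtained ITD parameter match the current channel quality.
That is, in the embodiments of the present invention, by setting multiple (at least two) search complexities in one-to-one correspondence with multiple (at least two) channel qualities, multiple (at least two) communication conditions with differing channel quality can be handled, so that differing requirements on the precision of the ITD parameter can be met flexibly.
In the embodiments of the present invention, the one-to-one correspondence between the multiple (at least two) channel qualities and the multiple (at least two) search complexities can be recorded directly in a mapping table entry (denoted mapping table entry #1 for ease of understanding and distinction) and stored in the encoder device, so that after obtaining the current channel quality, the encoder device can look up the corresponding search complexity directly in mapping table entry #1 and use it as the target search complexity.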
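That is, the search complexity can be divided into M levels (in other words, M search complexities are set, denoted M, M-1, …, 1), and the M levels can be placed in one-to-one correspondence with M channel qualities (for example, denoted Q_M, Q_{M-1}, Q_{M-2}, …, Q_1, where Q_M > Q_{M-1} > Q_{M-2} > … > Q_1), that is:
For example, the search complexity corresponding to channel quality Q_M is M; that is, if the current channel quality is higher than or equal to Q_M, the target search complexity can be set to M.
For another example, the search complexity corresponding to channel quality Q_{M-1} is M-1; that is, if the current channel quality is higher than or equal to Q_{M-1} and lower than Q_M, the target search complexity can be set to M-1.
For another example, the search complexity corresponding to channel quality Q_{M-2} is M-2; that is, if the current channel quality is higher than or equal to Q_{M-2} and lower than Q_{M-1}, the target search complexity can be set to M-2.
For another example, the search complexity corresponding to channel quality Q_2 is 2; that is, if the current channel quality is higher than or equal to Q_2 and lower than Q_3, the target search complexity can be set to 2.
For another example, the search complexity corresponding to channel quality Q_1 is 1; that is, if the current channel quality is lower than Q_2, the target search complexity can be set to 1.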
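The threshold rule above can be sketched as a sorted-list lookup. This is only an illustrative sketch: the function name `target_search_complexity`, the list `Q_THRESHOLDS`, and the numeric threshold values are hypothetical and do not come from the source. The same lookup would apply unchanged if the thresholds were encoding bit rates, numbers of encoding bits, or complexity control parameter values.

```python
from bisect import bisect_right


def target_search_complexity(quality, thresholds):
    """Map a channel quality value to a search complexity level in 1..M.

    `thresholds` holds Q_2 < Q_3 < ... < Q_M in ascending order.
    A quality below Q_2 maps to level 1; a quality with
    Q_k <= quality < Q_{k+1} maps to level k; a quality >= Q_M maps to M.
    """
    # bisect_right counts how many thresholds are <= quality.
    return bisect_right(thresholds, quality) + 1


# Hypothetical thresholds Q_2, Q_3, Q_4 for M = 4 complexity levels.
Q_THRESHOLDS = [10.0, 20.0, 30.0]
```

With these assumed thresholds, a quality of 25.0 falls between Q_3 and Q_4 and therefore selects level 3, while any quality below 10.0 selects the lowest level.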
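It should be noted that channel quality refers to the quality of the channel between the encoder and the decoder that is used to transmit the audio signal, the ITD parameter described later, and so on.
It should be understood that the method for determining the target search complexity described above is merely an example, and the present invention is not limited thereto. For example, the following manner may also be used:
Optionally, the determining a target search complexity from at least two search complexities includes:
obtaining an encoding parameter, where the encoding parameter is determined according to a current channel quality value, and the encoding parameter includes any one of the following parameters: an encoding bit rate, a number of encoding bits, or a complexity control parameter used to indicate the search complexity; and
determining the target search complexity from the at least two search complexities according to the encoding parameter.
Specifically, channel quality corresponds to the encoding bit rate and the number of encoding bits: the better the channel quality, the higher the encoding bit rate and the larger the number of encoding bits. Conversely, the poorer the channel quality, the lower the encoding bit rate and the smaller the number of encoding bits.
Therefore, in the embodiments of the present invention, the one-to-one correspondence between multiple (at least two) encoding bit rates and multiple (at least two) search complexities can also be recorded in a mapping table entry (denoted mapping table entry #2 for ease of understanding and distinction) and stored in the encoder device, so that after obtaining the current encoding bit rate, the encoder device can look up the corresponding search complexity directly in mapping table entry #2 and use it as the target search complexity. The method and process by which the encoder device obtains the current encoding bit rate can be similar to the prior art and are not detailed here.
That is, the search complexity can be divided into M levels (in other words, M search complexities are set, denoted M, M-1, …, 1), and the M levels can be placed in one-to-one correspondence with M encoding bit rates (denoted B_M, B_{M-1}, B_{M-2}, …, B_1, where B_M > B_{M-1} > B_{M-2} > … > B_1), that is:
For example, the search complexity corresponding to encoding bit rate B_M is M; that is, if the current encoding bit rate is higher than or equal to B_M, the target search complexity can be set to M.
For another example, the search complexity corresponding to encoding bit rate B_{M-1} is M-1; that is, if the current encoding bit rate is higher than or equal to B_{M-1} and lower than B_M, the target search complexity can be set to M-1.
For another example, the search complexity corresponding to encoding bit rate B_{M-2} is M-2; that is, if the current encoding bit rate is higher than or equal to B_{M-2} and lower than B_{M-1}, the target search complexity can be set to M-2.
For another example, the search complexity corresponding to encoding bit rate B_2 is 2; that is, if the current encoding bit rate is higher than or equal to B_2 and lower than B_3, the target search complexity can be set to 2.
For another example, the search complexity corresponding to encoding bit rate B_1 is 1; that is, if the current encoding bit rate is lower than B_2, the target search complexity can be set to 1.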
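Alternatively, in the embodiments of the present invention, the one-to-one correspondence between multiple (at least two) numbers of encoding bits and multiple (at least two) search complexities can be recorded in a mapping table entry (denoted mapping table entry #3 for ease of understanding and distinction) and stored in the encoder device, so that after obtaining the current number of encoding bits, the encoder device can look up the corresponding search complexity directly in mapping table entry #3 and use it as the target search complexity. The method and process by which the encoder device obtains the current number of encoding bits can be similar to the prior art and are not detailed here.
That is, the search complexity can be divided into M levels (in other words, M search complexities are set, denoted M, M-1, …, 1), and the M levels can be placed in one-to-one correspondence with M numbers of encoding bits (denoted C_M, C_{M-1}, C_{M-2}, …, C_1, where C_M > C_{M-1} > C_{M-2} > … > C_1), that is:
For example, the search complexity corresponding to the number of encoding bits C_M is M; that is, if the current number of encoding bits is higher than or equal to C_M, the target search complexity can be set to M.
For another example, the search complexity corresponding to the number of encoding bits C_{M-1} is M-1; that is, if the current number of encoding bits is higher than or equal to C_{M-1} and lower than C_M, the target search complexity can be set to M-1.
For another example, the search complexity corresponding to the number of encoding bits C_{M-2} is M-2; that is, if the current number of encoding bits is higher than or equal to C_{M-2} and lower than C_{M-1}, the target search complexity can be set to M-2.
For another example, the search complexity corresponding to the number of encoding bits C_2 is 2; that is, if the current number of encoding bits is higher than or equal to C_2 and lower than C_3, the target search complexity can be set to 2.
For another example, the search complexity corresponding to the number of encoding bits C_1 is 1; that is, if the current number of encoding bits is lower than C_2, the target search complexity can be set to 1.
In addition, in the embodiments of the present invention, different complexity control parameters can be configured for different channel qualities, so that different complexity control parameter values correspond to different search complexities. The one-to-one correspondence between multiple (at least two) complexity control parameter values and multiple (at least two) search complexities can then be recorded in a mapping table entry (denoted mapping table entry #4 for ease of understanding and distinction) and stored in the encoder device, so that after obtaining the current complexity control parameter value, the encoder device can look up the corresponding search complexity directly in mapping table entry #4 and use it as the target search complexity. Here, the complexity control parameter value can be written to the command line in advance, so that the encoder device can read the current complexity control parameter value from the command line.
That is, the search complexity can be divided into M levels (in other words, M search complexities are set, denoted M, M-1, …, 1), and the M levels can be placed in one-to-one correspondence with M complexity control parameters (denoted N_M, N_{M-1}, N_{M-2}, …, N_1, where N_M > N_{M-1} > N_{M-2} > … > N_1), that is:
For example, the search complexity corresponding to complexity control parameter N_M is M; that is, if the current complexity control parameter is higher than or equal to N_M, the target search complexity can be set to M.
For another example, the search complexity corresponding to complexity control parameter N_{M-1} is M-1; that is, if the current complexity control parameter is higher than or equal to N_{M-1} and lower than N_M, the target search complexity can be set to M-1.
For another example, the search complexity corresponding to complexity control parameter N_{M-2} is M-2; that is, if the current complexity control parameter is higher than or equal to N_{M-2} and lower than N_{M-1}, the target search complexity can be set to M-2.
For another example, the search complexity corresponding to complexity control parameter N_2 is 2; that is, if the current complexity control parameter is higher than or equal to N_2 and lower than N_3, the target search complexity can be set to 2.
For another example, the search complexity corresponding to complexity control parameter N_1 is 1; that is, if the current complexity control parameter is lower than N_2, the target search complexity can be set to 1.
It should be understood that the encoding bit rate, the number of encoding bits, and the complexity control parameter described above as encoding parameters are merely examples, and the present invention is not limited thereto. Any other information or parameter that is determined by, or reflects, the channel quality falls within the protection scope of the present invention.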
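After the target search complexity is determined as described above, in S120 the encoder device can perform search processing according to the target search complexity to obtain the ITD parameter.
In the embodiments of the present invention, different search complexities can correspond to different search step sizes (Case 1), or different search complexities can correspond to different search ranges (Case 2). The following details, for each of the two cases, how the encoder determines the ITD parameter based on the target search complexity.
Case 1
The at least two search complexities are in one-to-one correspondence with at least two search step sizes, the at least two search complexities include a first search complexity and a second search complexity, and the at least two search step sizes include a first search step size and a second search step size, where the first search step size corresponding to the first search complexity is smaller than the second search step size corresponding to the second search complexity, and the first search complexity is higher than the second search complexity; and
the performing, according to the target search complexity, search processing on a signal of a first channel and a signal of a second channel includes:
determining a target search step size corresponding to the target search complexity; and
performing search processing on the signal of the first channel and the signal of the second channel according to the target search step size.
Specifically, in the embodiments of the present invention, the M search complexities described above (that is, M, M-1, …, 1) can be placed in one-to-one correspondence with M search step sizes (denoted L_M, L_{M-1}, L_{M-2}, …, L_1, where L_M < L_{M-1} < L_{M-2} < … < L_1), that is:
For example, the search complexity corresponding to search step size L_M is M; that is, if the target search complexity determined as described above is M, the search step size L_M corresponding to search complexity M can be set as the target search step size.
For another example, the search complexity corresponding to search step size L_{M-1} is M-1; that is, if the target search complexity determined as described above is M-1, the search step size L_{M-1} corresponding to search complexity M-1 can be set as the target search step size.
For another example, the search complexity corresponding to search step size L_{M-2} is M-2; that is, if the target search complexity determined as described above is M-2, the search step size L_{M-2} corresponding to search complexity M-2 can be set as the target search step size.
For another example, the search complexity corresponding to search step size L_2 is 2; that is, if the target search complexity determined as described above is 2, the search step size L_2 corresponding to search complexity 2 can be set as the target search step size.
For another example, the search complexity corresponding to search step size L_1 is 1; that is, if the target search complexity determined as described above is 1, the search step size L_1 corresponding to search complexity 1 can be set as the target search step size.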
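As one way of setting the step sizes, for example, in the embodiments of the present invention, the specific values of the M search step sizes (that is, L_M, L_{M-1}, L_{M-2}, …, L_1) can be determined according to the following formulas.
Figure PCTCN2015095090-appb-000001
Figure PCTCN2015095090-appb-000002
Figure PCTCN2015095090-appb-000003
where K is a preset value that represents the number of searches at the lowest complexity, and ⌊·⌋ denotes the floor (round-down) operation.
In addition, if
Figure PCTCN2015095090-appb-000005
the number of searches at search complexity i is increased by 1.
It should be noted that the method and specific values for determining the step sizes described above are merely examples, and the present invention is not limited thereto; the step sizes can be determined as needed, provided that L_M < L_{M-1} < L_{M-2} < … < L_1 holds.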
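After the target search step size (denoted L_t below for ease of understanding and distinction) is determined as described above, search processing can be performed on the signal of the left channel and the signal of the right channel according to the target search step size to determine the ITD parameter.
The search processing can be performed in the time domain (Mode 1) or in the frequency domain (Mode 2); the present invention imposes no particular limitation. The two modes are detailed below.
Mode 1
Specifically, the encoder device can obtain the audio signal corresponding to the left channel through an audio input device such as a microphone corresponding to the left channel, and sample the audio signal at a preset sampling rate α (an example of the sampling rate of the time-domain signal of the first channel) to generate the time-domain signal of the left channel (an example of the time-domain signal of the first channel; denoted time-domain signal #L below for ease of understanding and distinction). The process of obtaining time-domain signal #L can be similar to the prior art and is not detailed here.
In the embodiments of the present invention, the sampling rate of the time-domain signal of the first channel is the same as that of the time-domain signal of the second channel. Similarly, therefore, the encoder device can obtain the audio signal corresponding to the right channel through an audio input device such as a microphone corresponding to the right channel, and sample the audio signal at the sampling rate α to generate the time-domain signal of the right channel (an example of the time-domain signal of the second channel; denoted time-domain signal #R below for ease of understanding and distinction).
It should be noted that, in the embodiments of the present invention, time-domain signal #L and time-domain signal #R correspond to the same time period (in other words, they are obtained within the same time period). For example, time-domain signal #L and time-domain signal #R may correspond to the same frame (that is, 20 ms); in this case, one ITD parameter corresponding to that frame can be obtained based on time-domain signal #L and time-domain signal #R.
For another example, time-domain signal #L and time-domain signal #R may correspond to the same subframe within the same frame (that is, 10 ms, 5 ms, or the like); in this case, multiple ITD parameters corresponding to that frame can be obtained based on time-domain signal #L and time-domain signal #R. For example, if the subframe is 10 ms, two ITD parameters can be obtained from one frame (that is, 20 ms); if the subframe is 5 ms, four ITD parameters can be obtained from one frame (that is, 20 ms).
It should be understood that the lengths of the time period corresponding to time-domain signal #L and time-domain signal #R described above are merely examples, and the present invention is not limited thereto; the length of the time period can be changed as needed.
Thereafter, the encoder device can perform search processing on time-domain signal #L and time-domain signal #R according to the target search step size (that is, L_t) determined as described above, through the following steps:
Step 1. The encoder device can set i = 0.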
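Step 2. The encoder device can determine the cross-correlation function c_n(i) of time-domain signal #L relative to time-domain signal #R according to Equation 1 below, and the cross-correlation function c_p(i) of time-domain signal #R relative to time-domain signal #L according to Equation 2 below:
$$c_n(i) = \sum_{j=0}^{Length-1-i} x_R(j) \cdot x_L(j+i) \qquad \text{(Equation 1)}$$
$$c_p(i) = \sum_{j=0}^{Length-1-i} x_L(j) \cdot x_R(j+i) \qquad \text{(Equation 2)}$$
where x_R(j) denotes the value of time-domain signal #R at the j-th sample, x_L(j+i) denotes the value of time-domain signal #L at the (j+i)-th sample, x_L(j) denotes the value of time-domain signal #L at the j-th sample, x_R(j+i) denotes the value of time-domain signal #R at the (j+i)-th sample, and Length denotes the total number of samples in time-domain signal #R and time-domain signal #L, in other words, the length of the two signals, which may be, for example, the length of one frame (that is, 20 ms) or of one subframe (for example, 10 ms or 5 ms).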
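Step 3. The encoder device can set i = i + L_t and repeat step 2 for i within [0, T_max],
where T_max denotes the limit value of the ITD parameter (in other words, the maximum acquisition time difference between time-domain signal #L and time-domain signal #R) and can be determined from the sampling rate α; the determination method can be similar to the prior art and is not detailed here.
Step 4. The encoder device can compute the maximum value $\max_{0 \le i \le T_{max}} c_n(i)$ of the cross-correlation function c_n(i) of time-domain signal #L relative to time-domain signal #R, as determined when time-domain signal #R and time-domain signal #L are searched with the target search step size (that is, L_t).
The encoder device can likewise compute the maximum value $\max_{0 \le i \le T_{max}} c_p(i)$ of the cross-correlation function c_p(i) of time-domain signal #R relative to time-domain signal #L, as determined when the two signals are searched with the target search step size (that is, L_t).
The encoder device can then compare $\max_{0 \le i \le T_{max}} c_n(i)$ with $\max_{0 \le i \le T_{max}} c_p(i)$ and determine the ITD parameter according to the comparison result.
For example, if
Figure PCTCN2015095090-appb-000012
the encoder device can use the index value corresponding to
Figure PCTCN2015095090-appb-000013
as the ITD parameter.
For another example, if
Figure PCTCN2015095090-appb-000014
the encoder device can use the opposite number of the index value corresponding to
Figure PCTCN2015095090-appb-000015
as the ITD parameter.
Here, T_max denotes the limit value of the ITD parameter (in other words, the maximum acquisition time difference between time-domain signal #L and time-domain signal #R) and can be determined from the sampling rate α; the determination method can be similar to the prior art and is not detailed here.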
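The time-domain search of Mode 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the source's implementation: the function names are hypothetical, the summation is assumed to run over the j for which both samples exist, and the decision rule (use the lag of the larger maximum, positive when c_p dominates and negated when c_n dominates, with ties going to c_p) is an assumption chosen to agree with the sign convention that the ITD is positive when the left-channel signal is acquired first.

```python
def cross_correlation(a, b, lag):
    """Return the sum over j of a[j] * b[j + lag], for 0 <= j < len(a) - lag."""
    return sum(a[j] * b[j + lag] for j in range(len(a) - lag))


def search_itd_time_domain(x_left, x_right, t_max, step):
    """Search lags 0, step, 2*step, ... up to t_max and return an ITD in samples.

    Both signals are assumed to have the same length, greater than t_max.
    """
    lags = range(0, t_max + 1, step)
    # c_n(i): left signal relative to right signal, sum of x_R(j) * x_L(j + i).
    c_n = {i: cross_correlation(x_right, x_left, i) for i in lags}
    # c_p(i): right signal relative to left signal, sum of x_L(j) * x_R(j + i).
    c_p = {i: cross_correlation(x_left, x_right, i) for i in lags}
    best_n = max(c_n, key=c_n.get)
    best_p = max(c_p, key=c_p.get)
    # Assumed decision rule: ties are resolved in favour of c_p.
    if c_n[best_n] <= c_p[best_p]:
        return best_p
    return -best_n


# Toy signals: the right channel is the left channel delayed by two samples.
LEFT = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
RIGHT = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

With a step size of 1 or 2 the sketch recovers the two-sample delay between `LEFT` and `RIGHT`; with a step size of 4 only lags 0 and 4 are evaluated, so the delay is missed. This illustrates the trade-off described above: a larger step size means fewer evaluations of the cross-correlation and a coarser ITD estimate.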
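Mode 2
The encoder device can perform time-frequency transform processing on time-domain signal #L to obtain the frequency-domain signal of the left channel (an example of the frequency-domain signal of the first channel; denoted frequency-domain signal #L below for ease of understanding and distinction), and can perform time-frequency transform processing on time-domain signal #R to obtain the frequency-domain signal of the right channel (an example of the frequency-domain signal of the second channel; denoted frequency-domain signal #R below for ease of understanding and distinction).
For example, in the embodiments of the present invention, the fast Fourier transform (FFT) can be used to perform the time-frequency transform processing based on Equation 3 below.
$$X(k) = \sum_{n=0}^{Length-1} x(n) \cdot e^{-j \frac{2\pi n k}{FFT\_LENGTH}}, \quad 0 \le k < FFT\_LENGTH \qquad \text{(Equation 3)}$$
where X(k) denotes the frequency-domain signal, FFT_LENGTH denotes the time-frequency transform length, x(n) denotes the time-domain signal (that is, time-domain signal #L or time-domain signal #R), and Length denotes the total number of samples in the time-domain signal.
It should be understood that the time-frequency transform process described above is merely an example, and the present invention is not limited thereto. The method and process of the time-frequency transform can be similar to the prior art; for example, the modified discrete cosine transform (MDCT) or a similar technique can also be used.
Thereafter, the encoder device can perform search processing on frequency-domain signal #L and frequency-domain signal #R according to the target search step size (that is, L_t) determined as described above, through the following steps: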
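Step a. The encoder device can divide the FFT_LENGTH frequency bins of the frequency-domain signal into N_subband subbands (for example, one subband) according to a preset bandwidth A, where the k-th subband A_k contains the frequency bins b with A_{k-1} ≤ b ≤ A_k − 1.
Step b. Set j = −T_max.
Step c. Compute the correlation function mag(j) of frequency-domain signal #L and frequency-domain signal #R according to Equation 4 below:
Figure PCTCN2015095090-appb-000017
  Equation 4
where X_L(b) denotes the value of frequency-domain signal #L at the b-th frequency bin, X_R(b) denotes the value of frequency-domain signal #R at the b-th frequency bin, and FFT_LENGTH denotes the time-frequency transform length.
Step d. The encoder device can set j = j + L_t and repeat step c for j within [−T_max, T_max],
where T_max denotes the limit value of the ITD parameter (in other words, the maximum acquisition time difference between time-domain signal #L and time-domain signal #R) and can be determined from the sampling rate α; the determination method can be similar to the prior art and is not detailed here.
The encoder device can thereby determine the ITD parameter value of the k-th subband as
$$\arg\max_{-T_{max} \le j \le T_{max}} mag(j)$$
that is, the index value corresponding to the maximum value of mag(j).
In this way, one or more ITD parameter values between the left channel and the right channel can be obtained (their number corresponds to the number of subbands determined as described above).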
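The subband search of Mode 2 can be sketched as below. Note the assumptions: the correlation used here is the conventional cross-spectrum with a phase-shift term, mag(j) = |Σ_b X_L(b) · conj(X_R(b)) · exp(−i·2π·b·j / FFT_LENGTH)|, which is a stand-in and may differ from Equation 4 of the source; the sign convention (positive ITD when the right channel lags the left) is likewise assumed; the function names are hypothetical; and a naive DFT is used in place of an FFT to keep the sketch dependency-free.

```python
import cmath


def dft(x, n_fft):
    """Naive DFT of x, zero-padded to n_fft points (stand-in for an FFT)."""
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_fft) for n in range(len(x)))
        for k in range(n_fft)
    ]


def search_itd_subband(x_left, x_right, n_fft, band, t_max, step):
    """Return the lag j in [-t_max, t_max] (stepped by `step`) maximising mag(j).

    `band` is a (low, high) pair; the subband covers bins low <= b < high.
    """
    spec_l = dft(x_left, n_fft)
    spec_r = dft(x_right, n_fft)
    low, high = band
    best_j, best_mag = None, float("-inf")
    for j in range(-t_max, t_max + 1, step):
        # Assumed correlation: cross-spectrum rotated by the candidate lag j.
        acc = sum(
            spec_l[b] * spec_r[b].conjugate()
            * cmath.exp(-2j * cmath.pi * b * j / n_fft)
            for b in range(low, high)
        )
        if abs(acc) > best_mag:
            best_j, best_mag = j, abs(acc)
    return best_j


# Toy signals: the right channel is the left channel delayed by two samples.
LEFT_SIG = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
RIGHT_SIG = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

Calling `search_itd_subband` once per subband yields one ITD value per subband; increasing `step` reduces the number of candidate lags evaluated in each subband.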
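Thereafter, the encoder device can further perform quantization and other processing on the ITD parameter values, and send the processed ITD parameter values and the mono signal (for example, time-domain signal #L, time-domain signal #R, frequency-domain signal #L, or frequency-domain signal #R) to the decoder device (also referred to as the receive-end device).
The decoder device can restore the stereo audio signal according to the mono audio signal and the ITD parameter values.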
情况2
该至少两个搜索复杂度与至少两个搜索范围一一对应,该至少两个搜索复杂度包括第三搜索复杂度和第四搜索复杂度,该至少两个搜索范围包括第一搜索范围和第二搜索范围,其中,与第三搜索复杂度相对应的第一搜索范围大于与第四搜索复杂度相对应的第二搜索范围,该第三搜索复杂度高于该第四搜索复杂度,以及
该根据该目标搜索复杂度,对第一声道的信号及第二声道的信号进行搜索处理,包括:
确定与该目标搜索复杂度相对应的目标搜索范围;
在该目标搜索范围上,对该第一声道的信号及该第二声道的信号进行搜索处理。
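Specifically, in this embodiment of the present invention, the M search complexities described above (that is, M, M-1, ..., 1) may be in one-to-one correspondence with M search ranges (denoted F_M, F_{M-1}, F_{M-2}, ..., F_1, where F_M > F_{M-1} > F_{M-2} > ... > F_1), that is:
For example, the search range F_M corresponds to search complexity M; that is, if the target search complexity determined as described above is M, the search range F_M corresponding to search complexity M may be set as the target search range.
For another example, the search range F_{M-1} corresponds to search complexity M-1; that is, if the target search complexity determined as described above is M-1, the search range F_{M-1} corresponding to search complexity M-1 may be set as the target search range.
For another example, the search range F_{M-2} corresponds to search complexity M-2; that is, if the target search complexity determined as described above is M-2, the search range F_{M-2} corresponding to search complexity M-2 may be set as the target search range.
For another example, the search range F_2 corresponds to search complexity 2; that is, if the target search complexity determined as described above is 2, the search range F_2 corresponding to search complexity 2 may be set as the target search range.
For another example, the search range F_1 corresponds to search complexity 1; that is, if the target search complexity determined as described above is 1, the search range F_1 corresponding to search complexity 1 may be set as the target search range.
It should be noted that, in this embodiment of the present invention, the search ranges F_M, F_{M-1}, F_{M-2}, ..., F_1 may all be search ranges in the time domain, or may all be search ranges in the frequency domain; the present invention is not particularly limited in this respect.
In this embodiment of the present invention, the frequency-domain search range F_M corresponding to the highest search complexity may be determined as [-Tmax, Tmax].
The following describes in detail how the frequency-domain search ranges corresponding to the other search complexities are determined.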
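Determining the target search range corresponding to the target search complexity includes:
determining a reference parameter according to the time-domain signal of the first channel and the time-domain signal of the second channel, where the reference parameter corresponds to the order in which the time-domain signal of the first channel and the time-domain signal of the second channel are acquired, and the two time-domain signals correspond to the same time period; and
determining the target search range according to the target search complexity, the reference parameter, and a limit value Tmax, where the limit value Tmax is determined according to the sampling rate of the time-domain signal, and the target search range falls within [-Tmax, 0] or within [0, Tmax].
Specifically, the encoder device may determine the reference parameter according to time-domain signal #L and time-domain signal #R. The reference parameter may correspond to the order in which time-domain signal #L and time-domain signal #R are acquired (for example, the order in which they are input to the audio input device described above). This correspondence is described in detail later, together with the procedure for determining the reference parameter.
In this embodiment of the present invention, the reference parameter may be determined by performing cross-correlation processing on time-domain signal #L and time-domain signal #R (Mode X), or by searching for the maximum amplitude values of time-domain signal #L and time-domain signal #R (Mode Y). Mode X and Mode Y are described in detail below.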
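Mode X
Optionally, determining the reference parameter according to the time-domain signal of the first channel and the time-domain signal of the second channel includes:
performing cross-correlation processing on the time-domain signal of the first channel and the time-domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, where the first cross-correlation processing value is the maximum function value, within a preset range, of the cross-correlation function of the time-domain signal of the first channel relative to the time-domain signal of the second channel, and the second cross-correlation processing value is the maximum function value, within the preset range, of the cross-correlation function of the time-domain signal of the second channel relative to the time-domain signal of the first channel; and
determining the reference parameter according to the magnitude relationship between the first cross-correlation processing value and the second cross-correlation processing value.
Specifically, in this embodiment of the present invention, the encoder device may determine the cross-correlation function c_n(i) of time-domain signal #L relative to time-domain signal #R according to Equation 5 below:
Figure PCTCN2015095090-appb-000019
  Equation 5
where Tmax denotes the limit value of the ITD parameter (in other words, the maximum acquisition time difference between time-domain signal #L and time-domain signal #R). It may be determined according to the sampling rate α described above, and the determination method may be similar to the prior art; a detailed description is omitted here to avoid repetition. x_R(j) denotes the signal value of time-domain signal #R at the j-th sampling point, x_L(j+i) denotes the signal value of time-domain signal #L at the (j+i)-th sampling point, and Length denotes the total number of sampling points in time-domain signal #R, in other words the length of time-domain signal #R, which may be, for example, the length of one frame (that is, 20 ms) or the length of one subframe (for example, 10 ms or 5 ms).
The encoder device may then determine the maximum value of the cross-correlation function c_n(i):
Figure PCTCN2015095090-appb-000020
Similarly, the encoder device may determine the cross-correlation function c_p(i) of time-domain signal #R relative to time-domain signal #L according to Equation 6 below:
Figure PCTCN2015095090-appb-000021
  Equation 6
The encoder device may then determine the maximum value of the cross-correlation function c_p(i):
Figure PCTCN2015095090-appb-000022
In this embodiment of the present invention, according to the relationship between
Figure PCTCN2015095090-appb-000023
and
Figure PCTCN2015095090-appb-000024
the encoder device may determine the value of the reference parameter in Mode X1 or Mode X2 below.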
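Mode X1
As shown in FIG. 2, if
Figure PCTCN2015095090-appb-000025
the encoder device may determine that time-domain signal #L is acquired before time-domain signal #R, that is, the ITD parameter between the left and right channels is positive. In this case, the reference parameter T may be set to 1.
Accordingly, in the subsequent determination, the encoder device may determine that the reference parameter is greater than 0 and therefore determine the search range as [0, Tmax]. That is, when time-domain signal #L is acquired before time-domain signal #R, the ITD parameter is positive and the search range is [0, Tmax] (an example of a search range falling within [0, Tmax]).
Alternatively, if
Figure PCTCN2015095090-appb-000026
the encoder device may determine that time-domain signal #L is acquired after time-domain signal #R, that is, the ITD parameter between the left and right channels is negative. In this case, the reference parameter T may be set to 0.
Accordingly, in the subsequent determination, the encoder device may determine that the reference parameter is not greater than 0 and therefore determine the search range as [-Tmax, 0]. That is, when time-domain signal #L is acquired after time-domain signal #R, the ITD parameter is negative and the search range is [-Tmax, 0] (an example of a search range falling within [-Tmax, 0]).
Thus, when two or more search complexities are included, the frequency-domain search range F_2 for the normal search complexity (complexity 2) can be determined from [-Tmax, 0] and [0, Tmax].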
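Mode X1 can be sketched as follows. Equations 5 and 6 and the two comparison conditions are images in this text, so the forms used here are assumptions based on the symbol definitions: c_n(i) = Σ x_R(j)·x_L(j+i) and c_p(i) = Σ x_L(j)·x_R(j+i) for i ∈ [0, Tmax], with #L treated as acquired first when max c_n ≤ max c_p. All function names are hypothetical.

```python
def cross_correlation(first, second, t_max):
    """c(i) = sum_j first[j] * second[j + i], for i = 0..t_max (assumed form)."""
    length = len(first)
    return [
        sum(first[j] * second[j + i] for j in range(length - i))
        for i in range(t_max + 1)
    ]


def reference_parameter_x1(x_left, x_right, t_max):
    """Return T = 1 if #L appears to lead #R, otherwise T = 0."""
    c_n = cross_correlation(x_right, x_left, t_max)  # x_R(j) * x_L(j + i)
    c_p = cross_correlation(x_left, x_right, t_max)  # x_L(j) * x_R(j + i)
    # Assumed comparison direction; the source shows the condition only as an image.
    return 1 if max(c_n) <= max(c_p) else 0


def search_range_x1(reference, t_max):
    """Half-range selection: [0, Tmax] when T > 0, otherwise [-Tmax, 0]."""
    return (0, t_max) if reference > 0 else (-t_max, 0)
```

For a pulse that reaches the right channel two samples after the left channel, this sketch returns T = 1 and the range (0, Tmax), halving the number of lags the frequency-domain search has to evaluate.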
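Mode X2
Optionally, the reference parameter is the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the opposite number of that index value.
Specifically, as shown in FIG. 3, if
Figure PCTCN2015095090-appb-000027
the encoder device may determine that time-domain signal #L is acquired before time-domain signal #R, that is, the ITD parameter between the left and right channels is positive. In this case, the reference parameter T may be set to the index value corresponding to the following maximum:
Figure PCTCN2015095090-appb-000028
Accordingly, in the subsequent determination, after determining that the reference parameter T is greater than 0, the encoder device may further determine whether T is greater than or equal to Tmax/2, and determine the search range according to the result. For example, when T ≥ Tmax/2, the search range is [Tmax/2, Tmax] (an example of a search range falling within [0, Tmax]); when T < Tmax/2, the search range is [0, Tmax/2] (another example of a search range falling within [0, Tmax]).
Alternatively, if
Figure PCTCN2015095090-appb-000029
the encoder device may determine that time-domain signal #L is acquired after time-domain signal #R, that is, the ITD parameter between the left and right channels is negative. In this case, the reference parameter T may be set to the opposite number of the index value corresponding to the following maximum:
Figure PCTCN2015095090-appb-000030
Accordingly, in the subsequent determination, after determining that the reference parameter T is less than or equal to 0, the encoder device may further determine whether T is less than or equal to -Tmax/2, and determine the search range according to the result. For example, when T ≤ -Tmax/2, the search range is [-Tmax, -Tmax/2] (an example of a search range falling within [-Tmax, 0]); when T > -Tmax/2, the search range is [-Tmax/2, 0] (another example of a search range falling within [-Tmax, 0]).
Thus, when three or more search complexities are included, the frequency-domain search range F_1 for the lowest search complexity (complexity 1) can be determined from [-Tmax, -Tmax/2], [-Tmax/2, 0], [0, Tmax/2], and [Tmax/2, Tmax].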
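The quarter-range selection of Mode X2 can be sketched in the same way. As before, the cross-correlation forms and the comparison direction are assumptions (the source shows them only as images), and the function names are hypothetical; the threshold logic on T follows the text directly.

```python
def cross_correlation(first, second, t_max):
    """c(i) = sum_j first[j] * second[j + i], for i = 0..t_max (assumed form)."""
    length = len(first)
    return [
        sum(first[j] * second[j + i] for j in range(length - i))
        for i in range(t_max + 1)
    ]


def reference_parameter_x2(x_left, x_right, t_max):
    """Return the lag index of the larger correlation peak, negated if #L lags."""
    c_n = cross_correlation(x_right, x_left, t_max)  # x_R(j) * x_L(j + i)
    c_p = cross_correlation(x_left, x_right, t_max)  # x_L(j) * x_R(j + i)
    if max(c_n) <= max(c_p):               # assumed: #L acquired first
        return c_p.index(max(c_p))
    return -c_n.index(max(c_n))            # #L acquired later: negated index


def search_range_x2(reference, t_max):
    """Pick one of the four quarter ranges of [-Tmax, Tmax] from T."""
    half = t_max / 2
    if reference > 0:
        return (half, t_max) if reference >= half else (0, half)
    return (-t_max, -half) if reference <= -half else (-half, 0)
```

Compared with Mode X1, the search range shrinks from half of [-Tmax, Tmax] to a quarter of it, which corresponds to the lowest of three search complexities.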
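Mode Y
Optionally, determining the reference parameter according to the time-domain signal of the first channel and the time-domain signal of the second channel includes:
performing peak detection processing on the time-domain signal of the first channel and the time-domain signal of the second channel to determine a first index value and a second index value, where the first index value is the index value corresponding to the maximum amplitude value of the time-domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time-domain signal of the second channel within the preset range; and
determining the reference parameter according to the magnitude relationship between the first index value and the second index value.
Specifically, in this embodiment of the present invention, the encoder device may detect the maximum value max(L(j)), j ∈ [0, Length-1], of the amplitude values (denoted L(j)) of time-domain signal #L, and record the index value p_left corresponding to max(L(j)), where Length denotes the total number of sampling points in time-domain signal #L.
The encoder device may also detect the maximum value max(R(j)), j ∈ [0, Length-1], of the amplitude values (denoted R(j)) of time-domain signal #R, and record the index value p_right corresponding to max(R(j)), where Length denotes the total number of sampling points in time-domain signal #R.
Thereafter, the encoder device may determine the magnitude relationship between p_left and p_right.
As shown in FIG. 4, if p_left ≥ p_right, the encoder device may determine that time-domain signal #L is acquired before time-domain signal #R, that is, the ITD parameter between the left and right channels is positive. In this case, the reference parameter T may be set to 1.
Accordingly, in the subsequent determination, the encoder device may determine that the reference parameter is greater than 0 and therefore determine the search range as [0, Tmax]. That is, when time-domain signal #L is acquired before time-domain signal #R, the ITD parameter is positive and the search range is [0, Tmax] (an example of a search range falling within [0, Tmax]).
Alternatively, if p_left < p_right, the encoder device may determine that time-domain signal #L is acquired after time-domain signal #R, that is, the ITD parameter between the left and right channels is negative. In this case, the reference parameter T may be set to 0.
Accordingly, in the subsequent determination, the encoder device may determine that the reference parameter is not greater than 0 and therefore determine the search range as [-Tmax, 0]. That is, when time-domain signal #L is acquired after time-domain signal #R, the ITD parameter is negative and the search range is [-Tmax, 0] (an example of a search range falling within [-Tmax, 0]).
Thus, when two or more search complexities are included, the frequency-domain search range F_2 for the normal search complexity (complexity 2) can be determined from [-Tmax, 0] and [0, Tmax].
It should be understood that the methods for determining the search range and the specific values of the search ranges described above are merely examples, and the present invention is not limited thereto. They may be determined as required, provided that F_M > F_{M-1} > F_{M-2} > ... > F_1.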
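Mode Y needs no correlation at all, only one pass over each frame. The sketch below follows the comparison exactly as stated in the text (p_left ≥ p_right gives T = 1); the function names are hypothetical, and taking the absolute value as the "amplitude" is an assumption.

```python
def peak_index(x):
    """Index of the sample with the largest absolute amplitude (first one on ties)."""
    return max(range(len(x)), key=lambda j: abs(x[j]))


def reference_parameter_peak(x_left, x_right):
    """T = 1 when p_left >= p_right, otherwise T = 0, as stated in the text."""
    p_left = peak_index(x_left)
    p_right = peak_index(x_right)
    return 1 if p_left >= p_right else 0


def search_range_from_reference(reference, t_max):
    """[0, Tmax] when T > 0, otherwise [-Tmax, 0]."""
    return (0, t_max) if reference > 0 else (-t_max, 0)
```

This costs two linear scans per frame, compared with roughly Tmax scans per channel for the cross-correlation of Mode X, so it suits cases where the range decision itself must be cheap.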
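The encoder device may perform time-frequency transform processing on time-domain signal #L to obtain the frequency-domain signal of the left channel (that is, an example of the frequency-domain signal of the first channel; for ease of understanding and distinction, hereinafter denoted as frequency-domain signal #L). It may likewise perform time-frequency transform processing on time-domain signal #R to obtain the frequency-domain signal of the right channel (that is, an example of the frequency-domain signal of the second channel; hereinafter denoted as frequency-domain signal #R).
For example, in this embodiment of the present invention, the time-frequency transform processing may be performed using the fast Fourier transform (FFT) according to Equation 7 below.
Figure PCTCN2015095090-appb-000031
  Equation 7
where X(k) denotes the frequency-domain signal, FFT_LENGTH denotes the time-frequency transform length, x(n) denotes the time-domain signal (that is, time-domain signal #L or time-domain signal #R), and Length denotes the total number of sampling points in the time-domain signal.
It should be understood that the time-frequency transform procedure described above is merely an example, and the present invention is not limited thereto. The method and procedure of the time-frequency transform processing may be similar to the prior art; for example, the modified discrete cosine transform (MDCT) may also be used.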
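The encoder device may then perform search processing on frequency-domain signal #L and frequency-domain signal #R within the search range determined as described above, to determine the ITD parameter between the left channel and the right channel. For example, the search processing may proceed as follows:
First, the encoder device may divide the FFT_LENGTH frequency bins of the frequency-domain signal into N_subband subbands (for example, one subband) according to a preset bandwidth A, where the k-th subband A_k contains the frequency bins b satisfying A_{k-1} ≤ b ≤ A_k - 1.
Within the search range, the correlation function mag(j) of frequency-domain signal #L and frequency-domain signal #R is calculated according to Equation 8 below.
Figure PCTCN2015095090-appb-000032
  Equation 8
where X_L(b) denotes the signal value of frequency-domain signal #L at the b-th frequency bin, X_R(b) denotes the signal value of frequency-domain signal #R at the b-th frequency bin, FFT_LENGTH denotes the time-frequency transform length, and j ranges over the search range determined as described above; for ease of understanding and description, this search range is denoted [a, b].
The ITD parameter value of the k-th subband is then
Figure PCTCN2015095090-appb-000033
that is, the index value corresponding to the maximum value of mag(j).
In this way, one or more ITD parameter values between the left channel and the right channel can be obtained (the number corresponds to the number of subbands determined as described above).
Thereafter, the encoder device may further perform quantization processing and the like on the ITD parameter values, and send the processed ITD parameter values, together with a mono signal obtained by, for example, downmixing the left-channel and right-channel signals, to the decoder device (in other words, the receiver device).
The decoder device may restore the stereo audio signal according to the mono audio signal and the ITD parameter values.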
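The range-restricted search reduces to an argmax over the lags in [a, b]. The sketch below is deliberately independent of the exact form of Equation 8 (which is an image in this text): it takes the correlation measure as a callable. The name `search_in_range` and the toy `mag` function used in the usage note are hypothetical.

```python
def search_in_range(mag, a, b, step=1):
    """Return the integer lag j in [a, b] (stepping by `step`) that maximizes mag(j).

    `mag` is any callable mapping a lag to a correlation magnitude; on ties the
    first (smallest) lag is returned.
    """
    candidates = range(a, b + 1, step)
    return max(candidates, key=mag)
```

For example, with a toy measure `mag = lambda j: -(j - 3) ** 2` that peaks at j = 3, searching the full range [-8, 8] evaluates 17 lags, while searching the half range [0, 8] selected by the reference parameter evaluates 9 and finds the same peak. If the reference parameter selects the wrong half, the true peak is missed, which is the accuracy cost of the lower search complexity.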
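Optionally, the method further includes:
performing smoothing processing on the first ITD parameter based on a second ITD parameter, where the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is the smoothed value of the ITD parameter of a second time period, and the second time period precedes the first time period.
Specifically, in this embodiment of the present invention, before performing quantization processing and the like on the ITD parameter values, the encoder device may further perform smoothing processing on the ITD parameter values obtained as described above. By way of example and not limitation, the encoder device may perform the smoothing processing according to Equation 9 below:
T_sm(k) = w1 * T_sm^[-1](k) + w2 * T(k)  Equation 9
where T_sm(k) denotes the smoothed ITD parameter value corresponding to the k-th frame or the k-th subframe, T_sm^[-1] denotes the smoothed ITD parameter value corresponding to the (k-1)-th frame or the (k-1)-th subframe, T(k) denotes the unsmoothed ITD parameter value corresponding to the k-th frame or the k-th subframe, and w1 and w2 are smoothing factors. w1 and w2 may be set to constants, or may be set according to the difference between T_sm^[-1] and T(k), provided that w1 + w2 = 1. In addition, when k = 1, T_sm^[-1] may be a preset value.
It should be noted that, in the method for determining an inter-channel time difference parameter according to this embodiment of the present invention, the smoothing processing may be performed by the encoder device or by the decoder device; the present invention is not particularly limited in this respect. That is, the encoder device may skip the smoothing processing and send the ITD parameter values obtained as described above directly to the decoder device, and the decoder device then performs smoothing processing on the ITD parameter values. The method and procedure of the smoothing processing performed by the decoder device may be similar to those of the smoothing processing performed by the encoder device described above; a detailed description is omitted here to avoid repetition.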
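The smoothing recursion can be sketched in a few lines. The constant weight `w1 = 0.75` and the initial value `0.0` are hypothetical choices for illustration; the text only requires w1 + w2 = 1 and a preset value for the first frame.

```python
def smooth_itd(raw_itd, w1=0.75, initial=0.0):
    """Apply T_sm(k) = w1 * T_sm(k-1) + w2 * T(k) with w2 = 1 - w1.

    raw_itd -- unsmoothed ITD values T(k), one per frame or subframe
    initial -- preset value used as the previous smoothed value for the first frame
    """
    w2 = 1.0 - w1
    smoothed = []
    previous = initial
    for t in raw_itd:
        previous = w1 * previous + w2 * t
        smoothed.append(previous)
    return smoothed
```

For instance, `smooth_itd([4, 4], w1=0.5, initial=0.0)` yields `[2.0, 3.0]`: the output moves toward a stable ITD gradually, which damps frame-to-frame jumps in the estimate. A larger w1 smooths more but reacts more slowly to a genuine change in source position.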
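According to the method for determining an inter-channel time difference parameter in this embodiment of the present invention, a target search complexity corresponding to the current channel quality is determined from at least two search complexities, and search processing is performed on the signal of the first channel and the signal of the second channel according to the target search complexity. The precision of the determined ITD parameter can thereby be matched to the channel quality, so that when the current channel quality is poor, the complexity or computational load of the search processing can be reduced through the target search complexity, which saves computing resources and improves processing efficiency.
The method for determining an inter-channel time difference parameter according to the embodiments of the present invention has been described in detail above with reference to FIG. 1 to FIG. 4. The apparatus for determining an inter-channel time difference parameter according to an embodiment of the present invention is described in detail below with reference to FIG. 5.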
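FIG. 5 is a schematic block diagram of an apparatus 200 for determining an inter-channel time difference parameter according to an embodiment of the present invention. As shown in FIG. 5, the apparatus 200 includes:
a determining unit 210, configured to determine a target search complexity from at least two search complexities, where the at least two search complexities are in one-to-one correspondence with at least two channel quality values; and
a processing unit 220, configured to perform search processing on a signal of a first channel and a signal of a second channel according to the target search complexity, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel.
Optionally, the determining unit 210 is specifically configured to: obtain a coding parameter for a stereo signal, where the stereo signal is generated based on the signal of the first channel and the signal of the second channel, the coding parameter is determined according to the current channel quality value, and the coding parameter includes any one of the following: a coding bit rate, a quantity of coding bits, or a complexity control parameter used to indicate the search complexity; and determine the target search complexity from the at least two search complexities according to the coding parameter.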
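The role of the determining unit can be illustrated with a threshold lookup on the coding bit rate. The thresholds (16 kbit/s and 32 kbit/s) and the function name below are hypothetical and are not taken from the source; the sketch only assumes that a higher bit rate reflects better channel quality and therefore maps to a higher search complexity.

```python
def select_search_complexity(bit_rate_bps, thresholds=(16000, 32000)):
    """Map a coding bit rate to a search complexity level 1..len(thresholds)+1.

    Level 1 is the lowest complexity; each threshold reached raises the level by one.
    The threshold values are illustrative placeholders only.
    """
    level = 1
    for threshold in sorted(thresholds):
        if bit_rate_bps >= threshold:
            level += 1
    return level
```

With the default thresholds, 8 kbit/s selects complexity 1, 16 kbit/s selects complexity 2, and 48 kbit/s selects complexity 3. The processing unit would then look up the search step or search range associated with the returned level.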
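Optionally, the at least two search complexities are in one-to-one correspondence with at least two search steps. The at least two search complexities include a first search complexity and a second search complexity, and the at least two search steps include a first search step and a second search step, where the first search step corresponding to the first search complexity is smaller than the second search step corresponding to the second search complexity, and the first search complexity is higher than the second search complexity. The processing unit 220 is specifically configured to: determine a target search step corresponding to the target search complexity; and perform search processing on the signal of the first channel and the signal of the second channel according to the target search step.
Optionally, the at least two search complexities are in one-to-one correspondence with at least two search ranges, where the first search range corresponding to the third search complexity is larger than the second search range corresponding to the fourth search complexity, and the third search complexity is higher than the fourth search complexity. The processing unit 220 is specifically configured to: determine a target search range corresponding to the target search complexity; and perform search processing on the signal of the first channel and the signal of the second channel within the target search range.
Optionally, the processing unit 220 is specifically configured to: determine a reference parameter according to the time-domain signal of the first channel and the time-domain signal of the second channel, where the reference parameter corresponds to the order in which the time-domain signal of the first channel and the time-domain signal of the second channel are acquired, and the two time-domain signals correspond to the same time period; and determine the target search range according to the target search complexity, the reference parameter, and a limit value Tmax, where the limit value Tmax is determined according to the sampling rate of the time-domain signal of the first channel, and the target search range falls within [-Tmax, 0] or within [0, Tmax].
Optionally, the processing unit 220 is specifically configured to: perform cross-correlation processing on the time-domain signal of the first channel and the time-domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, where the first cross-correlation processing value is the maximum function value, within a preset range, of the cross-correlation function of the time-domain signal of the first channel relative to the time-domain signal of the second channel, and the second cross-correlation processing value is the maximum function value, within the preset range, of the cross-correlation function of the time-domain signal of the second channel relative to the time-domain signal of the first channel; and determine the reference parameter according to the magnitude relationship between the first cross-correlation processing value and the second cross-correlation processing value.
Optionally, the reference parameter is the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the opposite number of that index value.
Optionally, the processing unit 220 is specifically configured to: perform peak detection processing on the time-domain signal of the first channel and the time-domain signal of the second channel to determine a first index value and a second index value, where the first index value is the index value corresponding to the maximum amplitude value of the time-domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time-domain signal of the second channel within the preset range; and determine the reference parameter according to the magnitude relationship between the first index value and the second index value.
Optionally, the processing unit 220 is further configured to perform smoothing processing on the first ITD parameter based on a second ITD parameter, where the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is the smoothed value of the ITD parameter of a second time period, and the second time period precedes the first time period.
The apparatus 200 for determining an inter-channel time difference parameter according to this embodiment of the present invention serves as the entity that performs the method 100 for determining an inter-channel time difference parameter, and may correspond to the encoder device in the method of the embodiments of the present invention. The units and modules of the apparatus 200 and the other operations and/or functions described above are respectively intended to implement the corresponding procedures of the method 100 in FIG. 1; for brevity, details are not repeated here.
According to the apparatus for determining an inter-channel time difference parameter in this embodiment of the present invention, a target search complexity corresponding to the current channel quality is determined from at least two search complexities, and search processing is performed on the signal of the first channel and the signal of the second channel according to the target search complexity. The precision of the determined ITD parameter can thereby be matched to the channel quality, so that when the current channel quality is poor, the complexity or computational load of the search processing can be reduced through the target search complexity, which saves computing resources and improves processing efficiency.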
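The method for determining an inter-channel time difference parameter according to the embodiments of the present invention has been described in detail above with reference to FIG. 1 to FIG. 4. The device for determining an inter-channel time difference parameter according to an embodiment of the present invention is described in detail below with reference to FIG. 6.
FIG. 6 is a schematic block diagram of a device 300 for determining an inter-channel time difference parameter according to an embodiment of the present invention. As shown in FIG. 6, the device 300 may include:
a bus 310;
a processor 320 connected to the bus; and
a memory 330 connected to the bus;
where the processor 320 invokes, through the bus 310, a program stored in the memory 330, so as to determine a target search complexity from at least two search complexities, where the at least two search complexities are in one-to-one correspondence with at least two channel quality values; and
to perform search processing on a signal of a first channel and a signal of a second channel according to the target search complexity, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel.
Optionally, the processor 320 is specifically configured to obtain a coding parameter for a stereo signal, where the stereo signal is generated based on the signal of the first channel and the signal of the second channel, the coding parameter is determined according to the current channel quality value, and the coding parameter includes any one of the following: a coding bit rate, a quantity of coding bits, or a complexity control parameter used to indicate the search complexity; and
to determine the target search complexity from the at least two search complexities according to the coding parameter.
Optionally, the at least two search complexities are in one-to-one correspondence with at least two search steps. The at least two search complexities include a first search complexity and a second search complexity, and the at least two search steps include a first search step and a second search step, where the first search step corresponding to the first search complexity is smaller than the second search step corresponding to the second search complexity, and the first search complexity is higher than the second search complexity; and
the processor 320 is specifically configured to determine a target search step corresponding to the target search complexity; and
to perform search processing on the signal of the first channel and the signal of the second channel according to the target search step.
Optionally, the at least two search complexities are in one-to-one correspondence with at least two search ranges. The at least two search complexities include a third search complexity and a fourth search complexity, and the at least two search ranges include a first search range and a second search range, where the first search range corresponding to the third search complexity is larger than the second search range corresponding to the fourth search complexity, and the third search complexity is higher than the fourth search complexity; and
the processor 320 is specifically configured to determine a target search range corresponding to the target search complexity; and
to perform search processing on the signal of the first channel and the signal of the second channel within the target search range.
Optionally, the processor 320 is specifically configured to determine a reference parameter according to the time-domain signal of the first channel and the time-domain signal of the second channel, where the reference parameter corresponds to the order in which the time-domain signal of the first channel and the time-domain signal of the second channel are acquired, and the two time-domain signals correspond to the same time period; and
to determine the target search range according to the target search complexity, the reference parameter, and a limit value Tmax, where the limit value Tmax is determined according to the sampling rate of the time-domain signal of the first channel, and the target search range falls within [-Tmax, 0] or within [0, Tmax].
Optionally, the processor 320 is specifically configured to perform cross-correlation processing on the time-domain signal of the first channel and the time-domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, where the first cross-correlation processing value is the maximum function value, within a preset range, of the cross-correlation function of the time-domain signal of the first channel relative to the time-domain signal of the second channel, and the second cross-correlation processing value is the maximum function value, within the preset range, of the cross-correlation function of the time-domain signal of the second channel relative to the time-domain signal of the first channel; and
to determine the reference parameter according to the magnitude relationship between the first cross-correlation processing value and the second cross-correlation processing value.
Optionally, the reference parameter is the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the opposite number of that index value.
Optionally, the processor 320 is specifically configured to perform peak detection processing on the time-domain signal of the first channel and the time-domain signal of the second channel to determine a first index value and a second index value, where the first index value is the index value corresponding to the maximum amplitude value of the time-domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time-domain signal of the second channel within the preset range; and
to determine the reference parameter according to the magnitude relationship between the first index value and the second index value.
Optionally, the processor 320 is further configured to perform smoothing processing on the first ITD parameter based on a second ITD parameter, where the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is the smoothed value of the ITD parameter of a second time period, and the second time period precedes the first time period.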
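In this embodiment of the present invention, the components of the device 300 are coupled together through the bus 310, which includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity, however, all of these buses are labeled as the bus 310 in the figure.
The processor 320 may implement or perform the steps and logical block diagrams disclosed in the method embodiments of the present invention. The processor 320 may be a microprocessor, or the processor may be any conventional processor, decoder, or the like. The steps of the method disclosed with reference to the embodiments of the present invention may be performed directly by a hardware processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 330; the processor reads the information in the memory 330 and completes the steps of the foregoing method in combination with its hardware.
It should be understood that, in this embodiment of the present invention, the processor 320 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 330 may include a read-only memory and a random access memory, and provides instructions and data to the processor 320. A part of the memory 330 may further include a non-volatile random access memory. For example, the memory 330 may further store information about the device type.
In an implementation process, the steps of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 320 or by instructions in the form of software.
The device 300 for determining an inter-channel time difference parameter according to this embodiment of the present invention serves as the entity that performs the method 100 for determining an inter-channel time difference parameter, and may correspond to the encoder device in the method of the embodiments of the present invention. The units and modules of the device 300 and the other operations and/or functions described above are respectively intended to implement the corresponding procedures of the method 100 in FIG. 1; for brevity, details are not repeated here.
According to the device for determining an inter-channel time difference parameter in this embodiment of the present invention, a target search complexity corresponding to the current channel quality is determined from at least two search complexities, and search processing is performed on the signal of the first channel and the signal of the second channel according to the target search complexity. The precision of the determined ITD parameter can thereby be matched to the channel quality, so that when the current channel quality is poor, the complexity or computational load of the search processing can be reduced through the target search complexity, which saves computing resources and improves processing efficiency.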
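It should be understood that, in the various embodiments of the present invention, the sequence numbers of the foregoing processes do not imply an order of execution. The order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered to go beyond the scope of the present invention.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.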

Claims (18)

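  1. A method for determining an inter-channel time difference parameter, wherein the method comprises:
    determining a target search complexity from at least two search complexities, wherein the at least two search complexities are in one-to-one correspondence with at least two channel quality values; and
    performing search processing on a signal of a first channel and a signal of a second channel according to the target search complexity, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel.
  2. The method according to claim 1, wherein the determining a target search complexity from at least two search complexities comprises:
    obtaining a coding parameter for a stereo signal, wherein the stereo signal is generated based on the signal of the first channel and the signal of the second channel, the coding parameter is determined according to a current channel quality value, and the coding parameter comprises any one of the following: a coding bit rate, a quantity of coding bits, or a complexity control parameter used to indicate the search complexity; and
    determining the target search complexity from the at least two search complexities according to the coding parameter.
  3. The method according to claim 1 or 2, wherein the at least two search complexities are in one-to-one correspondence with at least two search steps, the at least two search complexities comprise a first search complexity and a second search complexity, and the at least two search steps comprise a first search step and a second search step, wherein the first search step corresponding to the first search complexity is smaller than the second search step corresponding to the second search complexity, and the first search complexity is higher than the second search complexity; and
    the performing search processing on a signal of a first channel and a signal of a second channel according to the target search complexity comprises:
    determining a target search step corresponding to the target search complexity; and
    performing search processing on the signal of the first channel and the signal of the second channel according to the target search step.
  4. The method according to claim 1 or 2, wherein the at least two search complexities are in one-to-one correspondence with at least two search ranges, the at least two search complexities comprise a third search complexity and a fourth search complexity, and the at least two search ranges comprise a first search range and a second search range, wherein the first search range corresponding to the third search complexity is larger than the second search range corresponding to the fourth search complexity, and the third search complexity is higher than the fourth search complexity; and
    the performing search processing on a signal of a first channel and a signal of a second channel according to the target search complexity comprises:
    determining a target search range corresponding to the target search complexity; and
    performing search processing on the signal of the first channel and the signal of the second channel within the target search range.
  5. The method according to claim 4, wherein the determining a target search range corresponding to the target search complexity comprises:
    determining a reference parameter according to a time-domain signal of the first channel and a time-domain signal of the second channel, wherein the reference parameter corresponds to the order in which the time-domain signal of the first channel and the time-domain signal of the second channel are acquired, and the time-domain signal of the first channel and the time-domain signal of the second channel correspond to the same time period; and
    determining the target search range according to the target search complexity, the reference parameter, and a limit value Tmax, wherein the limit value Tmax is determined according to a sampling rate of the time-domain signal of the first channel, and the target search range falls within [-Tmax, 0] or within [0, Tmax].
  6. The method according to claim 5, wherein the determining a reference parameter according to a time-domain signal of the first channel and a time-domain signal of the second channel comprises:
    performing cross-correlation processing on the time-domain signal of the first channel and the time-domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, wherein the first cross-correlation processing value is the maximum function value, within a preset range, of the cross-correlation function of the time-domain signal of the first channel relative to the time-domain signal of the second channel, and the second cross-correlation processing value is the maximum function value, within the preset range, of the cross-correlation function of the time-domain signal of the second channel relative to the time-domain signal of the first channel; and
    determining the reference parameter according to the magnitude relationship between the first cross-correlation processing value and the second cross-correlation processing value.
  7. The method according to claim 6, wherein the reference parameter is the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the opposite number of the index value.
  8. The method according to claim 5, wherein the determining a reference parameter according to a time-domain signal of the first channel and a time-domain signal of the second channel comprises:
    performing peak detection processing on the time-domain signal of the first channel and the time-domain signal of the second channel to determine a first index value and a second index value, wherein the first index value is the index value corresponding to the maximum amplitude value of the time-domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time-domain signal of the second channel within the preset range; and
    determining the reference parameter according to the magnitude relationship between the first index value and the second index value.
  9. The method according to any one of claims 1 to 8, wherein the method further comprises:
    performing smoothing processing on the first ITD parameter based on a second ITD parameter, wherein the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is the smoothed value of the ITD parameter of a second time period, and the second time period precedes the first time period.
  10. An apparatus for determining an inter-channel time difference parameter, wherein the apparatus comprises:
    a determining unit, configured to determine a target search complexity from at least two search complexities, wherein the at least two search complexities are in one-to-one correspondence with at least two channel quality values; and
    a processing unit, configured to perform search processing on a signal of a first channel and a signal of a second channel according to the target search complexity, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel.
  11. The apparatus according to claim 10, wherein the determining unit is specifically configured to obtain a coding parameter for a stereo signal, wherein the stereo signal is generated based on the signal of the first channel and the signal of the second channel, the coding parameter is determined according to a current channel quality value, and the coding parameter comprises any one of the following: a coding bit rate, a quantity of coding bits, or a complexity control parameter used to indicate the search complexity; and
    to determine the target search complexity from the at least two search complexities according to the coding parameter.
  12. The apparatus according to claim 10 or 11, wherein the at least two search complexities are in one-to-one correspondence with at least two search steps, the at least two search complexities comprise a first search complexity and a second search complexity, and the at least two search steps comprise a first search step and a second search step, wherein the first search step corresponding to the first search complexity is smaller than the second search step corresponding to the second search complexity, and the first search complexity is higher than the second search complexity; and
    the processing unit is specifically configured to determine a target search step corresponding to the target search complexity; and
    to perform search processing on the signal of the first channel and the signal of the second channel according to the target search step.
  13. The apparatus according to claim 10 or 11, wherein the at least two search complexities are in one-to-one correspondence with at least two search ranges, wherein the first search range corresponding to the third search complexity is larger than the second search range corresponding to the fourth search complexity, and the third search complexity is higher than the fourth search complexity; and
    the processing unit is specifically configured to determine a target search range corresponding to the target search complexity; and
    to perform search processing on the signal of the first channel and the signal of the second channel within the target search range.
  14. The apparatus according to claim 13, wherein the processing unit is specifically configured to determine a reference parameter according to a time-domain signal of the first channel and a time-domain signal of the second channel, wherein the reference parameter corresponds to the order in which the time-domain signal of the first channel and the time-domain signal of the second channel are acquired, and the time-domain signal of the first channel and the time-domain signal of the second channel correspond to the same time period; and
    to determine the target search range according to the target search complexity, the reference parameter, and a limit value Tmax, wherein the limit value Tmax is determined according to a sampling rate of the time-domain signal of the first channel, and the target search range falls within [-Tmax, 0] or within [0, Tmax].
  15. The apparatus according to claim 14, wherein the processing unit is specifically configured to perform cross-correlation processing on the time-domain signal of the first channel and the time-domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, wherein the first cross-correlation processing value is the maximum function value, within a preset range, of the cross-correlation function of the time-domain signal of the first channel relative to the time-domain signal of the second channel, and the second cross-correlation processing value is the maximum function value, within the preset range, of the cross-correlation function of the time-domain signal of the second channel relative to the time-domain signal of the first channel; and
    to determine the reference parameter according to the magnitude relationship between the first cross-correlation processing value and the second cross-correlation processing value.
  16. The apparatus according to claim 15, wherein the reference parameter is the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the opposite number of the index value.
  17. The apparatus according to claim 14, wherein the processing unit is specifically configured to perform peak detection processing on the time-domain signal of the first channel and the time-domain signal of the second channel to determine a first index value and a second index value, wherein the first index value is the index value corresponding to the maximum amplitude value of the time-domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time-domain signal of the second channel within the preset range; and
    to determine the reference parameter according to the magnitude relationship between the first index value and the second index value.
  18. The apparatus according to any one of claims 10 to 17, wherein the processing unit is further configured to perform smoothing processing on the first ITD parameter based on a second ITD parameter, wherein the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is the smoothed value of the ITD parameter of a second time period, and the second time period precedes the first time period.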
PCT/CN2015/095090 2015-03-09 2015-11-20 确定声道间时间差参数的方法和装置 WO2016141731A1 (zh)

Priority Applications (10)

Application Number Priority Date Filing Date Title
MX2017011466A MX2017011466A (es) 2015-03-09 2015-11-20 Metodo y aparato para determinar parametro de diferencia de tiempo inter-canal.
KR1020177025506A KR20170116132A (ko) 2015-03-09 2015-11-20 채널 간 시차 파라미터를 결정하는 방법 및 장치
CA2977843A CA2977843A1 (en) 2015-03-09 2015-11-20 Method and apparatus for determining inter-channel time difference parameter
JP2017547578A JP2018508047A (ja) 2015-03-09 2015-11-20 チャネル間時間差パラメータを決定するための方法および装置
SG11201706997PA SG11201706997PA (en) 2015-03-09 2015-11-20 Method and apparatus for determining inter-channel time difference parameter
EP15884409.2A EP3255632B1 (en) 2015-03-09 2015-11-20 Method and apparatus for determining time difference parameter among sound channels
BR112017018819-8A BR112017018819A2 (zh) 2015-03-09 2015-11-20 Method and apparatus for determining the time difference between the channel parameters
RU2017134756A RU2682026C1 (ru) 2015-03-09 2015-11-20 Способ и устройство для определения параметра межканальной разности времени
AU2015385489A AU2015385489B2 (en) 2015-03-09 2015-11-20 Method and apparatus for determining inter-channel time difference parameter
US15/696,716 US10388288B2 (en) 2015-03-09 2017-09-06 Method and apparatus for determining inter-channel time difference parameter

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510103379.3 2015-03-09
CN201510103379.3A CN106033672B (zh) 2015-03-09 2015-03-09 确定声道间时间差参数的方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/696,716 Continuation US10388288B2 (en) 2015-03-09 2017-09-06 Method and apparatus for determining inter-channel time difference parameter

Publications (1)

Publication Number Publication Date
WO2016141731A1 true WO2016141731A1 (zh) 2016-09-15

Family

ID=56879889

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/095090 WO2016141731A1 (zh) 2015-03-09 2015-11-20 确定声道间时间差参数的方法和装置

Country Status (12)

Country Link
US (1) US10388288B2 (zh)
EP (1) EP3255632B1 (zh)
JP (1) JP2018508047A (zh)
KR (1) KR20170116132A (zh)
CN (1) CN106033672B (zh)
AU (1) AU2015385489B2 (zh)
BR (1) BR112017018819A2 (zh)
CA (1) CA2977843A1 (zh)
MX (1) MX2017011466A (zh)
RU (1) RU2682026C1 (zh)
SG (1) SG11201706997PA (zh)
WO (1) WO2016141731A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3252756A4 (en) * 2015-03-09 2017-12-13 Huawei Technologies Co., Ltd. Method and device for determining inter-channel time difference parameter
TWI666630B (zh) * 2017-06-29 2019-07-21 大陸商華為技術有限公司 時延估計方法及裝置

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20210072736A (ko) * 2018-10-08 2021-06-17 돌비 레버러토리즈 라이쎈싱 코오포레이션 인코딩 및 디코딩 동작을 단순화하기 위해 상이한 포맷으로 캡처된 오디오 신호들을 축소된 수의 포맷으로 변환하는 것

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1273663A (zh) * 1998-05-26 2000-11-15 皇家菲利浦电子有限公司 具有改进的语音编码器的传输系统
CN1288557A (zh) * 1998-01-21 2001-03-21 诺基亚移动电话有限公司 解码方法和包括自适应后置滤波器的系统
CN1820306A (zh) * 2003-05-01 2006-08-16 诺基亚有限公司 可变比特率宽带语音编码中增益量化的方法和装置
CN101073109A (zh) * 2004-09-30 2007-11-14 艾利森电话股份有限公司 用于编解码器选择中自适应门限的方法及装置
US8077893B2 (en) * 2007-05-31 2011-12-13 Ecole Polytechnique Federale De Lausanne Distributed audio coding for wireless hearing aids
US8948891B2 (en) * 2009-08-12 2015-02-03 Samsung Electronics Co., Ltd. Method and apparatus for encoding/decoding multi-channel audio signal by using semantic information

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0669811A (ja) * 1992-08-21 1994-03-11 Oki Electric Ind Co Ltd 符号化回路及び復号化回路
WO2003107591A1 (en) * 2002-06-14 2003-12-24 Nokia Corporation Enhanced error concealment for spatial audio
GB2453117B (en) 2007-09-25 2012-05-23 Motorola Mobility Inc Apparatus and method for encoding a multi channel audio signal
US20100290629A1 (en) * 2007-12-21 2010-11-18 Panasonic Corporation Stereo signal converter, stereo signal inverter, and method therefor
KR20100009981A (ko) * 2008-07-21 2010-01-29 성균관대학교산학협력단 첫번째 다중 경로 성분에서의 동기화를 통한 초광대역 무선 통신 수신기에서의 동기화 방법 및 이를 이용한 초광대역 무선 통신 수신기
WO2010037427A1 (en) * 2008-10-03 2010-04-08 Nokia Corporation Apparatus for binaural audio coding
CN101408615B (zh) * 2008-11-26 2011-11-30 武汉大学 双耳时间差itd临界感知特性的测量方法及其装置
CN101533641B (zh) * 2009-04-20 2011-07-20 华为技术有限公司 对多声道信号的声道延迟参数进行修正的方法和装置
CN102307323B (zh) * 2009-04-20 2013-12-18 华为技术有限公司 对多声道信号的声道延迟参数进行修正的方法
CN102422347B (zh) * 2009-05-20 2013-07-03 松下电器产业株式会社 编码装置、解码装置及编码和解码方法
US8463414B2 (en) * 2010-08-09 2013-06-11 Motorola Mobility Llc Method and apparatus for estimating a parameter for low bit rate stereo transmission
US9424852B2 (en) * 2011-02-02 2016-08-23 Telefonaktiebolaget Lm Ericsson (Publ) Determining the inter-channel time difference of a multi-channel audio signal
EP3182409B1 (en) * 2011-02-03 2018-03-14 Telefonaktiebolaget LM Ericsson (publ) Determining the inter-channel time difference of a multi-channel audio signal
JP5947971B2 (ja) * 2012-04-05 2016-07-06 華為技術有限公司Huawei Technologies Co.,Ltd. マルチチャネルオーディオ信号の符号化パラメータを決定する方法及びマルチチャネルオーディオエンコーダ
WO2013149671A1 (en) * 2012-04-05 2013-10-10 Huawei Technologies Co., Ltd. Multi-channel audio encoder and method for encoding a multi-channel audio signal
CN103534753B (zh) * 2012-04-05 2015-05-27 华为技术有限公司 用于信道间差估计的方法和空间音频编码装置
US9659569B2 (en) * 2013-04-26 2017-05-23 Nokia Technologies Oy Audio signal encoder
CN106033671B (zh) * 2015-03-09 2020-11-06 华为技术有限公司 确定声道间时间差参数的方法和装置

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1288557A (zh) * 1998-01-21 2001-03-21 诺基亚移动电话有限公司 解码方法和包括自适应后置滤波器的系统
CN1273663A (zh) * 1998-05-26 2000-11-15 皇家菲利浦电子有限公司 具有改进的语音编码器的传输系统
CN1820306A (zh) * 2003-05-01 2006-08-16 诺基亚有限公司 可变比特率宽带语音编码中增益量化的方法和装置
CN101073109A (zh) * 2004-09-30 2007-11-14 艾利森电话股份有限公司 用于编解码器选择中自适应门限的方法及装置
US8077893B2 (en) * 2007-05-31 2011-12-13 Ecole Polytechnique Federale De Lausanne Distributed audio coding for wireless hearing aids
US8948891B2 (en) * 2009-08-12 2015-02-03 Samsung Electronics Co., Ltd. Method and apparatus for encoding/decoding multi-channel audio signal by using semantic information

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3252756A4 (en) * 2015-03-09 2017-12-13 Huawei Technologies Co., Ltd. Method and device for determining inter-channel time difference parameter
US10210873B2 (en) 2015-03-09 2019-02-19 Huawei Technologies Co., Ltd. Method and apparatus for determining inter-channel time difference parameter
TWI666630B (zh) * 2017-06-29 2019-07-21 大陸商華為技術有限公司 時延估計方法及裝置
US11304019B2 (en) 2017-06-29 2022-04-12 Huawei Technologies Co., Ltd. Delay estimation method and apparatus
US11950079B2 (en) 2017-06-29 2024-04-02 Huawei Technologies Co., Ltd. Delay estimation method and apparatus

Also Published As

Publication number Publication date
BR112017018819A2 (zh) 2018-04-24
EP3255632A4 (en) 2017-12-13
AU2015385489A1 (en) 2017-09-28
MX2017011466A (es) 2018-01-11
EP3255632B1 (en) 2020-01-08
RU2682026C1 (ru) 2019-03-14
KR20170116132A (ko) 2017-10-18
US10388288B2 (en) 2019-08-20
CA2977843A1 (en) 2016-09-15
AU2015385489B2 (en) 2019-04-04
CN106033672B (zh) 2021-04-09
US20170365265A1 (en) 2017-12-21
SG11201706997PA (en) 2017-09-28
CN106033672A (zh) 2016-10-19
JP2018508047A (ja) 2018-03-22
EP3255632A1 (en) 2017-12-13

Similar Documents

Publication Publication Date Title
JP7443423B2 (ja) マルチチャネル信号の符号化方法およびエンコーダ
US9479886B2 (en) Scalable downmix design with feedback for object-based surround codec
CN107146627B (zh) 对更高阶高保真度立体声响复制表示进行压缩和解压缩的方法和装置
CN112735447B (zh) 压缩和解压缩高阶高保真度立体声响复制信号表示的方法及装置
WO2016141732A1 (zh) 确定声道间时间差参数的方法和装置
JP2024036349A (ja) 遅延推定方法および遅延推定装置
JP7301154B2 (ja) 音声データの処理方法並びにその、装置、電子機器及びコンピュータプログラム
CN111316353A (zh) 确定空间音频参数编码和相关联的解码
KR101756838B1 (ko) 다채널 오디오 신호를 다운 믹스하는 방법 및 장치
WO2016141731A1 (zh) 确定声道间时间差参数的方法和装置
JP2023510556A (ja) オーディオ符号化および復号方法ならびにオーディオ符号化および復号デバイス
CN113948098A (zh) 一种立体声音频信号时延估计方法及装置
WO2019106221A1 (en) Processing of spatial audio parameters
WO2017202680A1 (en) Method and apparatus for voice or sound activity detection for spatial audio
CN108877815B (zh) 一种立体声信号处理方法及装置
RU2648632C2 (ru) Классификатор многоканального звукового сигнала
WO2017193550A1 (zh) 多声道信号的编码方法和编码器

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 15884409; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2977843; Country of ref document: CA
WWE Wipo information: entry into national phase. Ref document number: 11201706997P; Country of ref document: SG
REEP Request for entry into the european phase. Ref document number: 2015884409; Country of ref document: EP
WWE Wipo information: entry into national phase. Ref document number: MX/A/2017/011466; Country of ref document: MX
ENP Entry into the national phase. Ref document number: 2017547578; Country of ref document: JP; Kind code of ref document: A
ENP Entry into the national phase. Ref document number: 20177025506; Country of ref document: KR; Kind code of ref document: A
NENP Non-entry into the national phase. Ref country code: DE
REG Reference to national code. Ref country code: BR; Ref legal event code: B01A; Ref document number: 112017018819; Country of ref document: BR
ENP Entry into the national phase. Ref document number: 2015385489; Country of ref document: AU; Date of ref document: 20151120; Kind code of ref document: A
WWE Wipo information: entry into national phase. Ref document number: 2017134756; Country of ref document: RU
ENP Entry into the national phase. Ref document number: 112017018819; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20170901