WO2021000723A1 - Stereo encoding method, stereo decoding method, and apparatus - Google Patents

Stereo encoding method, stereo decoding method, and apparatus

Info

Publication number
WO2021000723A1
WO2021000723A1 PCT/CN2020/096296 CN2020096296W WO2021000723A1 WO 2021000723 A1 WO2021000723 A1 WO 2021000723A1 CN 2020096296 W CN2020096296 W CN 2020096296W WO 2021000723 A1 WO2021000723 A1 WO 2021000723A1
Authority
WO
WIPO (PCT)
Prior art keywords
pitch period
channel signal
secondary channel
value
pitch
Prior art date
Application number
PCT/CN2020/096296
Other languages
English (en)
French (fr)
Inventor
苏谟特艾雅
高原
王宾
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP20835190.8A (EP3975175A4)
Priority to JP2021577947A (JP7337966B2)
Publication of WO2021000723A1
Priority to US17/563,538 (US20220122619A1)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L 19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L 19/09 Long term prediction, i.e. removing periodical redundancies, e.g. by using adaptive codebook or pitch predictor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 1/00 Two-channel systems
    • H04S 1/002 Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S 7/30 Control circuits for electronic adaptation of the sound field
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 2019/0001 Codebooks
    • G10L 2019/0011 Long term prediction filters, i.e. pitch estimation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S 2400/03 Aspects of down-mixing multi-channel audio to configurations with lower numbers of playback channels, e.g. 7.1 -> 5.1

Definitions

  • This application relates to the field of stereo technology, and in particular to a stereo encoding method, stereo decoding method and device.
  • Mono audio can no longer meet people's demand for high-quality audio.
  • Stereo audio conveys the orientation and spatial distribution of each sound source, which improves the clarity, intelligibility, and sense of presence of the information, and is therefore widely preferred.
  • At the decoding end, the received code stream is decoded to obtain the decoded stereo signal for playback.
  • In existing time-domain stereo encoding and decoding techniques, the time-domain signal is downmixed into two mono signals at the encoding end.
  • For example, the left and right channel signals are first downmixed into a primary channel signal and a secondary channel signal.
  • The primary channel signal and the secondary channel signal are then each encoded using a mono encoding method.
  • The primary channel signal is usually encoded with more bits; the secondary channel signal is usually not encoded at all.
  • At the decoding end, the primary channel signal and the secondary channel signal are decoded separately from the received code stream, and time-domain upmixing is then performed to obtain the decoded stereo signal.
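  • As an illustration of the downmix and upmix just described, a minimal time-domain sketch is given below. It assumes a simple sum/difference (mid/side) combination; the downmix actually used by a time-domain stereo encoder may instead use adaptive channel-combination ratios, so this is not the patent's downmix.

      /* Illustrative only: a simple sum/difference time-domain downmix and the
       * matching upmix. A real time-domain stereo codec may use adaptive
       * channel-combination ratios instead. */
      void downmix(const float *left, const float *right,
                   float *primary, float *secondary, int frame_len)
      {
          for (int i = 0; i < frame_len; i++) {
              primary[i]   = 0.5f * (left[i] + right[i]);  /* "mid"  -> primary channel   */
              secondary[i] = 0.5f * (left[i] - right[i]);  /* "side" -> secondary channel */
          }
      }

      void upmix(const float *primary, const float *secondary,
                 float *left, float *right, int frame_len)
      {
          for (int i = 0; i < frame_len; i++) {
              left[i]  = primary[i] + secondary[i];
              right[i] = primary[i] - secondary[i];
          }
      }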
  • An important feature that distinguishes a stereo signal from a mono signal is that the sound carries sound-image information, which gives the sound a stronger sense of space.
  • The accuracy of the secondary channel signal largely determines the spatial sense of the stereo signal, and the accuracy of secondary channel coding also plays an important role in the stability of the stereo sound image.
  • The pitch period is an important parameter for both primary channel signal encoding and secondary channel signal encoding.
  • The accuracy of the predicted value of the pitch period parameter therefore affects the overall stereo coding quality.
  • After the input signal is analyzed, the stereo parameters, the primary channel signal, and the secondary channel signal are obtained.
  • In the prior art, the encoder usually encodes only the primary channel signal and not the secondary channel signal; for example, the pitch period of the primary channel signal is directly used as the pitch period of the secondary channel signal. Because the secondary channel signal is not separately encoded and decoded, the spatial perception of the decoded stereo signal is poor.
  • The sound-image stability is strongly affected by the difference between the pitch period of the primary channel signal and the actual pitch period of the secondary channel signal, which reduces the encoding performance of stereo encoding.
  • Correspondingly, the decoding performance of stereo decoding is also reduced.
  • the embodiments of the present application provide a stereo coding method, a stereo decoding method and a device, which are used to improve stereo coding and decoding performance.
  • In a first aspect, an embodiment of the present application provides a stereo encoding method, including: performing downmixing processing on the left channel signal of the current frame and the right channel signal of the current frame to obtain the primary channel signal of the current frame and the secondary channel signal of the current frame; and, when it is determined to differentially encode the pitch period of the secondary channel signal, using the estimated value of the pitch period of the primary channel signal to differentially encode the pitch period of the secondary channel signal to obtain the pitch period index value of the secondary channel signal, where the pitch period index value of the secondary channel signal is used to generate the stereo encoded bitstream to be transmitted.
  • In this method, the left channel signal of the current frame and the right channel signal of the current frame are first downmixed to obtain the primary channel signal of the current frame and the secondary channel signal of the current frame.
  • When it is determined to differentially encode the pitch period of the secondary channel signal, the estimated value of the pitch period of the primary channel signal is used to differentially encode the pitch period of the secondary channel signal, yielding the pitch period index value of the secondary channel signal.
  • The pitch period index value of the secondary channel signal is used to generate the stereo encoded bitstream to be sent.
  • Because the pitch period estimate of the primary channel signal is used to differentially encode the pitch period of the secondary channel signal, only a small number of bits needs to be allocated to the differential encoding of the secondary channel pitch period.
  • Differentially encoding the pitch period of the secondary channel signal improves the spatiality and sound-image stability of the stereo signal.
  • Since fewer bit resources are spent on differential coding of the secondary channel pitch period, the saved bits can be used for other stereo coding parameters, which improves the coding efficiency of the secondary channel and ultimately improves the overall stereo coding quality.
  • In a possible implementation, determining whether to differentially encode the pitch period of the secondary channel signal includes: encoding the primary channel signal of the current frame to obtain the estimated value of the pitch period of the primary channel signal; performing an open-loop pitch period analysis on the secondary channel signal of the current frame to obtain the estimated value of the open-loop pitch period of the secondary channel signal; determining whether the difference between the estimated value of the pitch period of the primary channel signal and the estimated value of the open-loop pitch period of the secondary channel signal exceeds a preset secondary channel pitch period differential encoding threshold; and, when the difference exceeds the secondary channel pitch period differential encoding threshold, determining to differentially encode the pitch period of the secondary channel signal, or, when the difference does not exceed the threshold, determining not to differentially encode the pitch period of the secondary channel signal.
  • The encoding end may encode the primary channel signal to obtain the estimated value of the pitch period of the primary channel signal.
  • An open-loop pitch period analysis may be performed on the secondary channel signal to obtain the estimated value of the open-loop pitch period of the secondary channel signal.
  • The secondary channel pitch period differential coding threshold can be preset and flexibly configured for the stereo coding scenario.
  • When the difference exceeds the secondary channel pitch period differential encoding threshold, differential encoding is performed; when the difference does not exceed the threshold, differential encoding is not performed.
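  • As a minimal sketch of this decision (the function and variable names are illustrative, not taken from the patent), the encoder-side check could look like the following, where the difference is taken as an absolute difference:

      #include <math.h>

      /* Decide whether to differentially encode the secondary channel pitch period:
       * compare the primary channel pitch estimate with the secondary channel
       * open-loop pitch estimate against the preset differential-encoding threshold.
       * Returns 1 to differentially encode, 0 otherwise, per the text above. */
      int decide_diff_coding(float pitch_prim_est,       /* pitch period estimate, primary channel        */
                             float pitch_sec_open_loop,  /* open-loop pitch period estimate, secondary    */
                             float diff_threshold)       /* preset secondary channel diff-coding threshold */
      {
          float diff = fabsf(pitch_prim_est - pitch_sec_open_loop);  /* absolute difference is an assumption */
          return (diff > diff_threshold) ? 1 : 0;
      }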
  • In a possible implementation, when it is determined to differentially encode the pitch period of the secondary channel signal, the method further includes: configuring the secondary channel pitch period differential encoding identifier in the current frame to a preset first value, where the stereo encoded bitstream carries the secondary channel pitch period differential encoding identifier and the first value indicates that the pitch period of the secondary channel signal is differentially encoded.
  • The encoding end obtains the secondary channel pitch period differential encoding identifier, and its value is configured according to whether the pitch period of the secondary channel signal is differentially encoded.
  • The secondary channel pitch period differential coding identifier indicates whether differential coding is used for the pitch period of the secondary channel signal.
  • The secondary channel pitch period differential encoding identifier may take multiple values; for example, it may be configured to a preset first value or to a second value.
  • As an example of the configuration method: when it is determined to differentially encode the pitch period of the secondary channel signal, the secondary channel pitch period differential encoding identifier is configured to the first value.
  • In a possible implementation, the method further includes: when it is determined not to differentially encode the pitch period of the secondary channel signal and not to multiplex the estimated value of the pitch period of the primary channel signal as the pitch period of the secondary channel signal, encoding the pitch period of the secondary channel signal and the pitch period of the primary channel signal separately.
  • That is, the pitch period of the secondary channel signal is neither differentially encoded nor replaced by the estimated value of the pitch period of the primary channel signal.
  • In this case, an independent coding method can be used for the pitch period of the secondary channel signal, so that encoding of the secondary channel pitch period is still realized.
  • In a possible implementation, the method further includes: when it is determined not to differentially encode the pitch period of the secondary channel signal and to multiplex the estimated value of the pitch period of the primary channel signal as the pitch period of the secondary channel signal, configuring the secondary channel signal pitch period multiplexing identifier to a preset fourth value, where the secondary channel signal pitch period multiplexing identifier is carried in the stereo encoded bitstream and the fourth value indicates that the estimated value of the pitch period of the primary channel signal is multiplexed as the pitch period of the secondary channel signal.
  • A pitch period multiplexing method may thus also be used in the embodiments of the present application.
  • In this case, the secondary channel pitch period is not encoded at the encoding end; instead, the secondary channel signal pitch period multiplexing identifier is carried in the stereo encoded bitstream and indicates that the pitch period of the secondary channel signal is multiplexed from the estimated value of the pitch period of the primary channel signal.
  • When the secondary channel signal pitch period multiplexing identifier indicates that the pitch period of the secondary channel signal is multiplexed, the decoding end may, according to this identifier, use the decoded pitch period of the primary channel signal as the pitch period of the secondary channel signal.
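  • The three encoder-side modes described above (differential coding, pitch period multiplexing, independent coding) can be summarized in a sketch such as the following; the mode enumeration, structure fields, and flag values are hypothetical placeholders rather than the patent's bitstream syntax:

      /* Hypothetical per-frame side information for the secondary channel pitch period. */
      enum sec_pitch_mode { MODE_DIFFERENTIAL, MODE_MULTIPLEX, MODE_INDEPENDENT };

      struct sec_pitch_side_info {
          int diff_coding_flag;   /* stand-in for the preset "first value" (differential coding) */
          int reuse_flag;         /* stand-in for the preset "fourth value" (pitch multiplexing)  */
      };

      void select_secondary_pitch_mode(enum sec_pitch_mode mode,
                                       struct sec_pitch_side_info *si)
      {
          si->diff_coding_flag = 0;
          si->reuse_flag = 0;
          switch (mode) {
          case MODE_DIFFERENTIAL:
              si->diff_coding_flag = 1;  /* then encode the secondary pitch relative to the primary */
              break;
          case MODE_MULTIPLEX:
              si->reuse_flag = 1;        /* no secondary pitch bits; decoder reuses the primary pitch */
              break;
          case MODE_INDEPENDENT:
              /* neither flag set; the secondary pitch period is encoded on its own */
              break;
          }
      }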
  • In a possible implementation, using the pitch period estimate of the primary channel signal to differentially encode the pitch period of the secondary channel signal to obtain the pitch period index value of the secondary channel signal includes: performing a closed-loop pitch period search of the secondary channel according to the estimated value of the pitch period of the primary channel signal to obtain the estimated value of the pitch period of the secondary channel signal; determining the upper limit of the pitch period index value of the secondary channel signal according to the pitch period search range adjustment factor of the secondary channel signal; and calculating the pitch period index value of the secondary channel signal according to the estimated value of the pitch period of the primary channel signal, the estimated value of the pitch period of the secondary channel signal, and the upper limit of the pitch period index value of the secondary channel signal.
  • The encoding end may perform a closed-loop pitch period search of the secondary channel based on the estimated value of the pitch period of the primary channel signal to determine the estimated value of the pitch period of the secondary channel signal.
  • The pitch period search range adjustment factor of the secondary channel signal is used to determine the upper limit of the pitch period index value of the secondary channel signal.
  • The upper limit of the pitch period index value of the secondary channel signal is the maximum value that the pitch period index value of the secondary channel signal cannot exceed.
  • After the encoding end has determined the pitch period estimate of the primary channel signal, the pitch period estimate of the secondary channel signal, and the upper limit of the pitch period index value of the secondary channel signal, it performs differential coding based on these three quantities and outputs the pitch period index value of the secondary channel signal.
  • In a possible implementation, performing the closed-loop pitch period search of the secondary channel according to the estimated value of the pitch period of the primary channel signal to obtain the estimated value of the pitch period of the secondary channel signal includes: determining the closed-loop pitch period reference value of the secondary channel signal according to the estimated value of the pitch period of the primary channel signal and the number of subframes into which the secondary channel signal of the current frame is divided; and, using the closed-loop pitch period reference value of the secondary channel signal as the starting point of the closed-loop pitch period search, performing the closed-loop pitch period search with integer precision and fractional precision to obtain the estimated value of the pitch period of the secondary channel signal.
  • The number of subframes into which the secondary channel signal of the current frame is divided is determined by the subframe configuration of the secondary channel signal; for example, it may be divided into 4 subframes or 3 subframes, depending on the application scenario.
  • After the estimated value of the pitch period of the primary channel signal is obtained, it and the number of subframes into which the secondary channel signal is divided can be used to calculate the closed-loop pitch period reference value of the secondary channel signal.
  • The closed-loop pitch period reference value of the secondary channel signal is a reference value determined from the estimated value of the pitch period of the primary channel signal; it indicates that the estimated value of the pitch period of the primary channel signal is used as the reference for determining the closed-loop pitch period of the secondary channel signal.
  • The integer part of the estimated value of the pitch period of the primary channel signal is taken as the integer part of the closed-loop pitch period of the secondary channel signal, and the fractional part of the estimated value of the pitch period of the primary channel signal is taken as the fractional part of the closed-loop pitch period of the secondary channel signal.
  • In other words, the estimated value of the pitch period of the primary channel signal is mapped to the integer part and the fractional part of the closed-loop pitch period of the secondary channel signal.
  • The integer part of the closed-loop pitch period of the secondary channel is denoted loc_T0, and the fractional part of the closed-loop pitch period is denoted loc_frac_prim.
  • The value of Z is 3, 4, or 5.
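  • A sketch of this integer/fractional mapping is given below. It assumes the primary channel pitch estimate is available as a floating-point lag in samples and that the fractional part is expressed in 1/N-sample steps, consistent with the index formula that follows; the use of Z as a search-window half-width is an assumption and appears only as a comment.

      #include <math.h>

      /* Map the primary channel pitch period estimate onto the closed-loop pitch
       * reference of the secondary channel: integer part -> loc_T0, fractional
       * part -> loc_frac_prim (assumed here to be quantized in 1/N-sample steps). */
      void map_primary_pitch(float pitch_prim_est, int N,
                             int *loc_T0, int *loc_frac_prim)
      {
          *loc_T0 = (int)floorf(pitch_prim_est);
          *loc_frac_prim = (int)floorf((pitch_prim_est - (float)*loc_T0) * (float)N + 0.5f);
          if (*loc_frac_prim == N) {   /* handle rounding up to the next integer lag */
              *loc_frac_prim = 0;
              (*loc_T0)++;
          }
          /* The closed-loop search around this reference might then be limited to
           * roughly [loc_T0 - Z, loc_T0 + Z] with Z = 3, 4, or 5 (an assumption
           * based on the values of Z mentioned above). */
      }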
  • In a possible implementation, calculating the pitch period index value of the secondary channel signal based on the estimated value of the pitch period of the primary channel signal, the estimated value of the pitch period of the secondary channel signal, and the upper limit of the pitch period index value of the secondary channel signal includes: determining the closed-loop pitch period integer part loc_T0 and the closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal according to the pitch period estimate of the primary channel signal, and calculating the pitch period index value soft_reuse_index of the secondary channel signal as follows:
  • soft_reuse_index = (N*pitch_soft_reuse + pitch_frac_soft_reuse) - (N*loc_T0 + loc_frac_prim) + soft_reuse_index_high_limit/M; where pitch_soft_reuse represents the integer part of the pitch period estimate of the secondary channel signal, pitch_frac_soft_reuse represents the fractional part of the pitch period estimate of the secondary channel signal, soft_reuse_index_high_limit represents the upper limit of the pitch period index value of the secondary channel signal, N represents the number of subframes into which the secondary channel signal is divided, M represents the adjustment factor of the upper limit of the pitch period index value of the secondary channel signal and is a non-zero real number, * represents the multiplication operator, + represents the addition operator, and - represents the subtraction operator.
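  • Written out directly from the formula above, the encoder-side index computation can be sketched as follows (integer arithmetic is assumed; the variable names follow the text):

      /* Pitch period index of the secondary channel signal, per the formula above:
       * soft_reuse_index = (N*pitch_soft_reuse + pitch_frac_soft_reuse)
       *                  - (N*loc_T0 + loc_frac_prim)
       *                  + soft_reuse_index_high_limit / M                      */
      int compute_soft_reuse_index(int N,                         /* number of secondary channel subframes        */
                                   int M,                         /* index upper-limit adjustment factor          */
                                   int pitch_soft_reuse,          /* integer part, secondary pitch estimate       */
                                   int pitch_frac_soft_reuse,     /* fractional part, secondary pitch estimate    */
                                   int loc_T0,                    /* integer part, closed-loop pitch reference    */
                                   int loc_frac_prim,             /* fractional part, closed-loop pitch reference */
                                   int soft_reuse_index_high_limit) /* upper limit of the index value            */
      {
          return (N * pitch_soft_reuse + pitch_frac_soft_reuse)
               - (N * loc_T0 + loc_frac_prim)
               + soft_reuse_index_high_limit / M;
      }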
  • In a possible implementation, the method is applied to a stereo encoding scenario in which the encoding rate of the current frame is lower than a preset rate threshold; the rate threshold is at least one of the following values: 13.2 kilobits per second (kbps), 16.4 kbps, or 24.4 kbps.
  • For example, the rate threshold may be less than or equal to 13.2 kbps.
  • The rate threshold may also be 16.4 kbps or 24.4 kbps; the specific value is determined according to the application scenario.
  • At relatively low coding rates, such as 24.4 kbps and below, independent coding of the secondary channel pitch period is not performed.
  • Instead, the estimated value of the pitch period of the primary channel signal is used as a reference value and the differential coding method is used to encode the pitch period of the secondary channel signal, improving the quality of stereo encoding.
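  • A sketch of this rate gating, assuming the encoding rate is given in bits per second:

      /* Enable the secondary channel pitch differential coding path only when the
       * encoding rate of the current frame is below the preset rate threshold
       * (13.2, 16.4, or 24.4 kbps per the text above). */
      int secondary_pitch_diff_coding_allowed(int total_brate_bps, int rate_threshold_bps)
      {
          return total_brate_bps < rate_threshold_bps;   /* e.g. rate_threshold_bps = 13200 */
      }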
  • In a second aspect, an embodiment of the present application also provides a stereo decoding method, including: determining, according to the received stereo encoded bitstream, whether to differentially decode the pitch period of the secondary channel signal; when it is determined to differentially decode the pitch period of the secondary channel signal, obtaining the estimated value of the pitch period of the primary channel of the current frame and the pitch period index value of the secondary channel of the current frame from the stereo encoded bitstream; and differentially decoding the pitch period of the secondary channel signal according to the estimated value of the pitch period of the primary channel and the pitch period index value of the secondary channel to obtain the estimated value of the pitch period of the secondary channel signal, where the estimated value of the pitch period of the secondary channel signal is used to decode the stereo encoded bitstream.
  • When it is determined to differentially decode the pitch period of the secondary channel signal, the estimated value of the pitch period of the primary channel of the current frame and the pitch period index value of the secondary channel of the current frame are obtained from the stereo encoded bitstream.
  • The pitch period of the secondary channel signal is then differentially decoded to obtain the estimated value of the pitch period of the secondary channel signal, which is used to decode the stereo encoded bitstream.
  • Because the estimated value of the pitch period of the primary channel signal and the pitch period index value of the secondary channel signal are used to differentially decode the pitch period of the secondary channel signal, the estimated value of the pitch period of the secondary channel signal can be obtained and used in decoding the stereo encoded bitstream, which improves the spatial sense and sound-image stability of the stereo signal.
  • In a possible implementation, determining whether to differentially decode the pitch period of the secondary channel signal according to the received stereo encoded bitstream includes: obtaining the secondary channel pitch period differential encoding identifier from the current frame; and, when the secondary channel pitch period differential encoding identifier is a preset first value, determining to differentially decode the pitch period of the secondary channel signal.
  • The secondary channel pitch period differential encoding identifier may take multiple values; for example, when its value is the preset first value (such as 1), differential decoding of the pitch period of the secondary channel signal is performed.
  • In a possible implementation, the method further includes: when it is determined not to differentially decode the pitch period of the secondary channel signal and not to multiplex the estimated value of the pitch period of the primary channel signal as the pitch period of the secondary channel signal, decoding the pitch period of the secondary channel signal from the stereo encoded bitstream.
  • That is, the decoding end determines neither to differentially decode the pitch period of the secondary channel signal nor to reuse the estimated value of the pitch period of the primary channel signal as the pitch period of the secondary channel signal.
  • In this case, an independent decoding method can be used for the pitch period of the secondary channel signal, so that decoding of the secondary channel pitch period is still realized.
  • In a possible implementation, the method further includes: when it is determined not to differentially decode the pitch period of the secondary channel signal and to multiplex the estimated value of the pitch period of the primary channel signal as the pitch period of the secondary channel signal, using the estimated value of the pitch period of the primary channel signal as the pitch period of the secondary channel signal.
  • A pitch period multiplexing method may thus also be used at the decoding end.
  • According to the secondary channel signal pitch period multiplexing identifier, the decoding end uses the decoded pitch period of the primary channel signal as the pitch period of the secondary channel signal.
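  • A sketch of this decoder-side selection among the three modes, mirroring the encoder-side sketch above (the flag arguments and the already-decoded candidate values are placeholders):

      /* Select how the decoder obtains the secondary channel pitch period,
       * driven by hypothetical side-information flags parsed from the frame. */
      float decode_secondary_pitch(int diff_coding_flag,   /* 1 if set to the preset first value  */
                                   int reuse_flag,         /* 1 if set to the preset fourth value */
                                   float pitch_prim_est,   /* decoded primary channel pitch       */
                                   float pitch_sec_diff,   /* result of differential decoding     */
                                   float pitch_sec_indep)  /* result of independent decoding      */
      {
          if (diff_coding_flag) return pitch_sec_diff;   /* differential decoding           */
          if (reuse_flag)       return pitch_prim_est;   /* reuse the primary pitch period  */
          return pitch_sec_indep;                        /* independently decoded otherwise */
      }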
  • In a possible implementation, differentially decoding the pitch period of the secondary channel signal according to the estimated value of the pitch period of the primary channel and the pitch period index value of the secondary channel includes: determining the closed-loop pitch period reference value of the secondary channel signal according to the estimated value of the pitch period of the primary channel signal and the number of subframes into which the secondary channel signal of the current frame is divided; determining the upper limit of the pitch period index value of the secondary channel signal according to the pitch period search range adjustment factor of the secondary channel signal; and calculating the pitch period estimate of the secondary channel signal according to the closed-loop pitch period reference value of the secondary channel signal, the pitch period index value of the secondary channel signal, and the upper limit of the pitch period index value of the secondary channel signal.
  • The estimated value of the pitch period of the primary channel signal is used to determine the closed-loop pitch period reference value of the secondary channel signal.
  • The pitch period search range adjustment factor of the secondary channel signal is used to determine the upper limit of the pitch period index value of the secondary channel signal, which is the maximum value that the pitch period index value of the secondary channel signal cannot exceed.
  • After the decoding end has determined the closed-loop pitch period reference value of the secondary channel signal, the pitch period index value of the secondary channel signal, and the upper limit of the pitch period index value of the secondary channel signal, it performs differential decoding based on these three quantities and outputs the estimated value of the pitch period of the secondary channel signal.
  • In a possible implementation, calculating the pitch period estimate of the secondary channel signal according to the closed-loop pitch period reference value of the secondary channel signal, the pitch period index value of the secondary channel signal, and the upper limit of the pitch period index value of the secondary channel signal includes: calculating the pitch period estimate T0_pitch of the secondary channel signal as follows:
  • T0_pitch = f_pitch_prim + (soft_reuse_index - soft_reuse_index_high_limit/M)/N; where f_pitch_prim represents the closed-loop pitch period reference value of the secondary channel signal, soft_reuse_index represents the pitch period index value of the secondary channel signal, N represents the number of subframes into which the secondary channel signal is divided, M represents the adjustment factor of the upper limit of the pitch period index value of the secondary channel signal and is a non-zero real number, / represents the division operator, + represents the addition operator, and - represents the subtraction operator.
  • the value of the adjustment factor of the upper limit of the pitch period index value of the secondary channel signal is 2 or 3.
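  • Written out directly from the formula above, the decoder-side computation can be sketched as follows (floating-point arithmetic is assumed for the divisions):

      /* Pitch period estimate of the secondary channel signal, per the formula above:
       * T0_pitch = f_pitch_prim + (soft_reuse_index - soft_reuse_index_high_limit/M)/N */
      float compute_T0_pitch(float f_pitch_prim,                /* closed-loop pitch period reference value      */
                             int   soft_reuse_index,            /* pitch period index value, secondary channel   */
                             int   soft_reuse_index_high_limit, /* upper limit of the index value                */
                             int   M,                           /* adjustment factor of the index upper limit    */
                             int   N)                           /* number of secondary channel subframes         */
      {
          return f_pitch_prim
               + ((float)soft_reuse_index - (float)soft_reuse_index_high_limit / (float)M)
                 / (float)N;
      }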
  • In another aspect, an embodiment of the present application further provides a stereo encoding device, including: a downmix module, configured to perform downmix processing on the left channel signal of the current frame and the right channel signal of the current frame to obtain the primary channel signal of the current frame and the secondary channel signal of the current frame; and a differential encoding module, configured to, when it is determined to differentially encode the pitch period of the secondary channel signal, use the pitch period estimate of the primary channel signal to differentially encode the pitch period of the secondary channel signal to obtain the pitch period index value of the secondary channel signal, where the pitch period index value of the secondary channel signal is used to generate the stereo encoded bitstream to be sent.
  • In a possible implementation, the stereo encoding device further includes: a primary channel encoding module, configured to encode the primary channel signal of the current frame to obtain the estimated value of the pitch period of the primary channel signal; an open-loop analysis module, configured to perform an open-loop pitch period analysis on the secondary channel signal of the current frame to obtain the estimated value of the open-loop pitch period of the secondary channel signal; and a threshold judgment module, configured to determine whether the difference between the estimated value of the pitch period of the primary channel signal and the estimated value of the open-loop pitch period of the secondary channel signal exceeds a preset secondary channel pitch period differential coding threshold, to determine to differentially encode the pitch period of the secondary channel signal when the difference exceeds the threshold, and to determine not to differentially encode the pitch period of the secondary channel signal when the difference does not exceed the threshold.
  • In a possible implementation, the stereo encoding device further includes: an identifier configuration module, configured to, when it is determined to differentially encode the pitch period of the secondary channel signal, configure the secondary channel pitch period differential encoding identifier in the current frame to a preset first value, where the stereo encoded bitstream carries the secondary channel pitch period differential encoding identifier and the first value indicates that the pitch period of the secondary channel signal is differentially encoded.
  • In a possible implementation, the stereo encoding device further includes: an independent encoding module, configured to, when it is determined not to differentially encode the pitch period of the secondary channel signal and not to multiplex the estimated value of the pitch period of the primary channel signal as the pitch period of the secondary channel signal, encode the pitch period of the secondary channel signal and the pitch period of the primary channel signal separately.
  • In a possible implementation, the stereo encoding device further includes: an identifier configuration module, configured to, when it is determined not to differentially encode the pitch period of the secondary channel signal and to multiplex the estimated value of the pitch period of the primary channel signal as the pitch period of the secondary channel signal, configure the secondary channel signal pitch period multiplexing identifier to a preset fourth value, where the secondary channel signal pitch period multiplexing identifier is carried in the stereo encoded bitstream and the fourth value indicates that the estimated value of the pitch period of the primary channel signal is multiplexed as the pitch period of the secondary channel signal.
  • In a possible implementation, the differential encoding module includes: a closed-loop pitch period search module, configured to perform a closed-loop pitch period search of the secondary channel according to the estimated value of the pitch period of the primary channel signal to obtain the estimated value of the pitch period of the secondary channel signal; an index value upper limit determination module, configured to determine the upper limit of the pitch period index value of the secondary channel signal according to the pitch period search range adjustment factor of the secondary channel signal; and an index value calculation module, configured to calculate the pitch period index value of the secondary channel signal based on the estimated value of the pitch period of the primary channel signal, the estimated value of the pitch period of the secondary channel signal, and the upper limit of the pitch period index value of the secondary channel signal.
  • In a possible implementation, the closed-loop pitch period search module is configured to determine the closed-loop pitch period reference value of the secondary channel signal based on the estimated value of the pitch period of the primary channel signal and the number of subframes into which the secondary channel signal of the current frame is divided, and to use the closed-loop pitch period reference value of the secondary channel signal as the starting point of the closed-loop pitch period search, performing the closed-loop pitch period search with integer precision and fractional precision to obtain the estimated value of the pitch period of the secondary channel signal.
  • The value of Z is 3, 4, or 5.
  • In a possible implementation, the index value calculation module is configured to determine the closed-loop pitch period integer part loc_T0 and the closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal according to the estimated value of the pitch period of the primary channel signal, and to calculate the pitch period index value soft_reuse_index of the secondary channel signal as follows:
  • soft_reuse_index = (N*pitch_soft_reuse + pitch_frac_soft_reuse) - (N*loc_T0 + loc_frac_prim) + soft_reuse_index_high_limit/M; where pitch_soft_reuse represents the integer part of the pitch period estimate of the secondary channel signal, pitch_frac_soft_reuse represents the fractional part of the pitch period estimate of the secondary channel signal, soft_reuse_index_high_limit represents the upper limit of the pitch period index value of the secondary channel signal, N represents the number of subframes into which the secondary channel signal is divided, M represents the adjustment factor of the upper limit of the pitch period index value of the secondary channel signal and is a non-zero real number, * represents the multiplication operator, + represents the addition operator, and - represents the subtraction operator.
  • In a possible implementation, the stereo encoding device is applied to a stereo encoding scenario in which the encoding rate of the current frame is lower than a preset rate threshold; the rate threshold is at least one of the following values: 13.2 kilobits per second (kbps), 16.4 kbps, or 24.4 kbps.
  • The component modules of the stereo encoding device can also perform the steps described in the first aspect and its various possible implementations; for details, see the foregoing description of the first aspect and its various possible implementations.
  • In another aspect, an embodiment of the present application further provides a stereo decoding device, including: a determination module, configured to determine, according to the received stereo encoded bitstream, whether to differentially decode the pitch period of the secondary channel signal; a value acquisition module, configured to, when it is determined to differentially decode the pitch period of the secondary channel signal, obtain the estimated value of the pitch period of the primary channel of the current frame and the pitch period index value of the secondary channel of the current frame from the stereo encoded bitstream; and a differential decoding module, configured to differentially decode the pitch period of the secondary channel signal according to the estimated value of the pitch period of the primary channel and the pitch period index value of the secondary channel to obtain the estimated value of the pitch period of the secondary channel signal, where the estimated value of the pitch period of the secondary channel signal is used to decode the stereo encoded bitstream.
  • In a possible implementation, the determination module is configured to obtain the secondary channel pitch period differential coding identifier from the current frame, and to determine to differentially decode the pitch period of the secondary channel signal when the secondary channel pitch period differential coding identifier is a preset first value.
  • In a possible implementation, the stereo decoding device further includes: an independent decoding module, configured to, when it is determined not to differentially decode the pitch period of the secondary channel signal and not to multiplex the estimated value of the pitch period of the primary channel signal as the pitch period of the secondary channel signal, decode the pitch period of the secondary channel signal from the stereo encoded bitstream.
  • In a possible implementation, the stereo decoding device further includes: a pitch period multiplexing module, configured to, when it is determined not to differentially decode the pitch period of the secondary channel signal and to multiplex the estimated value of the pitch period of the primary channel signal as the pitch period of the secondary channel signal, use the estimated value of the pitch period of the primary channel signal as the pitch period of the secondary channel signal.
  • In a possible implementation, the differential decoding module includes: a reference value determination sub-module, configured to determine the closed-loop pitch period reference value of the secondary channel signal according to the estimated value of the pitch period of the primary channel signal and the number of subframes into which the secondary channel signal of the current frame is divided; an index value upper limit determination sub-module, configured to determine the upper limit of the pitch period index value of the secondary channel signal according to the pitch period search range adjustment factor of the secondary channel signal; and an estimated value calculation sub-module, configured to calculate the estimated value of the pitch period of the secondary channel signal according to the closed-loop pitch period reference value of the secondary channel signal, the pitch period index value of the secondary channel, and the upper limit of the pitch period index value of the secondary channel signal.
  • In a possible implementation, the estimated value calculation sub-module is configured to calculate the pitch period estimate T0_pitch of the secondary channel signal as follows:
  • T0_pitch = f_pitch_prim + (soft_reuse_index - soft_reuse_index_high_limit/M)/N; where f_pitch_prim represents the closed-loop pitch period reference value of the secondary channel signal, soft_reuse_index represents the pitch period index value of the secondary channel signal, N represents the number of subframes into which the secondary channel signal is divided, M represents the adjustment factor of the upper limit of the pitch period index value of the secondary channel signal and is a non-zero real number, / represents the division operator, + represents the addition operator, and - represents the subtraction operator.
  • the value of the adjustment factor of the upper limit of the pitch period index value of the secondary channel signal is 2 or 3.
  • The component modules of the stereo decoding device can also perform the steps described in the second aspect and its various possible implementations; for details, see the foregoing description of the second aspect and its various possible implementations.
  • an embodiment of the present application provides a stereo processing device.
  • the stereo processing device may include entities such as a stereo encoding device or a stereo decoding device or a chip, and the stereo processing device includes a processor.
  • The stereo processing device may further include a memory, where the memory is used to store instructions and the processor is used to execute the instructions in the memory, so that the stereo processing device executes the method of any one of the aforementioned first aspect or second aspect.
  • An embodiment of the present application provides a computer-readable storage medium that stores instructions which, when run on a computer, cause the computer to execute the method described in the first or second aspect.
  • An embodiment of the present application provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the method described in the first or second aspect.
  • The present application provides a chip system including a processor, configured to support a stereo encoding device or a stereo decoding device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
  • the chip system further includes a memory, and the memory is used to store program instructions and data necessary for the stereo encoding device or the stereo decoding device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • FIG. 1 is a schematic diagram of the composition structure of a stereo processing system provided by an embodiment of the application
  • FIG. 2a is a schematic diagram of the stereo encoder and the stereo decoder provided by an embodiment of the application applied to a terminal device;
  • 2b is a schematic diagram of the stereo encoder provided by an embodiment of the application applied to a wireless device or a core network device;
  • 2c is a schematic diagram of the stereo decoder provided by an embodiment of the application applied to a wireless device or a core network device;
  • Fig. 3a is a schematic diagram of a multi-channel encoder and a multi-channel decoder provided by an embodiment of the application applied to a terminal device;
  • FIG. 3b is a schematic diagram of a multi-channel encoder provided by an embodiment of the application applied to a wireless device or a core network device;
  • FIG. 3c is a schematic diagram of applying the multi-channel decoder provided by an embodiment of the application to a wireless device or a core network device;
  • FIG. 4 is a schematic diagram of an interaction process between a stereo encoding device and a stereo decoding device in an embodiment of the application;
  • FIG. 5 is a schematic flowchart of a stereo signal encoding provided by an embodiment of the application.
  • FIG. 6 is a flowchart of encoding the pitch period parameter of the primary channel signal and the pitch period parameter of the secondary channel signal provided by an embodiment of the application;
  • Fig. 7 is a comparison diagram of the pitch period quantization results obtained by adopting independent coding mode and differential coding mode
  • Figure 8 is a comparison diagram of the number of bits allocated to the fixed code table after adopting the independent coding mode and the differential coding mode;
  • FIG. 9 is a schematic diagram of a time-domain stereo coding method provided by an embodiment of the application.
  • FIG. 10 is a schematic diagram of the composition structure of a stereo encoding device provided by an embodiment of the application.
  • FIG. 11 is a schematic diagram of the composition structure of a stereo decoding device provided by an embodiment of the application.
  • FIG. 12 is a schematic diagram of the composition structure of another stereo encoding device provided by an embodiment of the application.
  • FIG. 13 is a schematic diagram of the composition structure of another stereo decoding apparatus provided by an embodiment of the application.
  • the embodiments of the present application provide a stereo encoding method, stereo decoding method and device, which improve stereo encoding and decoding performance.
  • the stereo processing system 100 may include: a stereo encoding device 101 and a stereo decoding device 102.
  • The stereo encoding device 101 can be used to generate a stereo encoded bitstream, which is then transmitted to the stereo decoding device 102 through an audio transmission channel; the stereo decoding device 102 receives the stereo encoded bitstream, executes the stereo decoding function, and finally obtains the decoded stereo signal.
  • The stereo encoding device can be applied to various terminal devices that require audio communication, and to wireless devices and core network devices that require transcoding; for example, the stereo encoding device may be the stereo encoder of such a terminal device, wireless device, or core network device.
  • Similarly, the stereo decoding device can be applied to various terminal devices that require audio communication, and to wireless devices and core network devices that require transcoding; for example, the stereo decoding device may be the stereo decoder of such a terminal device, wireless device, or core network device.
  • the stereo encoder and the stereo decoder provided by the embodiments of this application are applied to a terminal device.
  • Each terminal device can include: stereo encoder, channel encoder, stereo decoder, channel decoder.
  • the channel encoder is used for channel encoding the stereo signal
  • the channel decoder is used for channel decoding the stereo signal.
  • the first terminal device 20 may include: a first stereo encoder 201, a first channel encoder 202, a first stereo decoder 203, and a first channel decoder 204.
  • the second terminal device 21 may include: a second stereo decoder 211, a second channel decoder 212, a second stereo encoder 213, and a second channel encoder 214.
  • The first terminal device 20 is connected to a wireless or wired first network communication device 22, the first network communication device 22 is connected to a wireless or wired second network communication device 23 through a digital channel, and the second terminal device 21 is connected to the wireless or wired second network communication device 23.
  • the aforementioned wireless or wired network communication equipment may generally refer to signal transmission equipment, such as communication base stations, data exchange equipment, and the like.
  • the terminal device as the transmitting end performs stereo encoding on the collected stereo signal, and then performs channel encoding, and transmits it in the digital channel through the wireless network or the core network.
  • the terminal device as the receiving end performs channel decoding according to the received signal to obtain a stereo signal encoding code stream, and then the stereo signal is recovered through stereo decoding, which is played back by the receiving end terminal device.
  • the wireless device or core network device 25 includes: a channel decoder 251, other audio decoders 252, a stereo encoder 253, and a channel encoder 254.
  • The other audio decoders 252 refer to audio decoders other than the stereo decoder.
  • In the wireless device or core network device 25, the channel decoder 251 first performs channel decoding on the signal entering the device, the other audio decoders 252 then perform audio decoding (other than stereo decoding), the stereo encoder 253 then performs stereo encoding, and finally the channel encoder 254 performs channel encoding, after which the signal is transmitted.
  • In FIG. 2c, the wireless device or core network device 25 includes: a channel decoder 251, a stereo decoder 255, other audio encoders 256, and a channel encoder 254, where the other audio encoders 256 refer to audio encoders other than the stereo encoder.
  • The channel decoder 251 first performs channel decoding on the signal entering the device, the stereo decoder 255 then decodes the received stereo coded stream, the other audio encoders 256 then perform audio encoding (other than stereo encoding), and finally the channel encoder 254 performs channel encoding, after which the signal is transmitted.
  • In wireless devices or core network devices, if transcoding needs to be implemented, corresponding stereo encoding and decoding processing is required.
  • wireless devices refer to radio-frequency-related devices in communications
  • core network devices refer to devices related to the core network in communications.
  • The stereo encoding device can also be applied to various terminal devices that require audio communication, and to wireless devices and core network devices that require transcoding; for example, the stereo encoding device may be the multi-channel encoder of such a terminal device, wireless device, or core network device.
  • Similarly, the stereo decoding device may be the multi-channel decoder of such a terminal device, wireless device, or core network device.
  • the multi-channel encoder and multi-channel decoder provided by the embodiments of this application are applied to terminal equipment.
  • Each terminal device may include: a multi-channel encoder, a channel encoder, a multi-channel decoder, and a channel decoder.
  • the channel encoder is used for channel encoding the multi-channel signal
  • the channel decoder is used for channel decoding the multi-channel signal.
  • the first terminal device 30 may include: a first multi-channel encoder 301, a first channel encoder 302, a first multi-channel decoder 303, and a first channel decoder 304.
  • the second terminal device 31 may include: a second multi-channel decoder 311, a second channel decoder 312, a second multi-channel encoder 313, and a second channel encoder 314.
  • the first terminal device 30 is connected to a wireless or wired first network communication device 32
  • the first network communication device 32 is connected to a wireless or wired second network communication device 33 through a digital channel
  • The second terminal device 31 is connected to the wireless or wired second network communication device 33.
  • the aforementioned wireless or wired network communication equipment may generally refer to signal transmission equipment, such as communication base stations, data exchange equipment, and the like.
  • the terminal device as the transmitting end performs multi-channel coding on the collected multi-channel signal, and then performs channel coding and then transmits it in the digital channel through the wireless network or the core network.
  • the terminal device as the receiving end performs channel decoding according to the received signal to obtain a multi-channel signal encoding code stream, and then recovers the multi-channel signal through multi-channel decoding, which is played back by the terminal device as the receiving end.
  • FIG. 3b is a schematic diagram of the multi-channel encoder provided by an embodiment of this application applied to a wireless device or core network device, where the wireless device or core network device 35 includes a channel decoder 351, other audio decoders 352, a multi-channel encoder 353, and a channel encoder 354, similar to those in FIG. 2b, which will not be repeated here.
  • FIG. 3c is a schematic diagram of the multi-channel decoder provided by an embodiment of this application applied to a wireless device or core network device, where the wireless device or core network device 35 includes: a channel decoder 351 and a multi-channel decoder 355.
  • Other audio encoders 356 and channel encoders 354 are similar to those in FIG. 2c, and will not be repeated here.
  • the stereo encoding process can be a part of the multi-channel encoder, and the stereo decoding process can be a part of the multi-channel decoder.
  • The multi-channel encoding of the collected multi-channel signal can obtain a stereo signal after dimensionality-reduction processing of the multi-channel signal, and the obtained stereo signal is then encoded; the decoding end decodes the stereo signal from the multi-channel encoded bitstream and recovers the multi-channel signal after upmixing. Therefore, the embodiments of the present application can also be applied to the multi-channel encoders and multi-channel decoders in terminal devices, wireless devices, and core network devices. In wireless or core network devices, if transcoding needs to be implemented, corresponding multi-channel encoding and decoding processing is required.
  • a more important link is pitch period coding.
• since the voiced sound is generated by quasi-periodic pulse excitation, its time-domain waveform shows obvious periodicity, and this period is called the pitch period.
  • the pitch period plays a very important role in producing high-quality voiced speech. This is because voiced speech is characterized as a quasi-periodic signal composed of samples separated by the pitch period.
  • the pitch period can also be expressed by the number of samples contained in a period, which is called pitch delay.
  • the pitch delay is an important parameter of the adaptive codebook.
  • Pitch period estimation mainly refers to the process of estimating the pitch period. Therefore, the accuracy of pitch period estimation directly determines the correctness of the excitation signal and also determines the synthesis quality of the speech signal. Less bit resources are used to represent the pitch period at low and medium bit rates, which is one of the reasons for the loss of speech coding quality.
• the pitch periods of the primary channel signal and the secondary channel signal have a strong similarity.
• the embodiments of the present application can reasonably use this similarity of the pitch period to improve coding efficiency, which is an important factor for the overall stereo coding quality at low and medium rates.
  • the pitch period of the primary channel signal is correlated with the pitch period of the secondary channel signal.
• when the pitch period multiplexing conditions of the secondary channel signal are met, the differential coding method is used to reasonably predict the pitch period parameters of the secondary channel signal and to perform differential coding, so that only a small amount of bit resources needs to be allocated to perform quantization coding on the pitch period of the secondary channel signal.
  • the embodiments of the present application can improve the spatial perception and sound image stability of the stereo signal.
• the pitch period of the secondary channel signal uses smaller bit resources while the accuracy of the pitch period prediction of the secondary channel signal is ensured, and the remaining bit resources can be used for other stereo coding parameters.
• in this way, the coding efficiency of the secondary channel is improved, and the overall stereo coding quality is finally improved.
• when the pitch period differential coding method for the secondary channel signal is adopted, the pitch period of the primary channel signal is used as a reference value, and the bit resources of the secondary channel are redistributed, so as to improve the quality of stereo encoding.
  • FIG. 4 it is a schematic diagram of an interaction flow between the stereo encoding device and the stereo decoding device in the embodiment of this application, where the following steps 401 to 403 can be executed by the stereo encoding device (hereinafter referred to as the encoding end).
• the following steps 411 to 413 may be performed by a stereo decoding device (hereinafter referred to as the decoding end), and mainly include the following processes:
  • the current frame refers to a stereo signal frame currently undergoing encoding processing in the encoding end.
• the left channel signal of the current frame and the right channel signal of the current frame are obtained, and the left channel signal and the right channel signal are downmixed to obtain the main channel signal of the current frame and the secondary channel signal of the current frame.
  • the encoder side downmixes the time domain signal into two mono signals, and first downmixes the left and right channel signals into the main channel signal and the secondary channel signal.
  • L represents the left channel signal
  • R represents the right channel signal
  • the main channel signal can be 0.5*(L+R), which represents the relevant information between the two channels
• the secondary channel signal can be 0.5*(L-R), which represents the difference information between the two channels, as shown in the sketch below.
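• the following is a minimal sketch of the time-domain downmix described above, assuming floating-point samples; the function and variable names are illustrative and not taken from any particular codec.

```c
#include <stddef.h>

/* Minimal sketch of the time-domain downmix described above: the primary
 * (mid) channel carries the correlated information between the channels and
 * the secondary (side) channel carries the difference information. */
static void downmix_lr_to_primary_secondary(const float *left, const float *right,
                                            float *primary, float *secondary,
                                            size_t num_samples)
{
    for (size_t i = 0; i < num_samples; i++) {
        primary[i]   = 0.5f * (left[i] + right[i]);  /* 0.5 * (L + R) */
        secondary[i] = 0.5f * (left[i] - right[i]);  /* 0.5 * (L - R) */
    }
}
```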
  • the stereo encoding method executed by the encoding end may be applied to a stereo encoding scenario where the encoding rate of the current frame is lower than a preset rate threshold.
  • the stereo decoding method performed by the decoder can be applied to a stereo decoding scenario where the decoding rate of the current frame is lower than the preset rate threshold.
  • the encoding rate of the current frame refers to the encoding rate adopted by the stereo signal of the current frame
  • the rate threshold refers to the minimum rate value set for the stereo signal.
• the stereo encoding method provided in the embodiment of the present application can be executed when the encoding rate of the current frame is lower than the preset rate threshold.
• similarly, the stereo decoding method provided in the embodiment of the present application can be executed when the decoding rate of the current frame is lower than the preset rate threshold.
  • the rate threshold is at least one of the following values: 13.2 kilobits per second kbps, 16.4 kbps, or 24.4 kbps.
  • the rate threshold may be less than or equal to 13.2 kbps.
  • the rate threshold may also be 16.4 kbps or 24.4 kbps, and the specific value of the rate threshold may be determined according to application scenarios.
• at relatively low coding rates, such as 24.4 kbps and lower, independent coding of the pitch period of the secondary channel is not performed.
• instead, the estimated value of the pitch period of the main channel signal is used as a reference value, and the differential coding method is used to encode the pitch period of the secondary channel signal, so as to improve the quality of stereo encoding.
• the next step can be to determine, based on the primary channel signal and the secondary channel signal of the current frame, whether the pitch period of the secondary channel signal can be differentially coded. For example, according to the signal characteristics of the main channel signal and the secondary channel signal of the current frame, determine whether to differentially encode the pitch period of the secondary channel signal.
• the primary channel signal and the secondary channel signal can also be used together with preset decision conditions to decide whether to perform differential coding on the pitch period of the secondary channel signal. There are many ways to use the primary channel signal and the secondary channel signal to determine whether to perform differential encoding, which will be described in detail in the subsequent embodiments.
  • the step 402 determining whether to perform differential encoding on the pitch period of the secondary channel signal includes:
• pitch period estimation may be performed according to the main channel signal, so as to obtain the estimated value of the pitch period of the main channel signal.
  • the pitch period estimation uses a combination of open-loop pitch analysis and closed-loop pitch search, which improves the accuracy of pitch period estimation.
  • Various methods can be used to estimate the pitch period of the speech signal, such as autocorrelation function, short-term average amplitude difference, etc.
  • the pitch period estimation algorithm is based on the autocorrelation function.
  • the autocorrelation function has a peak at an integer multiple of the pitch period. This feature can be used to estimate the pitch period.
  • pitch period detection uses a fractional delay with 1/3 as the sampling resolution.
  • pitch period estimation includes two steps: open-loop pitch analysis and closed-loop pitch search.
  • Open-loop pitch analysis is used to roughly estimate the integer delay of a frame of speech to obtain a candidate integer delay.
  • the closed-loop pitch search estimates the pitch delay in its vicinity, and the closed-loop pitch search is performed once every subframe.
  • the open-loop pitch analysis is performed once per frame, and the autocorrelation, normalization processing, and optimal open-loop integer delay are calculated respectively.
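• the following is an illustrative sketch of an open-loop integer pitch analysis based on the normalized autocorrelation, as outlined above; the search bounds, normalization and names are assumptions for illustration only.

```c
#include <math.h>
#include <stddef.h>

/* Illustrative open-loop pitch analysis: pick the integer lag whose
 * normalized autocorrelation with the frame is largest within
 * [min_lag, max_lag]; the chosen lag is the candidate integer pitch delay. */
static int open_loop_pitch(const float *x, size_t frame_len, int min_lag, int max_lag)
{
    int best_lag = min_lag;
    float best_score = -1.0f;

    for (int lag = min_lag; lag <= max_lag && (size_t)lag < frame_len; lag++) {
        float corr = 0.0f;
        float energy = 1e-9f;                       /* avoid division by zero */
        for (size_t i = (size_t)lag; i < frame_len; i++) {
            corr   += x[i] * x[i - (size_t)lag];
            energy += x[i - (size_t)lag] * x[i - (size_t)lag];
        }
        float score = corr / sqrtf(energy);         /* normalized autocorrelation */
        if (score > best_score) {
            best_score = score;
            best_lag = lag;
        }
    }
    return best_lag;
}
```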
  • the open-loop pitch period analysis of the secondary channel signal can be performed to obtain the open-loop pitch period estimation value of the secondary channel signal.
• the specific process of the open-loop pitch period analysis will not be explained in detail here.
• the difference between the estimated value of the pitch period of the primary channel signal and the estimated value of the open-loop pitch period of the secondary channel signal can be calculated, and it is then judged whether the difference exceeds the preset secondary channel pitch period differential coding threshold.
  • the secondary channel pitch period differential coding threshold can be preset, and can be flexibly configured in combination with stereo coding scenarios. When the difference exceeds the secondary channel pitch period differential encoding threshold, it is determined to perform differential encoding, and when the difference does not exceed the secondary channel pitch period differential encoding threshold, it is determined not to perform differential encoding.
• the method of determining whether to perform differential encoding on the pitch period of the secondary channel signal in the embodiment of the present application is not limited to the above-mentioned comparison of the difference value with the secondary channel pitch period differential encoding threshold; for example, it can also be judged whether the result of dividing the difference by the secondary channel pitch period differential coding threshold is less than 1.
• alternatively, the estimated value of the pitch period of the primary channel signal and the estimated value of the open-loop pitch period of the secondary channel signal can be divided, and the result of the division can be compared with the secondary channel pitch period differential coding threshold.
  • the specific value of the differential coding threshold of the pitch period of the secondary channel can be determined in combination with the application scenario, and is not limited here.
  • the secondary channel pitch period differential coding decision is made according to the estimated value of the primary channel signal's pitch period and the estimated value of the secondary channel signal's open-loop pitch period.
• for example, the decision value can be calculated as DIFF = |pitch[0] - pitch[1]|, where DIFF represents the absolute value of the difference between the estimated value of the pitch period of the primary channel signal and the estimated value of the open-loop pitch period of the secondary channel signal, pitch[0] represents the estimated value of the pitch period of the main channel signal, and pitch[1] represents the estimated value of the open-loop pitch period of the secondary channel signal.
• the judgment conditions that can be used in the embodiments of this application are not limited to the above formula.
• for example, a correction factor can be set and multiplied by the absolute difference, or a conditional threshold constant can be added or subtracted, to obtain the final DIFF. A decision sketch is given below.
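```c
#include <math.h>

/* Sketch of the decision rule: DIFF is the (optionally corrected) absolute
 * difference between the primary channel pitch period estimate and the
 * secondary channel open-loop pitch period estimate. Per the rule described
 * above, differential coding is chosen when DIFF exceeds the preset
 * secondary channel pitch period differential coding threshold; the
 * threshold and correction factor values are placeholders. */
static int decide_secondary_pitch_diff_coding(float pitch_prim_est,
                                              float pitch_sec_openloop_est,
                                              float correction_factor,
                                              float diff_threshold)
{
    float diff = correction_factor * fabsf(pitch_prim_est - pitch_sec_openloop_est);
    return diff > diff_threshold;   /* non-zero: perform differential coding */
}
```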
• after determining whether to perform differential encoding on the pitch period of the secondary channel signal, whether to perform step 403 is determined according to the result. When it is determined to perform differential encoding on the pitch period of the secondary channel signal, the subsequent step 403 is triggered.
  • the method provided in the embodiments of the present application further includes:
• the secondary channel pitch period differential encoding identifier of the current frame is configured to the preset first value, and the stereo encoding code stream carries the secondary channel pitch period differential encoding identifier; the first value is used to indicate differential encoding of the pitch period of the secondary channel signal.
  • the encoding terminal obtains the secondary channel pitch period differential encoding identifier, and the value of the secondary channel pitch period differential encoding identifier can be configured according to whether the pitch period of the secondary channel signal is differentially encoded.
• the secondary channel pitch period differential coding flag is used to indicate whether to use differential coding for the pitch period of the secondary channel signal.
  • the secondary channel pitch period differential encoding identifier may have multiple values.
  • the secondary channel pitch period differential encoding identifier may be a preset first value or configured as a second value.
  • the configuration method of the secondary channel pitch period differential encoding identifier is described.
• when it is determined to perform differential encoding on the pitch period of the secondary channel signal, the secondary channel pitch period differential encoding identifier is configured to the first value.
  • the decoder can determine that the pitch period of the secondary channel signal can be differentially decoded.
  • the value of the secondary channel pitch period differential coding identifier can be 0 or 1, the first value is 1, and the second value is 0.
• for example, the differential coding flag of the pitch period of the secondary channel is denoted Pitch_reuse_flag.
  • the method provided in the embodiments of the present application further includes:
• when the pitch period of the secondary channel signal is not differentially coded and the estimated value of the pitch period of the primary channel signal is not multiplexed as the pitch period of the secondary channel signal, the pitch period of the secondary channel signal and the pitch period of the primary channel signal are coded separately.
• that is, when the pitch period of the secondary channel signal is neither differentially encoded nor replaced by the multiplexed estimated value of the pitch period of the primary channel signal, the embodiment of the present application can use the independent coding method of the secondary channel pitch period to encode the pitch period of the secondary channel signal, so that the coding of the pitch period of the secondary channel signal can still be realized.
  • the method provided in the embodiments of the present application further includes:
• the secondary channel signal pitch period multiplexing identifier is configured as a preset fourth value, and the secondary channel signal pitch period multiplexing identifier is carried in the stereo code stream; the fourth value is used to indicate that the estimated value of the pitch period of the primary channel signal is multiplexed as the pitch period of the secondary channel signal.
• a pitch period multiplexing method may also be used in the embodiment of the present application; that is, the secondary channel pitch period is not encoded at the encoding end, the secondary channel signal pitch period multiplexing identifier is carried in the stereo encoding bitstream, and the multiplexing of the pitch period is indicated by the secondary channel signal pitch period multiplexing identifier.
  • the decoding end may decode the pitch period of the primary channel signal as the pitch period of the secondary channel signal according to the secondary channel signal pitch period multiplexing identifier.
  • the method provided in the embodiments of the present application further includes:
• when it is determined not to perform differential encoding on the pitch period of the secondary channel signal, the secondary channel pitch period differential encoding identifier is configured to the preset second value, and the stereo encoding code stream carries the secondary channel pitch period differential encoding identifier;
• the second value is used to indicate that the pitch period of the secondary channel signal is not to be differentially coded;
• when it is determined that the estimated value of the pitch period of the primary channel signal is not to be multiplexed as the pitch period of the secondary channel signal, the secondary channel signal pitch period multiplexing identifier is configured as a preset third value, and the stereo encoding code stream carries the secondary channel signal pitch period multiplexing identifier; the third value is used to indicate that the pitch period estimation value of the primary channel signal is not multiplexed as the pitch period of the secondary channel signal;
  • the secondary channel pitch period differential encoding identifier may have multiple values, for example, the secondary channel pitch period differential encoding identifier may be a preset first value or configured as a second value.
• when it is determined not to perform differential encoding on the pitch period of the secondary channel signal, the secondary channel pitch period differential encoding identifier is configured to the second value.
• in this way, the decoder can determine that the pitch period of the secondary channel signal is not to be differentially decoded.
• for example, the value of the secondary channel pitch period differential coding flag can be 0 or 1, the first value is 1, and the second value is 0.
• the second value is indicated by the secondary channel pitch period differential encoding identifier, so that the decoder can determine not to perform differential decoding on the pitch period of the secondary channel signal.
  • the secondary channel pitch period multiplexing identifier may have multiple values.
  • the secondary channel pitch period multiplexing identifier may be a preset fourth value or configured as a third value.
• an example of the configuration method of the secondary channel pitch period multiplexing identifier is described. When it is determined not to multiplex the pitch period estimation value of the primary channel signal as the pitch period of the secondary channel signal, the secondary channel signal pitch period multiplexing identifier is configured as the third value.
• in this way, the decoder can determine that the pitch period estimation value of the primary channel signal is not multiplexed as the pitch period of the secondary channel signal. For example, the value of the secondary channel pitch period multiplexing identifier can be 0 or 1, the fourth value is 1, and the third value is 0.
• the encoding end may use an independent encoding method, that is, separately encode the pitch period of the secondary channel signal and the pitch period of the main channel signal.
• in other words, the independent coding method of the secondary channel pitch period can be used to encode the pitch period of the secondary channel signal.
  • the pitch period multiplexing method can also be used.
• the stereo encoding method executed by the encoder can be applied to stereo encoding scenarios where the encoding rate of the current frame is lower than the preset rate threshold. If differential encoding is not used for the pitch period of the secondary channel signal, the multiplexing method of the secondary channel pitch period can also be used.
• that is, the secondary channel pitch period is not encoded at the encoding end, the secondary channel signal pitch period multiplexing identifier is carried in the stereo encoding code stream, and the secondary channel signal pitch period multiplexing identifier indicates whether the pitch period of the secondary channel signal is multiplexed with the estimated value of the pitch period of the primary channel signal.
• the decoding end can then decode the pitch period of the primary channel signal as the pitch period of the secondary channel signal according to the secondary channel signal pitch period multiplexing identifier.
  • the method provided in the embodiments of the present application further includes:
• when it is determined not to perform differential encoding on the pitch period of the secondary channel signal, the secondary channel pitch period differential encoding identifier is configured to the preset second value, and the stereo encoding code stream carries the secondary channel pitch period differential encoding identifier;
• the second value is used to indicate that the pitch period of the secondary channel signal is not to be differentially coded;
• the secondary channel signal pitch period multiplexing identifier is configured as a preset fourth value, and the stereo encoding code stream carries the secondary channel signal pitch period multiplexing identifier; the fourth value is used to indicate that the estimated value of the pitch period of the primary channel signal is multiplexed as the pitch period of the secondary channel signal.
  • the secondary channel pitch period differential encoding identifier may have multiple values, for example, the secondary channel pitch period differential encoding identifier may be a preset first value or configured as a second value.
• when it is determined not to perform differential encoding on the pitch period of the secondary channel signal, the secondary channel pitch period differential encoding identifier is configured to the second value.
• in this way, the decoder can determine that the pitch period of the secondary channel signal is not to be differentially decoded.
• for example, the value of the secondary channel pitch period differential coding flag can be 0 or 1, the first value is 1, and the second value is 0.
• the second value is indicated by the secondary channel pitch period differential encoding identifier, so that the decoder can determine not to perform differential decoding on the pitch period of the secondary channel signal.
  • the secondary channel pitch period multiplexing identifier may have multiple values.
  • the secondary channel pitch period multiplexing identifier may be a preset fourth value or configured as a third value.
• when the encoder side determines that the pitch period of the secondary channel signal is not to be differentially coded and that the estimated value of the pitch period of the primary channel signal is reused as the pitch period of the secondary channel signal, the secondary channel signal pitch period multiplexing identifier is configured to the fourth value.
• that is, the secondary channel pitch period multiplexing identifier is configured as the fourth value, so that the decoder can determine that the estimated value of the pitch period of the primary channel signal is multiplexed as the pitch period of the secondary channel signal. For example, the value of the secondary channel pitch period multiplexing identifier can be 0 or 1, the fourth value is 1, and the third value is 0. The flag configuration for the three cases is illustrated in the sketch below.
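```c
/* Sketch of how the two identifiers could be configured for the three cases
 * discussed above (differential coding, independent coding, and pitch period
 * multiplexing). Only the flag values follow the text (first value = 1,
 * second value = 0, fourth value = 1, third value = 0); the enum, helper
 * name and the handling of the multiplexing identifier in the differential
 * case are assumptions. */
typedef enum {
    SEC_PITCH_DIFFERENTIAL,  /* differentially encode the secondary channel pitch period */
    SEC_PITCH_INDEPENDENT,   /* encode the secondary channel pitch period independently  */
    SEC_PITCH_MULTIPLEX      /* reuse the primary channel pitch period, encode nothing   */
} sec_pitch_mode_t;

static void configure_sec_pitch_flags(sec_pitch_mode_t mode,
                                      int *pitch_diff_coding_flag,   /* differential encoding identifier     */
                                      int *pitch_multiplexing_flag)  /* pitch period multiplexing identifier */
{
    switch (mode) {
    case SEC_PITCH_DIFFERENTIAL:
        *pitch_diff_coding_flag = 1;   /* first value: differential decoding at the decoder */
        *pitch_multiplexing_flag = 0;  /* assumption: not used in this case */
        break;
    case SEC_PITCH_INDEPENDENT:
        *pitch_diff_coding_flag = 0;   /* second value: no differential decoding */
        *pitch_multiplexing_flag = 0;  /* third value: do not reuse the primary pitch period */
        break;
    case SEC_PITCH_MULTIPLEX:
        *pitch_diff_coding_flag = 0;   /* second value */
        *pitch_multiplexing_flag = 1;  /* fourth value: reuse the primary pitch period */
        break;
    }
}
```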
  • the pitch period estimation value of the primary channel signal can be used to perform differential encoding on the pitch period of the secondary channel signal.
• the above-mentioned differential coding uses the estimated value of the pitch period of the main channel signal and takes into account the similarity of the pitch period between the main channel signal and the secondary channel signal.
• after the estimated value of the pitch period of the secondary channel signal is encoded in this way, it can be used to decode the secondary channel signal more accurately, thereby improving the spatial perception and sound image stability of the stereo signal.
• compared with independently coding the pitch period of the secondary channel signal, differentially coding the pitch period of the secondary channel signal in the embodiment of the present application can reduce the bit resource overhead used for independent coding, and the saved bits can be allocated to other stereo coding parameters, so as to realize accurate secondary channel pitch period coding and improve the overall stereo coding quality.
• step 403 uses the estimated value of the pitch period of the primary channel signal to differentially encode the pitch period of the secondary channel signal to obtain the pitch period index value of the secondary channel signal, which includes:
  • the pitch period index value of the secondary channel signal is calculated according to the pitch period estimation value of the primary channel signal, the pitch period estimation value of the secondary channel signal and the upper limit of the pitch period index value of the secondary channel signal.
• the encoder first performs a closed-loop pitch period search of the secondary channel according to the estimated value of the pitch period of the primary channel signal to determine the estimated value of the pitch period of the secondary channel signal.
  • the closed-loop pitch period search of the secondary channel based on the estimated value of the pitch period of the primary channel signal to obtain the estimated value of the pitch period of the secondary channel signal includes:
• the number of subframes into which the secondary channel signal of the current frame is divided can be determined by the subframe configuration of the secondary channel signal; for example, the signal can be divided into 4 subframes or 3 subframes, which is specifically determined in combination with the application scenario.
• the estimated value of the pitch period of the main channel signal and the number of subframes into which the secondary channel signal is divided can be used to calculate the closed-loop pitch period reference value of the secondary channel signal.
  • the closed-loop pitch period reference value of the secondary channel signal is a reference value determined according to the estimated value of the pitch period of the primary channel signal.
• that is, the closed-loop pitch period reference value of the secondary channel signal is determined by using the estimated value of the pitch period of the primary channel signal as a reference for the closed-loop pitch period of the secondary channel signal.
• one of the methods is to directly use the pitch period of the main channel signal as the closed-loop pitch period reference value of the secondary channel signal, that is, to select 4 values from the pitch periods of the 5 subframes of the main channel signal as the closed-loop pitch period reference values of the 4 subframes of the secondary channel signal.
  • Another method is to use an interpolation method to map the pitch period in the 5 subframes of the main channel signal to the closed-loop pitch period reference value of the 4 subframes of the secondary channel signal.
  • the closed-loop pitch period reference value of the secondary channel signal is used as the starting point of the closed-loop pitch period search of the secondary channel signal, and the closed-loop pitch period search is carried out with integer precision and down-sampling fractional precision, and finally through calculation and interpolation The correlation is obtained to obtain the estimated value of the pitch period of the secondary channel signal.
• for the estimated value of the pitch period of the secondary channel signal, see the examples in the subsequent embodiments for details.
• the pitch period search range adjustment factor of the secondary channel signal can be used to determine the upper limit of the pitch period index value of the secondary channel signal.
• the upper limit of the pitch period index value of the secondary channel signal indicates the upper limit that the value of the pitch period index value of the secondary channel signal cannot exceed.
• the estimated value of the pitch period of the primary channel signal, the estimated value of the pitch period of the secondary channel signal and the upper limit of the pitch period index value of the secondary channel signal can then be used to determine the pitch period index value of the secondary channel signal.
• determining the closed-loop pitch period reference value of the secondary channel signal according to the estimated value of the pitch period of the primary channel signal and the number of subframes into which the secondary channel signal of the current frame is divided includes:
• the closed-loop pitch period reference value f_pitch_prim of the secondary channel signal is calculated as follows:
• f_pitch_prim = loc_T0 + loc_frac_prim / N;
  • N represents the number of subframes into which the secondary channel signal is divided.
• the integer part of the estimated value of the pitch period of the primary channel signal is regarded as the integer part of the closed-loop pitch period of the secondary channel signal, and the fractional part of the estimated value of the pitch period of the primary channel signal is regarded as the fractional part of the closed-loop pitch period of the secondary channel signal.
• in other words, the estimated value of the pitch period of the main channel signal is mapped to the integer part and the fractional part of the closed-loop pitch period of the secondary channel signal, where the integer part of the closed-loop pitch period of the secondary channel is loc_T0 and the fractional part is loc_frac_prim.
  • N represents the number of subframes into which the secondary channel signal is divided.
  • the value of N can be 3, 4, or 5, etc., and the specific value depends on the application scenario.
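• the following is a one-line sketch of the reference value computation f_pitch_prim = loc_T0 + loc_frac_prim/N described above; the function wrapper and parameter types are illustrative.

```c
/* One-line sketch of f_pitch_prim = loc_T0 + loc_frac_prim / N, where loc_T0
 * and loc_frac_prim are the integer and fractional parts mapped from the
 * primary channel pitch period estimate and N is the number of subframes of
 * the secondary channel signal (e.g. 3, 4 or 5). */
static float secondary_closed_loop_pitch_ref(int loc_T0, int loc_frac_prim, int N)
{
    return (float)loc_T0 + (float)loc_frac_prim / (float)N;
}
```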
  • determining the upper limit of the pitch period index value of the secondary channel signal according to the pitch period search range adjustment factor of the secondary channel signal includes:
• soft_reuse_index_high_limit = 0.5 + 2^Z;
• Z is the pitch period search range adjustment factor of the secondary channel signal, and the value of Z can be 3, 4 or 5.
• that is, soft_reuse_index_high_limit is obtained by calculating 0.5 + 2^Z.
  • Z can be 3, or 4, or 5.
  • the specific value of Z is not limited here, and it depends on the application scenario.
• after the encoding end determines the pitch period estimation value of the main channel signal, the pitch period estimation value of the secondary channel signal and the upper limit of the pitch period index value of the secondary channel signal, differential coding is performed according to these three values, and the pitch period index value of the secondary channel signal is output.
• calculating the pitch period index value of the secondary channel signal based on the estimated value of the pitch period of the primary channel signal, the estimated value of the pitch period of the secondary channel signal and the upper limit of the pitch period index value of the secondary channel signal includes:
  • the pitch period index value soft_reuse_index of the secondary channel signal is calculated as follows:
• soft_reuse_index = (N*pitch_soft_reuse + pitch_frac_soft_reuse) - (N*loc_T0 + loc_frac_prim) + soft_reuse_index_high_limit/M;
  • pitch_soft_reuse represents the integer part of the estimated value of the pitch period of the secondary channel signal
  • pitch_frac_soft_reuse represents the fractional part of the estimated value of the pitch period of the secondary channel signal
  • soft_reuse_index_high_limit represents the upper limit of the pitch period index value of the secondary channel signal
• N represents the number of subframes into which the secondary channel signal is divided
  • M represents the adjustment factor of the upper limit of the pitch period index value of the secondary channel signal
  • M is a non-zero real number
  • * represents the multiplication operator
  • + represents the addition operator
• for example, N represents the number of subframes into which the secondary channel signal is divided, and the value of N can be 3, 4 or 5; M represents the adjustment factor of the upper limit of the pitch period index value of the secondary channel signal, and M is a non-zero real number, for example, the value of M can be 2 or 3. The values of N and M depend on the application scenario and are not limited here.
• the calculation of the pitch period index value of the secondary channel signal in the embodiment of the present application is not limited to the above formula. For example, after calculating (N*pitch_soft_reuse+pitch_frac_soft_reuse)-(N*loc_T0+loc_frac_prim)+soft_reuse_index_high_limit/M, a correction factor can also be set and multiplied by this result to obtain the final output soft_reuse_index.
• alternatively, a correction factor can be added to (N*pitch_soft_reuse+pitch_frac_soft_reuse)-(N*loc_T0+loc_frac_prim)+soft_reuse_index_high_limit/M.
• the specific value of the correction factor is not limited, and the final soft_reuse_index can be calculated in either way. An encoder-side sketch of the index computation is given below.
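```c
#include <math.h>

/* Encoder-side sketch combining the two formulas above:
 *   soft_reuse_index_high_limit = 0.5 + 2^Z
 *   soft_reuse_index = (N*pitch_soft_reuse + pitch_frac_soft_reuse)
 *                    - (N*loc_T0 + loc_frac_prim)
 *                    + soft_reuse_index_high_limit / M
 * The variable names follow the formulas; the function wrapper and parameter
 * types are assumptions for illustration only. */
static float secondary_pitch_diff_index(int pitch_soft_reuse,      /* integer part, secondary estimate    */
                                        int pitch_frac_soft_reuse, /* fractional part, secondary estimate */
                                        int loc_T0,                /* integer part, primary reference     */
                                        int loc_frac_prim,         /* fractional part, primary reference  */
                                        int N,                     /* number of secondary subframes       */
                                        int M,                     /* upper-limit adjustment factor       */
                                        int Z)                     /* search range adjustment factor      */
{
    float soft_reuse_index_high_limit = 0.5f + powf(2.0f, (float)Z);
    return (float)(N * pitch_soft_reuse + pitch_frac_soft_reuse)
         - (float)(N * loc_T0 + loc_frac_prim)
         + soft_reuse_index_high_limit / (float)M;
}
```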
  • the stereo encoded bitstream generated by the encoding end may be stored in a computer-readable storage medium.
• the pitch period estimation value of the primary channel signal is used to differentially encode the pitch period of the secondary channel signal, so that the pitch period index value of the secondary channel signal can be obtained, and the pitch period index value of the secondary channel signal is used to indicate the pitch period of the secondary channel signal.
  • the pitch period index value of the secondary channel signal can also be used to generate a stereo coded stream to be sent. After the encoding end generates the stereo encoding stream, the stereo encoding stream can be output, and sent to the decoding end through the audio transmission channel.
• the decoding end can determine, according to the indication information carried by the stereo encoding bitstream, whether to perform differential decoding on the pitch period of the secondary channel signal.
  • the decoder can also determine whether to perform differential decoding on the pitch period of the secondary channel signal according to the pre-configuration result.
  • step 411 determines whether to perform differential decoding on the pitch period of the secondary channel signal according to the received stereo encoding code stream, including:
• when the secondary channel pitch period differential coding identifier is the preset first value, it is determined to perform differential decoding on the pitch period of the secondary channel signal.
  • the secondary channel pitch period differential encoding identifier may have multiple values.
  • the secondary channel pitch period differential encoding identifier may be a preset first value or a second value.
  • the value of the secondary channel pitch period differential coding identifier can be 0 or 1, the first value is 1, and the second value is 0.
  • the execution of step 412 is triggered.
  • the secondary channel pitch period differential encoding is identified as Pitch_reuse_flag.
• the secondary channel pitch period differential encoding identifier Pitch_reuse_flag is obtained; when the pitch period of the secondary channel signal can be differentially decoded, Pitch_reuse_flag is 1 and the differential decoding method in the embodiment of this application is executed; when the pitch period of the secondary channel signal cannot be differentially decoded, Pitch_reuse_flag is 0 and the independent decoding method is executed.
• the differential decoding process in step 412 and step 413 is executed only when Pitch_reuse_flag is 1.
  • the method provided in the embodiments of the present application further includes:
• when the decoding end determines not to perform differential decoding on the pitch period of the secondary channel signal and not to reuse the estimated value of the pitch period of the primary channel signal as the pitch period of the secondary channel signal, the pitch period of the secondary channel signal is decoded from the stereo coded stream.
• that is, the independent decoding method of the secondary channel pitch period can be used to decode the pitch period of the secondary channel signal, so that the decoding of the pitch period of the secondary channel signal can still be realized.
  • the method provided in the embodiments of the present application further includes:
• the estimated value of the pitch period of the primary channel signal is taken as the pitch period of the secondary channel signal.
  • a pitch period multiplexing method may also be used in the embodiment of the present application.
• the decoding end can decode the pitch period of the primary channel signal as the pitch period of the secondary channel signal according to the secondary channel signal pitch period multiplexing identifier.
  • the stereo decoding method executed by the decoding end may further include the following steps according to the value of the differential encoding identifier of the secondary channel pitch period:
• when the secondary channel pitch period differential encoding identifier is the preset second value and the secondary channel signal pitch period multiplexing identifier carried in the stereo encoding code stream is the preset third value, the pitch period of the secondary channel signal is not differentially decoded, the pitch period estimation value of the main channel signal is not multiplexed as the pitch period of the secondary channel signal, and the pitch period of the secondary channel signal is decoded from the stereo code stream.
  • the stereo decoding method executed by the decoding end may further include the following steps according to the value of the differential encoding identifier of the secondary channel pitch period:
• when the secondary channel pitch period differential coding identifier is the preset second value and the secondary channel signal pitch period multiplexing identifier carried in the stereo encoding code stream is the preset fourth value, it is determined not to perform the differential decoding process in step 412 and step 413; the secondary channel signal pitch period multiplexing identifier carried in the stereo coding bitstream is further analyzed, and this identifier indicates whether the pitch period of the secondary channel signal is multiplexed with the pitch period estimation value of the primary channel signal. When the value of the secondary channel signal pitch period multiplexing identifier is the fourth value, it indicates that the pitch period of the secondary channel signal is multiplexed with the estimated value of the pitch period of the primary channel signal.
• in this case, the decoding end can use the secondary channel signal pitch period multiplexing identifier to decode the pitch period of the primary channel signal as the pitch period of the secondary channel signal.
  • the decoding end can determine to execute the differential decoding method or the independent decoding method according to the secondary channel pitch period differential encoding identifier carried in the stereo encoding code stream.
  • the independent decoding method of the pitch period of the secondary channel can be used to determine the pitch period of the secondary channel signal.
  • the pitch period multiplexing method can also be used.
  • the stereo decoding method executed by the decoder can be applied to the stereo decoding scene where the decoding rate of the current frame is lower than the preset rate threshold.
• when the stereo encoding code stream carries the secondary channel signal pitch period multiplexing identifier, the multiplexing identifier indicates whether the pitch period of the secondary channel signal is multiplexed with the estimated value of the pitch period of the primary channel signal.
• when the secondary channel signal pitch period multiplexing identifier indicates that the pitch period is multiplexed, the decoding end can decode the pitch period of the main channel signal as the pitch period of the secondary channel signal according to the secondary channel signal pitch period multiplexing identifier.
• after the encoding end sends the stereo encoding code stream, the decoding end first receives the stereo encoding code stream through the audio transmission channel and then performs channel decoding according to the stereo encoding code stream. For differential decoding of the pitch period of the current frame, the pitch period index value of the secondary channel signal of the current frame can be obtained from the stereo encoding stream, and the estimated value of the pitch period of the main channel signal of the current frame can also be obtained from the stereo encoding stream.
• the pitch period estimation value of the primary channel signal and the pitch period index value of the secondary channel signal can then be used to perform differential decoding on the pitch period of the secondary channel signal, so as to achieve accurate secondary channel pitch period decoding and improve the overall stereo decoding quality.
• step 413 performs differential decoding on the pitch period of the secondary channel signal according to the estimated value of the pitch period of the primary channel signal and the pitch period index value of the secondary channel signal, including:
  • the estimated value of the pitch period of the secondary channel signal is calculated according to the closed-loop pitch period reference value of the secondary channel signal, the pitch period index value of the secondary channel signal and the upper limit of the pitch period index value of the secondary channel signal.
  • the estimated value of the pitch period of the primary channel signal is used to determine the closed-loop pitch period reference value of the secondary channel signal.
  • the pitch period search range adjustment factor of the secondary channel signal can be used to adjust the pitch period index value of the secondary channel signal to determine the upper limit of the pitch period index value of the secondary channel signal.
  • the upper limit of the pitch period index value of the secondary channel signal indicates the upper limit that the value of the pitch period index value of the secondary channel signal cannot exceed.
• the closed-loop pitch period reference value of the secondary channel signal, the pitch period index value of the secondary channel signal and the upper limit of the pitch period index value of the secondary channel signal can then be used to determine the estimated value of the pitch period of the secondary channel signal.
• after the decoding end determines the closed-loop pitch period reference value of the secondary channel signal, the pitch period index value of the secondary channel signal and the upper limit of the pitch period index value of the secondary channel signal, differential decoding is performed based on these three values, and the estimated value of the pitch period of the secondary channel signal is output.
• calculating the estimated value of the pitch period of the secondary channel signal based on the closed-loop pitch period reference value of the secondary channel signal, the pitch period index value of the secondary channel signal and the upper limit of the pitch period index value of the secondary channel signal includes:
  • the estimated value of the pitch period T0_pitch of the secondary channel signal is calculated as follows:
• T0_pitch = f_pitch_prim + (soft_reuse_index - soft_reuse_index_high_limit/M)/N;
  • f_pitch_prim represents the closed-loop pitch period reference value of the secondary channel signal
  • soft_reuse_index represents the pitch period index value of the secondary channel signal
  • N represents the number of sub-frames divided into the secondary channel signal
• M represents the adjustment factor of the upper limit of the pitch period index value of the secondary channel signal, and M is a non-zero real number
  • / represents the division operator
  • + represents the addition operator
• for example, N represents the number of subframes into which the secondary channel signal is divided, and the value of N can be 3, 4 or 5; M represents the adjustment factor of the upper limit of the pitch period index value of the secondary channel signal, and the value of M can be 2 or 3.
  • the value of N and M depends on the application scenario and is not limited here.
  • the calculation of the pitch period estimation value of the secondary channel signal in the embodiment of the present application may not be limited to the above formula.
• for example, a correction factor may be set and multiplied by f_pitch_prim + (soft_reuse_index - soft_reuse_index_high_limit/M)/N, and the result can be used as the final output T0_pitch.
• alternatively, a correction factor can be added to f_pitch_prim + (soft_reuse_index - soft_reuse_index_high_limit/M)/N.
• the specific value of the correction factor is not limited, and the final T0_pitch can be calculated in either way.
  • the integer part of the pitch period estimation value of the secondary channel signal can be further calculated according to the pitch period estimation value T0_pitch of the secondary channel signal.
• where INT(T0_pitch) represents the rounding operation on T0_pitch, T0 is the integer part of the pitch period of the decoded secondary channel, and T0_frac is the fractional part of the pitch period of the decoded secondary channel. A decoder-side sketch of this reconstruction is given below.
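```c
/* Decoder-side sketch:
 *   T0_pitch = f_pitch_prim + (soft_reuse_index - soft_reuse_index_high_limit / M) / N
 * followed by splitting T0_pitch into an integer part T0 (INT(T0_pitch)) and
 * a fractional part T0_frac. The struct wrapper and the literal fractional
 * split are illustrative assumptions. */
typedef struct {
    float t0_pitch;   /* pitch period estimate of the secondary channel              */
    int   t0;         /* integer part of the decoded secondary channel pitch period   */
    float t0_frac;    /* fractional part of the decoded secondary channel pitch period*/
} secondary_pitch_t;

static secondary_pitch_t decode_secondary_pitch(float f_pitch_prim,
                                                float soft_reuse_index,
                                                float soft_reuse_index_high_limit,
                                                int N, int M)
{
    secondary_pitch_t out;
    out.t0_pitch = f_pitch_prim
                 + (soft_reuse_index - soft_reuse_index_high_limit / (float)M) / (float)N;
    out.t0      = (int)out.t0_pitch;                 /* INT(T0_pitch) */
    out.t0_frac = out.t0_pitch - (float)out.t0;      /* literal fractional remainder */
    return out;
}
```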
• since the pitch period estimation value of the primary channel signal is used in the embodiment of the present application to differentially encode the pitch period of the secondary channel signal, only a small amount of bit resources needs to be allocated to the secondary channel signal so that its pitch period can be differentially coded. By performing differential coding on the pitch period of the secondary channel signal, the spatial sense and sound image stability of the stereo signal can be improved.
• in addition, smaller bit resources are used to perform differential coding of the pitch period of the secondary channel signal, so the saved bit resources can be used for other stereo coding parameters, thereby improving the coding efficiency of the secondary channel and ultimately the overall stereo coding quality.
• the pitch period estimation value of the primary channel signal and the pitch period index value of the secondary channel signal can be used to perform differential decoding on the pitch period of the secondary channel signal, so that the estimated value of the pitch period of the secondary channel signal is obtained.
• the estimated value of the pitch period of the secondary channel signal can then be used to decode the stereo coded stream, so the spatial perception and sound image stability of the stereo signal can be improved.
  • the pitch period encoding scheme for the secondary channel signal determines whether the pitch period of the secondary channel signal can be differentially encoded during the secondary channel signal pitch period encoding process.
• a differential coding method oriented to the pitch period of the secondary channel signal is used to encode the pitch period of the secondary channel signal; only a small amount of bit resources is used for differential encoding, and the saved bits are allocated to other stereo coding parameters, so as to achieve accurate secondary channel signal pitch period coding and improve the overall stereo coding quality.
  • the stereo signal may be an original stereo signal, a stereo signal composed of two signals contained in a multi-channel signal, or a stereo signal composed of multiple signals contained in a multi-channel signal.
• Stereo encoding can constitute an independent stereo encoder, and can also be used in the core encoding part of a multi-channel encoder to perform stereo encoding on a two-channel signal composed of multiple signals contained in a multi-channel signal.
  • the embodiment of the present application uses an encoding rate of a stereo signal of 24.4 kbps as an example for description. It is understandable that the embodiment of the present application is not limited to implementation at a coding rate of 24.4 kbps, and can also be applied to lower-rate stereo encoding.
  • the embodiment of this application proposes a method for determining pitch period coding in stereo coding.
  • the stereo coding can be time-domain stereo coding, frequency-domain stereo coding, or time-frequency stereo coding, which is not done in this embodiment. limited.
• taking frequency domain stereo coding as an example, the following describes the encoding and decoding process of stereo coding, focusing on the coding process of the pitch period in the secondary channel signal coding in the subsequent steps. Specifically:
  • S01 Perform time domain preprocessing on the left and right channel time domain signals.
  • the stereo signal of the current frame includes the left channel time domain signal of the current frame and the right channel time domain signal of the current frame.
• the left channel time domain signal of the current frame is denoted as x_L(n), and the right channel time domain signal of the current frame is denoted as x_R(n).
• the left and right channel time domain signals of the current frame are short for the left channel time domain signal of the current frame and the right channel time domain signal of the current frame.
• Performing time domain preprocessing on the left and right channel time domain signals of the current frame may specifically include: performing high-pass filtering on the left and right channel time domain signals of the current frame respectively to obtain the preprocessed left and right channel time domain signals of the current frame; the preprocessed left channel time domain signal of the current frame is denoted x_L_HP(n), and the preprocessed right channel time domain signal of the current frame is denoted x_R_HP(n).
  • the left and right channel time domain signals preprocessed in the current frame are the abbreviations for the left channel time domain signals preprocessed in the current frame and the right channel time domain signals preprocessed in the current frame.
  • the high-pass filtering process can be an infinite impulse response (IIR) filter with a cut-off frequency of 20 Hz, or other types of filters.
• the transfer function of a high-pass filter with a sampling rate of 16 kHz and a cut-off frequency of 20 Hz is:
• H_20Hz(z) = (b0 + b1*z^-1 + b2*z^-2) / (1 + a1*z^-1 + a2*z^-2)
• where b0 = 0.994461788958195, b1 = -1.988923577916390, b2 = 0.994461788958195, a1 = 1.988892905899653, a2 = -0.988954249933127, and z is the transformation factor in the Z-transform domain.
  • the corresponding time domain filter is:
• x_L_HP(n) = b0*x_L(n) + b1*x_L(n-1) + b2*x_L(n-2) - a1*x_L_HP(n-1) - a2*x_L_HP(n-2),
  • the time-domain preprocessing of the left and right channel time-domain signals of the current frame is not a necessary step. If there is no time domain preprocessing step, the left and right channel signals used for time delay estimation are the left and right channel signals in the original stereo signal.
  • the left and right channel signals in the original stereo signal refer to the collected pulse code modulation (PCM) signal after analog-to-digital conversion.
  • the sampling rate of the signal can include 8KHz, 16KHz, 32KHz, 44.1KHz and 48KHz.
  • the preprocessing may also include other processing, such as pre-emphasis processing, which is not limited in this embodiment of the application.
  • S02 Perform time domain analysis according to the preprocessed left and right channel signals.
  • time-domain analysis may include transient detection and the like.
• the transient detection may be to perform energy detection on the preprocessed left and right channel time-domain signals of the current frame, to detect whether the current frame has a sudden energy change. For example, the energy E_cur_L of the preprocessed left channel time domain signal of the current frame is calculated, and transient detection is performed according to the absolute value of the difference between the energy E_pre_L of the preprocessed left channel time domain signal of the previous frame and the energy E_cur_L of the preprocessed left channel time domain signal of the current frame, to obtain the transient detection result of the preprocessed left channel time domain signal of the current frame. Similarly, the same method can be used to perform transient detection on the preprocessed right channel time domain signal of the current frame. A sketch of this energy comparison is given below.
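```c
#include <math.h>
#include <stddef.h>

/* Sketch of an energy-based transient detection for one channel, assuming a
 * simple per-frame energy comparison: the frame energy is compared with the
 * stored energy of the previous frame and a transient is flagged when the
 * absolute difference exceeds a threshold (the threshold is an assumed
 * parameter, not taken from the text). */
static int detect_transient(const float *x, size_t frame_len,
                            float *prev_energy, float threshold)
{
    float cur_energy = 0.0f;
    for (size_t i = 0; i < frame_len; i++)
        cur_energy += x[i] * x[i];                  /* E_cur of the current frame */

    int is_transient = fabsf(cur_energy - *prev_energy) > threshold;
    *prev_energy = cur_energy;                      /* becomes E_pre for the next frame */
    return is_transient;
}
```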
  • Time domain analysis can include other time domain analysis in addition to transient detection, for example, it can include time domain inter-channel time difference (ITD) determination, time domain delay alignment processing, and pre-band extension. Processing etc.
• the preprocessed left channel signal may be subjected to discrete Fourier transform to obtain the left channel frequency domain signal, and the preprocessed right channel signal may be subjected to discrete Fourier transform to obtain the right channel frequency domain signal.
  • two consecutive discrete Fourier transforms are generally processed by the method of overlap and addition, and sometimes the input signal of the discrete Fourier transform is filled with zeros.
• a discrete Fourier transform is performed for each subframe.
  • ITD parameters There are many methods for determining ITD parameters, which may be performed only in the frequency domain, may only be performed in the time domain, or may be determined by a time-frequency combination method, which is not limited in the embodiment of the present application.
  • the left and right channel correlation coefficients can be used to extract the ITD parameters.
• for example, if max(Cn(i)) is greater than or equal to max(Cp(i)), the ITD parameter value is the opposite of the index value corresponding to max(Cn(i)); otherwise, the ITD parameter value is the index value corresponding to max(Cp(i)).
• the ITD parameters can also be determined in the frequency domain based on the left and right channel frequency domain signals; for example, time-frequency transformation technologies such as the discrete Fourier transform (DFT), the fast Fourier transform (FFT), and the modified discrete cosine transform (MDCT) can be used to transform the time-domain signals into frequency-domain signals.
  • XCORR_i(k) = L_i(k) * R_i*(k), where R_i*(k) denotes the complex conjugate of R_i(k).
  • the amplitude value can be calculated in the search range -T_max ≤ j ≤ T_max:
  • the ITD parameter value is the index value corresponding to the largest amplitude value.
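  • The frequency-domain ITD determination sketched above (cross-spectrum XCORR_i(k), amplitude search over -T_max ≤ j ≤ T_max) could look roughly as follows; the frame length, the T_max value, and the function name are assumptions:

```python
import numpy as np

def estimate_itd_freq(left, right, t_max=40):
    # left, right: time-domain samples of one frame/subframe (same length).
    # t_max: assumed maximum ITD in samples at the current sampling rate.
    x_l = np.asarray(left, dtype=float)
    x_r = np.asarray(right, dtype=float)
    n = len(x_l)
    L = np.fft.rfft(x_l, n)
    R = np.fft.rfft(x_r, n)
    xcorr = L * np.conj(R)               # XCORR(k) = L(k) * conj(R(k))
    corr = np.fft.irfft(xcorr, n)        # lag-domain cross-correlation
    lags = np.arange(-t_max, t_max + 1)
    amp = np.abs(corr[lags])             # negative lags wrap around circularly
    return int(lags[int(np.argmax(amp))])
```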
  • the ITD parameters need to be subjected to residual coding and entropy coding in the encoder, and then written into the stereo coding stream.
  • the time shift adjustment can also be performed once for the entire frame: if the frame is divided into subframes, the time shift adjustment is performed for each subframe, and if the frame is not divided, the time shift adjustment is performed for the whole frame.
  • frequency domain stereo parameters can include, but are not limited to: inter-channel phase difference (IPD) parameters, inter-channel level difference (ILD, also known as inter-channel amplitude difference) parameters, sub-band edge gains, and so on, which are not limited in the embodiment of this application.
  • the primary channel signal and the secondary channel signal of the current frame can be calculated from the left channel frequency-domain signal of the current frame and the right channel frequency-domain signal of the current frame; the primary channel signal and the secondary channel signal of each subband corresponding to the preset low frequency band of the current frame can be calculated from the left and right channel frequency-domain signals of each subband corresponding to the preset low frequency band of the current frame; the primary channel signal and the secondary channel signal of each subframe of the current frame can also be calculated from the left and right channel frequency-domain signals of each subframe of the current frame; and the primary channel signal and the secondary channel signal of each subband corresponding to the preset low frequency band in each subframe of the current frame can also be calculated from the left and right channel frequency-domain signals of each subband corresponding to the preset low frequency band in each subframe of the current frame.
  • the main channel signal can be obtained by adding the two signals
  • the secondary channel signal can be obtained by subtracting the two signals.
  • the main channel signal and the secondary channel signal of each subframe are converted to the time domain by the inverse discrete Fourier transform, and overlap-add processing is performed across subframes to obtain the time-domain main channel signal and secondary channel signal of the current frame.
  • the process of obtaining the primary channel signal and the secondary channel signal in step S07 is called down-mixing processing.
  • in step S08, the primary channel signal and the secondary channel signal are encoded.
  • bit allocation between primary channel signal encoding and secondary channel signal encoding can be performed according to the parameter information obtained when encoding the primary channel signal and the secondary channel signal of the previous frame and the total number of bits available for primary channel signal encoding and secondary channel signal encoding. Then, the main channel signal and the secondary channel signal are encoded separately according to the result of the bit allocation.
  • the encoding of the primary channel signal and the encoding of the secondary channel signal can use any mono audio encoding technology.
  • the ACELP encoding method is used to encode the primary channel signal and the secondary channel signal obtained by the downmix processing.
  • ACELP coding methods usually include: determining linear prediction coefficients (LPC) and converting them into line spectral frequency (LSF) parameters for quantization and coding; searching the adaptive codebook excitation to determine the pitch period and the adaptive codebook gain, and quantizing and coding the pitch period and the adaptive codebook gain respectively; and searching the algebraic codebook excitation to determine the pulse index and gain of the algebraic codebook excitation, and quantizing and coding the pulse index and gain of the algebraic codebook excitation respectively.
  • FIG. 6 is a flowchart of encoding the pitch period parameter of the primary channel signal and the pitch period parameter of the secondary channel signal according to this embodiment of the application.
  • the process shown in FIG. 6 includes the following steps S09 to S12.
  • the process of encoding the pitch period parameter of the primary channel signal and the pitch period parameter of the secondary channel signal is:
  • the pitch period estimation adopts the combination of open-loop pitch analysis and closed-loop pitch search, which improves the accuracy of pitch period estimation.
  • Many methods can be used to estimate the pitch period of speech, such as autocorrelation function and short-term average amplitude difference.
  • the pitch period estimation algorithm is based on the autocorrelation function.
  • the autocorrelation function has a peak at an integer multiple of the pitch period. This feature can be used to estimate the pitch period.
  • pitch period detection uses a fractional delay with 1/3 as the sampling resolution.
  • pitch period estimation includes two steps: open-loop pitch analysis and closed-loop pitch search.
  • the open-loop pitch analysis is used to roughly estimate the integer delay of a frame of speech to obtain a candidate integer delay.
  • the closed-loop pitch search estimates the pitch delay in its vicinity, and the closed-loop pitch search is performed once every subframe.
  • the open-loop pitch analysis is performed once per frame, and the autocorrelation, normalization processing, and optimal open-loop integer delay are calculated respectively.
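  • As an illustration of the autocorrelation-based open-loop pitch analysis, a minimal per-frame sketch is given below; the lag search range and the normalization are assumed values, not taken from the application:

```python
import numpy as np

def open_loop_pitch(frame, lag_min=34, lag_max=231):
    # Rough integer pitch lag of one frame from the normalized autocorrelation.
    x = np.asarray(frame, dtype=float)
    best_lag, best_corr = lag_min, -np.inf
    for lag in range(lag_min, min(lag_max, len(x) - 1) + 1):
        a, b = x[lag:], x[:len(x) - lag]
        corr = float(np.dot(a, b)) / (np.sqrt(np.dot(a, a) * np.dot(b, b)) + 1e-12)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag  # candidate integer delay, later refined by the closed-loop search
```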
  • the estimated value of the pitch period of the main channel signal obtained through the above steps, in addition to being used as the pitch period encoding parameter of the main channel signal, will also be used as the pitch period reference value of the secondary channel signal.
  • the secondary channel pitch period differential coding decision is made according to the estimated value of the primary channel's pitch period and the estimated value of the secondary channel signal's open-loop pitch period.
  • the decision conditions are:
  • DIFF represents the difference between the estimated value of the pitch period of the primary channel signal and the estimated value of the open-loop pitch period of the secondary channel signal, that is, DIFF = |pitch[0] - pitch[1]|, where pitch[0] represents the estimated value of the pitch period of the main channel signal and pitch[1] represents the estimated value of the open-loop pitch period of the secondary channel signal.
  • the differential coding flag for the pitch period of the secondary channel is denoted Pitch_reuse_flag.
  • the pitch period multiplexing method of the secondary channel signal can also be used, that is, the pitch period of the secondary channel signal is not encoded at the encoding end.
  • the decoding end decodes the pitch period of the primary channel signal as the pitch period of the secondary channel signal.
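  • The decision of step S11 can be summarized in the following sketch, assuming a numeric threshold and assuming that the first value of Pitch_reuse_flag is 1; both assumptions are for illustration only:

```python
def decide_secondary_pitch_mode(pitch_prim, pitch_sec_open_loop, diff_threshold=15):
    # pitch_prim:          estimated pitch period of the primary channel signal
    # pitch_sec_open_loop: open-loop pitch period estimate of the secondary channel
    # diff_threshold:      assumed differential-coding threshold value
    diff = abs(pitch_prim - pitch_sec_open_loop)
    if diff > diff_threshold:
        pitch_reuse_flag = 1   # assumed "first value": differentially encode
        return 'differential', pitch_reuse_flag
    pitch_reuse_flag = 0       # assumed "second value": no differential coding;
    # the secondary pitch is then either encoded independently or the primary
    # pitch period is reused, signalled by a separate multiplexing identifier.
    return 'independent_or_reuse', pitch_reuse_flag
```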
  • the specific steps of pitch period differential coding of the secondary channel signal include:
  • S121 Perform a closed-loop pitch period search of the secondary channel signal according to the estimated value of the pitch period of the primary channel signal, and determine the estimated value of the pitch period of the secondary channel signal.
  • S12101 Determine a reference value of the closed-loop pitch period of the secondary channel signal according to the estimated value of the pitch period of the primary channel signal.
  • the coding rate of 24.4 kbps is taken as an example.
  • the pitch period coding is performed in subframes, the main channel signal is divided into 5 subframes, and the secondary channel signal is divided into 4 subframes.
  • One method is to directly use the pitch period of the main channel signal as the reference value of the pitch period of the secondary channel signal, that is, four of the pitch period values in the 5 subframes of the main channel signal are selected as reference values for the pitch period of the 4 subframes of the secondary channel signal.
  • Another method is to use an interpolation method to map the pitch period in the 5 subframes of the primary channel signal to the pitch period reference value of the 4 subframes of the secondary channel signal.
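  • The two ways of deriving the 4 secondary-channel reference values from the 5 primary-channel subframe pitch periods could be sketched as follows; the choice of which 4 values to select and the linear interpolation weights are assumptions:

```python
import numpy as np

def secondary_pitch_references(primary_subframe_pitch, method='interp'):
    # primary_subframe_pitch: pitch periods of the 5 primary-channel subframes.
    p = np.asarray(primary_subframe_pitch, dtype=float)
    if method == 'select':
        # Option 1: directly pick 4 of the 5 primary values (here the first 4,
        # an assumed choice) as the 4 secondary-channel reference values.
        return p[:4]
    # Option 2: map the 5 values onto 4 subframe positions by linear interpolation.
    src = np.linspace(0.0, 1.0, num=5)
    dst = np.linspace(0.0, 1.0, num=4)
    return np.interp(dst, src, p)
```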
  • S12102 Perform a closed-loop pitch period search of the secondary channel signal according to the reference value of the pitch period of the secondary channel signal to determine the pitch period of the secondary channel signal. Specifically: use the closed-loop pitch period reference value of the secondary channel signal as the starting point of the closed-loop pitch period search of the secondary channel signal, perform the closed-loop pitch period search with integer precision and fractional precision, and obtain the estimated value of the pitch period of the secondary channel signal by calculating the interpolated normalized correlation.
  • one of the methods is to use 2 bits for the pitch period coding of the secondary channel signal, specifically:
  • Using loc_T0 as the starting point of the search, an integer-precision search of the pitch period of the secondary channel signal is performed within the range [loc_T0-1, loc_T0+1]; for each search point, loc_frac_prim is used as the initial value and a fractional-precision search of the pitch period of the secondary channel signal is performed within [loc_frac_prim+2, loc_frac_prim+3], or [loc_frac_prim, loc_frac_prim-3], or [loc_frac_prim-2, loc_frac_prim+1]; the interpolated normalized correlation corresponding to each search point is calculated, giving the correlations corresponding to multiple search points in one frame. The search point at which the interpolated normalized correlation reaches its maximum value is the optimal estimated value of the pitch period of the secondary channel signal.
  • the integer part of the estimate is denoted pitch_soft_reuse and the fractional part is denoted pitch_frac_soft_reuse.
  • another method is to use 3 bits to 5 bits to encode the pitch period of the secondary channel signal, specifically:
  • the search radius half_range is 1, 2, and 4 respectively.
  • Using loc_T0 as the starting point of the search, an integer-precision search of the pitch period of the secondary channel signal is performed within the range [loc_T0-half_range, loc_T0+half_range]; each search point then uses loc_frac_prim as the initial value for a fractional-precision search, and the interpolated normalized correlation corresponding to each search point is calculated. The search point at which the interpolated normalized correlation reaches its maximum is the optimal estimated value of the pitch period of the secondary channel signal, where the integer part is pitch_soft_reuse and the fractional part is pitch_frac_soft_reuse.
  • S122 Perform differential encoding using the pitch period of the primary channel signal and the pitch period of the secondary channel signal. It can include the following processes:
  • the upper limit of the secondary channel signal pitch period index is calculated by the following formula: soft_reuse_index_high_limit = 0.5 + 2^Z, where Z is the adjustment factor of the search range of the pitch period of the secondary channel, and Z can be 3, 4, or 5.
  • the sub-channel signal pitch period index represents the result of performing differential encoding on the difference between the reference value of the sub-channel signal pitch period obtained in the foregoing steps and the optimal sub-channel signal pitch period estimated value.
  • the sub-channel signal pitch period index value soft_reuse_index is calculated by the following formula:
  • soft_reuse_index = (4*pitch_soft_reuse + pitch_frac_soft_reuse) - (4*loc_T0 + loc_frac_prim) + soft_reuse_index_high_limit/2.
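  • Putting the formulas of S122 together for the 4-subframe case (so the subframe count 4 plays the role of N and the upper limit is halved, i.e. M = 2), a minimal encoder-side sketch is shown below; the numeric example values are assumptions:

```python
def secondary_pitch_index(loc_T0, loc_frac_prim,
                          pitch_soft_reuse, pitch_frac_soft_reuse, Z=4):
    # loc_T0, loc_frac_prim: integer/fractional parts of the reference value
    # pitch_soft_reuse, pitch_frac_soft_reuse: integer/fractional parts of the
    #     closed-loop estimate (fractional parts expressed in 1/4 units)
    # Z: pitch period search range adjustment factor (3, 4 or 5)
    soft_reuse_index_high_limit = 0.5 + 2 ** Z
    soft_reuse_index = ((4 * pitch_soft_reuse + pitch_frac_soft_reuse)
                        - (4 * loc_T0 + loc_frac_prim)
                        + soft_reuse_index_high_limit / 2)
    return soft_reuse_index, soft_reuse_index_high_limit

# Example: a closed-loop estimate one sample above the reference value.
idx, hi_limit = secondary_pitch_index(57, 1, 58, 1, Z=4)   # idx = 12.25, hi_limit = 16.5
```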
  • S1223 Perform differential encoding on the pitch period index of the secondary channel signal.
  • in the pitch period coding method of the secondary channel signal adopted in the embodiment of the present application, each coded frame is divided into 4 subframes, and the pitch period of each subframe is differentially coded.
  • 22 bits or 18 bits can be saved and allocated to other coding parameters for quantization coding.
  • the saved bit overhead can be allocated to a fixed codebook.
  • the effect of saving the coding overhead of the secondary channel signal in the embodiment of the present application will be illustrated.
  • the numbers of pitch period coding bits allocated to the 4 subframes are 10, 6, 9, and 6 respectively, which means that 31 bits are needed per frame for the encoding.
  • each subframe only needs 3 bits for differential encoding, and 1 bit is needed to indicate whether to differentially encode the pitch period of the secondary channel signal.
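  • As a check of these figures, assuming the 1-bit flag is counted in both variants: independent coding uses 10 + 6 + 9 + 6 = 31 bits per frame; the 3-bit differential variant uses 4*3 + 1 = 13 bits, saving 31 - 13 = 18 bits; and the 2-bit variant described earlier uses 4*2 + 1 = 9 bits, saving 31 - 9 = 22 bits, consistent with the 22-bit and 18-bit savings mentioned above.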
  • FIG. 8 is a comparison diagram of the number of bits allocated to the fixed codebook after independent encoding and after differential encoding.
  • the solid line is the number of bits allocated to the fixed codebook after independent encoding, and the dotted line is the number of bits allocated to the fixed codebook after differential encoding.
  • It can be seen from FIG. 8 that the large amount of bit resources saved by using pitch period differential coding for the secondary channel signal is allocated to the quantization coding of the fixed codebook, so that the coding quality of the secondary channel signal is improved.
  • the secondary channel signal pitch period multiplexing flag can also be used to indicate that the pitch period of the primary channel signal is multiplexed as the pitch period of the secondary channel signal.
  • the decoding end can decode the pitch period of the primary channel signal as the pitch period of the secondary channel signal according to the secondary channel signal pitch period multiplexing identifier.
  • the pitch period coding is performed in subframes, the main channel is divided into 5 subframes, and the secondary channel is divided into 4 subframes.
  • One method is to directly use the pitch period of the main channel as the reference value of the pitch period of the secondary channel, that is, four of the pitch period values in the 5 subframes of the main channel are selected as reference values for the pitch period of the 4 subframes of the secondary channel.
  • Another method is to use an interpolation method to map the pitch period in the 5 sub-frames of the main channel to the pitch period reference value of the 4 sub-frames in the secondary channel.
  • S1402 Calculate the reference value of the closed-loop pitch period of the secondary channel.
  • the reference value f_pitch_prim of the closed-loop pitch period of the secondary channel is calculated using the following formula: f_pitch_prim = loc_T0 + loc_frac_prim/4.0, where loc_T0 and loc_frac_prim are the integer part and the fractional part of the closed-loop pitch period of the secondary channel determined from the decoded pitch period of the primary channel signal.
  • the upper limit of the secondary channel pitch period index is calculated by the following formula: soft_reuse_index_high_limit = 0.5 + 2^Z, where Z is the adjustment factor of the search range of the pitch period of the secondary channel, and Z can be 3, 4, or 5.
  • T0_pitch = f_pitch_prim + (soft_reuse_index - soft_reuse_index_high_limit/2.0)/4.0.
  • T0_frac = (T0_pitch - T0)*4.0.
  • T0 = INT(T0_pitch), where INT(T0_pitch) represents taking the integer part of T0_pitch.
  • T0 is the integer part of the pitch period of the decoded secondary channel
  • T0_frac is the fractional part of the pitch period of the decoded secondary channel.
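  • A decoder-side sketch mirroring the formulas above (4 subframes, index upper limit divided by 2.0); the names follow the document, while the round-trip example values are assumptions:

```python
def decode_secondary_pitch(loc_T0, loc_frac_prim, soft_reuse_index, Z=4):
    # Reconstruct the secondary-channel pitch period from the decoded index.
    f_pitch_prim = loc_T0 + loc_frac_prim / 4.0            # closed-loop reference value
    soft_reuse_index_high_limit = 0.5 + 2 ** Z
    T0_pitch = f_pitch_prim + (soft_reuse_index - soft_reuse_index_high_limit / 2.0) / 4.0
    T0 = int(T0_pitch)                                     # integer part (INT operation)
    T0_frac = (T0_pitch - T0) * 4.0                        # fractional part in 1/4 units
    return T0, T0_frac

# Round trip with the encoder sketch above: index 12.25 and reference 57 + 1/4
# decode back to T0 = 58, T0_frac = 1.0.
print(decode_secondary_pitch(57, 1, 12.25, Z=4))
```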
  • FIG. 9 is a schematic diagram of a time-domain stereo coding method according to an embodiment of this application. Specifically:
  • S21 Perform time domain preprocessing on the stereo time domain signal to obtain preprocessed stereo left and right channel signals.
  • the stereo signal of the current frame includes the left channel time domain signal of the current frame and the right channel time domain signal of the current frame.
  • the left channel time domain signal of the current frame is denoted as x_L(n)
  • time domain preprocessing is performed on the left and right channel time-domain signals of the current frame; specifically, it may include high-pass filtering of the left and right channel time-domain signals of the current frame to obtain the preprocessed left and right channel signals of the current frame.
  • the left channel time domain signal after the current frame preprocessing is denoted as
  • the left and right channel signals used for time delay estimation are the left and right channel signals in the original stereo signal.
  • the left and right channel signals in the original stereo signal refer to the collected PCM signals after A/D conversion.
  • the sampling rate of the signal may include 8 kHz, 16 kHz, 32 kHz, 44.1 kHz and 48 kHz.
  • the pre-processing may also include other processing, such as pre-emphasis processing, which is not limited in the embodiment of the present application.
  • S22 Perform time delay estimation according to the preprocessed left and right channel time domain signals of the current frame to obtain the estimated inter-channel delay difference of the current frame.
  • the cross-correlation function between the left and right channels can be calculated based on the time-domain signals of the left and right channels after the current frame is preprocessed. Then, the maximum value of the cross-correlation function is searched as the estimated inter-channel delay difference of the current frame.
  • T max corresponds to the maximum value of the inter-channel delay difference at the current sampling rate
  • T min corresponds to the minimum value of the inter-channel delay difference at the current sampling rate.
  • T max and T min are preset real numbers, and T max >T min .
  • T max is equal to 40
  • T min is equal to -40
  • the maximum value of the cross-correlation coefficient c(i) between the left and right channels is searched for in the range T_min ≤ i ≤ T_max, and the index value corresponding to this maximum is used as the estimated inter-channel delay difference of the current frame, recorded as cur_itd.
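  • A compact time-domain sketch of this delay estimation: the cross-correlation c(i) is searched over T_min ≤ i ≤ T_max and the index of its maximum is returned as cur_itd; the sign convention of the lag and the use of an unnormalized correlation are assumptions:

```python
import numpy as np

def estimate_delay_td(left, right, t_min=-40, t_max=40):
    # Return cur_itd: the lag that maximizes the cross-correlation of the channels.
    x_l = np.asarray(left, dtype=float)
    x_r = np.asarray(right, dtype=float)
    best_lag, best_val = t_min, -np.inf
    for lag in range(t_min, t_max + 1):
        if lag >= 0:
            a, b = x_l[lag:], x_r[:len(x_r) - lag]
        else:
            a, b = x_l[:len(x_l) + lag], x_r[-lag:]
        val = float(np.dot(a, b))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag
```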
  • the embodiments of the present application do not limit the method of time delay estimation; for example, the cross-correlation function between the left and right channels may also be calculated based on the preprocessed left and right channel time-domain signals of the current frame, or based on the original left and right channel time-domain signals of the current frame.
  • It may also include performing inter-frame smoothing on the inter-channel delay differences estimated in the previous M frames (M is an integer greater than or equal to 1) and the inter-channel delay difference estimated in the current frame, and using the smoothed inter-channel delay difference as the final estimated inter-channel delay difference of the current frame.
  • The inter-channel delay difference estimated in the current frame is obtained by searching for the maximum value of the cross-correlation coefficient c(i) between the left and right channels within the range T_min ≤ i ≤ T_max and taking the index value corresponding to that maximum.
  • S23 Perform time delay alignment processing on the stereo left and right channel signals according to the estimated time delay difference between the channels in the current frame to obtain the time delay aligned stereo signal.
  • In the embodiments of the present application, there are many methods for performing delay alignment processing on the stereo left and right channel signals. For example, according to the estimated inter-channel delay difference of the current frame and the inter-channel delay difference of the previous frame, one or both of the stereo left and right channel signals are compressed or stretched, so that no delay difference remains between the two channels in the delay-aligned stereo signal obtained after processing.
  • the embodiment of the present application is not limited to the delay alignment processing method described above.
  • the time domain signal of the left channel after the current frame delay alignment is denoted as x′_L(n), and the time domain signal of the right channel after the current frame delay alignment is denoted as x′_R(n).
  • there are many methods of quantizing the inter-channel delay difference, for example, quantizing the inter-channel delay difference estimated in the current frame to obtain a quantization index, and then encoding the quantization index.
  • the quantization index is coded and written into the code stream.
  • there are many methods of calculating the channel combination scale factor in the embodiments of the present application. For example, first, the frame energies of the left and right channels are calculated according to the time-domain signals of the left and right channels after the current frame delay alignment.
  • the frame energy rms_L of the left channel of the current frame satisfies:
  • the frame energy rms_R of the right channel of the current frame satisfies:
  • x′ L (n) is the time domain signal of the left channel after the current frame delay is aligned
  • x′ R (n) is the time domain signal of the right channel after the current frame time delay is aligned.
  • the channel combination scale factor of the current frame is calculated.
  • the calculated channel combination scale factor of the current frame is quantized to obtain the quantization index ratio_idx corresponding to the scale factor and the quantized channel combination scale factor ratio qua of the current frame:
  • ratio_qua = ratio_tabl[ratio_idx]
  • ratio_tabl is a scalar quantized codebook.
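  • The frame-energy and scale-factor expressions themselves are not reproduced in this text, so the following sketch only illustrates one plausible energy-based choice of the ratio and the table lookup ratio_qua = ratio_tabl[ratio_idx]; the energy-ratio formula and the 5-bit uniform codebook are assumptions, not the formulas of the application:

```python
import numpy as np

def channel_combination_ratio(xL_aligned, xR_aligned):
    # One assumed choice: the left channel's share of the total RMS energy.
    x_l = np.asarray(xL_aligned, dtype=float)
    x_r = np.asarray(xR_aligned, dtype=float)
    rms_L = np.sqrt(np.mean(np.square(x_l)))
    rms_R = np.sqrt(np.mean(np.square(x_r)))
    return float(rms_L / (rms_L + rms_R + 1e-12))

# Assumed 5-bit uniform scalar quantization codebook on [0, 1].
ratio_tabl = np.linspace(0.0, 1.0, 32)

def quantize_ratio(ratio):
    ratio_idx = int(np.argmin(np.abs(ratio_tabl - ratio)))
    ratio_qua = float(ratio_tabl[ratio_idx])   # ratio_qua = ratio_tabl[ratio_idx]
    return ratio_idx, ratio_qua
```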
  • the quantization coding can use any scalar quantization method, such as uniform scalar quantization or non-uniform scalar quantization, and the number of coding bits can be, for example, 5 bits. The specific method is not described here.
  • the embodiments of the present application are not limited to the above-mentioned channel combination scale factor calculation and quantization coding methods.
  • S26 Perform time-domain down-mixing processing on the time-delay aligned stereo signal according to the channel combination scale factor to obtain a primary channel signal and a secondary channel signal.
  • any time-domain downmixing process in the embodiments of the present application can be used for implementation. However, it should be noted that the corresponding time-domain downmixing method needs to be selected according to the method used to calculate the channel combination scale factor, and time-domain downmixing is then performed on the delay-aligned stereo signal to obtain the primary channel signal and the secondary channel signal.
  • If the above method of calculating the channel combination scale factor in step S25 is used, the corresponding time-domain down-mixing process may be: performing the time-domain down-mixing according to the channel combination scale factor ratio.
  • The main channel signal Y(n) and the secondary channel signal X(n) obtained after the time-domain downmix processing corresponding to the first channel combination solution satisfy:
  • the embodiments of the present application are not limited to the time-domain downmixing processing method described above.
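  • Since the downmix expressions for Y(n) and X(n) are likewise not reproduced here, the following is only a commonly used ratio-based form, given as an assumption rather than the formula of the application:

```python
import numpy as np

def td_downmix(xL_aligned, xR_aligned, ratio):
    # Assumed ratio-based time-domain downmix of the delay-aligned channels.
    x_l = np.asarray(xL_aligned, dtype=float)
    x_r = np.asarray(xR_aligned, dtype=float)
    Y = ratio * x_l + (1.0 - ratio) * x_r          # primary channel signal Y(n)
    X = (1.0 - ratio) * x_l - ratio * x_r          # secondary channel signal X(n)
    return Y, X
```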
  • For the content included in step S27, refer to the description of step S10 to step S12 in the foregoing embodiment; details are not repeated here.
  • a stereo encoding device 1000 provided by an embodiment of the present application may include: a downmixing module 1001, a determining module 1002, and a differential encoding module 1003, where:
  • the downmix module 1001 is used to perform downmix processing on the left channel signal of the current frame and the right channel signal of the current frame to obtain the main channel signal of the current frame and the secondary channel signal of the current frame;
  • the determining module 1002 is configured to determine whether to perform differential encoding on the pitch period of the secondary channel signal
  • the differential encoding module 1003 is configured to, when determining to perform differential encoding on the pitch period of the secondary channel signal, use the pitch period estimate value of the primary channel signal to differentiate the pitch period of the secondary channel signal Encoding to obtain the pitch period index value of the secondary channel signal, and the pitch period index value of the secondary channel signal is used to generate a stereo coded stream to be transmitted.
  • the determining module includes:
  • the main channel encoding module is configured to encode the main channel signal of the current frame to obtain the estimated value of the pitch period of the main channel signal;
  • An open-loop analysis module configured to perform an open-loop pitch period analysis on the secondary channel signal of the current frame to obtain an estimated value of the open-loop pitch period of the secondary channel signal;
  • A threshold judgment module, configured to judge whether the difference between the estimated value of the pitch period of the primary channel signal and the estimated value of the open-loop pitch period of the secondary channel signal exceeds a preset secondary channel pitch period differential encoding threshold; when the difference exceeds the secondary channel pitch period differential encoding threshold, it is determined to perform differential encoding, and when the difference does not exceed the secondary channel pitch period differential encoding threshold, it is determined not to perform differential encoding.
  • the stereo encoding device further includes an identifier configuration module, configured to, when it is determined to perform differential encoding on the pitch period of the secondary channel signal, configure the secondary channel pitch period differential encoding identifier in the current frame as a preset first value, where the stereo encoding bitstream carries the secondary channel pitch period differential encoding identifier, and the first value is used to indicate that the pitch period of the secondary channel signal is differentially encoded.
  • the stereo encoding device further includes: an independent encoding module, wherein:
  • the independent encoding module is configured to, when it is determined not to perform differential encoding on the pitch period of the secondary channel signal and not to multiplex the pitch period estimation value of the primary channel signal as the pitch period of the secondary channel signal, encode the pitch period of the secondary channel signal and the pitch period of the main channel signal separately.
  • the identifier configuration module is further configured to, when it is determined not to perform differential encoding on the pitch period of the secondary channel signal, configure the secondary channel pitch period differential encoding identifier as a preset second value, where the stereo encoding bitstream carries the secondary channel pitch period differential encoding identifier and the second value is used to indicate that the pitch period of the secondary channel signal is not differentially encoded; and, when it is determined not to multiplex the estimated value of the pitch period of the primary channel signal as the pitch period of the secondary channel signal, configure the secondary channel signal pitch period multiplexing identifier as a preset third value, where the stereo encoding bitstream carries the secondary channel signal pitch period multiplexing identifier and the third value is used to indicate that the pitch period estimation value of the primary channel signal is not multiplexed as the pitch period of the secondary channel signal;
  • the independent encoding module is configured to separately encode the pitch period of the secondary channel signal and the pitch period of the main channel signal.
  • the identifier configuration module is configured to, when it is determined not to perform differential encoding on the pitch period of the secondary channel signal and to multiplex the pitch period estimation value of the primary channel signal as the pitch period of the secondary channel signal, configure the secondary channel signal pitch period multiplexing flag as a preset fourth value, and carry the secondary channel signal pitch period multiplexing identifier in the stereo encoding bitstream, where the fourth value is used to indicate that the estimated value of the pitch period of the primary channel signal is multiplexed as the pitch period of the secondary channel signal.
  • alternatively, the identifier configuration module is configured to, when it is determined not to perform differential encoding on the pitch period of the secondary channel signal, configure the secondary channel pitch period differential encoding identifier as a preset second value, where the stereo encoding bitstream carries the secondary channel pitch period differential encoding identifier and the second value is used to indicate that the pitch period of the secondary channel signal is not differentially encoded; and, when it is determined to multiplex the estimated value of the pitch period of the primary channel signal as the pitch period of the secondary channel signal, configure the secondary channel signal pitch period multiplexing flag as a preset fourth value and carry the secondary channel signal pitch period multiplexing identifier in the stereo encoding bitstream, where the fourth value is used to indicate that the estimated value of the pitch period of the primary channel signal is multiplexed as the pitch period of the secondary channel signal.
  • the differential encoding module includes:
  • a closed-loop pitch period search module configured to search for the closed-loop pitch period of the secondary channel according to the estimated value of the pitch period of the primary channel signal to obtain the estimated value of the pitch period of the secondary channel signal;
  • An index value upper limit determination module configured to determine the upper limit of the pitch period index value of the secondary channel signal according to the pitch period search range adjustment factor of the secondary channel signal;
  • the index value calculation module is configured to calculate the pitch period index value of the secondary channel signal according to the pitch period estimation value of the primary channel signal, the pitch period estimation value of the secondary channel signal, and the upper limit of the pitch period index value of the secondary channel signal.
  • the closed-loop pitch period search module is configured to determine the closed-loop pitch period reference value of the secondary channel signal according to the estimated value of the pitch period of the primary channel signal and the number of subframes into which the secondary channel signal of the current frame is divided; and to use the closed-loop pitch period reference value of the secondary channel signal as the starting point of the closed-loop pitch period search of the secondary channel signal, and perform the closed-loop pitch period search with integer precision and fractional precision to obtain the estimated value of the pitch period of the secondary channel signal.
  • the closed-loop pitch period search module is configured to determine, according to the estimated value of the pitch period of the primary channel signal, the closed-loop pitch period integer part loc_T0 of the secondary channel signal and the closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal; the closed-loop pitch period reference value f_pitch_prim of the secondary channel signal is calculated as follows:
  • f_pitch_prim = loc_T0 + loc_frac_prim/N;
  • where N represents the number of subframes into which the secondary channel signal is divided.
  • the index value upper limit determination module is configured to calculate the pitch period index value upper limit soft_reuse_index_high_limit of the secondary channel signal in the following manner;
  • soft_reuse_index_high_limit = 0.5 + 2^Z;
  • the Z is the pitch period search range adjustment factor of the secondary channel signal, and the value of Z is: 3, or 4, or 5.
  • the index value calculation module is configured to determine, according to the estimated value of the pitch period of the primary channel signal, the closed-loop pitch period integer part loc_T0 of the secondary channel signal and the closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal; the pitch period index value soft_reuse_index of the secondary channel signal is calculated in the following way:
  • soft_reuse_index = (N*pitch_soft_reuse + pitch_frac_soft_reuse) - (N*loc_T0 + loc_frac_prim) + soft_reuse_index_high_limit/M;
  • where pitch_soft_reuse represents the integer part of the pitch period estimate of the secondary channel signal, pitch_frac_soft_reuse represents the fractional part of the pitch period estimate of the secondary channel signal, soft_reuse_index_high_limit represents the upper limit of the pitch period index value of the secondary channel signal, N represents the number of subframes into which the secondary channel signal is divided, and M represents the adjustment factor of the upper limit of the pitch period index value of the secondary channel signal, where M is a non-zero real number;
  • the * represents a multiplication operator
  • the + represents an addition operator
  • the stereo encoding device is applied to a stereo encoding scenario where the encoding rate of the current frame is lower than a preset rate threshold;
  • the rate threshold is at least one of the following values: 13.2 kilobits per second (kbps), 16.4 kbps, or 24.4 kbps.
  • a stereo decoding device 1100 provided by an embodiment of the present application may include: a determination module 1101, a value acquisition module 1102, and a differential decoding module 1103, where:
  • the determining module 1101 is configured to determine whether to perform differential decoding on the pitch period of the secondary channel signal according to the received stereo encoding code stream;
  • the value obtaining module 1102 is configured to, when it is determined to perform differential decoding on the pitch period of the secondary channel signal, obtain the estimated value of the pitch period of the main channel signal of the current frame and the pitch period index value of the secondary channel signal of the current frame from the stereo coded stream;
  • the differential decoding module 1103 is configured to perform differential decoding on the pitch period of the secondary channel signal according to the estimated value of the pitch period of the primary channel signal and the pitch period index value of the secondary channel signal to obtain The estimated value of the pitch period of the secondary channel signal, and the estimated value of the pitch period of the secondary channel signal is used to decode the stereo coded stream.
  • the determining module is configured to obtain a secondary channel pitch period differential coding identifier from the current frame; when the secondary channel pitch period differential coding identifier is a preset first When it is one value, it is determined to perform differential decoding on the pitch period of the secondary channel signal.
  • the stereo decoding device further includes: an independent decoding module, wherein:
  • An independent decoding module, configured to, when it is determined not to perform differential decoding on the pitch period of the secondary channel signal and not to multiplex the pitch period estimation value of the primary channel signal as the pitch period of the secondary channel signal, decode the pitch period of the secondary channel signal from the stereo encoding bitstream.
  • Specifically, when the secondary channel pitch period differential encoding identifier is a preset second value and the secondary channel signal pitch period multiplexing identifier carried in the stereo encoding code stream is a preset third value, it is determined not to perform differential decoding on the pitch period of the secondary channel signal and not to multiplex the estimated value of the pitch period of the primary channel signal as the pitch period of the secondary channel signal, and the pitch period of the secondary channel signal is decoded from the stereo encoding bitstream.
  • the stereo decoding device further includes: a pitch period multiplexing module, wherein:
  • the pitch period multiplexing module is configured to, when it is determined not to perform differential decoding on the pitch period of the secondary channel signal and to multiplex the pitch period estimation value of the primary channel signal as the pitch period of the secondary channel signal, use the estimated value of the pitch period of the primary channel signal as the pitch period of the secondary channel signal.
  • Specifically, when the secondary channel pitch period differential coding identifier is a preset second value and the secondary channel signal pitch period multiplexing flag carried in the stereo code stream is a preset fourth value, it is determined not to perform differential decoding on the pitch period of the secondary channel signal, and the estimated value of the pitch period of the primary channel signal is used as the pitch period of the secondary channel signal.
  • the differential decoding module includes:
  • the reference value determining sub-module is configured to determine the closed-loop pitch of the secondary channel signal according to the estimated value of the pitch period of the primary channel signal and the number of subframes into which the secondary channel signal of the current frame is divided Period reference value;
  • An index value upper limit determination submodule configured to determine the upper limit of the pitch period index value of the secondary channel signal according to the pitch period search range adjustment factor of the secondary channel signal;
  • the estimated value calculation sub-module is configured to calculate the estimated value of the pitch period of the secondary channel signal based on the closed-loop pitch period reference value of the secondary channel signal, the pitch period index value of the secondary channel signal, and the upper limit of the pitch period index value of the secondary channel signal.
  • the estimated value calculation submodule is configured to calculate the pitch period estimated value T0_pitch of the secondary channel signal in the following manner:
  • T0_pitch = f_pitch_prim + (soft_reuse_index - soft_reuse_index_high_limit/M)/N;
  • the f_pitch_prim represents the closed-loop pitch period reference value of the secondary channel signal
  • the soft_reuse_index represents the pitch period index value of the secondary channel signal
  • the N represents the number of subframes into which the secondary channel signal is divided
  • the M represents the adjustment factor of the upper limit of the pitch period index value of the secondary channel signal
  • M is a non-zero real number
  • the / represents the division operator
  • the + represents the addition operation
  • because the pitch period estimation value of the primary channel signal is used in the embodiment of the present application to differentially encode the pitch period of the secondary channel signal, only a small amount of bit resources needs to be allocated to the differential encoding of the pitch period of the secondary channel signal.
  • by differentially encoding the pitch period of the secondary channel signal, the sense of space and the sound image stability of the stereo signal can be improved.
  • moreover, because relatively few bit resources are used for the differential encoding of the pitch period of the secondary channel signal, the saved bit resources can be used for other stereo coding parameters, which improves the coding efficiency of the secondary channel and ultimately improves the overall stereo coding quality.
  • at the decoding end, the pitch period estimation value of the primary channel signal and the pitch period index value of the secondary channel signal can be used to differentially decode the pitch period of the secondary channel signal, so the estimated value of the pitch period of the secondary channel signal is obtained.
  • the estimated value of the pitch period of the secondary channel signal can then be used to decode the stereo coded stream, so the sense of space and the sound image stability of the stereo signal can be improved.
  • An embodiment of the present application further provides a computer storage medium, wherein the computer storage medium stores a program, and the program executes a part or all of the steps recorded in the foregoing method embodiment.
  • the stereo coding device 1200 includes:
  • the receiver 1201, the transmitter 1202, the processor 1203, and the memory 1204 (the number of processors 1203 in the stereo encoding device 1200 may be one or more, and one processor is taken as an example in FIG. 12).
  • the receiver 1201, the transmitter 1202, the processor 1203, and the memory 1204 may be connected by a bus or in other ways. In FIG. 12, a bus connection is taken as an example.
  • the memory 1204 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1203. A part of the memory 1204 may also include a non-volatile random access memory (NVRAM).
  • the memory 1204 stores an operating system and operating instructions, executable modules or data structures, or a subset of them, or an extended set of them, where the operating instructions may include various operating instructions for implementing various operations.
  • the operating system may include various system programs for implementing various basic services and processing hardware-based tasks.
  • the processor 1203 controls the operation of the stereo encoding device, and the processor 1203 may also be referred to as a central processing unit (CPU).
  • the various components of the stereo encoding device are coupled together through a bus system, where the bus system may include a power bus, a control bus, and a status signal bus in addition to a data bus.
  • various buses are referred to as bus systems in the figure.
  • the method disclosed in the foregoing embodiment of the present application may be applied to the processor 1203 or implemented by the processor 1203.
  • the processor 1203 may be an integrated circuit chip with signal processing capability.
  • the steps of the foregoing method can be completed by hardware integrated logic circuits in the processor 1203 or instructions in the form of software.
  • the above-mentioned processor 1203 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory 1204, and the processor 1203 reads the information in the memory 1204, and completes the steps of the above method in combination with its hardware.
  • the receiver 1201 can be used to receive input digital or character information, and generate signal input related to the related settings and function control of the stereo encoding device.
  • the transmitter 1202 can include display devices such as a display screen, and the transmitter 1202 can be used to output digital or character information through an external interface.
  • the processor 1203 is configured to execute the stereo encoding method executed by the stereo encoding apparatus shown in FIG. 4 of the foregoing embodiment.
  • the stereo decoding device 1300 includes:
  • the receiver 1301, the transmitter 1302, the processor 1303, and the memory 1304 (the number of processors 1303 in the stereo decoding device 1300 may be one or more, and one processor is taken as an example in FIG. 13).
  • the receiver 1301, the transmitter 1302, the processor 1303, and the memory 1304 may be connected by a bus or in other ways. Among them, the bus connection is taken as an example in FIG. 13.
  • the memory 1304 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1303. A part of the memory 1304 may also include NVRAM.
  • the memory 1304 stores an operating system and operating instructions, executable modules or data structures, or a subset of them, or an extended set of them, where the operating instructions may include various operating instructions for implementing various operations.
  • the operating system may include various system programs for implementing various basic services and processing hardware-based tasks.
  • the processor 1303 controls the operation of the stereo decoding device, and the processor 1303 may also be referred to as a CPU.
  • the various components of the stereo decoding device are coupled together through a bus system, where the bus system may include a power bus, a control bus, and a status signal bus in addition to a data bus.
  • various buses are referred to as bus systems in the figure.
  • the method disclosed in the above embodiments of the present application may be applied to the processor 1303 or implemented by the processor 1303.
  • the processor 1303 may be an integrated circuit chip with signal processing capability. In the implementation process, the steps of the foregoing method can be completed by hardware integrated logic circuits in the processor 1303 or instructions in the form of software.
  • the aforementioned processor 1303 may be a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component.
  • the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory 1304, and the processor 1303 reads the information in the memory 1304, and completes the steps of the foregoing method in combination with its hardware.
  • the processor 1303 is configured to execute the stereo decoding method executed by the stereo decoding device shown in FIG. 4 of the foregoing embodiment.
  • the chip when the stereo encoding device or the stereo decoding device is a chip in the terminal, the chip includes a processing unit and a communication unit.
  • the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute the computer-executable instructions stored in the storage unit, so that the chip in the terminal executes the wireless communication method of any one of the foregoing first aspect.
  • the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit in the terminal located outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
  • the processor mentioned in any one of the foregoing may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the program of the method of the first aspect or the second aspect.
  • the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the connection relationship between the modules indicates that they have a communication connection between them, which can be specifically implemented as one or more communication buses or signal lines.
  • this application can be implemented by means of software plus necessary general hardware.
  • it can also be implemented by dedicated hardware, including dedicated integrated circuits, dedicated CPUs, dedicated memory, dedicated components, and so on.
  • generally, all functions completed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to achieve the same function can also be diverse, for example, analog circuits, digital circuits, or dedicated circuits.
  • software program implementation is a better implementation in more cases.
  • the technical solution of this application, in essence or the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or in a wireless manner (for example, infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)), etc.

Abstract

A stereo encoding method, a stereo decoding method, and an apparatus, used to improve stereo encoding and decoding performance. The encoding method includes: performing downmix processing on the left channel signal of the current frame and the right channel signal of the current frame to obtain the primary channel signal of the current frame and the secondary channel signal of the current frame (401); and when it is determined to perform differential encoding on the pitch period of the secondary channel signal, using the pitch period estimation value of the primary channel signal to differentially encode the pitch period of the secondary channel signal to obtain the pitch period index value of the secondary channel signal, where the pitch period index value of the secondary channel signal is used to generate the stereo encoded bitstream to be sent (403).

Description

A stereo encoding method, a stereo decoding method, and an apparatus
This application claims priority to Chinese Patent Application No. 201910581398.5, filed with the Chinese Patent Office on June 29, 2019 and entitled "Stereo Encoding Method, Stereo Decoding Method, and Apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of stereo technologies, and in particular, to a stereo encoding method, a stereo decoding method, and an apparatus.
Background
At present, mono audio can no longer meet people's demand for high-quality audio. Compared with mono audio, stereo audio provides a sense of direction and distribution for each sound source and can improve the clarity, intelligibility, and sense of presence of the information, and is therefore highly favored.
To better transmit a stereo signal over limited bandwidth, the stereo signal usually needs to be encoded first, and the bitstream obtained after encoding is then transmitted to the decoding end over a channel. At the decoding end, decoding is performed according to the received bitstream to obtain the decoded stereo signal for playback.
Stereo encoding and decoding technologies have many different implementations. For example, at the encoding end, the time-domain signal is downmixed into two mono signals: the left and right channels are usually first downmixed into a primary channel signal and a secondary channel signal, and the primary channel signal and the secondary channel signal are then each encoded with a mono coding method. The primary channel signal is usually encoded with a relatively large number of bits, while the secondary channel signal is usually not encoded. During decoding, the primary channel signal and the secondary channel signal are usually decoded separately according to the received bitstream, and time-domain upmix processing is then performed to obtain the decoded stereo signal.
For a stereo signal, an important feature that distinguishes it from a mono signal is that the sound carries sound image information, which gives it a stronger sense of space. In a stereo signal, the accuracy of the secondary channel signal better reflects the sense of space of the stereo signal, and the accuracy of the secondary channel encoding also plays an important role in the stability of the stereo sound image.
In stereo encoding, the pitch period, as an important feature of human speech production, is an important parameter in both primary channel signal encoding and secondary channel signal encoding. The accuracy of the predicted value of the pitch period parameter affects the encoding quality of the entire stereo signal. In time-domain or frequency-domain stereo encoding, the stereo parameters and the primary and secondary channel signals can be obtained after analyzing the input signal. When the encoding rate is relatively low (for example, 24.4 kbps or lower), the encoder usually encodes only the primary channel signal and does not encode the secondary channel signal, for example, directly using the pitch period of the primary channel signal as the pitch period of the secondary channel signal. Because the secondary channel signal is not decoded, the decoded stereo signal has a poor sense of space, and the sound image stability is greatly affected by the difference between the pitch period parameter of the primary channel signal and the actual pitch period parameter of the secondary channel signal, which reduces the encoding performance of stereo encoding and, correspondingly, the decoding performance of stereo decoding.
Summary
Embodiments of this application provide a stereo encoding method, a stereo decoding method, and an apparatus, to improve stereo encoding and decoding performance.
To resolve the foregoing technical problem, the embodiments of this application provide the following technical solutions:
According to a first aspect, an embodiment of this application provides a stereo encoding method, including: performing downmix processing on a left channel signal of a current frame and a right channel signal of the current frame to obtain a primary channel signal of the current frame and a secondary channel signal of the current frame; and when it is determined to perform differential encoding on a pitch period of the secondary channel signal, using a pitch period estimation value of the primary channel signal to differentially encode the pitch period of the secondary channel signal to obtain a pitch period index value of the secondary channel signal, where the pitch period index value of the secondary channel signal is used to generate a stereo encoded bitstream to be sent.
In this embodiment of this application, downmix processing is first performed on the left channel signal of the current frame and the right channel signal of the current frame to obtain the primary channel signal of the current frame and the secondary channel signal of the current frame. When it is determined to perform differential encoding on the pitch period of the secondary channel signal, the pitch period estimation value of the primary channel signal is used to differentially encode the pitch period of the secondary channel signal to obtain the pitch period index value of the secondary channel signal, and the pitch period index value of the secondary channel signal is used to generate the stereo encoded bitstream to be sent. Because the pitch period estimation value of the primary channel signal is used to differentially encode the pitch period of the secondary channel signal, only a small amount of bit resources needs to be allocated to the differential encoding of the pitch period of the secondary channel signal, and by differentially encoding the pitch period of the secondary channel signal, the sense of space and the sound image stability of the stereo signal can be improved. In addition, because relatively few bit resources are used for the differential encoding of the pitch period of the secondary channel signal, the saved bit resources can be used for other stereo encoding parameters, which improves the encoding efficiency of the secondary channel and ultimately improves the overall stereo encoding quality.
In a possible implementation, determining whether to perform differential encoding on the pitch period of the secondary channel signal includes: encoding the primary channel signal of the current frame to obtain the pitch period estimation value of the primary channel signal; performing open-loop pitch period analysis on the secondary channel signal of the current frame to obtain an open-loop pitch period estimation value of the secondary channel signal; determining whether a difference between the pitch period estimation value of the primary channel signal and the open-loop pitch period estimation value of the secondary channel signal exceeds a preset secondary channel pitch period differential encoding threshold; and when the difference exceeds the secondary channel pitch period differential encoding threshold, determining to perform differential encoding on the pitch period of the secondary channel signal; or, when the difference does not exceed the secondary channel pitch period differential encoding threshold, determining not to perform differential encoding on the pitch period of the secondary channel signal.
In this embodiment of this application, encoding may be performed on the primary channel signal to obtain the pitch period estimation value of the primary channel signal. After the secondary channel signal of the current frame is obtained, open-loop pitch period analysis may be performed on the secondary channel signal to obtain the open-loop pitch period estimation value of the secondary channel signal. After the pitch period estimation value of the primary channel signal and the open-loop pitch period estimation value of the secondary channel signal are obtained, the difference between them may be calculated, and it is then determined whether the difference exceeds the preset secondary channel pitch period differential encoding threshold. The secondary channel pitch period differential encoding threshold may be preset and may be flexibly configured for the stereo encoding scenario. When the difference exceeds the secondary channel pitch period differential encoding threshold, it is determined to perform differential encoding; when the difference does not exceed the threshold, it is determined not to perform differential encoding.
In a possible implementation, when it is determined to perform differential encoding on the pitch period of the secondary channel signal, the method further includes: configuring a secondary channel pitch period differential encoding identifier in the current frame as a preset first value, where the stereo encoded bitstream carries the secondary channel pitch period differential encoding identifier, and the first value is used to indicate that the pitch period of the secondary channel signal is differentially encoded. The encoding end obtains the secondary channel pitch period differential encoding identifier, whose value may be configured according to whether differential encoding is performed on the pitch period of the secondary channel signal; the identifier is used to indicate whether differential encoding is applied to the pitch period of the secondary channel signal. The secondary channel pitch period differential encoding identifier may take multiple values, for example, the preset first value or a second value. As an example of configuring the identifier, when it is determined to perform differential encoding on the pitch period of the secondary channel signal, the secondary channel pitch period differential encoding identifier is configured as the first value.
In a possible implementation, the method further includes: when it is determined not to perform differential encoding on the pitch period of the secondary channel signal and not to multiplex the pitch period estimation value of the primary channel signal as the pitch period of the secondary channel signal, encoding the pitch period of the secondary channel signal and the pitch period of the primary channel signal separately. In this case, an independent pitch period encoding method for the secondary channel may also be used in this embodiment of this application to encode the pitch period of the secondary channel signal, so that the pitch period of the secondary channel signal can still be encoded.
In a possible implementation, the method further includes: when it is determined not to perform differential encoding on the pitch period of the secondary channel signal and to multiplex the pitch period estimation value of the primary channel signal as the pitch period of the secondary channel signal, configuring a secondary channel signal pitch period multiplexing identifier as a preset fourth value, and carrying the secondary channel signal pitch period multiplexing identifier in the stereo encoded bitstream, where the fourth value is used to indicate that the pitch period estimation value of the primary channel signal is multiplexed as the pitch period of the secondary channel signal. When differential encoding is not performed on the pitch period of the secondary channel signal, a pitch period multiplexing method may also be used in this embodiment of this application: the pitch period of the secondary channel is not encoded at the encoding end, and the secondary channel signal pitch period multiplexing identifier carried in the stereo encoded bitstream indicates whether the pitch period of the secondary channel signal multiplexes the pitch period estimation value of the primary channel signal. When the identifier indicates multiplexing, the decoding end may, according to the identifier, use the pitch period of the primary channel signal as the pitch period of the secondary channel signal for decoding.
在一种可能的实现方式中,所述使用所述主要声道信号的基音周期估计值对所述次要声道信号的基音周期进行差分编码,以得到所述次要声道信号的基音周期索引值,包括:根据所述主要声道信号的基音周期估计值进行次要声道的闭环基音周期搜索,以得到所述次要声道信号的基音周期估计值;根据所述次要声道信号的基音周期搜索范围调整因子确定所述次要声道信号的基音周期索引值上限;根据所述主要声道信号的基音周期估计值、所述次要声道信号的基音周期估计值和次要声道信号的基音周期索引值上限计算出所述次要声道信号的基音周期索引值。其中,编码端可以根据次要声道信号的基音周期估计值进行次要声道的闭环基音周期搜索,以确定次要声道信号的基音周期估计值。次要声道信号的基音周期搜索范围调整因子可用于调整次要声道信号的基音周期索引值,以确定出次要声道信号的基音周期索引值上限。该次要声道信号的基音周期索引值上限表示了次要声道信号的基音周期索引值的取值不能超过的上限值。次要声道信号的基音周期索引值可用于确定次要声道信号的基音周期索引值。编码端在确定出主要声道信号的基音周期估计值、次要声道信号的基音周期估计值和次要声道信号的基音周期索引值上限之后,根据主要声道信号的基音周期估计值、次要声道信号的基音周期估计值和次要声道信号的基音周期索 引值上限进行差分编码,输出次要声道信号的基音周期索引值。
在一种可能的实现方式中,所述根据所述主要声道信号的基音周期估计值进行次要声道的闭环基音周期搜索,以得到所述次要声道信号的基音周期估计值,包括:根据所述主要声道信号的基音周期估计值和所述当前帧的次要声道信号被划分的子帧个数,确定所述次要声道信号的闭环基音周期参考值;使用所述次要声道信号的闭环基音周期参考值作为所述次要声道信号的闭环基音周期搜索的起始点,采用整数精度和分数精度进行闭环基音周期搜索,以得到所述次要声道信号的基音周期估计值。其中,当前帧的次要声道信号被划分的子帧个数可以通过次要声道信号的子帧配置来确定,例如可以被划分4个子帧个数,或者3个子帧个数,具体结合应用场景确定。在获取到主要声道信号的基音周期估计值之后,可以使用该主要声道信号的基音周期估计值和次要声道信号被划分的子帧个数来计算次要声道信号的闭环基音周期参考值。次要声道信号的闭环基音周期参考值是根据主要声道信号的基音周期估计值来确定的参考值,该次要声道信号的闭环基音周期参考值表示了以主要声道信号的基音周期估计值作为参考来确定的次要声道信号的闭环基音周期。
在一种可能的实现方式中,所述根据所述主要声道信号的基音周期估计值和所述当前帧的次要声道信号被划分的子帧个数,确定所述次要声道信号的闭环基音周期参考值,包括:根据所述主要声道信号的基音周期估计值确定所述次要声道信号的闭环基音周期整数部分loc_T0,和所述次要声道信号的闭环基音周期分数部分loc_frac_prim;通过如下方式计算出所述次要声道信号的闭环基音周期参考值f_pitch_prim:f_pitch_prim=loc_T0+loc_frac_prim/N;其中,所述N表示所述次要声道信号被划分的子帧个数。具体的,根据主要声道信号的基音周期估计值首先确定次要声道信号的闭环基音周期整数部分和闭环基音周期分数部分,举例说明如下,直接将主要声道信号的基音周期估计值的整数部分作为次要声道信号的闭环基音周期整数部分,将主要声道信号的基音周期估计值的分数部分作为次要声道信号的闭环基音周期分数部分,还可以采用插值方法将主要声道信号的基音周期估计值映射为次要声道信号的闭环基音周期整数部分和闭环基音周期分数部分。例如,通过以上方法均可以得到次要声道的闭环基音周期整数部分为loc_T0,闭环基音周期分数部分为loc_frac_prim。
在一种可能的实现方式中，所述根据所述次要声道信号的基音周期搜索范围调整因子确定所述次要声道信号的基音周期索引值上限，包括：通过如下方式计算出所述次要声道信号的基音周期索引值上限soft_reuse_index_high_limit;soft_reuse_index_high_limit=0.5+2^Z;其中，所述Z为所述次要声道信号的基音周期搜索范围调整因子。
在一种可能的实现方式中,所述Z的取值为3、或者4、或者5。
在一种可能的实现方式中,所述根据所述主要声道信号的基音周期估计值、所述次要声道信号的基音周期估计值和次要声道信号的基音周期索引值上限计算出所述次要声道信号的基音周期索引值,包括:根据所述主要声道信号的基音周期估计值确定所述次要声道信号的闭环基音周期整数部分loc_T0,和所述次要声道信号的闭环基音周期分数部分loc_frac_prim;通过如下方式计算出所述次要声道信号的基音周期索引值soft_reuse_index:soft_reuse_index=(N*pitch_soft_reuse+pitch_frac_soft_reuse) ﹣(N*loc_T0+loc_frac_prim)+soft_reuse_index_high_limit/M;其中,所述pitch_soft_reuse表示所述次要声道信号的基音周期估计值的整数部分,所述pitch_frac_soft_reuse表示所述次要声道信号的基音周期估计值的分数部分,所述soft_reuse_index_high_limit表示所述次要声道信号的基音周期索引值上限,所述N表示所述次要声道信号被划分的子帧个数,所述M表示所述次要声道信号的基音周期索引值上限的调整因子,M为非零的实数,所述*表示相乘运算符,所述+表示相加运算符,所述﹣表示相减运算符。
在一种可能的实现方式中,所述方法应用于所述当前帧的编码速率低于预设的速率阈值的立体声编码场景;所述速率阈值为如下取值中的至少一种:13.2千比特每秒kbps、16.4kbps、或24.4kbps。其中,速率阈值可以为小于或等于13.2kbps,例如速率阈值还可以为16.4kbps、或者24.4kbps,速率阈值的具体取值可以根据应用场景来确定。在编码速率比较低的情况下(如24.4kbps及更低速率)不进行次要声道基音周期独立编码,利用主要声道信号的基音周期估计值作为参考值,采用差分编码方法实现了对次要声道信号的基音周期编码,提升立体声编码质量目的。
第二方面,本申请实施例还提供一种立体声解码方法,包括:根据接收到的立体声编码码流确定是否对次要声道信号的基音周期进行差分解码;当确定对所述次要声道信号的基音周期进行差分解码时,从所述立体声编码码流中获取当前帧的主要声道的基音周期估计值和所述当前帧的次要声道的基音周期索引值;根据所述主要声道的基音周期估计值和所述次要声道的基音周期索引值,对所述次要声道信号的基音周期进行差分解码,以得到所述次要声道信号的基音周期估计值,所述次要声道信号的基音周期估计值用于对所述立体声编码码流进行解码。
在本申请实施例中,首先根据接收到的立体声编码码流确定是否对次要声道信号的基音周期进行差分解码,当对次要声道信号的基音周期进行差分解码时,从立体声编码码流中获取当前帧的主要声道的基音周期估计值和当前帧的次要声道的基音周期索引值,根据主要声道的基音周期估计值和次要声道的基音周期索引值,对次要声道信号的基音周期进行差分解码,以得到次要声道信号的基音周期估计值,次要声道信号的基音周期估计值用于对立体声编码码流进行解码。本申请实施例中在可以对次要声道信号的基音周期进行差分解码时,可以使用主要声道信号的基音周期估计值和次要声道信号的基音周期索引值对次要声道信号的基音周期进行差分解码,因此得到次要声道信号的基音周期估计值,使用该次要声道信号的基音周期估计值可以对立体声编码码流进行解码,因此可以提高立体声信号的空间感和声像稳定性。
在一种可能的实现方式中,所述根据接收到的立体声编码码流确定是否对所述次要声道信号的基音周期进行差分解码,包括:从所述当前帧中获取次要声道基音周期差分编码标识;当所述次要声道基音周期差分编码标识为预设的第一值时,确定对所述次要声道信号的基音周期进行差分解码。在本申请实施例中,次要声道基音周期差分编码标识可以具有多种取值,例如次要声道基音周期差分编码标识可以为预设的第一值,例如,次要声道基音周期差分编码标识的取值为1,此时执行对次要声道信号的基音周期的差分解码。
在一种可能的实现方式中,所述方法还包括:当确定不对所述次要声道信号的基音周 期进行差分解码、且不复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期时,从所述立体声编码码流中解码所述次要声道信号的基音周期。其中,解码端确定不对次要声道信号的基音周期进行差分解码,也不复用主要声道信号的基音周期估计值作为次要声道信号的基音周期,在这种情况下,本申请实施例中还可以使用次要声道的基音周期独立解码方法,对次要声道信号的基音周期进行解码,因此可以实现对次要声道信号的基音周期的解码。
在一种可能的实现方式中,所述方法还包括:当确定不对所述次要声道信号的基音周期进行差分解码且复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期时,将所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期。其中,解码端确定不对次要声道信号的基音周期进行差分解码时,本申请实施例中还可以采用基音周期复用的方法。例如,当次要声道信号基音周期复用标识指示次要声道信号的基音周期复用主要声道信号的基音周期估计值时,在解码端可以根据该次要声道信号基音周期复用标识将主要声道信号的基音周期作为次要声道信号的基音周期进行解码。
在一种可能的实现方式中,所述根据所述主要声道的基音周期估计值和所述次要声道的基音周期索引值,对所述次要声道信号的基音周期进行差分解码,包括:根据所述主要声道信号的基音周期估计值和所述当前帧的次要声道信号被划分的子帧个数,确定所述次要声道信号的闭环基音周期参考值;根据所述次要声道信号的基音周期搜索范围调整因子确定所述次要声道信号的基音周期索引值上限;根据所述次要声道信号的闭环基音周期参考值、所述次要声道的基音周期索引值和所述次要声道信号的基音周期索引值上限计算出所述次要声道信号的基音周期估计值。具体的,使用主要声道信号的基音周期估计值确定次要声道信号的闭环基音周期参考值,详见前述的计算过程。次要声道信号的基音周期搜索范围调整因子可用于调整次要声道信号的基音周期索引值,以确定出次要声道信号的基音周期索引值上限。该次要声道信号的基音周期索引值上限表示了次要声道信号的基音周期索引值的取值不能超过的上限值。次要声道信号的基音周期索引值可用于确定次要声道信号的基音周期索引值。解码端在确定出次要声道信号的闭环基音周期参考值、次要声道信号的基音周期索引值和次要声道信号的基音周期索引值上限之后,根据次要声道信号的闭环基音周期参考值、次要声道信号的基音周期索引值和次要声道信号的基音周期索引值上限进行差分解码,输出次要声道信号的基音周期估计值。
在一种可能的实现方式中,所述根据所述次要声道信号的闭环基音周期参考值、所述次要声道信号的基音周期索引值和所述次要声道信号的基音周期索引值上限计算出所述次要声道信号的基音周期估计值,包括:通过如下方式计算出所述次要声道信号的基音周期估计值T0_pitch:
T0_pitch=f_pitch_prim+(soft_reuse_index-soft_reuse_index_high_limit/M)/N;其中,所述f_pitch_prim表示所述次要声道信号的闭环基音周期参考值,所述soft_reuse_index表示所述次要声道信号的基音周期索引值,所述N表示所述次要声道信号被划分的子帧个数,所述M表示所述次要声道信号的基音周期索引值上限的调整因子,M为非零的实数,所述/表示相除运算符,所述+表示相加运算符,所述﹣表示相减运算符。
在一种可能的实现方式中，所述次要声道信号的基音周期索引值上限的调整因子的取值为2，或者3。
第三方面,本申请实施例还提供一种立体声编码装置,包括:下混模块,用于对当前帧的左声道信号和所述当前帧的右声道信号进行下混处理,以得到所述当前帧的主要声道信号和所述当前帧的次要声道信号;差分编码模块,用于当确定对所述次要声道信号的基音周期进行差分编码时,使用所述主要声道信号的基音周期估计值对所述次要声道信号的基音周期进行差分编码,以得到所述次要声道信号的基音周期索引值,所述次要声道信号的基音周期索引值用于生成待发送的立体声编码码流。
在一种可能的实现方式中,所述立体声编码装置,还包括:主要声道编码模块,用于对所述当前帧的主要声道信号进行编码,以得到所述主要声道信号的基音周期估计值;开环分析模块,用于对所述当前帧的次要声道信号进行开环基音周期分析,以得到所述次要声道信号的开环基音周期估计值;阈值判断模块,用于判断所述主要声道信号的基音周期估计值和所述次要声道信号的开环基音周期估计值之间的差值是否超过预设的次要声道基音周期差分编码阈值,当所述差值超过所述次要声道基音周期差分编码阈值时,确定对所述次要声道信号的基音周期进行差分编码,当所述差值没有超过所述次要声道基音周期差分编码阈值时,确定不对所述次要声道信号的基音周期进行差分编码。
在一种可能的实现方式中,所述立体声编码装置,还包括:标识配置模块,用于当确定对所述次要声道信号的基音周期进行差分编码时,将所述当前帧中的次要声道基音周期差分编码标识配置为预设的第一值,所述立体声编码码流中携带所述次要声道基音周期差分编码标识,所述第一值用于指示对所述次要声道信号的基音周期进行差分编码。
在一种可能的实现方式中,所述立体声编码装置,还包括:独立编码模块,其中,所述独立编码模块,用于当确定不对所述次要声道信号的基音周期进行差分编码且不复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期时,对所述次要声道信号的基音周期和所述主要声道信号的基音周期分别进行编码。
在一种可能的实现方式中,所述立体声编码装置,还包括:标识配置模块,用于当确定不对所述次要声道信号的基音周期进行差分编码且复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期时,将次要声道信号基音周期复用标识配置为预设的第四值,并在所述立体声编码码流中携带所述次要声道信号基音周期复用标识,所述第四值用于指示复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期。
在一种可能的实现方式中,所述差分编码模块,包括:闭环基音周期搜索模块,用于根据所述主要声道信号的基音周期估计值进行次要声道的闭环基音周期搜索,以得到所述次要声道信号的基音周期估计值;索引值上限确定模块,用于根据所述次要声道信号的基音周期搜索范围调整因子确定所述次要声道信号的基音周期索引值上限;索引值计算模块,用于根据所述主要声道信号的基音周期估计值、所述次要声道信号的基音周期估计值和次要声道信号的基音周期索引值上限计算出所述次要声道信号的基音周期索引值。
在一种可能的实现方式中,所述闭环基音周期搜索模块,用于根据所述主要声道信号的基音周期估计值和所述当前帧的次要声道信号被划分的子帧个数,确定所述次要声道信号的闭环基音周期参考值;使用所述次要声道信号的闭环基音周期参考值作为所述次要声 道信号的闭环基音周期搜索的起始点,采用整数精度和分数精度进行闭环基音周期搜索,以得到所述次要声道信号的基音周期估计值。
在一种可能的实现方式中,所述闭环基音周期搜索模块,用于根据所述主要声道信号的基音周期估计值确定所述次要声道信号的闭环基音周期整数部分loc_T0,和所述次要声道信号的闭环基音周期分数部分loc_frac_prim;通过如下方式计算出所述次要声道信号的闭环基音周期参考值f_pitch_prim:f_pitch_prim=loc_T0+loc_frac_prim/N;其中,所述N表示所述次要声道信号被划分的子帧个数。
在一种可能的实现方式中，所述索引值上限确定模块，用于通过如下方式计算出所述次要声道信号的基音周期索引值上限soft_reuse_index_high_limit;soft_reuse_index_high_limit=0.5+2^Z;其中，所述Z为所述次要声道信号的基音周期搜索范围调整因子。
在一种可能的实现方式中,所述Z的取值为:3、或者4、或者5。
在一种可能的实现方式中,所述索引值计算模块,用于根据所述主要声道信号的基音周期估计值确定所述次要声道信号的闭环基音周期整数部分loc_T0,和所述次要声道信号的闭环基音周期分数部分loc_frac_prim;通过如下方式计算出所述次要声道信号的基音周期索引值soft_reuse_index:
soft_reuse_index=(N*pitch_soft_reuse+pitch_frac_soft_reuse)﹣(N*loc_T0+loc_frac_prim)+soft_reuse_index_high_limit/M;其中,所述pitch_soft_reuse表示所述次要声道信号的基音周期估计值的整数部分,所述pitch_frac_soft_reuse表示所述次要声道信号的基音周期估计值的分数部分,所述soft_reuse_index_high_limit表示所述次要声道信号的基音周期索引值上限,所述N表示所述次要声道信号被划分的子帧个数,所述M表示所述次要声道信号的基音周期索引值上限的调整因子,M为非零的实数,所述*表示相乘运算符,所述+表示相加运算符,所述﹣表示相减运算符。
在一种可能的实现方式中,所述立体声编码装置应用于所述当前帧的编码速率低于预设的速率阈值的立体声编码场景;所述速率阈值为如下取值中的至少一种:13.2千比特每秒kbps、16.4kbps、或24.4kbps。
在本申请的第三方面中,立体声编码装置的组成模块还可以执行前述第一方面以及各种可能的实现方式中所描述的步骤,详见前述对第一方面以及各种可能的实现方式中的说明。
第四方面,本申请实施例还提供一种立体声解码装置,包括:确定模块,用于根据接收到的立体声编码码流确定是否对次要声道信号的基音周期进行差分解码;值获取模块,用于当确定对所述次要声道信号的基音周期进行差分解码时,从所述立体声编码码流中获取当前帧的主要声道的基音周期估计值和所述当前帧的次要声道的基音周期索引值;差分解码模块,用于根据所述主要声道的基音周期估计值和所述次要声道的基音周期索引值,对所述次要声道信号的基音周期进行差分解码,以得到所述次要声道信号的基音周期估计值,所述次要声道信号的基音周期估计值用于对所述立体声编码码流进行解码。
在一种可能的实现方式中，所述确定模块，用于从所述当前帧中获取次要声道基音周期差分编码标识；当所述次要声道基音周期差分编码标识为预设的第一值时，确定对所述次要声道信号的基音周期进行差分解码。
在一种可能的实现方式中,所述立体声解码装置,还包括:独立解码模块,其中,所述独立解码模块,用于当确定不对所述次要声道信号的基音周期进行差分解码、且不复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期时,从所述立体声编码码流中解码所述次要声道信号的基音周期。
在一种可能的实现方式中,所述立体声解码装置,还包括:基音周期复用模块,其中,所述基音周期复用模块,用于当确定不对所述次要声道信号的基音周期进行差分解码且复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期时,将所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期。
在一种可能的实现方式中,所述差分解码模块,包括:参考值确定子模块,用于根据所述主要声道信号的基音周期估计值和所述当前帧的次要声道信号被划分的子帧个数,确定所述次要声道信号的闭环基音周期参考值;索引值上限确定子模块,用于根据所述次要声道信号的基音周期搜索范围调整因子确定所述次要声道信号的基音周期索引值上限;估计值计算子模块,用于根据所述次要声道信号的闭环基音周期参考值、所述次要声道的基音周期索引值和所述次要声道信号的基音周期索引值上限计算出所述次要声道信号的基音周期估计值。
在一种可能的实现方式中,所述估计值计算子模块,用于通过如下方式计算出所述次要声道信号的基音周期估计值T0_pitch:
T0_pitch=f_pitch_prim+(soft_reuse_index-soft_reuse_index_high_limit/M)/N;其中,所述f_pitch_prim表示所述次要声道信号的闭环基音周期参考值,所述soft_reuse_index表示所述次要声道信号的基音周期索引值,所述N表示所述次要声道信号被划分的子帧个数,所述M表示所述次要声道信号的基音周期索引值上限的调整因子,M为非零的实数,所述/表示相除运算符,所述+表示相加运算符,所述﹣表示相减运算符。
在一种可能的实现方式中,所述次要声道信号的基音周期索引值上限的调整因子的取值为2,或者3。
在本申请的第四方面中,立体声解码装置的组成模块还可以执行前述第二方面以及各种可能的实现方式中所描述的步骤,详见前述对第二方面以及各种可能的实现方式中的说明。
第五方面，本申请实施例提供一种立体声处理装置，该立体声处理装置可以包括立体声编码装置或者立体声解码装置或者芯片等实体，所述立体声处理装置包括：处理器。可选的，该立体声处理装置还可以包括存储器；所述存储器用于存储指令；所述处理器用于执行所述存储器中的所述指令，使得所述立体声处理装置执行如前述第一方面或第二方面中任一项所述的方法。
第六方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当其在计算机上运行时,使得计算机执行上述第一方面或第二方面所述的方法。
第七方面，本申请实施例提供了一种包含指令的计算机程序产品，当其在计算机上运行时，使得计算机执行上述第一方面或第二方面所述的方法。
第八方面,本申请提供了一种芯片系统,该芯片系统包括处理器,用于支持立体声编码装置或者立体声解码装置实现上述方面中所涉及的功能,例如,发送或处理上述方法中所涉及的数据和/或信息。在一种可能的设计中,所述芯片系统还包括存储器,所述存储器,用于保存立体声编码装置或者立体声解码装置必要的程序指令和数据。该芯片系统,可以由芯片构成,也可以包括芯片和其他分立器件。
附图说明
图1为本申请实施例提供的立体声处理系统的组成结构示意图;
图2a为本申请实施例提供的立体声编码器和立体声解码器应用于终端设备的示意图;
图2b为本申请实施例提供的立体声编码器应用于无线设备或者核心网设备的示意图;
图2c为本申请实施例提供的立体声解码器应用于无线设备或者核心网设备的示意图;
图3a为本申请实施例提供的多声道编码器和多声道解码器应用于终端设备的示意图;
图3b为本申请实施例提供的多声道编码器应用于无线设备或者核心网设备的示意图;
图3c为本申请实施例提供的多声道解码器应用于无线设备或者核心网设备的示意图;
图4为本申请实施例中立体声编码装置和立体声解码装置之间的一种交互流程示意图;
图5为本申请实施例提供的一种立体声信号编码的流程示意图;
图6为本申请实施例提供的主要声道信号的基音周期参数和次要声道信号的基音周期参数进行编码的流程图;
图7为采用独立编码方式和差分编码方式得到的基音周期量化结果的比较图;
图8为采用独立编码方式和差分编码方式之后分配给固定码表的比特数的比较图;
图9为本申请实施例提供的时域立体声编码方法的示意图;
图10为本申请实施例提供的一种立体声编码装置的组成结构示意图;
图11为本申请实施例提供的一种立体声解码装置的组成结构示意图；
图12为本申请实施例提供的另一种立体声编码装置的组成结构示意图;
图13为本申请实施例提供的另一种立体声解码装置的组成结构示意图。
具体实施方式
本申请实施例提供了一种立体声编码方法、立体声解码方法和装置,提高立体声的编解码性能。
下面结合附图,对本申请的实施例进行描述。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的术语在适当情况下可以互换,这仅仅是描述本申请的实施例中对相同属性的对象在描述时所采用的区分方式。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,以便包含一系列单元的过程、方法、系统、产品或设备不必限于那些单元,而是可包括没 有清楚地列出的或对于这些过程、方法、产品或设备固有的其它单元。
本申请实施例的技术方案可以应用于各种的立体声处理系统,如图1所示,为本申请实施例提供的立体声处理系统的组成结构示意图。立体声处理系统100可以包括:立体声编码装置101和立体声解码装置102。其中,立体声编码装置101可用于生成立体声编码码流,然后该立体声编码码流可以通过音频传输通道传输给立体声解码装置102,立体声解码装置102可以接收到立体声编码码流,然后执行立体声解码装置102的立体声解码功能,最后得到立体声解码码流。
在本申请的实施例中,该立体声编码装置可以应用于各种有音频通信需要的终端设备、有转码需要的无线设备与核心网设备,例如立体声编码装置可以是上述终端设备或者无线设备或者核心网设备的立体声编码器。同样的,该立体声解码装置可以应用于各种有音频通信需要的终端设备、有转码需要的无线设备与核心网设备,例如立体声解码装置可以是上述终端设备或者无线设备或者核心网设备的立体声解码器。
如图2a所示,为本申请实施例提供的立体声编码器和立体声解码器应用于终端设备的示意图。对于每个终端设备都可以包括:立体声编码器、信道编码器、立体声解码器、信道解码器。具体的,信道编码器用于对立体声信号进行信道编码,信道解码器用于对立体声信号进行信道解码。例如,在第一终端设备20中可以包括:第一立体声编码器201、第一信道编码器202、第一立体声解码器203、第一信道解码器204。在第二终端设备21中可以包括:第二立体声解码器211、第二信道解码器212、第二立体声编码器213、第二信道编码器214。第一终端设备20连接无线或者有线的第一网络通信设备22,第一网络通信设备22和无线或者有线的第二网络通信设备23之间通过数字信道连接,第二终端设备21连接无线或者有线的第二网络通信设备23。其中,上述无线或者有线的网络通信设备可以泛指信号传输设备,例如通信基站,数据交换设备等。
在音频通信中，作为发送端的终端设备对采集到的立体声信号进行立体声编码，再进行信道编码后，通过无线网络或者核心网在数字信道中传输。而作为接收端的终端设备根据接收到的信号进行信道解码，以得到立体声信号编码码流，然后经过立体声解码恢复出立体声信号，由接收端的终端设备进行回放。
如图2b所示,为本申请实施例提供的立体声编码器应用于无线设备或者核心网设备的示意图。其中,无线设备或者核心网设备25包括:信道解码器251、其他音频解码器252、立体声编码器253、信道编码器254,其中,其他音频解码器252是指除立体声解码器以外的其他音频解码器。在无线设备或者核心网设备25内,首先通过信道解码器251对进入该设备的信号进行信道解码,然后使用其他音频解码器252进行音频解码(除了立体声解码),然后使用立体声编码器253进行立体声编码,最后使用信道编码器254对立体声信号进行信道编码,完成信道编码之后再传输出去。
如图2c所示,为本申请实施例提供的立体声解码器应用于无线设备或者核心网设备的示意图。其中,无线设备或者核心网设备25包括:信道解码器251、立体声解码器255、其他音频编码器256、信道编码器254,其中,其他音频编码器256是指除立体声编码器以外的其他音频编码器。在无线设备或者核心网设备25内,首先通过信道解码器251对进入该设备的信号进行信道解码,然后使用立体声解码器255对接收到的立体声编码码流进行 解码,然后使用其他音频编码器256进行音频编码(除了立体声编码),最后使用信道编码器254对立体声信号进行信道编码,完成信道编码之后再传输出去。在无线设备或者核心网设备中,如果需要实现转码,则需要进行相应的立体声编解码处理。其中,无线设备指的是通信中的射频相关的设备,核心网设备指的是通信中核心网相关的设备。
在本申请的一些实施例中,该立体声编码装置可以应用于各种有音频通信需要的终端设备、有转码需要的无线设备与核心网设备,例如立体声编码装置可以是上述终端设备或者无线设备或者核心网设备的多声道编码器。同样的,该立体声解码装置可以应用于各种有音频通信需要的终端设备、有转码需要的无线设备与核心网设备,例如立体声解码装置可以是上述终端设备或者无线设备或者核心网设备的多声道解码器。
如图3a所示,为本申请实施例提供的多声道编码器和多声道解码器应用于终端设备的示意图,对于每个终端设备都可以包括:多声道编码器、信道编码器、多声道解码器、信道解码器。具体的,信道编码器用于对多声道信号进行信道编码,信道解码器用于对多声道信号进行信道解码。例如,在第一终端设备30中可以包括:第一多声道编码器301、第一信道编码器302、第一多声道解码器303、第一信道解码器304。在第二终端设备31中可以包括:第二多声道解码器311、第二信道解码器312、第二多声道编码器313、第二信道编码器314。第一终端设备30连接无线或者有线的第一网络通信设备32,第一网络通信设备32和无线或者有线的第二网络通信设备33之间通过数字信道连接,第二终端设备31连接无线或者有线的第二网络通信设备33。其中,上述无线或者有线的网络通信设备可以泛指信号传输设备,例如通信基站,数据交换设备等。音频通信中作为发送端的终端设备对采集到的多声道信号进行多声道编码,再进行信道编码后,通过无线网络或者核心网进行在数字信道中传输。而作为接收端的终端设备根据接收到的信号,进行信道解码,以得到多声道信号编码码流,然后经过多声道解码恢复出多声道信号,由作为接收端的终端设备进回放。
如图3b所示,为本申请实施例提供的多声道编码器应用于无线设备或者核心网设备的示意图,其中,无线设备或者核心网设备35包括:信道解码器351、其他音频解码器352、多声道编码器353、信道编码器354,与前述图2b类似,此处不再赘述。
如图3c所示,为本申请实施例提供的多声道解码器应用于无线设备或者核心网设备的示意图,其中,无线设备或者核心网设备35包括:信道解码器351、多声道解码器355、其他音频编码器356、信道编码器354,与前述图2c类似,此处不再赘述。
其中,立体声编码处理可以是多声道编码器中的一部分,立体声解码处理可以是多声道解码器中的一部分,例如,对采集到的多声道信号进行多声道编码可以是将采集到的多声道信号经过降维处理后得到立体声信号,对得到的立体声信号进行编码;解码端根据多声道信号编码码流,解码得到立体声信号,经过上混处理后恢复出多声道信号。因此,本申请实施例也可应用于终端设备、无线设备、核心网设备中的多声道编码器和多声道解码器。在无线或者核心网设备中,如果需要实现转码,则需要进行相应的多声道编解码处理。
在本申请实施例中，在立体声编码方法中，较重要的一个环节就是基音周期编码。因为浊音是由准周期脉冲激励产生的，所以它的时域波形呈现出明显的周期性，这个周期称为基音周期。基音周期对产生高质量的浊音语音发挥着十分重要的作用，这是因为浊音语音被表征为由基音周期分隔的样点组成的准周期信号。在语音处理中，基音周期也可以用一个周期内包含的样本数来表示，此时被称为基音延迟。基音延迟是自适应码本的重要参数。
基音周期估计主要是指对基音周期的估计过程,因此基音周期估计的准确性直接决定了激励信号的正确性,也就决定了语音信号的合成质量。在中低码率下用于表示基音周期的比特资源较少,是造成了语音编码质量折损的原因之一。主要声道信号和次要声道信号的基音周期有着很强的相似性,本申请实施例可以合理地利用基音周期的相似性,提升编码效率,是影响中低速率下整个立体声编码质量的重要因素。
在本申请实施例中,对于在频域或时频结合情况下进行的参数立体声编码,主要声道信号的基音周期和次要声道信号的基音周期之间具有相关性,针对次要声道信号的基音周期编码,在满足次要声道信号的基音周期复用条件时通过差分编码方法,对次要声道信号中的基音周期参数进行合理预测并进行差分编码,只需要少量比特资源分配给次要声道信号的基音周期进行量化编码即可,本申请实施例可以提高立体声信号的空间感和声像稳定性。另外,本申请实施例中次要声道信号的基音周期采用较小的比特资源,保证了次要声道信号的基音周期预测的准确性,将剩余比特资源用于其他立体声编码参数,例如可用于固定码表等编码参数,进而提升了次要声道的编码效率,最终提升了整体的立体声编码质量。
本申请实施例中针对次要声道信号的基音周期编码，采用面向次要声道信号的基音周期差分编码方法，利用主要声道信号的基音周期作为参考值，并对次要声道比特资源重新分配，实现提升立体声编码质量目的。接下来基于前述的系统架构以及立体声编码装置和立体声解码装置，对本申请实施例提供的立体声编码方法和立体声解码方法进行说明。如图4所示，为本申请实施例中立体声编码装置和立体声解码装置之间的一种交互流程示意图，其中，下述步骤401至步骤403可以由立体声编码装置(如下简称编码端)执行，下述步骤411至步骤413可以由立体声解码装置(如下简称解码端)执行，主要包括如下过程：
401、对当前帧的左声道信号和当前帧的右声道信号进行下混处理,以得到当前帧的主要声道信号和当前帧的次要声道信号。
在本申请实施例中,当前帧是指在编码端中当前进行编码处理的一个立体声信号帧,首先获取当前帧的左声道信号和当前帧的右声道信号,通过对左声道信号和右声道信号进行下混处理,可以得到当前帧的主要声道信号和当前帧的次要声道信号。举例说明,立体声编解码技术也有很多不同的实现,例如编码端将时域信号下混为两路单声道信号,先将左右声道信号下混为主要声道信号以及次要声道信号,其中,L表示左声道信号,R表示右声道信号,则主要声道信号可以为0.5*(L+R),表征了两个声道之间的相关信息;次要声道信号可以为0.5*(L-R),表征了两个声道之间的差异信息。
需要说明的是,后续实施例中将详细说明频域立体声编码中的下混过程以及时域立体声编码中的下混过程。
在本申请的一些实施例中,编码端执行的立体声编码方法可以应用于当前帧的编码速率低于预设的速率阈值的立体声编码场景。解码端执行的立体声解码方法可以应用于当前 帧的解码速率低于预设的速率阈值的立体声解码场景。其中,当前帧的编码速率是指当前帧的立体声信号采用的编码速率,速率阈值是指针对立体声信号设置的最小速率值,在当前帧的编码速率低于预设的速率阈值时可以执行本申请实施例提供的立体声编码方法,在当前帧的解码速率低于预设的速率阈值时可以执行本申请实施例提供的立体声解码方法。
进一步的,在本申请的一些实施例中,速率阈值为如下取值中的至少一种:13.2千比特每秒kbps、16.4kbps、或24.4kbps。
其中,速率阈值可以为小于或等于13.2kbps,例如速率阈值还可以为16.4kbps、或者24.4kbps,速率阈值的具体取值可以根据应用场景来确定。在编码速率比较低的情况下(如24.4kbps及更低速率)不进行次要声道基音周期独立编码,利用主要声道信号的基音周期估计值作为参考值,采用差分编码方法实现了对次要声道信号的基音周期编码,提升立体声编码质量目的。
402、确定是否对次要声道信号的基音周期进行差分编码。
在本申请实施例中,获取到当前帧的主要声道信号和当前帧的次要声道信号之后,接下来可以根据当前帧的主要声道信号和次要声道信号判断是否能够对次要声道信号的基音周期进行差分编码。例如,根据当前帧的主要声道信号和次要声道信号的信号特性来确定是否对次要声道信号的基音周期进行差分编码,又如还可以使用主要声道信号、次要声道信号和预设的判决条件来判决是否对次要声道信号的基音周期进行差分编码。使用主要声道信号、次要声道信号来确定是否进行差分编码的方式有多种,后续实施例中分别进行详细说明。
在本申请实施例中,步骤402确定是否对次要声道信号的基音周期进行差分编码,包括:
对当前帧的主要声道信号进行编码,以得到主要声道信号的基音周期估计值;
对当前帧的次要声道信号进行开环基音周期分析,以得到次要声道信号的开环基音周期估计值;
判断主要声道信号的基音周期估计值和次要声道信号的开环基音周期估计值之间的差值是否超过预设的次要声道基音周期差分编码阈值;
当差值超过次要声道基音周期差分编码阈值时,确定对所述次要声道信号的基音周期进行差分编码;或,
当差值没有超过次要声道基音周期差分编码阈值时,确定不对所述次要声道信号的基音周期进行差分编码。
在本申请实施例中,在步骤401中得到当前帧的主要声道信号之后,可以根据主要声道信号进行编码,从而得到主要声道信号的基音周期估计值。具体的,在主要声道编码中,基音周期估计采用开环基音分析和闭环基音搜索相结合,提高了基音周期估计的准确度。语音信号的基音周期估计可以采用多种方法,例如可以采用自相关函数,短时平均幅度差等。基音周期估计算法以自相关函数为基础。自相关函数在基音周期的整数倍位置上出现峰值,利用这个特点可以完成基音周期估计。为了提高基音预测的准确性,更好地逼近语音实际的基音周期,基音周期检测采用以1/3为采样分辨率的分数延迟。为了减少基音周期估计的运算量,基音周期估计包括开环基音分析和闭环基音搜索两个步骤。利用开环基 音分析对一帧语音的整数延迟进行粗略估计得到一个候选的整数延迟,闭环基音搜索在其附近对基音延迟进行细致估计,闭环基音搜索每一子帧执行一次。开环基音分析每帧进行一次,分别计算自相关、归一化处理和计算最佳的开环整数延迟。通过以上过程可以得到主要声道信号的基音周期估计值。
在获取到当前帧的次要声道信号之后,可以对次要声道信号进行开环基音周期分析,从而可以得到次要声道信号的开环基音周期估计值,对于开环基音周期分析的具体过程,不再详细说明。
在本申请实施例中,获取主要声道信号的基音周期估计值和次要声道信号的开环基音周期估计值之后,可以计算主要声道信号的基音周期估计值和次要声道信号的开环基音周期估计值之间的差值,接下来判断该差值是否超过预设的次要声道基音周期差分编码阈值。其中,次要声道基音周期差分编码阈值可以预先设定,并可以结合立体声编码场景进行灵活配置。当差值超过次要声道基音周期差分编码阈值时确定进行差分编码,当差值没有超过次要声道基音周期差分编码阈值时确定不进行差分编码。
需要说明的是，本申请实施例中确定是否对次要声道信号的基音周期进行差分编码的方式，不局限于上述通过差值和次要声道基音周期差分编码阈值进行数值大小判断，例如还可以根据该差值与次要声道基音周期差分编码阈值相除的结果是否小于1来判断。又如，还可以将主要声道信号的基音周期估计值和次要声道信号的开环基音周期估计值进行相除，将得到的相除结果与次要声道基音周期差分编码阈值进行数值大小判断。另外，次要声道基音周期差分编码阈值的具体取值可以结合应用场景来确定，此处不做限定。
举例说明如下,在次要声道编码中,根据主要声道信号的基音周期估计值和次要声道信号的开环基音周期估计值进行次要声道基音周期差分编码判决,例如可使用的判决条件为:DIFF=|∑(pitch[0])-∑(pitch[1])|。
其中,DIFF表示主要声道信号的基音周期估计值和次要声道信号的开环基音周期估计值之间的差值,|∑(pitch[0])-∑(pitch[1])|表示对∑(pitch[0])和∑(pitch[1])之间的差值取绝对值,∑pitch[0]表示主要声道信号的基音周期估计值,∑pitch[1]表示次要声道信号的开环基音周期估计值。
不限定的是,本申请实施例中可使用的判决条件可以不限于上述公式,例如在|∑(pitch[0])-∑(pitch[1])|计算出结果之后,还可以设置修正因子,该修正因子再乘以|∑(pitch[0])-∑(pitch[1])|的结果,可以作为最终输出的DIFF。又如,DIFF=|∑(pitch[0])-∑(pitch[1])|中的等式右边,还可以加上或者减去一个条件阈值常量,从而得到最终的DIFF。
在本申请实施例中,确定是否对次要声道信号的基音周期进行差分编码之后,根据确定出的结果判断是否执行步骤403,当确定对次要声道信号的基音周期进行差分编码时,触发执行后续的步骤403。
在本申请的一些实施例中,步骤402确定是否对次要声道信号的基音周期进行差分编码之后,本申请实施例提供的方法还包括:
当确定对次要声道信号的基音周期进行差分编码时，将当前帧中的次要声道基音周期差分编码标识配置为预设的第一值，立体声编码码流中携带次要声道基音周期差分编码标识，第一值用于指示对次要声道信号的基音周期进行差分编码。
其中,编码端获取次要声道基音周期差分编码标识,次要声道基音周期差分编码标识的取值可根据是否对次要声道信号的基音周期进行差分编码进行配置,次要声道基音周期差分编码标识用于指示是否对次要声道信号的基音周期采用差分编码。
在本申请实施例中,次要声道基音周期差分编码标识可以具有多种取值,例如次要声道基音周期差分编码标识可以为预设的第一值,或者配置为第二值。接下来对次要声道基音周期差分编码标识的配置方法进行举例说明,当确定对次要声道信号的基音周期进行差分编码时,将次要声道基音周期差分编码标识配置为第一值。通过次要声道基音周期差分编码标识指示第一值,可以使得解码端确定可以对次要声道信号的基音周期进行差分解码。例如,次要声道基音周期差分编码标识的取值可以为0或者1,第一值为1,第二值为0。
举例说明如下,次要声道基音周期差分编码标识用Pitch_reuse_flag来指示。DIFF_THR为预设的次要声道基音周期差分编码阈值,根据不同的编码速率确定次要声道基音周期差分编码阈值为{1,3,6}中的某一值。例如,当DIFF>DIFF_THR时Pitch_reuse_flag=1,此时判别当前帧采用次要声道信号的基音周期差分编码。当DIFF≤DIFF_THR时Pitch_reuse_flag=0,此时不进行基音周期差分编码,采用次要声道信号的独立编码。
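为便于理解上述判决过程，下面给出一段示意性的C语言代码，按照DIFF=|∑(pitch[0])-∑(pitch[1])|的方式计算差值并与阈值比较，输出Pitch_reuse_flag。其中的函数名、数组形式以及阈值的传入方式均为本示例中的假设，并非对具体实现的限定：

```c
#include <math.h>

/* 示意性实现：pitch_prim[]、pitch_sec[]分别为主要声道各子帧的基音周期估计值
   与次要声道各子帧的开环基音周期估计值，diff_thr为预设的次要声道基音周期差分编码阈值 */
int decide_pitch_diff_coding(const float *pitch_prim, int n_prim,
                             const float *pitch_sec, int n_sec, float diff_thr)
{
    float sum_prim = 0.0f, sum_sec = 0.0f;
    for (int i = 0; i < n_prim; i++) sum_prim += pitch_prim[i];
    for (int i = 0; i < n_sec; i++)  sum_sec  += pitch_sec[i];

    float diff = fabsf(sum_prim - sum_sec);   /* DIFF = |∑(pitch[0]) - ∑(pitch[1])| */
    return (diff > diff_thr) ? 1 : 0;         /* 返回1表示Pitch_reuse_flag=1，采用差分编码 */
}
```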
在本申请的一些实施例中,步骤402确定是否对次要声道信号的基音周期进行差分编码之后,本申请实施例提供的方法还包括:
当确定不对次要声道信号的基音周期进行差分编码且不复用主要声道信号的基音周期估计值作为次要声道信号的基音周期时,对次要声道信号的基音周期和主要声道信号的基音周期分别进行编码。
其中,不对次要声道信号的基音周期进行差分编码,也不复用主要声道信号的基音周期估计值作为次要声道信号的基音周期,在这种情况下,本申请实施例中还可以使用次要声道的基音周期独立编码方法,对次要声道信号的基音周期进行编码,因此可以实现对次要声道信号的基音周期的编码。
在本申请的一些实施例中,步骤402确定是否对次要声道信号的基音周期进行差分编码之后,本申请实施例提供的方法还包括:
当确定不对所述次要声道信号的基音周期进行差分编码且复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期时,将次要声道信号基音周期复用标识配置为预设的第四值,并在所述立体声编码码流中携带所述次要声道信号基音周期复用标识,所述第四值用于指示复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期。
其中,在不对次要声道信号的基音周期进行差分编码时,本申请实施例中还可以采用基音周期复用的方法。即在编码端不对次要声道基音周期进行编码,而在立体声编码码流中携带次要声道信号基音周期复用标识,通过次要声道信号基音周期复用标识指示次要声道信号的基音周期是否复用主要声道信号的基音周期估计值,当次要声道信号基音周期复用标识指示次要声道信号的基音周期复用主要声道信号的基音周期估计值时,在解码端可以根据该次要声道信号基音周期复用标识将主要声道信号的基音周期作为次要声道信号的 基音周期进行解码。
在本申请的一些实施例中,步骤402确定是否对次要声道信号的基音周期进行差分编码之后,本申请实施例提供的方法还包括:
当确定不对次要声道信号的基音周期进行差分编码时,将次要声道基音周期差分编码标识配置为预设的第二值,立体声编码码流中携带次要声道基音周期差分编码标识,第二值用于指示不对次要声道信号的基音周期进行差分编码;
当确定不复用主要声道信号的基音周期估计值作为次要声道信号的基音周期时,将次要声道信号基音周期复用标识配置为预设的第三值,立体声编码码流中携带次要声道信号基音周期复用标识,第三值用于指示不复用主要声道信号的基音周期估计值作为次要声道信号的基音周期;
对次要声道信号的基音周期和主要声道信号的基音周期分别进行编码。
其中,次要声道基音周期差分编码标识可以具有多种取值,例如次要声道基音周期差分编码标识可以为预设的第一值,或者配置为第二值。接下来对次要声道基音周期差分编码标识的配置方法进行举例说明,当确定不对次要声道信号的基音周期进行差分编码时,将次要声道基音周期差分编码标识配置为第二值。通过次要声道基音周期差分编码标识指示第二值,可以使得解码端确定可以对次要声道信号的基音周期不进行差分解码,例如,次要声道基音周期差分编码标识的取值可以为0或者1,第一值为1,第二值为0。通过次要声道基音周期差分编码标识指示第二值,可以使得解码端确定不对次要声道信号的基音周期进行差分解码。
次要声道基音周期复用标识可以具有多种取值,例如次要声道基音周期复用标识可以为预设的第四值,或者配置为第三值。接下来对次要声道基音周期复用标识的配置方法进行举例说明,当确定不复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期时,将次要声道基音周期复用标识配置为第三值。通过次要声道基音周期复用标识指示第三值,可以使得解码端确定不复用主要声道信号的基音周期估计值作为次要声道信号的基音周期,例如,次要声道基音周期复用标识的取值可以为0或者1,第四值为1,第三值为0。在编码端确定不对次要声道信号的基音周期进行差分编码、且不复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期时,编码端可以采用独立编码的方式,即对次要声道信号的基音周期和主要声道信号的基音周期分别进行编码。
需要说明的是,在本申请实施例中,当确定不对次要声道信号的基音周期进行差分编码时,可以使用次要声道的基音周期独立编码方法,对次要声道信号的基音周期进行编码。另外,当确定不对次要声道信号的基音周期进行差分编码时,还可以采用基音周期复用的方法。其中,编码端执行的立体声编码方法可以应用于当前帧的编码速率低于预设的速率阈值的立体声编码场景,若不采用次要声道信号的基音周期进行差分编码,则还可以采用次要声道基音周期复用的方法,即在编码端不对次要声道基音周期进行编码,而在立体声编码码流中携带次要声道信号基音周期复用标识,通过次要声道信号基音周期复用标识指示次要声道信号的基音周期是否复用主要声道信号的基音周期估计值,当次要声道信号基音周期复用标识指示次要声道信号的基音周期复用主要声道信号的基音周期估计值时,在解码端可以根据该次要声道信号基音周期复用标识将主要声道信号的基音周期作为次要声 道信号的基音周期进行解码。
在本申请的一些实施例中,步骤402确定是否对次要声道信号的基音周期进行差分编码之后,本申请实施例提供的方法还包括:
当确定不对次要声道信号的基音周期进行差分编码时,将次要声道基音周期差分编码标识配置为预设的第二值,立体声编码码流中携带次要声道基音周期差分编码标识,第二值用于指示不对次要声道信号的基音周期进行差分编码;
当确定复用主要声道信号的基音周期估计值作为次要声道信号的基音周期时,将次要声道信号基音周期复用标识配置为预设的第四值,立体声编码码流中携带次要声道信号基音周期复用标识,第四值用于指示复用主要声道信号的基音周期估计值作为次要声道信号的基音周期。
其中,次要声道基音周期差分编码标识可以具有多种取值,例如次要声道基音周期差分编码标识可以为预设的第一值,或者配置为第二值。接下来对次要声道基音周期差分编码标识的配置方法进行举例说明,当确定不对次要声道信号的基音周期进行差分编码时,将次要声道基音周期差分编码标识配置为第二值。通过次要声道基音周期差分编码标识指示第二值,可以使得解码端确定可以对次要声道信号的基音周期不进行差分解码,例如,次要声道基音周期差分编码标识的取值可以为0或者1,第一值为1,第二值为0。通过次要声道基音周期差分编码标识指示第二值,可以使得解码端确定不对次要声道信号的基音周期进行差分解码。
次要声道基音周期复用标识可以具有多种取值,例如次要声道基音周期复用标识可以为预设的第四值,或者配置为第三值。在编码端确定不对次要声道信号的基音周期进行差分编码、且复用主要声道信号的基音周期估计值作为次要声道信号的基音周期时,配置次要声道信号基音周期复用标识的取值为第四值。接下来对次要声道基音周期复用标识的配置方法进行举例说明,当确定复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期时,将次要声道基音周期复用标识配置为第四值。通过次要声道基音周期复用标识指示第四值,可以使得解码端确定复用主要声道信号的基音周期估计值作为次要声道信号的基音周期,例如,次要声道基音周期复用标识的取值可以为0或者1,第四值为1,第三值为0。
403、当确定对次要声道信号的基音周期进行差分编码时,使用主要声道信号的基音周期估计值对次要声道信号的基音周期进行差分编码,以得到次要声道信号的基音周期索引值,次要声道信号的基音周期索引值用于生成待发送的立体声编码码流。
在本申请实施例中,在确定出可以对次要声道信号的基音周期进行差分编码时,可以使用主要声道信号的基音周期估计值对次要声道信号的基音周期进行差分编码,由于上述的差分编码使用了主要声道信号的基音周期估计值,考虑到了主要声道信号和次要声道信号之间的基音周期相似性,通过进行差分编码,准确的对次要声道信号的基音周期估计值进行编码,使用该次要声道信号的基音周期估计值可以更准确地解码得到次要声道信号,因此可以提高立体声信号的空间感和声像稳定性。另外,若对次要声道信号的基音周期进行独立编码,本申请实施例中对次要声道信号的基音周期进行差分编码,可以减少对次要声道信号的基音周期进行独立编码时使用的比特资源开销,将节省的比特分配给其他立体 声编码参数,实现准确的次要声道基音周期编码,提高整体立体声编码质量。
在本申请实施例中,在步骤401中得到当前帧的主要声道信号之后,可以根据主要声道信号进行编码,从而得到主要声道信号的基音周期估计值。具体的,在主要声道编码中,基音周期估计采用开环基音分析和闭环基音搜索相结合,提高了基音周期估计的准确度。语音信号的基音周期估计可以采用多种方法,例如可以采用自相关函数,短时平均幅度差等。基音周期估计算法以自相关函数为基础。自相关函数在基音周期的整数倍位置上出现峰值,利用这个特点可以完成基音周期估计。为了提高基音预测的准确性,更好地逼近语音实际的基音周期,基音周期检测采用以1/3为采样分辨率的分数延迟。为了减少基音周期估计的运算量,基音周期估计包括开环基音分析和闭环基音搜索两个步骤。利用开环基音分析对一帧语音的整数延迟进行粗略估计得到一个候选的整数延迟,闭环基音搜索在其附近对基音延迟进行细致估计,闭环基音搜索每一子帧执行一次。开环基音分析每帧进行一次,分别计算自相关、归一化处理和计算最佳的开环整数延迟。通过以上过程可以得到主要声道信号的基音周期估计值。
接下来对本申请实施例中差分编码的具体过程进行说明,具体的,步骤403使用主要声道信号的基音周期估计值对次要声道信号的基音周期进行差分编码,以得到所述次要声道信号的基音周期索引值,包括:
根据主要声道信号的基音周期估计值进行次要声道的闭环基音周期搜索,以得到次要声道信号的基音周期估计值;
根据次要声道信号的基音周期搜索范围调整因子确定次要声道信号的基音周期索引值上限;
根据主要声道信号的基音周期估计值、次要声道信号的基音周期估计值和次要声道信号的基音周期索引值上限计算出次要声道信号的基音周期索引值。
其中,编码端首先根据次要声道信号的基音周期估计值进行次要声道的闭环基音周期搜索,以确定次要声道信号的基音周期估计值。接下来对闭环基音周期搜索的具体过程进行详细说明。在本申请的一些实施例中,根据主要声道信号的基音周期估计值进行次要声道的闭环基音周期搜索,以得到次要声道信号的基音周期估计值,包括:
根据主要声道信号的基音周期估计值和当前帧的次要声道信号被划分的子帧个数,确定次要声道信号的闭环基音周期参考值;
使用次要声道信号的闭环基音周期参考值作为次要声道信号的闭环基音周期搜索的起始点,采用整数精度和分数精度进行闭环基音周期搜索,以得到次要声道信号的基音周期估计值。
其中,当前帧的次要声道信号被划分的子帧个数可以通过次要声道信号的子帧配置来确定,例如可以被划分4个子帧个数,或者3个子帧个数,具体结合应用场景确定。在获取到主要声道信号的基音周期估计值之后,可以使用该主要声道信号的基音周期估计值和次要声道信号被划分的子帧个数来计算次要声道信号的闭环基音周期参考值。次要声道信号的闭环基音周期参考值是根据主要声道信号的基音周期估计值来确定的参考值,该次要声道信号的闭环基音周期参考值表示了以主要声道信号的基音周期估计值作为参考来确定的次要声道信号的闭环基音周期。举例说明如下,其中一种方法是直接将主要声道信号的 基音周期作为次要声道信号的闭环基音周期参考值,即从主要声道信号的5个子帧中的基音周期选出4个值作为次要声道信号的4个子帧的闭环基音周期参考值。另一种方法是采用插值方法将主要声道信号的5个子帧中的基音周期映射为次要声道信号的4个子帧的闭环基音周期参考值。
具体的,以次要声道信号的闭环基音周期参考值作为次要声道信号的闭环基音周期搜索的起始点,采用整数精度和下采样分数精度进行闭环基音周期搜索,最后通过计算内插归一化相关性得到次要声道信号的基音周期估计值。次要声道信号的基音周期估计值的计算过程,详见后续实施例中的举例说明。
次要声道信号的基音周期搜索范围调整因子可用于调整次要声道信号的基音周期索引值，以确定出次要声道信号的基音周期索引值上限。该次要声道信号的基音周期索引值上限表示了次要声道信号的基音周期索引值的取值不能超过的上限值。次要声道信号的基音周期索引值可用于确定次要声道信号的基音周期估计值。
进一步的,在本申请的一些实施例中,根据主要声道信号的基音周期估计值和当前帧的次要声道信号被划分的子帧个数,确定次要声道信号的闭环基音周期参考值,包括:
根据主要声道信号的基音周期估计值确定次要声道信号的闭环基音周期整数部分loc_T0,和次要声道信号的闭环基音周期分数部分loc_frac_prim;
通过如下方式计算出次要声道信号的闭环基音周期参考值f_pitch_prim:
f_pitch_prim=loc_T0+loc_frac_prim/N;
其中,N表示次要声道信号被划分的子帧个数。
具体的,根据主要声道信号的基音周期估计值首先确定次要声道信号的闭环基音周期整数部分和闭环基音周期分数部分,举例说明如下,直接将主要声道信号的基音周期估计值的整数部分作为次要声道信号的闭环基音周期整数部分,将主要声道信号的基音周期估计值的分数部分作为次要声道信号的闭环基音周期分数部分,还可以采用插值方法将主要声道信号的基音周期估计值映射为次要声道信号的闭环基音周期整数部分和闭环基音周期分数部分。例如,通过以上方法均可以得到次要声道的闭环基音周期整数部分为loc_T0,闭环基音周期分数部分为loc_frac_prim。
N表示次要声道信号被划分的子帧个数,例如N的取值可以为3,或者4,或者5等,具体取值取决于应用场景。通过上述公式可以计算出次要声道信号的闭环基音周期参考值,不限定的是,本申请实施例中计算次要声道信号的闭环基音周期参考值可以不限于上述公式,例如在loc_T0+loc_frac_prim/N计算出结果之后,还可以设置修正因子,该修正因子再乘以loc_T0+loc_frac_prim/N的结果,可以作为最终输出的f_pitch_prim。又如,f_pitch_prim=loc_T0+loc_frac_prim/N中的等式右边,还可以将N替换为N-1,同样也可以计算出最终的f_pitch_prim。
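作为示意，下面用一小段C语言代码表示上述参考值的计算，其中的函数名为本示例中的假设：

```c
/* 示意性实现：由映射得到的整数部分loc_T0与分数部分loc_frac_prim
   计算次要声道信号的闭环基音周期参考值 f_pitch_prim = loc_T0 + loc_frac_prim / N */
float calc_f_pitch_prim(int loc_T0, int loc_frac_prim, int N)
{
    return (float)loc_T0 + (float)loc_frac_prim / (float)N;
}
```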
在本申请的一些实施例中,根据次要声道信号的基音周期搜索范围调整因子确定次要声道信号的基音周期索引值上限,包括:
通过如下方式计算出次要声道信号的基音周期索引值上限soft_reuse_index_high_limit;
soft_reuse_index_high_limit=0.5+2^Z；
其中,Z为次要声道信号的基音周期搜索范围调整因子,Z的取值为:3、或者4、或者5。
其中，计算差分编码中次要声道信号的基音周期索引上限，需要首先确定次要声道信号的基音周期搜索范围调整因子Z，然后通过如下计算式：soft_reuse_index_high_limit=0.5+2^Z，以得到soft_reuse_index_high_limit，例如Z可取3、或者4、或者5，对于Z的具体取值此处不做限定，具体取决于应用场景。
编码端在确定出主要声道信号的基音周期估计值、次要声道信号的基音周期估计值和次要声道信号的基音周期索引值上限之后,根据主要声道信号的基音周期估计值、次要声道信号的基音周期估计值和次要声道信号的基音周期索引值上限进行差分编码,输出次要声道信号的基音周期索引值。
进一步的,在本申请的一些实施例中,根据主要声道信号的基音周期估计值、次要声道信号的基音周期估计值和次要声道信号的基音周期索引值上限计算出次要声道信号的基音周期索引值,包括:
根据主要声道信号的基音周期估计值确定次要声道信号的闭环基音周期整数部分loc_T0,和次要声道信号的闭环基音周期分数部分loc_frac_prim;
通过如下方式计算出次要声道信号的基音周期索引值soft_reuse_index:
soft_reuse_index=(N*pitch_soft_reuse+pitch_frac_soft_reuse)﹣(N*loc_T0+loc_frac_prim)+soft_reuse_index_high_limit/M;
其中,pitch_soft_reuse表示次要声道信号的基音周期估计值的整数部分,pitch_frac_soft_reuse表示次要声道信号的基音周期估计值的分数部分,soft_reuse_index_high_limit表示次要声道信号的基音周期索引值上限,N表示次要声道信号被划分的子帧个数,M表示次要声道信号的基音周期索引值上限的调整因子,M为非零的实数,*表示相乘运算符,+表示相加运算符,﹣表示相减运算符。
具体的,首先根据主要声道信号的基音周期估计值确定次要声道信号的闭环基音周期整数部分loc_T0,和次要声道信号的闭环基音周期分数部分loc_frac_prim,详见前述的计算过程。N表示次要声道信号被划分的子帧个数,例如N的取值可以为3,或者4,或者5,M表示次要声道信号的基音周期索引值上限的调整因子,M为非零的实数,例如M的取值可以为2,或者3,对于N和M的取值取决于应用场景,此处不做限定。
不限定的是,本申请实施例中计算次要声道信号的基音周期索引值可以不限于上述公式,例如在(N*pitch_soft_reuse+pitch_frac_soft_reuse)﹣(N*loc_T0+loc_frac_prim)+soft_reuse_index_high_limit/M计算出结果之后,还可以设置修正因子,该修正因子再乘以(N*pitch_soft_reuse+pitch_frac_soft_reuse)﹣(N*loc_T0+loc_frac_prim)+soft_reuse_index_high_limit/M的结果,可以作为最终输出的soft_reuse_index。
又如,soft_reuse_index=(N*pitch_soft_reuse+pitch_frac_soft_reuse)﹣(N*loc_T0+loc_frac_prim)+soft_reuse_index_high_limit/M中的等式右边,还可以再加上一个修正因子,该修正因子的具体取值不做限定,同样也可以计算出最终的soft_reuse_index。
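结合上述各式，下面给出一段示意性的C语言代码，依次计算基音周期索引值上限与基音周期索引值soft_reuse_index。其中Z、M作为参数传入，索引值最终取整的方式为本示例中的假设：

```c
/* 示意性实现：编码端的次要声道基音周期差分编码，输出soft_reuse_index */
int calc_soft_reuse_index(int pitch_soft_reuse, int pitch_frac_soft_reuse,
                          int loc_T0, int loc_frac_prim, int N, int Z, int M)
{
    /* 基音周期索引值上限：soft_reuse_index_high_limit = 0.5 + 2^Z */
    float high_limit = 0.5f + (float)(1 << Z);

    /* soft_reuse_index = (N*pitch_soft_reuse + pitch_frac_soft_reuse)
                        - (N*loc_T0 + loc_frac_prim) + high_limit / M */
    float idx = (float)(N * pitch_soft_reuse + pitch_frac_soft_reuse)
              - (float)(N * loc_T0 + loc_frac_prim)
              + high_limit / (float)M;

    return (int)idx;   /* 写入码流前的取整方式为示例假设 */
}
```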
在本申请实施例中,编码端生成的立体声编码码流可以存储在计算机可读存储介质中。
在本申请实施例中，使用主要声道信号的基音周期估计值对次要声道信号的基音周期进行差分编码，可以得到次要声道信号的基音周期索引值，次要声道信号的基音周期索引值用于表示次要声道信号的基音周期。在得到次要声道信号的基音周期索引值之后，还可以将次要声道信号的基音周期索引值用于生成待发送的立体声编码码流。当编码端生成立体声编码码流之后，可以将该立体声编码码流输出，并经过音频传输通道，发送至解码端。
411、根据接收到的立体声编码码流确定是否对次要声道信号的基音周期进行差分解码。
在本申请实施例中,根据接收到的立体声编码码流判断是否对次要声道信号的基音周期进行差分解码,例如解码端可以根据立体声编码码流携带的指示信息确定是否对次要声道信号的基音周期进行差分解码。又如,在立体声信号的传输环境预先配置完成之后,就可以预先配置是否进行差分解码,从而解码端还可以根据预先配置的结果确定是否对次要声道信号的基音周期进行差分解码。
在本申请的一些实施例中,步骤411根据接收到的立体声编码码流确定是否对次要声道信号的基音周期进行差分解码,包括:
从当前帧中获取次要声道基音周期差分编码标识;
当次要声道基音周期差分编码标识为预设的第一值时,确定对次要声道信号的基音周期进行差分解码。
在本申请实施例中,次要声道基音周期差分编码标识可以具有多种取值,例如次要声道基音周期差分编码标识可以为预设的第一值,或者为第二值。例如,次要声道基音周期差分编码标识的取值可以为0或者1,第一值为1,第二值为0。例如当次要声道基音周期差分编码标识的取值为1时,触发执行步骤412。
举例说明如下,次要声道基音周期差分编码标识为Pitch_reuse_flag。例如在次要声道解码中,获取次要声道基音周期差分编码标识Pitch_reuse_flag;当可以对次要声道信号的基音周期进行差分解码时Pitch_reuse_flag为1,执行本申请实施例中的差分解码方法,当不能够对次要声道信号的基音周期进行差分解码时,Pitch_reuse_flag为0,执行独立解码方法。例如,在本申请实施例中,只有当满足Pitch_reuse_flag均为1时,才执行步骤412和步骤413中的差分解码过程。
在本申请的一些实施例中,本申请实施例提供的方法还包括:
当确定不对次要声道信号的基音周期进行差分解码、且不复用主要声道信号的基音周期估计值作为次要声道信号的基音周期时,从立体声编码码流中解码次要声道信号的基音周期。
其中,解码端确定不对次要声道信号的基音周期进行差分解码,也不复用主要声道信号的基音周期估计值作为次要声道信号的基音周期,在这种情况下,本申请实施例中还可以使用次要声道的基音周期独立解码方法,对次要声道信号的基音周期进行解码,因此可以实现对次要声道信号的基音周期的解码。
在本申请的一些实施例中,本申请实施例提供的方法还包括:
当确定不对次要声道信号的基音周期进行差分解码且复用主要声道信号的基音周期估计值作为次要声道信号的基音周期时，将主要声道信号的基音周期估计值作为次要声道信号的基音周期。
其中,解码端确定不对次要声道信号的基音周期进行差分解码时,本申请实施例中还可以采用基音周期复用的方法。例如,当次要声道信号基音周期复用标识指示次要声道信号的基音周期复用主要声道信号的基音周期估计值时,在解码端可以根据该次要声道信号基音周期复用标识将主要声道信号的基音周期作为次要声道信号的基音周期进行解码。
在本申请的另一些实施例中,根据次要声道基音周期差分编码标识的取值,解码端执行的立体声解码方法还可以包括如下步骤:
当次要声道基音周期差分编码标识为预设的第二值时、且立体声编码码流中携带的次要声道信号基音周期复用标识为预设的第三值时,确定不对次要声道信号的基音周期进行差分解码、且不复用主要声道信号的基音周期估计值作为次要声道信号的基音周期,从立体声编码码流中解码次要声道信号的基音周期。
在本申请的另一些实施例中,根据次要声道基音周期差分编码标识的取值,解码端执行的立体声解码方法还可以包括如下步骤:
当次要声道基音周期差分编码标识为预设的第二值、且立体声编码码流中携带的次要声道信号基音周期复用标识为预设的第四值时,确定不对次要声道信号的基音周期进行差分解码,将主要声道信号的基音周期估计值作为次要声道信号的基音周期。
其中,次要声道基音周期差分编码标识是第二值时,确定不执行步骤412和步骤413中的差分解码过程,进一步的解析立体声编码码流中携带的次要声道信号基音周期复用标识,通过次要声道信号基音周期复用标识指示次要声道信号的基音周期是否复用主要声道信号的基音周期估计值,当次要声道信号基音周期复用标识的取值为第四值时,指示次要声道信号的基音周期复用主要声道信号的基音周期估计值,在解码端可以根据该次要声道信号基音周期复用标识将主要声道信号的基音周期作为次要声道信号的基音周期进行解码。当次要声道信号基音周期复用标识的取值为第三值时,指示次要声道信号的基音周期不复用主要声道信号的基音周期估计值,从立体声编码码流中解码次要声道信号的基音周期,可以对次要声道信号的基音周期和主要声道信号的基音周期分别进行解码,即对次要声道信号的基音周期进行独立解码。解码端根据立体声编码码流中携带的次要声道基音周期差分编码标识可以确定执行差分解码方法或者独立解码方法。
需要说明的是,在本申请实施例中,当对次要声道信号的基音周期不进行差分解码时,可以使用次要声道的基音周期独立解码方法,对次要声道信号的基音周期进行解码。另外,当对次要声道信号的基音周期不进行差分解码时,还可以采用基音周期复用的方法。其中,解码端执行的立体声解码方法可以应用于当前帧的解码速率低于预设的速率阈值的立体声解码场景,若在立体声编码码流中携带次要声道信号基音周期复用标识,通过次要声道信号基音周期复用标识指示次要声道信号的基音周期是否复用主要声道信号的基音周期估计值,当次要声道信号基音周期复用标识指示次要声道信号的基音周期复用主要声道信号的基音周期估计值时,在解码端可以根据该次要声道信号基音周期复用标识将主要声道信号的基音周期作为次要声道信号的基音周期进行解码。
412、当确定对次要声道信号的基音周期进行差分解码时，从立体声编码码流中获取当前帧的主要声道的基音周期估计值和当前帧的次要声道的基音周期索引值。
在本申请实施例中,编码端发送立体声编码码流之后,解码端首先通过音频传输通道接收到该立体声编码码流,然后根据该立体声编码码流进行信道解码,若需要对次要声道信号的基音周期进行差分解码,可以从立体声编码码流中获取到当前帧的次要声道信号的基音周期索引值,还可以从立体声编码码流中获取到当前帧的主要声道信号的基音周期估计值。
413、根据主要声道的基音周期估计值和次要声道的基音周期索引值,对次要声道信号的基音周期进行差分解码,以得到次要声道信号的基音周期估计值,次要声道信号的基音周期估计值用于对立体声编码码流进行解码。
在本申请实施例中,在步骤411中确定出需要对次要声道信号的基音周期进行差分解码时,可以使用主要声道信号的基音周期估计值和次要声道信号的基音周期索引值,对次要声道信号的基音周期进行差分解码,实现准确的次要声道基音周期解码,提高整体立体声解码质量。
接下来对本申请实施例中差分解码的具体过程进行说明,具体的,步骤413根据主要声道信号的基音周期估计值和次要声道信号的基音周期索引值,对次要声道信号的基音周期进行差分解码,包括:
根据主要声道信号的基音周期估计值和当前帧的次要声道信号被划分的子帧个数,确定次要声道信号的闭环基音周期参考值;
根据次要声道信号的基音周期搜索范围调整因子确定次要声道信号的基音周期索引值上限;
根据次要声道信号的闭环基音周期参考值、次要声道信号的基音周期索引值和次要声道信号的基音周期索引值上限计算出次要声道信号的基音周期估计值。
举例说明如下，使用主要声道信号的基音周期估计值确定次要声道信号的闭环基音周期参考值，详见前述的计算过程。次要声道信号的基音周期搜索范围调整因子可用于调整次要声道信号的基音周期索引值，以确定出次要声道信号的基音周期索引值上限。该次要声道信号的基音周期索引值上限表示了次要声道信号的基音周期索引值的取值不能超过的上限值。次要声道信号的基音周期索引值可用于确定次要声道信号的基音周期估计值。
解码端在确定出次要声道信号的闭环基音周期参考值、次要声道信号的基音周期索引值和次要声道信号的基音周期索引值上限之后,根据次要声道信号的闭环基音周期参考值、次要声道信号的基音周期索引值和次要声道信号的基音周期索引值上限进行差分解码,输出次要声道信号的基音周期估计值。
进一步的,在本申请的一些实施例中,根据次要声道信号的闭环基音周期参考值、次要声道信号的基音周期索引值和次要声道信号的基音周期索引值上限计算出次要声道信号的基音周期估计值,包括:
通过如下方式计算出次要声道信号的基音周期估计值T0_pitch:
T0_pitch=f_pitch_prim+(soft_reuse_index-soft_reuse_index_high_limit/M)/N;
其中，f_pitch_prim表示次要声道信号的闭环基音周期参考值，soft_reuse_index表示次要声道信号的基音周期索引值，N表示次要声道信号被划分的子帧个数，M表示次要声道信号的基音周期索引值上限的调整因子，M为非零的实数，/表示相除运算符，+表示相加运算符，﹣表示相减运算符。
具体的,首先根据主要声道信号的基音周期估计值确定次要声道信号的闭环基音周期整数部分loc_T0,和次要声道信号的闭环基音周期分数部分loc_frac_prim,详见前述的计算过程。N表示次要声道信号被划分的子帧个数,例如N的取值可以为3,或者4,或者5,M表示次要声道信号的基音周期索引值上限的调整因子,例如M的取值可以为2,或者3,对于N和M的取值取决于应用场景,此处不做限定。
不限定的是,本申请实施例中计算次要声道信号的基音周期估计值可以不限于上述公式,例如在f_pitch_prim+(soft_reuse_index-soft_reuse_index_high_limit/M)/N计算出结果之后,还可以设置修正因子,该修正因子再乘以f_pitch_prim+(soft_reuse_index-soft_reuse_index_high_limit/M)/N的结果,可以作为最终输出的T0_pitch。又如,T0_pitch=f_pitch_prim+(soft_reuse_index-soft_reuse_index_high_limit/M)/N中的等式右边,还可以再加上一个修正因子,该修正因子的具体取值不做限定,同样也可以计算出最终的T0_pitch。
需要说明的是,计算出次要声道信号的基音周期估计值T0_pitch之后,还可以根据次要声道信号的基音周期估计值T0_pitch进一步的计算出次要声道信号的基音周期估计值整数部分T0和基音周期估计值分数部分T0_frac。举例说明如下,T0=INT(T0_pitch),T0_frac=(T0_pitch–T0)*N。
其中,INT(T0_pitch)表示对T0_pitch下取整运算,T0为解码次要声道基音周期的整数部分,T0_frac为解码次要声道基音周期的分数部分。
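下面给出一段示意性的C语言代码，按照上述各式完成次要声道基音周期的差分解码，并拆分出整数部分T0与分数部分T0_frac。其中Z、M作为参数传入，分数部分的取整方式为本示例中的假设：

```c
#include <math.h>

/* 示意性实现：解码端根据soft_reuse_index差分解码次要声道基音周期估计值 */
void decode_sec_pitch(int loc_T0, int loc_frac_prim, int soft_reuse_index,
                      int N, int Z, int M, int *T0, int *T0_frac)
{
    float f_pitch_prim = (float)loc_T0 + (float)loc_frac_prim / (float)N;
    float high_limit   = 0.5f + (float)(1 << Z);

    /* T0_pitch = f_pitch_prim + (soft_reuse_index - high_limit/M)/N */
    float T0_pitch = f_pitch_prim
                   + ((float)soft_reuse_index - high_limit / (float)M) / (float)N;

    *T0      = (int)floorf(T0_pitch);                              /* INT(T0_pitch)，下取整 */
    *T0_frac = (int)((T0_pitch - (float)(*T0)) * (float)N + 0.5f); /* 分数部分，四舍五入为示例假设 */
}
```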
通过前述实施例的举例说明,本申请实施例中由于使用了主要声道信号的基音周期估计值对次要声道信号的基音周期进行差分编码,可以使用少量比特资源分配给次要声道信号的基音周期进行差分编码,通过对次要声道信号的基音周期进行差分编码,可以提高立体声信号的空间感和声像稳定性。另外,本申请实施例中采用较小的比特资源进行了次要声道信号的基音周期的差分编码,因此可以将节省的比特资源用于立体声的其他编码参数,进而提升了次要声道的编码效率,最终提升了整体的立体声编码质量。另外,本申请实施例中在可以对次要声道信号的基音周期进行差分解码时,可以使用主要声道信号的基音周期估计值和次要声道信号的基音周期索引值对次要声道信号的基音周期进行差分解码,因此得到次要声道信号的基音周期估计值,使用该次要声道信号的基音周期估计值可以对立体声编码码流进行解码,因此可以提高立体声信号的空间感和声像稳定性。
为便于更好的理解和实施本申请实施例的上述方案,下面举例相应的应用场景来进行具体说明。
本申请实施例所提出的针对次要声道信号的基音周期编码方案,在次要声道信号基音周期编码过程中判断是否能够对次要声道信号的基音周期进行差分编码,当可以对次要声道信号的基音周期进行差分编码时,采用面向次要声道信号基音周期的差分编码方法对次要声道信号基音周期编码,用少量比特资源进行差分编码,将节省的比特分配给其他立体声编码参数,实现准确的次要声道信号基音周期编码,提高整体立体声编码质量。
本申请实施例中,立体声信号可以是原始的立体声信号,也可以是多声道信号中包含的两路信号组成的立体声信号,还可以是由多声道信号中包含的多路信号联合产生的两路信号组成的立体声信号。立体声编码可以构成独立的立体声编码器,也可以用于多声道编码器中的核心编码部分,旨在对由多声道信号中包含的多路信号联合产生的两路信号组成的立体声信号进行编码。
本申请实施例以立体声信号的编码速率为24.4kbps编码速率示例说明,可以理解的是,本申请实施例不限制于24.4kbps编码速率下实施,也可应用于更低速率的立体声编码中。
如图5所示,为本申请实施例提供的一种立体声信号编码的流程示意图。本申请实施例提出一种立体声编码中的基音周期编码判别方法,立体声编码可以是时域立体声编码,也可以是频域立体声编码,还可以是时频结合的立体声编码,本申请实施例不做限定。以频域立体声编码为例,接下来对立体声编码的编解码流程进行说明,重点说明后续步骤中的次要声道信号编码中基音周期的编码过程。具体地:
首先从频域立体声编码的编码端进行说明,编码端的具体实现步骤:
S01、对左右声道时域信号进行时域预处理。
立体声信号编码一般采用分帧处理来进行。若立体声音频信号的采样率为16KHz,每帧信号为20ms,帧长记作N,则N=320,即帧长为320个样点。当前帧的立体声信号包括当前帧的左声道时域信号以及当前帧的右声道时域信号,当前帧的左声道时域信号记作x L(n),当前帧的右声道时域信号记作x R(n),其中n为样点序号,n=0,1,…,N-1。当前帧的左右声道时域信号是当前帧的左声道时域信号以及当前帧的右声道时域信号的简称。
对当前帧的左右声道时域信号进行时域预处理,具体地可以包括:对当前帧的左右声道时域信号分别进行高通滤波处理,以得到当前帧预处理后的左右声道时域信号,当前帧预处理后的左时域信号记作x L_HP(n),当前帧预处理后的右时域信号记作x R_HP(n)。其中,n为样点序号,n=0,1,…,N-1。当前帧预处理后的左右声道时域信号是当前帧预处理后的左声道时域信号以及当前帧预处理后的右声道时域信号的简称。高通滤波处理可以是截止频率为20Hz的无限脉冲响应(infinite impulse response,IIR)滤波器,也可是其他类型的滤波器。例如,采样率为16KHz对应的截止频率为20Hz的高通滤波器的传递函数为:
Figure PCTCN2020096296-appb-000001
其中,b 0=0.994461788958195,b 1=-1.988923577916390,b 2=0.994461788958195,a 1=1.988892905899653,a 2=-0.988954249933127,z为在Z变换域下的变换因子。
相应的时域滤波器为:
x L_HP(n)=b 0*x L(n)+b 1*x L(n-1)+b 2*x L(n-2)-a 1*x L_HP(n-1)-a 2*x L_HP(n-2),
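作为示意，下面给出一段采用上述系数的高通滤波C语言代码。需要说明的是，本示例假设时域递推的符号约定为y(n)=b0*x(n)+b1*x(n-1)+b2*x(n-2)+a1*y(n-1)+a2*y(n-2)（即分母为1-a1*z^-1-a2*z^-2），以保证在上述a1、a2取值下滤波器稳定；该符号约定为本示例中的假设，并非对具体实现的限定：

```c
/* 示意性实现：对一帧时域信号进行截止频率为20Hz的高通滤波（直接I型双二阶结构） */
void hp20_filter(const float *x, float *y, int frame_len)
{
    const float b0 = 0.994461788958195f;
    const float b1 = -1.988923577916390f;
    const float b2 = 0.994461788958195f;
    const float a1 = 1.988892905899653f;
    const float a2 = -0.988954249933127f;

    float x1 = 0.0f, x2 = 0.0f, y1 = 0.0f, y2 = 0.0f;  /* 滤波器状态，实际实现中应跨帧保存 */
    for (int n = 0; n < frame_len; n++) {
        float yn = b0 * x[n] + b1 * x1 + b2 * x2 + a1 * y1 + a2 * y2;
        x2 = x1; x1 = x[n];
        y2 = y1; y1 = yn;
        y[n] = yn;
    }
}
```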
可以理解的是,对当前帧的左右声道时域信号进行时域预处理不是必须要执行的步骤。如果没有时域预处理的步骤,则用于进行时延估计的左右声道信号就是原始立体声信号中的左右声道信号。这里原始立体声信号中的左右声道信号是指采集到的经过模数转换后的 脉冲编码调制(pulse code modulation,PCM)信号,信号的采样率可以包括8KHz、16KHz、32KHz、44.1KHz以及48KHz。
另外,预处理除了本实施例中描述的高通滤波处理,还可以包含其它处理,例如预加重处理等,本申请实施例不做限定。
S02、根据预处理后的左右声道信号进行时域分析。
具体地,时域分析可以包括瞬态检测等。其中,瞬态检测可以是分别对当前帧预处理后的左右声道时域信号进行能量检测,检测当前帧是否发生能量突变。例如,计算当前帧预处理后的左声道时域信号的能量E cur_L;根据前一帧预处理后的左声道时域信号的能量E pre_L和当前帧预处理后的左声道时域信号的能量E cur_L之间的差值的绝对值进行瞬态检测,以得到当前帧预处理后的左声道时域信号的瞬态检测结果。同样的,还可以用相同的方法对当前帧预处理后的右声道时域信号进行瞬态检测。时域分析可以包含除瞬态检测之外的其他的时域分析,例如可以包含时域声道间时间差参数(inter-channel time difference,ITD)确定、时域的时延对齐处理、频带扩展预处理等。
S03、对预处理后的左右声道信号进行时频变换,以得到左右声道频域信号。
具体地,可以是对预处理后的左声道信号进行离散傅里叶变换,以得到左声道频域信号;对预处理后的右声道信号进行离散傅里叶变换,以得到右声道频域信号。为了克服频谱混叠的问题,连续两次离散傅里叶变换之间一般都采用叠接相加的方法进行处理,有时还会对离散傅里叶变换的输入信号进行补零。
离散傅里叶变换可以是每帧进行一次,也可以将每帧信号分成P个子帧,每个子帧进行一次。如果每帧进行一次,则变换后左声道频域信号可以记作L(k),k=0,1,…,L/2-1,L表示采样点,变换后右声道频域信号可以记作R(k),k=0,1,…,L/2-1,k为频点索引值。如果每子帧进行一次,则变换后第i个子帧的左声道频域信号可以记作L i(k),k=0,1,…,L/2-1,变换后第i个子帧的右声道频域信号可以记作R i(k),k=0,1,…,L/2-1,k为频点索引值,i为子帧索引值,i=0,1,…P-1。例如,本实施例中以宽带为例,宽带指的是编码带宽可以为8kHz或者更大,每帧左声道或每帧右声道信号为20ms,帧长记作N,则N=320,即帧长为320个样点。将每帧信号分成两个子帧,即P=2,每个子帧信号为10ms,子帧长为160个样点。每个子帧进行一次离散傅里叶变换,离散傅里叶变换的长度记作L,L=400,即离散傅里叶变换的长度为400个样点,则变换后第i个子帧的左声道频域信号可以记作L i(k),k=0,1,…,L/2-1,变换后第i个子帧的右声道频域信号可以记作R i(k),k=0,1,…,L/2-1,k为频点索引值,i为子帧索引值,i=0,1,…,P-1。
S04、确定ITD参数,并进行编码。
确定ITD参数的方法有很多种,可以只在频域进行,可以只在时域进行,也可以通过时频结合的方法来确定,本申请实施例不做限制。
例如,可以在时域采用左右声道互相关系数提取ITD参数,例如:在0≤i≤Tmax范围内,计算
Figure PCTCN2020096296-appb-000002
Figure PCTCN2020096296-appb-000003
如果
Figure PCTCN2020096296-appb-000004
则ITD参数值为max(Cn(i))对应的索引值的相反数,其中,在编解码器中默认规定了max(Cn(i))值对应的索引表;否则ITD参数值为max(Cp(i))对应的索引值。
其中,i为计算互相关系数的索引值,j为样点的索引值,Tmax对应于不同采样率下ITD取值的最大值,N为帧长。也可以在频域基于左右声道频域信号确定ITD参数,例如:可以采用离散傅里叶变换(discrete Fourier transform,DFT)、快速傅氏变换(Fast Fourier Transformation,FFT)、修正离散余弦变换(Modified Discrete Cosine Transform,MDCT)等时频变换技术,将时域信号变换为频域信号。本实施例中DFT变换后第i个子帧的左声道频域信号L i(k),k=0,1,…,L/2-1,变换后第i个子帧的右声道频域信号R i(k),k=0,1,…,L/2-1,i=0,1,…,P-1,计算第i个子帧的频域相关系数:
XCORR i(k)=L i(k)*R * i(k)。其中,R * i(k)为时频变换后第i个子帧的右声道频域信号的共轭。将频域互相关系数转换到时域xcorr i(n),n=0,1,…,L-1,在L/2-T max≤n≤L/2+T max范围内搜索xcorr i(n)的最大值,以得到第i个子帧的ITD参数值为
Figure PCTCN2020096296-appb-000005
又如,还可以根据DFT变换后第i个子帧的左声道频域信号和第i个子帧的右声道频域信号,在搜索范围-T max≤j≤T max,计算幅度值:
Figure PCTCN2020096296-appb-000006
则ITD参数值为
Figure PCTCN2020096296-appb-000007
即幅度值最大的值对应的索引值。
在确定了ITD参数后,需要在编码器中将ITD参数进行残差编码和熵编码,然后写入立体声编码码流。
S05、根据ITD参数,对左右声道频域信号进行时移调整。
本申请实施例对左右声道频域信号进行时移调整的方式有多种,接下来进行举例说明。
本实施例中,以每帧信号分成P个子帧,P=2为例,经过时移调整后的第i个子帧的左声道频域信号可以记作L i′(k),k=0,1,…,L/2-1,经过时移调整后的第i个子帧的右声道频域信号可以记作R i′(k),k=0,1,…,L/2-1,k为频点索引值,i=0,1,…,P-1。
Figure PCTCN2020096296-appb-000008
Figure PCTCN2020096296-appb-000009
其中,τ i为第i个子帧的ITD参数值,L为离散傅里叶变换的长度,L i(k)为时频变换后第i个子帧的左声道频域信号,R i(k)为变换后第i个子帧的右声道频域信号,i为子帧索引值,i=0,1,…,P-1。
可以理解的是,如果DFT不是分帧进行的,也可以整帧进行一次时移调整。其中,分帧后则按每个子帧进行时移调整,若不分帧则按每帧进行时移调整。
S06、计算其他频域立体声参数,并进行编码。
其他频域立体声参数可以包含但不限于:声道间相位差(inter-channel phase difference,IPD)参数、声道间电平差(也称声道间幅度差)(inter-channel level difference,ILD)参数、子带边增益等,本申请实施例中不做限定。计算得到其他频域立体声参数后,需要将其进行残差编码和熵编码,写入立体声编码码流。
S07、计算主要声道信号和次要声道信号。
计算主要声道信号和次要声道信号。具体地,可以使用本申请实施例中的任何一种时域或频域下混处理实现。例如,可以根据当前帧的左声道频域信号和当前帧的右声道频域信号,计算当前帧的主要声道信号和次要声道信号;可以根据当前帧预设低频带所对应的各个子带的左声道频域信号和当前帧预设低频带所对应的各个子带的右声道频域信号,计算当前帧预设低频带所对应的各个子带的主要声道信号和次要声道信号;也可以根据当前帧各个子帧的左声道频域信号和当前帧各个子帧的右声道频域信号,计算当前帧各个子帧的主要声道信号和次要声道信号;还可以根据当前帧各个子帧预设低频带所对应的各个子带的左声道频域信号和当前帧各个子帧预设低频带所对应的各个子带的右声道频域信号,计算当前帧各个子帧预设低频带所对应的各个子带的主要声道信号和次要声道信号。可以根据当前帧的左声道时域信号和当前帧的右声道时域信号,通过两路信号相加得到主要声道信号,通过两路信号相减得到次要声道信号。
在本实施例中,由于对每帧信号进行了分帧处理,将每个子帧的主要声道信号和次要声道信号经过离散傅里叶变换的逆变换转换到时域,并进行子帧间的叠接相加处理,以得到当前帧的时域主要声道信号和次要声道信号。
需要说明的是,步骤S07得到主要声道信号和次要声道信号的过程称为下混处理,从步骤S08开始是对主要声道信号和次要声道信号处理。
S08、对下混后的主要声道信号和次要声道信号进行编码。
具体地,可以先根据前一帧的主要声道信号和次要声道信号编码中得到的参数信息以及主要声道信号编码和次要声道信号编码的总比特数,对主要声道信号编码和次要声道信号编码进行比特分配。然后根据比特分配的结果分别对主要声道信号和次要声道信号进行 编码。主要声道信号编码和次要声道信号编码,可以采用任何一种单声道音频编码技术。例如,采用ACELP的编码方法对下混处理得到的主要声道信号和次要声道信号进行编码。ACELP编码方法通常包括:确定线性预测系数(linear prediction coefficient,LPC)并将其转换成为线谱频率参数(line spectral frequency,LSF)进行量化编码;搜索自适应码激励确定基音周期及自适应码本增益,并对基音周期及自适应码本增益分别进行量化编码;搜索代数码激励确定代数码激励的脉冲索引及增益,并对代数码激励的脉冲索引及增益分别进行量化编码。
如图6所示,为本申请实施例提供的主要声道信号的基音周期参数和次要声道信号的基音周期参数进行编码的流程图。图6所示的流程包括如下步骤S09至步骤S12,对于主要声道信号的基音周期参数和次要声道信号的基音周期参数进行编码的过程为:
S09、确定主要声道信号基音周期并进行编码。
在主要声道信号编码中,基音周期估计采用开环基音分析和闭环基音搜索相结合,提高了基音周期估计的准确度。语音的基音周期估计可以采用多种方法,例如自相关函数,短时平均幅度差等。基音周期估计算法以自相关函数为基础。自相关函数在基音周期的整数倍位置上出现峰值,利用这个特点可以完成基音周期估计。为了提高基音预测的准确性,更好地逼近语音实际的基音周期,基音周期检测采用以1/3为采样分辨率的分数延迟。为了减少基音周期估计的运算量,基音周期估计包括开环基音分析和闭环基音搜索两个步骤。利用开环基音分析对一帧语音的整数延迟进行粗略估计得到一个候选的整数延迟,闭环基音搜索在其附近对基音延迟进行细致估计,闭环基音搜索每一子帧执行一次。开环基音分析每帧进行一次,分别计算自相关、归一化处理和计算最佳的开环整数延迟。
通过以上步骤得到的主要声道信号的基音周期估计值,除了作为主要声道信号基音周期编码参数之外,还会作为次要声道信号的基音周期参考值。
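为说明开环基音分析的基本原理，下面给出一段基于归一化自相关的整数延迟粗估计的示意性C语言代码。其中的搜索范围、归一化方式等均为本示例中的假设，实际实现中还会结合分数精度的闭环搜索进一步细化：

```c
/* 示意性实现：在[min_lag, max_lag]范围内搜索归一化自相关最大的整数延迟 */
int open_loop_pitch(const float *s, int len, int min_lag, int max_lag)
{
    int   best_lag  = min_lag;
    float best_corr = -1.0e30f;

    for (int lag = min_lag; lag <= max_lag; lag++) {
        float corr = 0.0f, energy = 1.0e-6f;   /* 加入小常数避免除零 */
        for (int n = lag; n < len; n++) {
            corr   += s[n] * s[n - lag];
            energy += s[n - lag] * s[n - lag];
        }
        float norm = corr / energy;            /* 归一化自相关 */
        if (norm > best_corr) {
            best_corr = norm;
            best_lag  = lag;
        }
    }
    return best_lag;                           /* 候选整数延迟，作为闭环基音搜索的起点 */
}
```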
S10、次要声道编码中是否采用基音周期差分编码。
在次要声道编码中,根据主要声道的基音周期估计值和次要声道信号的开环基音周期估计值进行次要声道基音周期差分编码判决,判决条件为:
DIFF=|∑(pitch[0])-∑(pitch[1])|,
其中,DIFF表示主要声道信号的基音周期估计值和次要声道信号的开环基音周期估计值之间的差值,|∑(pitch[0])-∑(pitch[1])|表示对∑(pitch[0])和∑(pitch[1])之间的差值取绝对值,∑pitch[0]表示主要声道信号的基音周期估计值,∑pitch[1]表示次要声道信号的开环基音周期估计值。
次要声道基音周期差分编码标识用Pitch_reuse_flag来指示。DIFF_THR为预设的次要声道基音周期差分编码阈值,根据不同的编码速率确定次要声道基音周期差分编码阈值为{1,3,6}中的某一值。例如,当DIFF>DIFF_THR时Pitch_reuse_flag=1,此时判别当前帧采用次要声道信号的基音周期差分编码。当DIFF≤DIFF_THR时Pitch_reuse_flag=0,此时不进行基音周期差分编码,采用次要声道信号的独立编码。
S11:若不进行基音周期差分编码,则使用次要声道信号的基音周期独立编码方法,对次要声道信号的基音周期进行编码。
不限定的是,若不采用次要声道信号的基音周期差分编码,还可以采用次要声道信号 的基音周期复用方法,即在编码端不对次要声道信号的基音周期编码,在解码端将主要声道信号的基音周期作为次要声道信号的基音周期进行解码。
S12:进行次要声道信号的基音周期差分编码。
次要声道信号的基音周期差分编码具体步骤包括:
S121:根据主要声道信号的基音周期估计值进行次要声道信号的闭环基音周期搜索,确定次要声道信号的基音周期估计值。
S12101:根据主要声道信号的基音周期估计值确定次要声道信号的闭环基音周期的参考值。
在本实施例中以24.4kbps编码速率为例,基音周期编码按子帧进行,主要声道信号被划分为5个子帧,次要声道信号被划分为4个子帧。根据主要声道信号的基音周期确定次要声道信号的基音周期的参考值,其中一种方法是直接将主要声道信号的基音周期作为次要声道信号的基音周期参考值,即从主要声道信号5个子帧中的基音周期选出4个值作为次要声道信号4个子帧的基音周期参考值。另一种方法是采用插值方法将主要声道信号5个子帧中的基音周期映射为次要声道信号4个子帧的基音周期参考值。通过以上方法均可以得到次要声道信号的闭环基音周期参考值,其中整数部分为loc_T0,分数部分为loc_frac_prim。
S12102:根据次要声道信号基音周期参考值进行次要声道信号闭环基音周期搜索,确定次要声道信号基音周期。具体为:使用次要声道信号的闭环基音周期参考值作为次要声道信号闭环基音周期搜索的起始点,采用整数精度和下采样分数精度进行闭环基音周期搜索,通过计算内插归一化相关性得到次要声道信号基音周期估计值。
例如,其中一种方法是采用2比特(bits)用于次要声道信号基音周期编码,具体为:
以loc_T0为搜索起点,在[loc_T0-1,loc_T0+1]范围内对次要声道信号基音周期进行整数精度搜索,每个搜索点再以loc_frac_prim为初始值,在[loc_frac_prim+2,loc_frac_prim+3]或[loc_frac_prim,loc_frac_prim-3]或[loc_frac_prim-2,loc_frac_prim+1]范围内对次要声道信号基音周期进行分数精度搜索,计算每个搜索点对应的内插归一化相关性,在一个帧计算多个搜索点对应的相似度,当内插归一化相关性取得最大值时,该搜索点即为最优次要声道信号基音周期估计值,其中整数部分为pitch_soft_reuse,分数部分为pitch_frac_soft_reuse。
又如,另一种方法是采用3bits至5bits用于编码次要声道信号基音周期编码,具体为:
当采用3bits至5bits用于编码次要声道信号基音周期编码时,搜索半径half_range分别为1,2,4。此时以loc_T0为搜索起点,在[loc_T0-half_range,loc_T0+half_range]范围内对次要声道信号基音周期进行整数精度搜索,每个搜索点再以loc_frac_prim为初始值,在[loc_frac_prim,loc_frac_prim+3]或[loc_frac_prim,loc_frac_prim-1]或[loc_frac_prim,loc_frac_prim+3]范围内计算每个搜索点对应的内插归一化相关性,当内插归一化相关性取得最大值时,该搜索点即为最优次要声道信号基音周期估计值,其中整数部分为pitch_soft_reuse,分数部分为pitch_frac_soft_reuse。
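下面给出一段示意性的C语言代码，用于说明上述闭环基音周期搜索的大致框架：以loc_T0、loc_frac_prim为起始点，在给定的整数搜索半径与分数精度范围内遍历搜索点，选取内插归一化相关性最大的点作为次要声道信号的基音周期估计值。其中interp_norm_corr为本示例假设的内插归一化相关性计算函数（仅给出原型），分数精度的搜索范围也做了简化：

```c
/* 假设的内插归一化相关性计算函数原型，ctx为具体实现所需的上下文 */
typedef float (*interp_norm_corr_fn)(const void *ctx, int T0_int, int T0_frac);

/* 示意性实现：次要声道闭环基音周期搜索框架 */
void closed_loop_pitch_search(const void *ctx, interp_norm_corr_fn interp_norm_corr,
                              int loc_T0, int loc_frac_prim, int half_range,
                              int *pitch_soft_reuse, int *pitch_frac_soft_reuse)
{
    float best = -1.0e30f;
    for (int t = loc_T0 - half_range; t <= loc_T0 + half_range; t++) {     /* 整数精度搜索 */
        for (int f = loc_frac_prim - 3; f <= loc_frac_prim + 3; f++) {     /* 分数精度搜索，范围为示例假设 */
            float c = interp_norm_corr(ctx, t, f);
            if (c > best) {
                best = c;
                *pitch_soft_reuse      = t;
                *pitch_frac_soft_reuse = f;
            }
        }
    }
}
```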
S122:利用主要声道信号基音周期和次要声道信号的基音周期进行差分编码。具体可 以包括如下过程:
S1221:计算差分编码中次要声道信号基音周期索引上限。
次要声道信号基音周期索引上限用下式计算得到:
soft_reuse_index_high_limit=2^Z；
其中,Z为次要声道基音周期搜索范围调整因子。本实施例中Z=3,4,5。
S1222:计算差分编码中次要声道信号基音周期索引值。
次要声道信号基音周期索引表征了对前述步骤得到的次要声道信号基音周期的参考值和最优次要声道信号基音周期估计值的差值进行差分编码的结果。
次要声道信号基音周期索引值soft_reuse_index用下式计算得到:
soft_reuse_index=(4*pitch_soft_reuse+pitch_frac_soft_reuse)-(4*loc_T0+loc_frac_prim)+soft_reuse_index_high_limit/2。
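下面给出一个示意性的数值示例（各数值均为假设）：设Z=3（即soft_reuse_index_high_limit=2^3=8）、N=4，次要声道的闭环基音周期参考值整数部分loc_T0=50、分数部分loc_frac_prim=1，闭环搜索得到pitch_soft_reuse=50、pitch_frac_soft_reuse=3，则索引值为6，可以用3比特表示；解码端按相同的索引上限代入对应公式，即可还原出50+3/4的基音周期估计值：

```c
#include <assert.h>

int main(void)
{
    /* 编码端：soft_reuse_index = (4*50+3) - (4*50+1) + 8/2 = 2 + 4 = 6 */
    int idx = (4 * 50 + 3) - (4 * 50 + 1) + 8 / 2;

    /* 解码端：T0_pitch = (50 + 1/4) + (idx - 8/2)/4 = 50.75，即50 + 3/4 */
    float T0_pitch = (50.0f + 1.0f / 4.0f) + ((float)idx - 8.0f / 2.0f) / 4.0f;

    assert(idx == 6);
    assert(T0_pitch == 50.75f);
    return 0;
}
```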
S1223:对次要声道信号基音周期索引进行差分编码。
例如,对次要声道信号基音周期索引soft_reuse_index进行残差编码。
本申请实施例采用次要声道信号的基音周期码方法,每个编码帧被划分为4个子帧(subframe),对每个子帧的基音周期进行差分编码。与次要声道信号的基音周期独立编码相比可以节省22bits或18bits,并分配给其他编码参数用于量化编码,例如可以将节省的比特开销分配给固定码表(fixed codebook)。
采用本申请实施例完成主要声道信号和次要声道信号的其他参数编码,以得到主要声道信号和次要声道信号的编码码流,将编码数据按一定码流格式要求写入立体声编码码流中。
接下来对本申请实施例中节省次要声道信号的编码开销的效果进行举例说明，对于次要声道信号基音周期独立编码方式，分配给4个子帧的基音周期编码比特数分别为10,6,9,6，即编码每帧需要31bits。而采用本申请实施例所提出的面向次要声道信号基音周期差分编码方法，每个子帧只需要3bits用于差分编码，再需要1bit用于指示是否对次要声道信号的基音周期进行差分编码(1比特的取值可以为0或1，例如取值为1时需要进行差分编码，取值为0时不进行差分编码)。因此采用本申请实施例方法编码次要声道信号基音周期每帧只需要4×3+1=13bits。即可以节省18bits并分配给其他编码参数，例如固定码表参数等。
如图8所示,为采用独立编码方式和差分编码方式之后分配给固定码表的比特数的比较图,实线为独立编码之后分配给固定码表的比特数,虚线为差分编码之后分配给固定码表的比特数。从图8中可以看出采用面向次要声道信号的基音周期差分编码节省出的大量比特资源分配至固定码表的量化编码上,使次要声道信号的编码质量得到提升。
接下对解码端的执行的立体声解码算法进行举例说明,主要执行如下流程:
S13:从码流中读取Pitch_reuse_flag;
S14:在满足如下条件:次要声道信号的编码速率较低,且Pitch_reuse_flag=1时,进行次要声道信号的基音周期差分解码,否则进行次要声道信号的基音周期独立解码。
不限定的是,在不满足如下条件:次要声道信号的编码速率较低,且Pitch_reuse_flag=1时,还可以通过次要声道信号基音周期复用标识指示次要声道信号的 基音周期复用主要声道信号的基音周期估计值,则在解码端可以根据该次要声道信号基音周期复用标识将主要声道信号的基音周期作为次要声道信号的基音周期进行解码。
举例说明如下,次要声道基音周期差分编码标识用Pitch_reuse_flag来指示。DIFF_THR为预设的次要声道基音周期差分编码阈值,根据不同的编码速率确定次要声道基音周期差分编码阈值为{1,3,6}中的某一值。例如,当DIFF>DIFF_THR时Pitch_reuse_flag=1,此时判别当前帧采用次要声道信号的基音周期差分编码。当DIFF≤DIFF_THR时Pitch_reuse_flag=0,此时不进行基音周期差分编码,采用次要声道信号的独立编码。
S1401:基音周期映射。
在本实施例中基音周期编码按子帧进行,主要声道被划分为5个子帧,次要声道被划分为4个子帧。根据主要声道信号的基音周期估计值确定次要声道基音周期的参考值,其中一种方法是直接将主要声道的基音周期作为次要声道基音周期的参考值,即从主要声道5个子帧中的基音周期选出4个值作为次要声道4个子帧的基音周期参考值。另一种方法是采用插值方法将主要声道5个子帧中的基音周期映射为次要声道4个子帧的基音周期参考值。通过以上方法均可以得到次要声道闭环基音周期的整数部分loc_T0和分数部分loc_frac_prim。
S1402:计算次要声道闭环基音周期参考值。
采用下式计算得到次要声道闭环基音周期参考值f_pitch_prim:
f_pitch_prim=loc_T0+loc_frac_prim/4.0
S1403:计算差分编码中次要声道基音周期索引上限。
次要声道基音周期索引上限用下式计算得到:
soft_reuse_index_high_limit=0.5+2^Z；
其中,Z为次要声道基音周期搜索范围调整因子。本实施例中Z可取3,或4,或5。
S1404:从码流中读取次要声道基音周期索引值soft_reuse_index;
S1405:计算次要声道信号的基音周期估计值。
T0_pitch=f_pitch_prim+(soft_reuse_index-soft_reuse_index_high_limit/2.0)/4.0。
T0=INT(T0_pitch),
T0_frac=(T0_pitch–T0)*4.0。
其中,INT(T0_pitch)表示对T0_pitch下取整运算,T0为解码次要声道基音周期的整数部分,T0_frac为解码次要声道基音周期的分数部分。
前述实施例中描述了频域下的立体声编解码过程,接下来描述将本申请实施例应用于时域立体声编码时,前述实施例中的步骤S01到S07将由下述步骤S21到S26代替。如图9所示,为本申请实施例提供的时域立体声编码方法的示意图,具体地:
S21、对立体声时域信号进行时域预处理,以得到预处理后的立体声左右声道信号。
若立体声音频信号的采样率为16KHz,一帧信号为20ms,帧长记作N,则N=320,即帧长为320个样点。当前帧的立体声信号包括当前帧的左声道时域信号以及当前帧的右声道时域信号,当前帧的左声道时域信号记作x L(n),当前帧的右声道时域信号记作x R(n),其 中n为样点序号,n=0,1,…,N-1。
对当前帧的左、右声道时域信号进行时域预处理,具体地可以包括对当前帧的左、右声道时域信号进行高通滤波处理,以得到当前帧预处理后的左、右声道时域信号。当前帧预处理后的左声道时域信号记作
Figure PCTCN2020096296-appb-000010
当前帧预处理后的右声道时域信号记作
Figure PCTCN2020096296-appb-000011
其中n为样点序号,n=0,1,…,N-1。
可以理解的是,对当前帧的左、右声道时域信号进行时域预处理不是必须要做的。如果没有时域预处理的步骤,则用于进行时延估计的左右声道信号就是原始立体声信号中的左右声道信号。这里原始立体声信号中的左右声道信号是指采集到的经过A/D转换后的PCM信号。信号的采样率可以包括8KHz、16KHz、32KHz、44.1KHz以及48KHz。
另外,预处理除了本实施例中描述的高通滤波处理,还可以包含其它处理,如预加重处理等,本申请实施例不做限定。
S22、根据当前帧预处理后的左、右声道时域信号,进行时延估计,获得当前帧估计出的声道间时延差。
最简单地,可以根据当前帧预处理后的左、右声道时域信号计算左右声道间的互相关函数。然后,搜索互相关函数的最大值,作为当前帧估计出的声道间时延差。
假设T max对应于当前采样率下声道间时延差取值的最大值,T min对应于当前采样率下声道间时延差取值的最小值。T max和T min为预先设定的实数,且T max>T min。在本实施例中,T max等于40,T min等于-40,在T min≤i≤T max范围内搜索左右声道间的互相关系数c(i)的最大值,以得到最大值对应的索引值,作为当前帧估计出的声道间时延差,记作cur_itd。
不限定的是,本申请实施例中还很多时延估计的具体方法,例如也可以是,根据当前帧预处理后的左、右声道时域信号或者根据当前帧的左、右声道时域信号计算左右声道间的互相关函数。然后,根据前L帧(L为大于等于1的整数)的左右声道间的互相关函数以及计算出的当前帧的左右声道间的互相关函数进行长时平滑处理,以得到平滑后的左右声道间的互相关函数,然后在T min≤i≤T max范围内搜索平滑后的左右声道间的互相关系数的最大值,以得到最大值对应的索引值,作为当前帧估计出的声道间时延差。还可以包括,对根据前M帧(M为大于等于1的整数)的声道间时延差和当前帧估计出的声道间时延差进行帧间平滑处理,用平滑后的声道间时延差作为当前帧最终估计出的声道间时延差。本申请实施例不限于以上所述的时延估计方法。
其中，当前帧估计出的声道间时延差，通过在T min≤i≤T max范围内搜索左右声道间的互相关系数c(i)的最大值，以得到最大值对应的索引值。
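为说明上述搜索过程，下面给出一段示意性的C语言代码，在[-T max, T max]范围内搜索左右声道互相关最大值对应的索引。本示例采用最简单的未归一化互相关，时延符号的约定、归一化与平滑处理等细节均为示例中的简化假设：

```c
/* 示意性实现：时域声道间时延差(ITD)的粗略估计 */
int estimate_itd(const float *x_left, const float *x_right, int frame_len, int t_max)
{
    int   best_itd  = 0;
    float best_corr = -1.0e30f;

    for (int i = -t_max; i <= t_max; i++) {        /* 候选时延 */
        float c = 0.0f;
        for (int n = 0; n < frame_len; n++) {
            int m = n + i;
            if (m >= 0 && m < frame_len)
                c += x_left[n] * x_right[m];       /* 左右声道在候选时延i下的互相关 */
        }
        if (c > best_corr) {
            best_corr = c;
            best_itd  = i;
        }
    }
    return best_itd;                               /* 作为当前帧估计出的声道间时延差cur_itd */
}
```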
S23、根据当前帧估计出的声道间时延差,对立体声左右声道信号进行时延对齐处理,以得到时延对齐后的立体声信号。
本申请实施例中对立体声左右声道信号进行时延对齐处理的方法有很多种,例如,根据当前帧估计出的声道间时延差以及前一帧的声道间时延差,对立体声左右声道信号中的一路或者两路进行压缩或拉伸处理,使得处理后得到的时延对齐后的立体声信号中两路信号不存在声道间时延差。本申请实施例不限于以上所述的时延对齐处理方法。
当前帧时延对齐后的左声道时域信号记作x′ L(n),当前帧时延对齐后的右声道时域信号记作x′ R(n),其中n为样点序号,n=0,1,…,N-1。
S24、量化编码当前帧估计出的声道间时延差。
量化声道间时延差的方法可以多种,例如对当前帧估计出的声道间时延差进行量化处理,以得到量化索引,然后对量化索引编码。将量化索引编码后写入码流。
S25、根据时延对齐后的立体声信号，计算声道组合比例因子并进行量化编码，还可以将量化编码结果写入码流。
计算声道组合比例因子的方法有很多种。例如可以采用本申请实施例中如下的计算声道组合比例因子的方法。首先根据当前帧时延对齐后的左、右声道时域信号，计算左、右声道的帧能量。
当前帧左声道的帧能量rms_L满足:
Figure PCTCN2020096296-appb-000012
当前帧右声道的帧能量rms_R满足:
Figure PCTCN2020096296-appb-000013
其中,x′ L(n)为当前帧时延对齐后的左声道时域信号,x′ R(n)为当前帧时延对齐后的右声道时域信号。
然后,根据左、右声道的帧能量,计算当前帧的声道组合比例因子。
计算得到的当前帧的声道组合比例因子ratio满足:
Figure PCTCN2020096296-appb-000014
最后,对计算出的当前帧声道组合比例因子进行量化,以得到比例因子对应的量化索引ratio_idx,及量化后的当前帧的声道组合比例因子ratio qua
ratio qua=ratio_tabl[ratio_idx],
其中,ratio_tabl为标量量化的码书。量化编码可以采用本申请实施例中的任何一种标量量化方法,如均匀的标量量化,也可以是非均匀的标量量化,编码比特数可以是5比特,这里对具体方法不再赘述。
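作为示意，下面给出一段5比特均匀标量量化的C语言代码，其中码书ratio_tabl被等价地取为[0,1]区间上32个均匀分布的量化电平；码书的具体内容与量化方式均为本示例中的假设，实际实现也可以采用非均匀标量量化：

```c
/* 示意性实现：声道组合比例因子的5比特均匀标量量化 */
int quantize_ratio(float ratio, float *ratio_qua)
{
    const int levels = 32;                                      /* 5比特对应32个量化电平 */
    int ratio_idx = (int)(ratio * (float)(levels - 1) + 0.5f);  /* 最近邻量化索引 */

    if (ratio_idx < 0)          ratio_idx = 0;
    if (ratio_idx > levels - 1) ratio_idx = levels - 1;

    /* 等价于 ratio_qua = ratio_tabl[ratio_idx]，此处码书按均匀取点假设 */
    *ratio_qua = (float)ratio_idx / (float)(levels - 1);
    return ratio_idx;
}
```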
本申请实施例不限于以上所述的声道组合比例因子计算和量化编码方法。
S26、根据声道组合比例因子对时延对齐后的立体声信号进行时域下混处理,以得到主要声道信号和次要声道信号。
具体地,可以使用本申请实施例中的任何一种时域下混处理实现。但是需要注意的是,需要根据声道组合比例因子的计算方法选择对应的时域下混处理方式,对时延对齐后的立体声信号进行时域下混处理,以得到主要声道信号和次要声道信号。
例如,上面的不用前述步骤5中的计算声道组合比例因子的方法,其对应的时域下混处理可以是:根据声道组合比例因子ratio进行时域下混处理,第一种声道组合方案对应的时域下混处理后得到的主要声道信号Y(n)和次要声道信号X(n)满足:
Figure PCTCN2020096296-appb-000015
本申请实施例不限于以上所述的时域下混处理方法。
S27、对次要声道信号进行差分编码。
对于步骤S27所包括的内容,详见前述实施例中步骤S10至步骤S12的描述,此处不再赘述。
通过前述的举例说明可知,本申请实施例中判决是否采用次要声道信号基音周期差分编码,通过差分编码的方式,可以节省对次要声道信号的基音周期的编码开销。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制,因为依据本申请,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本申请所必须的。
为便于更好的实施本申请实施例的上述方案,下面还提供用于实施上述方案的相关装置。
请参阅图10所示,本申请实施例提供的一种立体声编码装置1000,可以包括:下混模块1001、确定模块1002、差分编码模块1003,其中,
下混模块1001,用于对当前帧的左声道信号和所述当前帧的右声道信号进行下混处理,以得到所述当前帧的主要声道信号和所述当前帧的次要声道信号;
确定模块1002,用于确定是否对所述次要声道信号的基音周期进行差分编码;
差分编码模块1003,用于当确定对所述次要声道信号的基音周期进行差分编码时,使用所述主要声道信号的基音周期估计值对所述次要声道信号的基音周期进行差分编码,以得到所述次要声道信号的基音周期索引值,所述次要声道信号的基音周期索引值用于生成待发送的立体声编码码流。
在本申请的一些实施例中,所述确定模块,包括:
主要声道编码模块,用于对所述当前帧的主要声道信号进行编码,以得到所述主要声道信号的基音周期估计值;
开环分析模块,用于对所述当前帧的次要声道信号进行开环基音周期分析,以得到所述次要声道信号的开环基音周期估计值;
阈值判断模块,用于判断所述主要声道信号的基音周期估计值和所述次要声道信号的开环基音周期估计值之间的差值是否超过预设的次要声道基音周期差分编码阈值,当所述差值超过所述次要声道基音周期差分编码阈值时确定进行差分编码,当所述差值没有超过所述次要声道基音周期差分编码阈值时确定不进行差分编码。
在本申请的一些实施例中,所述立体声编码装置,还包括:标识配置模块,用于当确定对所述次要声道信号的基音周期进行差分编码时,将所述当前帧中的次要声道基音周期差分编码标识配置为预设的第一值,所述立体声编码码流中携带所述次要声道基音周期差分编码标识,所述第一值用于指示对所述次要声道信号的基音周期进行差分编码。
在本申请的一些实施例中,所述立体声编码装置,还包括:独立编码模块,其中,
所述独立编码模块,用于当确定不对所述次要声道信号的基音周期进行差分编码且不复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期时,对所述次要声道信号的基音周期和所述主要声道信号的基音周期分别进行编码。
进一步的,在本申请的一些实施例中,所述标识配置模块,还用于当确定不对所述次要声道信号的基音周期进行差分编码时,将所述次要声道基音周期差分编码标识配置为预设的第二值,所述立体声编码码流中携带所述次要声道基音周期差分编码标识,所述第二值用于指示不对所述次要声道信号的基音周期进行差分编码;当确定不复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期时,将次要声道信号基音周期复用标识配置为预设的第三值,所述立体声编码码流中携带所述次要声道信号基音周期复用标识,所述第三值用于指示不复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期;
所述独立编码模块,用于对所述次要声道信号的基音周期和所述主要声道信号的基音周期分别进行编码。
在本申请的一些实施例中,所述标识配置模块,用于当确定不对所述次要声道信号的基音周期进行差分编码且复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期时,将次要声道信号基音周期复用标识配置为预设的第四值,并在所述立体声编码码流中携带所述次要声道信号基音周期复用标识,所述第四值用于指示复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期。
进一步的,在本申请的一些实施例中,所述标识配置模块,用于当确定不对所述次要声道信号的基音周期进行差分编码时,将所述次要声道基音周期差分编码标识配置为预设的第二值,所述立体声编码码流中携带所述次要声道基音周期差分编码标识,所述第二值用于指示不对所述次要声道信号的基音周期进行差分编码;当确定复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期时,将次要声道信号基音周期复用标识配置为预设的第四值,并在所述立体声编码码流中携带所述次要声道信号基音周期复用标识,所述第四值用于指示复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期。
在本申请的一些实施例中,所述差分编码模块,包括:
闭环基音周期搜索模块,用于根据所述主要声道信号的基音周期估计值进行次要声道的闭环基音周期搜索,以得到所述次要声道信号的基音周期估计值;
索引值上限确定模块,用于根据所述次要声道信号的基音周期搜索范围调整因子确定所述次要声道信号的基音周期索引值上限;
索引值计算模块,用于根据所述主要声道信号的基音周期估计值、所述次要声道信号的基音周期估计值和次要声道信号的基音周期索引值上限计算出所述次要声道信号的基音周期索引值。
在本申请的一些实施例中,所述闭环基音周期搜索模块,用于根据所述主要声道信号的基音周期估计值和所述当前帧的次要声道信号被划分的子帧个数,确定所述次要声道信号的闭环基音周期参考值;使用所述次要声道信号的闭环基音周期参考值作为所述次要声道信号的闭环基音周期搜索的起始点,采用整数精度和分数精度进行闭环基音周期搜索, 以得到所述次要声道信号的基音周期估计值。
在本申请的一些实施例中,所述闭环基音周期搜索模块,用于根据所述主要声道信号的基音周期估计值确定所述次要声道信号的闭环基音周期整数部分loc_T0,和所述次要声道信号的闭环基音周期分数部分loc_frac_prim;通过如下方式计算出所述次要声道信号的闭环基音周期参考值f_pitch_prim:
f_pitch_prim=loc_T0+loc_frac_prim/N;
其中,所述N表示所述次要声道信号被划分的子帧个数。
在本申请的一些实施例中,所述索引值上限确定模块,用于通过如下方式计算出所述次要声道信号的基音周期索引值上限soft_reuse_index_high_limit;
soft_reuse_index_high_limit=0.5+2^Z；
其中,所述Z为所述次要声道信号的基音周期搜索范围调整因子,所述Z的取值为:3、或者4、或者5。
在本申请的一些实施例中,所述索引值计算模块,用于根据所述主要声道信号的基音周期估计值确定所述次要声道信号的闭环基音周期整数部分loc_T0,和所述次要声道信号的闭环基音周期分数部分loc_frac_prim;通过如下方式计算出所述次要声道信号的基音周期索引值soft_reuse_index:
soft_reuse_index=(N*pitch_soft_reuse+pitch_frac_soft_reuse)﹣(N*loc_T0+loc_frac_prim)+soft_reuse_index_high_limit/M;
其中,所述pitch_soft_reuse表示所述次要声道信号的基音周期估计值的整数部分,所述pitch_frac_soft_reuse表示所述次要声道信号的基音周期估计值的分数部分,所述soft_reuse_index_high_limit表示所述次要声道信号的基音周期索引值上限,所述N表示所述次要声道信号被划分的子帧个数,所述M表示所述次要声道信号的基音周期索引值上限的调整因子,M为非零的实数,所述*表示相乘运算符,所述+表示相加运算符,所述﹣表示相减运算符。
在本申请的一些实施例中,所述立体声编码装置应用于所述当前帧的编码速率低于预设的速率阈值的立体声编码场景;
所述速率阈值为如下取值中的至少一种:13.2千比特每秒kbps、16.4kbps、或24.4kbps。
请参阅图11所示,本申请实施例提供的一种立体声解码装置1100,可以包括:确定模块1101、值获取模块1102、差分解码模块1103,其中,
确定模块1101,用于根据接收到的立体声编码码流确定是否对次要声道信号的基音周期进行差分解码;
值获取模块1102,用于当确定对所述次要声道信号的基音周期进行差分解码时,从所述立体声编码码流中获取当前帧的主要声道信号的基音周期估计值和所述当前帧的次要声道信号的基音周期索引值;
差分解码模块1103,用于根据所述主要声道信号的基音周期估计值和所述次要声道信号的基音周期索引值,对所述次要声道信号的基音周期进行差分解码,以得到所述次要声道信号的基音周期估计值,所述次要声道信号的基音周期估计值用于对所述立体声编码码 流进行解码。
在本申请的一些实施例中,所述确定模块,用于从所述当前帧中获取次要声道基音周期差分编码标识;当所述次要声道基音周期差分编码标识为预设的第一值时,确定对所述次要声道信号的基音周期进行差分解码。
在本申请的一些实施例中,所述立体声解码装置,还包括:独立解码模块,其中,
独立解码模块,用于当确定不对所述次要声道信号的基音周期进行差分解码、且不复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期时,从所述立体声编码码流中解码所述次要声道信号的基音周期。
进一步的,独立解码模块,用于当所述次要声道基音周期差分编码标识为预设的第二值、且所述立体声编码码流中携带的次要声道信号基音周期复用标识为预设的第三值时,确定不对所述次要声道信号的基音周期进行差分解码、且不复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期,从所述立体声编码码流中解码所述次要声道信号的基音周期。
在本申请的一些实施例中,所述立体声解码装置,还包括:基音周期复用模块,其中,
所述基音周期复用模块,用于当确定不对所述次要声道信号的基音周期进行差分解码且复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期时,将所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期。
进一步的,所述基音周期复用模块,用于当所述次要声道基音周期差分编码标识为预设的第二值、且所述立体声编码码流中携带的次要声道信号基音周期复用标识为预设的第四值时,确定不对所述次要声道信号的基音周期进行差分解码,将所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期。
在本申请的一些实施例中,所述差分解码模块,包括:
参考值确定子模块,用于根据所述主要声道信号的基音周期估计值和所述当前帧的次要声道信号被划分的子帧个数,确定所述次要声道信号的闭环基音周期参考值;
索引值上限确定子模块,用于根据所述次要声道信号的基音周期搜索范围调整因子确定所述次要声道信号的基音周期索引值上限;
估计值计算子模块,用于根据所述次要声道信号的闭环基音周期参考值、所述次要声道的基音周期索引值和所述次要声道信号的基音周期索引值上限计算出所述次要声道信号的基音周期估计值。
在本申请的一些实施例中,所述估计值计算子模块,用于通过如下方式计算出所述次要声道信号的基音周期估计值T0_pitch:
T0_pitch=f_pitch_prim+(soft_reuse_index-soft_reuse_index_high_limit/M)/N;
其中,所述f_pitch_prim表示所述次要声道信号的闭环基音周期参考值,所述soft_reuse_index表示所述次要声道信号的基音周期索引值,所述N表示所述次要声道信号被划分的子帧个数,所述M表示所述次要声道信号的基音周期索引值上限的调整因子,M为非零的实数,所述/表示相除运算符,所述+表示相加运算符,所述﹣表示相减运算符。
通过前述实施例的举例说明,本申请实施例中由于使用了主要声道信号的基音周期估计值对次要声道信号的基音周期进行差分编码,可以使用少量比特资源分配给次要声道信 号的基音周期进行差分编码,通过对次要声道信号的基音周期进行差分编码,可以提高立体声信号的空间感和声像稳定性。另外,本申请实施例中采用较小的比特资源进行了次要声道信号的基音周期的差分编码,因此可以将节省的比特资源用于立体声的其他编码参数,进而提升了次要声道的编码效率,最终提升了整体的立体声编码质量。另外,本申请实施例中在可以对次要声道信号的基音周期进行差分解码时,可以使用主要声道信号的基音周期估计值和次要声道信号的基音周期索引值对次要声道信号的基音周期进行差分解码,因此得到次要声道信号的基音周期估计值,使用该次要声道信号的基音周期估计值可以对立体声编码码流进行解码,因此可以提高立体声信号的空间感和声像稳定性。
需要说明的是,上述装置各模块/单元之间的信息交互、执行过程等内容,由于与本申请方法实施例基于同一构思,其带来的技术效果与本申请方法实施例相同,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。
本申请实施例还提供一种计算机存储介质,其中,该计算机存储介质存储有程序,该程序执行包括上述方法实施例中记载的部分或全部步骤。
接下来介绍本申请实施例提供的另一种立体声编码装置,请参阅图12所示,立体声编码装置1200包括:
接收器1201、发射器1202、处理器1203和存储器1204(其中立体声编码装置1200中的处理器1203的数量可以一个或多个,图12中以一个处理器为例)。在本申请的一些实施例中,接收器1201、发射器1202、处理器1203和存储器1204可通过总线或其它方式连接,其中,图12中以通过总线连接为例。
存储器1204可以包括只读存储器和随机存取存储器,并向处理器1203提供指令和数据。存储器1204的一部分还可以包括非易失性随机存取存储器(non-volatile random access memory,NVRAM)。存储器1204存储有操作系统和操作指令、可执行模块或者数据结构,或者它们的子集,或者它们的扩展集,其中,操作指令可包括各种操作指令,用于实现各种操作。操作系统可包括各种系统程序,用于实现各种基础业务以及处理基于硬件的任务。
处理器1203控制立体声编码装置的操作,处理器1203还可以称为中央处理单元(central processing unit,CPU)。具体的应用中,立体声编码装置的各个组件通过总线系统耦合在一起,其中总线系统除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都称为总线系统。
上述本申请实施例揭示的方法可以应用于处理器1203中,或者由处理器1203实现。处理器1203可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过处理器1203中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器1203可以是通用处理器、数字信号处理器(digital signal processing,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field-programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处 理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1204,处理器1203读取存储器1204中的信息,结合其硬件完成上述方法的步骤。
接收器1201可用于接收输入的数字或字符信息,以及产生与立体声编码装置的相关设置以及功能控制有关的信号输入,发射器1202可包括显示屏等显示设备,发射器1202可用于通过外接接口输出数字或字符信息。
本申请实施例中,处理器1203用于执行前述实施例图4所示的由立体声编码装置执行的立体声编码方法。
接下来介绍本申请实施例提供的另一种立体声解码装置,请参阅图13所示,立体声解码装置1300包括:
接收器1301、发射器1302、处理器1303和存储器1304(其中立体声解码装置1300中的处理器1303的数量可以一个或多个,图13中以一个处理器为例)。在本申请的一些实施例中,接收器1301、发射器1302、处理器1303和存储器1304可通过总线或其它方式连接,其中,图13中以通过总线连接为例。
存储器1304可以包括只读存储器和随机存取存储器,并向处理器1303提供指令和数据。存储器1304的一部分还可以包括NVRAM。存储器1304存储有操作系统和操作指令、可执行模块或者数据结构,或者它们的子集,或者它们的扩展集,其中,操作指令可包括各种操作指令,用于实现各种操作。操作系统可包括各种系统程序,用于实现各种基础业务以及处理基于硬件的任务。
处理器1303控制立体声解码装置的操作,处理器1303还可以称为CPU。具体的应用中,立体声解码装置的各个组件通过总线系统耦合在一起,其中总线系统除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都称为总线系统。
上述本申请实施例揭示的方法可以应用于处理器1303中,或者由处理器1303实现。处理器1303可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过处理器1303中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器1303可以是通用处理器、DSP、ASIC、FPGA或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1304,处理器1303读取存储器1304中的信息,结合其硬件完成上述方法的步骤。
本申请实施例中,处理器1303,用于执行前述实施例图4所示的由立体声解码装置执行的立体声解码方法。
在另一种可能的设计中,当立体声编码装置或者立体声解码装置为终端内的芯片时, 芯片包括:处理单元和通信单元,所述处理单元例如可以是处理器,所述通信单元例如可以是输入/输出接口、管脚或电路等。该处理单元可执行存储单元存储的计算机执行指令,以使该终端内的芯片执行上述第一方面任意一项的无线通信方法。可选地,所述存储单元为所述芯片内的存储单元,如寄存器、缓存等,所述存储单元还可以是所述终端内的位于所述芯片外部的存储单元,如只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)等。
其中,上述任一处提到的处理器,可以是一个通用中央处理器,微处理器,ASIC,或一个或多个用于控制上述第一方面或第二方面方法的程序执行的集成电路。
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、ROM、RAM、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。

Claims (46)

  1. 一种立体声编码方法,其特征在于,包括:
    对当前帧的左声道信号和所述当前帧的右声道信号进行下混处理,以得到所述当前帧的主要声道信号和所述当前帧的次要声道信号;
    当确定对所述次要声道信号的基音周期进行差分编码时,使用所述主要声道信号的基音周期估计值对所述次要声道信号的基音周期进行差分编码,以得到所述次要声道信号的基音周期索引值,所述次要声道信号的基音周期索引值用于生成待发送的立体声编码码流。
  2. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    对所述当前帧的主要声道信号进行编码,以得到所述主要声道信号的基音周期估计值;
    对所述当前帧的次要声道信号进行开环基音周期分析,以得到所述次要声道信号的开环基音周期估计值;
    判断所述主要声道信号的基音周期估计值和所述次要声道信号的开环基音周期估计值之间的差值是否超过预设的次要声道基音周期差分编码阈值;
    当所述差值超过所述次要声道基音周期差分编码阈值时,确定对所述次要声道信号的基音周期进行差分编码;或,
    当所述差值没有超过所述次要声道基音周期差分编码阈值时,确定不对所述次要声道信号的基音周期进行差分编码。
  3. 根据权利要求1或2所述的方法,其特征在于,当确定对所述次要声道信号的基音周期进行差分编码时,所述方法还包括:
    将所述当前帧中的次要声道基音周期差分编码标识配置为预设的第一值,所述立体声编码码流中携带所述次要声道基音周期差分编码标识,所述第一值用于指示对所述次要声道信号的基音周期进行差分编码。
  4. 根据权利要求1至3中任一项所述的方法,其特征在于,所述方法还包括:
    当确定不对所述次要声道信号的基音周期进行差分编码且不复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期时,对所述次要声道信号的基音周期和所述主要声道信号的基音周期分别进行编码。
  5. 根据权利要求1至3中任一项所述的方法,其特征在于,所述方法还包括:
    当确定不对所述次要声道信号的基音周期进行差分编码且复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期时,将次要声道信号基音周期复用标识配置为预设的第四值,并在所述立体声编码码流中携带所述次要声道信号基音周期复用标识,所述第四值用于指示复用所述主要声道信号的基音周期估计值作为所述次要声道信号的基音周期。
  6. 根据权利要求1至5中任一项所述的方法,其特征在于,所述使用所述主要声道信号的基音周期估计值对所述次要声道信号的基音周期进行差分编码,以得到所述次要声道信号的基音周期索引值,包括:
    根据所述主要声道信号的基音周期估计值进行次要声道的闭环基音周期搜索,以得到所述次要声道信号的基音周期估计值;
    根据所述次要声道信号的基音周期搜索范围调整因子确定所述次要声道信号的基音周期索引值上限；
    根据所述主要声道信号的基音周期估计值、所述次要声道信号的基音周期估计值和次要声道信号的基音周期索引值上限计算出所述次要声道信号的基音周期索引值。
  7. 根据权利要求6所述的方法,其特征在于,所述根据所述主要声道信号的基音周期估计值进行次要声道的闭环基音周期搜索,以得到所述次要声道信号的基音周期估计值,包括:
    根据所述主要声道信号的基音周期估计值和所述当前帧的次要声道信号被划分的子帧个数,确定所述次要声道信号的闭环基音周期参考值;
    使用所述次要声道信号的闭环基音周期参考值作为所述次要声道信号的闭环基音周期搜索的起始点,采用整数精度和分数精度进行闭环基音周期搜索,以得到所述次要声道信号的基音周期估计值。
  8. 根据权利要求7所述的方法,其特征在于,所述根据所述主要声道信号的基音周期估计值和所述当前帧的次要声道信号被划分的子帧个数,确定所述次要声道信号的闭环基音周期参考值,包括:
    根据所述主要声道信号的基音周期估计值确定所述次要声道信号的闭环基音周期整数部分loc_T0,和所述次要声道信号的闭环基音周期分数部分loc_frac_prim;
    通过如下方式计算出所述次要声道信号的闭环基音周期参考值f_pitch_prim:
    f_pitch_prim=loc_T0+loc_frac_prim/N;
    其中,所述N表示所述次要声道信号被划分的子帧个数。
  9. 根据权利要求6所述的方法,其特征在于,所述根据所述次要声道信号的基音周期搜索范围调整因子确定所述次要声道信号的基音周期索引值上限,包括:
    通过如下方式计算出所述次要声道信号的基音周期索引值上限soft_reuse_index_high_limit;
    soft_reuse_index_high_limit=0.5+2^Z；
    其中,所述Z为所述次要声道信号的基音周期搜索范围调整因子。
  10. 根据权利要求9所述的方法,其特征在于,所述Z的取值为3、或者4、或者5。
  11. 根据权利要求6所述的方法,其特征在于,所述根据所述主要声道信号的基音周期估计值、所述次要声道信号的基音周期估计值和次要声道信号的基音周期索引值上限计算出所述次要声道信号的基音周期索引值,包括:
    根据所述主要声道信号的基音周期估计值确定所述次要声道信号的闭环基音周期整数部分loc_T0,和所述次要声道信号的闭环基音周期分数部分loc_frac_prim;
    通过如下方式计算出所述次要声道信号的基音周期索引值soft_reuse_index:
    soft_reuse_index=(N*pitch_soft_reuse+pitch_frac_soft_reuse)﹣(N*loc_T0+loc_frac_prim)+soft_reuse_index_high_limit/M;
    其中,所述pitch_soft_reuse表示所述次要声道信号的基音周期估计值的整数部分,所述pitch_frac_soft_reuse表示所述次要声道信号的基音周期估计值的分数部分,所述soft_reuse_index_high_limit表示所述次要声道信号的基音周期索引值上限,所述N表示所述次要声道信号被划分的子帧个数,所述M表示所述次要声道信号的基音周期索引值 上限的调整因子,M为非零的实数,所述*表示相乘运算符,所述+表示相加运算符,所述﹣表示相减运算符。
  12. 根据权利要求11所述的方法,其特征在于,所述次要声道信号的基音周期索引值上限的调整因子的取值为2,或者3。
  13. The method according to any one of claims 1 to 12, wherein the method is applied in a stereo encoding scenario in which the encoding rate of the current frame is lower than a preset rate threshold;
    the rate threshold is at least one of the following values: 13.2 kilobits per second (kbps), 16.4 kbps, or 24.4 kbps.
  14. A stereo decoding method, comprising:
    determining, based on a received stereo encoded bitstream, whether to perform differential decoding on the pitch period of a secondary channel signal;
    when it is determined to perform differential decoding on the pitch period of the secondary channel signal, obtaining, from the stereo encoded bitstream, a pitch period estimate of the primary channel of the current frame and a pitch period index value of the secondary channel of the current frame;
    performing differential decoding on the pitch period of the secondary channel signal based on the pitch period estimate of the primary channel and the pitch period index value of the secondary channel, to obtain a pitch period estimate of the secondary channel signal, where the pitch period estimate of the secondary channel signal is used to decode the stereo encoded bitstream.
  15. The method according to claim 14, wherein determining, based on the received stereo encoded bitstream, whether to perform differential decoding on the pitch period of the secondary channel signal comprises:
    obtaining a secondary channel pitch period differential coding flag from the current frame;
    when the secondary channel pitch period differential coding flag is a preset first value, determining to perform differential decoding on the pitch period of the secondary channel signal.
  16. The method according to claim 15, wherein the method further comprises:
    when it is determined not to perform differential decoding on the pitch period of the secondary channel signal and not to reuse the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal, decoding the pitch period of the secondary channel signal from the stereo encoded bitstream.
  17. The method according to claim 15, wherein the method further comprises:
    when it is determined not to perform differential decoding on the pitch period of the secondary channel signal and to reuse the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal, using the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal.
  18. The method according to any one of claims 14 to 17, wherein performing differential decoding on the pitch period of the secondary channel signal based on the pitch period estimate of the primary channel and the pitch period index value of the secondary channel comprises:
    determining a closed-loop pitch period reference value of the secondary channel signal based on the pitch period estimate of the primary channel signal and the number of subframes into which the secondary channel signal of the current frame is divided;
    determining an upper limit of the pitch period index value of the secondary channel signal based on a pitch period search range adjustment factor of the secondary channel signal;
    calculating the pitch period estimate of the secondary channel signal based on the closed-loop pitch period reference value of the secondary channel signal, the pitch period index value of the secondary channel, and the upper limit of the pitch period index value of the secondary channel signal.
  19. The method according to claim 18, wherein calculating the pitch period estimate of the secondary channel signal based on the closed-loop pitch period reference value of the secondary channel signal, the pitch period index value of the secondary channel signal, and the upper limit of the pitch period index value of the secondary channel signal comprises:
    calculating the pitch period estimate T0_pitch of the secondary channel signal as follows:
    T0_pitch = f_pitch_prim + (soft_reuse_index - soft_reuse_index_high_limit/M)/N;
    where f_pitch_prim denotes the closed-loop pitch period reference value of the secondary channel signal, soft_reuse_index denotes the pitch period index value of the secondary channel signal, N denotes the number of subframes into which the secondary channel signal is divided, M denotes the adjustment factor for the upper limit of the pitch period index value of the secondary channel signal, M is a nonzero real number, / denotes the division operator, + denotes the addition operator, and - denotes the subtraction operator.
  20. The method according to claim 19, wherein the value of the adjustment factor for the upper limit of the pitch period index value of the secondary channel signal is 2 or 3.
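    Illustrative example (not part of the claims): a hedged C sketch of the decoder-side reconstruction in claims 18 and 19, mirroring the encoder sketch given after claim 12. N, Z, M and the integer handling of the upper limit are the same assumptions as before and must match the encoder for the reconstruction to be exact.

    #include <math.h>

    /* Decoder-side reconstruction of the secondary channel pitch period (claims 18 and 19). */
    static float decode_secondary_pitch(float f_pitch_prim,     /* closed-loop pitch period reference value (claim 18) */
                                        int soft_reuse_index,   /* pitch period index value read from the bitstream */
                                        int N, int Z, int M)    /* same illustrative factors as on the encoder side */
    {
        /* Same assumed evaluation of 0.5 + 2^Z as in the encoder sketch. */
        int soft_reuse_index_high_limit = (int)(0.5 + pow(2.0, (double)Z));

        /* Claim 19: T0_pitch = f_pitch_prim + (soft_reuse_index - soft_reuse_index_high_limit/M) / N. */
        return f_pitch_prim
             + (float)(soft_reuse_index - soft_reuse_index_high_limit / M) / (float)N;
    }

    Because the encoder adds soft_reuse_index_high_limit/M to the index and the decoder subtracts the same quantity, the offsets cancel and the decoder recovers exactly the pitch period estimate found by the encoder's closed-loop search.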
  21. A stereo encoding apparatus, comprising:
    a downmixing module, configured to perform downmix processing on a left channel signal of a current frame and a right channel signal of the current frame, to obtain a primary channel signal of the current frame and a secondary channel signal of the current frame;
    a differential coding module, configured to: when it is determined to perform differential coding on the pitch period of the secondary channel signal, perform differential coding on the pitch period of the secondary channel signal by using a pitch period estimate of the primary channel signal, to obtain a pitch period index value of the secondary channel signal, where the pitch period index value of the secondary channel signal is used to generate a stereo encoded bitstream to be sent.
  22. The apparatus according to claim 21, wherein the stereo encoding apparatus further comprises:
    a primary channel encoding module, configured to encode the primary channel signal of the current frame to obtain the pitch period estimate of the primary channel signal;
    an open-loop analysis module, configured to perform open-loop pitch period analysis on the secondary channel signal of the current frame to obtain an open-loop pitch period estimate of the secondary channel signal;
    a threshold decision module, configured to determine whether the difference between the pitch period estimate of the primary channel signal and the open-loop pitch period estimate of the secondary channel signal exceeds a preset secondary channel pitch period differential coding threshold, determine to perform differential coding on the pitch period of the secondary channel signal when the difference exceeds the secondary channel pitch period differential coding threshold, and determine not to perform differential coding on the pitch period of the secondary channel signal when the difference does not exceed the secondary channel pitch period differential coding threshold.
  23. The apparatus according to claim 21 or 22, wherein the stereo encoding apparatus further comprises: a flag configuration module, configured to: when it is determined to perform differential coding on the pitch period of the secondary channel signal, set the secondary channel pitch period differential coding flag in the current frame to a preset first value, where the stereo encoded bitstream carries the secondary channel pitch period differential coding flag, and the first value is used to indicate that differential coding is performed on the pitch period of the secondary channel signal.
  24. The apparatus according to any one of claims 21 to 23, wherein the stereo encoding apparatus further comprises an independent encoding module, wherein
    the independent encoding module is configured to: when it is determined not to perform differential coding on the pitch period of the secondary channel signal and not to reuse the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal, encode the pitch period of the secondary channel signal and the pitch period of the primary channel signal separately.
  25. The apparatus according to any one of claims 21 to 23, wherein the stereo encoding apparatus further comprises: a flag configuration module, configured to: when it is determined not to perform differential coding on the pitch period of the secondary channel signal and to reuse the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal, set a secondary channel signal pitch period reuse flag to a preset fourth value, and carry the secondary channel signal pitch period reuse flag in the stereo encoded bitstream, where the fourth value is used to indicate that the pitch period estimate of the primary channel signal is reused as the pitch period of the secondary channel signal.
  26. The apparatus according to any one of claims 21 to 25, wherein the differential coding module comprises:
    a closed-loop pitch period search module, configured to perform a closed-loop pitch period search for the secondary channel based on the pitch period estimate of the primary channel signal, to obtain a pitch period estimate of the secondary channel signal;
    an index upper limit determining module, configured to determine an upper limit of the pitch period index value of the secondary channel signal based on a pitch period search range adjustment factor of the secondary channel signal;
    an index calculation module, configured to calculate the pitch period index value of the secondary channel signal based on the pitch period estimate of the primary channel signal, the pitch period estimate of the secondary channel signal, and the upper limit of the pitch period index value of the secondary channel signal.
  27. The apparatus according to claim 26, wherein the closed-loop pitch period search module is configured to: determine a closed-loop pitch period reference value of the secondary channel signal based on the pitch period estimate of the primary channel signal and the number of subframes into which the secondary channel signal of the current frame is divided; and use the closed-loop pitch period reference value of the secondary channel signal as the starting point of the closed-loop pitch period search for the secondary channel signal, and perform the closed-loop pitch period search with integer precision and fractional precision, to obtain the pitch period estimate of the secondary channel signal.
  28. The apparatus according to claim 27, wherein the closed-loop pitch period search module is configured to: determine, based on the pitch period estimate of the primary channel signal, a closed-loop pitch period integer part loc_T0 of the secondary channel signal and a closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal; and calculate the closed-loop pitch period reference value f_pitch_prim of the secondary channel signal as follows:
    f_pitch_prim = loc_T0 + loc_frac_prim/N;
    where N denotes the number of subframes into which the secondary channel signal is divided.
  29. The apparatus according to claim 26, wherein the index upper limit determining module is configured to calculate the upper limit soft_reuse_index_high_limit of the pitch period index value of the secondary channel signal as follows:
    soft_reuse_index_high_limit = 0.5 + 2^Z
    where Z is the pitch period search range adjustment factor of the secondary channel signal.
  30. The apparatus according to claim 29, wherein the value of Z is 3, 4, or 5.
  31. The apparatus according to claim 26, wherein the index calculation module is configured to: determine, based on the pitch period estimate of the primary channel signal, a closed-loop pitch period integer part loc_T0 of the secondary channel signal and a closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal; and calculate the pitch period index value soft_reuse_index of the secondary channel signal as follows:
    soft_reuse_index = (N*pitch_soft_reuse + pitch_frac_soft_reuse) - (N*loc_T0 + loc_frac_prim) + soft_reuse_index_high_limit/M;
    where pitch_soft_reuse denotes the integer part of the pitch period estimate of the secondary channel signal, pitch_frac_soft_reuse denotes the fractional part of the pitch period estimate of the secondary channel signal, soft_reuse_index_high_limit denotes the upper limit of the pitch period index value of the secondary channel signal, N denotes the number of subframes into which the secondary channel signal is divided, M denotes the adjustment factor for the upper limit of the pitch period index value of the secondary channel signal, M is a nonzero real number, * denotes the multiplication operator, + denotes the addition operator, and - denotes the subtraction operator.
  32. The apparatus according to claim 31, wherein the value of the adjustment factor for the upper limit of the pitch period index value of the secondary channel signal is 2 or 3.
  33. The apparatus according to any one of claims 21 to 32, wherein the stereo encoding apparatus is applied in a stereo encoding scenario in which the encoding rate of the current frame is lower than a preset rate threshold;
    the rate threshold is at least one of the following values: 13.2 kilobits per second (kbps), 16.4 kbps, or 24.4 kbps.
  34. A stereo decoding apparatus, comprising:
    a determining module, configured to determine, based on a received stereo encoded bitstream, whether to perform differential decoding on the pitch period of a secondary channel signal;
    a value obtaining module, configured to: when it is determined to perform differential decoding on the pitch period of the secondary channel signal, obtain, from the stereo encoded bitstream, a pitch period estimate of the primary channel of the current frame and a pitch period index value of the secondary channel of the current frame;
    a differential decoding module, configured to perform differential decoding on the pitch period of the secondary channel signal based on the pitch period estimate of the primary channel and the pitch period index value of the secondary channel, to obtain a pitch period estimate of the secondary channel signal, where the pitch period estimate of the secondary channel signal is used to decode the stereo encoded bitstream.
  35. The apparatus according to claim 34, wherein the determining module is configured to: obtain a secondary channel pitch period differential coding flag from the current frame; and when the secondary channel pitch period differential coding flag is a preset first value, determine to perform differential decoding on the pitch period of the secondary channel signal.
  36. The apparatus according to claim 35, wherein the stereo decoding apparatus further comprises an independent decoding module, wherein
    the independent decoding module is configured to: when it is determined not to perform differential decoding on the pitch period of the secondary channel signal and not to reuse the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal, decode the pitch period of the secondary channel signal from the stereo encoded bitstream.
  37. The apparatus according to claim 35, wherein the stereo decoding apparatus further comprises a pitch period reuse module, wherein
    the pitch period reuse module is configured to: when it is determined not to perform differential decoding on the pitch period of the secondary channel signal and to reuse the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal, use the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal.
  38. The apparatus according to any one of claims 34 to 37, wherein the differential decoding module comprises:
    a reference value determining submodule, configured to determine a closed-loop pitch period reference value of the secondary channel signal based on the pitch period estimate of the primary channel signal and the number of subframes into which the secondary channel signal of the current frame is divided;
    an index upper limit determining submodule, configured to determine an upper limit of the pitch period index value of the secondary channel signal based on a pitch period search range adjustment factor of the secondary channel signal;
    an estimate calculation submodule, configured to calculate the pitch period estimate of the secondary channel signal based on the closed-loop pitch period reference value of the secondary channel signal, the pitch period index value of the secondary channel, and the upper limit of the pitch period index value of the secondary channel signal.
  39. The apparatus according to claim 38, wherein the estimate calculation submodule is configured to calculate the pitch period estimate T0_pitch of the secondary channel signal as follows:
    T0_pitch = f_pitch_prim + (soft_reuse_index - soft_reuse_index_high_limit/M)/N;
    where f_pitch_prim denotes the closed-loop pitch period reference value of the secondary channel signal, soft_reuse_index denotes the pitch period index value of the secondary channel signal, N denotes the number of subframes into which the secondary channel signal is divided, M denotes the adjustment factor for the upper limit of the pitch period index value of the secondary channel signal, M is a nonzero real number, / denotes the division operator, + denotes the addition operator, and - denotes the subtraction operator.
  40. The apparatus according to claim 39, wherein the value of the adjustment factor for the upper limit of the pitch period index value of the secondary channel signal is 2 or 3.
  41. A stereo encoding apparatus, wherein the stereo encoding apparatus comprises at least one processor, and the at least one processor is configured to be coupled to a memory and to read and execute instructions in the memory, to implement the method according to any one of claims 1 to 13.
  42. The stereo encoding apparatus according to claim 41, wherein the stereo encoding apparatus further comprises the memory.
  43. A stereo decoding apparatus, wherein the stereo decoding apparatus comprises at least one processor, and the at least one processor is configured to be coupled to a memory and to read and execute instructions in the memory, to implement the method according to any one of claims 14 to 20.
  44. The stereo decoding apparatus according to claim 43, wherein the stereo decoding apparatus further comprises the memory.
  45. A computer-readable storage medium, comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 13 or 14 to 20.
  46. A computer-readable storage medium, comprising a stereo encoded bitstream generated by the method according to any one of claims 1 to 13.
PCT/CN2020/096296 2019-06-29 2020-06-16 Stereo encoding method, stereo decoding method, and apparatus WO2021000723A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP20835190.8A EP3975175A4 (en) 2019-06-29 2020-06-16 METHODS AND DEVICES FOR STEREO CODING AND STEREO DECODING
JP2021577947A JP7337966B2 (ja) 2019-06-29 2020-06-16 ステレオエンコーディング方法及び装置、並びにステレオデコーディング方法及び装置
US17/563,538 US20220122619A1 (en) 2019-06-29 2021-12-28 Stereo Encoding Method and Apparatus, and Stereo Decoding Method and Apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910581398.5A CN112233682A (zh) 2019-06-29 Stereo encoding method, stereo decoding method, and apparatus
CN201910581398.5 2019-06-29

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/563,538 Continuation US20220122619A1 (en) 2019-06-29 2021-12-28 Stereo Encoding Method and Apparatus, and Stereo Decoding Method and Apparatus

Publications (1)

Publication Number Publication Date
WO2021000723A1 (zh)

Family

ID=74101099

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/096296 WO2021000723A1 (zh) 2019-06-29 2020-06-16 Stereo encoding method, stereo decoding method, and apparatus

Country Status (5)

Country Link
US (1) US20220122619A1 (zh)
EP (1) EP3975175A4 (zh)
JP (1) JP7337966B2 (zh)
CN (1) CN112233682A (zh)
WO (1) WO2021000723A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112151045A (zh) * 2019-06-29 2020-12-29 Huawei Technologies Co., Ltd. Stereo encoding method, stereo decoding method, and apparatus

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SE519985C2 (sv) * 2000-09-15 2003-05-06 Ericsson Telefon Ab L M Coding and decoding of signals from multiple channels
JP3453116B2 (ja) * 2000-09-26 2003-10-06 Panasonic Mobile Communications Co., Ltd. Speech coding method and apparatus
US6584437B2 (en) * 2001-06-11 2003-06-24 Nokia Mobile Phones Ltd. Method and apparatus for coding successive pitch periods in speech signal
CN101027718A (zh) * 2004-09-28 2007-08-29 Matsushita Electric Industrial Co., Ltd. Scalable coding apparatus and scalable coding method
JP2009518659A (ja) * 2005-09-27 2009-05-07 LG Electronics Inc. Method and apparatus for encoding/decoding a multi-channel audio signal
US9269366B2 (en) * 2009-08-03 2016-02-23 Broadcom Corporation Hybrid instantaneous/differential pitch period coding
FR2966634A1 (fr) * 2010-10-22 2012-04-27 France Telecom Improved parametric stereo coding/decoding for channels in phase opposition
SG10201808285UA (en) * 2014-03-28 2018-10-30 Samsung Electronics Co Ltd Method and device for quantization of linear prediction coefficient and method and device for inverse quantization
CN107731238B (zh) * 2016-08-10 2021-07-16 Huawei Technologies Co., Ltd. Encoding method and encoder for multi-channel signals
CN112151045A (zh) * 2019-06-29 2020-12-29 Huawei Technologies Co., Ltd. Stereo encoding method, stereo decoding method, and apparatus

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101118747A (zh) * 2003-12-19 2008-02-06 Telefonaktiebolaget LM Ericsson Fidelity-optimized coding with pre-echo suppression
CN101069232A (zh) * 2004-11-30 2007-11-07 Matsushita Electric Industrial Co., Ltd. Stereo encoding apparatus, stereo decoding apparatus, and methods thereof
CN101313355A (zh) * 2005-09-27 2008-11-26 LG Electronics Inc. Method and apparatus for encoding/decoding a multi-channel audio signal
CN101981616A (zh) * 2008-04-04 2011-02-23 Matsushita Electric Industrial Co., Ltd. Stereo signal conversion apparatus, stereo signal inverse conversion apparatus, and methods thereof
CN107592937A (zh) * 2015-03-09 2018-01-16 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for encoding or decoding a multi-channel signal
CN108352162A (zh) * 2015-09-25 2018-07-31 VoiceAge Corporation Method and system for encoding a stereo sound signal using coding parameters of a primary channel to encode a secondary channel

Also Published As

Publication number Publication date
EP3975175A4 (en) 2022-07-20
JP2022539571A (ja) 2022-09-12
JP7337966B2 (ja) 2023-09-04
EP3975175A1 (en) 2022-03-30
CN112233682A (zh) 2021-01-15
US20220122619A1 (en) 2022-04-21

Similar Documents

Publication Publication Date Title
RU2667382C2 (ru) Improved classification between time-domain coding and frequency-domain coding
US9117458B2 (en) Apparatus for processing an audio signal and method thereof
US11664034B2 (en) Optimized coding and decoding of spatialization information for the parametric coding and decoding of a multichannel audio signal
KR20080044707A (ko) Method and apparatus for encoding and decoding an audio/speech signal
US11640825B2 (en) Time-domain stereo encoding and decoding method and related product
US11120807B2 (en) Method for determining audio coding/decoding mode and related product
US20180330740A1 (en) Apparatus and method for mdct m/s stereo with global ild with improved mid/side decision
US20240153511A1 (en) Time-domain stereo encoding and decoding method and related product
JP2021526239A (ja) Stereo signal encoding method and apparatus
JP2022163058A (ja) Stereo signal encoding method and stereo signal encoding apparatus
WO2017206794A1 (zh) Method and apparatus for extracting an inter-channel phase difference parameter
WO2021000723A1 (zh) Stereo encoding method, stereo decoding method, and apparatus
WO2021000724A1 (zh) Stereo encoding method, stereo decoding method, and apparatus
EP2212883B1 (en) An encoder
US11727943B2 (en) Time-domain stereo parameter encoding method and related product
US20110191112A1 (en) Encoder
CA3163373A1 (en) Switching between stereo coding modes in a multichannel sound codec

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20835190; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2021577947; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2020835190; Country of ref document: EP; Effective date: 20211222)