CN112233682A - Stereo coding method, stereo decoding method and device - Google Patents


Info

Publication number
CN112233682A
Authority
CN
China
Prior art keywords
channel signal
pitch period
secondary channel
pitch
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910581398.5A
Other languages
Chinese (zh)
Inventor
艾雅·苏谟特
高原
王宾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201910581398.5A (CN112233682A)
Priority to PCT/CN2020/096296 (WO2021000723A1)
Priority to EP20835190.8A (EP3975175A4)
Priority to JP2021577947A (JP7337966B2)
Publication of CN112233682A
Priority to US17/563,538 (US20220122619A1)
Legal status: Pending


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/09 Long term prediction, i.e. removing periodical redundancies, e.g. by using adaptive codebook or pitch predictor
    • G10L2019/0001 Codebooks
    • G10L2019/0011 Long term prediction filters, i.e. pitch estimation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S1/00 Two-channel systems
    • H04S1/002 Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/03 Aspects of down-mixing multi-channel audio to configurations with lower numbers of playback channels, e.g. 7.1 -> 5.1

Abstract

The embodiments of this application disclose a stereo encoding method, a stereo decoding method, and an apparatus, which are used to improve stereo encoding and decoding performance. In one embodiment, the stereo encoding method includes: downmixing the left channel signal and the right channel signal of a current frame to obtain a primary channel signal and a secondary channel signal of the current frame; and, when it is determined that the pitch period of the secondary channel signal is to be differentially encoded, differentially encoding the pitch period of the secondary channel signal by using the pitch period estimate of the primary channel signal to obtain a pitch period index value of the secondary channel signal, where the pitch period index value of the secondary channel signal is used to generate the stereo encoded bitstream to be transmitted.

Description

Stereo coding method, stereo decoding method and device
Technical Field
The present application relates to the field of stereo technology, and in particular, to a stereo encoding method, a stereo decoding method, and an apparatus.
Background
At present, monophonic audio can no longer meet people's demand for high-quality audio. Compared with mono audio, stereo audio conveys the direction and spatial distribution of each sound source and improves the clarity, intelligibility, and sense of presence of the content, which is why it is widely popular.
To transmit stereo signals efficiently over limited bandwidth, the stereo signal is usually encoded first, and the encoded bitstream is then transmitted over a channel to the decoder. The decoder decodes the received bitstream to obtain a decoded stereo signal for playback.
Stereo codecs can be implemented in many ways. One example is time-domain downmixing at the encoder into two mono signals: the left and right channels are first downmixed into a primary channel signal and a secondary channel signal, and each of these is then encoded with a mono coding method. The primary channel signal is usually encoded with more bits, and the secondary channel signal with fewer bits. At the decoder, the primary and secondary channel signals are usually decoded separately from the received bitstream, and time-domain upmixing is then performed to obtain the decoded stereo signal.
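The downmix/upmix structure described above can be illustrated with a minimal sketch. It assumes a plain sum/difference (mid/side) matrix, primary = (L + R) / 2 and secondary = (L - R) / 2, which is only one possible time-domain downmix; the downmix actually used by a given codec is not specified in this section, and `downmix`/`upmix` are hypothetical names.

```python
from typing import List, Tuple


def downmix(left: List[float], right: List[float]) -> Tuple[List[float], List[float]]:
    """Assumed mid/side downmix: primary = (L + R) / 2, secondary = (L - R) / 2."""
    if len(left) != len(right):
        raise ValueError("left and right frames must have the same length")
    primary = [(lft + rgt) / 2.0 for lft, rgt in zip(left, right)]
    secondary = [(lft - rgt) / 2.0 for lft, rgt in zip(left, right)]
    return primary, secondary


def upmix(primary: List[float], secondary: List[float]) -> Tuple[List[float], List[float]]:
    """Inverse of the assumed downmix: L = P + S, R = P - S."""
    left = [p + s for p, s in zip(primary, secondary)]
    right = [p - s for p, s in zip(primary, secondary)]
    return left, right


if __name__ == "__main__":
    left = [0.5, 0.25, -1.0]
    right = [0.5, -0.25, 1.0]
    primary, secondary = downmix(left, right)
    print(primary, secondary)  # [0.5, 0.0, 0.0] [0.0, 0.25, -1.0]
    print(upmix(primary, secondary) == (left, right))  # True
```

Because this matrix is exactly invertible, the upmix recovers the input frame; a real codec quantizes both channels between the two steps, so the reconstruction is only approximate.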
An important feature that distinguishes stereo signals from mono signals is that the sound carries image information, which gives a stronger sense of space. In a stereo signal, the secondary channel signal better reflects the spatial perception of the stereo signal, so the accuracy of secondary channel coding plays an important role in the stability of the stereo image.
In stereo coding, the pitch period, an important characteristic of human speech production, is an important parameter for encoding both the primary and the secondary channel signals, and the accuracy of the predicted pitch parameter affects the coding quality of the stereo signal as a whole. In time-domain or frequency-domain stereo coding, the stereo parameters and the primary and secondary channel signals are obtained by analyzing the input signal. At relatively low coding rates (for example, 24.4 kbps and below), the encoder typically encodes only the primary channel signal and not the secondary channel signal, for example by directly using the pitch period of the primary channel signal as the pitch period of the secondary channel signal. Because the secondary channel signal is not encoded, the spatial perception of the decoded stereo signal is poor, and the stability of the sound image is strongly affected by the difference between the pitch period of the primary channel signal and the actual pitch period of the secondary channel signal. This reduces the performance of stereo encoding and, correspondingly, of stereo decoding.
Disclosure of Invention
The embodiments of this application provide a stereo encoding method, a stereo decoding method, and an apparatus, which are used to improve stereo encoding and decoding performance.
In order to solve the above technical problem, an embodiment of the present application provides the following technical solutions:
In a first aspect, an embodiment of this application provides a stereo encoding method, including: downmixing a left channel signal of a current frame and a right channel signal of the current frame to obtain a primary channel signal of the current frame and a secondary channel signal of the current frame; and, when it is determined that the pitch period of the secondary channel signal is to be differentially encoded, differentially encoding the pitch period of the secondary channel signal by using the pitch period estimate of the primary channel signal to obtain a pitch period index value of the secondary channel signal, where the pitch period index value of the secondary channel signal is used to generate the stereo encoded bitstream to be transmitted.
In this embodiment, the left and right channel signals of the current frame are first downmixed to obtain the primary and secondary channel signals of the current frame. When it is determined that the pitch period of the secondary channel signal is to be differentially encoded, the pitch period of the secondary channel signal is differentially encoded by using the pitch period estimate of the primary channel signal, yielding a pitch period index value of the secondary channel signal that is used to generate the stereo encoded bitstream to be transmitted. Because the pitch period of the secondary channel signal is differentially encoded relative to the pitch period estimate of the primary channel signal, only a small number of bits needs to be allocated to the pitch period of the secondary channel signal, and encoding it in this way improves the spatial impression and the sound image stability of the stereo signal. In addition, because the differential coding uses few bits, the bits saved can be spent on other stereo coding parameters, which further improves the coding efficiency of the secondary channel and ultimately the overall stereo coding quality.
In a possible implementation, determining whether to differentially encode the pitch period of the secondary channel signal includes: encoding the primary channel signal of the current frame to obtain a pitch period estimate of the primary channel signal; performing open-loop pitch analysis on the secondary channel signal of the current frame to obtain an open-loop pitch period estimate of the secondary channel signal; determining whether the difference between the pitch period estimate of the primary channel signal and the open-loop pitch period estimate of the secondary channel signal exceeds a preset secondary channel pitch period differential coding threshold; and, when the difference exceeds the threshold, determining to differentially encode the pitch period of the secondary channel signal, or, when the difference does not exceed the threshold, determining not to differentially encode it.
In this embodiment, the pitch period estimate of the primary channel signal is obtained while encoding the primary channel signal. After the secondary channel signal of the current frame is obtained, open-loop pitch analysis is performed on it to obtain the open-loop pitch period estimate of the secondary channel signal. The difference between the two estimates is then calculated and compared with the preset secondary channel pitch period differential coding threshold, which can be configured flexibly for the stereo coding scenario. Differential encoding is chosen when the difference exceeds the threshold and is not chosen otherwise.
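The decision step can be sketched as follows. The open-loop estimator is a generic normalized-autocorrelation search chosen only for illustration; the codec's own open-loop pitch analysis is not described in this section. Taking the absolute value of the difference, the lag range, and the threshold value are assumptions, and `open_loop_pitch`/`use_differential_coding` are hypothetical names. The comparison direction follows the text: differential coding is selected when the difference exceeds the threshold.

```python
import math
from typing import Sequence


def open_loop_pitch(signal: Sequence[float], min_lag: int, max_lag: int) -> int:
    """Return the integer lag in [min_lag, max_lag] that maximizes the
    normalized autocorrelation of `signal` (generic open-loop estimator)."""
    if not 1 <= min_lag <= max_lag < len(signal):
        raise ValueError("need 1 <= min_lag <= max_lag < len(signal)")
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        current = signal[lag:]   # s[n + lag]
        delayed = signal[:-lag]  # s[n]
        corr = sum(a * b for a, b in zip(current, delayed))
        energy = sum(b * b for b in delayed)
        score = corr / math.sqrt(energy) if energy > 0 else -math.inf
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def use_differential_coding(primary_pitch: float, secondary_ol_pitch: float,
                            threshold: float) -> bool:
    """Decision rule as stated in the text: differentially encode the secondary
    pitch when the difference exceeds the threshold (abs() is an assumption)."""
    return abs(primary_pitch - secondary_ol_pitch) > threshold


if __name__ == "__main__":
    # Toy secondary-channel frame: a pulse train with a period of 40 samples.
    frame = [1.0 if n % 40 == 0 else 0.0 for n in range(320)]
    sec_ol = open_loop_pitch(frame, 20, 60)
    print(sec_ol)                                     # 40
    print(use_differential_coding(52.25, sec_ol, 8))  # True, since |52.25 - 40| > 8
```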
In a possible implementation, when it is determined that the pitch period of the secondary channel signal is to be differentially encoded, the method further includes: setting a secondary channel pitch period differential coding flag in the current frame to a preset first value, where the stereo encoded bitstream carries the flag and the first value indicates that the pitch period of the secondary channel signal is differentially encoded. The encoder sets the value of this flag according to whether the pitch period of the secondary channel signal is differentially encoded, so the flag tells the decoder which case applies. The flag can take several values; for example, it may be set to the preset first value or to a second value. In the configuration described here, the flag is set to the first value when the pitch period of the secondary channel signal is determined to be differentially encoded.
In a possible implementation, the method further includes: when it is determined that the pitch period of the secondary channel signal is not to be differentially encoded and the pitch period estimate of the primary channel signal is not to be reused (multiplexed) as the pitch period of the secondary channel signal, encoding the pitch period of the secondary channel signal separately from that of the primary channel signal. In this case, an independent pitch period coding method for the secondary channel can be used to encode the pitch period of the secondary channel signal.
In a possible implementation, the method further includes: when it is determined that the pitch period of the secondary channel signal is not to be differentially encoded and the pitch period estimate of the primary channel signal is to be reused (multiplexed) as the pitch period of the secondary channel signal, setting a pitch period reuse flag of the secondary channel signal to a preset fourth value and carrying this flag in the stereo encoded bitstream, where the fourth value indicates that the pitch period estimate of the primary channel signal is reused as the pitch period of the secondary channel signal. In other words, when the pitch period of the secondary channel signal is not differentially encoded, the embodiments may use pitch period reuse: the encoder does not encode the secondary channel pitch period but carries a secondary channel pitch period reuse flag in the bitstream, indicating whether the pitch period of the secondary channel signal reuses the pitch period estimate of the primary channel signal. When the flag indicates reuse, the decoder uses the decoded pitch period of the primary channel signal as the pitch period of the secondary channel signal.
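The three encoder-side cases above (differential coding, reuse, independent coding) map to flag values roughly as in this sketch. The text names only a "first value", a "second value", and a "fourth value"; the numeric values 1 and 0 below are assumptions (the decoder-side example in this document uses 1 for the first value), as is writing the reuse flag as 0 in the independent case and omitting it in the differential case. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed numeric values; the text only names "first", "second" and "fourth" values.
DIFF_CODED = 1      # "first value": secondary pitch is differentially coded
NOT_DIFF_CODED = 0  # "second value" (assumed to be 0)
REUSE_PRIMARY = 1   # "fourth value": primary pitch estimate is reused
NO_REUSE = 0        # assumed complementary value


@dataclass
class SecondaryPitchSideInfo:
    diff_flag: int             # secondary channel pitch period differential coding flag
    reuse_flag: Optional[int]  # pitch period reuse flag (None when not written)
    mode: str                  # "differential", "reuse" or "independent"


def select_secondary_pitch_mode(differential: bool, reuse: bool) -> SecondaryPitchSideInfo:
    """Map the encoder's decisions to the flags carried in the bitstream."""
    if differential:
        # Differential coding: the reuse flag is not needed (assumption).
        return SecondaryPitchSideInfo(DIFF_CODED, None, "differential")
    if reuse:
        # No pitch bits for the secondary channel; the decoder copies the primary pitch.
        return SecondaryPitchSideInfo(NOT_DIFF_CODED, REUSE_PRIMARY, "reuse")
    # Neither: the secondary pitch is coded independently of the primary pitch.
    return SecondaryPitchSideInfo(NOT_DIFF_CODED, NO_REUSE, "independent")


if __name__ == "__main__":
    print(select_secondary_pitch_mode(False, True))
    # SecondaryPitchSideInfo(diff_flag=0, reuse_flag=1, mode='reuse')
```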
In a possible implementation, differentially encoding the pitch period of the secondary channel signal by using the pitch period estimate of the primary channel signal to obtain the pitch period index value of the secondary channel signal includes: performing a closed-loop pitch period search for the secondary channel based on the pitch period estimate of the primary channel signal to obtain the pitch period estimate of the secondary channel signal; determining an upper limit of the pitch period index value of the secondary channel signal according to a pitch period search range adjustment factor of the secondary channel signal; and calculating the pitch period index value of the secondary channel signal from the pitch period estimate of the primary channel signal, the pitch period estimate of the secondary channel signal, and the upper limit of the pitch period index value of the secondary channel signal. The encoder performs the closed-loop pitch period search of the secondary channel based on the pitch period estimate of the primary channel signal to determine the pitch period estimate of the secondary channel signal. The pitch period search range adjustment factor of the secondary channel signal is used to determine the upper limit of its pitch period index value, that is, the value that the pitch period index value of the secondary channel signal cannot exceed. This upper limit is then used in calculating the pitch period index value of the secondary channel signal.
After determining the pitch period estimate of the primary channel signal, the pitch period estimate of the secondary channel signal, and the upper limit of the pitch period index value of the secondary channel signal, the encoder performs differential encoding based on these three quantities and outputs the pitch period index value of the secondary channel signal.
In a possible implementation, performing the closed-loop pitch period search of the secondary channel based on the pitch period estimate of the primary channel signal to obtain the pitch period estimate of the secondary channel signal includes: determining a closed-loop pitch period reference value of the secondary channel signal from the pitch period estimate of the primary channel signal and the number of subframes into which the secondary channel signal of the current frame is divided; and, using this reference value as the starting point, performing the closed-loop pitch period search of the secondary channel signal with integer precision and fractional precision to obtain the pitch period estimate of the secondary channel signal. The number of subframes is determined by the subframe configuration of the secondary channel signal, for example 4 or 3 subframes, depending on the application scenario. After the pitch period estimate of the primary channel signal is obtained, the closed-loop pitch period reference value of the secondary channel signal is calculated from that estimate and the number of subframes. This reference value represents the closed-loop pitch period of the secondary channel signal as determined with the pitch period estimate of the primary channel signal as the reference.
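The integer-then-fractional search starting from the reference value can be sketched generically. This is a minimal illustration, not the codec's search: the caller supplies a hypothetical `score(lag)` criterion (in a CELP-style coder this would typically be a correlation between the target signal and the delayed excitation), and the search radius and the use of N as the fractional resolution are assumptions.

```python
import math
from typing import Callable, Tuple


def closed_loop_pitch_search(
    score: Callable[[float], float],
    reference: float,
    n_frac: int,
    radius: int = 4,
) -> Tuple[int, int]:
    """Two-stage pitch search starting from `reference` (e.g. f_pitch_prim).

    Stage 1: integer lags in [floor(reference) - radius, floor(reference) + radius].
    Stage 2: fractional lags around the best integer lag, in steps of 1/n_frac.
    Returns (integer part, fractional part in units of 1/n_frac).
    """
    start = math.floor(reference)
    best_int = max(range(start - radius, start + radius + 1), key=score)
    candidates = [best_int + k / n_frac for k in range(-(n_frac - 1), n_frac)]
    best = max(candidates, key=score)
    integer_part = math.floor(best)
    frac_part = round((best - integer_part) * n_frac)
    return integer_part, frac_part


if __name__ == "__main__":
    # Toy criterion peaking at a lag of 40.5 samples; a real coder would compute
    # a correlation-based criterion from the signal instead.
    print(closed_loop_pitch_search(lambda t: -(t - 40.5) ** 2, 39.0, 4))  # (40, 2)
```

Returning the fraction as an integer count of 1/N steps matches the form in which the index formula later in this section combines the integer and fractional parts.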
In a possible implementation, determining the closed-loop pitch period reference value of the secondary channel signal from the pitch period estimate of the primary channel signal and the number of subframes of the secondary channel signal of the current frame includes: determining a closed-loop pitch period integer part loc_T0 and a closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal from the pitch period estimate of the primary channel signal; and calculating the closed-loop pitch period reference value f_pitch_prim of the secondary channel signal as $\text{f\_pitch\_prim} = \text{loc\_T0} + \text{loc\_frac\_prim} / N$, where N is the number of subframes into which the secondary channel signal is divided. Specifically, the integer and fractional parts are first derived from the pitch period estimate of the primary channel signal. For example, the integer part of the primary channel pitch period estimate can be used directly as loc_T0 and its fractional part as loc_frac_prim; alternatively, the primary channel pitch period estimate can be mapped to loc_T0 and loc_frac_prim by interpolation.
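A small sketch of the reference-value formula, assuming the direct mapping mentioned above (integer part to loc_T0, fractional part to loc_frac_prim) and assuming the fractional part is held as an integer count of 1/N steps, which is what makes loc_frac_prim / N a fraction of a sample. The interpolation-based mapping is not shown, and the function names are hypothetical.

```python
import math
from typing import Tuple


def split_primary_pitch(primary_pitch: float, n_subframes: int) -> Tuple[int, int]:
    """Assumed direct mapping of the primary pitch estimate to (loc_T0, loc_frac_prim),
    with the fractional part expressed in steps of 1/N."""
    loc_T0 = math.floor(primary_pitch)
    loc_frac_prim = round((primary_pitch - loc_T0) * n_subframes)
    if loc_frac_prim == n_subframes:  # rounding carried into the next integer lag
        loc_T0, loc_frac_prim = loc_T0 + 1, 0
    return loc_T0, loc_frac_prim


def closed_loop_reference(loc_T0: int, loc_frac_prim: int, n_subframes: int) -> float:
    """f_pitch_prim = loc_T0 + loc_frac_prim / N, as given in the text."""
    if n_subframes <= 0:
        raise ValueError("N must be a positive number of subframes")
    return loc_T0 + loc_frac_prim / n_subframes


if __name__ == "__main__":
    t0, frac = split_primary_pitch(57.75, 4)
    print(t0, frac, closed_loop_reference(t0, frac, 4))  # 57 3 57.75
```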
In a possible implementation, determining the upper limit of the pitch period index value of the secondary channel signal according to the pitch period search range adjustment factor of the secondary channel signal includes calculating the upper limit soft_reuse_index_high_limit as $\text{soft\_reuse\_index\_high\_limit} = 0.5 + 2^{Z}$, where Z is the pitch period search range adjustment factor of the secondary channel signal.
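For the values of Z given in this document, the upper limit works out as below. This sketch reads the extracted expression "0.5+2Z" as 0.5 + 2^Z (a flattened superscript), which is an interpretation; the function name is hypothetical.

```python
def soft_reuse_index_high_limit(Z: int) -> float:
    """soft_reuse_index_high_limit = 0.5 + 2**Z (Z: search range adjustment factor)."""
    if Z < 0:
        raise ValueError("Z is expected to be a non-negative integer")
    return 0.5 + 2 ** Z


if __name__ == "__main__":
    for z in (3, 4, 5):
        print(z, soft_reuse_index_high_limit(z))
    # 3 8.5
    # 4 16.5
    # 5 32.5
```

Under this reading, each increment of Z doubles the range of index values the secondary channel pitch can use.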
In a possible implementation, Z is 3, 4, or 5.
In a possible implementation, calculating the pitch period index value of the secondary channel signal from the pitch period estimate of the primary channel signal, the pitch period estimate of the secondary channel signal, and the upper limit of the pitch period index value of the secondary channel signal includes: determining the closed-loop pitch period integer part loc_T0 and the closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal from the pitch period estimate of the primary channel signal; and calculating the pitch period index value soft_reuse_index of the secondary channel signal as $\text{soft\_reuse\_index} = (N \times \text{pitch\_soft\_reuse} + \text{pitch\_frac\_soft\_reuse}) - (N \times \text{loc\_T0} + \text{loc\_frac\_prim}) + \text{soft\_reuse\_index\_high\_limit} / M$, where pitch_soft_reuse is the integer part and pitch_frac_soft_reuse the fractional part of the pitch period estimate of the secondary channel signal, soft_reuse_index_high_limit is the upper limit of the pitch period index value of the secondary channel signal, N is the number of subframes into which the secondary channel signal is divided, M is a non-zero real adjustment factor for the upper limit of the pitch period index value of the secondary channel signal, × denotes multiplication, + denotes addition, and - denotes subtraction.
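The index formula can be transcribed directly, as in this sketch. The example values (N = 4, Z = 3, and M = 2) are illustrative assumptions, since this section does not fix M. The text also does not say how a non-integer result is turned into the transmitted index, so the sketch returns the raw value of the formula; checking it against the range 0 to soft_reuse_index_high_limit is a natural guard but is likewise an assumption. The function name is hypothetical.

```python
def soft_reuse_index(
    pitch_soft_reuse: int,
    pitch_frac_soft_reuse: int,
    loc_T0: int,
    loc_frac_prim: int,
    n_subframes: int,
    high_limit: float,
    M: float,
) -> float:
    """soft_reuse_index = (N*pitch_soft_reuse + pitch_frac_soft_reuse)
                          - (N*loc_T0 + loc_frac_prim) + high_limit / M"""
    if M == 0:
        raise ValueError("M must be non-zero")
    secondary_units = n_subframes * pitch_soft_reuse + pitch_frac_soft_reuse  # in 1/N steps
    reference_units = n_subframes * loc_T0 + loc_frac_prim                    # in 1/N steps
    return secondary_units - reference_units + high_limit / M


if __name__ == "__main__":
    # Secondary pitch 58 + 1/4, reference 57 + 3/4, N = 4, Z = 3 -> limit 8.5, M = 2 (assumed).
    idx = soft_reuse_index(58, 1, 57, 3, 4, 8.5, 2)
    print(idx, 0 <= idx <= 8.5)  # 6.25 True
```

The difference term measures how far the secondary channel pitch lies from the primary-based reference in 1/N-sample steps, and the high_limit / M term shifts that signed offset into a non-negative range.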
In a possible implementation, the method is applied to stereo coding scenarios in which the coding rate of the current frame is lower than a preset rate threshold, where the rate threshold is at least one of: 13.2 kilobits per second (kbps), 16.4 kbps, or 24.4 kbps. The rate threshold may be less than or equal to 13.2 kbps, or may be 16.4 kbps or 24.4 kbps; its specific value can be chosen according to the application scenario. At low coding rates (for example, 24.4 kbps and below), independent pitch period coding of the secondary channel is not performed; instead, the pitch period estimate of the primary channel signal serves as a reference value, and differential coding is used to encode the pitch period of the secondary channel signal and improve stereo coding quality.
In a second aspect, an embodiment of this application further provides a stereo decoding method, including: determining, according to a received stereo encoded bitstream, whether to differentially decode the pitch period of the secondary channel signal; when it is determined that the pitch period of the secondary channel signal is to be differentially decoded, obtaining a pitch period estimate of the primary channel of the current frame and a pitch period index value of the secondary channel of the current frame from the stereo encoded bitstream; and differentially decoding the pitch period of the secondary channel signal according to the pitch period estimate of the primary channel and the pitch period index value of the secondary channel to obtain a pitch period estimate of the secondary channel signal, which is used to decode the stereo encoded bitstream.
In this embodiment, the decoder first determines from the received stereo encoded bitstream whether to differentially decode the pitch period of the secondary channel signal. If so, it obtains the pitch period estimate of the primary channel of the current frame and the pitch period index value of the secondary channel of the current frame from the bitstream, differentially decodes the pitch period of the secondary channel signal from these two values to obtain the pitch period estimate of the secondary channel signal, and uses that estimate to decode the stereo encoded bitstream. Because the pitch period of the secondary channel signal can be recovered in this way, the spatial impression and the sound image stability of the stereo signal are improved.
In a possible implementation, determining, according to the received stereo encoded bitstream, whether to differentially decode the pitch period of the secondary channel signal includes: obtaining the secondary channel pitch period differential coding flag from the current frame; and, when the flag equals the preset first value, determining to differentially decode the pitch period of the secondary channel signal. The flag can take several values; for example, when it equals the preset first value (for example, 1), the pitch period of the secondary channel signal is differentially decoded.
In a possible implementation, the method further includes: when it is determined that the pitch period of the secondary channel signal is not to be differentially decoded and the pitch period estimate of the primary channel signal is not reused (multiplexed) as the pitch period of the secondary channel signal, decoding the pitch period of the secondary channel signal from the stereo encoded bitstream. In this case, an independent pitch period decoding method for the secondary channel can be used to decode the pitch period of the secondary channel signal.
In one possible implementation, the method further includes: when it is determined that the pitch period of the secondary channel signal is not differentially decoded and the pitch period estimate of the primary channel signal is multiplexed as the pitch period of the secondary channel signal, using the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal. That is, when the decoding end decides not to differentially decode the secondary channel pitch period, the embodiment may instead use pitch period multiplexing. For example, when the secondary channel signal pitch period multiplexing identifier indicates that the secondary channel reuses the pitch period estimate of the primary channel signal, the decoding end takes the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal.
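The three decoder-side branches described above (differential decoding, pitch period multiplexing, independent decoding) can be sketched as a small dispatch function. This is a minimal illustration, not the codec's bitstream syntax: the identifier value 1 for the "preset first value" is only the example given in the text, and the value of the multiplexing identifier (the "fourth value" on the encoder side) is not specified, so both constants and the function name are hypothetical.

```python
# Hypothetical identifier values: the text gives 1 only as an example of the
# "preset first value"; the multiplexing identifier value is not specified,
# so 1 is assumed here as well.
FIRST_VALUE = 1   # secondary channel pitch period differential coding identifier
FOURTH_VALUE = 1  # secondary channel signal pitch period multiplexing identifier


def secondary_pitch_decoding_mode(diff_flag: int, reuse_flag: int) -> str:
    """Choose how the decoder recovers the secondary channel pitch period."""
    if diff_flag == FIRST_VALUE:
        # Differential decoding from the primary estimate and the index value.
        return "differential"
    if reuse_flag == FOURTH_VALUE:
        # Reuse (multiplex) the primary channel pitch period estimate.
        return "reuse_primary"
    # Otherwise the pitch period is decoded independently from the code stream.
    return "independent"
```

The differential identifier is checked first because, in the text, multiplexing and independent decoding are only considered once differential decoding has been ruled out.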
In one possible implementation, differentially decoding the pitch period of the secondary channel signal according to the pitch period estimate of the primary channel and the pitch period index value of the secondary channel includes: determining a closed-loop pitch period reference value of the secondary channel signal according to the pitch period estimate of the primary channel signal and the number of subframes into which the secondary channel signal of the current frame is divided; determining the pitch period index value upper limit of the secondary channel signal according to the pitch period search range adjustment factor of the secondary channel signal; and calculating the pitch period estimate of the secondary channel signal according to the closed-loop pitch period reference value of the secondary channel signal, the pitch period index value of the secondary channel, and the pitch period index value upper limit of the secondary channel signal. Specifically, the pitch period estimate of the primary channel signal is used to determine the closed-loop pitch period reference value of the secondary channel signal, as described in detail in the foregoing calculation process. The pitch period search range adjustment factor of the secondary channel signal determines the pitch period index value upper limit of the secondary channel signal, that is, the value that the pitch period index value of the secondary channel signal cannot exceed. The pitch period index value of the secondary channel signal is then used to determine the pitch period estimate of the secondary channel signal.
After determining the closed-loop pitch period reference value, the pitch period index value, and the pitch period index value upper limit of the secondary channel signal, the decoding end performs differential decoding based on these three values and outputs the pitch period estimate of the secondary channel signal.
In one possible implementation, calculating the pitch period estimate of the secondary channel signal according to the closed-loop pitch period reference value, the pitch period index value, and the pitch period index value upper limit of the secondary channel signal includes calculating the pitch period estimate T0_pitch of the secondary channel signal as follows:
$$\mathrm{T0\_pitch} = \mathrm{f\_pitch\_prim} + \frac{\mathrm{soft\_reuse\_index} - \mathrm{soft\_reuse\_index\_high\_limit}/M}{N}$$
where f_pitch_prim represents the closed-loop pitch period reference value of the secondary channel signal, soft_reuse_index represents the pitch period index value of the secondary channel signal, soft_reuse_index_high_limit represents the pitch period index value upper limit of the secondary channel signal, N represents the number of subframes into which the secondary channel signal is divided, M represents the adjustment factor of the pitch period index value upper limit of the secondary channel signal and is a non-zero real number, / denotes division, + denotes addition, and - denotes subtraction.
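A minimal Python sketch of this decoder-side formula follows. The function and parameter names mirror the symbols in the text but are otherwise hypothetical, and the example values (N = 4 subframes, an upper limit of 8.5 corresponding to Z = 3, and M = 2) are illustrative assumptions within the ranges the text lists.

```python
def decode_secondary_pitch(f_pitch_prim: float, soft_reuse_index: float,
                           soft_reuse_index_high_limit: float,
                           n_subframes: int, m: float) -> float:
    """T0_pitch = f_pitch_prim + (soft_reuse_index - high_limit / M) / N."""
    if m == 0:
        raise ValueError("M must be a non-zero real number")
    # Remove the offset high_limit / M that the encoder added to the index,
    # then divide by N to turn the remaining index difference into samples.
    offset = soft_reuse_index - soft_reuse_index_high_limit / m
    return f_pitch_prim + offset / n_subframes
```

For example, `decode_secondary_pitch(60.25, 6.25, 8.5, 4, 2)` gives 60.25 + (6.25 - 4.25) / 4 = 60.75. An index equal to the offset itself (4.25 here) returns the reference value unchanged, so the index encodes a signed deviation around f_pitch_prim.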
In a possible implementation manner, the adjustment factor of the pitch period index upper limit of the secondary channel signal is 2 or 3.
In a third aspect, an embodiment of the present application further provides a stereo encoding apparatus, including: a down-mixing module, configured to down-mix the left channel signal of a current frame and the right channel signal of the current frame to obtain a primary channel signal of the current frame and a secondary channel signal of the current frame; and a differential coding module, configured to, when it is determined that the pitch period of the secondary channel signal is to be differentially coded, differentially code the pitch period of the secondary channel signal using the pitch period estimate of the primary channel signal to obtain a pitch period index value of the secondary channel signal, where the pitch period index value of the secondary channel signal is used to generate the stereo coded code stream to be sent.
In one possible implementation, the stereo encoding apparatus further includes: a primary channel coding module, configured to encode the primary channel signal of the current frame to obtain a pitch period estimate of the primary channel signal; an open-loop analysis module, configured to perform open-loop pitch period analysis on the secondary channel signal of the current frame to obtain an open-loop pitch period estimate of the secondary channel signal; and a threshold judging module, configured to judge whether the difference between the pitch period estimate of the primary channel signal and the open-loop pitch period estimate of the secondary channel signal exceeds a preset secondary channel pitch period differential coding threshold, to determine to perform differential coding on the pitch period of the secondary channel signal when the difference exceeds the threshold, and to determine not to perform differential coding on it when the difference does not exceed the threshold.
In one possible implementation, the stereo encoding apparatus further includes an identifier configuration module, configured to set, when it is determined to perform differential coding on the pitch period of the secondary channel signal, a secondary channel pitch period differential coding identifier in the current frame to a preset first value, where the stereo coded code stream carries the secondary channel pitch period differential coding identifier and the first value indicates that the pitch period of the secondary channel signal is differentially coded.
In one possible implementation, the stereo encoding apparatus further includes an independent coding module, configured to encode the pitch period of the secondary channel signal and the pitch period of the primary channel signal separately when it is determined that the pitch period of the secondary channel signal is not differentially coded and the pitch period estimate of the primary channel signal is not multiplexed as the pitch period of the secondary channel signal.
In one possible implementation, the stereo encoding apparatus further includes an identifier configuration module, configured to set a secondary channel signal pitch period multiplexing identifier to a preset fourth value when it is determined that the pitch period of the secondary channel signal is not differentially coded and the pitch period estimate of the primary channel signal is multiplexed as the pitch period of the secondary channel signal, and to carry the secondary channel signal pitch period multiplexing identifier in the stereo coded code stream, where the fourth value indicates that the pitch period estimate of the primary channel signal is multiplexed as the pitch period of the secondary channel signal.
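The encoder-side decision and identifier configuration described in the preceding implementations can be sketched as follows. This is a hypothetical illustration: the text does not say how the multiplexing decision is made, so it is passed in as a caller-supplied boolean; taking the absolute value of the difference is an assumption; and apart from the example value 1 for the first value, all identifier values are placeholders. The comparison direction (differential coding when the difference exceeds the threshold) follows the text as written.

```python
from dataclasses import dataclass

FIRST_VALUE = 1   # differential coding identifier: example value from the text
SECOND_VALUE = 0  # hypothetical "not differentially coded" value
FOURTH_VALUE = 1  # hypothetical multiplexing identifier value
NOT_REUSED = 0    # hypothetical "primary pitch not reused" value


@dataclass
class SecondaryPitchDecision:
    mode: str
    diff_flag: int
    reuse_flag: int


def choose_secondary_pitch_mode(primary_pitch: float,
                                secondary_open_loop_pitch: float,
                                diff_threshold: float,
                                reuse_primary: bool) -> SecondaryPitchDecision:
    # Assumption: "difference" is taken as an absolute difference.
    difference = abs(primary_pitch - secondary_open_loop_pitch)
    if difference > diff_threshold:
        return SecondaryPitchDecision("differential", FIRST_VALUE, NOT_REUSED)
    if reuse_primary:
        # Primary channel pitch period estimate is multiplexed for the secondary channel.
        return SecondaryPitchDecision("reuse_primary", SECOND_VALUE, FOURTH_VALUE)
    # Neither differential coding nor reuse: code both pitch periods independently.
    return SecondaryPitchDecision("independent", SECOND_VALUE, NOT_REUSED)
```

Returning both identifiers together keeps the decision and the values written to the code stream in one place, which mirrors the split between the threshold judging module and the identifier configuration module.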
In one possible implementation, the differential coding module includes: a closed-loop pitch period searching module, configured to perform a closed-loop pitch period search on the secondary channel according to the pitch period estimate of the primary channel signal to obtain a pitch period estimate of the secondary channel signal; an index value upper limit determining module, configured to determine the pitch period index value upper limit of the secondary channel signal according to the pitch period search range adjustment factor of the secondary channel signal; and an index value calculating module, configured to calculate the pitch period index value of the secondary channel signal according to the pitch period estimate of the primary channel signal, the pitch period estimate of the secondary channel signal, and the pitch period index value upper limit of the secondary channel signal.
In a possible implementation, the closed-loop pitch period searching module is configured to determine a closed-loop pitch period reference value of the secondary channel signal according to the pitch period estimate of the primary channel signal and the number of subframes into which the secondary channel signal of the current frame is divided, and to use this reference value as the starting point of the closed-loop pitch period search of the secondary channel signal, performing the search at integer precision and at fractional precision to obtain the pitch period estimate of the secondary channel signal.
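The search described above can be sketched as a two-stage procedure: an integer-precision pass around the reference value, then a fractional refinement around the best integer lag. This two-stage structure is a common CELP-style approach assumed here, not necessarily the embodiment's exact procedure. The `score` callable is hypothetical and stands for whatever closed-loop criterion a real encoder evaluates (typically on interpolated past excitation for fractional lags); `int_half_range` is a hypothetical search half-width; and the fractional step of 1/N follows the reading of the reference formula f_pitch_prim = loc_T0 + loc_frac_prim/N.

```python
from typing import Callable, Tuple


def closed_loop_pitch_search(reference: float, n: int,
                             score: Callable[[float], float],
                             int_half_range: int = 2) -> Tuple[int, int]:
    """Return (integer part, fractional part) such that pitch = int + frac / n."""
    # Stage 1: integer-precision search centred on the rounded reference value.
    start = round(reference)
    best_int = max(range(start - int_half_range, start + int_half_range + 1),
                   key=score)
    # Stage 2: fractional refinement in steps of 1/n around the best integer lag.
    # Candidates are kept in integer units of 1/n so the arithmetic stays exact.
    candidates = range(best_int * n - (n - 1), best_int * n + n)
    best_units = max(candidates, key=lambda u: score(u / n))
    return divmod(best_units, n)
```

With a reference of 60.25, N = 4, and a toy criterion peaking at 60.75 (`lambda t: -(t - 60.75) ** 2`), stage 1 selects lag 61 and stage 2 refines it to 243/4 = 60.75, returned as `(60, 3)`.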
In one possible implementation, the closed-loop pitch period searching module is configured to determine, from the pitch period estimate of the primary channel signal, a closed-loop pitch period integer part loc_T0 of the secondary channel signal and a closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal, and to calculate the closed-loop pitch period reference value f_pitch_prim of the secondary channel signal as follows:
$$\mathrm{f\_pitch\_prim} = \mathrm{loc\_T0} + \mathrm{loc\_frac\_prim}/N$$
where N represents the number of subframes into which the secondary channel signal is divided.
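The reference value formula can be illustrated with a short sketch. The text does not say how loc_T0 and loc_frac_prim are derived from the primary channel pitch period estimate, so `split_pitch` below uses a hypothetical rule (floor for the integer part, the remainder rounded to the nearest 1/N for the fractional part, with carry); only `reference_value` implements the formula as stated.

```python
import math
from typing import Tuple


def split_pitch(pitch: float, n: int) -> Tuple[int, int]:
    """Hypothetical split of a pitch value into (loc_T0, loc_frac_prim) in 1/n steps."""
    loc_t0 = math.floor(pitch)
    loc_frac = round((pitch - loc_t0) * n)
    if loc_frac == n:  # rounding carried into the next integer lag
        loc_t0, loc_frac = loc_t0 + 1, 0
    return loc_t0, loc_frac


def reference_value(loc_t0: int, loc_frac_prim: int, n: int) -> float:
    """f_pitch_prim = loc_T0 + loc_frac_prim / N."""
    return loc_t0 + loc_frac_prim / n
```

With N = 4, a primary estimate of 60.25 splits into `(60, 1)` and gives a reference value of 60.25, while 60.9 rounds up to `(61, 0)`.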
In a possible implementation, the index value upper limit determining module is configured to calculate the pitch period index value upper limit soft_reuse_index_high_limit of the secondary channel signal as follows:
$$\mathrm{soft\_reuse\_index\_high\_limit} = 0.5 + 2^{Z}$$
where Z is the pitch period search range adjustment factor of the secondary channel signal.
In a possible implementation, the value of Z is 3, 4, or 5.
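The upper limit and the offset soft_reuse_index_high_limit/M that appears in both the encoder and decoder formulas can be tabulated for the listed values of Z with a small sketch. Reading the garbled "2Z" as 2 to the power Z is an interpretation consistent with Z = 3, 4, 5. The text also does not say whether a fixed-point implementation truncates this value to an integer (in which case the 0.5 would act as a rounding offset); the sketch keeps it as a real number, which is an assumption.

```python
def index_upper_limit(z: int) -> float:
    """soft_reuse_index_high_limit = 0.5 + 2**Z for search range factor Z."""
    return 0.5 + 2 ** z


def index_offset(z: int, m: float) -> float:
    """Offset soft_reuse_index_high_limit / M used by the encoder and decoder (M != 0)."""
    if m == 0:
        raise ValueError("M must be a non-zero real number")
    return index_upper_limit(z) / m


# Z = 3, 4, 5 give upper limits 8.5, 16.5 and 32.5.
limits = {z: index_upper_limit(z) for z in (3, 4, 5)}
```

Each increment of Z doubles the range of index values, so Z trades search range (and index size) against the bits saved by differential coding.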
In one possible implementation, the index value calculating module is configured to determine, from the pitch period estimate of the primary channel signal, a closed-loop pitch period integer part loc_T0 of the secondary channel signal and a closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal, and to calculate the pitch period index value soft_reuse_index of the secondary channel signal as follows:
$$\mathrm{soft\_reuse\_index} = (N \times \mathrm{pitch\_soft\_reuse} + \mathrm{pitch\_frac\_soft\_reuse}) - (N \times \mathrm{loc\_T0} + \mathrm{loc\_frac\_prim}) + \mathrm{soft\_reuse\_index\_high\_limit}/M$$
where pitch_soft_reuse represents the integer part of the pitch period estimate of the secondary channel signal, pitch_frac_soft_reuse represents the fractional part of the pitch period estimate of the secondary channel signal, soft_reuse_index_high_limit represents the pitch period index value upper limit of the secondary channel signal, N represents the number of subframes into which the secondary channel signal is divided, M represents the adjustment factor of the pitch period index value upper limit of the secondary channel signal and is a non-zero real number, × denotes multiplication, + denotes addition, and - denotes subtraction.
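A minimal sketch of this encoder-side index computation follows. Function and parameter names mirror the symbols in the text but are hypothetical, and the example values (N = 4, upper limit 8.5, M = 2) are illustrative assumptions.

```python
def encode_secondary_pitch_index(pitch_soft_reuse: int, pitch_frac_soft_reuse: int,
                                 loc_t0: int, loc_frac_prim: int,
                                 high_limit: float, n: int, m: float) -> float:
    """soft_reuse_index = (N*pitch + frac) - (N*loc_T0 + loc_frac) + high_limit / M."""
    if m == 0:
        raise ValueError("M must be a non-zero real number")
    # Express both pitch values in units of 1/N sample.
    secondary_units = n * pitch_soft_reuse + pitch_frac_soft_reuse
    reference_units = n * loc_t0 + loc_frac_prim
    # Shift the signed difference by high_limit / M so that it is non-negative
    # for deviations within the allowed range.
    return secondary_units - reference_units + high_limit / m
```

For a reference of 60.25 (loc_T0 = 60, loc_frac_prim = 1) and a secondary estimate of 60.75 (pitch_soft_reuse = 60, pitch_frac_soft_reuse = 3), the index is 243 - 241 + 4.25 = 6.25. Substituting it into the decoder formula gives 60.25 + (6.25 - 4.25) / 4 = 60.75, so the offset high_limit/M cancels between encoder and decoder and the secondary estimate is recovered exactly.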
In a possible implementation, the stereo encoding apparatus is applied to stereo encoding scenarios in which the encoding rate of the current frame is lower than a preset rate threshold, where the rate threshold is at least one of the following values: 13.2 kilobits per second (kbps), 16.4 kbps, or 24.4 kbps.
In the third aspect of the present application, the constituent modules of the stereo encoding apparatus may further perform the steps described in the foregoing first aspect and its possible implementations; for details, see the foregoing description of the first aspect and its possible implementations.
In a fourth aspect, an embodiment of the present application further provides a stereo decoding apparatus, including: a determining module, configured to determine, according to the received stereo coded code stream, whether to perform differential decoding on the pitch period of the secondary channel signal; a value obtaining module, configured to obtain, when it is determined to perform differential decoding on the pitch period of the secondary channel signal, the pitch period estimate of the primary channel of the current frame and the pitch period index value of the secondary channel of the current frame from the stereo coded code stream; and a differential decoding module, configured to differentially decode the pitch period of the secondary channel signal according to the pitch period estimate of the primary channel and the pitch period index value of the secondary channel to obtain the pitch period estimate of the secondary channel signal, where the pitch period estimate of the secondary channel signal is used to decode the stereo coded code stream.
In a possible implementation manner, the determining module is configured to obtain a secondary channel pitch period differential coding identifier from the current frame; and when the secondary channel pitch period differential coding identifier is a preset first value, determining to perform differential decoding on the pitch period of the secondary channel signal.
In one possible implementation, the stereo decoding apparatus further includes an independent decoding module, configured to decode the pitch period of the secondary channel signal from the stereo coded code stream when it is determined that the pitch period of the secondary channel signal is not differentially decoded and the pitch period estimate of the primary channel signal is not multiplexed as the pitch period of the secondary channel signal.
In one possible implementation manner, the stereo decoding apparatus further includes: a pitch period multiplexing module, wherein the pitch period multiplexing module is configured to use the pitch period estimation value of the primary channel signal as the pitch period of the secondary channel signal when it is determined that the pitch period of the secondary channel signal is not differentially decoded and the pitch period estimation value of the primary channel signal is multiplexed as the pitch period of the secondary channel signal.
In one possible implementation, the differential decoding module includes: a reference value determining submodule, configured to determine a closed-loop pitch period reference value of the secondary channel signal according to the pitch period estimate of the primary channel signal and the number of subframes into which the secondary channel signal of the current frame is divided; an index value upper limit determining submodule, configured to determine the pitch period index value upper limit of the secondary channel signal according to the pitch period search range adjustment factor of the secondary channel signal; and an estimate calculating submodule, configured to calculate the pitch period estimate of the secondary channel signal according to the closed-loop pitch period reference value of the secondary channel signal, the pitch period index value of the secondary channel, and the pitch period index value upper limit of the secondary channel signal.
In one possible implementation, the estimate calculating submodule is configured to calculate the pitch period estimate T0_pitch of the secondary channel signal as follows:
$$\mathrm{T0\_pitch} = \mathrm{f\_pitch\_prim} + \frac{\mathrm{soft\_reuse\_index} - \mathrm{soft\_reuse\_index\_high\_limit}/M}{N}$$
where f_pitch_prim represents the closed-loop pitch period reference value of the secondary channel signal, soft_reuse_index represents the pitch period index value of the secondary channel signal, soft_reuse_index_high_limit represents the pitch period index value upper limit of the secondary channel signal, N represents the number of subframes into which the secondary channel signal is divided, M represents the adjustment factor of the pitch period index value upper limit of the secondary channel signal and is a non-zero real number, / denotes division, + denotes addition, and - denotes subtraction.
In a possible implementation manner, the adjustment factor of the pitch period index upper limit of the secondary channel signal is 2 or 3.
In the fourth aspect of the present application, the constituent modules of the stereo decoding apparatus may further perform the steps described in the foregoing second aspect and its possible implementations; for details, see the foregoing description of the second aspect and its possible implementations.
In a fifth aspect, an embodiment of the present application provides a stereo processing apparatus. The stereo processing apparatus may be an entity such as a stereo encoding apparatus, a stereo decoding apparatus, or a chip, and includes a processor. Optionally, the stereo processing apparatus may further include a memory configured to store instructions; the processor is configured to execute the instructions in the memory so that the stereo processing apparatus performs the method of any implementation of the foregoing first or second aspect.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium, having stored therein instructions, which, when executed on a computer, cause the computer to perform the method of the first or second aspect.
In a seventh aspect, the present application provides a computer program product containing instructions, which when executed on a computer, causes the computer to perform the method of the first or second aspect.
In an eighth aspect, the present application provides a chip system comprising a processor for enabling a stereo encoding apparatus or a stereo decoding apparatus to perform the functions referred to in the above aspects, e.g. to transmit or process data and/or information referred to in the above methods. In one possible design, the chip system further includes a memory for storing program instructions and data necessary for the stereo encoding apparatus or the stereo decoding apparatus. The chip system may be formed by a chip, or may include a chip and other discrete devices.
Drawings
Fig. 1 is a schematic structural diagram of a stereo processing system according to an embodiment of the present application;
Fig. 2a is a schematic diagram of a stereo encoder and a stereo decoder applied to a terminal device according to an embodiment of the present application;
Fig. 2b is a schematic diagram of a stereo encoder applied to a wireless device or a core network device according to an embodiment of the present application;
Fig. 2c is a schematic diagram of a stereo decoder applied to a wireless device or a core network device according to an embodiment of the present application;
Fig. 3a is a schematic diagram of a multi-channel encoder and a multi-channel decoder applied to a terminal device according to an embodiment of the present application;
Fig. 3b is a schematic diagram of a multi-channel encoder applied to a wireless device or a core network device according to an embodiment of the present application;
Fig. 3c is a schematic diagram of a multi-channel decoder applied to a wireless device or a core network device according to an embodiment of the present application;
Fig. 4 is a schematic diagram of the interaction flow between a stereo encoding apparatus and a stereo decoding apparatus according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of stereo signal encoding according to an embodiment of the present application;
Fig. 6 is a flowchart of encoding the pitch period parameters of a primary channel signal and the pitch period parameters of a secondary channel signal according to an embodiment of the present application;
Fig. 7 is a diagram of pitch period quantization results obtained with independent coding and with differential coding;
Fig. 8 is a comparison of the number of bits allocated to the fixed codebook with the independent coding scheme and with the differential coding scheme;
Fig. 9 is a schematic diagram of a time-domain stereo coding method according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a stereo encoding apparatus according to an embodiment of the present application;
Fig. 11 is a schematic structural diagram of a stereo codec device according to an embodiment of the present application;
Fig. 12 is a schematic structural diagram of another stereo encoding apparatus according to an embodiment of the present application;
Fig. 13 is a schematic structural diagram of another stereo codec device according to an embodiment of the present application.
Detailed Description
The embodiments of the present application provide a stereo coding method, a stereo decoding method, and corresponding apparatuses, which can improve stereo encoding and decoding performance.
Embodiments of the present application are described below with reference to the accompanying drawings.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The technical solutions of the embodiments of the present application may be applied to various stereo processing systems. Fig. 1 is a schematic structural diagram of a stereo processing system according to an embodiment of the present application. The stereo processing system 100 may include a stereo encoding apparatus 101 and a stereo decoding apparatus 102. The stereo encoding apparatus 101 may generate a stereo coded code stream, which is transmitted to the stereo decoding apparatus 102 over an audio transmission channel. The stereo decoding apparatus 102 receives the stereo coded code stream, performs its stereo decoding function, and finally obtains a decoded stereo signal.
In the embodiment of the present application, the stereo encoding apparatus may be applied to various terminal devices that require audio communication, wireless devices that require transcoding, and core network devices, for example, the stereo encoding apparatus may be a stereo encoder of the above terminal devices or wireless devices or core network devices. Similarly, the stereo decoding apparatus can be applied to various terminal devices required for audio communication, wireless devices required for transcoding, and core network devices, for example, the stereo decoding apparatus can be a stereo decoder of the above terminal devices or wireless devices or core network devices.
Fig. 2a shows a stereo encoder and a stereo decoder according to an embodiment of the present application applied to terminal devices. Each terminal device may include a stereo encoder, a channel encoder, a stereo decoder, and a channel decoder. The channel encoder performs channel encoding on the stereo signal, and the channel decoder performs channel decoding on the stereo signal. For example, the first terminal device 20 may include a first stereo encoder 201, a first channel encoder 202, a first stereo decoder 203, and a first channel decoder 204. The second terminal device 21 may include a second stereo decoder 211, a second channel decoder 212, a second stereo encoder 213, and a second channel encoder 214. The first terminal device 20 is connected to a wireless or wired first network communication device 22, the first network communication device 22 is connected to a wireless or wired second network communication device 23 through a digital channel, and the second terminal device 21 is connected to the second network communication device 23. Wireless or wired network communication devices are generally referred to as signal transmission devices, such as communication base stations and data exchange devices.
In audio communication, the terminal device at the transmitting end performs stereo encoding on the captured stereo signal, then performs channel encoding, and transmits the result over a digital channel through a wireless network or a core network. The terminal device at the receiving end performs channel decoding on the received signal to obtain the stereo coded code stream, recovers the stereo signal through stereo decoding, and plays it back.
Fig. 2b shows the stereo encoder according to an embodiment of the present application applied to a wireless device or a core network device. The wireless device or core network device 25 includes a channel decoder 251, another audio decoder 252, a stereo encoder 253, and a channel encoder 254, where the other audio decoder 252 is an audio decoder other than a stereo decoder. In the wireless device or core network device 25, an incoming signal is first channel-decoded by the channel decoder 251, then audio-decoded by the other audio decoder 252 (not stereo decoding), then stereo-encoded by the stereo encoder 253, and finally channel-encoded by the channel encoder 254 before transmission.
Fig. 2c shows the stereo decoder according to an embodiment of the present application applied to a wireless device or a core network device. The wireless device or core network device 25 includes a channel decoder 251, a stereo decoder 255, another audio encoder 256, and a channel encoder 254, where the other audio encoder 256 is an audio encoder other than a stereo encoder. In the wireless device or core network device 25, an incoming signal is first channel-decoded by the channel decoder 251; the resulting stereo coded code stream is decoded by the stereo decoder 255, then audio-encoded by the other audio encoder 256 (not stereo encoding), and finally channel-encoded by the channel encoder 254 before transmission. When a wireless device or core network device performs transcoding, it must carry out the corresponding stereo encoding and decoding. Here, a wireless device is a radio-frequency-related device in the communication system, and a core network device is a device in the core network.
In some embodiments of the present application, the stereo encoding apparatus may be applied to various terminal devices, wireless devices and core network devices that require audio communication, for example, the stereo encoding apparatus may be a multi-channel encoder of the above terminal devices or wireless devices or core network devices. Similarly, the stereo decoding apparatus can be applied to various terminal devices required for audio communication, wireless devices required for transcoding, and core network devices, for example, the stereo decoding apparatus can be a multi-channel decoder of the above terminal devices or wireless devices or core network devices.
Fig. 3a shows a multi-channel encoder and a multi-channel decoder according to an embodiment of the present application applied to terminal devices. Each terminal device may include a multi-channel encoder, a channel encoder, a multi-channel decoder, and a channel decoder. The channel encoder performs channel encoding on the multi-channel signal, and the channel decoder performs channel decoding on the multi-channel signal. For example, the first terminal device 30 may include a first multi-channel encoder 301, a first channel encoder 302, a first multi-channel decoder 303, and a first channel decoder 304. The second terminal device 31 may include a second multi-channel decoder 311, a second channel decoder 312, a second multi-channel encoder 313, and a second channel encoder 314. The first terminal device 30 is connected to a wireless or wired first network communication device 32, the first network communication device 32 is connected to a wireless or wired second network communication device 33 through a digital channel, and the second terminal device 31 is connected to the second network communication device 33. Wireless or wired network communication devices are generally referred to as signal transmission devices, such as communication base stations and data exchange devices. In audio communication, the terminal device at the transmitting end performs multi-channel encoding on the captured multi-channel signal, then performs channel encoding, and transmits the result over a digital channel through a wireless network or a core network. The terminal device at the receiving end performs channel decoding on the received signal to obtain the multi-channel coded code stream, recovers the multi-channel signal through multi-channel decoding, and plays it back.
Fig. 3b shows the multi-channel encoder according to an embodiment of the present application applied to a wireless device or a core network device. The wireless device or core network device 35 includes a channel decoder 351, another audio decoder 352, a multi-channel encoder 353, and a channel encoder 354, which are similar to the corresponding components in Fig. 2b and are not described again here.
Fig. 3c shows the multi-channel decoder according to an embodiment of the present application applied to a wireless device or a core network device. The wireless device or core network device 35 includes a channel decoder 351, a multi-channel decoder 355, another audio encoder 356, and a channel encoder 354, which are similar to the corresponding components in Fig. 2c and are not described again here.
For example, performing multi-channel encoding on the acquired multi-channel signal may consist of performing dimension reduction processing on the multi-channel signal to obtain a stereo signal and then encoding that stereo signal; the decoding end decodes the multi-channel signal coded bitstream to obtain the stereo signal and restores the multi-channel signal by upmixing. Therefore, the embodiments of the present application can also be applied to multi-channel encoders and multi-channel decoders in terminal devices, wireless devices, and core network devices. In a wireless or core network device, if transcoding is required, the corresponding multi-channel encoding and decoding processing needs to be performed.
In the embodiments of the present application, pitch period coding is one of the more important elements of the stereo coding method. Because voiced sound is produced by quasi-periodic impulse excitation, its time-domain waveform exhibits significant periodicity, and the length of one period is referred to as the pitch period. The pitch period plays a very important role in producing high-quality voiced speech, because voiced speech is characterized as a quasi-periodic signal consisting of samples separated by the pitch period. In speech processing, the pitch period may be expressed as the number of samples contained in one period, which is referred to as the pitch lag (or pitch delay). The pitch delay is an important parameter of the adaptive codebook.
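To make the notion of a pitch lag in samples concrete, the sketch below converts a fundamental frequency into a lag T = fs / f0 and rounds it to the 1/3-sample resolution mentioned later in this section. The 12.8 kHz internal sampling rate and the function name `pitch_lag_samples` are assumptions for illustration; the application does not state a sampling rate here.

```python
from fractions import Fraction


def pitch_lag_samples(f0_hz: float, fs_hz: float = 12800.0) -> Fraction:
    """Return the pitch lag T = fs / f0 rounded to the nearest 1/3 sample.

    fs_hz = 12800 is an ASSUMED internal sampling rate, not taken from the text.
    """
    if f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    lag = fs_hz / f0_hz
    # Quantize to 1/3-sample resolution: count whole thirds, then build the fraction.
    thirds = round(lag * 3)
    return Fraction(thirds, 3)


print(pitch_lag_samples(200.0))  # 64     (12800 / 200 = 64 samples per period)
print(pitch_lag_samples(150.0))  # 256/3  (12800 / 150 = 85.33... samples)
```

A `Fraction` is used so that 1/3-sample lags are represented exactly rather than as rounded floats.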
Pitch period estimation is the process of estimating the pitch period, so its accuracy directly determines the correctness of the excitation signal and, in turn, the synthesis quality of the speech signal. At medium and low bit rates, the small number of bits available for representing the pitch period is one of the factors affecting speech coding quality. The pitch periods of the primary channel signal and the secondary channel signal are strongly similar; exploiting this similarity appropriately improves coding efficiency and is an important factor affecting overall stereo coding quality at medium and low rates.
In the embodiments of the present application, for parametric stereo coding performed in the frequency domain or in a combined time-frequency domain, there is a correlation between the pitch period of the primary channel signal and that of the secondary channel signal. For pitch period coding of the secondary channel signal, when the secondary channel pitch period multiplexing condition is met, the pitch period parameter of the secondary channel signal is predicted and differentially coded, so only a small number of bits needs to be allocated to quantizing and coding the secondary channel pitch period. In addition, although the pitch period of the secondary channel signal uses fewer bits, the accuracy of the secondary channel pitch period prediction is maintained, and the remaining bits are used for other stereo coding parameters, for example the fixed codebook, which improves the coding efficiency of the secondary channel and ultimately the overall stereo coding quality.
In the embodiments of the present application, the secondary channel pitch period is coded with a pitch period differential coding method designed for the secondary channel signal: the pitch period of the primary channel signal is used as the reference value, and the bits of the secondary channel are redistributed, thereby improving stereo coding quality. The stereo encoding method and stereo decoding method of the embodiments are described next, based on the system architecture, stereo encoding apparatus, and stereo decoding apparatus described above. Fig. 4 is an interaction flowchart between the stereo encoding apparatus and the stereo decoding apparatus in an embodiment of the present application. Steps 401 to 403 below may be performed by the stereo encoding apparatus (hereinafter the encoding end), and steps 411 to 413 may be performed by the stereo decoding apparatus (hereinafter the decoding end). The process mainly includes the following:
401. Perform downmix processing on the left channel signal and the right channel signal of the current frame to obtain the primary channel signal and the secondary channel signal of the current frame.
In this embodiment, the current frame is the stereo signal frame currently being encoded at the encoding end. The left channel signal and the right channel signal of the current frame are first obtained, and the primary and secondary channel signals of the current frame are obtained by downmixing them. Stereo codecs can be implemented in many different ways. For example, the encoding end downmixes the time-domain signal into two mono signals: with L denoting the left channel signal and R the right channel signal, the primary channel signal may be 0.5 × (L + R), which characterizes the correlated information of the two channels, and the secondary channel signal may be 0.5 × (L − R), which characterizes the difference between the two channels.
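The sum/difference downmix in the example above can be sketched per sample as follows, with M = 0.5 × (L + R) and S = 0.5 × (L − R). This is a minimal time-domain illustration using the fixed 0.5 weights of this example only; the function names `downmix` and `upmix` are hypothetical, and the frequency-domain and time-domain downmix processes described in later embodiments may differ.

```python
from typing import List, Tuple


def downmix(left: List[float], right: List[float]) -> Tuple[List[float], List[float]]:
    """Sum/difference downmix: primary M = 0.5*(L+R), secondary S = 0.5*(L-R)."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    primary = [0.5 * (l + r) for l, r in zip(left, right)]
    secondary = [0.5 * (l - r) for l, r in zip(left, right)]
    return primary, secondary


def upmix(primary: List[float], secondary: List[float]) -> Tuple[List[float], List[float]]:
    """Inverse of the downmix above: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(primary, secondary)]
    right = [m - s for m, s in zip(primary, secondary)]
    return left, right


m, s = downmix([1.0, 0.5, -0.25], [1.0, -0.5, 0.25])
print(m)  # [1.0, 0.0, 0.0]
print(s)  # [0.0, 0.5, -0.25]
```

Identical channels put all energy in the primary signal, while anti-phase content lands in the secondary signal, which matches the "correlated information" versus "difference information" roles described above.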
It should be noted that the following embodiments will describe the downmix process in the frequency domain stereo coding and the downmix process in the time domain stereo coding in detail.
In some embodiments of the present application, the stereo encoding method performed by the encoding end may be applied to stereo encoding scenarios in which the coding rate of the current frame is lower than a preset rate threshold, and the stereo decoding method performed by the decoding end may be applied to stereo decoding scenarios in which the decoding rate of the current frame is lower than a preset rate threshold. The coding rate of the current frame is the coding rate used for the stereo signal of the current frame, and the rate threshold is a rate value preset for the stereo signal. The stereo encoding method provided in the embodiments of the present application can be performed when the coding rate of the current frame is lower than the preset rate threshold, and the stereo decoding method can be performed when the decoding rate of the current frame is lower than the preset rate threshold.
Further, in some embodiments of the present application, the rate threshold is at least one of the following values: 13.2 kilobits per second (kbps), 16.4 kbps, or 24.4 kbps.
The rate threshold may be 13.2 kbps or lower; it may also be, for example, 16.4 kbps or 24.4 kbps, and its specific value may be determined according to the application scenario. When the coding rate is low (for example, 24.4 kbps or lower), the pitch period of the secondary channel is not coded independently; instead, the pitch period estimate of the primary channel signal is used as a reference value and the secondary channel pitch period is coded with a differential coding method, which improves stereo coding quality.
402. Determine whether to differentially encode the pitch period of the secondary channel signal.
In the embodiments of the present application, after the primary and secondary channel signals of the current frame are obtained, whether the pitch period of the secondary channel signal can be differentially encoded may be determined from these two signals. For example, the decision may be based on the signal characteristics of the primary and secondary channel signals of the current frame, or it may be made using the primary channel signal, the secondary channel signal, and a preset decision condition. There are various ways to make this decision from the primary and secondary channel signals; details are given in the following embodiments.
In the embodiments of the present application, determining in step 402 whether to differentially encode the pitch period of the secondary channel signal includes:
encoding the primary channel signal of the current frame to obtain a pitch period estimate of the primary channel signal;
performing open-loop pitch period analysis on the secondary channel signal of the current frame to obtain an open-loop pitch period estimate of the secondary channel signal;
determining whether the difference between the pitch period estimate of the primary channel signal and the open-loop pitch period estimate of the secondary channel signal exceeds a preset secondary channel pitch period differential coding threshold; and
when the difference exceeds the secondary channel pitch period differential coding threshold, determining to differentially encode the pitch period of the secondary channel signal; or
when the difference does not exceed the secondary channel pitch period differential coding threshold, determining not to differentially encode the pitch period of the secondary channel signal.
In this embodiment of the application, after the primary channel signal of the current frame is obtained in step 401, it may be encoded to obtain the pitch period estimate of the primary channel signal. Specifically, in primary channel coding, pitch period estimation combines open-loop pitch analysis with closed-loop pitch search to improve estimation accuracy. The pitch period of a speech signal can be estimated in various ways, for example with the autocorrelation function or the short-time average magnitude difference. Here, the pitch period estimation algorithm is based on the autocorrelation function, which peaks at integer multiples of the pitch period; pitch estimation exploits this property. To improve the accuracy of pitch prediction and better approximate the actual pitch period of speech, pitch detection uses fractional delays with 1/3-sample resolution. To reduce the computational load, pitch estimation consists of two steps: open-loop pitch analysis and closed-loop pitch search. Open-loop pitch analysis, performed once per frame, roughly estimates the integer delay of a speech frame to obtain a candidate integer delay; it computes the autocorrelation, normalizes it, and selects the optimal open-loop integer delay. Closed-loop pitch search, performed once per subframe, then refines the pitch delay around this candidate. The pitch period estimate of the primary channel signal is obtained through this process.
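The open-loop step described above (autocorrelation, normalization, selection of the best integer delay) can be sketched as below. This is a simplified illustration only: it omits the perceptual weighting, decimation, and the per-subframe closed-loop search with 1/3-sample refinement, and the function name `open_loop_pitch` and the lag range are assumptions rather than values from this application.

```python
import math
from typing import Sequence


def open_loop_pitch(x: Sequence[float], min_lag: int, max_lag: int) -> int:
    """Return the integer lag in [min_lag, max_lag] maximizing R(k) / sqrt(E(k)).

    R(k) = sum_n x[n] * x[n-k] and E(k) = sum_n x[n-k]**2, for n = k .. len(x)-1.
    Integer lags only; fractional (1/3-sample) refinement is not shown.
    """
    if not 0 < min_lag <= max_lag < len(x):
        raise ValueError("need 0 < min_lag <= max_lag < len(x)")
    best_lag, best_score = min_lag, -math.inf
    for k in range(min_lag, max_lag + 1):
        corr = sum(x[n] * x[n - k] for n in range(k, len(x)))
        energy = sum(x[n - k] ** 2 for n in range(k, len(x)))
        score = corr / math.sqrt(energy) if energy > 0.0 else -math.inf
        if score > best_score:  # strict '>' keeps the shortest lag on ties
            best_lag, best_score = k, score
    return best_lag


# A sine with a 40-sample period should yield an open-loop lag of 40.
frame = [math.sin(2 * math.pi * n / 40) for n in range(320)]
print(open_loop_pitch(frame, 20, 143))  # 40
```

Because the correlation window shrinks as the lag grows, the score at a multiple of the true period (for example 80) is lower than at the period itself, which keeps this simple search from picking pitch multiples on a clean periodic signal.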
After the secondary channel signal of the current frame is obtained, open-loop pitch period analysis may be performed on it to obtain the open-loop pitch period estimate of the secondary channel signal.
In the embodiments of the present application, after the pitch period estimate of the primary channel signal and the open-loop pitch period estimate of the secondary channel signal are obtained, the difference between them may be calculated and compared with a preset secondary channel pitch period differential coding threshold. This threshold can be preset and flexibly configured for the stereo coding scenario. Differential encoding is chosen when the difference exceeds the threshold, and is not chosen when the difference does not exceed it.
It should be noted that, in the embodiments of the present application, the decision on whether to differentially encode the pitch period of the secondary channel signal is not limited to comparing the difference with the secondary channel pitch period differential coding threshold. For example, it may instead be determined whether the difference divided by the threshold is less than 1. As another example, the pitch period estimate of the primary channel signal may be divided by the open-loop pitch period estimate of the secondary channel signal, and the quotient compared with the threshold. In addition, the specific value of the threshold can be chosen for the application scenario and is not limited here.
For example, in secondary channel coding, the secondary channel pitch differential coding decision is made from the pitch period estimate of the primary channel signal and the open-loop pitch period estimate of the secondary channel signal. One usable decision condition is $\mathrm{DIFF} = \left| \sum \mathrm{pitch}[0] - \sum \mathrm{pitch}[1] \right|$.
Here, DIFF is the difference between the pitch period estimate of the primary channel signal and the open-loop pitch period estimate of the secondary channel signal; $\left| \sum \mathrm{pitch}[0] - \sum \mathrm{pitch}[1] \right|$ denotes the absolute value of the difference between $\sum \mathrm{pitch}[0]$ and $\sum \mathrm{pitch}[1]$; $\sum \mathrm{pitch}[0]$ denotes the pitch period estimate of the primary channel signal; and $\sum \mathrm{pitch}[1]$ denotes the open-loop pitch period estimate of the secondary channel signal.
The decision conditions usable in the embodiments of the present application are not limited to the above formula. For example, after $\left| \sum \mathrm{pitch}[0] - \sum \mathrm{pitch}[1] \right|$ is calculated, a correction factor may be set, and the product of this factor and $\left| \sum \mathrm{pitch}[0] - \sum \mathrm{pitch}[1] \right|$ used as the final DIFF. As another example, a conditional threshold constant may be added to or subtracted from the right-hand side of $\mathrm{DIFF} = \left| \sum \mathrm{pitch}[0] - \sum \mathrm{pitch}[1] \right|$ to obtain the final DIFF.
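The DIFF decision quantity and the two variants just described (a multiplicative correction factor and an added or subtracted constant) can be sketched as follows. The application does not say what Σ runs over; this sketch assumes that pitch[0] and pitch[1] are lists of per-subframe pitch values of equal length for the primary and secondary channels. The function name `pitch_diff` and the parameters `correction` and `offset` are hypothetical.

```python
from typing import Sequence


def pitch_diff(primary_pitch: Sequence[float],
               secondary_ol_pitch: Sequence[float],
               correction: float = 1.0,
               offset: float = 0.0) -> float:
    """DIFF = correction * |sum(pitch[0]) - sum(pitch[1])| + offset.

    ASSUMPTION: both inputs hold per-subframe pitch values of equal length.
    correction=1.0 and offset=0.0 give the basic formula; a negative offset
    corresponds to subtracting the conditional threshold constant.
    """
    if len(primary_pitch) != len(secondary_ol_pitch):
        raise ValueError("pitch lists must have the same length")
    return correction * abs(sum(primary_pitch) - sum(secondary_ol_pitch)) + offset


pri = [60, 61, 61, 62]   # hypothetical primary-channel pitch estimates (sum 244)
sec = [59, 60, 60, 61]   # hypothetical secondary open-loop estimates (sum 240)
print(pitch_diff(pri, sec))                  # 4.0
print(pitch_diff(pri, sec, correction=0.5))  # 2.0
print(pitch_diff(pri, sec, offset=-1.0))     # 3.0
```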
In the embodiments of the present application, once it has been determined whether to differentially encode the pitch period of the secondary channel signal, this result determines whether step 403 is performed: when differential encoding is chosen, step 403 is triggered.
In some embodiments of the present application, after it is determined in step 402 whether to differentially encode the pitch period of the secondary channel signal, the method provided in the embodiments of the present application further includes:
when it is determined to differentially encode the pitch period of the secondary channel signal, setting the secondary channel pitch period differential coding flag of the current frame to a preset first value, where the stereo coded bitstream carries this flag and the first value indicates that the pitch period of the secondary channel signal is differentially encoded.
The encoding end obtains the secondary channel pitch period differential coding flag, whose value is configured according to whether the pitch period of the secondary channel signal is differentially encoded; the flag indicates whether differential encoding is applied to that pitch period.
In this embodiment, the secondary channel pitch period differential coding flag may take several values, for example a preset first value or a second value. As an example of how the flag is configured: when it is determined to differentially encode the pitch period of the secondary channel signal, the flag is set to the first value. When the flag indicates the first value, the decoding end can determine that the pitch period of the secondary channel signal is to be differentially decoded. For example, the flag may take the value 0 or 1, with the first value being 1 and the second value being 0.
For example, the secondary channel pitch differential coding flag is denoted Pitch_reuse_flag, and DIFF_THR is the preset secondary channel pitch period differential coding threshold, which takes one of the values {1, 3, 6} depending on the coding rate. When DIFF > DIFF_THR, Pitch_reuse_flag = 1, indicating that the current frame uses differential coding for the pitch period of the secondary channel signal. When DIFF ≤ DIFF_THR, Pitch_reuse_flag = 0, indicating that pitch period differential coding is not performed and the secondary channel signal is coded independently.
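The Pitch_reuse_flag decision can be sketched as a threshold comparison. The application states only that DIFF_THR is one of {1, 3, 6} depending on the coding rate; the specific rate-to-threshold table below is a hypothetical assignment chosen for illustration, as is the function name `pitch_reuse_flag`.

```python
# HYPOTHETICAL mapping from coding rate (bit/s) to DIFF_THR. The text says only
# that DIFF_THR takes a value in {1, 3, 6} according to the coding rate.
DIFF_THR_BY_RATE = {13200: 1, 16400: 3, 24400: 6}


def pitch_reuse_flag(diff: float, rate_bps: int) -> int:
    """Return Pitch_reuse_flag: 1 if DIFF > DIFF_THR (differential coding), else 0."""
    try:
        diff_thr = DIFF_THR_BY_RATE[rate_bps]
    except KeyError:
        raise ValueError(f"no DIFF_THR configured for {rate_bps} bit/s") from None
    return 1 if diff > diff_thr else 0


print(pitch_reuse_flag(4.0, 13200))  # 1: DIFF exceeds the threshold
print(pitch_reuse_flag(3.0, 16400))  # 0: DIFF equal to the threshold is not "exceeding"
```

Note the strict comparison: DIFF equal to DIFF_THR yields 0, matching the "DIFF ≤ DIFF_THR" branch above.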
In some embodiments of the present application, after it is determined in step 402 whether to differentially encode the pitch period of the secondary channel signal, the method provided in the embodiments of the present application further includes:
when it is determined not to differentially encode the pitch period of the secondary channel signal and not to multiplex the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal, encoding the pitch period of the secondary channel signal and the pitch period of the primary channel signal separately.
In this case, the pitch period of the secondary channel signal can be coded with the secondary channel pitch period independent coding method.
In some embodiments of the present application, after it is determined in step 402 whether to differentially encode the pitch period of the secondary channel signal, the method provided in the embodiments of the present application further includes:
when it is determined not to differentially encode the pitch period of the secondary channel signal and to multiplex the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal, setting the secondary channel signal pitch period multiplexing flag to a preset fourth value and carrying this flag in the stereo coded bitstream, where the fourth value indicates that the pitch period estimate of the primary channel signal is multiplexed as the pitch period of the secondary channel signal.
When the pitch period of the secondary channel signal is not differentially encoded, the embodiments of the present application may also use a pitch period multiplexing (reuse) method. That is, the encoding end does not encode the secondary channel pitch period; instead, it carries a secondary channel signal pitch period multiplexing flag in the stereo coded bitstream, indicating whether the pitch period of the secondary channel signal reuses the pitch period estimate of the primary channel signal. When the flag indicates reuse, the decoding end, according to this flag, decodes the pitch period of the primary channel signal and uses it as the pitch period of the secondary channel signal.
In some embodiments of the present application, after it is determined in step 402 whether to differentially encode the pitch period of the secondary channel signal, the method provided in the embodiments of the present application further includes:
when it is determined not to differentially encode the pitch period of the secondary channel signal, setting the secondary channel pitch period differential coding flag to a preset second value, where the stereo coded bitstream carries this flag and the second value indicates that the pitch period of the secondary channel signal is not differentially encoded;
when it is determined not to multiplex the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal, setting the secondary channel signal pitch period multiplexing flag to a preset third value, where the stereo coded bitstream carries this flag and the third value indicates that the pitch period estimate of the primary channel signal is not multiplexed as the pitch period of the secondary channel signal; and
the pitch period of the secondary channel signal and the pitch period of the primary channel signal are encoded separately.
The secondary channel pitch period differential coding flag may take several values, for example a preset first value or a second value. As an example of how the flag is configured: when it is determined not to differentially encode the pitch period of the secondary channel signal, the flag is set to the second value. When the flag indicates the second value, the decoding end determines not to differentially decode the pitch period of the secondary channel signal. For example, the flag may take the value 0 or 1, with the first value being 1 and the second value being 0.
The secondary channel pitch period multiplexing flag may take several values, for example a preset fourth value or a third value. As an example of how this flag is configured: when it is determined not to multiplex the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal, the flag is set to the third value. When the flag indicates the third value, the decoding end determines that the pitch period estimate of the primary channel signal is not multiplexed as the pitch period of the secondary channel signal. For example, the flag may take the value 0 or 1, with the fourth value being 1 and the third value being 0. When the encoding end determines neither to differentially encode the pitch period of the secondary channel signal nor to multiplex the pitch period estimate of the primary channel signal as that pitch period, it may use independent coding, that is, encode the pitch period of the secondary channel signal and the pitch period of the primary channel signal separately.
It should be noted that, in the embodiments of the present application, when it is determined not to differentially encode the pitch period of the secondary channel signal, that pitch period may be coded with the secondary channel pitch period independent coding method; alternatively, the pitch period multiplexing method may be used. The stereo encoding method performed by the encoding end can be applied to scenarios in which the coding rate of the current frame is lower than the preset rate threshold. In such scenarios, if differential coding of the secondary channel pitch period is not used, the secondary channel pitch period can instead be multiplexed: the encoding end does not encode the secondary channel pitch period but carries the secondary channel signal pitch period multiplexing flag in the stereo coded bitstream, indicating whether the pitch period of the secondary channel signal reuses the pitch period estimate of the primary channel signal. When the flag indicates reuse, the decoding end, according to this flag, decodes the pitch period of the primary channel signal and uses it as the pitch period of the secondary channel signal.
In some embodiments of the present application, after it is determined in step 402 whether to differentially encode the pitch period of the secondary channel signal, the method provided in the embodiments of the present application further includes:
when it is determined not to differentially encode the pitch period of the secondary channel signal, setting the secondary channel pitch period differential coding flag to a preset second value, where the stereo coded bitstream carries this flag and the second value indicates that the pitch period of the secondary channel signal is not differentially encoded; and
when it is determined to multiplex the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal, setting the secondary channel signal pitch period multiplexing flag to a preset fourth value, where the stereo coded bitstream carries this flag and the fourth value indicates that the pitch period estimate of the primary channel signal is multiplexed as the pitch period of the secondary channel signal.
As in the preceding embodiment, when it is determined not to differentially encode the pitch period of the secondary channel signal, the secondary channel pitch period differential coding flag is set to the second value (for example, 0, with the first value being 1), so that the decoding end determines not to differentially decode the pitch period of the secondary channel signal.
The secondary channel pitch period multiplexing flag may take several values, for example a preset fourth value or a third value. When the encoding end determines not to differentially encode the pitch period of the secondary channel signal and to multiplex the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal, it sets the secondary channel signal pitch period multiplexing flag to the fourth value. When the flag indicates the fourth value, the decoding end determines that the pitch period estimate of the primary channel signal is multiplexed as the pitch period of the secondary channel signal. For example, the flag may take the value 0 or 1, with the fourth value being 1 and the third value being 0.
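The embodiments above use two flags to signal three secondary channel pitch modes: differential coding, reuse of the primary pitch estimate, and independent coding. The sketch below maps a mode to the flag pair at the encoder and back at the decoder, using the example values given in the text (first = 1, second = 0, fourth = 1, third = 0). It assumes, as an illustration only, that the multiplexing flag is not written when the differential coding flag takes the first value; the enum and function names are hypothetical.

```python
from enum import Enum
from typing import Optional, Tuple

FIRST, SECOND = 1, 0   # differential coding flag: 1 = differential, 0 = not
FOURTH, THIRD = 1, 0   # multiplexing flag: 1 = reuse primary pitch, 0 = do not


class SecondaryPitchMode(Enum):
    DIFFERENTIAL = "differential"  # code secondary pitch relative to primary estimate
    REUSE = "reuse"                # secondary reuses the primary pitch estimate
    INDEPENDENT = "independent"    # code the two pitch periods separately


def encode_flags(mode: SecondaryPitchMode) -> Tuple[int, Optional[int]]:
    """Return (diff_flag, reuse_flag); reuse_flag is None when (ASSUMED) not written."""
    if mode is SecondaryPitchMode.DIFFERENTIAL:
        return FIRST, None
    if mode is SecondaryPitchMode.REUSE:
        return SECOND, FOURTH
    return SECOND, THIRD


def decode_mode(diff_flag: int, reuse_flag: Optional[int]) -> SecondaryPitchMode:
    """Recover the secondary channel pitch mode from the two flags."""
    if diff_flag == FIRST:
        return SecondaryPitchMode.DIFFERENTIAL
    if reuse_flag == FOURTH:
        return SecondaryPitchMode.REUSE
    return SecondaryPitchMode.INDEPENDENT


for m in SecondaryPitchMode:
    print(m.name, encode_flags(m), decode_mode(*encode_flags(m)).name)
```

Reading the differential coding flag first lets the decoder skip the multiplexing flag entirely in the differential case under the stated assumption.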
403. When it is determined to differentially encode the pitch period of the secondary channel signal, differentially encode that pitch period using the pitch period estimate of the primary channel signal to obtain a pitch period index value of the secondary channel signal, where the index value is used to generate the stereo coded bitstream to be transmitted.
In the embodiments of the present application, when it is determined that the pitch period of the secondary channel signal can be differentially encoded, it may be differentially encoded using the pitch period estimate of the primary channel signal. Because this differential encoding uses the primary channel pitch period estimate as its reference, it exploits the pitch similarity between the primary and secondary channel signals, so the secondary channel pitch period is encoded accurately; the secondary channel signal can then be decoded more accurately using this pitch period, which improves the spatial perception and sound image stability of the stereo signal. In addition, compared with independently encoding the pitch period of the secondary channel signal, differential encoding reduces the bit overhead of the secondary channel pitch period. The saved bits are allocated to other stereo coding parameters, so accurate secondary channel pitch period coding is achieved and overall stereo coding quality is improved.
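The concrete differential coding steps of this application follow this section and are not reproduced here. Purely as a generic illustration of why a differential code needs few bits, the sketch below codes the secondary pitch as a clamped signed offset from the primary pitch estimate. The assumptions are explicit: pitch values are integers in units of 1/3 sample, the index is 4 bits wide, and out-of-range offsets are clamped. None of these choices are taken from the application, and the function names are hypothetical.

```python
def diff_encode(sec_thirds: int, pri_thirds: int, bits: int = 4) -> int:
    """Code the secondary pitch as an unsigned index relative to the primary pitch.

    ASSUMPTIONS: pitch values are integers in units of 1/3 sample; the offset
    (secondary - primary) is clamped to the signed range [-2**(bits-1), 2**(bits-1)-1].
    """
    half = 2 ** (bits - 1)
    offset = max(-half, min(half - 1, sec_thirds - pri_thirds))  # clamp offset
    return offset + half  # shift to an unsigned index in [0, 2**bits - 1]


def diff_decode(index: int, pri_thirds: int, bits: int = 4) -> int:
    """Rebuild the secondary pitch (in 1/3-sample units) from index and reference."""
    half = 2 ** (bits - 1)
    return pri_thirds + index - half


# Primary pitch 60 samples = 180 thirds; secondary pitch 61 samples = 183 thirds.
idx = diff_encode(183, 180)
print(idx, diff_decode(idx, 180))  # 11 183
```

With these assumptions, a 4-bit index covers offsets of about ±2.67 samples around the primary estimate, whereas coding the full pitch range independently would need more bits.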
In this embodiment of the present application, after the primary channel signal of the current frame is obtained in step 401, the primary channel signal may be encoded to obtain the pitch period estimate of the primary channel signal. Specifically, in primary channel coding, pitch period estimation combines open-loop pitch analysis with closed-loop pitch search to improve estimation accuracy. The pitch period of a speech signal can be estimated in several ways, for example by using the autocorrelation function or the short-term average magnitude difference. Here, the pitch period estimation algorithm is based on the autocorrelation function, which has peaks at integer multiples of the pitch period; this property is used to estimate the pitch period. To improve the accuracy of pitch prediction and better approximate the actual pitch period of speech, pitch period detection uses fractional delays with 1/3-sample resolution. To reduce computation, pitch period estimation is split into two steps: open-loop pitch analysis and closed-loop pitch search. Open-loop pitch analysis, performed once per frame, roughly estimates the integer delay of the frame to obtain a candidate integer delay; it computes the autocorrelation, normalizes it, and selects the optimal open-loop integer delay. Closed-loop pitch search, performed once per subframe, then refines the pitch delay around this candidate. The pitch period estimate of the primary channel signal is obtained through this process.
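To illustrate the autocorrelation property described above, the following sketch picks the integer lag that maximizes a normalized autocorrelation over a lag range. It stands in only for the open-loop step and rests on stated assumptions: the function name and lag bounds are hypothetical, dividing by the square root of the delayed-segment energy is one common normalization choice, and the weighting, decimation, and multi-stage candidate selection used in practical codecs are omitted.

```python
import math
from typing import Sequence


def open_loop_pitch(frame: Sequence[float], min_lag: int, max_lag: int) -> int:
    """Return the integer lag with the highest normalized autocorrelation."""
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Correlation between the frame and a copy delayed by `lag` samples.
        corr = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        # Energy of the delayed segment, used to normalize the correlation.
        energy = sum(frame[n - lag] ** 2 for n in range(lag, len(frame)))
        if energy <= 0.0:
            continue
        score = corr / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

For a 320-sample pulse train with a 50-sample period and a lag range of 20 to 120, this returns 50: lag 100 also produces a peak (an integer multiple of the period), but fewer pulse pairs overlap inside the frame, so its normalized score is lower.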
Next, the specific differential coding process in this embodiment of the present application is described. Specifically, in step 403, differentially encoding the pitch period of the secondary channel signal by using the pitch period estimate of the primary channel signal to obtain the pitch period index value of the secondary channel signal includes:
performing closed-loop pitch period search of a secondary channel according to the pitch period estimated value of the primary channel signal to obtain the pitch period estimated value of the secondary channel signal;
determining the upper limit of the pitch period index value of the secondary channel signal according to the pitch period search range adjustment factor of the secondary channel signal;
and calculating the pitch period index value of the secondary channel signal according to the pitch period estimate of the primary channel signal, the pitch period estimate of the secondary channel signal, and the upper limit of the pitch period index value of the secondary channel signal.
The coding end first performs a closed-loop pitch period search of the secondary channel according to the pitch period estimate of the primary channel signal to determine the pitch period estimate of the secondary channel signal. The closed-loop pitch period search procedure is described in detail below. In some embodiments of the present application, performing the closed-loop pitch period search of the secondary channel according to the pitch period estimate of the primary channel signal to obtain the pitch period estimate of the secondary channel signal includes:
determining a closed-loop pitch period reference value of a secondary channel signal according to a pitch period estimated value of a primary channel signal and the number of divided subframes of the secondary channel signal of a current frame;
and using the closed-loop pitch period reference value of the secondary channel signal as a starting point of the closed-loop pitch period search of the secondary channel signal, and performing the closed-loop pitch period search by adopting integer precision and fractional precision to obtain a pitch period estimated value of the secondary channel signal.
The number of subframes into which the secondary channel signal of the current frame is divided is determined by the subframe configuration of the secondary channel signal; for example, it may be 4 or 3, depending on the application scenario. After the pitch period estimate of the primary channel signal is obtained, the closed-loop pitch period reference value of the secondary channel signal may be calculated from the pitch period estimate of the primary channel signal and the number of subframes of the secondary channel signal. This reference value represents the closed-loop pitch period of the secondary channel signal as predicted from the pitch period estimate of the primary channel signal. One method uses the pitch period of the primary channel signal directly as the closed-loop pitch period reference value of the secondary channel signal; that is, 4 of the pitch periods in the 5 subframes of the primary channel signal are selected as the closed-loop pitch period reference values of the 4 subframes of the secondary channel signal. Another method uses interpolation to map the pitch periods in the 5 subframes of the primary channel signal to the closed-loop pitch period reference values of the 4 subframes of the secondary channel signal.
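The interpolation option can be sketched as below. The text does not specify the interpolation method. This sketch assumes linear interpolation with each pitch value placed at the centre of its subframe on a normalized time axis, and the function name `map_subframe_pitches` is hypothetical. The selection option would instead pick 4 of the 5 primary values; which 4 are chosen is not stated.

```python
from typing import List, Sequence


def map_subframe_pitches(primary: Sequence[float], n_secondary: int) -> List[float]:
    """Linearly interpolate per-subframe pitch values onto another subframe grid."""
    n_primary = len(primary)
    # Centre of each primary subframe on a normalized [0, 1) time axis.
    centres = [(i + 0.5) / n_primary for i in range(n_primary)]
    mapped = []
    for j in range(n_secondary):
        t = (j + 0.5) / n_secondary  # centre of the j-th secondary subframe
        if t <= centres[0]:
            mapped.append(float(primary[0]))       # clamp before the first centre
        elif t >= centres[-1]:
            mapped.append(float(primary[-1]))      # clamp after the last centre
        else:
            i = int(t * n_primary - 0.5)           # left bracketing primary subframe
            w = (t - centres[i]) * n_primary       # interpolation weight in [0, 1)
            mapped.append((1.0 - w) * primary[i] + w * primary[i + 1])
    return mapped
```

With primary subframe pitches 50, 52, 54, 56, 58, the four secondary reference values come out as 50.25, 52.75, 55.25, and 57.75.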
Specifically, the closed-loop pitch period reference value of the secondary channel signal is used as the starting point of the closed-loop pitch period search of the secondary channel signal. The search is performed with integer precision and down-sampled fractional precision, and finally the interpolated normalized correlation is calculated to obtain the pitch period estimate of the secondary channel signal. The calculation of the pitch period estimate of the secondary channel signal is described in detail in the following embodiments.
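A simplified two-stage search around the reference value is sketched below: integer precision first, then fractional precision around the best integer lag. It is only an illustration under explicit assumptions. The signal is treated as its own past excitation, whereas a real closed-loop search correlates a weighted target with filtered past excitation. Fractional delays use linear interpolation instead of a codec interpolation filter. The names `closed_loop_search`, `half_range`, and `resolution` are hypothetical.

```python
import math
from typing import Sequence


def _delayed(signal: Sequence[float], n: int, lag: float) -> float:
    """signal[n - lag] for a fractional lag, by linear interpolation."""
    whole = math.floor(lag)
    frac = lag - whole
    return (1.0 - frac) * signal[n - whole] + frac * signal[n - whole - 1]


def _score(signal: Sequence[float], start: int, length: int, lag: float) -> float:
    """Normalized correlation between the target segment and its delayed copy."""
    target = signal[start:start + length]
    past = [_delayed(signal, n, lag) for n in range(start, start + length)]
    energy = sum(p * p for p in past)
    if energy <= 0.0:
        return -math.inf
    return sum(t * p for t, p in zip(target, past)) / math.sqrt(energy)


def closed_loop_search(signal: Sequence[float], start: int, length: int,
                       reference: float, half_range: int = 2,
                       resolution: int = 4) -> float:
    centre = round(reference)
    # Stage 1: integer precision around the reference value.
    best_int = max(range(centre - half_range, centre + half_range + 1),
                   key=lambda lag: _score(signal, start, length, lag))
    # Stage 2: 1/resolution precision around the best integer lag.
    candidates = [best_int + k / resolution for k in range(-resolution + 1, resolution)]
    return max(candidates, key=lambda lag: _score(signal, start, length, lag))
```

For a sinusoid with a 40-sample period and a reference value of 39.0, the integer stage selects 40 and the fractional stage keeps 40.0.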
The pitch period search range adjustment factor of the secondary channel signal is used to determine the upper limit of the pitch period index value of the secondary channel signal, that is, the value that the pitch period index value of the secondary channel signal cannot exceed. This upper limit is in turn used to determine the pitch period index value of the secondary channel signal.
Further, in some embodiments of the present application, determining a closed-loop pitch period reference value of the secondary channel signal according to the pitch period estimation value of the primary channel signal and the number of divided subframes of the secondary channel signal of the current frame includes:
determining the closed-loop pitch period integer part loc_T0 of the secondary channel signal and the closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal according to the pitch period estimate of the primary channel signal;
the closed-loop pitch period reference value f_pitch_prim of the secondary channel signal is calculated as follows:
f_pitch_prim=loc_T0+loc_frac_prim/N;
where N represents the number of sub-frames into which the secondary channel signal is divided.
Specifically, the closed-loop pitch period integer part and fractional part of the secondary channel signal are first determined from the pitch period estimate of the primary channel signal. For example, the integer part of the pitch period estimate of the primary channel signal may be used directly as the closed-loop pitch period integer part of the secondary channel signal, and its fractional part as the closed-loop pitch period fractional part of the secondary channel signal. Alternatively, the pitch period estimate of the primary channel signal may be mapped to these integer and fractional parts by interpolation. The closed-loop pitch period integer part loc_T0 and fractional part loc_frac_prim of the secondary channel signal are both obtained in this way.
N represents the number of subframes into which the secondary channel signal is divided; for example, N may be 3, 4, or 5, depending on the application scenario. The closed-loop pitch period reference value of the secondary channel signal may be calculated with the above formula, but the embodiments of the present application are not limited to it. For example, a correction factor may additionally be applied, and the product of the correction factor and loc_T0 + loc_frac_prim/N used as the final output f_pitch_prim. As another example, N on the right-hand side of f_pitch_prim = loc_T0 + loc_frac_prim/N may be replaced with N-1, and the final f_pitch_prim calculated in the same way.
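The reference-value formula can be written out as a small sketch. The helper `split_primary_pitch` is hypothetical. It assumes the primary estimate is a real number and that loc_frac_prim is counted in units of 1/N sample, which is what the division by N in the formula implies. The text may obtain these parts differently, for example by interpolation.

```python
import math
from typing import Tuple


def split_primary_pitch(primary_pitch: float, n: int) -> Tuple[int, int]:
    """Hypothetical split into loc_T0 and loc_frac_prim (units of 1/N sample)."""
    loc_t0 = math.floor(primary_pitch)
    loc_frac_prim = round((primary_pitch - loc_t0) * n)
    if loc_frac_prim == n:  # rounding carried into the integer part
        loc_t0, loc_frac_prim = loc_t0 + 1, 0
    return loc_t0, loc_frac_prim


def closed_loop_reference(loc_t0: int, loc_frac_prim: int, n: int) -> float:
    """f_pitch_prim = loc_T0 + loc_frac_prim / N (no correction factor)."""
    return loc_t0 + loc_frac_prim / n
```

For a primary estimate of 57.5 and N = 4, the split gives loc_T0 = 57 and loc_frac_prim = 2, and f_pitch_prim = 57 + 2/4 = 57.5.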
In some embodiments of the present application, determining the upper pitch lag index value limit of the secondary channel signal according to the pitch search range adjustment factor of the secondary channel signal comprises:
calculating the upper limit soft_reuse_index_high_limit of the pitch period index value of the secondary channel signal as follows:
soft_reuse_index_high_limit=0.5+2^Z
where Z is the pitch period search range adjustment factor of the secondary channel signal, and Z takes the value 3, 4, or 5.
When calculating the upper limit of the pitch period index value of the secondary channel signal in differential coding, the pitch period search range adjustment factor Z of the secondary channel signal is determined first, and the upper limit is then calculated as soft_reuse_index_high_limit = 0.5 + 2^Z. For example, Z may be 3, 4, or 5; the specific value of Z is not limited here and depends on the application scenario.
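For the example values of Z, the formula gives the upper limits below. The snippet only evaluates the formula as written. The truncated column assumes the result is cast to an integer, as a C integer cast would do, which the text does not state. Under that assumption the limit equals 2^Z, the number of values a Z-bit index can take.

```python
def index_upper_limit(z: int) -> float:
    """soft_reuse_index_high_limit = 0.5 + 2^Z, evaluated exactly as written."""
    return 0.5 + 2 ** z


for z in (3, 4, 5):
    limit = index_upper_limit(z)
    # int() truncates toward zero; applying it here is an assumption.
    print(f"Z={z}: {limit} (truncated: {int(limit)})")
```

Z = 3, 4, and 5 give 8.5, 16.5, and 32.5 (8, 16, and 32 after truncation).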
After determining the pitch period estimation value of the primary channel signal, the pitch period estimation value of the secondary channel signal and the pitch period index value upper limit of the secondary channel signal, the coding end performs differential coding according to the pitch period estimation value of the primary channel signal, the pitch period estimation value of the secondary channel signal and the pitch period index value upper limit of the secondary channel signal, and outputs the pitch period index value of the secondary channel signal.
Further, in some embodiments of the present application, calculating a pitch period index value of the secondary channel signal according to the pitch period estimation value of the primary channel signal, the pitch period estimation value of the secondary channel signal, and the pitch period index value upper limit of the secondary channel signal includes:
determining the closed-loop pitch period integer part loc_T0 of the secondary channel signal and the closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal according to the pitch period estimate of the primary channel signal;
the pitch period index value soft _ reuse _ index of the secondary channel signal is calculated as follows:
soft_reuse_index=(N*pitch_soft_reuse+pitch_frac_soft_reuse)-(N*loc_T0+loc_frac_prim)+soft_reuse_index_high_limit/M;
where pitch_soft_reuse represents the integer part of the pitch period estimate of the secondary channel signal, pitch_frac_soft_reuse represents the fractional part of the pitch period estimate of the secondary channel signal, soft_reuse_index_high_limit represents the upper limit of the pitch period index value of the secondary channel signal, N represents the number of subframes into which the secondary channel signal is divided, M represents the adjustment factor for the upper limit of the pitch period index value of the secondary channel signal and is a non-zero real number, * represents the multiplication operator, / represents the division operator, + represents the addition operator, and - represents the subtraction operator.
Specifically, the closed-loop pitch period integer part loc_T0 and the closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal are first determined from the pitch period estimate of the primary channel signal, as described in the foregoing calculation process. N represents the number of subframes into which the secondary channel signal is divided, for example 3, 4, or 5. M represents the adjustment factor for the upper limit of the pitch period index value of the secondary channel signal and is a non-zero real number, for example 2 or 3. The values of N and M depend on the application scenario and are not limited here.
The pitch period index value of the secondary channel signal is not limited to being calculated with the above formula. For example, a correction factor may additionally be applied, and the product of the correction factor and (N*pitch_soft_reuse + pitch_frac_soft_reuse) - (N*loc_T0 + loc_frac_prim) + soft_reuse_index_high_limit/M used as the final output soft_reuse_index.
As another example, a correction factor may be added to the right-hand side of soft_reuse_index = (N*pitch_soft_reuse + pitch_frac_soft_reuse) - (N*loc_T0 + loc_frac_prim) + soft_reuse_index_high_limit/M; the specific value of the correction factor is not limited, and the final soft_reuse_index is calculated in this way.
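A direct transcription of the index formula, without a correction factor, is sketched below. The function name is hypothetical. pitch_frac_soft_reuse and loc_frac_prim are assumed to be counted in units of 1/N sample, consistent with the reference-value formula. The example uses N = 4, M = 2, and Z = 3 with the upper limit truncated to 8, which is also an assumption.

```python
def secondary_pitch_index(pitch_soft_reuse: int, pitch_frac_soft_reuse: int,
                          loc_t0: int, loc_frac_prim: int,
                          n: int, high_limit: float, m: float) -> float:
    """soft_reuse_index = (N*pitch_soft_reuse + pitch_frac_soft_reuse)
    - (N*loc_T0 + loc_frac_prim) + soft_reuse_index_high_limit/M"""
    secondary = n * pitch_soft_reuse + pitch_frac_soft_reuse  # secondary pitch, 1/N units
    reference = n * loc_t0 + loc_frac_prim                    # reference pitch, 1/N units
    return secondary - reference + high_limit / m


# Secondary pitch 58 + 1/4, reference 57 + 2/4, N=4, M=2, upper limit 8.
print(secondary_pitch_index(58, 1, 57, 2, n=4, high_limit=8, m=2))  # 7.0
```

The two pitch values differ by three quarter-samples. Adding high_limit/M = 4 shifts this difference into a non-negative range, and the resulting index, 7, stays below the upper limit of 8.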
In the embodiment of the present application, the pitch period of the secondary channel signal is differentially encoded using the pitch period estimated value of the primary channel signal, and a pitch period index value of the secondary channel signal is obtained, where the pitch period index value of the secondary channel signal is used to represent the pitch period of the secondary channel signal. After the pitch period index value of the secondary channel signal is obtained, the pitch period index value of the secondary channel signal can be used for generating a stereo coded code stream to be sent. After the coding end generates the stereo coding code stream, the stereo coding code stream can be output and sent to the decoding end through the audio transmission channel.
411. Determine, according to the received stereo coded code stream, whether to perform differential decoding on the pitch period of the secondary channel signal.
In this embodiment of the present application, whether to perform differential decoding on the pitch period of the secondary channel signal is determined according to the received stereo coded code stream, for example, the decoding end may determine whether to perform differential decoding on the pitch period of the secondary channel signal according to indication information carried in the stereo coded code stream. For another example, after the transmission environment of the stereo signal is preconfigured, whether to perform differential decoding may be preconfigured, so that the decoding end may determine whether to perform differential decoding on the pitch period of the secondary channel signal according to the preconfigured result.
In some embodiments of the present application, the step 411 determining whether to perform differential decoding on the pitch period of the secondary channel signal according to the received stereo coded code stream includes:
acquiring a secondary channel pitch period differential coding identifier from a current frame;
when the secondary channel pitch period differential coding flag is a preset first value, determining to perform differential decoding on the pitch period of the secondary channel signal.
In this embodiment, the secondary channel pitch period differential coding flag may take a plurality of values, for example a preset first value or a preset second value. For example, the flag may take the value 0 or 1, where the first value is 1 and the second value is 0. When the flag is 1, execution of step 412 is triggered.
For example, the secondary channel pitch period differential coding flag is denoted Pitch_reuse_flag. During secondary channel decoding, Pitch_reuse_flag is obtained. Pitch_reuse_flag = 1 indicates that the pitch period of the secondary channel signal can be differentially decoded, and the differential decoding method of this embodiment is performed; Pitch_reuse_flag = 0 indicates that it cannot be differentially decoded, and the independent decoding method is performed. In other words, in this embodiment the differential decoding processes in steps 412 and 413 are performed only when Pitch_reuse_flag is 1.
In some embodiments of the present application, methods provided by embodiments of the present application further include:
and when the pitch period of the secondary channel signal is determined not to be differentially decoded and the pitch period estimated value of the primary channel signal is not multiplexed as the pitch period of the secondary channel signal, decoding the pitch period of the secondary channel signal from the stereo coding code stream.
In this case, the embodiments of the present application may use an independent pitch period decoding method for the secondary channel, so that the decoding end decodes the pitch period of the secondary channel signal directly from the stereo coded code stream.
In some embodiments of the present application, methods provided by embodiments of the present application further include:
when it is determined that the pitch period of the secondary channel signal is not differentially decoded and the pitch period estimate of the primary channel signal is multiplexed as the pitch period of the secondary channel signal, the pitch period estimate of the primary channel signal is taken as the pitch period of the secondary channel signal.
When the decoding end determines not to perform differential decoding on the pitch period of the secondary channel signal, the embodiments of the present application may also use a pitch period multiplexing method. For example, when the secondary channel signal pitch period multiplexing flag indicates that the pitch period of the secondary channel signal multiplexes the pitch period estimate of the primary channel signal, the decoding end may, based on this flag, use the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal.
In other embodiments of the present application, depending on the value of the secondary channel pitch period differential coding flag, the stereo decoding method performed by the decoding end may further include the following step:
and when the secondary channel pitch period differential coding flag is the preset second value and the secondary channel signal pitch period multiplexing flag carried in the stereo coded code stream is the preset third value, determining that differential decoding is not performed on the pitch period of the secondary channel signal and that the pitch period estimate of the primary channel signal is not multiplexed as the pitch period of the secondary channel signal, and decoding the pitch period of the secondary channel signal from the stereo coded code stream.
In still other embodiments of the present application, depending on the value of the secondary channel pitch period differential coding flag, the stereo decoding method performed by the decoding end may further include the following step:
and when the secondary channel pitch period differential coding flag is the preset second value and the secondary channel signal pitch period multiplexing flag carried in the stereo coded code stream is the preset fourth value, determining not to perform differential decoding on the pitch period of the secondary channel signal, and using the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal.
When the secondary channel pitch period differential coding flag takes the second value, the differential decoding process in steps 412 and 413 is not performed, and the secondary channel signal pitch period multiplexing flag carried in the stereo coded code stream is further parsed. This flag indicates whether the pitch period of the secondary channel signal multiplexes the pitch period estimate of the primary channel signal. When the flag takes the fourth value, the pitch period of the secondary channel signal multiplexes the pitch period estimate of the primary channel signal, and the decoding end uses the pitch period of the primary channel signal as the pitch period of the secondary channel signal. When the flag takes the third value, the pitch period of the secondary channel signal does not multiplex the pitch period estimate of the primary channel signal; the pitch period of the secondary channel signal is then decoded from the stereo coded code stream separately from that of the primary channel signal, that is, it is decoded independently. In this way, the decoding end determines from the secondary channel pitch period differential coding flag carried in the stereo coded code stream whether to perform the differential decoding method or the independent decoding method.
In this embodiment of the present application, when the pitch period of the secondary channel signal is not differentially decoded, it may be decoded with an independent pitch period decoding method for the secondary channel, or a pitch period multiplexing method may be used. If the stereo coded code stream carries a secondary channel signal pitch period multiplexing flag, the flag indicates whether the pitch period of the secondary channel signal multiplexes the pitch period estimate of the primary channel signal. When it does, the decoding end uses the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal according to the flag.
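The decoder-side choice among differential decoding, multiplexing, and independent decoding can be summarized in a short sketch. The mode labels and the function name are hypothetical. The flag values follow the examples in the text: first value 1 and second value 0 for the differential coding flag, fourth value 1 and third value 0 for the multiplexing flag.

```python
from typing import Optional

# Example flag values from the text (assumed, not normative).
FIRST_VALUE, SECOND_VALUE = 1, 0   # Pitch_reuse_flag (differential coding flag)
THIRD_VALUE, FOURTH_VALUE = 0, 1   # secondary channel pitch period multiplexing flag


def select_secondary_pitch_mode(pitch_reuse_flag: int,
                                multiplexing_flag: Optional[int] = None) -> str:
    """Map parsed flags to a hypothetical decoding-mode label."""
    if pitch_reuse_flag == FIRST_VALUE:
        # Steps 412 and 413: differential decoding from the primary estimate.
        return "differential"
    if pitch_reuse_flag != SECOND_VALUE:
        raise ValueError(f"unexpected Pitch_reuse_flag: {pitch_reuse_flag}")
    # Only in this branch is the multiplexing flag parsed from the code stream.
    if multiplexing_flag == FOURTH_VALUE:
        return "reuse_primary"   # primary pitch estimate used as secondary pitch
    if multiplexing_flag == THIRD_VALUE:
        return "independent"     # secondary pitch decoded from the code stream
    raise ValueError(f"unexpected multiplexing flag: {multiplexing_flag}")
```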
412. When the pitch period of the secondary channel signal is determined to be differentially decoded, a pitch period estimated value of a primary channel of the current frame and a pitch period index value of a secondary channel of the current frame are obtained from the stereo coded stream.
In this embodiment of the application, after the coding end sends the stereo coded code stream, the decoding end first receives the stereo coded code stream through the audio transmission channel, and then performs channel decoding according to the stereo coded code stream, and if it is necessary to perform differential decoding on the pitch period of the secondary channel signal, the pitch period index value of the secondary channel signal of the current frame may be obtained from the stereo coded code stream, and the pitch period estimation value of the primary channel signal of the current frame may also be obtained from the stereo coded code stream.
413. Differentially decode the pitch period of the secondary channel signal according to the pitch period estimate of the primary channel and the pitch period index value of the secondary channel to obtain the pitch period estimate of the secondary channel signal, where the pitch period estimate of the secondary channel signal is used to decode the stereo coded code stream.
In this embodiment, when it is determined in step 411 that it is necessary to perform differential decoding on the pitch period of the secondary channel signal, the pitch period of the secondary channel signal may be differentially decoded by using the pitch period estimation value of the primary channel signal and the pitch period index value of the secondary channel signal, so as to implement accurate secondary channel pitch period decoding and improve the overall stereo decoding quality.
Next, the specific differential decoding process in this embodiment of the present application is described. Specifically, in step 413, differentially decoding the pitch period of the secondary channel signal according to the pitch period estimate of the primary channel signal and the pitch period index value of the secondary channel signal includes:
determining a closed-loop pitch period reference value of a secondary channel signal according to a pitch period estimated value of a primary channel signal and the number of divided subframes of the secondary channel signal of a current frame;
determining the upper limit of the pitch period index value of the secondary channel signal according to the pitch period search range adjustment factor of the secondary channel signal;
and calculating the pitch period estimation value of the secondary channel signal according to the closed-loop pitch period reference value of the secondary channel signal, the pitch period index value of the secondary channel signal and the pitch period index value upper limit of the secondary channel signal.
For example, the closed-loop pitch period reference value of the secondary channel signal is determined from the pitch period estimate of the primary channel signal, as described in the foregoing calculation process. The pitch period search range adjustment factor of the secondary channel signal is used to determine the upper limit of the pitch period index value of the secondary channel signal, that is, the value that the pitch period index value of the secondary channel signal cannot exceed. This upper limit is in turn used, together with the pitch period index value, to determine the pitch period estimate of the secondary channel signal.
After determining the closed-loop pitch reference value of the secondary channel signal, the pitch index value of the secondary channel signal and the upper limit of the pitch index value of the secondary channel signal, the decoding end performs differential decoding according to the closed-loop pitch reference value of the secondary channel signal, the pitch index value of the secondary channel signal and the upper limit of the pitch index value of the secondary channel signal, and outputs the pitch estimation value of the secondary channel signal.
Further, in some embodiments of the present application, calculating the pitch period estimation value of the secondary channel signal according to the closed-loop pitch period reference value of the secondary channel signal, the pitch period index value of the secondary channel signal, and the pitch period index value upper limit of the secondary channel signal includes:
the pitch period estimate T0_pitch of the secondary channel signal is calculated as follows:
T0_pitch=f_pitch_prim+(soft_reuse_index-soft_reuse_index_high_limit/M)/N;
where f_pitch_prim represents the closed-loop pitch period reference value of the secondary channel signal, soft_reuse_index represents the pitch period index value of the secondary channel signal, soft_reuse_index_high_limit represents the upper limit of the pitch period index value of the secondary channel signal, N represents the number of subframes into which the secondary channel signal is divided, M represents the adjustment factor for the upper limit of the pitch period index value of the secondary channel signal and is a non-zero real number, / represents the division operator, + represents the addition operator, and - represents the subtraction operator.
Specifically, the closed-loop pitch period integer part loc_T0 and the closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal are first determined from the pitch period estimate of the primary channel signal, as described in the foregoing calculation process. N represents the number of subframes into which the secondary channel signal is divided, for example 3, 4, or 5. M represents the adjustment factor for the upper limit of the pitch period index value of the secondary channel signal, for example 2 or 3. The values of N and M depend on the application scenario and are not limited here.
The pitch period estimate of the secondary channel signal is not limited to being calculated with the above formula. For example, a correction factor may additionally be applied, and the product of the correction factor and f_pitch_prim + (soft_reuse_index - soft_reuse_index_high_limit/M)/N used as the final output T0_pitch. As another example, a correction factor may be added to the right-hand side of T0_pitch = f_pitch_prim + (soft_reuse_index - soft_reuse_index_high_limit/M)/N; the specific value of the correction factor is not limited, and the final T0_pitch is calculated in this way.
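The decoding formula, without a correction factor, is transcribed below. Because f_pitch_prim = loc_T0 + loc_frac_prim/N, substituting the encoder-side soft_reuse_index into this formula gives T0_pitch = pitch_soft_reuse + pitch_frac_soft_reuse/N, so the decoder recovers the encoder's secondary channel estimate exactly. The function name and the example values (N = 4, M = 2, upper limit 8) are assumptions for illustration.

```python
def decode_secondary_pitch(f_pitch_prim: float, soft_reuse_index: float,
                           high_limit: float, m: float, n: int) -> float:
    """T0_pitch = f_pitch_prim + (soft_reuse_index - soft_reuse_index_high_limit/M)/N"""
    return f_pitch_prim + (soft_reuse_index - high_limit / m) / n


# Reference 57 + 2/4 = 57.5, received index 7, N=4, M=2, upper limit 8.
print(decode_secondary_pitch(57.5, 7, high_limit=8, m=2, n=4))  # 58.25
```

With these values the decoder returns 58.25, that is, 58 + 1/4.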
After the pitch period estimate T0_pitch of the secondary channel signal is calculated, its integer part T0 and fractional part T0_frac may be further calculated from T0_pitch. For example, T0 = INT(T0_pitch) and T0_frac = (T0_pitch - T0)*N.
Here, INT(T0_pitch) denotes rounding T0_pitch down, T0 is the integer part of the decoded secondary channel pitch period, and T0_frac is the fractional part of the decoded secondary channel pitch period.
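The split into integer and fractional parts can be sketched as follows. `math.floor` implements the rounding down that INT() denotes here. The extra `round()` on the fractional part and the carry into the integer part are assumptions added to absorb floating-point error, since the text writes T0_frac = (T0_pitch - T0)*N without rounding. The function name is hypothetical.

```python
import math
from typing import Tuple


def split_decoded_pitch(t0_pitch: float, n: int) -> Tuple[int, int]:
    """Return (T0, T0_frac) with T0_frac in units of 1/N sample."""
    t0 = math.floor(t0_pitch)              # INT(T0_pitch): round down
    t0_frac = round((t0_pitch - t0) * n)   # (T0_pitch - T0) * N, rounded
    if t0_frac == n:                       # guard against e.g. 58.9999999
        t0, t0_frac = t0 + 1, 0
    return t0, t0_frac


print(split_decoded_pitch(58.25, 4))  # (58, 1)
```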
As the foregoing embodiment illustrates, because the pitch period of the secondary channel signal is differentially encoded using the pitch period estimate of the primary channel signal, only a small number of bits need to be allocated to the pitch period of the secondary channel signal, and differentially encoding that pitch period can improve the spatial impression and sound image stability of the stereo signal. In addition, because differential coding of the secondary channel pitch period uses fewer bits, the saved bits can be allocated to other stereo coding parameters, which further improves the coding efficiency of the secondary channel and ultimately the overall stereo coding quality. Furthermore, when differential decoding applies, the decoder obtains the pitch period estimate of the secondary channel signal from the pitch period estimate of the primary channel signal and the pitch period index value of the secondary channel signal, and decodes the stereo bitstream using that estimate, which likewise improves the spatial impression and sound image stability of the stereo signal.
To help understand and implement the foregoing scheme of the embodiments of the present application, a specific application scenario is described below.
In the pitch period coding scheme for the secondary channel signal provided by the embodiments of the present application, the encoder first decides whether the pitch period of the secondary channel signal can be differentially encoded. When it can, the pitch period is encoded with a differential coding method designed for the secondary channel signal, which uses only a small number of bits, and the saved bits are allocated to other stereo coding parameters. This achieves accurate pitch period coding of the secondary channel signal and improves the overall stereo coding quality.
In the embodiments of the present application, the stereo signal may be an original stereo signal, a stereo signal composed of two signals contained in a multi-channel signal, or a stereo signal composed of two signals generated by combining several signals contained in a multi-channel signal. The stereo coding may form an independent stereo encoder, or it may serve as the core coding part of a multi-channel encoder, where it encodes a stereo signal composed of two signals generated by combining several signals contained in the multi-channel signal.
The embodiments of the present application use a stereo coding rate of 24.4 kbps as an example. It should be understood that the embodiments are not limited to 24.4 kbps and may also be applied to stereo coding at lower rates.
Fig. 5 is a schematic flowchart of stereo signal encoding according to an embodiment of the present application. The embodiment provides a method for deciding the pitch period coding mode in stereo coding. The stereo coding may be time domain stereo coding, frequency domain stereo coding, or combined time-frequency stereo coding; this is not limited in the embodiments. Taking frequency domain stereo coding as an example, the encoding and decoding process is described below, with emphasis on the pitch period encoding within secondary channel signal encoding in the later steps. Specifically, the method comprises the following steps:
First, the encoder side of frequency domain stereo coding is described. Its implementation steps are as follows:
S01: Perform time domain preprocessing on the left and right channel time domain signals.
Stereo signal encoding is typically performed frame by frame. If the sampling rate of the stereo audio signal is 16 kHz and each frame is 20 ms, the frame length N is 320, that is, 320 samples. The stereo signal of the current frame comprises the left channel time domain signal of the current frame, denoted $x_L(n)$, and the right channel time domain signal of the current frame, denoted $x_R(n)$, where n is the sample index, n = 0, 1, …, N-1. "The left and right channel time domain signals of the current frame" is short for the left channel time domain signal of the current frame and the right channel time domain signal of the current frame.
The time domain preprocessing of the left and right channel time domain signals of the current frame may specifically include high-pass filtering each of them to obtain the preprocessed left and right channel time domain signals of the current frame, denoted $x_{L\_HP}(n)$ and $x_{R\_HP}(n)$ respectively, where n is the sample index, n = 0, 1, …, N-1 ("the preprocessed left and right channel time domain signals" is short for both). The high-pass filter may be an infinite impulse response (IIR) filter with a cut-off frequency of 20 Hz, or another type of filter. For example, a high-pass filter with a cut-off frequency of 20 Hz at a sampling rate of 16 kHz has the transfer function:
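$$H_{20\mathrm{Hz}}(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 - a_1 z^{-1} - a_2 z^{-2}}$$
where $b_0 = 0.994461788958195$, $b_1 = -1.988923577916390$, $b_2 = 0.994461788958195$, $a_1 = 1.988892905899653$, $a_2 = -0.988954249933127$, and z is the transform variable in the Z-transform domain. With these coefficients, the filter has a double zero at z = 1 (DC) and unit gain at the Nyquist frequency.
The corresponding time domain filter is:
$$x_{L\_HP}(n) = b_0 x_L(n) + b_1 x_L(n-1) + b_2 x_L(n-2) + a_1 x_{L\_HP}(n-1) + a_2 x_{L\_HP}(n-2)$$
and the right channel is filtered in the same way.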
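A minimal Python sketch of this 20 Hz high-pass filter applied to one channel follows. It assumes the recursion adds the feedback terms, y(n) = b0·x(n) + b1·x(n-1) + b2·x(n-2) + a1·y(n-1) + a2·y(n-2), which is the sign convention under which the listed coefficients give a stable filter with a double zero at DC and unit gain at the Nyquist frequency. The function name is illustrative.

```python
# Coefficients of the 20 Hz high-pass filter at a 16 kHz sampling rate (from the text)
B0, B1, B2 = 0.994461788958195, -1.988923577916390, 0.994461788958195
A1, A2 = 1.988892905899653, -0.988954249933127

def highpass_20hz(x):
    """Filter a sequence of samples; the filter state starts at zero."""
    y = []
    x1 = x2 = y1 = y2 = 0.0  # x(n-1), x(n-2), y(n-1), y(n-2)
    for xn in x:
        yn = B0 * xn + B1 * x1 + B2 * x2 + A1 * y1 + A2 * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y
```

Fed a constant (DC) input, the output decays toward zero; an alternating +1/-1 input at the Nyquist frequency passes essentially unchanged once the start-up transient has died out.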
It will be appreciated that time domain preprocessing of the left and right channel time domain signals of the current frame is not a necessary step. Without it, the left and right channel signals used for delay estimation are those of the original stereo signal, that is, the captured pulse code modulation (PCM) signals after analog-to-digital conversion, whose sampling rate may be 8 kHz, 16 kHz, 32 kHz, 44.1 kHz, or 48 kHz.
Besides the high-pass filtering described in this embodiment, the preprocessing may include other processing, such as pre-emphasis; this is not limited in this embodiment.
S02: Perform time domain analysis on the preprocessed left and right channel signals.
Specifically, the time domain analysis may include transient detection and the like. Transient detection may be performed by energy detection on the preprocessed left and right channel time domain signals of the current frame, to detect whether an abrupt energy change occurs in the current frame. For example, the energy $E_{cur\_L}$ of the preprocessed left channel time domain signal of the current frame is calculated, and transient detection is performed based on the absolute value of the difference between the energy $E_{pre\_L}$ of the preprocessed left channel time domain signal of the previous frame and $E_{cur\_L}$, yielding the transient detection result for the preprocessed left channel time domain signal of the current frame. Transient detection can be performed on the preprocessed right channel time domain signal of the current frame in the same way. Besides transient detection, the time domain analysis may include other analysis, for example, determining the time domain inter-channel time difference (ITD) parameter, time domain delay alignment, and band extension preprocessing.
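The energy-based transient check described above can be sketched as follows for one channel. The threshold value and the function names are hypothetical; the text only states that the decision is based on the absolute difference between the previous and current frame energies.

```python
def frame_energy(frame):
    """Energy of one preprocessed frame: sum of squared samples."""
    return sum(s * s for s in frame)

def is_transient(prev_frame, cur_frame, threshold):
    """Flag an abrupt energy change when |E_cur - E_pre| exceeds a hypothetical threshold."""
    e_pre = frame_energy(prev_frame)
    e_cur = frame_energy(cur_frame)
    return abs(e_cur - e_pre) > threshold
```

A real detector would likely use a relative or adaptive threshold; a fixed absolute one keeps the sketch close to the wording of the text.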
S03: Perform time-frequency transformation on the preprocessed left and right channel signals to obtain the left and right channel frequency domain signals.
Specifically, a discrete Fourier transform (DFT) may be applied to the preprocessed left channel signal to obtain the left channel frequency domain signal, and to the preprocessed right channel signal to obtain the right channel frequency domain signal. To overcome spectral aliasing, overlap-add processing is generally used between two consecutive DFTs, and the DFT input is sometimes zero-padded.
The DFT may be performed once per frame, or each frame may be divided into P subframes and the DFT performed once per subframe. If it is performed once per frame, the transformed left channel frequency domain signal may be denoted L(k) and the transformed right channel frequency domain signal R(k), k = 0, 1, …, L/2-1, where k is the frequency bin index and L is the DFT length. If it is performed once per subframe, the transformed left channel frequency domain signal of the ith subframe may be denoted $L_i(k)$ and the transformed right channel frequency domain signal of the ith subframe $R_i(k)$, k = 0, 1, …, L/2-1, where i is the subframe index, i = 0, 1, …, P-1. In this embodiment, wideband is taken as an example, where wideband means that the coding bandwidth may be 8 kHz or more. Each frame of the left or right channel signal is 20 ms, so the frame length N is 320 samples. Each frame is divided into two subframes (P = 2), each 10 ms long, i.e., 160 samples. The DFT is performed once per subframe with length L = 400 samples, giving the transformed left channel frequency domain signal $L_i(k)$ and right channel frequency domain signal $R_i(k)$ of the ith subframe, k = 0, 1, …, L/2-1, i = 0, 1, …, P-1.
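To make the framing arithmetic concrete, the sketch below splits one 320-sample frame into P = 2 subframes of 160 samples and computes an L = 400 point DFT of each, keeping bins k = 0, …, L/2-1. It assumes each subframe is simply zero-padded to 400 samples; the overlapping analysis and overlap-add mentioned above are omitted, so this illustrates only the indexing, not the codec's exact analysis.

```python
import cmath

FRAME_LEN = 320  # N: 20 ms at 16 kHz
P = 2            # subframes per frame (10 ms, 160 samples each)
DFT_LEN = 400    # L: DFT length

def subframe_spectra(frame, p=P, dft_len=DFT_LEN):
    """Return [L_i(k) for k < L/2] for each subframe i (zero-padded DFT, direct form)."""
    sub_len = len(frame) // p
    spectra = []
    for i in range(p):
        seg = frame[i * sub_len:(i + 1) * sub_len]
        # Zero-padded samples contribute nothing, so only the subframe samples are summed.
        spectra.append([
            sum(v * cmath.exp(-2j * cmath.pi * k * n / dft_len) for n, v in enumerate(seg))
            for k in range(dft_len // 2)
        ])
    return spectra
```

A direct DFT is used only to keep the example dependency-free; an FFT would be used in practice.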
S04: Determine and encode the ITD parameter.
The ITD parameter may be determined in the frequency domain only, in the time domain only, or by a combined time-frequency method; this is not limited in the embodiments of the present application.
For example, the ITD parameter may be extracted in the time domain from the left and right channel cross-correlation coefficients. For 0 ≤ i ≤ Tmax, calculate
$$C_n(i) = \sum_{j=0}^{N-1-i} x_L(j)\, x_R(j+i)$$
and
$$C_p(i) = \sum_{j=0}^{N-1-i} x_L(j+i)\, x_R(j).$$
If
$$\max_{0 \le i \le T_{max}} C_n(i) > \max_{0 \le i \le T_{max}} C_p(i),$$
the ITD parameter value is the negative of the index value corresponding to max(Cn(i)), where the index table corresponding to the max(Cn(i)) value is predefined in the codec; otherwise, the ITD parameter value is the index value corresponding to max(Cp(i)).
Here, i is the index for calculating the cross-correlation coefficient, j is the sample index, Tmax corresponds to the maximum ITD value at the given sampling rate, and N is the frame length. The ITD parameter may also be determined in the frequency domain from the left and right channel frequency domain signals; the time domain signals may be converted into frequency domain signals by, for example, the discrete Fourier transform (DFT), the fast Fourier transform (FFT), or the modified discrete cosine transform (MDCT). In this embodiment, from the DFT-transformed left channel frequency domain signal $L_i(k)$ and right channel frequency domain signal $R_i(k)$ of the ith subframe, k = 0, 1, …, L/2-1, i = 0, 1, …, P-1, the frequency domain cross-correlation of the ith subframe is calculated as $XCORR_i(k) = L_i(k) \cdot R_i^*(k)$, where $R_i^*(k)$ is the complex conjugate of the time-frequency transformed right channel frequency domain signal of the ith subframe. The frequency domain cross-correlation is converted to the time domain $xcorr_i(n)$, n = 0, 1, …, L-1, and the maximum of $xcorr_i(n)$ is searched for within $L/2 - T_{max} \le n \le L/2 + T_{max}$, giving the ITD parameter value of the ith subframe as
$$T_i = \arg\max_{L/2 - T_{max} \le n \le L/2 + T_{max}} xcorr_i(n) - \frac{L}{2}$$
As another example, based on the DFT-transformed left channel frequency domain signal and right channel frequency domain signal of the ith subframe, an amplitude value is calculated for each j in the search range $-T_{max} \le j \le T_{max}$:
[Equation image: amplitude value for index j, computed from $L_i(k)$ and $R_i(k)$]
The ITD parameter value is then the index value j corresponding to the largest amplitude value.
After the ITD parameter is determined, the encoder applies residual coding and entropy coding to it and writes it into the stereo bitstream.
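The time-domain cross-correlation method for the ITD parameter can be sketched in Python as below. It assumes the definitions Cn(i) = Σ_{j=0}^{N-1-i} x_L(j)·x_R(j+i) and Cp(i) = Σ_{j=0}^{N-1-i} x_L(j+i)·x_R(j) for 0 ≤ i ≤ Tmax, one common form of this method, and it returns the lag directly instead of looking it up in a codec index table. Under this convention, a negative result means the right channel lags the left.

```python
def itd_time_domain(x_l, x_r, t_max):
    """Estimate the ITD in samples from left/right frames of equal length N."""
    n = len(x_l)
    # Cn(i): left channel against the right channel advanced by i samples
    cn = [sum(x_l[j] * x_r[j + i] for j in range(n - i)) for i in range(t_max + 1)]
    # Cp(i): left channel advanced by i samples against the right channel
    cp = [sum(x_l[j + i] * x_r[j] for j in range(n - i)) for i in range(t_max + 1)]
    if max(cn) > max(cp):
        return -cn.index(max(cn))  # negative of the index of max(Cn(i))
    return cp.index(max(cp))       # index of max(Cp(i))
```

For a 320-sample noise frame whose right channel is the left channel delayed by 3 samples, the sketch returns -3; swapping the channels returns 3.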
S05: Perform time shift adjustment on the left and right channel frequency domain signals according to the ITD parameter.
The left and right channel frequency domain signals may be time-shift adjusted in various ways in the embodiments of the present application; an example follows.
In this embodiment, each frame is divided into P = 2 subframes. The time-shift-adjusted left channel frequency domain signal of the ith subframe is denoted $L_i'(k)$ and the time-shift-adjusted right channel frequency domain signal of the ith subframe is denoted $R_i'(k)$, k = 0, 1, …, L/2-1, where k is the frequency bin index and i = 0, 1, …, P-1:
[Equation images: $L_i'(k)$ and $R_i'(k)$ expressed in terms of $L_i(k)$, $R_i(k)$, $\tau_i$ and L]
where $\tau_i$ is the ITD parameter value of the ith subframe, L is the DFT length, $L_i(k)$ is the time-frequency transformed left channel frequency domain signal of the ith subframe, $R_i(k)$ is the time-frequency transformed right channel frequency domain signal of the ith subframe, and i is the subframe index, i = 0, 1, …, P-1.
It will be appreciated that if the DFT is not performed per subframe, time shift adjustment may be performed once for the entire frame: when the frame is divided into subframes, time shift adjustment is performed per subframe; otherwise, it is performed per frame.
S06: Calculate and encode other frequency domain stereo parameters.
Other frequency domain stereo parameters may include, but are not limited to, the inter-channel phase difference (IPD) parameter, the inter-channel level difference (ILD) parameter, and subband side gains; this is not limited in the embodiments of the present application. After these parameters are calculated, they are residual coded and entropy coded and written into the stereo bitstream.
S07: Calculate the primary channel signal and the secondary channel signal.
The primary channel signal and the secondary channel signal are calculated; any time domain or frequency domain downmix processing in the embodiments of the present application may be used. For example, the primary and secondary channel signals of the current frame may be calculated from the left and right channel frequency domain signals of the current frame; the primary and secondary channel signals of each subband in the preset low frequency band of the current frame may be calculated from the left and right channel frequency domain signals of that subband; the primary and secondary channel signals of each subframe of the current frame may be calculated from the left and right channel frequency domain signals of that subframe; or the primary and secondary channel signals of each subband in the preset low frequency band of each subframe of the current frame may be calculated from the left and right channel frequency domain signals of that subband. As another example, from the left and right channel time domain signals of the current frame, the primary channel signal may be obtained by adding the two signals and the secondary channel signal by subtracting them.
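The simple time domain downmix mentioned last (sum for the primary channel, difference for the secondary channel) is sketched below. No scaling factor is applied, since the text does not mention one; the inverse function, added for illustration, shows that the left and right signals are recoverable from the two downmixed signals.

```python
def downmix_sum_diff(left, right):
    """Primary = L + R, secondary = L - R, sample by sample."""
    primary = [l + r for l, r in zip(left, right)]
    secondary = [l - r for l, r in zip(left, right)]
    return primary, secondary

def upmix_sum_diff(primary, secondary):
    """Inverse of downmix_sum_diff: L = (P + S) / 2, R = (P - S) / 2."""
    left = [(p + s) / 2 for p, s in zip(primary, secondary)]
    right = [(p - s) / 2 for p, s in zip(primary, secondary)]
    return left, right
```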
In this embodiment, because each frame is divided into subframes, the primary and secondary channel signals of each subframe are converted to the time domain by the inverse DFT, and overlap-add processing between subframes yields the time domain primary and secondary channel signals of the current frame.
The process of obtaining the primary and secondary channel signals in step S07 is called downmix processing; the primary and secondary channel signals are processed from step S08 onward.
S08: Encode the downmixed primary channel signal and secondary channel signal.
Specifically, bits may first be allocated between primary channel signal encoding and secondary channel signal encoding according to the parameter information obtained when encoding the primary and secondary channel signals of the previous frame and the total number of bits available for encoding both. The primary and secondary channel signals are then encoded separately according to the bit allocation result. Any mono audio coding technique may be used for either signal; for example, the primary and secondary channel signals obtained by downmixing are encoded with ACELP. ACELP coding generally includes: determining linear prediction coefficients (LPC) and converting them into line spectral frequency (LSF) parameters for quantization coding; searching the adaptive codebook excitation to determine the pitch period and the adaptive codebook gain, and quantization coding each of them; and searching the algebraic codebook excitation to determine the pulse indices and gain of the algebraic codebook excitation, and quantization coding each of them.
Fig. 6 is a flowchart of encoding the pitch period parameters of the primary channel signal and of the secondary channel signal according to an embodiment of the present application. The flow in Fig. 6 comprises steps S09 to S12:
S09: Determine and encode the pitch period of the primary channel signal.
In primary channel signal encoding, pitch period estimation combines open-loop pitch analysis with closed-loop pitch search to improve estimation accuracy. Various methods exist for speech pitch period estimation, such as the autocorrelation function and the short-time average magnitude difference. Here, pitch period estimation is based on the autocorrelation function, which peaks at integer multiples of the pitch period; this property is used to estimate the pitch period. To improve the accuracy of pitch prediction and better approximate the actual pitch period of speech, pitch detection uses fractional delays with 1/3-sample resolution. To reduce computation, pitch period estimation consists of two steps: open-loop pitch analysis and closed-loop pitch search. Open-loop pitch analysis coarsely estimates the integer delay of a speech frame to obtain a candidate integer delay; closed-loop pitch search then refines the pitch delay around this candidate and is performed once per subframe. Open-loop pitch analysis is performed once per frame and consists of computing the autocorrelation, normalizing it, and determining the optimal open-loop integer delay.
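The open-loop step (autocorrelation, normalization, choice of the best integer delay) can be illustrated with the simplified Python sketch below. It is not the codec's open-loop analysis: the lag range, the normalization by the geometric mean of the two window energies, and the absence of weighting or decimation are assumptions made for clarity.

```python
import math

def open_loop_pitch(x, min_lag, max_lag):
    """Integer lag in [min_lag, max_lag] that maximizes the normalized autocorrelation of x."""
    best_lag, best_corr = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        cross = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        e_now = sum(x[n] * x[n] for n in range(lag, len(x)))
        e_past = sum(x[n - lag] * x[n - lag] for n in range(lag, len(x)))
        if e_now == 0 or e_past == 0:
            continue  # skip lags whose windows are silent
        corr = cross / math.sqrt(e_now * e_past)
        if corr > best_corr:  # strict '>' keeps the shortest lag on ties
            best_lag, best_corr = lag, corr
    return best_lag
```

For a pulse train with a period of 100 samples and a search range of 40 to 150, the normalized correlation reaches 1.0 only at lag 100, which is returned.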
Besides serving as the pitch period coding parameter of the primary channel signal, the pitch period estimate of the primary channel signal obtained in the above steps is also used as the pitch period reference value of the secondary channel signal.
S10: Decide whether pitch period differential coding is used in secondary channel coding.
In secondary channel coding, whether to differentially encode the secondary channel pitch period is decided from the pitch period estimate of the primary channel signal and the open-loop pitch period estimate of the secondary channel signal, using the following decision quantity:
DIFF=|∑(pitch[0])-∑(pitch[1])|,
where DIFF is the absolute value of the difference between the pitch period estimate of the primary channel signal and the open-loop pitch period estimate of the secondary channel signal, Σ(pitch[0]) is the pitch period estimate of the primary channel signal, and Σ(pitch[1]) is the open-loop pitch period estimate of the secondary channel signal.
The secondary channel pitch period differential coding flag is denoted Pitch_reuse_flag. DIFF_THR is a preset secondary channel pitch period differential coding threshold, set to one of {1, 3, 6} depending on the coding rate. For example, when DIFF > DIFF_THR, Pitch_reuse_flag = 1, and the current frame is determined to use pitch period differential coding of the secondary channel signal. When DIFF ≤ DIFF_THR, Pitch_reuse_flag = 0; pitch period differential coding is not performed, and the pitch period of the secondary channel signal is encoded independently.
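The S10 decision can be sketched as follows. The text does not say what Σ sums over; this sketch assumes pitch[0] and pitch[1] are sequences of per-subframe (or per-segment) estimates of equal length, and it takes DIFF_THR as a parameter because the mapping from coding rate to a value in {1, 3, 6} is not given. The comparison direction follows the text exactly; the function name is hypothetical.

```python
def pitch_reuse_flag(prim_pitch, sec_open_loop_pitch, diff_thr):
    """Return Pitch_reuse_flag: 1 if DIFF > DIFF_THR, else 0 (as stated in S10)."""
    if diff_thr not in (1, 3, 6):
        raise ValueError("DIFF_THR is expected to be one of {1, 3, 6}")
    diff = abs(sum(prim_pitch) - sum(sec_open_loop_pitch))
    return 1 if diff > diff_thr else 0
```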
S11: If pitch period differential coding is not performed, encode the pitch period of the secondary channel signal using independent pitch period coding.
Alternatively, when pitch period differential coding of the secondary channel signal is not used, a pitch period reuse method may be used: the encoder does not encode the pitch period of the secondary channel signal, and the decoder uses the decoded pitch period of the primary channel signal as the pitch period of the secondary channel signal.
S12: Perform pitch period differential coding of the secondary channel signal.
The pitch period differential coding of the secondary channel signal comprises the following specific steps:
S121: Perform a closed-loop pitch period search on the secondary channel signal based on the pitch period estimate of the primary channel signal to determine the pitch period estimate of the secondary channel signal.
S12101: Determine the reference value of the closed-loop pitch period of the secondary channel signal based on the pitch period estimate of the primary channel signal.
In this embodiment, at a coding rate of 24.4 kbps, pitch coding is performed per subframe: the primary channel signal is divided into 5 subframes and the secondary channel signal into 4 subframes. The pitch period reference value of the secondary channel signal is determined from the pitch period of the primary channel signal. One method uses the primary channel pitch periods directly: 4 of the pitch periods of the 5 primary channel subframes are selected as the pitch period reference values of the 4 secondary channel subframes. Another method maps the pitch periods of the 5 primary channel subframes to the pitch period reference values of the 4 secondary channel subframes by interpolation. Either method yields the closed-loop pitch period reference value of the secondary channel signal, whose integer part is loc_T0 and whose fractional part is loc_frac_prim.
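The interpolation variant can be sketched as follows. The text does not specify the interpolation, so this sketch assumes linear interpolation between subframe centres, with each subframe's pitch placed at its centre on a normalized time axis, and assumes quarter-sample resolution for loc_frac_prim (matching the 4*loc_T0 + loc_frac_prim term in the index formula of S1222). The function name is hypothetical.

```python
def map_primary_pitch_to_secondary(prim_pitch, n_sec=4, frac_res=4):
    """Map per-subframe primary pitch values to (loc_T0, loc_frac_prim) pairs for n_sec subframes."""
    n_prim = len(prim_pitch)
    prim_t = [(k + 0.5) / n_prim for k in range(n_prim)]  # primary subframe centres
    refs = []
    for m in range(n_sec):
        t = (m + 0.5) / n_sec  # secondary subframe centre
        if t <= prim_t[0]:
            p = prim_pitch[0]
        elif t >= prim_t[-1]:
            p = prim_pitch[-1]
        else:
            # Linear interpolation between the two neighbouring primary centres
            k = max(i for i in range(n_prim - 1) if prim_t[i] <= t)
            w = (t - prim_t[k]) / (prim_t[k + 1] - prim_t[k])
            p = (1 - w) * prim_pitch[k] + w * prim_pitch[k + 1]
        q = round(p * frac_res)  # quantize to 1/frac_res sample
        refs.append((q // frac_res, q % frac_res))
    return refs
```

For primary pitches 50, 50.25, 50.5, 50.75, 51, the four interpolated values 50.03125, 50.34375, 50.65625, and 50.96875 quantize to (50, 0), (50, 1), (50, 3), and (51, 0).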
S12102: Perform the closed-loop pitch period search of the secondary channel signal based on its pitch period reference value to determine the pitch period of the secondary channel signal. Specifically, with the closed-loop pitch period reference value of the secondary channel signal as the starting point, the closed-loop pitch period search is performed at integer precision and downsampled fractional precision, and the interpolated normalized correlation is calculated to obtain the pitch period estimate of the secondary channel signal.
For example, one method uses 2 bits to encode the pitch period of the secondary channel signal, as follows:
With loc_T0 as the search starting point, an integer-precision search of the secondary channel pitch period is performed in the range [loc_T0-1, loc_T0+1]. At each search point, with loc_frac_prim as the initial value, a fractional-precision search is performed in the range [loc_frac_prim+2, loc_frac_prim+3], [loc_frac_prim, loc_frac_prim-3], or [loc_frac_prim-2, loc_frac_prim+1], and the interpolated normalized correlation of each search point is calculated, so that the similarity is evaluated at multiple search points within one frame. The search point at which the interpolated normalized correlation is maximal gives the pitch period estimate of the secondary channel signal, with integer part pitch_soft_reuse and fractional part pitch_frac_soft_reuse.
As another example, another method uses 3 to 5 bits to encode the pitch period of the secondary channel signal, as follows:
When 3, 4, or 5 bits are used to encode the pitch period of the secondary channel signal, the search radius half_range is 1, 2, or 4, respectively. With loc_T0 as the search starting point, an integer-precision search of the secondary channel pitch period is performed in the range [loc_T0-half_range, loc_T0+half_range]. At each search point, with loc_frac_prim as the initial value, the interpolated normalized correlation is calculated within the range [loc_frac_prim, loc_frac_prim+3], [loc_frac_prim, loc_frac_prim-1], or [loc_frac_prim, loc_frac_prim+3]. The search point at which the interpolated normalized correlation is maximal gives the optimal pitch period estimate of the secondary channel signal, with integer part pitch_soft_reuse and fractional part pitch_frac_soft_reuse.
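The structure of this closed-loop search can be sketched as below. The interpolated normalized correlation itself is not specified here, so it is passed in as a hypothetical `score(lag)` callable. Candidates are enumerated in quarter-sample units (an assumption consistent with the 4*pitch_soft_reuse + pitch_frac_soft_reuse term in the index formula of S1222), and the fractional offsets default to 0..3, corresponding to the [loc_frac_prim, loc_frac_prim+3] option.

```python
def closed_loop_search(score, loc_t0, loc_frac_prim, half_range, frac_offsets=range(4), res=4):
    """Return (integer part, fractional part) of the candidate lag with the highest score."""
    base = res * loc_t0 + loc_frac_prim  # reference lag in 1/res-sample units
    best_q, best_score = None, float("-inf")
    for dt in range(-half_range, half_range + 1):  # integer-precision offsets
        for df in frac_offsets:                     # fractional offsets from loc_frac_prim
            q = base + res * dt + df
            s = score(q / res)                      # hypothetical interpolated normalized correlation
            if s > best_score:
                best_q, best_score = q, s
    return best_q // res, best_q % res
```

With a toy score peaking at 51.5 samples, a reference of loc_T0 = 50, loc_frac_prim = 1, and half_range = 2, the search returns (51, 2), i.e. pitch_soft_reuse = 51 and pitch_frac_soft_reuse = 2.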
S122: Perform differential coding on the pitch period of the primary channel signal and the pitch period of the secondary channel signal, specifically as follows:
S1221: Calculate the upper limit of the pitch period index of the secondary channel signal in differential coding.
The secondary channel signal pitch period index upper limit is calculated using the following equation:
soft_reuse_index_high_limit = 2^Z
where Z is the secondary channel pitch period search range adjustment factor; in this embodiment, Z is 3, 4, or 5.
S1222: Calculate the pitch period index value of the secondary channel signal in differential coding.
The secondary channel signal pitch period index represents the differentially encoded difference between the secondary channel pitch period reference value obtained in the previous step and the optimal secondary channel pitch period estimate.
The pitch period index value soft_reuse_index of the secondary channel signal is calculated as follows:
soft_reuse_index = (4*pitch_soft_reuse + pitch_frac_soft_reuse) - (4*loc_T0 + loc_frac_prim) + soft_reuse_index_high_limit/2
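Steps S1221 and S1222 amount to the integer arithmetic below, sketched in Python with hypothetical function names. Pitch values are in quarter-sample units, as the factor 4 in the formula implies; with Z = 3, for instance, the offset soft_reuse_index_high_limit/2 is 4.

```python
def soft_reuse_index_high_limit(z):
    """Upper limit of the secondary channel pitch index: 2**Z (Z = 3, 4 or 5 in this embodiment)."""
    return 2 ** z

def soft_reuse_index(pitch_soft_reuse, pitch_frac_soft_reuse, loc_t0, loc_frac_prim, z):
    """Searched pitch minus its reference, in quarter samples, offset by half the upper limit."""
    searched = 4 * pitch_soft_reuse + pitch_frac_soft_reuse
    reference = 4 * loc_t0 + loc_frac_prim
    return searched - reference + soft_reuse_index_high_limit(z) // 2
```

For example, a searched pitch of 51 + 2/4 samples against a reference of 50 + 1/4 samples differs by 5 quarter samples, so with Z = 3 the index is 5 + 4 = 9.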
S1223: Encode the pitch period index of the secondary channel signal.
For example, the secondary channel signal pitch period index soft_reuse_index is residual coded.
The embodiments of the present application use a pitch period coding method for the secondary channel signal in which each coded frame is divided into 4 subframes and the pitch period of each subframe is differentially encoded. Compared with independent pitch period coding of the secondary channel signal, this saves 22 or 18 bits, which can be allocated to other coding parameters for quantization coding; for example, the saved bits can be allocated to the fixed codebook.
The remaining parameters of the primary and secondary channel signals are then encoded to obtain the coded bitstreams of the primary and secondary channel signals, and the coded data are written into the stereo bitstream according to a given bitstream format.
The saving in secondary channel coding overhead achieved by the embodiments of the present application is illustrated next. In the independent pitch period coding mode for the secondary channel signal, the 4 subframes are allocated 10, 6, 9, and 6 pitch period coding bits, respectively, so 31 bits are required per frame. With the differential pitch period coding method for the secondary channel signal provided by the embodiments of the present application, each subframe needs only 3 bits, plus 1 bit per frame to indicate whether the secondary channel pitch period is differentially encoded (for example, a value of 1 indicates differential coding and 0 indicates no differential coding). The secondary channel pitch period therefore needs only 4 × 3 + 1 = 13 bits per frame, saving 31 - 13 = 18 bits, which can be allocated to other coding parameters, such as the fixed codebook parameters.
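The bit budget comparison above reduces to simple arithmetic, reproduced here as a check; the per-subframe allocations are the ones given in the text, and the names are illustrative.

```python
INDEPENDENT_BITS = [10, 6, 9, 6]  # per-subframe pitch bits with independent coding

def differential_bits(n_subframes=4, bits_per_subframe=3, flag_bits=1):
    """Bits per frame for differential pitch coding, including the 1-bit flag."""
    return n_subframes * bits_per_subframe + flag_bits

independent = sum(INDEPENDENT_BITS)  # 31
differential = differential_bits()   # 13
saved = independent - differential   # 18
```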
Fig. 8 compares the number of bits allocated to the fixed codebook under the independent coding scheme (solid line) and under the differential coding scheme (dotted line). Fig. 8 shows that differential coding of the secondary channel pitch period frees a large number of bits for the quantization coding of the fixed codebook, which improves the coding quality of the secondary channel signal.
Next, the stereo decoding algorithm executed at the decoding end is illustrated; it mainly performs the following steps:
S13: Read Pitch_reuse_flag from the code stream.
S14: If the coding rate of the secondary channel signal is low and Pitch_reuse_flag is 1, perform differential decoding of the secondary channel pitch period; otherwise, perform independent decoding of the secondary channel pitch period.
Without limitation, when the above conditions are not both satisfied, the code stream may further carry a secondary channel signal pitch period multiplexing flag indicating whether the pitch period estimate of the primary channel signal is multiplexed (reused) as the pitch period of the secondary channel signal. According to this flag, the decoding end may use the decoded pitch period of the primary channel signal as the pitch period of the secondary channel signal.
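The decoder-side choice among differential decoding, reuse of the primary pitch, and independent decoding can be sketched as a small dispatch function. This is a hypothetical illustration: the names `select_pitch_mode`, `low_rate`, and `reuse_primary_flag` are invented here, and the sketch assumes the multiplexing flag uses 1 to mean "reuse", which the text does not specify.

```python
from enum import Enum


class PitchMode(Enum):
    DIFFERENTIAL = "differential"  # decode soft_reuse_index relative to the primary pitch
    REUSE = "reuse"                # copy the primary channel pitch period estimate
    INDEPENDENT = "independent"    # decode the secondary pitch period on its own


def select_pitch_mode(low_rate: bool, pitch_reuse_flag: int,
                      reuse_primary_flag: int) -> PitchMode:
    """Hypothetical S14 dispatch for the secondary channel pitch period.

    low_rate           -- True when the secondary channel coding rate counts as low
    pitch_reuse_flag   -- Pitch_reuse_flag read from the code stream (1 = differential)
    reuse_primary_flag -- assumed encoding of the pitch multiplexing flag (1 = reuse)
    """
    if low_rate and pitch_reuse_flag == 1:
        return PitchMode.DIFFERENTIAL
    if reuse_primary_flag == 1:
        return PitchMode.REUSE
    return PitchMode.INDEPENDENT
```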
For example, the secondary channel pitch period differential coding flag is represented by Pitch_reuse_flag as follows. DIFF_THR is a preset secondary channel pitch period differential coding threshold, set to one of the values {1, 3, 6} depending on the coding rate. When DIFF > DIFF_THR, Pitch_reuse_flag is 1, indicating that the current frame uses differential coding of the secondary channel pitch period. When DIFF ≤ DIFF_THR, Pitch_reuse_flag is 0, differential coding is not performed, and the pitch period of the secondary channel signal is coded independently.
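The encoder-side flag decision can be sketched as below. It follows the comparison direction exactly as stated above (DIFF > DIFF_THR gives Pitch_reuse_flag = 1). As an assumption, DIFF is taken to be the absolute difference between the primary channel pitch period estimate and the secondary channel open-loop pitch period estimate; the text does not spell out how DIFF is formed or which rate maps to which threshold, so the threshold is simply passed in.

```python
ALLOWED_DIFF_THR = (1, 3, 6)  # candidate thresholds quoted in the text


def pitch_reuse_flag(primary_pitch: float, secondary_open_loop_pitch: float,
                     diff_thr: int) -> int:
    """Return Pitch_reuse_flag using the rule stated in the text.

    DIFF is assumed here to be |primary estimate - secondary open-loop estimate|.
    """
    if diff_thr not in ALLOWED_DIFF_THR:
        raise ValueError("DIFF_THR is expected to be one of 1, 3, 6")
    diff = abs(primary_pitch - secondary_open_loop_pitch)
    return 1 if diff > diff_thr else 0
```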
S1401: Pitch period mapping.
In this embodiment, pitch coding is performed per subframe: the primary channel is divided into 5 subframes and the secondary channel into 4 subframes. The reference value of the secondary channel pitch period is determined from the pitch period estimate of the primary channel signal. One approach uses the primary channel pitch periods directly as the reference values, that is, 4 of the pitch periods in the 5 primary channel subframes are selected as the pitch period reference values of the 4 secondary channel subframes. Another approach maps the pitch periods in the 5 primary channel subframes to the pitch period reference values of the 4 secondary channel subframes by interpolation. Either approach yields the integer part loc_T0 and the fractional part loc_frac_prim of the secondary channel closed-loop pitch period.
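The interpolation variant of the mapping can be sketched as follows. The text only says "an interpolation method" is used, so linear interpolation between subframe centres is an assumption made here for illustration. Expressing the fractional part in quarter samples follows from the /4.0 in the S1402 formula; the helper names are hypothetical.

```python
import math
from typing import List, Tuple


def map_primary_to_secondary(primary_pitch: List[float], n_sec: int = 4) -> List[float]:
    """Linearly interpolate per-subframe primary pitch periods onto n_sec subframes.

    Each subframe value is placed at its subframe centre on a normalised [0, 1) frame axis.
    """
    n_prim = len(primary_pitch)
    prim_t = [(k + 0.5) / n_prim for k in range(n_prim)]
    mapped = []
    for j in range(n_sec):
        t = (j + 0.5) / n_sec
        if t <= prim_t[0]:
            mapped.append(primary_pitch[0])
        elif t >= prim_t[-1]:
            mapped.append(primary_pitch[-1])
        else:
            # last primary subframe centre at or before t
            k = max(i for i in range(n_prim - 1) if prim_t[i] <= t)
            w = (t - prim_t[k]) / (prim_t[k + 1] - prim_t[k])
            mapped.append((1.0 - w) * primary_pitch[k] + w * primary_pitch[k + 1])
    return mapped


def split_quarter(pitch: float) -> Tuple[int, int]:
    """Round to the nearest quarter sample and split into (loc_T0, loc_frac_prim)."""
    quarters = int(math.floor(pitch * 4.0 + 0.5))
    return quarters // 4, quarters % 4
```

For a pitch that is constant over the frame the mapping returns the same value for all 4 secondary subframes; for a rising pitch it returns intermediate values.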
S1402: Calculate the secondary channel closed-loop pitch period reference value.
The secondary channel closed-loop pitch period reference value f_pitch_prim is calculated as follows:
f_pitch_prim = loc_T0 + loc_frac_prim/4.0
S1403: Calculate the upper limit of the secondary channel pitch period index for differential coding.
The secondary channel pitch period index upper limit is calculated as follows:
soft_reuse_index_high_limit = 0.5 + 2^Z
where Z is the secondary channel pitch period search range adjustment factor. In this embodiment, Z may be 3, 4, or 5.
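Steps S1402 and S1403 reduce to two small functions. The sketch reads the exponent term as a power of two, 2^Z, so Z = 3, 4, 5 gives upper limits of 8.5, 16.5, and 32.5; the function names are illustrative.

```python
def closed_loop_reference(loc_t0: int, loc_frac_prim: int) -> float:
    """S1402: f_pitch_prim = loc_T0 + loc_frac_prim / 4.0 (quarter-sample fraction)."""
    return loc_t0 + loc_frac_prim / 4.0


def index_high_limit(z: int) -> float:
    """S1403: soft_reuse_index_high_limit = 0.5 + 2**Z."""
    if z not in (3, 4, 5):
        raise ValueError("Z is 3, 4 or 5 in this embodiment")
    return 0.5 + 2 ** z
```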
S1404: Read the secondary channel pitch period index value soft_reuse_index from the code stream.
S1405: Calculate the pitch period estimate of the secondary channel signal:
T0_pitch = f_pitch_prim + (soft_reuse_index - soft_reuse_index_high_limit/2.0)/4.0
T0 = INT(T0_pitch)
T0_frac = (T0_pitch - T0)*4.0
where INT(T0_pitch) denotes rounding T0_pitch down, T0 is the integer part of the decoded secondary channel pitch period, and T0_frac is the fractional part of the decoded secondary channel pitch period.
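A direct transcription of S1405 is sketched below. Note that, as written, an integer index combined with a quarter-sample f_pitch_prim yields a T0_pitch that sits 1/16 sample off the quarter-sample grid, because soft_reuse_index_high_limit/2.0 equals 2^(Z-1) + 0.25. The text does not state how T0_frac is then rounded, so the sketch returns it unrounded rather than guessing.

```python
import math
from typing import Tuple


def decode_secondary_pitch(f_pitch_prim: float, soft_reuse_index: float,
                           high_limit: float) -> Tuple[float, int, float]:
    """S1405: recover the secondary channel pitch period from soft_reuse_index.

    Returns (T0_pitch, T0, T0_frac): T0 is T0_pitch rounded down and T0_frac is the
    remainder expressed in quarter samples, exactly as in the formulas above.
    """
    t0_pitch = f_pitch_prim + (soft_reuse_index - high_limit / 2.0) / 4.0
    t0 = math.floor(t0_pitch)
    t0_frac = (t0_pitch - t0) * 4.0
    return t0_pitch, t0, t0_frac
```

For example, with f_pitch_prim = 50.5, Z = 3 (upper limit 8.5), and soft_reuse_index = 6, the sketch returns T0_pitch = 50.9375, T0 = 50, and T0_frac = 3.75.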
The foregoing embodiment describes the stereo coding and decoding procedure in the frequency domain. The following describes the application of this embodiment to time-domain stereo coding, in which steps S01 to S07 of the foregoing embodiment are replaced by steps S21 to S26 below. Fig. 9 is a schematic diagram of a time-domain stereo coding method according to an embodiment of the present application, which specifically includes the following steps.
S21: Perform time-domain preprocessing on the stereo time-domain signal to obtain preprocessed stereo left and right channel signals.
If the sampling rate of the stereo audio signal is 16 kHz and one frame is 20 ms, the frame length N is 320, that is, 320 samples. The stereo signal of the current frame comprises the left channel time-domain signal x_L(n) and the right channel time-domain signal x_R(n) of the current frame, where n is the sample index, n = 0, 1, …, N-1.
The time-domain preprocessing may specifically include high-pass filtering the left and right channel time-domain signals of the current frame to obtain the preprocessed left and right channel time-domain signals of the current frame. The symbols for the preprocessed left and right channel time-domain signals are given as formula images in the original (Figure BDA0002113272790000341 and Figure BDA0002113272790000342, respectively), where n is the sample index, n = 0, 1, …, N-1.
It will be appreciated that time-domain preprocessing of the left and right channel time-domain signals of the current frame is optional. Without it, the left and right channel signals used for delay estimation are those of the original stereo signal, that is, the PCM signals acquired after A/D conversion. The sampling rate of the signal may be 8 kHz, 16 kHz, 32 kHz, 44.1 kHz, or 48 kHz.
In addition to the high-pass filtering described in this embodiment, the preprocessing may include other processing, such as pre-emphasis, which is not limited in this embodiment.
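The frame length and a high-pass preprocessing stage can be sketched as follows. The text names high-pass filtering but gives no filter, so the first-order DC-blocking filter with coefficient a = 0.98 is an assumed stand-in, not the codec's actual filter. The filter state is passed in and returned so consecutive frames are filtered continuously.

```python
from typing import List, Tuple


def frame_length(sample_rate_hz: int, frame_ms: int = 20) -> int:
    """Samples per frame, e.g. 16000 Hz * 20 ms -> 320."""
    return sample_rate_hz * frame_ms // 1000


def high_pass(x: List[float], a: float = 0.98,
              state: Tuple[float, float] = (0.0, 0.0)
              ) -> Tuple[List[float], Tuple[float, float]]:
    """First-order DC-blocking filter y[n] = a * (y[n-1] + x[n] - x[n-1]).

    state = (x[-1], y[-1]) from the previous frame; the updated state is returned.
    """
    x_prev, y_prev = state
    y = []
    for sample in x:
        out = a * (y_prev + sample - x_prev)
        y.append(out)
        x_prev, y_prev = sample, out
    return y, (x_prev, y_prev)
```

A constant (DC) input frame decays geometrically towards zero at the filter output, which is the behaviour expected of a high-pass stage.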
S22: Perform delay estimation on the preprocessed left and right channel time-domain signals of the current frame to obtain the estimated inter-channel delay difference of the current frame.
In the simplest approach, the cross-correlation function between the left and right channels is calculated from the preprocessed left and right channel time-domain signals of the current frame, and the position of its maximum is taken as the estimated inter-channel delay difference of the current frame.
Let T_max be the maximum value and T_min the minimum value of the inter-channel delay difference at the current sampling rate. T_max and T_min are preset real numbers with T_max > T_min; in this embodiment, T_max = 40 and T_min = -40. The maximum of the cross-correlation coefficient c(i) between the left and right channels is searched over T_min ≤ i ≤ T_max, and the index corresponding to the maximum is taken as the estimated inter-channel delay difference of the current frame, denoted cur_itd.
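The basic search can be sketched as below. The exact form of c(i) is not given in the text; this sketch assumes an unnormalised cross-correlation c(i) = Σ x_L(n)·x_R(n+i) over overlapping samples, so a positive cur_itd means the right channel lags the left one.

```python
from typing import List

T_MIN, T_MAX = -40, 40  # search range used in this embodiment


def cross_corr(x_l: List[float], x_r: List[float], i: int) -> float:
    """c(i) = sum_n x_l[n] * x_r[n + i] over the overlapping samples (assumed form)."""
    n_len = len(x_l)
    return sum(x_l[n] * x_r[n + i] for n in range(n_len) if 0 <= n + i < n_len)


def estimate_itd(x_l: List[float], x_r: List[float],
                 t_min: int = T_MIN, t_max: int = T_MAX) -> int:
    """Return cur_itd, the lag in [t_min, t_max] that maximises c(i)."""
    return max(range(t_min, t_max + 1), key=lambda i: cross_corr(x_l, x_r, i))
```

A direct double loop like this costs about (T_max - T_min + 1) × N multiply-adds per frame; practical implementations often compute the correlation via FFT, but the brute-force version keeps the search logic explicit.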
Without limitation, there are many specific delay estimation methods in this embodiment. For example, the cross-correlation function between the left and right channels may also be calculated from the preprocessed left and right channel time-domain signals of the current frame, or from the unprocessed left and right channel time-domain signals of the current frame. Long-term smoothing is then applied based on the cross-correlation functions between the left and right channels of the previous L frames (L is an integer greater than or equal to 1) and the cross-correlation function of the current frame, to obtain a smoothed cross-correlation function between the left and right channels. The maximum of the smoothed cross-correlation coefficient is then searched over T_min ≤ i ≤ T_max, and the index corresponding to the maximum is taken as the estimated inter-channel delay difference of the current frame. The method may further include inter-frame smoothing of the inter-channel delay difference based on the previous M frames (M is an integer greater than or equal to 1) and the delay difference estimated for the current frame, with the smoothed value used as the final estimated inter-channel delay difference of the current frame. This embodiment is not limited to the above delay estimation methods.
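The text does not specify the long-term smoothing rule. The sketch below assumes first-order recursive (exponential) smoothing, which summarises all previous frames in a single state vector, followed by the same maximum search; `alpha` and the helper names are hypothetical.

```python
from typing import List, Optional


def smooth_xcorr(current: List[float], previous: Optional[List[float]],
                 alpha: float = 0.75) -> List[float]:
    """First-order recursive smoothing of per-lag cross-correlation values.

    previous is the smoothed function from earlier frames (None for the first frame).
    """
    if previous is None:
        return list(current)
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(previous, current)]


def argmax_lag(values: List[float], t_min: int) -> int:
    """Map the position of the maximum back to a lag value i = t_min + position."""
    best = max(range(len(values)), key=values.__getitem__)
    return t_min + best
```

A larger alpha gives a more stable delay estimate across frames at the cost of reacting more slowly when the true delay changes.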
That is, the estimated inter-channel delay difference of the current frame is the index at which the cross-correlation coefficient c(i) between the left and right channels reaches its maximum over T_min ≤ i ≤ T_max.
S23: Perform delay alignment on the stereo left and right channel signals according to the estimated inter-channel delay difference of the current frame, to obtain a delay-aligned stereo signal.
For example, one or both of the stereo left and right channel signals are compressed or stretched according to the estimated inter-channel delay difference of the current frame and the inter-channel delay difference of the previous frame, so that no inter-channel delay difference remains between the two channels of the delay-aligned stereo signal. This embodiment is not limited to the above delay alignment method.
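A much simplified alignment is sketched below: the lagging channel is advanced by an integer number of samples and the vacated samples are zero-filled. This is an assumed illustration of the idea only; it does not implement the compression or stretching described above, nor does it use the previous frame's delay or look-ahead samples. The sign convention matches a cross-correlation in which a positive delay means the right channel lags.

```python
from typing import List, Tuple


def align_by_shift(x_l: List[float], x_r: List[float],
                   itd: int) -> Tuple[List[float], List[float]]:
    """Integer-shift delay alignment (simplified stand-in).

    A positive itd means the right channel lags the left one, so the right channel
    is advanced by itd samples; a negative itd advances the left channel instead.
    Vacated samples at the end of the frame are zero-filled.
    """
    if itd > 0:
        return list(x_l), x_r[itd:] + [0.0] * itd
    if itd < 0:
        return x_l[-itd:] + [0.0] * (-itd), list(x_r)
    return list(x_l), list(x_r)
```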
The delay-aligned left channel time-domain signal of the current frame is denoted x'_L(n), and the delay-aligned right channel time-domain signal of the current frame is denoted x'_R(n), where n is the sample index, n = 0, 1, …, N-1.
S24: Quantize and encode the estimated inter-channel delay difference of the current frame.
The inter-channel delay difference can be quantized in various ways. For example, the inter-channel delay difference estimated for the current frame is quantized to obtain a quantization index, which is then encoded and written into the code stream.
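One simple way to quantize an integer delay, assumed here for illustration, is to clip it to [T_min, T_max] and offset it by -T_min, giving indices 0 to 80 that fit in a 7-bit fixed-length code. The text does not say which scheme or bit count the codec actually uses.

```python
import math

T_MIN, T_MAX = -40, 40


def quantize_itd(cur_itd: int, t_min: int = T_MIN, t_max: int = T_MAX) -> int:
    """Map an integer delay in [t_min, t_max] to a non-negative index (assumed scheme)."""
    clipped = max(t_min, min(t_max, cur_itd))
    return clipped - t_min


def dequantize_itd(index: int, t_min: int = T_MIN) -> int:
    """Inverse mapping used at the decoder."""
    return index + t_min


def itd_index_bits(t_min: int = T_MIN, t_max: int = T_MAX) -> int:
    """Bits needed for a fixed-length code of all t_max - t_min + 1 indices."""
    return math.ceil(math.log2(t_max - t_min + 1))
```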
S25: Calculate the channel combination scale factor from the delay-aligned stereo signal, quantize and encode it, and write the quantized result into the code stream.
The channel combination scale factor can be calculated in many ways; one method used in this embodiment is as follows. First, the frame energies of the left and right channels are calculated from the delay-aligned left and right channel time-domain signals of the current frame.
The frame energy rms_L of the left channel of the current frame satisfies the equation given as Figure BDA0002113272790000351 in the original.
The frame energy rms_R of the right channel of the current frame satisfies the equation given as Figure BDA0002113272790000352 in the original,
where x'_L(n) is the delay-aligned left channel time-domain signal of the current frame and x'_R(n) is the delay-aligned right channel time-domain signal of the current frame.
Then, the channel combination scale factor of the current frame is calculated from the frame energies of the left and right channels.
The calculated channel combination scale factor ratio of the current frame satisfies the equation given as Figure BDA0002113272790000353 in the original.
Finally, the calculated channel combination scale factor of the current frame is quantized to obtain the corresponding quantization index ratio_idx and the quantized channel combination scale factor ratio_qua of the current frame:
ratio_qua = ratio_table[ratio_idx]
where ratio_table is a scalar quantization codebook. Any scalar quantization method in the embodiments of the present application may be used, such as uniform or non-uniform scalar quantization, and the number of coding bits may be 5; the specific methods are not described here.
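As an illustration of 5-bit uniform scalar quantization, the sketch below assumes a codebook of 32 equally spaced levels over [0, 1] and picks the nearest level. The actual contents of ratio_table are not given in the text.

```python
from typing import List, Tuple


def uniform_ratio_table(bits: int = 5) -> List[float]:
    """Uniform scalar codebook with 2**bits levels spanning [0, 1] (assumed layout)."""
    levels = 2 ** bits
    return [k / (levels - 1) for k in range(levels)]


def quantize_ratio(ratio: float, ratio_table: List[float]) -> Tuple[int, float]:
    """Nearest-neighbour scalar quantisation: returns (ratio_idx, ratio_qua)."""
    ratio_idx = min(range(len(ratio_table)), key=lambda k: abs(ratio_table[k] - ratio))
    return ratio_idx, ratio_table[ratio_idx]
```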
The embodiment of the present application is not limited to the above-described channel combination scale factor calculation and quantization encoding method.
S26: Perform time-domain downmixing on the delay-aligned stereo signal according to the channel combination scale factor to obtain the primary channel signal and the secondary channel signal.
Specifically, any time-domain downmix method in the embodiments of the present application may be used. Note, however, that the time-domain downmix method must be chosen to match the method used to calculate the channel combination scale factor when downmixing the delay-aligned stereo signal into the primary channel signal and the secondary channel signal.
For example, for the channel combination scale factor calculation method described in step S25 above, time-domain downmixing is performed according to the channel combination scale factor ratio, and the primary channel signal Y(n) and the secondary channel signal X(n) obtained by the time-domain downmix corresponding to the first channel combination scheme satisfy the equation given as Figure BDA0002113272790000361 in the original.
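The downmix equation is given only as an image, so the sketch below uses a hypothetical weighted sum/difference form in which ratio weights the left channel in the primary signal. It is chosen because it reduces to the familiar mid/side downmix at ratio = 0.5, not because it is the formula in the figure.

```python
from typing import List, Tuple


def time_domain_downmix(x_l: List[float], x_r: List[float],
                        ratio: float) -> Tuple[List[float], List[float]]:
    """Weighted sum/difference downmix (hypothetical form).

    Y(n) = ratio * x_l(n) + (1 - ratio) * x_r(n)   -> primary channel
    X(n) = (1 - ratio) * x_l(n) - ratio * x_r(n)   -> secondary channel
    With ratio = 0.5 this reduces to mid/side: Y = (L + R) / 2, X = (L - R) / 2.
    """
    y = [ratio * left + (1.0 - ratio) * right for left, right in zip(x_l, x_r)]
    x = [(1.0 - ratio) * left - ratio * right for left, right in zip(x_l, x_r)]
    return y, x
```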
The embodiments of the present application are not limited to the above time-domain downmix method.
S27: Differentially encode the pitch period of the secondary channel signal.
For details of step S27, refer to steps S10 to S12 in the foregoing embodiment; they are not repeated here.
As the foregoing example shows, this embodiment determines whether to differentially encode the pitch period of the secondary channel signal, and when differential coding is used, the coding overhead for the secondary channel pitch period is reduced.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
To facilitate better implementation of the above-described aspects of the embodiments of the present application, the following also provides relevant means for implementing the above-described aspects.
Referring to Fig. 10, a stereo encoding apparatus 1000 according to an embodiment of the present application may include a downmix module 1001, a determining module 1002, and a differential encoding module 1003, where:
a downmix module 1001, configured to perform downmix processing on a left channel signal of a current frame and a right channel signal of the current frame to obtain a primary channel signal of the current frame and a secondary channel signal of the current frame;
a determining module 1002, configured to determine whether to differentially encode a pitch period of the secondary channel signal;
a differential coding module 1003, configured to, when it is determined to perform differential coding on the pitch period of the secondary channel signal, perform differential coding on the pitch period of the secondary channel signal by using the pitch period estimated value of the primary channel signal to obtain a pitch period index value of the secondary channel signal, where the pitch period index value of the secondary channel signal is used to generate a stereo coded code stream to be sent.
In some embodiments of the present application, the determining module comprises:
a primary channel coding module, configured to encode the primary channel signal of the current frame to obtain the pitch period estimate of the primary channel signal;
an open-loop analysis module, configured to perform open-loop pitch period analysis on the secondary channel signal of the current frame to obtain an open-loop pitch period estimation value of the secondary channel signal;
a threshold judging module, configured to judge whether a difference between the pitch period estimation value of the primary channel signal and the open-loop pitch period estimation value of the secondary channel signal exceeds a preset secondary channel pitch period differential coding threshold, determine to perform differential coding when the difference exceeds the secondary channel pitch period differential coding threshold, and determine not to perform differential coding when the difference does not exceed the secondary channel pitch period differential coding threshold.
In some embodiments of the present application, the stereo encoding apparatus further includes an identifier configuration module, configured to: when it is determined to differentially encode the pitch period of the secondary channel signal, set the secondary channel pitch period differential coding identifier in the current frame to a preset first value, where the stereo coded code stream carries this identifier and the first value indicates that the pitch period of the secondary channel signal is differentially encoded.
In some embodiments of the present application, the stereo encoding apparatus further includes an independent encoding module, configured to encode the pitch period of the secondary channel signal and the pitch period of the primary channel signal separately when it is determined that the pitch period of the secondary channel signal is not differentially encoded and the pitch period estimate of the primary channel signal is not multiplexed as the pitch period of the secondary channel signal.
Further, in some embodiments of the present application, the identifier configuring module is further configured to configure, when it is determined that the pitch period of the secondary channel signal is not differentially encoded, the secondary channel pitch period differential encoding identifier as a preset second value, where the secondary channel pitch period differential encoding identifier is carried in the stereo encoded code stream, and the second value is used to indicate that the pitch period of the secondary channel signal is not differentially encoded; when determining that the pitch period estimated value of the primary channel signal is not multiplexed as the pitch period of the secondary channel signal, configuring a pitch period multiplexing identifier of the secondary channel signal as a preset third value, wherein the stereo coded code stream carries the pitch period multiplexing identifier of the secondary channel signal, and the third value is used for indicating that the pitch period estimated value of the primary channel signal is not multiplexed as the pitch period of the secondary channel signal;
the independent coding module is configured to code a pitch period of the secondary channel signal and a pitch period of the primary channel signal, respectively.
In some embodiments of the present application, the identifier configuration module is configured to configure a secondary channel signal pitch period multiplexing identifier as a preset fourth value when it is determined that pitch periods of the secondary channel signal are not differentially encoded and a pitch period estimated value of the primary channel signal is multiplexed as a pitch period of the secondary channel signal, and carry the secondary channel signal pitch period multiplexing identifier in the stereo coded code stream, where the fourth value is used to indicate that a pitch period estimated value of the primary channel signal is multiplexed as a pitch period of the secondary channel signal.
Further, in some embodiments of the present application, the identifier configuring module is configured to: when it is determined that the pitch period of the secondary channel signal is not differentially encoded, set the secondary channel pitch period differential coding identifier to a preset second value, where the stereo coded code stream carries this identifier and the second value indicates that the pitch period of the secondary channel signal is not differentially encoded; and when it is determined to multiplex the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal, set the secondary channel signal pitch period multiplexing identifier to a preset fourth value and carry it in the stereo coded code stream, where the fourth value indicates that the pitch period estimate of the primary channel signal is multiplexed as the pitch period of the secondary channel signal.
In some embodiments of the present application, the differential encoding module includes:
a closed-loop pitch period searching module, configured to perform closed-loop pitch period search on a secondary channel according to the pitch period estimation value of the primary channel signal, so as to obtain a pitch period estimation value of the secondary channel signal;
an index value upper limit determining module, configured to determine a pitch period index value upper limit of the secondary channel signal according to the pitch period search range adjustment factor of the secondary channel signal;
an index value calculating module, configured to calculate the pitch period index value of the secondary channel signal according to the pitch period estimate of the primary channel signal, the pitch period estimate of the secondary channel signal, and the pitch period index value upper limit of the secondary channel signal.
In some embodiments of the present application, the closed-loop pitch period searching module is configured to determine a closed-loop pitch period reference value of the secondary channel signal according to the pitch period estimation value of the primary channel signal and the number of divided subframes of the secondary channel signal of the current frame; and using the closed-loop pitch period reference value of the secondary channel signal as a starting point of the closed-loop pitch period search of the secondary channel signal, and performing the closed-loop pitch period search by adopting integer precision and fractional precision to obtain a pitch period estimation value of the secondary channel signal.
In some embodiments of the application, the closed-loop pitch period searching module is configured to determine the closed-loop pitch period integer part loc_T0 and the closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal from the pitch period estimate of the primary channel signal, and to calculate the closed-loop pitch period reference value f_pitch_prim of the secondary channel signal as follows:
f_pitch_prim = loc_T0 + loc_frac_prim/N;
where N is the number of subframes into which the secondary channel signal is divided.
In some embodiments of the present application, the index value upper limit determining module is configured to calculate the pitch period index value upper limit soft_reuse_index_high_limit of the secondary channel signal as follows:
soft_reuse_index_high_limit = 0.5 + 2^Z
where Z is the pitch period search range adjustment factor of the secondary channel signal, and Z is 3, 4, or 5.
In some embodiments of the application, the index value calculation module is configured to determine the closed-loop pitch period integer part loc_T0 and the closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal from the pitch period estimate of the primary channel signal, and to calculate the pitch period index value soft_reuse_index of the secondary channel signal as follows:
soft_reuse_index = (N*pitch_soft_reuse + pitch_frac_soft_reuse) - (N*loc_T0 + loc_frac_prim) + soft_reuse_index_high_limit/M;
where pitch_soft_reuse is the integer part of the pitch period estimate of the secondary channel signal, pitch_frac_soft_reuse is the fractional part of the pitch period estimate of the secondary channel signal, soft_reuse_index_high_limit is the pitch period index value upper limit of the secondary channel signal, N is the number of subframes into which the secondary channel signal is divided, M is an adjustment factor for the pitch period index value upper limit of the secondary channel signal and is a non-zero real number, * denotes multiplication, / denotes division, + denotes addition, and - denotes subtraction.
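The soft_reuse_index formula and the decoder formula T0_pitch = f_pitch_prim + (soft_reuse_index - soft_reuse_index_high_limit/M)/N are exact inverses in real arithmetic, which the sketch below demonstrates. The defaults N = 4 and M = 2 match the concrete S1405 decoder formula. The text does not say how the index is turned into an integer for transmission, so the sketch keeps it as a float; the function names are illustrative.

```python
def encode_soft_reuse_index(pitch_soft_reuse: int, pitch_frac_soft_reuse: int,
                            loc_t0: int, loc_frac_prim: int,
                            high_limit: float, n: int = 4, m: float = 2.0) -> float:
    """soft_reuse_index as written in the formula (no integer rounding applied)."""
    return ((n * pitch_soft_reuse + pitch_frac_soft_reuse)
            - (n * loc_t0 + loc_frac_prim)
            + high_limit / m)


def decode_t0_pitch(loc_t0: int, loc_frac_prim: int, soft_reuse_index: float,
                    high_limit: float, n: int = 4, m: float = 2.0) -> float:
    """Decoder counterpart: T0_pitch = f_pitch_prim + (index - high_limit / M) / N."""
    f_pitch_prim = loc_t0 + loc_frac_prim / n
    return f_pitch_prim + (soft_reuse_index - high_limit / m) / n


# Secondary estimate 51 + 1/4, primary-derived reference 50 + 2/4, Z = 3 (limit 8.5).
idx = encode_soft_reuse_index(51, 1, 50, 2, 8.5)   # 7.25
print(idx, decode_t0_pitch(50, 2, idx, 8.5))       # 7.25 51.25
```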
In some embodiments of the present application, the stereo encoding apparatus is applied to stereo encoding scenarios in which the encoding rate of the current frame is lower than a preset rate threshold;
the rate threshold is at least one of the following values: 13.2 kilobits per second (kbps), 16.4 kbps, or 24.4 kbps.
Referring to Fig. 11, a stereo decoding apparatus 1100 according to an embodiment of the present application may include a determining module 1101, a value obtaining module 1102, and a differential decoding module 1103, where:
a determining module 1101, configured to determine whether to perform differential decoding on a pitch period of the secondary channel signal according to the received stereo coded code stream;
a value obtaining module 1102, configured to obtain, from the stereo coded code stream, a pitch period estimation value of a primary channel signal of a current frame and a pitch period index value of a secondary channel signal of the current frame when it is determined to perform differential decoding on the pitch period of the secondary channel signal;
a differential decoding module 1103, configured to perform differential decoding on the pitch period of the secondary channel signal according to the pitch period estimation value of the primary channel signal and the pitch period index value of the secondary channel signal, so as to obtain a pitch period estimation value of the secondary channel signal, where the pitch period estimation value of the secondary channel signal is used to decode the stereo encoded code stream.
In some embodiments of the present application, the determining module is configured to obtain a secondary channel pitch period differential coding identifier from the current frame; and when the secondary channel pitch period differential coding identifier is a preset first value, determining to perform differential decoding on the pitch period of the secondary channel signal.
In some embodiments of the present application, the stereo decoding apparatus further includes an independent decoding module, configured to decode the pitch period of the secondary channel signal from the stereo coded code stream when it is determined that the pitch period of the secondary channel signal is not differentially decoded and the pitch period estimate of the primary channel signal is not multiplexed as the pitch period of the secondary channel signal.
Further, the independent decoding module is configured to determine that differential decoding is not performed on the pitch period of the secondary channel signal and a pitch period estimation value of the primary channel signal is not multiplexed as the pitch period of the secondary channel signal when the secondary channel pitch period differential coding identifier is a preset second value and the secondary channel signal pitch period multiplexing identifier carried in the stereo coding code stream is a preset third value, and decode the pitch period of the secondary channel signal from the stereo coding code stream.
In some embodiments of the present application, the stereo decoding apparatus further includes a pitch period multiplexing module, configured to use the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal when it is determined that the pitch period of the secondary channel signal is not differentially decoded and the pitch period estimate of the primary channel signal is multiplexed as the pitch period of the secondary channel signal.
Further, the pitch period multiplexing module is configured to determine not to perform differential decoding on the pitch period of the secondary channel signal when the secondary channel pitch period differential coding identifier is a preset second value and the secondary channel signal pitch period multiplexing identifier carried in the stereo coding code stream is a preset fourth value, and use the pitch period estimated value of the primary channel signal as the pitch period of the secondary channel signal.
In some embodiments of the present application, the differential decoding module includes:
a reference value determining submodule, configured to determine a closed-loop pitch period reference value of the secondary channel signal according to the pitch period estimation value of the primary channel signal and the number of divided subframes of the secondary channel signal of the current frame;
an index value upper limit determining submodule, configured to determine a pitch period index value upper limit of the secondary channel signal according to the pitch period search range adjustment factor of the secondary channel signal;
an estimate calculating submodule, configured to calculate the pitch period estimate of the secondary channel signal according to the closed-loop pitch period reference value of the secondary channel signal, the pitch period index value of the secondary channel signal, and the pitch period index value upper limit of the secondary channel signal.
In some embodiments of the present application, the estimated value calculating submodule is configured to calculate the pitch period estimated value T0_pitch of the secondary channel signal as follows:
T0_pitch=f_pitch_prim+(soft_reuse_index-soft_reuse_index_high_limit/M)/N;
where f_pitch_prim represents the closed-loop pitch period reference value of the secondary channel signal, soft_reuse_index represents the pitch period index value of the secondary channel signal, soft_reuse_index_high_limit represents the pitch period index value upper limit of the secondary channel signal, N represents the number of subframes into which the secondary channel signal is divided, M represents the adjustment factor for the pitch period index value upper limit of the secondary channel signal and is a non-zero real number, / represents division, + represents addition, and - represents subtraction.
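The decoding formula above can be checked with a short Python sketch. The function and argument names are illustrative; `soft_reuse_index_high_limit` is passed in directly, and the example values (N = 4, M = 2, upper limit 8) are assumptions chosen so that the arithmetic is exact, not values taken from the application.

```python
def decode_secondary_pitch(f_pitch_prim: float,
                           soft_reuse_index: float,
                           soft_reuse_index_high_limit: float,
                           n: int,
                           m: float) -> float:
    """Return T0_pitch = f_pitch_prim + (soft_reuse_index - high_limit / M) / N."""
    if n <= 0:
        raise ValueError("N (number of subframes) must be positive")
    if m == 0:
        raise ValueError("M must be a non-zero real number")
    # Remove the high_limit / M offset that the encoder added to the index,
    # then scale the remaining difference by 1/N.
    offset = soft_reuse_index - soft_reuse_index_high_limit / m
    return f_pitch_prim + offset / n
```

For example, with f_pitch_prim = 57.25, soft_reuse_index = 6, an upper limit of 8, N = 4 and M = 2, the offset is 6 - 8/2 = 2, so T0_pitch = 57.25 + 2/4 = 57.75. An index equal to high_limit/M reproduces the reference value unchanged.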
As can be seen from the foregoing description of the embodiments, because the pitch period of the secondary channel signal is differentially encoded using the pitch period estimated value of the primary channel signal, only a small number of bits need to be allocated to the pitch period of the secondary channel signal, and differentially encoding that pitch period improves the spatial impression and sound image stability of the stereo signal. In addition, because the differential coding of the secondary channel pitch period uses fewer bits, the saved bits can be allocated to other stereo coding parameters, which further improves the coding efficiency of the secondary channel and ultimately improves the overall stereo coding quality. Furthermore, when the pitch period of the secondary channel signal can be differentially decoded, it is differentially decoded using the pitch period estimated value of the primary channel signal and the pitch period index value of the secondary channel signal to obtain the pitch period estimated value of the secondary channel signal. The stereo coding code stream is then decoded using this estimated value, which improves the spatial impression and sound image stability of the stereo signal.
It should be noted that the information exchange and execution processes between the modules/units of the apparatus are based on the same concept as the method embodiments of the present application and bring the same technical effects. For specific details, refer to the description of the foregoing method embodiments of the present application; the details are not repeated here.
An embodiment of the present application further provides a computer storage medium. The computer storage medium stores a program, and when the program is executed, it performs some or all of the steps described in the foregoing method embodiments.
Referring to fig. 12, a stereo encoding apparatus 1200 according to another embodiment of the present invention is described below, including:
a receiver 1201, a transmitter 1202, a processor 1203 and a memory 1204 (wherein the number of processors 1203 in the stereo encoding apparatus 1200 may be one or more, and one processor is taken as an example in fig. 12). In some embodiments of the present application, the receiver 1201, the transmitter 1202, the processor 1203 and the memory 1204 may be connected by a bus or other means, wherein fig. 12 illustrates the connection by a bus.
The memory 1204 may include both read-only memory and random access memory, and provides instructions and data to the processor 1203. A portion of the memory 1204 may also include non-volatile random access memory (NVRAM). The memory 1204 stores an operating system and operating instructions, executable modules or data structures, or subsets thereof, or expanded sets thereof, wherein the operating instructions may include various operating instructions for performing various operations. The operating system may include various system programs for implementing various basic services and for handling hardware-based tasks.
The processor 1203 controls the operation of the stereo encoding apparatus, and the processor 1203 may also be referred to as a Central Processing Unit (CPU). In a specific application, the components of the stereo encoding apparatus are coupled together by a bus system, wherein the bus system may include a power bus, a control bus, a status signal bus, etc. in addition to a data bus. For clarity of illustration, the various buses are referred to in the figures as a bus system.
The methods disclosed in the embodiments of the present application may be applied to, or implemented by, the processor 1203. The processor 1203 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 1203 or by instructions in the form of software. The processor 1203 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed with reference to the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1204; the processor 1203 reads information from the memory 1204 and completes the steps of the foregoing method in combination with its hardware.
The receiver 1201 may be configured to receive input digit or character information and to generate signal inputs related to settings and function control of the stereo encoding apparatus. The transmitter 1202 may be configured to output digit or character information through an external interface, and may include a display device such as a display screen.
In this embodiment, the processor 1203 is configured to execute the stereo encoding method performed by the stereo encoding apparatus shown in fig. 4 in the foregoing embodiment.
Referring to fig. 13, a stereo decoding apparatus 1300 according to another embodiment of the present application is described, including:
a receiver 1301, a transmitter 1302, a processor 1303 and a memory 1304 (wherein the number of the processors 1303 in the stereo decoding apparatus 1300 may be one or more, and one processor is taken as an example in fig. 13). In some embodiments of the present application, the receiver 1301, the transmitter 1302, the processor 1303 and the memory 1304 may be connected by a bus or other means, wherein fig. 13 illustrates the connection by a bus.
The memory 1304 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1303. A portion of the memory 1304 may also include NVRAM. The memory 1304 stores an operating system and operating instructions, executable modules or data structures, or subsets thereof, or expanded sets thereof, wherein the operating instructions may include various operating instructions for performing various operations. The operating system may include various system programs for implementing various basic services and for handling hardware-based tasks.
The processor 1303 controls the operation of the stereo decoding apparatus, and the processor 1303 may also be referred to as a CPU. In a specific application, the components of the stereo decoding apparatus are coupled together by a bus system, wherein the bus system may include a power bus, a control bus, a status signal bus, and the like, in addition to a data bus. For clarity of illustration, the various buses are referred to in the figures as a bus system.
The methods disclosed in the embodiments of the present application may be applied to, or implemented by, the processor 1303. The processor 1303 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 1303 or by instructions in the form of software. The processor 1303 may be a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed with reference to the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1304; the processor 1303 reads information from the memory 1304 and completes the steps of the foregoing method in combination with its hardware.
In this embodiment, the processor 1303 is configured to execute the stereo decoding method performed by the stereo decoding apparatus shown in fig. 4 in the foregoing embodiment.
In another possible design, when the stereo encoding apparatus or the stereo decoding apparatus is a chip within a terminal, the chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute computer-executable instructions stored in a storage unit, so that the chip within the terminal performs the stereo encoding method or the stereo decoding method of any one of the foregoing aspects. Optionally, the storage unit is a storage unit within the chip, such as a register or a cache; alternatively, it may be a storage unit in the terminal located outside the chip, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM).
Any processor mentioned above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control execution of programs of the methods of the first or second aspect.
It should be noted that the above-described embodiments of the apparatus are merely schematic, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiments of the apparatus provided in the present application, the connection relationship between the modules indicates that there is a communication connection therebetween, and may be implemented as one or more communication buses or signal lines.
Based on the foregoing description of the embodiments, a person skilled in the art can clearly understand that the present application may be implemented by software plus necessary general-purpose hardware, or by dedicated hardware, including an application-specific integrated circuit, a dedicated CPU, a dedicated memory, dedicated components, and the like. Generally, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structure used to implement the same function may vary, for example an analog circuit, a digital circuit, or a dedicated circuit. For the present application, however, a software implementation is preferable in most cases. Based on such an understanding, the technical solutions of the present application may be embodied essentially in the form of a software product. The software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented completely or partially in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, or magnetic tape), an optical medium (e.g., DVD), a semiconductor medium (e.g., solid state disk (SSD)), or the like.

Claims (45)

1. A stereo encoding method, comprising:
performing down-mixing processing on a left channel signal of a current frame and a right channel signal of the current frame to obtain a primary channel signal of the current frame and a secondary channel signal of the current frame;
when the pitch period of the secondary channel signal is determined to be differentially encoded, differentially encoding the pitch period of the secondary channel signal by using the pitch period estimated value of the primary channel signal to obtain a pitch period index value of the secondary channel signal, wherein the pitch period index value of the secondary channel signal is used for generating a stereo coding code stream to be transmitted.
2. The method of claim 1, further comprising:
encoding the primary channel signal of the current frame to obtain a pitch period estimated value of the primary channel signal;
performing open-loop pitch period analysis on the secondary channel signal of the current frame to obtain an open-loop pitch period estimation value of the secondary channel signal;
determining whether the difference between the pitch period estimated value of the primary channel signal and the open-loop pitch period estimated value of the secondary channel signal exceeds a preset secondary channel pitch period differential coding threshold; and
when the difference exceeds the secondary channel pitch period differential coding threshold, determining to differentially encode the pitch period of the secondary channel signal; or
when the difference does not exceed the secondary channel pitch period differential coding threshold, determining not to differentially encode the pitch period of the secondary channel signal.
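The decision in claim 2 reduces to a threshold comparison. The Python sketch below follows the claim as written (differential coding is chosen when the difference exceeds the threshold); treating the difference as an absolute value, and the function and parameter names, are assumptions.

```python
def should_differentially_encode(primary_pitch_estimate: float,
                                 secondary_open_loop_pitch: float,
                                 threshold: float) -> bool:
    """Decide whether to differentially encode the secondary pitch period."""
    # Assumption: the "difference" is the absolute difference of the two estimates.
    difference = abs(primary_pitch_estimate - secondary_open_loop_pitch)
    # As stated in the claim: encode differentially only when the difference
    # exceeds the threshold; equality counts as "does not exceed".
    return difference > threshold
```

Keeping the comparison strict means a difference exactly equal to the threshold falls into the "does not exceed" branch, matching the wording of the claim.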
3. The method according to claim 1 or 2, wherein when determining to differentially encode a pitch period of the secondary channel signal, the method further comprises:
configuring a secondary channel pitch period differential coding identifier in the current frame as a preset first value, wherein the stereo coding code stream carries the secondary channel pitch period differential coding identifier, and the first value is used for indicating that the pitch period of the secondary channel signal is differentially coded.
4. The method according to any one of claims 1 to 3, further comprising:
when it is determined that the pitch period of the secondary channel signal is not differentially encoded and the pitch period estimate of the primary channel signal is not multiplexed as the pitch period of the secondary channel signal, the pitch period of the secondary channel signal and the pitch period of the primary channel signal are encoded separately.
5. The method according to any one of claims 1 to 3, further comprising:
when it is determined that the pitch period of the secondary channel signal is not differentially encoded and the pitch period estimation value of the primary channel signal is multiplexed as the pitch period of the secondary channel signal, configuring a pitch period multiplexing identifier of the secondary channel signal as a preset fourth value, and carrying the pitch period multiplexing identifier of the secondary channel signal in the stereo coded code stream, where the fourth value is used to indicate that the pitch period estimation value of the primary channel signal is multiplexed as the pitch period of the secondary channel signal.
6. The method according to any one of claims 1 to 5, wherein the differentially encoding the pitch period of the secondary channel signal by using the pitch period estimated value of the primary channel signal to obtain the pitch period index value of the secondary channel signal comprises:
performing closed-loop pitch period search of a secondary channel according to the pitch period estimated value of the primary channel signal to obtain the pitch period estimated value of the secondary channel signal;
determining a pitch period index value upper limit of the secondary channel signal according to a pitch period search range adjustment factor of the secondary channel signal; and
calculating the pitch period index value of the secondary channel signal according to the pitch period estimated value of the primary channel signal, the pitch period estimated value of the secondary channel signal, and the pitch period index value upper limit of the secondary channel signal.
7. The method according to claim 6, wherein said performing a closed-loop pitch search of a secondary channel based on the pitch estimate of the primary channel signal to obtain the pitch estimate of the secondary channel signal comprises:
determining a closed-loop pitch period reference value of the secondary channel signal according to the pitch period estimated value of the primary channel signal and the number of divided subframes of the secondary channel signal of the current frame;
and using the closed-loop pitch period reference value of the secondary channel signal as a starting point of the closed-loop pitch period search of the secondary channel signal, and performing the closed-loop pitch period search by adopting integer precision and fractional precision to obtain a pitch period estimation value of the secondary channel signal.
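Claim 7 describes a closed-loop search that starts from the reference value and proceeds with integer and fractional precision. The Python sketch below shows one way such a two-stage search could be organised. It is not the application's search: the scoring function `score` (in a real codec typically a correlation-type criterion), the integer search half-range, and the use of 1/N as the fractional step are all assumptions.

```python
from typing import Callable


def closed_loop_pitch_search(f_pitch_prim: float,
                             n: int,
                             int_half_range: int,
                             score: Callable[[float], float]) -> float:
    """Two-stage pitch search starting from the reference value f_pitch_prim.

    score(lag) is a hypothetical caller-supplied criterion; larger is better.
    """
    best_lag = f_pitch_prim
    best_score = score(best_lag)

    # Stage 1: integer precision, whole-sample steps around the starting point.
    for i in range(-int_half_range, int_half_range + 1):
        lag = f_pitch_prim + i
        s = score(lag)
        if s > best_score:
            best_lag, best_score = lag, s

    # Stage 2: fractional precision, 1/N-sample steps around the stage-1 winner.
    centre = best_lag
    for j in range(-(n - 1), n):
        lag = centre + j / n
        s = score(lag)
        if s > best_score:
            best_lag, best_score = lag, s

    return best_lag
```

For instance, with a toy score `lambda lag: -abs(lag - 59.5)`, starting from 57.25 with N = 4 and an integer half-range of 3, stage 1 settles on 59.25 and stage 2 refines it to 59.5.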
8. The method according to claim 7, wherein said determining a closed-loop pitch reference value of the secondary channel signal based on the pitch estimate of the primary channel signal and the number of divided subframes of the secondary channel signal of the current frame comprises:
determining a closed-loop pitch period integer part loc_T0 of the secondary channel signal and a closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal according to the pitch period estimated value of the primary channel signal; and
calculating the closed-loop pitch period reference value f_pitch_prim of the secondary channel signal as follows:
f_pitch_prim=loc_T0+loc_frac_prim/N;
wherein the N represents the number of sub-frames into which the secondary channel signal is divided.
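The reference value in claim 8 can be computed as below. The claim does not say how loc_T0 and loc_frac_prim are derived from the primary-channel pitch estimate; the helper `split_pitch` assumes that estimate is a real number quantised to the nearest 1/N step, which is a hypothetical choice for illustration.

```python
import math
from typing import Tuple


def split_pitch(pitch: float, n: int) -> Tuple[int, int]:
    """Hypothetical helper: split a pitch value into an integer part and a
    fractional part expressed in 1/N steps (rounded to the nearest step)."""
    loc_t0 = math.floor(pitch)
    loc_frac_prim = round((pitch - loc_t0) * n)
    if loc_frac_prim == n:  # rounding carried into the next integer sample
        loc_t0, loc_frac_prim = loc_t0 + 1, 0
    return loc_t0, loc_frac_prim


def closed_loop_reference(loc_t0: int, loc_frac_prim: int, n: int) -> float:
    """f_pitch_prim = loc_T0 + loc_frac_prim / N, as in claim 8."""
    return loc_t0 + loc_frac_prim / n
```

With N = 4, `split_pitch(57.25, 4)` returns `(57, 1)` and `closed_loop_reference(57, 1, 4)` returns `57.25`; a value such as 57.99 rounds up into the next integer sample.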
9. The method according to claim 6, wherein said determining the pitch lag index upper limit of the secondary channel signal according to the pitch lag search range adjustment factor of the secondary channel signal comprises:
calculating the pitch period index value upper limit soft_reuse_index_high_limit of the secondary channel signal as follows:
soft_reuse_index_high_limit=0.5+2^Z;
where Z is the pitch period search range adjustment factor of the secondary channel signal.
10. The method of claim 9, wherein Z has a value of 3, or 4, or 5.
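Claims 9 and 10 can be tabulated with a few lines of Python, reading the formula as 0.5 + 2^Z. The integer column is an assumption: since a transmitted index has to be an integer, the 0.5 offset suggests the value is rounded, and truncating 0.5 + 2^Z to an integer yields 2^Z; the application does not state this explicitly.

```python
def soft_reuse_index_high_limit(z: int) -> float:
    """soft_reuse_index_high_limit = 0.5 + 2^Z (claim 9)."""
    if z < 0:
        raise ValueError("Z is expected to be a non-negative integer")
    return 0.5 + 2 ** z


# The values of Z listed in claim 10.
for z in (3, 4, 5):
    limit = soft_reuse_index_high_limit(z)
    # Assumption: the codec truncates the limit to an integer before use.
    print(f"Z={z}: limit={limit}, truncated={int(limit)}")
```

Under this reading, each increment of Z doubles the range of index values, which is why Z acts as a search range adjustment factor.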
11. The method according to claim 6, wherein said calculating a pitch period index value of the secondary channel signal based on the pitch period estimate of the primary channel signal, the pitch period estimate of the secondary channel signal and a pitch period index value upper limit of the secondary channel signal comprises:
determining a closed-loop pitch period integer part loc_T0 of the secondary channel signal and a closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal according to the pitch period estimated value of the primary channel signal; and
calculating the pitch period index value soft_reuse_index of the secondary channel signal as follows:
soft_reuse_index=(N*pitch_soft_reuse+pitch_frac_soft_reuse)-(N*loc_T0+loc_frac_prim)+soft_reuse_index_high_limit/M;
where pitch_soft_reuse represents the integer part of the pitch period estimated value of the secondary channel signal, pitch_frac_soft_reuse represents the fractional part of the pitch period estimated value of the secondary channel signal, soft_reuse_index_high_limit represents the pitch period index value upper limit of the secondary channel signal, N represents the number of subframes into which the secondary channel signal is divided, M represents the adjustment factor for the pitch period index value upper limit of the secondary channel signal and is a non-zero real number, * represents multiplication, / represents division, + represents addition, and - represents subtraction.
12. The method according to claim 11, wherein the adjustment factor M for the pitch period index value upper limit of the secondary channel signal is 2 or 3.
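The encoder-side index of claim 11 is algebraically the inverse of the decoder formula in claim 19. The Python sketch below computes it; the function and argument names are illustrative, and the example assumes N = 4, M = 2 (claim 12) and an upper limit already truncated to 8, which is an assumption rather than something the claims specify.

```python
def secondary_pitch_index(pitch_soft_reuse: int,
                          pitch_frac_soft_reuse: int,
                          loc_t0: int,
                          loc_frac_prim: int,
                          soft_reuse_index_high_limit: float,
                          n: int,
                          m: float) -> float:
    """soft_reuse_index as defined in claim 11."""
    if m == 0:
        raise ValueError("M must be a non-zero real number")
    # Secondary and reference pitch, both expressed in units of 1/N
    # (assuming the fractional parts count 1/N steps).
    secondary_steps = n * pitch_soft_reuse + pitch_frac_soft_reuse
    reference_steps = n * loc_t0 + loc_frac_prim
    # Shift by high_limit / M as in the claim.
    return secondary_steps - reference_steps + soft_reuse_index_high_limit / m
```

For a secondary estimate of 57.75 (integer part 57, fractional part 3) and a reference of 57.25 (57, 1), the index is 231 - 229 + 4 = 6. Feeding 6 back into the claim 19 formula with the same N, M and upper limit gives 57.25 + (6 - 4)/4 = 57.75.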
13. The method according to any one of claims 1 to 12, wherein the method is applied to a stereo coding scenario in which the coding rate of the current frame is lower than a preset rate threshold; and
the rate threshold is at least one of the following values: 13.2 kilobits per second (kbps), 16.4 kbps, or 24.4 kbps.
14. A stereo decoding method, comprising:
determining, according to a received stereo coding code stream, whether to differentially decode a pitch period of a secondary channel signal;
when it is determined to differentially decode the pitch period of the secondary channel signal, obtaining a pitch period estimated value of a primary channel of a current frame and a pitch period index value of a secondary channel of the current frame from the stereo coding code stream; and
differentially decoding the pitch period of the secondary channel signal according to the pitch period estimated value of the primary channel and the pitch period index value of the secondary channel to obtain a pitch period estimated value of the secondary channel signal, where the pitch period estimated value of the secondary channel signal is used to decode the stereo coding code stream.
15. The method of claim 14, wherein determining whether to differentially decode a pitch period of the secondary channel signal based on the received stereo encoded code stream comprises:
acquiring a secondary channel pitch period differential coding identifier from the current frame;
and when the secondary channel pitch period differential coding identifier is a preset first value, determining to perform differential decoding on the pitch period of the secondary channel signal.
16. The method of claim 15, further comprising:
when it is determined that the pitch period of the secondary channel signal is not differentially decoded and the pitch period estimated value of the primary channel signal is not multiplexed as the pitch period of the secondary channel signal, decoding the pitch period of the secondary channel signal from the stereo coding code stream.
17. The method of claim 15, further comprising:
when it is determined that the pitch period of the secondary channel signal is not differentially decoded and the pitch period estimated value of the primary channel signal is multiplexed as the pitch period of the secondary channel signal, using the pitch period estimated value of the primary channel signal as the pitch period of the secondary channel signal.
18. The method according to any one of claims 14 to 17, wherein the differentially decoding the pitch period of the secondary channel signal according to the pitch period estimated value of the primary channel and the pitch period index value of the secondary channel comprises:
determining a closed-loop pitch period reference value of the secondary channel signal according to the pitch period estimated value of the primary channel signal and the number of divided subframes of the secondary channel signal of the current frame;
determining a pitch period index value upper limit of the secondary channel signal according to a pitch period search range adjustment factor of the secondary channel signal; and
calculating the pitch period estimated value of the secondary channel signal according to the closed-loop pitch period reference value of the secondary channel signal, the pitch period index value of the secondary channel, and the pitch period index value upper limit of the secondary channel signal.
19. The method according to claim 18, wherein said calculating a pitch lag estimate for the secondary channel signal based on the closed-loop pitch reference value for the secondary channel signal, the pitch index value for the secondary channel signal, and the pitch index upper limit for the secondary channel signal comprises:
calculating the pitch period estimated value T0_pitch of the secondary channel signal as follows:
T0_pitch=f_pitch_prim+(soft_reuse_index-soft_reuse_index_high_limit/M)/N;
where f_pitch_prim represents the closed-loop pitch period reference value of the secondary channel signal, soft_reuse_index represents the pitch period index value of the secondary channel signal, soft_reuse_index_high_limit represents the pitch period index value upper limit of the secondary channel signal, N represents the number of subframes into which the secondary channel signal is divided, M represents the adjustment factor for the pitch period index value upper limit of the secondary channel signal and is a non-zero real number, / represents division, + represents addition, and - represents subtraction.
20. The method according to claim 19, wherein the adjustment factor M for the pitch period index value upper limit of the secondary channel signal is 2 or 3.
21. A stereo encoding apparatus, comprising:
a down-mixing module, configured to perform down-mixing processing on a left channel signal of a current frame and a right channel signal of the current frame to obtain a primary channel signal of the current frame and a secondary channel signal of the current frame;
a differential coding module, configured to: when it is determined to differentially encode the pitch period of the secondary channel signal, differentially encode the pitch period of the secondary channel signal by using the pitch period estimated value of the primary channel signal to obtain a pitch period index value of the secondary channel signal, where the pitch period index value of the secondary channel signal is used to generate a stereo coding code stream to be sent.
22. The apparatus of claim 21, wherein the stereo encoding apparatus further comprises:
a primary channel coding module, configured to encode the primary channel signal of the current frame to obtain a pitch period estimated value of the primary channel signal;
an open-loop analysis module, configured to perform open-loop pitch period analysis on the secondary channel signal of the current frame to obtain an open-loop pitch period estimation value of the secondary channel signal;
a threshold judging module, configured to judge whether a difference between the pitch period estimation value of the primary channel signal and the open-loop pitch period estimation value of the secondary channel signal exceeds a preset secondary channel pitch period differential coding threshold, determine to perform differential coding on the pitch period of the secondary channel signal when the difference exceeds the secondary channel pitch period differential coding threshold, and determine not to perform differential coding on the pitch period of the secondary channel signal when the difference does not exceed the secondary channel pitch period differential coding threshold.
23. The apparatus according to claim 21 or 22, wherein the stereo encoding apparatus further comprises an identifier configuration module, configured to: when it is determined to differentially encode the pitch period of the secondary channel signal, configure a secondary channel pitch period differential coding identifier in the current frame as a preset first value, where the stereo coding code stream carries the secondary channel pitch period differential coding identifier, and the first value indicates that the pitch period of the secondary channel signal is differentially encoded.
24. The apparatus according to any one of claims 21 to 23, wherein the stereo encoding apparatus further comprises an independent encoding module, where
the independent encoding module is configured to separately encode the pitch period of the secondary channel signal and the pitch period of the primary channel signal when it is determined that the pitch period of the secondary channel signal is not differentially encoded and the pitch period estimated value of the primary channel signal is not multiplexed as the pitch period of the secondary channel signal.
25. The apparatus according to any one of claims 21 to 23, wherein the stereo encoding apparatus further comprises an identifier configuration module, configured to: when it is determined that the pitch period of the secondary channel signal is not differentially encoded and the pitch period estimated value of the primary channel signal is multiplexed as the pitch period of the secondary channel signal, configure a secondary channel signal pitch period multiplexing identifier as a preset fourth value and carry the secondary channel signal pitch period multiplexing identifier in the stereo coding code stream, where the fourth value indicates that the pitch period estimated value of the primary channel signal is multiplexed as the pitch period of the secondary channel signal.
26. The apparatus according to any one of claims 21 to 25, wherein the differential encoding module comprises:
a closed-loop pitch period searching module, configured to perform a closed-loop pitch period search on the secondary channel signal according to the pitch period estimate of the primary channel signal, to obtain a pitch period estimate of the secondary channel signal;
an index value upper limit determining module, configured to determine a pitch period index value upper limit of the secondary channel signal according to the pitch period search range adjustment factor of the secondary channel signal; and
an index value calculating module, configured to calculate the pitch period index value of the secondary channel signal according to the pitch period estimate of the primary channel signal, the pitch period estimate of the secondary channel signal, and the pitch period index value upper limit of the secondary channel signal.
27. The apparatus of claim 26, wherein the closed-loop pitch period searching module is configured to: determine a closed-loop pitch period reference value of the secondary channel signal according to the pitch period estimate of the primary channel signal and the number of subframes into which the secondary channel signal of the current frame is divided; and, using the closed-loop pitch period reference value as the starting point, perform the closed-loop pitch period search of the secondary channel signal at integer precision and at fractional precision to obtain the pitch period estimate of the secondary channel signal.
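A minimal Python sketch of the two-stage search in claim 27, under stated assumptions: the `score` callable is a hypothetical stand-in for the encoder's matching criterion (the claim does not say what is maximized), the integer search radius `int_radius` is an illustrative choice, and the fractional step is taken as 1/N to match the `loc_frac_prim/N` term used for the reference value.

```python
from typing import Callable, Tuple


def closed_loop_pitch_search(
    reference: float,
    score: Callable[[float], float],
    n: int,
    int_radius: int = 2,
) -> Tuple[int, int]:
    """Two-stage closed-loop pitch search starting from `reference`.

    `score` is a hypothetical matching criterion (higher is better).
    Returns (integer part, fractional part in units of 1/n).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    start = int(reference)  # truncates toward zero; pitch lags are positive
    # Stage 1: integer precision, candidates start - int_radius .. start + int_radius.
    best_int = max(range(start - int_radius, start + int_radius + 1), key=score)
    # Stage 2: fractional precision, offsets k/n for k in -(n-1) .. n-1 around best_int.
    best_k = max(range(-(n - 1), n), key=lambda k: score(best_int + k / n))
    # Express the result as integer + fraction/n with 0 <= fraction < n.
    total = best_int * n + best_k
    return total // n, total % n
```

For instance, with a toy score peaking at 57.25, N = 4 and a starting reference of 56.5, the integer stage settles on 57 and the fractional stage returns `(57, 1)`, that is 57 + 1/4.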
28. The apparatus according to claim 27, wherein the closed-loop pitch period searching module is configured to determine, based on the pitch period estimate of the primary channel signal, a closed-loop pitch period integer part loc_T0 and a closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal, and to calculate the closed-loop pitch period reference value f_pitch_prim of the secondary channel signal as:
f_pitch_prim=loc_T0+loc_frac_prim/N;
where N represents the number of subframes into which the secondary channel signal is divided.
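The reference value of claim 28 is a single formula. The sketch below evaluates it with Python's `fractions.Fraction` so that 1/N offsets stay exact; the function name is hypothetical, and because the claims do not say how loc_T0 and loc_frac_prim are derived from the primary channel's pitch estimate, they are simply taken as inputs.

```python
from fractions import Fraction


def closed_loop_reference(loc_t0: int, loc_frac_prim: int, n: int) -> Fraction:
    """f_pitch_prim = loc_T0 + loc_frac_prim / N (claim 28), computed exactly."""
    if n <= 0:
        raise ValueError("N must be a positive number of subframes")
    return loc_t0 + Fraction(loc_frac_prim, n)
```

With loc_T0 = 57, loc_frac_prim = 1 and N = 4 this gives 229/4, i.e. 57.25.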
29. The apparatus according to claim 26, wherein the index value upper limit determining module is configured to calculate the pitch period index value upper limit soft_reuse_index_high_limit of the secondary channel signal as:
soft_reuse_index_high_limit=0.5+2^Z;
where Z is the pitch period search range adjustment factor of the secondary channel signal.
30. The apparatus of claim 29, wherein Z is 3, 4, or 5.
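A small sketch of the upper-limit computation of claims 29 and 30. It reads the formula as 0.5 + 2^Z; superscripts are easily lost when patent text is extracted, so treat that reading as an assumption. The function name is hypothetical, and Z is restricted to the values listed in claim 30.

```python
from fractions import Fraction


def pitch_index_upper_limit(z: int) -> Fraction:
    """soft_reuse_index_high_limit = 0.5 + 2**Z (claims 29 and 30).

    Assumes the exponent reading 2**Z; Z is limited to 3, 4, or 5.
    """
    if z not in (3, 4, 5):
        raise ValueError("Z must be 3, 4, or 5")
    return Fraction(1, 2) + 2 ** z
```

Under this reading, Z = 3, 4 and 5 give upper limits of 8.5, 16.5 and 32.5.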
31. The apparatus according to claim 26, wherein the index value calculating module is configured to determine, based on the pitch period estimate of the primary channel signal, a closed-loop pitch period integer part loc_T0 and a closed-loop pitch period fractional part loc_frac_prim of the secondary channel signal, and to calculate the pitch period index value soft_reuse_index of the secondary channel signal as:
soft_reuse_index=(N*pitch_soft_reuse+pitch_frac_soft_reuse)-(N*loc_T0+loc_frac_prim)+soft_reuse_index_high_limit/M;
where pitch_soft_reuse represents the integer part of the pitch period estimate of the secondary channel signal, pitch_frac_soft_reuse represents the fractional part of the pitch period estimate of the secondary channel signal, soft_reuse_index_high_limit represents the pitch period index value upper limit of the secondary channel signal, N represents the number of subframes into which the secondary channel signal is divided, M represents the adjustment factor for the pitch period index value upper limit of the secondary channel signal and is a non-zero real number, * represents multiplication, / represents division, + represents addition, and - represents subtraction.
32. The apparatus of claim 31, wherein the adjustment factor M for the pitch period index value upper limit of the secondary channel signal is 2 or 3.
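The encoder-side index of claim 31 can be sketched as below. Both pitch values are expressed in units of 1/N, differenced, and then offset by soft_reuse_index_high_limit/M. The claims do not say how, or whether, this value is rounded before it is written to the code stream. With M = 2 and an upper limit of 16.5 the offset is 8.25, which is not an integer, so this sketch keeps the exact rational result. The function and parameter names are hypothetical Python spellings of the claim's identifiers.

```python
from fractions import Fraction


def secondary_pitch_index(
    pitch_soft_reuse: int,
    pitch_frac_soft_reuse: int,
    loc_t0: int,
    loc_frac_prim: int,
    n: int,
    high_limit: Fraction,
    m: float,
) -> Fraction:
    """soft_reuse_index per claim 31, kept as an exact rational number."""
    if m == 0:
        raise ValueError("M must be non-zero")
    secondary = n * pitch_soft_reuse + pitch_frac_soft_reuse  # secondary pitch in 1/N units
    reference = n * loc_t0 + loc_frac_prim  # closed-loop reference in 1/N units
    return secondary - reference + Fraction(high_limit) / Fraction(m)
```

For a secondary pitch of 57 + 3/4, a reference of 57 + 1/4, N = 4, an upper limit of 33/2 and M = 2, the index is 2 + 33/4 = 41/4.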
33. The apparatus according to any one of claims 21 to 32, wherein the stereo encoding apparatus is applied to stereo encoding scenarios in which the encoding rate of the current frame is lower than a preset rate threshold;
the rate threshold is at least one of the following values: 13.2 kilobits per second (kbps), 16.4 kbps, or 24.4 kbps.
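The encoder-side signaling of claims 23 to 25, together with the rate condition of claim 33, can be sketched as follows. This is an illustration only. The claims name a "first value" and a "fourth value" but give no numbers, so `FIRST_VALUE`, `FOURTH_VALUE`, the complement values written in the other cases, the dictionary keys, and the default threshold are all hypothetical. The sketch also assumes that the multiplexing identifier is written only when differential coding is not used.

```python
# Hypothetical numeric codes: the claims name a "first value" and a "fourth
# value" but do not give their numbers.
FIRST_VALUE = 1
FOURTH_VALUE = 1
RATE_THRESHOLDS_KBPS = (13.2, 16.4, 24.4)  # candidate thresholds listed in claim 33


def is_low_rate_frame(rate_kbps: float, threshold_kbps: float = 24.4) -> bool:
    """True when the current frame's encoding rate is below the chosen threshold."""
    if threshold_kbps not in RATE_THRESHOLDS_KBPS:
        raise ValueError("threshold must be one of the claim 33 values")
    return rate_kbps < threshold_kbps


def secondary_pitch_flags(differential: bool, reuse_primary: bool) -> dict:
    """Identifiers written for the secondary channel pitch (claims 23 to 25)."""
    if differential:
        # Claim 23: differential coding identifier set to the first value.
        return {"diff_coding_flag": FIRST_VALUE}
    return {
        "diff_coding_flag": 1 - FIRST_VALUE,  # assumed complement of the first value
        # Claim 25: multiplexing identifier set to the fourth value when the
        # primary channel's pitch estimate is reused; complement otherwise (assumed).
        "reuse_flag": FOURTH_VALUE if reuse_primary else 1 - FOURTH_VALUE,
    }
```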
34. A stereo decoding apparatus, comprising:
a determining module, configured to determine, according to a received stereo coded code stream, whether to perform differential decoding on the pitch period of a secondary channel signal;
a value obtaining module, configured to obtain, when it is determined to perform differential decoding on the pitch period of the secondary channel signal, a pitch period estimate of a primary channel of a current frame and a pitch period index value of a secondary channel of the current frame from the stereo coded code stream; and
a differential decoding module, configured to perform differential decoding on the pitch period of the secondary channel signal according to the pitch period estimate of the primary channel and the pitch period index value of the secondary channel, to obtain a pitch period estimate of the secondary channel signal, where the pitch period estimate of the secondary channel signal is used to decode the stereo coded code stream.
35. The apparatus according to claim 34, wherein the determining module is configured to obtain a secondary channel pitch period differential coding identifier from the current frame and, when that identifier is the preset first value, determine to perform differential decoding on the pitch period of the secondary channel signal.
36. The apparatus of claim 35, wherein the stereo decoding apparatus further comprises an independent decoding module, wherein
the independent decoding module is configured to decode the pitch period of the secondary channel signal from the stereo coded code stream when it is determined that the pitch period of the secondary channel signal is not differentially decoded and the pitch period estimate of the primary channel signal is not multiplexed as the pitch period of the secondary channel signal.
37. The apparatus of claim 35, wherein the stereo decoding apparatus further comprises a pitch period multiplexing module, wherein
the pitch period multiplexing module is configured to use the pitch period estimate of the primary channel signal as the pitch period of the secondary channel signal when it is determined that the pitch period of the secondary channel signal is not differentially decoded and the pitch period estimate of the primary channel signal is multiplexed as the pitch period of the secondary channel signal.
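The three decoder paths of claims 35 to 37 amount to a small dispatch on the identifiers carried in the code stream. The sketch below assumes that the decoder reads the same multiplexing identifier that the encoder writes in claim 25; claims 36 and 37 do not state how that determination is made. The numeric codes, the enum, and the function name are hypothetical.

```python
from enum import Enum
from typing import Optional

# Hypothetical numeric codes; the claims only say "first value" and "fourth value".
FIRST_VALUE = 1
FOURTH_VALUE = 1


class SecondaryPitchMode(Enum):
    DIFFERENTIAL = "differential"    # claims 34 and 35
    INDEPENDENT = "independent"      # claim 36
    REUSE_PRIMARY = "reuse_primary"  # claim 37


def secondary_pitch_mode(
    diff_coding_flag: int, reuse_flag: Optional[int] = None
) -> SecondaryPitchMode:
    """Choose how the secondary channel pitch is recovered for the current frame."""
    if diff_coding_flag == FIRST_VALUE:
        return SecondaryPitchMode.DIFFERENTIAL
    if reuse_flag == FOURTH_VALUE:
        # Use the primary channel's pitch estimate as the secondary channel's pitch.
        return SecondaryPitchMode.REUSE_PRIMARY
    # Otherwise decode the secondary channel pitch from the code stream on its own.
    return SecondaryPitchMode.INDEPENDENT
```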
38. The apparatus of any one of claims 34 to 37, wherein the differential decoding module comprises:
a reference value determining submodule, configured to determine a closed-loop pitch period reference value of the secondary channel signal according to the pitch period estimate of the primary channel signal and the number of subframes into which the secondary channel signal of the current frame is divided;
an index value upper limit determining submodule, configured to determine a pitch period index value upper limit of the secondary channel signal according to the pitch period search range adjustment factor of the secondary channel signal; and
an estimated value calculating submodule, configured to calculate the pitch period estimate of the secondary channel signal according to the closed-loop pitch period reference value of the secondary channel signal, the pitch period index value of the secondary channel, and the pitch period index value upper limit of the secondary channel signal.
39. The apparatus according to claim 38, wherein the estimated value calculating submodule is configured to calculate the pitch period estimate T0_pitch of the secondary channel signal as:
T0_pitch=f_pitch_prim+(soft_reuse_index-soft_reuse_index_high_limit/M)/N;
where f_pitch_prim represents the closed-loop pitch period reference value of the secondary channel signal, soft_reuse_index represents the pitch period index value of the secondary channel signal, soft_reuse_index_high_limit represents the pitch period index value upper limit of the secondary channel signal, N represents the number of subframes into which the secondary channel signal is divided, M represents the adjustment factor for the pitch period index value upper limit of the secondary channel signal and is a non-zero real number, / represents division, + represents addition, and - represents subtraction.
40. The apparatus of claim 39, wherein the adjustment factor M for the pitch period index value upper limit of the secondary channel signal is 2 or 3.
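The decoder formula of claim 39 inverts the encoder formula of claim 31. Substituting claim 31's expression for soft_reuse_index causes the soft_reuse_index_high_limit/M terms to cancel. Because f_pitch_prim = loc_T0 + loc_frac_prim/N, what remains is pitch_soft_reuse + pitch_frac_soft_reuse/N. A minimal sketch, using exact rationals and a hypothetical function name:

```python
from fractions import Fraction


def decode_secondary_pitch(
    f_pitch_prim: Fraction,
    soft_reuse_index: Fraction,
    high_limit: Fraction,
    n: int,
    m: float,
) -> Fraction:
    """T0_pitch = f_pitch_prim + (soft_reuse_index - high_limit / M) / N (claim 39)."""
    if n <= 0 or m == 0:
        raise ValueError("N must be positive and M non-zero")
    # Remove the index offset, then convert from 1/N units back to samples.
    offset = (Fraction(soft_reuse_index) - Fraction(high_limit) / Fraction(m)) / n
    return Fraction(f_pitch_prim) + offset
```

For example, with N = 4, M = 2, an upper limit of 33/2, a reference value of 229/4 (57.25) and an index of 41/4, the function returns 231/4 (57.75).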
41. A stereo encoding apparatus, comprising at least one processor, where the at least one processor is coupled to a memory and is configured to read and execute instructions in the memory to implement the method according to any one of claims 1 to 13.
42. The stereo encoding apparatus according to claim 41, further comprising the memory.
43. A stereo decoding apparatus, comprising at least one processor, where the at least one processor is coupled to a memory and is configured to read and execute instructions in the memory to implement the method according to any one of claims 14 to 20.
44. The stereo decoding apparatus according to claim 43, further comprising the memory.
45. A computer-readable storage medium, comprising instructions that, when executed on a computer, cause the computer to perform the method according to any one of claims 1 to 13 or any one of claims 14 to 20.
CN201910581398.5A 2019-06-29 2019-06-29 Stereo coding method, stereo decoding method and device Pending CN112233682A

Priority Applications (5)

Application Number Publication Number Priority Date Filing Date Title
CN201910581398.5A CN112233682A 2019-06-29 2019-06-29 Stereo coding method, stereo decoding method and device
PCT/CN2020/096296 WO2021000723A1 2019-06-29 2020-06-16 Stereo encoding method, stereo decoding method and devices
EP20835190.8A EP3975175A4 2019-06-29 2020-06-16 Stereo encoding method, stereo decoding method and devices
JP2021577947A JP7337966B2 2019-06-29 2020-06-16 Stereo encoding method and apparatus, and stereo decoding method and apparatus
US17/563,538 US20220122619A1 2019-06-29 2021-12-28 Stereo Encoding Method and Apparatus, and Stereo Decoding Method and Apparatus

Applications Claiming Priority (1)

Application Number Publication Number Priority Date Filing Date Title
CN201910581398.5A CN112233682A 2019-06-29 2019-06-29 Stereo coding method, stereo decoding method and device

Publications (1)

Publication Number Publication Date
CN112233682A 2021-01-15

Family

ID=74101099

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN201910581398.5A Pending CN112233682A 2019-06-29 2019-06-29 Stereo coding method, stereo decoding method and device

Country Status (5)

Country Publication
US (1) US20220122619A1
EP (1) EP3975175A4
JP (1) JP7337966B2
CN (1) CN112233682A
WO (1) WO2021000723A1

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112151045A * 2019-06-29 2020-12-29 Huawei Technologies Co., Ltd. Stereo coding method, stereo decoding method and device

Citations (6)

Publication number Priority date Publication date Assignee Title
US20110029304A1 * 2009-08-03 2011-02-03 Broadcom Corporation Hybrid instantaneous/differential pitch period coding
US20130262130A1 * 2010-10-22 2013-10-03 France Telecom Stereo parametric coding/decoding for channels in phase opposition
CN107592937A * 2015-03-09 2018-01-16 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for encoding or decoding a multi-channel signal
CN107731238A * 2016-08-10 2018-02-23 Huawei Technologies Co., Ltd. Multi-channel signal encoding method and encoder
CN108352162A * 2015-09-25 2018-07-31 VoiceAge Corporation Method and system for encoding a stereo sound signal using coding parameters of the primary channel to encode the secondary channel
CN112151045A * 2019-06-29 2020-12-29 Huawei Technologies Co., Ltd. Stereo coding method, stereo decoding method and device

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
SE519985C2 * 2000-09-15 2003-05-06 Ericsson Telefon Ab L M Coding and decoding of signals from multiple channels
JP3453116B2 * 2000-09-26 2003-10-06 Panasonic Mobile Communications Co., Ltd. Audio encoding method and apparatus
US6584437B2 * 2001-06-11 2003-06-24 Nokia Mobile Phones Ltd. Method and apparatus for coding successive pitch periods in speech signal
SE527670C2 * 2003-12-19 2006-05-09 Ericsson Telefon Ab L M Natural fidelity optimized coding with variable frame length
BRPI0516201A * 2004-09-28 2008-08-26 Matsushita Electric Ind Co Ltd Scalable coding apparatus and scalable coding method
CN101069232A * 2004-11-30 2007-11-07 Matsushita Electric Industrial Co., Ltd. Stereo encoding apparatus, stereo decoding apparatus, and their methods
CN101427307B * 2005-09-27 2012-03-07 LG Electronics Inc. Method and apparatus for encoding/decoding multi-channel audio signal
US8090587B2 * 2005-09-27 2012-01-03 Lg Electronics Inc. Method and apparatus for encoding/decoding multi-channel audio signal
EP2264698A4 * 2008-04-04 2012-06-13 Panasonic Corp Stereo signal converter, stereo signal reverse converter, and methods for both
CN110853659B * 2014-03-28 2024-01-05 Samsung Electronics Co., Ltd. Quantization apparatus for encoding an audio signal

Patent Citations (8)

Publication number Priority date Publication date Assignee Title
US20110029304A1 * 2009-08-03 2011-02-03 Broadcom Corporation Hybrid instantaneous/differential pitch period coding
US20130262130A1 * 2010-10-22 2013-10-03 France Telecom Stereo parametric coding/decoding for channels in phase opposition
CN107592937A * 2015-03-09 2018-01-16 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for encoding or decoding a multi-channel signal
CN108352162A * 2015-09-25 2018-07-31 VoiceAge Corporation Method and system for encoding a stereo sound signal using coding parameters of the primary channel to encode the secondary channel
CN108352164A * 2015-09-25 2018-07-31 VoiceAge Corporation Method and system using a long-term correlation difference between left and right channels for time-domain downmixing of a stereo sound signal into primary and secondary channels
US20180233154A1 * 2015-09-25 2018-08-16 Voiceage Corporation Method and system for encoding left and right channels of a stereo sound signal selecting between two and four sub-frames models depending on the bit budget
CN107731238A * 2016-08-10 2018-02-23 Huawei Technologies Co., Ltd. Multi-channel signal encoding method and encoder
CN112151045A * 2019-06-29 2020-12-29 Huawei Technologies Co., Ltd. Stereo coding method, stereo decoding method and device

Non-Patent Citations (1)

Title
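Zhao Yi: "Research on Spatial Audio Coding and Multichannel Audio Recovery Technology", China Master's Theses Full-text Database, Information Science and Technology Series, pages 136 - 135 *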

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN112151045A * 2019-06-29 2020-12-29 Huawei Technologies Co., Ltd. Stereo coding method, stereo decoding method and device
US11887607B2 (en) 2019-06-29 2024-01-30 Huawei Technologies Co., Ltd. Stereo encoding method and apparatus, and stereo decoding method and apparatus

Also Published As

Publication number Publication date
JP2022539571A (en) 2022-09-12
EP3975175A4 (en) 2022-07-20
JP7337966B2 (en) 2023-09-04
US20220122619A1 (en) 2022-04-21
WO2021000723A1 (en) 2021-01-07
EP3975175A1 (en) 2022-03-30

Similar Documents

Publication Publication Date Title
JP7124170B2 (en) Method and system for encoding a stereo audio signal using coding parameters of a primary channel to encode a secondary channel
JP6641018B2 (en) Apparatus and method for estimating time difference between channels
US11664034B2 (en) Optimized coding and decoding of spatialization information for the parametric coding and decoding of a multichannel audio signal
KR101452722B1 (en) Method and apparatus for encoding and decoding signal
EP3776541B1 (en) Apparatus, method or computer program for estimating an inter-channel time difference
US20190013031A1 (en) Audio object separation from mixture signal using object-specific time/frequency resolutions
CN103329197A (en) Improved stereo parametric encoding/decoding for channels in phase opposition
US11341975B2 (en) Apparatus for encoding or decoding an encoded multichannel signal using a filling signal generated by a broad band filter
US11640825B2 (en) Time-domain stereo encoding and decoding method and related product
US11120807B2 (en) Method for determining audio coding/decoding mode and related product
CN110556118B (en) Coding method and device for stereo signal
US11900952B2 (en) Time-domain stereo encoding and decoding method and related product
EP2212883B1 (en) An encoder
WO2021000723A1 (en) Stereo encoding method, stereo decoding method and devices
CN110556117B (en) Coding method and device for stereo signal
US11887607B2 (en) Stereo encoding method and apparatus, and stereo decoding method and apparatus
EP3657498A1 (en) Coding method for time-domain stereo parameter, and related product
US20230051420A1 (en) Switching between stereo coding modes in a multichannel sound codec

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination