WO2006059567A1 - ステレオ符号化装置、ステレオ復号装置、およびこれらの方法 - Google Patents
- Publication number
- WO2006059567A1 (PCT/JP2005/021800; JP2005021800W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- channel signal
- signal
- spatial information
- encoding
- channel
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S5/00—Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
Definitions
- The present invention relates to a stereo encoding apparatus that encodes a stereo signal, a stereo decoding apparatus corresponding to that apparatus, and methods thereof.
- A transmission signal is generally encoded in advance in order to reduce the bit rate of the transmitted information.
- The method of Non-Patent Document 1 provides an adaptive codebook, a fixed codebook, and so on for each of the two channels of the audio signal, so that each channel has its own codebooks.
- A separate driving excitation signal is generated for each channel to produce its synthesized signal. That is, CELP encoding of the audio signal is performed channel by channel, and the resulting encoded information of each channel is output to the decoding side. Encoded information is therefore generated for as many channels as there are, and the amount of encoded information (encoding bit rate) increases.
- An object of the present invention is therefore to provide a stereo coding apparatus, a stereo decoding apparatus, and methods thereof that can reduce the amount of encoded information (encoding bit rate) while preventing deterioration in the sound quality of the decoded signal.
- The stereo coding apparatus of the present invention increases the similarity between the first channel signal and the second channel signal by correcting one or both of the first channel signal and the second channel signal.
- The stereo coding apparatus also includes second encoding means.
- According to the present invention, it is possible to reduce the amount of encoded information (encoding bit rate) while preventing deterioration of the sound quality of the decoded signal.
- FIG. 1 is a functional block diagram of the stereo encoding apparatus according to Embodiment 1.
- FIG. 2 is a diagram showing an example of the waveform spectra of signals obtained by collecting sound from the same source at different positions.
- FIG. 3 is a functional block diagram of the stereo decoding apparatus according to Embodiment 1.
- FIG. 4 is a block diagram showing the main configuration of the stereo speech coding apparatus according to Embodiment 1.
- FIG. 5 is a block diagram showing the main configuration inside the speech encoding section according to Embodiment 1.
- FIG. 6 is a block diagram showing the main configuration inside the spatial information processing section according to Embodiment 1.
- FIG. 7 is a block diagram showing the main configuration of the stereo speech coding apparatus according to Embodiment 2.
- FIG. 8 is a block diagram showing the main configuration of the speech encoding section according to Embodiment 3.
- FIG. 9 is a block diagram showing the main configuration inside the spatial information assigning section according to Embodiment 3.
- FIG. 1 is a diagram for explaining the concept of the stereo encoding method according to Embodiment 1 of the present invention, that is, a functional block diagram of the stereo encoding apparatus according to the present embodiment.
- In this stereo encoding apparatus, the difference in characteristics between the L channel signal and the R channel signal of the stereo signal to be encoded is corrected first. This correction processing increases the similarity between the two channel signals. In the subsequent encoding processing, the two corrected channel signals are encoded using a single excitation common to both channels, and a single encoding parameter (a single set of encoding parameters for the single excitation) is obtained. Because the two corrected channel signals are highly similar to each other, they can be encoded using an excitation common to both channels.
- The signal waveform exhibits different characteristics depending on where the microphone is placed, that is, on the sound collection position.
- The energy of a stereo signal is attenuated according to the distance from the source, its arrival time is delayed accordingly, and the waveform spectrum varies with the sound collection position. Stereo signals are thus greatly affected by spatial factors such as the sound collection environment.
- Fig. 2 shows an example of the waveform spectra of signals collected at two different positions from the same source (L channel signal S_L and R channel signal S_R).
- the L channel signal and the R channel signal exhibit different characteristics.
- This difference in characteristics can be understood as the result of the waveform of the original signal acquiring a new spatial characteristic, which differs with the sound collection position, before the sound is captured by a sound collection device such as a microphone.
- This characteristic is called spatial information in this specification. For example, in the example of FIG. 2, the L channel signal and the R channel signal are generated from the same source but carry different spatial information.
- The difference in the characteristics of the L channel signal and the R channel signal can therefore be corrected by eliminating the difference between the spatial information of the two channels, which brings the waveforms of the two channel signals close to each other.
- The excitation used in the encoding processing can then be shared, and a single encoding parameter (or a single set of encoding parameters) suffices instead of generating encoding parameters for each of the two channel signals. By generating the encoding parameters in this way, encoded information of high accuracy can be obtained.
- Spatial information is information about the space between the sound source and each sound collection device. For example, since the amplitude and phase of each channel signal change depending on the position of the sound collection device, each channel signal can be considered to contain information about the space from the sound source to the sound collection device. It is this spatial information that gives the stereo signal its spread as perceived by the human ear. The same can be considered between channels: for example, the L channel signal can be considered to contain information on the space between the sound collection devices of the L channel and the R channel. Therefore, by manipulating the spatial information contained in each channel signal, the channel signals can be made similar to each other, made similar to the sound source signal, or converted into the signal of some virtual channel.
- The excitation can then be shared between the L channel signal and the R channel signal. The correction applied to the L channel signal and the R channel signal need not be limited to the spatial information; correcting characteristics other than the spatial information can also improve the similarity between the two channels.
- Spatial information analysis section 101, similarity improvement section 102, and channel signal encoding section 103 shown in FIG. 1 realize the above processing by performing the following operations.
- Spatial information analysis section 101 analyzes the spatial information of each of the L channel signal (S_L) and the R channel signal (S_R), and outputs the analysis result to similarity improvement section 102 and spatial information encoding section 104.
- Similarity improvement section 102 corrects the difference between the spatial information of the L channel signal and the R channel signal according to the analysis result output from spatial information analysis section 101, thereby improving the similarity between the L channel signal and the R channel signal.
- Similarity improvement section 102 then outputs the L channel signal (S_L″) and the R channel signal (S_R″) after the similarity improvement to channel signal encoding section 103.
- Channel signal encoding section 103 encodes S_L″ and S_R″ using an excitation common to both channels, and outputs a set of the obtained encoded information (channel signal encoding parameters).
- Spatial information encoding section 104 encodes the analysis result of the spatial information output from spatial information analysis section 101, and outputs the obtained encoded information (spatial information encoding parameters).
- In the above description, both signals (S_L″, S_R″) are output from similarity improvement section 102.
- Alternatively, one of the waveforms of S_L and S_R may be corrected so as to approach the other waveform, so that the output is, for example, S_L″ and S_R; in this case S_R does not pass through similarity improvement section 102 and is input to channel signal encoding section 103 as it is.
- FIG. 3 is a functional block diagram of the stereo decoding apparatus according to the present embodiment corresponding to the above stereo encoding apparatus.
- Spatial information decoding section 151 decodes the spatial information coding parameter and outputs the obtained spatial information to channel signal restoration section 153.
- Channel signal decoding section 152 decodes the channel signal encoding parameters to obtain a single channel signal.
- This channel signal is a signal in which the spatial information of the L channel signal and the R channel signal has been corrected so that the similarity between the two channels is increased, and it is common to the L channel and the R channel.
- This common channel signal is output to channel signal restoration section 153.
- Channel signal restoration section 153 restores the channel signal output from channel signal decoding section 152 into an L channel signal and an R channel signal using the spatial information output from spatial information decoding section 151, and outputs the L channel signal and the R channel signal.
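As an illustration of the restoration performed by channel signal restoration section 153, the following Python sketch inverts a gain/delay spatial-information correction (of the kind used for the energy-ratio and delay-time-difference parameters described later) to rebuild a sub-channel-like signal from the decoded common channel. All names are hypothetical, and frame-edge handling is simplified to zeros.

```python
def restore_channels(common, gain, delay):
    """Hypothetical sketch of channel signal restoration: the decoded common
    channel is taken as the main channel, and the sub channel is recovered by
    inverting x'(t) = gain * x(t - delay), i.e. x(t) = x'(t + delay) / gain."""
    main = list(common)  # main channel: decoded common signal as-is
    n = len(common)
    sub = [(common[t + delay] / gain) if 0 <= t + delay < n else 0.0
           for t in range(n)]  # out-of-range samples zeroed in this sketch
    return main, sub

main, sub = restore_channels([0.0, 2.0, 4.0, 6.0], gain=2.0, delay=1)
```

A real decoder would carry state across frames instead of zero-filling at the frame boundary.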
- As described above, according to the present embodiment, each channel signal of the stereo signal is corrected to improve the similarity between the channel signals, and the channel signals are then encoded using an excitation common to the channels, so the amount of encoded information (encoding bit rate) can be reduced. Also, since the encoding side encodes and outputs the difference in the spatial information of the channels, the decoding side can accurately reproduce each channel signal using this difference.
- The correction may completely remove the spatial information from both channel signals, returning the L channel signal and the R channel signal to the sound source signal (the signal generated by the sound source).
- Alternatively, the arithmetic mean [(L + R)/2] of the L channel signal and the R channel signal may be treated as a pseudo-monaural signal, with both channel signals converted into pseudo-monaural signals by removing predetermined spatial information from each.
- Alternatively, one of the L channel signal and the R channel signal may be taken as a main channel signal and the other as a sub channel signal, and predetermined spatial information removed from the sub channel signal so that it resembles the main channel signal. Since the encoding apparatus has access to both the L channel signal and the R channel signal, this predetermined spatial information, that is, the difference in spatial information between the L channel signal and the R channel signal described above, can be obtained by comparing and analyzing the two channel signals.
- FIG. 4 is a block diagram showing the main configuration of the stereo speech coding apparatus according to the present embodiment, that is, the stereo speech coding apparatus that embodies the concept of the stereo encoding method shown in FIG. 1.
- The first channel audio signal and the second channel audio signal below refer to the L channel audio signal and the R channel audio signal, respectively, or to the audio signals of the opposite channels.
- The stereo speech coding apparatus includes speech encoding section 100, MC selection section 105, and MC selection information encoding section 106.
- Speech encoding section 100 has a configuration corresponding to the entire functional block diagram shown in FIG. 1.
- MC selection section 105 takes one of the input first channel audio signal and second channel audio signal as the main channel and the other as the sub channel, and outputs them to speech encoding section 100 as a main channel signal (MC) and a sub channel signal (SC).
- Speech encoding section 100 first compares and analyzes the main channel signal and the sub channel signal to obtain the difference between the spatial information of the two channels. Next, speech encoding section 100 removes the obtained spatial information difference from the sub channel signal to make it similar to the main channel signal, then encodes the main channel signal and the sub channel signal made similar to it using an excitation common to both channels, and outputs the resulting encoded information (channel signal encoding parameters). Speech encoding section 100 also encodes the obtained difference in spatial information and outputs this encoded information (spatial information encoding parameters).
- MC selection information encoding section 106 encodes MC selection information indicating which channel has been made the main channel in MC selection section 105, and outputs this encoded information (MC selection information encoding parameter).
- The MC selection information encoding parameter is transmitted to the decoding apparatus as encoded information together with the channel signal encoding parameters and the spatial information encoding parameters generated by speech encoding section 100.
- FIG. 5 is a block diagram showing the main configuration inside speech encoding section 100 described above.
- Here, a case where CELP coding is used as the speech signal encoding method is described as an example.
- Speech encoding section 100 is roughly divided into MC encoding section 110-1, which encodes the main channel signal (MC), and SC encoding section 110-2, which encodes the sub channel signal (SC),
- together with spatial information processing section 123 and an adaptive codebook, a fixed codebook, and so on that are common to both channels.
- the spatial information processing unit 123 corresponds to the spatial information analysis unit 101, the similarity improvement unit 102, and the spatial information encoding unit 104 among the functional blocks shown in FIG.
- MC encoding section 110-1 and SC encoding section 110-2 have the same basic internal configuration, although the signals they encode differ. For identical components, the suffixes -1 and -2 after the hyphen indicate MC encoding section 110-1 and SC encoding section 110-2, respectively. Only the configuration on the MC encoding section 110-1 side is described below, and the description of the SC encoding section 110-2 side is basically omitted.
- Speech encoding section 100 encodes the main channel signal and the sub channel signal, each of which consists of vocal tract information and excitation information, by obtaining LPC parameters (linear prediction coefficients) for the vocal tract information and, for the excitation information, by obtaining an index that specifies which of the prestored speech models is to be used, that is, an index that specifies what excitation vectors are generated in adaptive codebook 117 and fixed codebook 118.
- each unit of speech encoding unit 100 performs the following operation.
- First, LPC analysis section 111-1 performs linear prediction analysis on the main channel signal to obtain LPC parameters, which are spectral envelope information, and outputs them to LPC quantization section 112-1 and perceptual weighting section 115-1.
- LPC analysis section 111-2 of SC encoding section 110-2 performs the same processing on the sub channel signal after it has undergone the predetermined processing in spatial information processing section 123; the processing of spatial information processing section 123 is described later.
- LPC quantization section 112-1 quantizes the LPC parameters obtained by LPC analysis section 111-1, outputs the obtained quantized LPC parameters to LPC synthesis filter 113-1, and outputs the index of the quantized LPC parameters (LPC quantization index) as an encoding parameter.
- Adaptive codebook 117 stores the past driving excitations used by LPC synthesis filter 113-1 and LPC synthesis filter 113-2, and generates one subframe of excitation vector from the stored driving excitation according to the index specified by distortion minimizing section 116. This excitation vector is output to multiplier 120 as an adaptive codebook vector.
- Fixed codebook 118 stores in advance a plurality of excitation vectors having predetermined shapes, and outputs the excitation vector corresponding to the index specified by distortion minimizing section 116 to multiplier 121 as a fixed codebook vector.
- Here, adaptive codebook 117 is used to represent components with strong periodicity, such as voiced speech, whereas fixed codebook 118 is used to represent components with weak periodicity, such as white noise.
- In accordance with instructions from distortion minimizing section 116, gain codebook 119 generates a gain for the adaptive codebook vector output from adaptive codebook 117 (adaptive codebook gain) and a gain for the fixed codebook vector output from fixed codebook 118 (fixed codebook gain), and outputs them to multipliers 120 and 121, respectively.
- Multiplier 120 multiplies the adaptive codebook gain output from gain codebook 119 by the adaptive codebook vector output from adaptive codebook 117 and outputs the result to adder 122.
- Multiplier 121 multiplies the fixed codebook gain output from gain codebook 119 by the fixed codebook vector output from fixed codebook 118 and outputs the result to adder 122.
- Adder 122 adds the adaptive codebook vector output from multiplier 120 and the fixed codebook vector output from multiplier 121, and outputs the summed excitation vector as the driving excitation to LPC synthesis filter 113-1 and LPC synthesis filter 113-2.
- LPC synthesis filter 113-1 generates a synthesized signal by filtering, using the quantized LPC parameters output from LPC quantization section 112-1 as filter coefficients and the excitation vector generated by adaptive codebook 117 and fixed codebook 118 as the driving excitation, that is, by applying an LPC synthesis filter. The synthesized signal is output to adder 114-1.
- Adder 114-1 calculates the error signal by subtracting the synthesized signal generated by LPC synthesis filter 113-1 from the main channel signal (in adder 114-2, from the sub channel signal after spatial information removal), and outputs the error signal to perceptual weighting section 115-1. This error signal corresponds to the coding distortion.
- Perceptual weighting section 115-1 applies perceptual weighting to the coding distortion output from adder 114-1, using a perceptual weighting filter whose filter coefficients are the LPC parameters output from LPC analysis section 111-1, and outputs the result to distortion minimizing section 116.
- Distortion minimizing section 116 considers both of the coding distortions output from perceptual weighting section 115-1 and perceptual weighting section 115-2, and, for each subframe, obtains the indices (codebook indices) of adaptive codebook 117, fixed codebook 118, and gain codebook 119 that minimize the sum of the two coding distortions; these indices are output as encoded information.
- More specifically, the coding distortion is expressed as the square of the difference between the original signal to be encoded and the synthesized signal. Accordingly, when the coding distortion output from perceptual weighting section 115-1 is a^2 and the coding distortion output from perceptual weighting section 115-2 is b^2, distortion minimizing section 116 obtains the indices (codebook indices) of adaptive codebook 117, fixed codebook 118, and gain codebook 119 that minimize the sum a^2 + b^2 of these coding distortions.
- The series of processes of generating a synthesized signal based on adaptive codebook 117 and fixed codebook 118 and obtaining the coding distortion of this signal forms a closed loop (feedback loop). Distortion minimizing section 116 searches each codebook by varying the index it instructs to each codebook within one subframe, and outputs the indices of the codebooks that finally minimize the coding distortion of the two channels.
- The driving excitation at the point where the coding distortion is minimized is fed back to adaptive codebook 117 for each subframe, and adaptive codebook 117 updates the stored driving excitation with this feedback.
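The closed-loop search described above can be sketched in Python as follows. This toy version omits the LPC synthesis filtering and perceptual weighting and simply scores bare excitations, but it shows the structure of minimizing the summed distortion a^2 + b^2 over both channels; all names and the codebook contents are illustrative.

```python
import itertools

def closed_loop_search(target_mc, target_sc, adaptive_cb, fixed_cb, gain_cb):
    """Toy sketch of distortion minimizing section 116: try every combination
    of adaptive, fixed, and gain codebook indices, build the candidate
    excitation, and keep the indices minimizing a^2 + b^2 over both channels.
    (A real CELP coder filters the excitation through the LPC synthesis filter
    and applies perceptual weighting before measuring distortion.)"""
    best = None
    for ia, i_f, ig in itertools.product(range(len(adaptive_cb)),
                                         range(len(fixed_cb)),
                                         range(len(gain_cb))):
        ga, gf = gain_cb[ig]  # adaptive and fixed codebook gains
        synth = [ga * a + gf * f
                 for a, f in zip(adaptive_cb[ia], fixed_cb[i_f])]
        a2 = sum((t - s) ** 2 for t, s in zip(target_mc, synth))  # MC distortion
        b2 = sum((t - s) ** 2 for t, s in zip(target_sc, synth))  # SC distortion
        if best is None or a2 + b2 < best[0]:
            best = (a2 + b2, (ia, i_f, ig))
    return best[1]  # codebook indices minimizing the summed distortion
```

Real coders avoid this exhaustive product by searching the codebooks sequentially, but the minimized quantity is the same.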
- FIG. 6 is a block diagram showing a main configuration inside the spatial information processing unit 123. Both the main channel signal and the subchannel signal are input to the spatial information processing unit 123.
- Spatial information analysis section 131 compares the main channel signal and the subchannel signal to obtain a difference in spatial information between the two channel signals, and the obtained analysis result is sent to spatial information quantization section 132. Output.
- Spatial information quantization section 132 quantizes the difference between the spatial information of the two channels obtained by spatial information analysis section 131, and outputs the result as a spatial information encoding parameter (spatial information quantization index). Further, spatial information quantization section 132 inverse-quantizes this spatial information quantization index and outputs the result to spatial information removing section 133.
- Using the inverse-quantized spatial information quantization index output from spatial information quantization section 132, that is, the quantized and inverse-quantized difference between the spatial information of the two channels obtained by spatial information analysis section 131, spatial information removing section 133 removes this difference from the input sub channel signal, thereby converting the sub channel signal into a signal similar to the main channel signal.
- The sub channel signal from which the spatial information has been removed is output to LPC analysis section 111-2.
- As the spatial information, spatial information analysis section 131 calculates the energy ratio between the two channels in units of frames. The energies E_MC and E_SC in one frame of the main channel signal and the sub channel signal are given by equations (1) and (2):

  E_MC = Σ_{n=0}^{FL-1} x_MC(n)^2 … (1)
  E_SC = Σ_{n=0}^{FL-1} x_SC(n)^2 … (2)

  where n is the sample number, FL is the number of samples in one frame (frame length), and x_MC(n) and x_SC(n) are the n-th samples of the main channel signal and the sub channel signal, respectively.
- Then, spatial information analysis section 131 obtains the square root C of the energy ratio of the main channel signal and the sub channel signal according to equation (3):

  C = sqrt(E_MC / E_SC) … (3)
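Assuming the frame energies are sums of squared samples and that equation (3) is the square root of their ratio, the computation can be sketched as:

```python
import math

def frame_energy(x):
    """Frame energy in the sense of equations (1)/(2): sum of squared samples
    over one frame (assumed form of the equations in this sketch)."""
    return sum(s * s for s in x)

def energy_ratio_sqrt(x_mc, x_sc):
    """Square root C of the main/sub energy ratio, C = sqrt(E_MC / E_SC),
    as assumed for equation (3). Variable names are illustrative."""
    return math.sqrt(frame_energy(x_mc) / frame_energy(x_sc))

c = energy_ratio_sqrt([2.0, 2.0], [1.0, 1.0])  # E_MC = 8, E_SC = 2
```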
- Spatial information analysis section 131 also obtains the delay time difference, the amount of time lag of the sub channel signal relative to the main channel signal, as the value of m that maximizes the cross-correlation between the two channel signals. Specifically, the cross-correlation function φ(m) of the main channel signal and the sub channel signal is obtained according to equation (4):

  φ(m) = Σ_{n=0}^{FL-1} x_MC(n) · x_SC(n − m) … (4)

  and the m giving the largest φ(m) is taken as the delay time difference.
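A sketch of the delay estimation, assuming equation (4) is a plain cross-correlation sum; out-of-range samples contribute zero here, whereas a real coder would define the summation window and lag range precisely.

```python
def delay_difference(x_mc, x_sc, max_lag):
    """Pick the lag m in [-max_lag, max_lag] maximizing the cross-correlation
    phi(m) = sum_n x_mc[n] * x_sc[n - m] (assumed form of equation (4)).
    Samples outside the frame are skipped, which is a simplification."""
    def phi(m):
        return sum(x_mc[n] * x_sc[n - m]
                   for n in range(len(x_mc)) if 0 <= n - m < len(x_sc))
    return max(range(-max_lag, max_lag + 1), key=phi)
```

For the unit-impulse example in the test below, the sub channel peak leads the main channel peak by one sample, so the estimated lag is 1.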
- Alternatively, the energy ratio and the delay time difference may be obtained jointly by equation (5): the square root C of the energy ratio and the delay time difference m are found so as to minimize the error D between the main channel signal and the sub channel signal from which the spatial information relative to the main channel signal has been removed:

  D = Σ_{n=0}^{FL-1} [x_MC(n) − C · x_SC(n − m)]^2 … (5)
- Spatial information quantization section 132 quantizes C and m with predetermined numbers of bits, and the quantized values are denoted C_Q and M_Q, respectively.
- Spatial information removing section 133 removes spatial information from the subchannel signal according to the following conversion equation (6).
- x'_SC(n) = C_Q · x_SC(n − M_Q) … (6)
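The conversion of equation (6) can be sketched directly in Python; treating samples before the frame start as zero is a simplification of this sketch (a real coder would draw on the previous frame's samples).

```python
def remove_spatial_information(x_sc, c_q, m_q):
    """Apply x'_SC(n) = C_Q * x_SC(n - M_Q) (equation (6)) to one frame.
    Samples before the frame start are taken as 0.0 in this sketch."""
    return [c_q * (x_sc[n - m_q] if 0 <= n - m_q < len(x_sc) else 0.0)
            for n in range(len(x_sc))]

out = remove_spatial_information([1.0, 2.0, 3.0], c_q=2.0, m_q=1)
```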
- As described above, according to the present embodiment, the similarity between the channel signals of the stereo signal is increased by correcting each channel signal, and each channel signal is then encoded using an excitation common to the channels, so the amount of encoded information (encoding bit rate) can be reduced.
- Moreover, since each channel signal is encoded using the common excitation, there is no need to provide two sets of adaptive codebook, fixed codebook, and gain codebook, one set per channel; the excitation can be generated with a single set of codebooks. In other words, the circuit scale can be reduced.
- Furthermore, distortion minimizing section 116 performs control so that the coding distortion of both channels, not only that of the main channel signal, is minimized; the coding performance therefore improves and the sound quality of the decoded signal can be improved.
- In the present embodiment, the case where CELP coding is used for encoding the stereo audio signal has been described as an example. However, any encoding method may be used as long as correction can increase the similarity between the L channel signal and the R channel signal so that they can effectively be treated as a single channel signal and the amount of encoded information thereby reduced; the encoding method need not hold its information in the form of codebooks.
- Also, the case where both parameters, the inter-channel energy ratio and the delay time difference, are used as the spatial information has been described as an example; however, only one of the two parameters may be used as the spatial information. When only one parameter is used, the improvement in the similarity between the two channels is smaller than when both parameters are used, but conversely the number of encoding bits can be further reduced.
- FIG. 7 is a block diagram showing the main configuration of the stereo speech coding apparatus according to Embodiment 2 of the present invention.
- This stereo speech coding apparatus has the same basic configuration as the stereo speech coding apparatus shown in Embodiment 1 (see FIG. 4); the same components are assigned the same reference numerals and their description is omitted.
- the stereo speech coding apparatus calculates the energy of speech signals of both the first channel and the second channel, and selects the channel with the larger energy as the main channel.
- the energy is calculated for each frame, and the main channel is selected for each frame. Details will be described below.
- Energy calculation section 201 obtains the energies E_1 and E_2 of the first channel audio signal and the second channel audio signal for each frame according to equations (9) and (10), and outputs them to MC selection section 105a.
- MC selection section 105a determines which of the first channel audio signal and the second channel audio signal is the main channel signal. Specifically, the frame energies E_1 and E_2 of the two channels are compared, and the channel with the larger energy is taken as the main channel and the channel with the smaller energy as the sub channel. That is, under the condition shown in equation (11), the first channel is the main channel and the second channel is the sub channel; otherwise, the second channel is the main channel and the first channel is the sub channel.
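The per-frame selection can be sketched as follows; the sum-of-squares energy and the tie-break toward the first channel are assumptions of this sketch, since the exact content of equations (9)-(11) is not reproduced here.

```python
def select_main_channel(ch1_frame, ch2_frame):
    """Per-frame MC selection: the channel with the larger frame energy
    becomes the main channel. Returns (mc, sc, selection_flag), where
    selection_flag is the MC selection information (0: channel 1 is MC,
    1: channel 2 is MC). Tie-break toward channel 1 is an assumption."""
    e1 = sum(s * s for s in ch1_frame)  # equation (9)-style frame energy
    e2 = sum(s * s for s in ch2_frame)  # equation (10)-style frame energy
    if e1 >= e2:
        return ch1_frame, ch2_frame, 0  # first channel is the main channel
    return ch2_frame, ch1_frame, 1      # second channel is the main channel

mc, sc, flag = select_main_channel([1.0, 1.0], [3.0, 0.0])
```

The returned flag corresponds to the MC selection information that MC selection information encoding section 106 encodes per frame.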
- Since the main channel and the sub channel are selected for each frame, MC selection information encoding section 106 encodes, for each frame, the information indicating which channel has been selected as the main channel (MC selection information).
- As described above, according to the present embodiment, the energy of each frame of both channels is calculated, and the channel with the larger energy is selected as the main channel.
- Since a channel with larger energy can be regarded as carrying more information, making the channel with the larger amount of information the main channel reduces the coding error.
- In the present embodiment, the energy of each channel is calculated and used as the selection criterion, but the present invention is not limited to this; a value obtained by smoothing the energy may be used instead.
- For example, the smoothed energies E1 and E2 are obtained using the following equations (13) and (14), where α and β are constants that satisfy the following equation (15).
- In the embodiments described above, the actual encoding target of SC encoding section 110-2 is the subchannel signal after the spatial information has been removed by spatial information processing section 123.
- SC encoding section 110-2 generates a synthesized signal from the subchannel signal after spatial information removal, and performs encoding by running an optimization loop over the encoding parameters so that the coding distortion between this synthesized signal and the original subchannel signal after spatial information removal is minimized.
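The closed-loop optimization just described can be illustrated in miniature. The toy gain codebook and gain-only "synthesis" below stand in for the patent's actual codebooks and LPC synthesis filter, which are not specified here:

```python
def distortion(target, synth):
    """Sum of squared errors between the target and the synthesized signal."""
    return sum((t - s) ** 2 for t, s in zip(target, synth))

def closed_loop_search(target, codebook, synthesize):
    """Try every codebook entry, synthesize it, and keep the index
    whose synthesized signal minimizes the distortion to the target."""
    best_idx, best_d = None, float("inf")
    for idx, code in enumerate(codebook):
        d = distortion(target, synthesize(code))
        if d < best_d:
            best_idx, best_d = idx, d
    return best_idx, best_d

# Toy example: a scalar gain codebook applied to a fixed excitation.
target = [0.5, -0.5, 0.5]
excitation = [1.0, -1.0, 1.0]
gains = [0.25, 0.5, 1.0]
idx, d = closed_loop_search(
    target, gains, lambda g: [g * x for x in excitation])
print(idx)  # -> 1 (gain 0.5 reproduces the target exactly)
```

A real CELP-style coder searches adaptive and fixed codebooks jointly per subframe, but the principle is the same: the parameters chosen are those whose synthesized output is closest to the encoding target.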
- That is, in the embodiments described above, the subchannel signal after removal of the spatial information is the target signal of the encoding processing. In the present embodiment, in contrast, the subchannel signal before the spatial information is removed, that is, the subchannel signal that still contains the spatial information, is used as the target signal of the subchannel encoding processing.
- Since the basic configuration of the stereo speech coding apparatus according to the present embodiment is the same as that of the stereo speech coding apparatus shown in Embodiment 1 (see FIG. 4), its description is omitted, and speech encoding section 300, whose configuration differs from that of speech encoding section 100 shown in Embodiment 1 (see FIG. 5), will be described below.
- FIG. 8 is a block diagram showing the main configuration of speech encoding section 300. The same components as those of speech encoding section 100 shown in Embodiment 1 are denoted by the same reference numerals, and their description is omitted.
- SC encoding section 310 includes spatial information adding section 301, which adds the spatial information again to the subchannel signal from which the spatial information has been removed by spatial information processing section 123. That is, spatial information adding section 301 receives the spatial information about the subchannel signal from spatial information processing section 123, applies it to the synthesized signal output from LPC synthesis filter 113-2, and outputs the result to adder 114-2.
- Adder 114-2 calculates the coding distortion by subtracting the subchannel signal from the spatial-information-added synthesized signal output from spatial information adding section 301, and outputs this coding distortion to distortion minimizing section 116 via perceptual weighting section 115-2.
- Distortion minimizing section 116 obtains, for each subframe, the index of each codebook that minimizes the sum of the coding distortions output from MC encoding section 110-1 and SC encoding section 310, and outputs these indexes as encoding information.
- SC encoding section 310 is provided with LPC analysis section 302, separate from LPC analysis section 111-2, in order to perform perceptual weighting on the subchannel signal using LPC coefficients generated from the subchannel signal. LPC analysis section 302 performs LPC analysis with the subchannel signal as input, and outputs the obtained LPC coefficients to perceptual weighting section 115-2.
- Perceptual weighting section 115-2 performs perceptual weighting on the coding distortion output from adder 114-2, using the LPC coefficients output from LPC analysis section 302.
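As an illustration of weighting an error signal with LPC coefficients, the simplified all-zero filter below applies a bandwidth-expanded LPC polynomial A(z/γ). Real perceptual weighting filters (for example, the pole-zero form A(z/γ1)/A(z/γ2) common in CELP coders) are more elaborate; the sign convention and γ value here are assumptions for the sketch:

```python
def weight(error, lpc, gamma=0.92):
    """Filter the error by 1 + sum_i a_i * gamma^(i+1) * z^-(i+1)
    (an all-zero sketch of a bandwidth-expanded LPC weighting)."""
    a = [c * (gamma ** (i + 1)) for i, c in enumerate(lpc)]
    out = []
    for n in range(len(error)):
        y = error[n]
        for i, ai in enumerate(a):
            if n - i - 1 >= 0:
                y += ai * error[n - i - 1]
        out.append(y)
    return out

# With gamma = 1 and one coefficient, an impulse is smeared by the tap.
print(weight([1.0, 0.0, 0.0], [0.5], gamma=1.0))  # -> [1.0, 0.5, 0.0]
```

The point of deriving the LPC coefficients from the subchannel signal itself is that the weighting then shapes the distortion according to the spectrum of the signal the decoder will actually output.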
- FIG. 9 is a block diagram showing the main configuration inside the spatial information adding unit 301.
- Spatial information inverse quantization section 321 inversely quantizes the spatial information quantization index output from spatial information processing section 123, and outputs the spatial information difference of the subchannel signal relative to the main channel signal to spatial information decoding section 322.
- Spatial information decoding section 322 applies the spatial information difference output from spatial information inverse quantization section 321 to the synthesized signal output from LPC synthesis filter 113-2, generates a synthesized signal to which the spatial information has been added, and outputs it to adder 114-2.
- Specifically, the spatial information quantization indexes are the quantized values of the energy ratio and the delay time difference, respectively; from these, spatial information inverse quantization section 321 obtains the spatial information difference of the subchannel signal relative to the main channel signal.
- Spatial information decoding section 322 obtains the subchannel signal after provision of the spatial information according to the following equation (16). When only the energy ratio is used as the spatial information, the subchannel signal after provision of the spatial information is obtained by the following equation (17); when only the delay time difference is used, it is obtained by the following equation (18).
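As a hedged, sample-domain sketch of applying the decoded spatial information, the function below applies an energy-ratio gain and an integer delay to a synthesized signal; it stands in for equations (16) through (18), which are not reproduced in this text:

```python
def apply_spatial_info(signal, gain, delay):
    """Scale by the energy-ratio gain and shift by the delay difference
    (zero-padding the first `delay` samples)."""
    delayed = [0.0] * delay + list(signal[:len(signal) - delay])
    return [gain * x for x in delayed]

# A synthesized signal attenuated to half energy ratio, delayed one sample.
synth = [1.0, 2.0, 3.0, 4.0]
print(apply_spatial_info(synth, 0.5, 1))  # -> [0.0, 0.5, 1.0, 1.5]
```

Using only the gain corresponds to the energy-ratio-only variant, and using only the shift to the delay-only variant, mirroring the one-parameter options described earlier.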
- As described above, according to the present embodiment, the subchannel signal before the spatial information is removed is used as the target signal of the encoding processing, so that encoding performance can be further improved over Embodiments 1 and 2, for the following reason.
- In Embodiments 1 and 2, the subchannel signal after spatial information removal is the actual encoding target, and its coding distortion is minimized. However, the signal to be finally output as the decoded signal is the subchannel signal itself, not the subchannel signal after removal of the spatial information. Therefore, when the subchannel signal after removal of the spatial information is used as the target signal of the encoding processing, the coding distortion contained in the subchannel signal, which is the final decoded signal, may not be sufficiently minimized.
- For example, the coding distortion of the subchannel signal input to distortion minimizing section 116 may have an energy difference from that of the main channel signal.
- In the present embodiment, in contrast, the subchannel signal itself, from which the spatial information has not been removed, is the encoding target, and distortion minimization is performed on the coding distortion that can be contained in the final decoded signal. Therefore, encoding performance can be further improved.
- Furthermore, the LPC coefficients used for the perceptual weighting processing are obtained by separately performing LPC analysis on the subchannel signal, which is the input signal of SC encoding section 310. That is, perceptual weighting is performed using perceptual weights that reflect the subchannel signal itself, which should be the final decoded signal. Therefore, encoding parameters with less distortion can be obtained.
- the stereo encoding device and stereo encoding method according to the present invention are not limited to the above embodiments, and can be implemented with various modifications.
- In the above embodiments, the case where the stereo coding apparatus is applied to a communication terminal apparatus in a mobile communication system has been described as an example.
- However, the stereo encoding device and the stereo encoding method according to the present invention can also be used in a wired communication system.
- the present invention can also be realized by software.
- By describing the processing algorithm of the stereo coding method according to the present invention in a programming language, storing the program in a memory, and having it executed by information processing means, functions similar to those of the stereo coding apparatus can be realized.
- An adaptive codebook may be referred to as an adaptive excitation codebook, and a fixed codebook may be referred to as a fixed excitation codebook.
- fixed codebooks are sometimes called stochastic codebooks or random codebooks.
- Each functional block used in the description of each of the above embodiments is typically realized as an LSI, which is an integrated circuit. These may be individual chips, or some or all of them may be integrated into a single chip.
- the method of circuit integration is not limited to LSI, and may be realized by a dedicated circuit or a general-purpose processor. It is also possible to use a field programmable gate array (FPGA) that can be programmed after LSI manufacturing, or a reconfigurable processor that can reconfigure the connection or setting of circuit cells inside the LSI.
- the stereo encoding device, the stereo decoding device, and these methods according to the present invention can be applied to applications such as a communication terminal device and a base station device in a mobile communication system.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
RU2007120056/09A RU2007120056A (ru) | 2004-11-30 | 2005-11-28 | Устройство стереокодирования, устройство стереодекодирования и способы стереокодирования и стереодекодирования |
US11/719,413 US7848932B2 (en) | 2004-11-30 | 2005-11-28 | Stereo encoding apparatus, stereo decoding apparatus, and their methods |
JP2006547900A JPWO2006059567A1 (ja) | 2004-11-30 | 2005-11-28 | ステレオ符号化装置、ステレオ復号装置、およびこれらの方法 |
BRPI0516658-6A BRPI0516658A (pt) | 2004-11-30 | 2005-11-28 | aparelho de codificação de estéreo, aparelho de decodificação de estéreo e seus métodos |
EP05809758A EP1814104A4 (en) | 2004-11-30 | 2005-11-28 | STEREO ENCODING APPARATUS, STEREO DECODING APPARATUS, AND METHODS THEREOF |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004347273 | 2004-11-30 | | |
JP2004-347273 | 2004-11-30 | | |
JP2005100850 | 2005-03-31 | | |
JP2005-100850 | 2005-03-31 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006059567A1 true WO2006059567A1 (ja) | 2006-06-08 |
Family
ID=36565000
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2005/021800 WO2006059567A1 (ja) | 2004-11-30 | 2005-11-28 | ステレオ符号化装置、ステレオ復号装置、およびこれらの方法 |
Country Status (7)
Country | Link |
---|---|
US (1) | US7848932B2 (ja) |
EP (1) | EP1814104A4 (ja) |
JP (1) | JPWO2006059567A1 (ja) |
KR (1) | KR20070085532A (ja) |
BR (1) | BRPI0516658A (ja) |
RU (1) | RU2007120056A (ja) |
WO (1) | WO2006059567A1 (ja) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013033189A (ja) * | 2011-07-01 | 2013-02-14 | Sony Corp | オーディオ符号化装置、オーディオ符号化方法、およびプログラム |
JP2013114264A (ja) * | 2011-11-28 | 2013-06-10 | Samsung Electronics Co Ltd | 音声信号送信装置、音声信号受信装置及びその方法 |
JP2015528925A (ja) * | 2012-07-31 | 2015-10-01 | インテレクチュアル ディスカバリー シーオー エルティディIntellectual Discovery Co.,Ltd. | オーディオ信号処理装置および方法 |
JP2018533057A (ja) * | 2015-09-25 | 2018-11-08 | ヴォイスエイジ・コーポレーション | セカンダリチャンネルを符号化するためにプライマリチャンネルのコーディングパラメータを使用するステレオ音声信号を符号化するための方法およびシステム |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
ES2623551T3 (es) * | 2005-03-25 | 2017-07-11 | Iii Holdings 12, Llc | Dispositivo de codificación de sonido y procedimiento de codificación de sonido |
US20100100372A1 (en) * | 2007-01-26 | 2010-04-22 | Panasonic Corporation | Stereo encoding device, stereo decoding device, and their method |
JP5153791B2 (ja) * | 2007-12-28 | 2013-02-27 | パナソニック株式会社 | ステレオ音声復号装置、ステレオ音声符号化装置、および消失フレーム補償方法 |
WO2009116280A1 (ja) * | 2008-03-19 | 2009-09-24 | パナソニック株式会社 | ステレオ信号符号化装置、ステレオ信号復号装置およびこれらの方法 |
WO2009146734A1 (en) * | 2008-06-03 | 2009-12-10 | Nokia Corporation | Multi-channel audio coding |
EP2395504B1 (en) * | 2009-02-13 | 2013-09-18 | Huawei Technologies Co., Ltd. | Stereo encoding method and apparatus |
CN101521013B (zh) * | 2009-04-08 | 2011-08-17 | 武汉大学 | 空间音频参数双向帧间预测编解码装置 |
KR101035070B1 (ko) * | 2009-06-09 | 2011-05-19 | 주식회사 라스텔 | 고음질 가상 공간 음향 생성 장치 및 방법 |
CN102280107B (zh) * | 2010-06-10 | 2013-01-23 | 华为技术有限公司 | 边带残差信号生成方法及装置 |
CN116741185A (zh) | 2016-11-08 | 2023-09-12 | 弗劳恩霍夫应用研究促进协会 | 用于下混频至少两声道的下混频器和方法以及多声道编码器和多声道解码器 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002244698A (ja) * | 2000-12-14 | 2002-08-30 | Sony Corp | 符号化装置および方法、復号装置および方法、並びに記録媒体 |
JP2003516555A (ja) * | 1999-12-08 | 2003-05-13 | フラオホッフェル−ゲゼルシャフト ツル フェルデルング デル アンゲヴァンドテン フォルシュング エー.ヴェー. | ステレオ音響信号の処理方法と装置 |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
SE519985C2 (sv) | 2000-09-15 | 2003-05-06 | Ericsson Telefon Ab L M | Kodning och avkodning av signaler från flera kanaler |
US6614365B2 (en) * | 2000-12-14 | 2003-09-02 | Sony Corporation | Coding device and method, decoding device and method, and recording medium |
US7292901B2 (en) | 2002-06-24 | 2007-11-06 | Agere Systems Inc. | Hybrid multi-channel/cue coding/decoding of audio signals |
US7644003B2 (en) * | 2001-05-04 | 2010-01-05 | Agere Systems Inc. | Cue-based audio coding/decoding |
SE0202159D0 (sv) * | 2001-07-10 | 2002-07-09 | Coding Technologies Sweden Ab | Efficientand scalable parametric stereo coding for low bitrate applications |
WO2003069954A2 (en) * | 2002-02-18 | 2003-08-21 | Koninklijke Philips Electronics N.V. | Parametric audio coding |
DE60311794T2 (de) * | 2002-04-22 | 2007-10-31 | Koninklijke Philips Electronics N.V. | Signalsynthese |
CN1307612C (zh) * | 2002-04-22 | 2007-03-28 | 皇家飞利浦电子股份有限公司 | 声频信号的编码解码方法、编码器、解码器及相关设备 |
US7519538B2 (en) * | 2003-10-30 | 2009-04-14 | Koninklijke Philips Electronics N.V. | Audio signal encoding or decoding |
US8204261B2 (en) * | 2004-10-20 | 2012-06-19 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Diffuse sound shaping for BCC schemes and the like |
- 2005
- 2005-11-28 RU RU2007120056/09A patent/RU2007120056A/ru not_active Application Discontinuation
- 2005-11-28 BR BRPI0516658-6A patent/BRPI0516658A/pt not_active Application Discontinuation
- 2005-11-28 EP EP05809758A patent/EP1814104A4/en not_active Withdrawn
- 2005-11-28 JP JP2006547900A patent/JPWO2006059567A1/ja not_active Ceased
- 2005-11-28 US US11/719,413 patent/US7848932B2/en active Active
- 2005-11-28 WO PCT/JP2005/021800 patent/WO2006059567A1/ja active Application Filing
- 2005-11-28 KR KR1020077012113A patent/KR20070085532A/ko not_active Application Discontinuation
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003516555A (ja) * | 1999-12-08 | 2003-05-13 | フラオホッフェル−ゲゼルシャフト ツル フェルデルング デル アンゲヴァンドテン フォルシュング エー.ヴェー. | ステレオ音響信号の処理方法と装置 |
JP2002244698A (ja) * | 2000-12-14 | 2002-08-30 | Sony Corp | 符号化装置および方法、復号装置および方法、並びに記録媒体 |
Non-Patent Citations (5)
Title |
---|
DAVIDSON G. ET AL: "Complexity reduction methods for vector excitation coding", IEEE INTERNATIONAL CONFERENCE ON ICASSP '86, vol. 11, 1986, pages 3055 - 3058, XP003006841 * |
GOTO M. ET AL.: "Channel-kan Joho o Mochiita Onsei Tsushinyo Stereo Onsei Fugoka Hoho no Kento", 2005 NEN THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS SOGO TAIKAI KOEN RONBUNSHU, D-14-2, 7 March 2005 (2005-03-07), pages 119, XP003006842 * |
GOTO M. ET AL.: "Onsei Tsushinyo Scalable Stereo Onsei Fugoka Hoho no Kento", FIT2005 (4TH FORUM ON INFORMATION TECHNOLOGY) KOEN RONBUNSHU, G-017, 22 August 2005 (2005-08-22), pages 299 - 300, XP002995723 * |
GOTO M. ET AL.: "Onsei Tsushinyo Stereo Onsei Fugoka Hoho no Kento", 2004 NEN THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS ENGINEERING SCIENCES SOCIETY CONFERENCE KOEN RONBUNSHU, A-6-6, 8 September 2004 (2004-09-08), pages 119, XP003000725 * |
See also references of EP1814104A4 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013033189A (ja) * | 2011-07-01 | 2013-02-14 | Sony Corp | オーディオ符号化装置、オーディオ符号化方法、およびプログラム |
JP2013114264A (ja) * | 2011-11-28 | 2013-06-10 | Samsung Electronics Co Ltd | 音声信号送信装置、音声信号受信装置及びその方法 |
JP2015528925A (ja) * | 2012-07-31 | 2015-10-01 | インテレクチュアル ディスカバリー シーオー エルティディIntellectual Discovery Co.,Ltd. | オーディオ信号処理装置および方法 |
JP2018533057A (ja) * | 2015-09-25 | 2018-11-08 | ヴォイスエイジ・コーポレーション | セカンダリチャンネルを符号化するためにプライマリチャンネルのコーディングパラメータを使用するステレオ音声信号を符号化するための方法およびシステム |
US10984806B2 (en) | 2015-09-25 | 2021-04-20 | Voiceage Corporation | Method and system for encoding a stereo sound signal using coding parameters of a primary channel to encode a secondary channel |
US11056121B2 (en) | 2015-09-25 | 2021-07-06 | Voiceage Corporation | Method and system for encoding left and right channels of a stereo sound signal selecting between two and four sub-frames models depending on the bit budget |
JP2021131569A (ja) * | 2015-09-25 | 2021-09-09 | ヴォイスエイジ・コーポレーション | セカンダリチャンネルを符号化するためにプライマリチャンネルのコーディングパラメータを使用するステレオ音声信号を符号化するための方法およびシステム |
JP7124170B2 (ja) | 2015-09-25 | 2022-08-23 | ヴォイスエイジ・コーポレーション | セカンダリチャンネルを符号化するためにプライマリチャンネルのコーディングパラメータを使用するステレオ音声信号を符号化するための方法およびシステム |
Also Published As
Publication number | Publication date |
---|---|
BRPI0516658A (pt) | 2008-09-16 |
US7848932B2 (en) | 2010-12-07 |
EP1814104A1 (en) | 2007-08-01 |
KR20070085532A (ko) | 2007-08-27 |
US20090150162A1 (en) | 2009-06-11 |
RU2007120056A (ru) | 2008-12-10 |
JPWO2006059567A1 (ja) | 2008-06-05 |
EP1814104A4 (en) | 2008-12-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2006059567A1 (ja) | ステレオ符号化装置、ステレオ復号装置、およびこれらの方法 | |
JP5413839B2 (ja) | 符号化装置および復号装置 | |
JP4963965B2 (ja) | スケーラブル符号化装置、スケーラブル復号装置、及びこれらの方法 | |
JP4850827B2 (ja) | 音声符号化装置および音声符号化方法 | |
JP4555299B2 (ja) | スケーラブル符号化装置およびスケーラブル符号化方法 | |
US20090204397A1 (en) | Linear predictive coding of an audio signal | |
JP4842147B2 (ja) | スケーラブル符号化装置およびスケーラブル符号化方法 | |
US8036390B2 (en) | Scalable encoding device and scalable encoding method | |
JPWO2008072701A1 (ja) | ポストフィルタおよびフィルタリング方法 | |
WO2007088853A1 (ja) | 音声符号化装置、音声復号装置、音声符号化システム、音声符号化方法及び音声復号方法 | |
KR20070029754A (ko) | 음성 부호화 장치 및 그 방법과, 음성 복호화 장치 및 그방법 | |
US8271275B2 (en) | Scalable encoding device, and scalable encoding method | |
WO2010016270A1 (ja) | 量子化装置、符号化装置、量子化方法及び符号化方法 | |
JP2006072269A (ja) | 音声符号化装置、通信端末装置、基地局装置および音声符号化方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KN KP KR KZ LC LK LR LS LT LU LV LY MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2006547900 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 11719413 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2005809758 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2007120056 Country of ref document: RU Ref document number: 1020077012113 Country of ref document: KR |
|
WWE | Wipo information: entry into national phase |
Ref document number: 200580041181.1 Country of ref document: CN |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWP | Wipo information: published in national office |
Ref document number: 2005809758 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: PI0516658 Country of ref document: BR |