WO2006075663A1 - Voice switching device and voice switching method - Google Patents
- Publication number
- WO2006075663A1 (PCT/JP2006/300295, JP2006300295W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio signal
- section
- enhancement layer
- signal
- switching device
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/24—Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0364—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
Definitions
- The present invention relates to a voice switching device and a voice switching method that switch the band of a voice signal.
- In the technology for hierarchically encoding speech signals generally referred to as scalable speech coding, even if the encoded data of a certain layer is lost, the speech signal can still be decoded from the encoded data of another layer.
- One type of scalable coding is called band scalable speech coding.
- Band scalable speech coding uses a processing layer that encodes and decodes a narrowband signal and a processing layer that performs encoding and decoding to improve the quality and widen the band of the narrowband signal.
- The former processing layer is referred to as the core layer, and the latter processing layer is referred to as the enhancement layer.
- Band scalable speech coding is applied to voice data communication on a communication network in which, for example, the transmission band is not guaranteed and encoded data may be partially lost or delayed.
- On such a network, there are times when both the core layer encoded data and the enhancement layer encoded data can be received and times when only the core layer encoded data can be received. Therefore, the speech decoding apparatus on the receiving side must switch the decoded speech signal to be output between a narrowband decoded speech signal obtained from the core layer encoded data alone and a wideband decoded speech signal obtained from the encoded data of both the core layer and the enhancement layer.
- In a conventional switching method, both signals, that is, the narrowband decoded speech signal and the wideband decoded speech signal, are first matched to each other and then weighted and added.
- In this weighted addition, the two signals are added while the mixing ratio of both signals is changed over time by a constant degree (a fixed increment or decrement).
- Patent Document 1: Japanese Patent Laid-Open No. 2000-352999
- In the conventional method, however, the degree of change in the mixing ratio used for the weighted addition of both signals is always constant, so depending on the reception conditions the listener of the decoded speech may experience a sense of discomfort or of fluctuation. For example, if voice switching occurs frequently in a section in which the speech signal contains a signal representing stationary background noise, the listener more easily perceives the change in the sense of unity and band feeling associated with the switching. There has therefore been a limit to the improvement of sound quality.
- An object of the present invention is to provide a voice switching device and a voice switching method that can improve the quality of decoded speech.
- The voice switching device of the present invention outputs a mixed signal in which a narrowband voice signal and a wideband voice signal are mixed when switching the band of the voice signal to be output, and includes a setting means for variably setting the degree of change over time of the mixing ratio of the two signals.
- FIG. 1 is a block diagram showing a configuration of a speech decoding apparatus according to an embodiment of the present invention.
- FIG. 2 is a block diagram showing a configuration of a weighted addition unit according to an embodiment of the present invention.
- FIG. 3 is a diagram for explaining an example of a change with time of an enhancement layer gain according to an embodiment of the present invention.
- FIG. 4 is a diagram for explaining another example of the change over time of the enhancement layer gain according to the embodiment of the present invention.
- FIG. 5 is a block diagram showing an internal configuration of a permissible section detecting unit according to an embodiment of the present invention.
- FIG. 6 is a block diagram showing an internal configuration of a silent section detecting unit according to an embodiment of the present invention.
- FIG. 7 is a block diagram showing an internal configuration of a power fluctuation section detecting unit according to an embodiment of the present invention.
- FIG. 8 is a block diagram showing an internal configuration of a sound quality change section detecting unit according to one embodiment of the present invention.
- FIG. 9 is a block diagram showing an internal configuration of an enhancement layer power minute section detector according to an embodiment of the present invention.
- FIG. 1 is a block diagram showing a configuration of a speech decoding apparatus provided with a speech switching apparatus according to an embodiment of the present invention.
- The speech decoding apparatus 100 in FIG. 1 includes a core layer decoding unit 102, a core layer frame error detection unit 104, an enhancement layer frame error detection unit 106, an enhancement layer decoding unit 108, a permissible section detection unit 110, a signal adjustment unit 112, and a weighted addition unit 114.
- The core layer frame error detection unit 104 detects whether or not the core layer encoded data is decodable. Specifically, it detects a core layer frame error and, when one is detected, determines that the core layer encoded data cannot be decoded. The core layer frame error detection result is output to the core layer decoding unit 102 and the permissible section detection unit 110.
- Here, a core layer frame error refers to a state in which most or all of the core layer encoded data cannot be used for decoding because of an error received during transmission of a frame of the core layer encoded data or because of packet loss in packet communication (for example, a packet lost on the communication path or a packet not arriving because of jitter).
- Detection of a core layer frame error is realized, for example, by executing any of the following processes in the core layer frame error detection unit 104.
- The core layer frame error detection unit 104 receives error information separately from the core layer encoded data.
- Alternatively, the core layer frame error detection unit 104 performs error detection using an error check code such as a CRC (Cyclic Redundancy Check) added to the core layer encoded data.
- Alternatively, the core layer frame error detection unit 104 determines that the core layer encoded data has not arrived by the decoding time.
- Alternatively, the core layer frame error detection unit 104 detects packet loss or non-arrival.
- Alternatively, when a serious error is found by an error detection code included in the core layer encoded data during decoding, the core layer frame error detection unit 104 obtains information to that effect from the core layer decoding unit 102.
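The CRC-based check and the non-arrival check listed above can be sketched as follows. This is a minimal illustration, not the patent's frame format: it assumes a hypothetical layout in which a 4-byte big-endian CRC-32 trails the payload, and it represents a frame that has not arrived by the decoding time as `None`. The function names are invented for the example.

```python
import zlib
from typing import Optional


def append_crc(payload: bytes) -> bytes:
    """Build a frame: payload followed by its CRC-32 (assumed layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def frame_error(frame: Optional[bytes]) -> bool:
    """Return True when the frame cannot be used for decoding."""
    # A missing frame (packet loss or late arrival) counts as a frame error.
    if frame is None or len(frame) < 4:
        return True
    payload = frame[:-4]
    received = int.from_bytes(frame[-4:], "big")
    # A CRC mismatch means the payload was corrupted in transmission.
    return zlib.crc32(payload) != received
```

A caller would pass each received frame, or `None` when nothing arrived in time, and forward the boolean result to the decoder and the permissible section detector.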
- The core layer decoding unit 102 receives the core layer encoded data and decodes it.
- The core layer decoded speech signal generated by this decoding is output to the signal adjustment unit 112.
- The core layer decoded speech signal is a narrowband signal.
- The core layer decoded speech signal may also be used as the final output as it is.
- The core layer decoding unit 102 also outputs a part of the core layer encoded data or the core layer LSP (Line Spectrum Pair) to the permissible section detection unit 110.
- The core layer LSP is a spectral parameter obtained in the process of core layer decoding.
- Here, the case where the core layer decoding unit 102 outputs the core layer LSP to the permissible section detection unit 110 is described as an example, but other spectral parameters obtained in the core layer decoding process, or parameters obtained in that process that are not spectral parameters, may be output instead.
- When a core layer frame error is notified from the core layer frame error detection unit 104, or when a serious error is found by an error detection code included in the core layer encoded data during decoding of that data, the core layer decoding unit 102 interpolates the linear prediction coefficients and the excitation using past coding information.
- The core layer decoded speech signal is thereby generated and output continuously.
- When a serious error is found by the error detection code included in the core layer encoded data during decoding, the core layer decoding unit 102 notifies the core layer frame error detection unit 104 of information to that effect.
- The enhancement layer frame error detection unit 106 detects whether or not the enhancement layer encoded data can be decoded. Specifically, it detects an enhancement layer frame error and, when one is detected, determines that the enhancement layer encoded data cannot be decoded. The enhancement layer frame error detection result is output to the enhancement layer decoding unit 108 and the weighted addition unit 114.
- Here, an enhancement layer frame error refers to a state in which most or all of the enhancement layer encoded data cannot be used for decoding because of an error received during transmission of a frame of the enhancement layer encoded data or because of packet loss in packet communication.
- Detection of an enhancement layer frame error is realized, for example, by executing any of the following processes in the enhancement layer frame error detection unit 106.
- The enhancement layer frame error detection unit 106 receives error information separately from the enhancement layer encoded data.
- Alternatively, the enhancement layer frame error detection unit 106 performs error detection using an error check code such as a CRC added to the enhancement layer encoded data.
- Alternatively, the enhancement layer frame error detection unit 106 determines that the enhancement layer encoded data has not arrived by the decoding time.
- Alternatively, the enhancement layer frame error detection unit 106 detects packet loss or non-arrival.
- Alternatively, when a serious error is found by an error detection code included in the enhancement layer encoded data during decoding, the enhancement layer frame error detection unit 106 acquires information to that effect from the enhancement layer decoding unit 108.
- Alternatively, when a core layer frame error is detected, the enhancement layer frame error detection unit 106 judges that an enhancement layer frame error has also been detected. In this case, the enhancement layer frame error detection unit 106 receives the core layer frame error detection result from the core layer frame error detection unit 104.
- The enhancement layer decoding unit 108 receives the enhancement layer encoded data and decodes it.
- The enhancement layer decoded speech signal generated by this decoding is output to the permissible section detection unit 110 and the weighted addition unit 114.
- The enhancement layer decoded speech signal is a wideband signal.
- When an enhancement layer frame error is notified from the enhancement layer frame error detection unit 106, or when a serious error is found by an error detection code included in the enhancement layer encoded data during decoding of that data, the enhancement layer decoding unit 108 interpolates the linear prediction coefficients and the excitation using past coding information. An enhancement layer decoded speech signal is thereby generated and output as necessary. In addition, when a serious error is found by the error detection code included in the enhancement layer encoded data during decoding, the enhancement layer decoding unit 108 notifies the enhancement layer frame error detection unit 106 of information to that effect.
- The signal adjustment unit 112 adjusts the core layer decoded speech signal input from the core layer decoding unit 102. Specifically, the signal adjustment unit 112 upsamples the core layer decoded speech signal to match the sampling frequency of the enhancement layer decoded speech signal. In addition, it adjusts the delay and phase of the core layer decoded speech signal to match those of the enhancement layer decoded speech signal.
- The core layer decoded speech signal subjected to these processes is output to the permissible section detection unit 110 and the weighted addition unit 114.
- The permissible section detection unit 110 analyzes the core layer frame error detection result input from the core layer frame error detection unit 104, the core layer decoded speech signal input from the signal adjustment unit 112, the core layer LSP input from the core layer decoding unit 102, and the enhancement layer decoded speech signal input from the enhancement layer decoding unit 108, and detects a permissible section based on the analysis results.
- The permissible section detection result is output to the weighted addition unit 114. This makes it possible to limit the period in which the degree of change over time of the mixing ratio of the core layer decoded speech signal and the enhancement layer decoded speech signal is made relatively high to the permissible section only, and thereby to control the timing at which the degree of change over time of the mixing ratio is altered.
- The permissible section is a section in which a change in the band of the output speech signal has little effect on audibility, that is, a section in which the change is not easily perceived by the listener.
- Conversely, a section other than the permissible section is one in which a band change of the output speech signal is easily perceived by the listener. The permissible section is therefore a section in which a sudden change in the band of the output speech signal is allowed.
- The permissible section detection unit 110 detects a silent section, a power fluctuation section, a sound quality change section, an enhancement layer power minute section, and the like as permissible sections, and outputs the detection result to the weighted addition unit 114. Details of the internal configuration of the permissible section detection unit 110 and of the processing for detecting the permissible section will be described later.
- The weighted addition unit 114, serving as the voice switching device, switches the band of the output speech signal. When switching the band, the weighted addition unit 114 outputs, as the output speech signal, a mixed signal obtained by mixing the core layer decoded speech signal and the enhancement layer decoded speech signal.
- The mixed signal is generated by weighted addition of the core layer decoded speech signal input from the signal adjustment unit 112 and the enhancement layer decoded speech signal input from the enhancement layer decoding unit 108. That is, the mixed signal is a weighted sum of the core layer decoded speech signal and the enhancement layer decoded speech signal. Details of the weighted addition will be described later.
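As a rough illustration of such a weighted sum, the sketch below ramps an enhancement layer gain toward a target value and adds the scaled enhancement layer samples to the core layer samples. The form `core + gain * enhancement`, the per-sample ramp, and the names `mix_frame`, `step`, and `target` are assumptions made for this example, not details taken from the text at this point. A larger `step` would correspond to a faster change of the mixing ratio, as might be used inside a permissible section.

```python
def mix_frame(core, enh, gain_start, step, target):
    """Weighted sum of one frame with a per-sample gain ramp (assumed form)."""
    out = []
    gain = gain_start
    for c, e in zip(core, enh):
        # Move the gain toward the target by at most `step` per sample.
        if gain < target:
            gain = min(target, gain + step)
        elif gain > target:
            gain = max(target, gain - step)
        # Output sample: core layer plus gain-scaled enhancement layer.
        out.append(c + gain * e)
    # Return the final gain so the next frame can continue the ramp.
    return out, gain
```

Calling the function frame by frame and feeding the returned gain back in as `gain_start` keeps the ramp continuous across frame boundaries.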
- FIG. 5 is a block diagram showing an internal configuration of the allowable section detection unit 110.
- The permissible section detection unit 110 includes a core layer decoded speech signal power calculation unit 501, a silent section detection unit 502, a power fluctuation section detection unit 503, a sound quality change section detection unit 504, an enhancement layer power minute section detection unit 505, and a permissible section determination unit 506.
- The core layer decoded speech signal power calculation unit 501 receives the core layer decoded speech signal from the core layer decoding unit 102 and calculates the core layer decoded speech signal power Pc(t) according to the following equation (1).
- $$P_c(t) = \sum_{i=0}^{L\_FRAME-1} O_c(i)\,O_c(i) \qquad (1)$$
- Here, t is the frame number, Pc(t) is the power of the core layer decoded speech signal in frame t, L_FRAME is the frame length, i is the sample number within the frame, and Oc(i) is the core layer decoded speech signal.
- The core layer decoded speech signal power calculation unit 501 outputs the calculated core layer decoded speech signal power Pc(t) to the silent section detection unit 502, the power fluctuation section detection unit 503, and the enhancement layer power minute section detection unit 505.
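The per-frame power of equation (1) is a sum of squared samples over one frame. A minimal sketch, assuming the decoded signal is available as a flat list of samples and that frames do not overlap (the function name and the handling of a trailing partial frame are choices made for this example):

```python
def core_layer_power(signal, frame_len):
    """Return Pc(t) for each complete frame t: the sum of squared samples."""
    powers = []
    # Step through the signal one non-overlapping frame at a time.
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        powers.append(sum(x * x for x in frame))
    return powers
```

Samples left over after the last complete frame are ignored in this sketch.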
- The silent section detection unit 502 detects a silent section using the core layer decoded speech signal power Pc(t) input from the core layer decoded speech signal power calculation unit 501, and outputs the obtained silent section detection result to the permissible section determination unit 506.
- The power fluctuation section detection unit 503 detects a power fluctuation section using the core layer decoded speech signal power Pc(t) input from the core layer decoded speech signal power calculation unit 501, and outputs the obtained power fluctuation section detection result to the permissible section determination unit 506.
- The sound quality change section detection unit 504 detects a sound quality change section using the core layer frame error detection result input from the core layer frame error detection unit 104 and the core layer LSP input from the core layer decoding unit 102, and outputs the obtained sound quality change section detection result to the permissible section determination unit 506.
- The enhancement layer power minute section detection unit 505 detects an enhancement layer power minute section using the enhancement layer decoded speech signal input from the enhancement layer decoding unit 108, and outputs the obtained enhancement layer power minute section detection result to the permissible section determination unit 506.
- Based on the detection results of the silent section detection unit 502, the power fluctuation section detection unit 503, the sound quality change section detection unit 504, and the enhancement layer power minute section detection unit 505, the permissible section determination unit 506 determines whether or not a silent section, a power fluctuation section, a sound quality change section, or an enhancement layer power minute section has been detected. That is, it determines whether or not a permissible section has been detected, and outputs the permissible section detection result as the determination result.
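Read this way, the final determination is a logical OR over the four detector outputs. A small sketch, assuming each detector reports a boolean per frame (the function and parameter names are hypothetical):

```python
def is_permissible(silent, power_fluctuation, quality_change, enh_power_minute):
    """A frame is permissible if any one of the four detectors fired."""
    return any((silent, power_fluctuation, quality_change, enh_power_minute))
```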
- FIG. 6 is a block diagram showing an internal configuration of the silent section detection unit 502.
- The silent section is a section in which the power of the core layer decoded speech signal is very small. In the silent section, even if the gain of the enhancement layer decoded speech signal (in other words, the mixing ratio of the core layer decoded speech signal and the enhancement layer decoded speech signal) is changed rapidly, the change is hardly perceived.
- The silent section is detected by detecting that the power of the core layer decoded speech signal is equal to or less than a predetermined threshold.
- The silent section detection unit 502 that performs such detection includes a silence determination threshold storage unit 521 and a silent section determination unit 522.
- The silence determination threshold storage unit 521 stores the threshold necessary for determining a silent section and outputs it to the silent section determination unit 522.
- The silent section determination unit 522 compares the core layer decoded speech signal power Pc(t) input from the core layer decoded speech signal power calculation unit 501 with the threshold and obtains the silent section determination result d(t) according to equation (2). Since the permissible section includes the silent section, the silent section determination result is denoted d(t), in the same way as the permissible section detection result.
- The silent section determination unit 522 outputs the silent section determination result d(t) to the permissible section determination unit 506.
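The threshold comparison can be sketched as below. The sketch assumes d(t) is a boolean that is true when Pc(t) is at or below the threshold, which follows the "equal to or less than" wording above; the exact encoding of d(t) in equation (2) is not reproduced here, and the names are hypothetical.

```python
def silent_flags(powers, threshold):
    """Return d(t) per frame: True where Pc(t) is at or below the threshold."""
    return [pc <= threshold for pc in powers]
```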
- FIG. 7 is a block diagram showing an internal configuration of power fluctuation section detecting section 503.
- The power fluctuation section is a section in which the power of the core layer decoded speech signal (or the enhancement layer decoded speech signal) varies greatly.
- In the power fluctuation section, slight changes (for example, changes in the timbre of the output speech signal or in band feeling) are not easily perceived by the listener. Therefore, even if the gain of the enhancement layer decoded speech signal (in other words, the mixing ratio of the core layer decoded speech signal and the enhancement layer decoded speech signal) is changed rapidly, the change is hardly perceived.
- The power fluctuation section is detected by comparing the difference or ratio between the short-term smoothed power and the long-term smoothed power of the core layer decoded speech signal (or the enhancement layer decoded speech signal) with a predetermined threshold and detecting that the difference or ratio is equal to or greater than the threshold.
- The power fluctuation section detection unit 503 that performs such detection includes a short-term smoothing coefficient storage unit 531, a short-term smoothing power calculation unit 532, a long-term smoothing coefficient storage unit 533, a long-term smoothing power calculation unit 534, a determination adjustment coefficient storage unit 535, and a power fluctuation section determination unit 536.
- The short-term smoothing coefficient storage unit 531 stores the short-term smoothing coefficient α and outputs it to the short-term smoothing power calculation unit 532.
- The short-term smoothing power calculation unit 532 uses the short-term smoothing coefficient α and the core layer decoded speech signal power Pc(t) input from the core layer decoded speech signal power calculation unit 501 to calculate the short-term smoothed power Ps(t) of Pc(t) according to equation (3).
- The short-term smoothing power calculation unit 532 outputs the calculated short-term smoothed power Ps(t) to the power fluctuation section determination unit 536.
- The long-term smoothing coefficient storage unit 533 stores the long-term smoothing coefficient β and outputs it to the long-term smoothing power calculation unit 534.
- The long-term smoothing power calculation unit 534 uses the long-term smoothing coefficient β and the core layer decoded speech signal power Pc(t) input from the core layer decoded speech signal power calculation unit 501 to calculate the long-term smoothed power Pl(t) of Pc(t) according to equation (4).
- The long-term smoothing power calculation unit 534 outputs the calculated long-term smoothed power Pl(t) to the power fluctuation section determination unit 536.
- The short-term smoothing coefficient α and the long-term smoothing coefficient β have a relationship of 0.0 < α ...
- The determination adjustment coefficient storage unit 535 stores the adjustment coefficient used for determining a power fluctuation section and outputs it to the power fluctuation section determination unit 536.
- The power fluctuation section determination unit 536 uses the adjustment coefficient, the short-term smoothed power Ps(t) input from the short-term smoothing power calculation unit 532, and the long-term smoothed power Pl(t) input from the long-term smoothing power calculation unit 534 to obtain the power fluctuation section determination result d(t) according to equation (5). Since the permissible section includes the power fluctuation section, the power fluctuation section determination result is denoted d(t) here, as is the permissible section detection result.
- The power fluctuation section determination unit 536 outputs the power fluctuation section determination result d(t) to the permissible section determination unit 506.
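One plausible reading of this detector is sketched below. The equations themselves are not reproduced in the text here, so the sketch assumes first-order recursive smoothing of the form `P(t) = c * P(t-1) + (1 - c) * Pc(t)` for both the short-term and long-term powers, and a ratio test `Ps(t) >= gamma * Pl(t)`. The names `alpha`, `beta`, and `gamma`, the initialization from the first frame, and the test form are all assumptions of this example.

```python
def power_fluctuation_flags(powers, alpha, beta, gamma):
    """Flag frames whose short-term power jumps above the long-term power."""
    if not powers:
        return []
    # Initialize both smoothed powers from the first frame (an assumption).
    ps = pl = powers[0]
    flags = []
    for pc in powers:
        ps = alpha * ps + (1.0 - alpha) * pc  # short-term: reacts quickly
        pl = beta * pl + (1.0 - beta) * pc    # long-term: reacts slowly
        # Ratio test written without division to avoid dividing by zero.
        flags.append(ps >= gamma * pl)
    return flags
```

With this smoothing form, choosing `beta` larger than `alpha` gives the long-term average a longer memory than the short-term one, so a sudden rise in power lifts `ps` well before `pl` follows.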
- Here, the power fluctuation section is detected by comparing the short-term smoothed power with the long-term smoothed power.
- Alternatively, the power fluctuation section may be detected by determining that the amount of power change is equal to or greater than a predetermined threshold.
- The power fluctuation section may also be detected by determining when the core layer decoded speech signal (or the enhancement layer decoded speech signal) rises.
- FIG. 8 is a block diagram showing an internal configuration of the sound quality change section detecting unit 504.
- The sound quality change section is a section in which the sound quality of the core layer decoded speech signal (or the enhancement layer decoded speech signal) varies greatly.
- In the sound quality change section, the core layer decoded speech signal (or the enhancement layer decoded speech signal) itself has audibly lost its temporal continuity.
- Therefore, even if the gain of the enhancement layer decoded speech signal (in other words, the mixing ratio of the core layer decoded speech signal and the enhancement layer decoded speech signal) is changed rapidly, the change is hardly perceived.
- The sound quality change section is detected by detecting a sudden change in the type of the background noise signal included in the core layer decoded speech signal (or the enhancement layer decoded speech signal).
- Alternatively, the sound quality change section is detected by detecting a change in a spectral parameter (for example, the LSP) of the core layer encoded data. For example, to detect a change in the LSP, the total distance between each element of the past LSP and the corresponding element of the current LSP is compared with a predetermined threshold, and it is detected that the total distance is equal to or greater than the threshold.
- The sound quality change section detection unit 504 that performs such detection includes an LSP inter-element distance calculation unit 541, an LSP inter-element distance storage unit 542, an LSP inter-element distance change rate calculation unit 543, a sound quality change determination threshold storage unit 544, a core layer error recovery detection unit 545, and a sound quality change section determination unit 546.
- The LSP inter-element distance calculation unit 541 uses the core layer LSP input from the core layer decoding unit 102 to calculate the LSP inter-element distance dlsp(t) according to equation (6).
- Here, lsp is the LSP coefficient of the core layer, M is the analysis order of the linear prediction coefficients of the core layer, m is the element number of the LSP, and dlsp is the distance between adjacent elements.
- The LSP inter-element distance dlsp(t) is output to the LSP inter-element distance storage unit 542 and the LSP inter-element distance change rate calculation unit 543.
- The LSP inter-element distance storage unit 542 stores the LSP inter-element distance dlsp(t) input from the LSP inter-element distance calculation unit 541 and outputs the past (one frame earlier) LSP inter-element distance dlsp(t-1) to the LSP inter-element distance change rate calculation unit 543.
- The LSP inter-element distance change rate calculation unit 543 calculates the LSP inter-element distance change rate by dividing the LSP inter-element distance dlsp(t) by the past LSP inter-element distance dlsp(t-1). The calculated change rate is output to the sound quality change section determination unit 546.
- The sound quality change determination threshold storage unit 544 stores the threshold A necessary for determining a sound quality change section and outputs the threshold A to the sound quality change section determination unit 546.
- The sound quality change section determination unit 546 uses the threshold A and the LSP inter-element distance change rate input from the LSP inter-element distance change rate calculation unit 543 to obtain the sound quality change section determination result d(t) according to equation (7).
- The sound quality change section determination result is denoted d(t), in the same manner as the permissible section detection result.
- The sound quality change section determination unit 546 outputs the sound quality change section determination result d(t) to the permissible section determination unit 506.
- When the core layer error recovery detection unit 545 detects, based on the core layer frame error detection result input from the core layer frame error detection unit 102, that reception has recovered from a frame error (normal reception), it notifies the sound quality change section determination unit 546, and the sound quality change section determination unit 546 determines a predetermined number of frames after the recovery to be a sound quality change section. That is, a predetermined number of frames after interpolation processing has been performed on the core layer decoded speech signal because of a core layer frame error is determined to be a sound quality change section.
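The change-rate test performed by units 541 to 546 can be sketched as follows. This is only an illustration: the text of equation (7) and the exact definition of dlsp are not reproduced here, so the squared-gap form of `lsp_inter_element_distance` and the two-sided rule in `sound_quality_change` (flag the frame when the rate leaves the range 1/A to A) are assumptions, and both function names are hypothetical.

```python
from typing import Sequence


def lsp_inter_element_distance(lsp: Sequence[float]) -> float:
    """Assumed form of dlsp: sum of squared gaps between adjacent LSP elements."""
    return sum((lsp[m] - lsp[m - 1]) ** 2 for m in range(1, len(lsp)))


def sound_quality_change(dlsp_now: float, dlsp_prev: float, threshold_a: float) -> int:
    """Return d(t) = 1 when dlsp(t) / dlsp(t-1) leaves [1/A, A] (assumed rule, A > 1)."""
    if dlsp_prev <= 0.0:
        return 0  # no usable history yet; treated as "no change" in this sketch
    rate = dlsp_now / dlsp_prev  # change rate, as computed by unit 543
    return 1 if rate > threshold_a or rate < 1.0 / threshold_a else 0
```

Only the division of dlsp(t) by dlsp(t-1) and the comparison against a stored threshold A come from the description; the rest is a placeholder for the missing equation.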
- FIG. 9 is a block diagram showing an internal configuration of enhancement layer power minute section detector 505
- An enhancement layer power minute section is a section in which the power of the enhancement layer decoded speech signal is very small.
- In an enhancement layer power minute section, even if the bandwidth of the output audio signal changes rapidly, the change is difficult to perceive. Therefore, even if the gain of the enhancement layer decoded speech signal (in other words, the mixing ratio of the core layer decoded speech signal and the enhancement layer decoded speech signal) is changed rapidly, the change is hardly perceived.
- An enhancement layer power minute section is detected by detecting that the power of the enhancement layer decoded speech signal is equal to or less than a predetermined threshold.
- Alternatively, an enhancement layer power minute section is detected by detecting that the ratio of the power of the enhancement layer decoded speech signal to the power of the core layer decoded speech signal is equal to or less than a predetermined value.
- The enhancement layer power minute section detection unit 505, which performs such detection, includes an enhancement layer decoded speech signal power calculation unit 551, an enhancement layer power ratio calculation unit 552, an enhancement layer power minute determination threshold storage unit 553, and an enhancement layer power minute section determination unit 554.
- The enhancement layer decoded speech signal power calculation unit 551 calculates the enhancement layer decoded speech signal power Pe(t) from the enhancement layer decoded speech signal input from the enhancement layer decoding unit 108, according to the following equation (8).
- $$Pe(t) = \sum_{i} Oe(i) \cdot Oe(i) \qquad (8)$$
- Oe(i) represents the enhancement layer decoded speech signal.
- Pe(t) represents the enhancement layer decoded speech signal power.
- the enhancement layer decoded speech signal power Pe (t) is output to the enhancement layer power ratio calculation unit 552 and enhancement layer power minute section determination unit 554.
- The enhancement layer power ratio calculation unit 552 calculates the enhancement layer power ratio by dividing the enhancement layer decoded speech signal power Pe(t) by the core layer decoded speech signal power Pc(t) input from the core layer decoded speech signal power calculation unit 501. The enhancement layer power ratio is output to the enhancement layer power minute section determination unit 554.
- The enhancement layer power minute determination threshold storage unit 553 stores thresholds B and C necessary for determining an enhancement layer power minute section, and outputs thresholds B and C to the enhancement layer power minute section determination unit 554.
- The enhancement layer power minute section determination unit 554 uses the enhancement layer decoded speech signal power Pe(t) input from the enhancement layer decoded speech signal power calculation unit 551, the enhancement layer power ratio input from the enhancement layer power ratio calculation unit 552, and the thresholds B and C input from the enhancement layer power minute determination threshold storage unit 553 to obtain the enhancement layer power minute section determination result d(t) according to the following equation (9). Since the allowable section includes the enhancement layer power minute section, the determination result of the enhancement layer power minute section is expressed by d(t) in the same manner as the allowable section detection result. The enhancement layer power minute section determination unit 554 outputs the enhancement layer power minute section determination result d(t) to the allowable section determination unit 506.
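As a rough illustration of the detection performed by units 551 to 554, the sketch below computes the frame power as a sum of squared samples and flags a frame when either the absolute power or the power ratio is small. Equation (9) is not reproduced in the text, so combining the two tests with a logical OR, and assigning threshold B to the absolute power and threshold C to the ratio, are assumptions; the function names are hypothetical.

```python
from typing import Sequence


def frame_power(samples: Sequence[float]) -> float:
    """Sum of squared samples over one frame (the form of Pe(t) and Pc(t))."""
    return sum(x * x for x in samples)


def enhancement_power_minute(
    enh_frame: Sequence[float],
    core_frame: Sequence[float],
    threshold_b: float,
    threshold_c: float,
) -> int:
    """Return d(t) = 1 when the enhancement layer power is very small."""
    pe = frame_power(enh_frame)
    pc = frame_power(core_frame)
    if pe <= threshold_b:  # absolute power test (assumed to use threshold B)
        return 1
    if pc > 0.0 and pe / pc <= threshold_c:  # power ratio test (assumed to use threshold C)
        return 1
    return 0
```

The guard on `pc > 0.0` avoids a division by zero for an all-zero core frame; how the original handles that case is not stated.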
- The weighted addition unit 114 changes the mixing ratio relatively abruptly only in sections where a change in the band of the audio signal is difficult to perceive, and changes the mixing ratio relatively slowly in sections where a change in the band of the audio signal is easily perceived. Therefore, the possibility that the listener feels discomfort or a sense of fluctuation in the audio signal can be reliably reduced.
- FIG. 2 is a block diagram showing an internal configuration of the weighted addition unit 114.
- the weighted addition unit 114 includes an enhancement layer decoded speech gain controller 120, an enhancement layer decoded speech amplifier 122, and an adder 124.
- The enhancement layer decoded speech gain controller 120, serving as setting means, controls the gain of the enhancement layer decoded speech signal (hereinafter referred to as "enhancement layer gain") based on the enhancement layer frame error detection result and the allowable section detection result.
- Through this control, the degree of change with time of the gain of the enhancement layer decoded speech signal is variably set. Thereby, the mixing ratio used when the core layer decoded audio signal and the enhancement layer decoded audio signal are mixed is variably set.
- The enhancement layer decoded speech gain controller 120 does not control the gain of the core layer decoded speech signal (hereinafter referred to as "core layer gain"); the gain of the core layer decoded speech signal when it is mixed with the enhancement layer decoded speech signal is fixed at a constant value. Therefore, the mixing ratio can be variably set more easily than when the gains of both signals are variably set.
- However, not only the enhancement layer gain but also the core layer gain may be controlled.
- The enhancement layer decoded speech amplifier 122 multiplies the enhancement layer decoded speech signal input from the enhancement layer decoding unit 108 by the gain controlled by the enhancement layer decoded speech gain controller 120.
- The enhancement layer decoded speech signal multiplied by the gain is output to the adder 124.
- Adder 124 adds the enhancement layer decoded speech signal input from enhancement layer decoded speech amplifier 122 and the core layer decoded speech signal input from signal adjustment section 112. Thereby, the core layer decoded audio signal and the enhancement layer decoded audio signal are mixed to generate a mixed signal.
- The generated mixed signal becomes the output speech signal of the speech decoding apparatus 100. That is, the combination of the enhancement layer decoded speech amplifier 122 and the adder 124 constitutes a mixing unit that mixes the core layer decoded speech signal and the enhancement layer decoded speech signal while changing their mixing ratio over time, thereby obtaining a mixed signal.
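The signal path through the amplifier 122 and the adder 124 amounts to a weighted addition with a fixed core layer gain. A minimal sketch, assuming both decoded signals are already aligned frames of equal length (for example after the signal adjustment unit 112) and that the fixed core layer gain is 1.0; the function name is hypothetical:

```python
from typing import List, Sequence


def weighted_add(
    core: Sequence[float],
    enhancement: Sequence[float],
    enhancement_gain: float,
) -> List[float]:
    """Mix one frame: core layer gain fixed at 1.0, enhancement layer scaled by g(t)."""
    if len(core) != len(enhancement):
        raise ValueError("core and enhancement frames must have the same length")
    # Amplifier 122: scale the enhancement layer signal; adder 124: sum with the core.
    return [c + enhancement_gain * e for c, e in zip(core, enhancement)]
```

With `enhancement_gain` at 0.0 the output is the narrowband core signal alone, and at 1.0 it is the full wideband mix.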
- The enhancement layer decoded speech gain controller 120 of the weighted addition unit 114 controls the enhancement layer gain so that it is attenuated when enhancement layer encoded data cannot be received and increased when enhancement layer encoded data starts to be received.
- the enhancement layer gain is adaptively controlled in synchronization with the state of the core layer decoded speech signal or enhancement layer decoded speech signal.
- Next, the enhancement layer gain variable setting operation in the enhancement layer decoded speech gain controller 120 will be described.
- The gain of the core layer decoded audio signal is fixed. Therefore, when the enhancement layer gain and its degree of change over time are changed by the enhancement layer decoded speech gain controller 120, the mixing ratio of the core layer decoded audio signal and the enhancement layer decoded audio signal and its degree of change over time are changed.
- The enhancement layer decoded speech gain controller 120 determines the enhancement layer gain g(t) using the enhancement layer frame error detection result e(t) input from the enhancement layer frame error detection unit 106 and the allowable section detection result d(t) input from the allowable section detection unit 110.
- The enhancement layer gain g(t) is determined by the following equations (10) to (12).
- $$g(t) = g(t-1) + s(t), \quad \text{if } 0.0 \le g(t-1) + s(t) \le 1.0 \qquad (11)$$
- s (t) represents an increase / decrease value of the enhancement layer gain.
- the increase / decrease value s (t) is determined by the following equations (13) to (16) according to the enhancement layer frame error detection result e (t) and the allowable interval detection result d (t).
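The per-frame gain update can be sketched as below. Equations (10), (12), and (13) to (16) are not reproduced in the text, so the clamping of g(t) to the range 0.0 to 1.0 outside the case covered by equation (11) and the step sizes `STEP_FAST` and `STEP_SLOW` are assumptions chosen only for illustration; all names are hypothetical.

```python
STEP_FAST = 0.20  # hypothetical |s(t)| inside an allowable section (d(t) = 1)
STEP_SLOW = 0.02  # hypothetical |s(t)| outside an allowable section (d(t) = 0)


def gain_step(frame_error: bool, allowable: bool) -> float:
    """s(t): negative while enhancement layer frames are lost, positive otherwise."""
    magnitude = STEP_FAST if allowable else STEP_SLOW
    return -magnitude if frame_error else magnitude


def update_gain(prev_gain: float, frame_error: bool, allowable: bool) -> float:
    """g(t) = g(t-1) + s(t), kept within [0.0, 1.0] (clamping is an assumption)."""
    candidate = prev_gain + gain_step(frame_error, allowable)
    return min(1.0, max(0.0, candidate))
```

Calling `update_gain` once per frame with e(t) and d(t) gives a gain that falls while frame errors persist and rises afterwards, with the steeper slope used only in allowable sections.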
- each of the functions g (t), s (t), and d (t) described above is expressed in units of frames, but may be expressed in units of samples.
- the numerical values used in the above formulas (10) to (20) are merely examples, and other numerical values may be used.
- Here, a function that linearly increases or decreases the enhancement layer gain is used, but any function that monotonically increases or decreases the enhancement layer gain can be used.
- When a background noise signal is included in the core layer decoded audio signal, the core layer decoded audio signal may be used to determine, for example, the ratio of the audio signal to the background noise signal, and the amount of increase or decrease of the enhancement layer gain may be controlled adaptively according to that ratio.
- FIG. 3 is a diagram for explaining a first example of change with time of the enhancement layer gain
- FIG. 4 is a diagram for explaining a second example of change with time of the enhancement layer gain.
- FIG. 3B shows whether or not the enhancement layer encoded data can be received.
- An enhancement layer frame error is detected in the section from time T1 to time T2, the section from time T6 to time T8, and the section after time T10; no enhancement layer frame error is detected in the other sections.
- FIG. 3C shows the permissible section detection result.
- The section from time T3 to time T5 and the section from time T9 to time T11 are detected allowable sections. In the other sections, no allowable section is detected.
- FIG. 3A shows enhancement layer gain.
- In the section from time T1 to time T2, where an enhancement layer frame error is detected, the enhancement layer gain is gradually reduced. Since the enhancement layer frame error is no longer detected at time T2, the enhancement layer gain is increased from that point.
- The section from time T2 to time T3 is not an allowable section. Therefore, the degree of increase of the enhancement layer gain is kept small, and the gain rises relatively gently.
- The section from time T3 to time T5 is an allowable section. Therefore, the degree of increase of the enhancement layer gain is large, and the gain rises relatively steeply. As a result, the band change can be prevented from being perceived in the section from time T2 to time T3.
- In the section from time T3 to time T5, the band change can be accelerated while it remains hardly perceptible, which contributes to a sense of wide bandwidth and improves subjective quality.
- the enhancement layer gain is increased.
- In the section from time T8 to time T10, since no enhancement layer frame error is detected, the enhancement layer gain is increased.
- The section from time T8 to time T9 is not an allowable section. Therefore, the increase in the enhancement layer gain is kept relatively gradual.
- The section from time T9 to time T10 is an allowable section. Therefore, the increase in the enhancement layer gain is relatively steep.
- FIG. 4B shows whether or not the enhancement layer encoded data can be received.
- An enhancement layer frame error is detected in the section from time T21 to time T22, the section from time T24 to time T27, the section from time T28 to time T30, and the section after time T31.
- In the other sections, no enhancement layer frame error is detected.
- FIG. 4C shows the permissible section detection result. The interval from time T23 to time T26 is the detected allowable interval. In other sections, no allowable section has been detected.
- FIG. 4A shows enhancement layer gain.
- In this example, enhancement layer frame errors are detected more frequently than in the first example. Therefore, the enhancement layer gain switches between increasing and decreasing more often.
- Specifically, the enhancement layer gain increases from time T22, decreases from time T24, increases from time T27, decreases from time T28, increases from time T30, and decreases from time T31.
- The allowable section is only the section from time T23 to time T26. In other words, in the section after time T26, the degree of change in the enhancement layer gain is controlled to be small, and the change in the enhancement layer gain is kept relatively gentle.
- Therefore, the increase in the enhancement layer gain in the section from time T27 to time T28 and in the section from time T30 to time T31 is relatively moderate.
- Likewise, the decrease in the enhancement layer gain in the sections after time T26 is relatively gradual. As a result, the listener can be prevented from perceiving a sense of fluctuation when band changes occur frequently.
- Band switching is performed quickly in the allowable section, which eases the sense of fluctuation in the overall decoded speech that can arise from changes in the power of the core layer decoded speech signal and from band switching.
- The output time of the mixed signal is changed as the degree of change of the enhancement layer gain with time is changed. For this reason, when the degree of change of the mixing ratio with time is changed, discontinuities in loudness and in the sense of bandwidth can be prevented.
- the core layer decoded audio signal that is, the narrowband audio signal
- the enhancement layer decoded audio signal that is, the wideband audio signal
- the band scalable speech coding scheme that can be employed is not limited to that described in the present embodiment.
- the wideband decoded speech signal is batch-decoded using both the core layer encoded data and the enhanced layer encoded data, and the core layer decoded speech signal is used when an enhancement layer frame error occurs.
- the configuration of the present embodiment can also be applied to such a system.
- overlapping processing is performed so that both the core layer decoded speech and the enhancement layer decoded speech are faded in or faded out.
- the speed of fade-in or fade-out is controlled in accordance with the above-described allowable section detection result. As a result, it is possible to obtain decoded speech in which deterioration of sound quality is suppressed.
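For the batch-decoding variant described above, where both signals are faded rather than only the enhancement layer, the overlap processing can be sketched as a cross-fade whose speed depends on the allowable section detection result. The step sizes and function names are hypothetical, and treating the fade as a single weight shared by both signals is an assumption.

```python
from typing import List, Sequence


def crossfade_weight(
    prev_weight: float,
    to_wideband: bool,
    allowable: bool,
    fast_step: float = 0.25,  # hypothetical per-frame step inside an allowable section
    slow_step: float = 0.05,  # hypothetical per-frame step elsewhere
) -> float:
    """Move the wideband weight toward 1.0 (fade in) or 0.0 (fade out)."""
    step = fast_step if allowable else slow_step
    target = prev_weight + step if to_wideband else prev_weight - step
    return min(1.0, max(0.0, target))


def crossfade(
    narrowband: Sequence[float], wideband: Sequence[float], weight: float
) -> List[float]:
    """Fade the narrowband signal out while the wideband signal fades in."""
    return [(1.0 - weight) * n + weight * w for n, w in zip(narrowband, wideband)]
```

Unlike the weighted addition with a fixed core layer gain, the narrowband contribution here shrinks as the wideband contribution grows.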
- A configuration for detecting sections that allow a change in the band may also be provided in a speech encoding apparatus to which a band scalable speech coding scheme is applied.
- In this case, the speech encoding apparatus suspends band switching (that is, switching from narrowband to wideband or from wideband to narrowband) in sections other than those that allow a band change, and executes band switching only in sections that allow a band change. When speech encoded by such a speech encoding apparatus is decoded by a speech decoding apparatus, the possibility that the listener feels discomfort or a sense of fluctuation in the decoded speech can be reduced even if the speech decoding apparatus does not have a band switching function.
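The encoder-side behaviour, deferring a requested band switch until a frame that allows a band change, can be illustrated with the following sketch. The per-frame inputs and the function name are hypothetical, and starting in the band requested for the first frame is an assumption.

```python
from typing import List, Sequence


def schedule_band_switch(
    requested_band: Sequence[str], allowable: Sequence[bool]
) -> List[str]:
    """Return the band actually used per frame; switches wait for an allowable frame."""
    if not requested_band:
        return []
    current = requested_band[0]  # assumed initial band
    used = []
    for wanted, can_switch in zip(requested_band, allowable):
        if wanted != current and can_switch:
            current = wanted  # execute the pending switch only in an allowable section
        used.append(current)
    return used
```

For example, a request to go wideband that arrives in a non-allowable frame stays pending and takes effect at the next allowable frame.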
- Each functional block used in the description of each of the above embodiments is typically realized as an LSI which is an integrated circuit. These may be individually made into one chip, or may be made into one chip so as to include a part or all of them.
- Although the term LSI is used here, the circuit may also be referred to as an IC (integrated circuit), system LSI, super LSI, or ultra LSI, depending on the degree of integration.
- the method of circuit integration is not limited to LSI's, and implementation using dedicated circuitry or general purpose processors is also possible. It is also possible to use a field programmable gate array (FPGA) that can be programmed after LSI manufacturing and a reconfigurable processor that can reconfigure the connection and settings of circuit cells inside the LSI.
- A first aspect of the present invention is a voice switching device that outputs a mixed signal in which a narrowband audio signal and a wideband audio signal are mixed when the band of the output audio signal is switched. The device includes a mixing unit that mixes the narrowband audio signal and the wideband audio signal while changing their mixing ratio over time to obtain the mixed signal, and setting means that variably sets the degree of change of the mixing ratio with time.
- A second aspect of the present invention employs a configuration that, in the above configuration, includes detection means that detects a specific section in a period in which the narrowband audio signal or the wideband audio signal is obtained, and in which the setting means increases the degree when the specific section is detected and decreases the degree when the specific section is not detected.
- According to this configuration, the period in which the degree of change of the mixing ratio with time is relatively high can be limited to a specific section of the period in which the audio signal is obtained, and the timing at which the degree of change of the mixing ratio with time is changed can be controlled.
- a third aspect of the present invention employs a configuration in which, in the above configuration, the detection means detects, as the specific section, a section that allows a sudden change of a predetermined level or more in a band of the audio signal.
- a fourth aspect of the present invention employs a configuration in which, in the above configuration, the detection means detects a silent section as the specific section.
- a fifth aspect of the present invention employs a configuration in which, in the above configuration, the detection means detects a section where the power of the narrowband audio signal is a predetermined level or less as the specific section.
- a sixth aspect of the present invention employs a configuration in which, in the above configuration, the detection means detects a section where the power of the wideband audio signal is a predetermined level or less as the specific section.
- A seventh aspect of the present invention employs a configuration in which, in the above configuration, the detection means detects, as the specific section, a section in which the ratio of the power of the wideband audio signal to the power of the narrowband audio signal is equal to or lower than a predetermined level.
- An eighth aspect of the present invention employs a configuration in which, in the above configuration, the detection means detects, as the specific section, a section in which power fluctuation of the narrowband audio signal is equal to or higher than a predetermined level.
- a ninth aspect of the present invention employs a configuration in which, in the above configuration, the detection means detects a rising edge of the narrowband audio signal as the specific section.
- a tenth aspect of the present invention employs a configuration in which, in the above configuration, the detection means detects a section in which a fluctuation in power of the wideband audio signal is a predetermined level or more as the specific section.
- An eleventh aspect of the present invention employs a configuration in which, in the above configuration, the detection means detects a rising edge of the wideband audio signal as the specific section.
- a twelfth aspect of the present invention employs a configuration in which, in the above configuration, the detection means detects, as the specific section, a section in which the type of background noise signal included in the narrowband audio signal changes.
- a thirteenth aspect of the present invention employs a configuration in which, in the above configuration, the detection means detects, as the specific section, a section in which the type of background noise signal included in the wideband audio signal changes.
- a fourteenth aspect of the present invention employs a configuration in which, in the above configuration, the detection means detects a section in which a change in a spectral parameter of the narrowband speech signal is a predetermined level or more as the specific section.
- a fifteenth aspect of the present invention employs a configuration in which, in the above configuration, the detection means detects a section in which a change in spectrum parameter of the wideband audio signal is equal to or higher than a predetermined level as the specific section.
- a sixteenth aspect of the present invention employs a configuration in which, in the above configuration, the detection means detects a section after interpolation processing is performed on the narrowband audio signal as the specific section.
- the seventeenth aspect of the present invention employs a configuration in which, in the above configuration, the detection means detects a section after the interpolation processing is performed on the wideband audio signal as the specific section.
- According to these configurations, the mixing ratio can be changed relatively abruptly only in sections where a change in the band of the audio signal is difficult to perceive, and changed relatively slowly in sections where a change in the band of the audio signal is easily perceived, so that the possibility that the listener feels discomfort or a sense of fluctuation in the audio signal can be reliably reduced.
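Two of the detection criteria listed in the aspects above, a section where the signal power is at or below a predetermined level and a rising edge of the signal, can be sketched as follows. The thresholds `level` and `ratio`, the frame-to-frame power ratio used to define a rising edge, and all function names are assumptions made for illustration.

```python
from typing import Sequence


def frame_power(frame: Sequence[float]) -> float:
    """Sum of squared samples over one frame."""
    return sum(x * x for x in frame)


def is_low_power(frame: Sequence[float], level: float) -> bool:
    """Power at or below a predetermined level (for example a near-silent section)."""
    return frame_power(frame) <= level


def is_rising_edge(prev_frame: Sequence[float], frame: Sequence[float], ratio: float) -> bool:
    """Power grew by at least `ratio` relative to the previous frame (assumed definition)."""
    prev_power = frame_power(prev_frame)
    power = frame_power(frame)
    if prev_power == 0.0:
        return power > 0.0
    return power / prev_power >= ratio


def detect_specific_section(
    prev_frame: Sequence[float], frame: Sequence[float], level: float, ratio: float
) -> int:
    """Return 1 when either criterion marks the frame as a specific section."""
    return 1 if is_low_power(frame, level) or is_rising_edge(prev_frame, frame, ratio) else 0
```

The same functions could be applied to either the narrowband or the wideband signal; which criteria are combined in practice is a design choice the aspects leave open.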
- An eighteenth aspect of the present invention employs a configuration in which, in the above configuration, the setting means fixes the gain of the narrowband audio signal while variably setting the degree of change with time of the gain of the wideband audio signal.
- According to this configuration, the mixing ratio can be variably set more easily than when the degrees of change with time of the gains of both signals are variably set.
- A nineteenth aspect of the present invention employs a configuration in which, in the above configuration, the setting means changes the output time of the mixed signal.
- a twentieth aspect of the present invention is a communication terminal device, and this device has a configuration including the voice switching device having the above configuration.
- a twenty-first aspect of the present invention is an audio switching method, which outputs a mixed signal in which a narrowband audio signal and a wideband audio signal are mixed when the band of the output audio signal is switched.
- According to the present invention, because the degree of change with time of the mixing ratio used when a narrowband audio signal and a wideband audio signal are mixed is variably set, the possibility that the listener feels discomfort or a sense of fluctuation in the audio signal can be reduced, and sound quality can be improved.
- the voice switching device and voice switching method of the present invention can be applied to switching of a band of a voice signal.
Landscapes
- Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200680002420.7A CN101107650B (zh) | 2005-01-14 | 2006-01-12 | Voice switching device and voice switching method |
DE602006009215T DE602006009215D1 (de) | 2005-01-14 | 2006-01-12 | Audio switching device and method |
EP06711618A EP1814106B1 (en) | 2005-01-14 | 2006-01-12 | Audio switching device and audio switching method |
JP2006552962A JP5046654B2 (ja) | 2005-01-14 | 2006-01-12 | Scalable decoding device and scalable decoding method |
US11/722,904 US8010353B2 (en) | 2005-01-14 | 2006-01-12 | Audio switching device and audio switching method that vary a degree of change in mixing ratio of mixing narrow-band speech signal and wide-band speech signal |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2005008084 | 2005-01-14 | ||
JP2005-008084 | 2005-01-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006075663A1 (ja) | 2006-07-20 |
Family
ID=36677688
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2006/300295 WO2006075663A1 (ja) | Audio switching device and audio switching method | 2005-01-14 | 2006-01-12 |
Country Status (6)
Country | Link |
---|---|
US (1) | US8010353B2 (ja) |
EP (2) | EP1814106B1 (ja) |
JP (1) | JP5046654B2 (ja) |
CN (2) | CN102592604A (ja) |
DE (1) | DE602006009215D1 (ja) |
WO (1) | WO2006075663A1 (ja) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1968046A1 (en) * | 2007-03-09 | 2008-09-10 | Fujitsu Limited | Encoding device and encoding method |
JP2010520504A (ja) * | 2007-03-02 | 2010-06-10 | テレフオンアクチーボラゲット エル エム エリクソン(パブル) | Post-filter for layered codecs |
US8254935B2 (en) | 2002-09-24 | 2012-08-28 | Fujitsu Limited | Packet transferring/transmitting method and mobile communication system |
JP2013512468A (ja) * | 2010-04-28 | 2013-04-11 | ▲ホア▼▲ウェイ▼技術有限公司 | Method and device for switching audio signals |
JP2013521536A (ja) * | 2010-03-09 | 2013-06-10 | フラウンホーファーゲゼルシャフト ツール フォルデルング デル アンゲヴァンテン フォルシユング エー.フアー. | Apparatus and method for improved magnitude response and temporal alignment in a phase vocoder based bandwidth extension method for audio signals |
EP2993666A1 (en) | 2014-08-08 | 2016-03-09 | Fujitsu Limited | Voice switching device, voice switching method, and computer program for switching between voices |
JP2018528463A (ja) * | 2015-08-18 | 2018-09-27 | クアルコム,インコーポレイテッド | Signal reuse during bandwidth transition period |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
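CN101499278B (zh) * | 2008-02-01 | 2011-12-28 | 华为技术有限公司 | Audio signal switching processing method and apparatus |
CN101505288B (zh) * | 2009-02-18 | 2013-04-24 | 上海云视科技有限公司 | Wideband/narrowband bidirectional communication relay device |
JP2010233207A (ja) * | 2009-03-05 | 2010-10-14 | Panasonic Corp | High-frequency switch circuit and semiconductor device |
JP5267257B2 (ja) * | 2009-03-23 | 2013-08-21 | 沖電気工業株式会社 | Audio mixing device, method, and program, and audio conference system |
JP5589631B2 (ja) * | 2010-07-15 | 2014-09-17 | 富士通株式会社 | Speech processing device, speech processing method, and telephone device |
CN102142256B (zh) * | 2010-08-06 | 2012-08-01 | 华为技术有限公司 | Method and apparatus for calculating fade-in time |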
EP3518234B1 (en) * | 2010-11-22 | 2023-11-29 | NTT DoCoMo, Inc. | Audio encoding device and method |
KR102058980B1 (ko) * | 2012-04-10 | 2019-12-24 | 페어차일드 세미컨덕터 코포레이션 | Audio device switching with reduced pop and click |
US9827080B2 (en) | 2012-07-23 | 2017-11-28 | Shanghai Shift Electrics Co., Ltd. | Head structure of a brush appliance |
CN102743016B (zh) | 2012-07-23 | 2014-06-04 | 上海携福电器有限公司 | Head structure of a brush appliance |
US9741350B2 (en) | 2013-02-08 | 2017-08-22 | Qualcomm Incorporated | Systems and methods of performing gain control |
US9711156B2 (en) * | 2013-02-08 | 2017-07-18 | Qualcomm Incorporated | Systems and methods of performing filtering for gain determination |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08248997A (ja) * | 1995-03-13 | 1996-09-27 | Matsushita Electric Ind Co Ltd | Speech band expansion device |
JPH0990992A (ja) * | 1995-09-27 | 1997-04-04 | Nippon Telegr & Teleph Corp <Ntt> | Wideband speech signal restoration method |
JPH09258787A (ja) * | 1996-03-21 | 1997-10-03 | Kokusai Electric Co Ltd | Frequency band extension circuit for narrowband speech signals |
JP2000206996A (ja) * | 1999-01-13 | 2000-07-28 | Sony Corp | Receiving device and method, communication device and method |
JP2000261529A (ja) * | 1999-03-10 | 2000-09-22 | Nippon Telegr & Teleph Corp <Ntt> | Telephone call device |
JP2003323199A (ja) * | 2002-04-26 | 2003-11-14 | Matsushita Electric Ind Co Ltd | Encoding device, decoding device, encoding method, and decoding method |
WO2003104924A2 (en) * | 2002-06-05 | 2003-12-18 | Sonic Focus, Inc. | Acoustical virtual reality engine and advanced techniques for enhancing delivered sound |
JP2004101720A (ja) * | 2002-09-06 | 2004-04-02 | Matsushita Electric Ind Co Ltd | Acoustic encoding device and acoustic encoding method |
JP2004272052A (ja) * | 2003-03-11 | 2004-09-30 | Fujitsu Ltd | Speech section detection device |
Family Cites Families (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5432859A (en) * | 1993-02-23 | 1995-07-11 | Novatel Communications Ltd. | Noise-reduction system |
US5699479A (en) | 1995-02-06 | 1997-12-16 | Lucent Technologies Inc. | Tonality for perceptual audio compression based on loudness uncertainty |
EP0732687B2 (en) | 1995-03-13 | 2005-10-12 | Matsushita Electric Industrial Co., Ltd. | Apparatus for expanding speech bandwidth |
US6449519B1 (en) * | 1997-10-22 | 2002-09-10 | Victor Company Of Japan, Limited | Audio information processing method, audio information processing apparatus, and method of recording audio information on recording medium |
DE19804581C2 (de) * | 1998-02-05 | 2000-08-17 | Siemens Ag | Method and radio communication system for transmitting speech information |
CA2252170A1 (en) * | 1998-10-27 | 2000-04-27 | Bruno Bessette | A method and device for high quality coding of wideband speech and audio signals |
JP2000206995A (ja) * | 1999-01-11 | 2000-07-28 | Sony Corp | Receiving device and method, communication device and method |
US6377915B1 (en) * | 1999-03-17 | 2002-04-23 | Yrp Advanced Mobile Communication Systems Research Laboratories Co., Ltd. | Speech decoding using mix ratio table |
JP2000305599A (ja) * | 1999-04-22 | 2000-11-02 | Sony Corp | Speech synthesis device and method, telephone device, and program providing medium |
JP2000352999A (ja) | 1999-06-11 | 2000-12-19 | Nec Corp | Voice switching device |
US6978236B1 (en) * | 1999-10-01 | 2005-12-20 | Coding Technologies Ab | Efficient spectral envelope coding using variable time/frequency resolution and time/frequency switching |
US6778966B2 (en) * | 1999-11-29 | 2004-08-17 | Syfx | Segmented mapping converter system and method |
FI119576B (fi) * | 2000-03-07 | 2008-12-31 | Nokia Corp | Speech processing device and method for processing speech, and digital radio telephone |
FI115329B (fi) * | 2000-05-08 | 2005-04-15 | Nokia Corp | Method and arrangement for changing the bandwidth of a source signal in a telecommunication connection capable of multiple bandwidths |
US6691085B1 (en) * | 2000-10-18 | 2004-02-10 | Nokia Mobile Phones Ltd. | Method and system for estimating artificial high band signal in speech codec using voice activity information |
US20020128839A1 (en) * | 2001-01-12 | 2002-09-12 | Ulf Lindgren | Speech bandwidth extension |
KR100830857B1 (ko) * | 2001-01-19 | 2008-05-22 | 코닌클리케 필립스 일렉트로닉스 엔.브이. | Audio transmission system, audio receiver, transmission method, reception method, and speech decoder |
US20030028386A1 (en) * | 2001-04-02 | 2003-02-06 | Zinser Richard L. | Compressed domain universal transcoder |
CN1244904C (zh) * | 2001-05-08 | 2006-03-08 | 皇家菲利浦电子有限公司 | Audio signal encoding method and device |
US6895375B2 (en) * | 2001-10-04 | 2005-05-17 | At&T Corp. | System for bandwidth extension of Narrow-band speech |
US6988066B2 (en) * | 2001-10-04 | 2006-01-17 | At&T Corp. | Method of bandwidth extension for narrow-band speech |
MXPA03005133A (es) * | 2001-11-14 | 2004-04-02 | Matsushita Electric Ind Co Ltd | Encoding device, decoding device, and system thereof |
WO2003091989A1 (en) | 2002-04-26 | 2003-11-06 | Matsushita Electric Industrial Co., Ltd. | Coding device, decoding device, coding method, and decoding method |
US7283956B2 (en) * | 2002-09-18 | 2007-10-16 | Motorola, Inc. | Noise suppression |
EP1543307B1 (en) * | 2002-09-19 | 2006-02-22 | Matsushita Electric Industrial Co., Ltd. | Audio decoding apparatus and method |
US7577259B2 (en) * | 2003-05-20 | 2009-08-18 | Panasonic Corporation | Method and apparatus for extending band of audio signal using higher harmonic wave generator |
JP4436075B2 (ja) | 2003-06-19 | 2010-03-24 | 三菱農機株式会社 | Sprocket |
US20050004793A1 (en) * | 2003-07-03 | 2005-01-06 | Pasi Ojala | Signal adaptation for higher band coding in a codec utilizing band split coding |
DE602004004950T2 (de) * | 2003-07-09 | 2007-10-31 | Samsung Electronics Co., Ltd., Suwon | Apparatus and method for bit-rate-scalable speech encoding and decoding |
KR100651712B1 (ko) * | 2003-07-10 | 2006-11-30 | Yonsei University | Wideband speech encoder and method thereof, and wideband speech decoder and method thereof |
US7461003B1 (en) * | 2003-10-22 | 2008-12-02 | Tellabs Operations, Inc. | Methods and apparatus for improving the quality of speech signals |
US7613607B2 (en) * | 2003-12-18 | 2009-11-03 | Nokia Corporation | Audio enhancement in coded domain |
JP4733939B2 (ja) * | 2004-01-08 | 2011-07-27 | Panasonic Corporation | Signal decoding apparatus and signal decoding method |
2006
- 2006-01-12 EP EP06711618A patent/EP1814106B1/en not_active Not-in-force
- 2006-01-12 WO PCT/JP2006/300295 patent/WO2006075663A1/ja active Application Filing
- 2006-01-12 JP JP2006552962A patent/JP5046654B2/ja not_active Expired - Fee Related
- 2006-01-12 DE DE602006009215T patent/DE602006009215D1/de active Active
- 2006-01-12 EP EP09165516A patent/EP2107557A3/en not_active Withdrawn
- 2006-01-12 CN CN2012100237319A patent/CN102592604A/zh active Pending
- 2006-01-12 CN CN200680002420.7A patent/CN101107650B/zh not_active Expired - Fee Related
- 2006-01-12 US US11/722,904 patent/US8010353B2/en active Active
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08248997A (ja) * | 1995-03-13 | 1996-09-27 | Matsushita Electric Ind Co Ltd | Speech band expansion device |
JPH0990992A (ja) * | 1995-09-27 | 1997-04-04 | Nippon Telegr & Teleph Corp <Ntt> | Wideband speech signal restoration method |
JPH09258787A (ja) * | 1996-03-21 | 1997-10-03 | Kokusai Electric Co Ltd | Frequency band extension circuit for narrowband speech signals |
JP2000206996A (ja) * | 1999-01-13 | 2000-07-28 | Sony Corp | Receiving apparatus and method, and communication apparatus and method |
JP2000261529A (ja) * | 1999-03-10 | 2000-09-22 | Nippon Telegr & Teleph Corp <Ntt> | Telephone call device |
JP2003323199A (ja) * | 2002-04-26 | 2003-11-14 | Matsushita Electric Ind Co Ltd | Encoding device, decoding device, encoding method, and decoding method |
WO2003104924A2 (en) * | 2002-06-05 | 2003-12-18 | Sonic Focus, Inc. | Acoustical virtual reality engine and advanced techniques for enhancing delivered sound |
JP2004101720A (ja) * | 2002-09-06 | 2004-04-02 | Matsushita Electric Ind Co Ltd | Acoustic encoding device and acoustic encoding method |
JP2004272052A (ja) * | 2003-03-11 | 2004-09-30 | Fujitsu Ltd | Speech section detection device |
Non-Patent Citations (1)
Title |
---|
See also references of EP1814106A4 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8254935B2 (en) | 2002-09-24 | 2012-08-28 | Fujitsu Limited | Packet transferring/transmitting method and mobile communication system |
JP2010520504A (ja) * | 2007-03-02 | 2010-06-10 | Telefonaktiebolaget LM Ericsson (publ) | Post-filter for layered codecs |
EP1968046A1 (en) * | 2007-03-09 | 2008-09-10 | Fujitsu Limited | Encoding device and encoding method |
US8073050B2 (en) | 2007-03-09 | 2011-12-06 | Fujitsu Limited | Encoding device and encoding method |
JP2013521536A (ja) * | 2010-03-09 | 2013-06-10 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for improved magnitude response and temporal alignment in a phase-vocoder-based bandwidth extension method for audio signals |
JP2013512468A (ja) * | 2010-04-28 | 2013-04-11 | Huawei Technologies Co., Ltd. | Method and device for switching speech signals |
JP2015045888A (ja) * | 2010-04-28 | 2015-03-12 | Huawei Technologies Co., Ltd. | Method and device for switching speech signals |
JP2017033015A (ja) * | 2010-04-28 | 2017-02-09 | Huawei Technologies Co., Ltd. | Method and device for switching speech signals |
EP2993666A1 (en) | 2014-08-08 | 2016-03-09 | Fujitsu Limited | Voice switching device, voice switching method, and computer program for switching between voices |
US9679577B2 (en) | 2014-08-08 | 2017-06-13 | Fujitsu Limited | Voice switching device, voice switching method, and non-transitory computer-readable recording medium having stored therein a program for switching between voices |
JP2018528463A (ja) * | 2015-08-18 | 2018-09-27 | Qualcomm Incorporated | Signal reuse during bandwidth transition period |
Also Published As
Publication number | Publication date |
---|---|
EP1814106A1 (en) | 2007-08-01 |
US8010353B2 (en) | 2011-08-30 |
CN101107650B (zh) | 2012-03-28 |
DE602006009215D1 (de) | 2009-10-29 |
US20100036656A1 (en) | 2010-02-11 |
EP2107557A3 (en) | 2010-08-25 |
JPWO2006075663A1 (ja) | 2008-06-12 |
CN101107650A (zh) | 2008-01-16 |
EP1814106B1 (en) | 2009-09-16 |
CN102592604A (zh) | 2012-07-18 |
JP5046654B2 (ja) | 2012-10-10 |
EP1814106A4 (en) | 2007-11-28 |
EP2107557A2 (en) | 2009-10-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2006075663A1 (ja) | | Audio switching device and audio switching method |
US8160868B2 (en) | | Scalable decoder and scalable decoding method |
JP5100380B2 (ja) | | Scalable decoding device and lost data interpolation method |
JP6445460B2 (ja) | | Method and apparatus for normalized audio playback of media with and without embedded loudness metadata on new media devices |
JP4698593B2 (ja) | | Speech decoding device and speech decoding method |
CN105103222B (zh) | | Metadata for loudness and dynamic range control |
JP5129888B2 (ja) | | Transcoding method, transcoding system, and set-top box |
EP1941500B1 (en) | | Encoder-assisted frame loss concealment techniques for audio coding |
US7050972B2 (en) | | Enhancing the performance of coding systems that use high frequency reconstruction methods |
RU2387025C2 (ru) | | Method and device for vector quantization of a spectral envelope representation |
US8571039B2 (en) | | Encoding and decoding speech signals |
CN105493182B (zh) | | Hybrid waveform-coded and parametric-coded speech enhancement |
US9251798B2 (en) | | Adaptive audio signal coding |
US20080071549A1 (en) | | Audio Signal Decoding Device and Audio Signal Encoding Device |
US20030091194A1 (en) | | Method and device for processing a stereo audio signal |
WO2012026092A1 (ja) | | Audio signal processing device and audio signal processing method |
JP2008107415A (ja) | | Encoding device |
US20070118368A1 (en) | | Audio encoding apparatus and audio encoding method |
EP2806423A1 (en) | | Speech decoding device and speech decoding method |
JPWO2008132826A1 (ja) | | Stereo speech encoding device and stereo speech encoding method |
WO2024166647A1 (ja) | | Encoding device and encoding method |
WO2017094203A1 (ja) | | Audio signal decoding device and audio signal decoding method |
JP2024529556A (ja) | | Method and device for limiting output synthesis distortion in a sound codec |
JPH03116197A (ja) | | Speech decoding device |
JP2005301002A (ja) | | Speech coding information processing device and speech coding information processing program |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
WWE | WIPO information: entry into national phase | Ref document number: 2006711618; Country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 11722904; Country of ref document: US |
WWE | WIPO information: entry into national phase | Ref document number: 1020/MUMNP/2007; Country of ref document: IN |
WWE | WIPO information: entry into national phase | Ref document number: 2006552962; Country of ref document: JP |
WWE | WIPO information: entry into national phase | Ref document number: 200680002420.7; Country of ref document: CN |
NENP | Non-entry into the national phase | Ref country code: DE |
WWP | WIPO information: published in national office | Ref document number: 2006711618; Country of ref document: EP |