WO2004097798A1 - Speech decoding device, speech decoding method, program, and recording medium - Google Patents
Speech decoding device, speech decoding method, program, and recording medium
- Publication number
- WO2004097798A1 (PCT/JP2003/005582)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- formant
- speech
- vocal tract
- sound source
- source signal
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/26—Pre-filtering or post-filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0364—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/12—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/15—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being formant information
Definitions
- Speech decoding device, speech decoding method, program, and recording medium
- The present invention relates to a communication device, such as a mobile phone, that communicates using speech encoding, and in particular to a speech decoding device, speech decoding method, and the like for improving the clarity of received speech and making it easier to hear.
Background art
- Code Excited Linear Prediction (CELP) is known as a coding method that provides good voice quality at low bit rates.
- Many voice coding standards, such as ITU-T G.729 and 3GPP AMR, adopt CELP-based coding systems.
- The voice compression methods used in, for example, voice over IP (VoIP) and TV conference systems are likewise mainly based on the CELP algorithm.
- CELP is a speech coding method announced by M. R. Schroeder and B. S. Atal in 1985; it extracts parameters from the input speech based on a model of human speech generation, encodes those parameters, and transmits them, thereby realizing highly efficient information compression.
- Figure 16 shows a model of speech generation.
- The sound source signal generated by the sound source (vocal cords) 110 is input to the articulatory system (vocal tract) 111; after the vocal tract characteristics are imposed in the vocal tract 111, the result is output from the lips 112 as the final speech waveform (see Non-Patent Document 1).
- speech consists of sound source characteristics and vocal tract characteristics.
- FIG. 17 shows a processing flow of the CELP encoder / decoder.
- A CELP encoder and a CELP decoder are mounted on mobile phones and the like; the CELP encoder 120 of the transmitting phone sends a speech signal (speech code) to the CELP decoder 130 of the receiving phone via a transmission path (not shown), such as a wireless link or a mobile phone network.
- In the CELP encoder 120, the parameter extraction unit 121 analyzes the input speech based on the speech generation model described above and separates it into linear prediction coefficients (Linear Predictor Coefficients: LPC coefficients), which represent the vocal tract characteristics, and a sound source signal.
- From the sound source signal, the parameter extraction unit 121 further extracts an adaptive codebook (ACB) vector representing the periodic component of the sound source signal, a stochastic codebook (SCB) vector representing the aperiodic component, and the gains of both vectors.
- The encoding unit 122 encodes the LPC coefficients, the ACB vector, the SCB vector, and the gains to generate an LPC code, an ACB code, an SCB code, and a gain code; the code multiplexing unit 123 multiplexes these into a speech code and transmits it to the mobile phone on the receiving side.
- In the CELP decoder 130, the code separation unit 131 separates the transmitted speech code into the LPC code, ACB code, SCB code, and gain code.
- the decoding unit 132 decodes them into an LPC coefficient, an ACB vector, an SCB vector, and a gain.
- the speech synthesis unit 133 synthesizes speech from each of the decoded parameters.
- The CELP encoder and CELP decoder will now be described in more detail.
- FIG. 18 shows a block diagram of the parameter extraction unit 121 of the CELP encoder.
- LPC analysis section 141 obtains LPC coefficients from input speech by a known linear prediction analysis (LPC analysis) technique. These LPC coefficients are filter coefficients when the vocal tract characteristics are approximated by an all-pole linear filter.
- Next, the sound source signal is extracted; for this, the analysis-by-synthesis (AbS) technique is used.
- In CELP, sound is reproduced by inputting a sound source signal to an LPC synthesis filter 142 composed of the LPC coefficients. Accordingly, from the sound source candidates formed by combining the ACB vectors stored in the adaptive codebook 143, the SCB vectors stored in the noise codebook 144, and the gains of both vectors, the error power evaluation unit 145 searches for the codebook combination that minimizes the error between the input speech and the speech synthesized by the LPC synthesis filter 142, and thereby extracts the ACB vector, SCB vector, ACB gain, and SCB gain.
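The codebook search described above can be sketched as follows. This is a minimal illustration of the analysis-by-synthesis idea only: the toy all-pole `synthesize` filter and the exhaustive single-codebook search are simplifying assumptions, whereas a real CELP encoder searches the adaptive and stochastic codebooks together with their gains, typically under perceptual weighting.

```python
import numpy as np

def synthesize(excitation, a):
    """Toy all-pole LPC synthesis: s[n] = e[n] - sum_i a[i] * s[n-1-i]."""
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for i, ai in enumerate(a):
            if n - 1 - i >= 0:
                acc -= ai * s[n - 1 - i]
        s[n] = acc
    return s

def abs_search(target, codebook, a):
    """Analysis-by-synthesis: pick the codebook entry whose synthesized
    output has minimum squared error against the input (target) speech."""
    errors = [np.sum((target - synthesize(cv, a)) ** 2) for cv in codebook]
    best = int(np.argmin(errors))
    return best, errors[best]
```

If the target frame was itself produced from a particular codebook entry, the search recovers that entry with (numerically) zero error.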
- Each parameter extracted by the above operation is encoded by the encoding unit 122 as described above, and an LPC code, an ACB code, an SCB code, and a gain code are obtained.
- Each obtained code is multiplexed by the code multiplexing unit 123 and transmitted to the decoder side as a speech code.
- FIG. 19 shows a block diagram of the CELP decoder 130.
- In the CELP decoder 130, the code separation unit 131 separates the transmitted speech code to obtain the LPC code, ACB code, SCB code, and gain code.
- The LPC coefficient decoding unit 151, ACB vector decoding unit 152, SCB vector decoding unit 153, and gain decoding unit 154 decode the LPC code, ACB code, SCB code, and gain code into LPC coefficients, an ACB vector, an SCB vector, and gains (ACB gain, SCB gain), respectively.
- The speech synthesis unit 133 generates a sound source signal from the decoded ACB vector, SCB vector, and gains (ACB gain, SCB gain) according to the illustrated configuration, feeds it into the LPC synthesis filter 155 composed of the decoded LPC coefficients, and the filter decodes and outputs the speech.
- A TV conference system used indoors is usually subject to background noise, such as noise generated by electric appliances like air conditioners and the speech of other people nearby.
- Several techniques are known that improve the intelligibility of the received voice, and make it easier to hear, by emphasizing the formants of its spectrum.
- the formant will be briefly described.
- FIG. 20 shows an example of a voice frequency spectrum.
- a speech frequency spectrum has a plurality of peaks (portions having local maxima), and these are called formants.
- Fig. 20 shows an example in which there are three formants (peaks) in the spectrum; in ascending order of frequency they are called the first formant, the second formant, and the third formant.
- The frequencies at which these maxima occur, that is, the frequencies fp(1), fp(2), and fp(3) of each formant, are called formant frequencies.
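The definition of formants as local maxima of the spectrum can be sketched directly; the function below is purely illustrative (not from the patent) and returns candidate formant bins in ascending frequency order, matching the first/second/third-formant numbering above.

```python
import numpy as np

def find_formant_bins(spectrum):
    """Return indices of local maxima of a magnitude spectrum,
    i.e. candidate formant positions, in ascending frequency order."""
    spectrum = np.asarray(spectrum, dtype=float)
    return [i for i in range(1, len(spectrum) - 1)
            if spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]]
```

A real formant tracker would additionally smooth the spectrum and reject spurious peaks, but local-maximum picking is the core of the definition.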
- the amplitude (power) of a voice spectrum tends to decrease as the frequency increases.
- Speech intelligibility is closely related to formants, and it is known that emphasizing the higher-order (e.g., second and third) formants improves intelligibility.
- FIG. 21 shows an example of formant emphasis of a speech spectrum.
- The waveforms shown by the solid line in FIG. 21(a) and the dotted line in FIG. 21(b) represent the speech spectrum before emphasis, while the solid line in FIG. 21(b) represents the speech spectrum after emphasis. The straight lines in the figure represent the slope of each spectrum.
- As shown in Fig. 21(b), by emphasizing the speech spectrum so as to increase the amplitude of the higher-order formants, the slope of the overall spectrum is flattened, which is known to improve clarity.
- As a technique for applying formant enhancement to coded speech, the technique described in Patent Document 1, for example, is known.
- FIG. 22 shows a basic configuration diagram of the invention described in Patent Document 1.
- Patent Document 1 relates to a method using a band division filter.
- The spectrum of the input speech is obtained by the spectrum estimating unit 160; the convex/concave band determining unit 161 determines the convex bands (peaks) and concave bands (valleys) from the obtained spectrum and calculates an amplification factor (or attenuation factor) for each. The filter construction unit 162 then gives the filter unit 163 coefficients realizing those factors, and the spectrum emphasis is achieved by passing the input speech through the filter unit 163.
- However, the method using a band division filter does not guarantee that the speech formants always fall within the divided frequency bands, so components other than the formants may be emphasized and, as a result, clarity may actually be degraded.
- In short, Patent Document 1 is a method using a band division filter, realizing voice enhancement by individually amplifying the peaks and attenuating the valleys of the voice spectrum.
- In its CELP embodiment, the speech decoding unit uses the ACB vector index, SCB vector index, and gain index to drive a synthesis filter composed of the LPC coefficients decoded from the LPC coefficient index and thereby generate a synthesized signal; the synthesized signal and the LPC coefficients are then input to the spectrum emphasis section, which realizes the spectral emphasis described above.
- The invention described in Patent Document 2 is an audio signal processing device applied to the post-filter of the speech synthesis system of a multi-band excitation (MBE) speech decoding device; by directly manipulating the amplitude value of each band, which is a frequency-domain parameter, it emphasizes the high-frequency formants of the spectrum.
- Specifically, a band containing a formant is estimated from the average amplitudes of the frequency bands divided by the pitch frequency, and only that band is emphasized.
- Patent Document 3 discloses a speech encoding device that performs encoding by the analysis-by-synthesis (A-b-S) method against a reference signal in which the noise gain is suppressed, comprising: means for enhancing the formants of the reference signal; means for separating the speech part and the noise part of the signal; and means for suppressing the level of the noise part.
- a linear prediction coefficient is extracted for each frame from the input signal, and the formant enhancement is performed based on the linear prediction coefficient.
- The invention described in Patent Document 4 relates to a sound source search (multi-pulse search) for multi-pulse speech coding: the speech is first emphasized in the form of a line spectrum, and the sound source search is then performed.
- Patent Document 1 Unexamined Japanese Patent Publication No.
- Patent Document 2
- However, the method described in Patent Document 1 has the following problems.
- When the CELP method is used, as in the seventh embodiment of Patent Document 1, the synthesized signal and the LPC coefficients are input to the spectrum emphasizing unit.
- However, the sound source signal and the vocal tract characteristics are completely different in nature, as can be seen from the speech generation model described above.
- That is, the synthesized speech is emphasized by an emphasis filter derived only from the vocal tract characteristics. As a result, the distortion of the sound source signal contained in the synthesized speech increases, and side effects such as increased noise and degraded clarity may occur.
- the invention described in Patent Document 2 is an invention for the purpose of improving the reproduced voice quality of the MBE vocoder.
- However, the voice compression used today in mobile phone systems, VoIP, video conferencing systems, and the like is predominantly based on CELP algorithms using linear prediction. Applying the method of Patent Document 2 to a CELP-based system would therefore require extracting MBE vocoder coding parameters from speech whose quality has already been degraded by compression and decompression, and the voice quality may be degraded even further.
- In Patent Document 3, a simple IIR filter using LPC coefficients is used to enhance the formants.
- However, it is known that this method can emphasize the wrong formants (see, for example, the Proceedings of the Acoustical Society of Japan, March 2000, pp. 249-250).
- Moreover, the invention of Patent Document 3 relates to a speech encoding device in the first place, not to a speech decoding device.
- The invention described in Patent Document 4 aims to increase compression efficiency through the sound source search; specifically, when the sound source information is approximated by multi-pulses and searched, the emphasis is applied directly to the input speech.
- An object of the present invention is to suppress side effects of formant emphasis, such as sound quality deterioration and an increased sense of noise, in devices (mobile phones and the like) that use an analysis-synthesis speech coding method, and to further enhance the clarity of the restored speech.
- Another object of the present invention is to provide a speech decoding device, speech decoding method, program, recording medium, and the like that make the received speech easier to hear.
Disclosure of the invention
- A speech decoding device of the present invention is a speech decoding device provided in a communication device using an analysis-synthesis speech encoding method, comprising: separation/decoding means for separating a received speech code and restoring vocal tract characteristics and a sound source signal; vocal tract characteristic correcting means for correcting the vocal tract characteristics; and signal synthesizing means for synthesizing the corrected vocal tract characteristics, as corrected by the vocal tract characteristic correcting means, with the sound source signal obtained from the speech code and outputting a speech signal.
- the correction of the vocal tract characteristics refers to, for example, performing formant emphasis processing on the vocal tract characteristics.
- In this speech decoding device, when a communication device such as a mobile phone using an analysis-synthesis speech coding scheme receives a transmitted speech code, the vocal tract characteristics and the sound source signal are restored from the speech code, the restored vocal tract characteristics are subjected to formant emphasis, and the result is synthesized with the sound source signal.
- For example, the vocal tract characteristic is a linear prediction spectrum calculated from first linear prediction coefficients decoded from the speech code, and the vocal tract characteristic correcting means applies formant emphasis to this linear prediction spectrum.
- The signal synthesizing means comprises modified linear prediction coefficient calculating means for obtaining second linear prediction coefficients corresponding to the formant-enhanced linear prediction spectrum, and a synthesis filter configured from the second linear prediction coefficients; the sound source signal is input to this synthesis filter, which generates and outputs the speech signal.
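The spectrum-to-coefficients step performed by the modified linear prediction coefficient calculating means is conventionally done by taking the inverse FFT of the (emphasized) power spectrum to obtain an autocorrelation sequence and then running the Levinson-Durbin recursion on it. The patent's exact procedure is not reproduced in this excerpt, so the following is a sketch under that conventional assumption only.

```python
import numpy as np

def lpc_from_spectrum(power_spec, order):
    """Recover LPC coefficients from a (possibly emphasized) power spectrum:
    the inverse FFT of the power spectrum gives the autocorrelation, from
    which Levinson-Durbin yields the all-pole coefficients (minimal sketch).
    power_spec holds NF/2 + 1 samples covering 0..pi inclusive."""
    # Mirror the half-spectrum to a full symmetric one before the inverse FFT.
    full = np.concatenate((power_spec, power_spec[-2:0:-1]))
    r = np.fft.ifft(full).real[: order + 1]  # autocorrelation lags 0..order
    a = np.zeros(order)
    err = r[0]
    for i in range(order):
        # Reflection coefficient for step i of the Levinson-Durbin recursion.
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
        a_new = a.copy()
        a_new[i] = k
        a_new[:i] = a[:i] - k * a[i - 1 :: -1][:i]
        a = a_new
        err *= 1 - k * k
    return a  # predictor: x[n] ~ sum_i a[i] * x[n-1-i]
```

For a spectrum generated by a known first-order all-pole model, the recursion recovers the model coefficient.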
- The vocal tract characteristic correcting means may perform formant enhancement and anti-formant attenuation on the vocal tract characteristics to generate vocal tract characteristics in which the amplitude difference between the formants and the anti-formants is emphasized, and the signal synthesizing means may then perform the synthesis with the sound source signal based on the emphasized vocal tract characteristics.
- In this way, the formants are emphasized further in relative terms, and the clarity of the voice can be increased still more.
- Moreover, by attenuating the anti-formants, the unpleasant sense of noise in the decoded speech after speech encoding can be suppressed.
- In speech coded and decoded by a method such as CELP, a type of analysis-synthesis speech coding, quantization noise is likely to occur at the anti-formants; since the above configuration attenuates the anti-formants, this quantization noise is reduced and an easy-to-hear voice with little noise can be provided.
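The formant-enhancement/anti-formant-attenuation idea can be sketched on a discrete spectrum as follows; the gain values `beta` and `gamma` are illustrative choices, not values taken from the patent.

```python
import numpy as np

def emphasize_spectrum(sp, formant_bins, antiformant_bins, beta=1.5, gamma=0.6):
    """Amplify formant bins by beta and attenuate anti-formant bins by gamma,
    widening the peak/valley amplitude difference (and suppressing the
    quantization noise that concentrates at the anti-formants)."""
    out = np.asarray(sp, dtype=float).copy()
    out[formant_bins] *= beta       # formant enhancement
    out[antiformant_bins] *= gamma  # anti-formant attenuation
    return out
```

Applied to a toy three-bin spectrum with a peak in the middle, the peak-to-valley ratio grows from 4:1 to 10:1.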
- The speech decoding device may further include pitch emphasis means for applying pitch emphasis to the sound source signal, with the signal synthesizing means combining the pitch-enhanced sound source signal and the corrected vocal tract characteristics to generate and output the speech signal.
- the input speech code is separated to restore the sound source characteristics (residual signal) and the vocal tract characteristics, and these are separately subjected to emphasis processing suitable for each characteristic.
- The above problems can also be solved by reading a program from a computer-readable storage medium, the program causing a computer to perform the same control as the functions of each component of the present invention, and executing it on the computer.
- FIG. 1 is a diagram showing a schematic configuration of a speech decoding device of the present example.
- FIG. 2 is a basic configuration diagram of the speech decoding device of the present example.
- FIG. 3 is a configuration block diagram of the speech decoding device 40 according to the first embodiment.
- FIG. 4 is a processing flowchart of the amplification factor calculation unit.
- FIG. 5 is a diagram showing how to obtain the formant amplification factor.
- FIG. 6 is a diagram illustrating an example of an interpolation curve.
- FIG. 7 is a configuration block diagram of a speech decoding device according to the second embodiment.
- FIG. 8 is a processing flowchart of the amplification factor calculation unit.
- FIG. 9 is a diagram showing how to determine the amplification factor of antiformant.
- FIG. 10 is a configuration block diagram of a speech decoding apparatus according to the third embodiment.
- FIG. 11 is a hardware configuration diagram of a mobile phone as one application of the speech decoding device.
- FIG. 12 is a hardware configuration diagram of a computer to which the speech decoding device is applied.
- FIG. 13 is a diagram showing an example of a recording medium on which a program is recorded, and download of the program.
- FIG. 14 is a diagram showing a basic configuration of a speech enhancement device proposed in the prior application.
- FIG. 15 shows a configuration example in which the speech enhancement device of the prior application is applied to a mobile phone or the like equipped with a CELP decoder.
- FIG. 16 is a diagram showing a speech generation model.
- FIG. 17 is a diagram showing a process flow of the CELP encoder / decoder.
- Figure 18 is a block diagram of the configuration of the parameter extraction unit of the CELP encoder.
- FIG. 19 is a block diagram of the configuration of the CELP decoder.
- FIG. 20 is a diagram showing an example of a voice frequency spectrum.
- FIG. 21 is a diagram showing an example of formant enhancement of a speech spectrum.
- FIG. 22 is a diagram showing the basic configuration of the invention described in Patent Document 1.
BEST MODE FOR CARRYING OUT THE INVENTION
- FIG. 1 shows a schematic configuration of the speech decoding apparatus according to this example.
- As shown, the speech decoding device 10 comprises a code separation/decoding unit 11, a vocal tract characteristic correction unit 12, and a signal synthesis unit 13.
- The code separation/decoding unit 11 restores the vocal tract characteristics sp1 and the sound source signal from the speech code.
- Here, the CELP encoder (not shown) of the transmitting mobile phone or the like separates the input speech into linear prediction coefficients (LPC coefficients) and a sound source signal (residual signal), encodes each, multiplexes them, and transmits the result as a speech code to the decoder of the receiving mobile phone or the like.
- The decoder that has received the speech code first decodes the vocal tract characteristics sp1 and the sound source signal from the speech code in the code separation/decoding unit 11. The vocal tract characteristic correction unit 12 then corrects the vocal tract characteristics sp1 and outputs corrected vocal tract characteristics sp2; for example, it applies formant enhancement directly to sp1 and generates and outputs the emphasized vocal tract characteristics sp2.
- The signal synthesis unit 13 synthesizes the corrected vocal tract characteristics sp2 with the sound source signal to generate and output the output speech s; for example, it generates and outputs formant-enhanced output speech s.
- In the conventional method, by contrast, the restored sound source signal (the output of the adder) is passed through a synthesis filter composed of the decoded LPC coefficients to generate a synthesized signal (synthesized speech), and that synthesized speech is emphasized by the emphasis filter obtained from the vocal tract characteristics. As a result, the distortion of the sound source signal contained in the synthesized speech becomes large, which may cause problems such as increased noise and degraded clarity.
- The speech decoding device 10 of this example is almost the same up to the point where the sound source signal and the LPC coefficients are restored; however, without generating a synthesized signal (synthesized speech), it applies formant enhancement directly to the vocal tract characteristics sp1 and then combines the enhanced vocal tract characteristics sp2 with the sound source signal (residual signal). The above problem is therefore avoided, and the speech can be decoded without side effects such as sound quality deterioration or increased noise due to the enhancement.
- FIG. 2 shows a basic configuration diagram of the speech decoding apparatus of the present example.
- The illustrated speech decoding device 20 includes a code separation unit 21, an ACB vector decoding unit 22, an SCB vector decoding unit 23, a gain decoding unit 24, a sound source signal generation unit 25, an LPC coefficient decoding unit 26, an LPC spectrum calculation unit 27, a spectrum emphasis unit 28, a modified LPC coefficient calculation unit 29, and a synthesis filter 30.
- Among these, the code separation unit 21, LPC coefficient decoding unit 26, ACB vector decoding unit 22, SCB vector decoding unit 23, and gain decoding unit 24 correspond to a detailed configuration of the code separation/decoding unit 11.
- the spectrum emphasis unit 28 is an example of the vocal tract characteristic correction unit 12.
- The modified LPC coefficient calculation unit 29 and the synthesis filter 30 correspond to an example of the detailed configuration of the signal synthesis unit 13.
- The code separation unit 21 separates the multiplexed speech code transmitted from the transmitting side into an LPC code, an ACB code, an SCB code, and a gain code, and outputs them.
- The ACB vector decoding unit 22, SCB vector decoding unit 23, and gain decoding unit 24 decode the ACB vector, the SCB vector, and the ACB and SCB gains from the ACB code, SCB code, and gain code output by the code separation unit 21, respectively.
- Based on the ACB vector, SCB vector, ACB gain, and SCB gain, the sound source signal generation unit 25 generates a sound source signal (residual signal) r(n), (0 ≤ n < N).
- N is the frame length of the encoding method.
- The LPC coefficient decoding unit 26 decodes the LPC coefficients α1(i), (1 ≤ i ≤ NP1), from the LPC code output by the code separation unit 21 and outputs them to the LPC spectrum calculation unit 27, where NP1 is the order of the LPC coefficients.
- The LPC spectrum calculation unit 27 calculates, from the input LPC coefficients α1(i), the LPC spectrum sp1(l), (0 ≤ l < NF), a parameter representing the vocal tract characteristics, where NF is the number of spectral points and N ≤ NF.
- The LPC spectrum calculation unit 27 outputs the obtained LPC spectrum sp1(l) to the spectrum emphasis unit 28.
- The spectrum emphasis unit 28 obtains the enhanced LPC spectrum sp2(l) from the LPC spectrum sp1(l) and outputs sp2(l) to the modified LPC coefficient calculation unit 29.
- the modified LPC coefficient calculation unit 29 calculates modified LPC coefficients α2(i), (1 ≤ i ≤ NP2) based on the emphasized LPC spectrum sp2(l).
- NP 2 is the order of the modified LPC coefficient.
- the modified LPC coefficient calculation unit 29 outputs the obtained modified LPC coefficients α2(i) to the synthesis filter 30.
- the sound source signal r(n) is input to the synthesis filter 30 constructed from the obtained modified LPC coefficients α2(i), and the output speech s(n), (0 ≤ n < N) is obtained.
- in this way, the vocal tract characteristics calculated from the speech code (the LPC spectrum obtained from the LPC coefficients) are directly subjected to formant enhancement, and the enhanced vocal tract characteristics are then synthesized with the sound source signal.
- FIG. 3 is a configuration block diagram of the speech decoding device 40 according to the first embodiment.
- the code separation unit 21 separates the speech code transmitted from the transmission side into an LPC code, an ACB code, an SCB code, and a gain code.
- the ACB vector decoding unit 22 decodes the ACB vector p(n), (0 ≤ n < N) from the ACB code.
- N is the frame length of the coding scheme.
- the SCB vector decoding unit 23 decodes the SCB vector c(n), (0 ≤ n < N) from the SCB code.
- the gain decoding unit 24 decodes the ACB gain gp and the SCB gain gc from the gain code.
- the sound source signal generation unit 25 obtains the sound source signal r(n), (0 ≤ n < N) from the decoded ACB vector p(n), SCB vector c(n), ACB gain gp, and SCB gain gc according to the following equation (1): r(n) = gp · p(n) + gc · c(n).
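The excitation reconstruction of equation (1) can be sketched in a few lines. The vectors and gains below are hypothetical stand-ins for decoded codebook values, not actual codec output.

```python
# Sketch of equation (1): r(n) = g_p * p(n) + g_c * c(n), 0 <= n < N.

def make_excitation(acb, scb, g_p, g_c):
    """Combine the adaptive-codebook (ACB) and stochastic-codebook (SCB)
    contributions into the sound source (residual) signal r(n)."""
    assert len(acb) == len(scb)
    return [g_p * p + g_c * c for p, c in zip(acb, scb)]

acb = [0.5, -0.2, 0.1, 0.0]   # decoded ACB vector p(n) (illustrative)
scb = [0.1, 0.3, -0.4, 0.2]   # decoded SCB vector c(n) (illustrative)
r = make_excitation(acb, scb, g_p=0.8, g_c=0.5)
```

The same per-sample weighted sum is applied over the whole frame of length N.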
- the LPC spectrum calculation unit 27 obtains the LPC spectrum sp1(l) as the vocal tract characteristics by Fourier-transforming the LPC coefficients α1(i) according to the following equation (2).
- NF is the number of spectrum data points.
- NP1 is the order of the LPC filter.
- when the sampling frequency is Fs, the frequency resolution of the LPC spectrum sp1(l) is Fs / NF.
- the variable l is an index representing the discrete frequency. Converting l to a frequency in Hz gives int[l · Fs / NF] (Hz). Note that int[x] means that the variable x is converted to an integer.
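The Fourier transform of equation (2) can be sketched as below. The exact normalization of the patent's formula is not reproduced in the text, so this sketch assumes the standard LPC power-spectrum form 1 / |A(e^{jω})|².

```python
import cmath

def lpc_spectrum(a, n_f):
    """Power spectrum of the LPC synthesis filter 1 / A(z), evaluated at
    n_f discrete frequencies l = 0 .. n_f-1 (a sketch of equation (2))."""
    sp = []
    for l in range(n_f):
        # A(e^{j 2*pi*l/n_f}) = 1 + sum_i a(i) * e^{-j 2*pi*i*l/n_f}
        A = 1.0 + sum(a_i * cmath.exp(-2j * cmath.pi * (i + 1) * l / n_f)
                      for i, a_i in enumerate(a))
        sp.append(1.0 / abs(A) ** 2)
    return sp

a1 = [-0.9]                      # toy first-order LPC coefficients
sp1 = lpc_spectrum(a1, n_f=256)
# With a(1) < 0 the filter boosts low frequencies, so sp1[0] is the maximum.
```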
- the LPC spectrum sp1(l) obtained by the LPC spectrum calculation unit 27 is input to the formant estimation unit 41, the amplification factor calculation unit 42, and the spectrum emphasis unit 43.
- when the LPC spectrum sp1(l) is input, the formant estimation unit 41 estimates the formant frequencies fp(k), (1 ≤ k ≤ kpmax) and their amplitudes ampp(k), (1 ≤ k ≤ kpmax).
- kpmax indicates the number of formants to be estimated.
- the method of estimating the formant frequencies is arbitrary. For example, a known technique such as the peak-picking method, which estimates formants from the peaks of the frequency spectrum, can be used.
- a threshold may be set for the formant bandwidth, and only frequencies whose bandwidth is equal to or smaller than the threshold may be taken as formant frequencies.
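A minimal sketch of the peak-picking idea mentioned above, assuming the spectrum is given as a list of power values; as the text notes, a bandwidth check could be layered on top to reject spurious peaks.

```python
def pick_formants(sp, kmax):
    """Peak-picking formant estimation: local maxima of the spectrum are
    formant candidates, and the kmax largest are kept, returned in
    ascending frequency order as (index, amplitude) pairs."""
    peaks = [(l, sp[l]) for l in range(1, len(sp) - 1)
             if sp[l - 1] < sp[l] >= sp[l + 1]]
    peaks.sort(key=lambda p: p[1], reverse=True)   # strongest first
    return sorted(peaks[:kmax])                    # ascending frequency

sp = [1, 2, 5, 2, 1, 3, 8, 3, 1, 4, 2]            # toy spectrum
formants = pick_formants(sp, kmax=2)
```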
- the amplification factor calculation unit 42 receives the LPC spectrum sp1(l) and the formant frequencies and amplitudes {fp(k), ampp(k)} estimated by the formant estimation unit 41, and calculates the amplification factor β(l) for the LPC spectrum sp1(l).
- FIG. 4 is a processing flowchart of the amplification factor calculation unit 42.
- the amplification factor calculation unit 42 performs processing in the order of calculation of the amplification reference power (step S11), calculation of the formant amplification factors (step S12), and interpolation of the amplification factor (step S13).
- first, the process of calculating the amplification reference power Pow_ref from the LPC spectrum sp1(l) in step S11 will be described.
- the method of calculating the amplification reference power Pow_ref is arbitrary. For example, the average power over the entire frequency band may be used, or the largest amplitude among the formant amplitudes ampp(k), (1 ≤ k ≤ kpmax) may be used as the reference power.
- the reference power may be obtained as a function using the frequency and the order of the formant as variables.
- when the average power over the entire band is used, the amplification reference power is expressed by equation (3): Pow_ref = (1/NF) Σ sp1(l), where the sum runs over 0 ≤ l < NF.
- in step S12, the amplification factor Gp(k) is determined so that each formant amplitude ampp(k), (1 ≤ k ≤ kpmax) is adjusted to match the amplification reference power Pow_ref obtained in step S11.
- Fig. 5 shows how the formant amplitude ampp(k) is adjusted to the amplification reference power Pow_ref.
- equation (4) gives the amplification factor: Gp(k) = Pow_ref / ampp(k).
- in step S13, the amplification factor β(l) for the frequency band between adjacent formants (between fp(k) and fp(k+1)) is determined by an interpolation curve R(k, l).
- although the shape of the interpolation curve is arbitrary, the following shows an example in which the interpolation curve R(k, l) is a quadratic curve.
- R (k, 1) can be expressed as the following equation (5).
- Gp(k) = a · fp(k)² + b · fp(k) + c    (6)
- the interpolation curve R(k, l) can be obtained by solving equations (6), (7), and (8) as simultaneous equations for a, b, and c. Based on this R(k, l), the amplification factor β(l) is interpolated by calculating the amplification factor for the spectrum in the interval [fp(k), fp(k+1)].
- steps S11 to S13 described above are performed for all formants, and the amplification factors of all frequency bands are determined.
- the amplification factor for frequencies lower than the lowest-order formant fp(1) is set to the amplification factor Gp(1) at fp(1), and the amplification factor for frequencies higher than the highest-order formant fp(kpmax) is set to the amplification factor Gp(kpmax) at fp(kpmax).
- in summary, the amplification factor β(l) is expressed by the following equation (9): β(l) = Gp(1) for l ≤ fp(1); β(l) = R(k, l) for fp(k) < l < fp(k+1); β(l) = Gp(kpmax) for l ≥ fp(kpmax).
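Steps S11 to S13 can be sketched as below. This sketch uses the average-power variant of the reference power, takes Gp(k) = Pow_ref / ampp(k), and uses linear rather than quadratic interpolation between formants (the text explicitly allows any interpolation curve); all concrete values are illustrative.

```python
def formant_gains(sp, formants):
    """Per-frequency amplification factor beta(l): each formant amplitude
    is raised (or lowered) toward a common reference power, the factor is
    interpolated between formants, and held flat outside the outermost
    formants.  'formants' is a list of (index, amplitude) pairs."""
    pow_ref = sum(sp) / len(sp)                          # step S11
    gains = [(l, pow_ref / amp) for l, amp in formants]  # step S12, Gp(k)
    beta = [0.0] * len(sp)
    for l in range(len(sp)):                             # step S13
        if l <= gains[0][0]:                 # below lowest formant
            beta[l] = gains[0][1]
        elif l >= gains[-1][0]:              # above highest formant
            beta[l] = gains[-1][1]
        else:                                # interpolate between formants
            for (l0, g0), (l1, g1) in zip(gains, gains[1:]):
                if l0 <= l <= l1:
                    t = (l - l0) / (l1 - l0)
                    beta[l] = g0 + t * (g1 - g0)
                    break
    return beta

sp = [1, 2, 4, 2, 1, 2, 8, 2, 1]          # toy spectrum
beta = formant_gains(sp, [(2, 4), (6, 8)])
# beta[2] * 4 and beta[6] * 8 both equal the common reference power.
```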
- the amplification factor β(l) obtained by the amplification factor calculation unit 42 through the above processing and the LPC spectrum sp1(l) are input to the spectrum emphasis unit 43.
- the spectrum emphasis unit 43 obtains the emphasized spectrum sp2(l) according to the following equation (10):
- sp2(l) = β(l) · sp1(l), (0 ≤ l < NF)    (10)
- the emphasized spectrum sp2(l) obtained by the spectrum emphasis unit 43 is input to the modified LPC coefficient calculation unit 29.
- the modified LPC coefficient calculation unit 29 obtains an autocorrelation function ac2(i) from the inverse Fourier transform of the emphasized spectrum sp2(l). Next, the modified LPC coefficients α2(i), (1 ≤ i ≤ NP2) are obtained from the autocorrelation function ac2(i) by a known method such as the Levinson-Durbin algorithm.
- NP 2 is the order of the modified LPC coefficient.
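The inverse-transform-plus-Levinson-Durbin route of the modified LPC coefficient calculation unit can be sketched as follows. A pure-Python DFT is used for self-containment (a real implementation would use an FFT), and the round-trip at the end is only a toy check, not codec data.

```python
import cmath

def spectrum_to_lpc(sp, order):
    """Recover LPC coefficients from a power spectrum: the inverse DFT of
    the spectrum gives the autocorrelation, then the Levinson-Durbin
    recursion solves the resulting normal equations."""
    n = len(sp)
    # ac(i) = (1/n) * sum_l sp(l) * cos(2*pi*i*l/n)   (sp is real, even)
    ac = [sum(sp[l] * cmath.exp(2j * cmath.pi * i * l / n).real
              for l in range(n)) / n for i in range(order + 1)]
    a = [0.0] * (order + 1)          # a[0] unused, a[1..order] coefficients
    err = ac[0]
    for i in range(1, order + 1):    # Levinson-Durbin recursion
        k = -(ac[i] + sum(a[j] * ac[i - j] for j in range(1, i))) / err
        a_new = a[:]
        a_new[i] = k
        for j in range(1, i):
            a_new[j] = a[j] + k * a[i - j]
        a, err = a_new, err * (1 - k * k)
    return a[1:]

# Round trip: the spectrum of a known AR(1) filter should give back a(1).
n_f = 128
sp = [1.0 / abs(1 - 0.9 * cmath.exp(-2j * cmath.pi * l / n_f)) ** 2
      for l in range(n_f)]
a = spectrum_to_lpc(sp, order=1)     # roughly [-0.9]
```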
- the sound source signal r(n) is input to the synthesis filter 30 constructed from the modified LPC coefficients α2(i) obtained by the modified LPC coefficient calculation unit 29.
- the synthesis filter 30 obtains the output speech s(n) by the following equation (11).
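The all-pole synthesis of equation (11) is presumably the standard direct-form recursion; the sign convention below, matching A(z) = 1 + Σ α(i) z^{-i}, is an assumption since the equation body is not reproduced in the text.

```python
def synthesis_filter(r, a):
    """All-pole synthesis 1 / A(z): s(n) = r(n) - sum_i a(i) * s(n - i),
    a sketch of the recursion behind equation (11)."""
    s = []
    for n, x in enumerate(r):
        acc = x - sum(a_i * s[n - 1 - i]
                      for i, a_i in enumerate(a) if n - 1 - i >= 0)
        s.append(acc)
    return s

# Impulse response of 1 / (1 - 0.9 z^-1): a geometric decay 1, 0.9, 0.81, ...
s = synthesis_filter([1, 0, 0, 0], a=[-0.9])
```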
- the emphasized vocal tract characteristics and the sound source characteristics are synthesized.
- the vocal tract characteristics decoded from the speech code are emphasized and then combined with the sound source signal.
- for frequency components other than the formants, the amplification factor is calculated based on the formant amplification factors before the emphasis processing is performed, so the vocal tract characteristics can be enhanced smoothly.
- although in the present embodiment the amplification factor for the spectrum sp1(l) is determined for each individual spectrum point l, the spectrum may instead be divided into a plurality of frequency bands, with an individual amplification factor for each band.
- FIG. 7 is a configuration block diagram of a speech decoding device 50 according to the second embodiment.
- the second embodiment is characterized in that, in addition to the enhancement of the formants, the anti-formants, at which the amplitude takes a local minimum, are attenuated to enhance the amplitude difference between the formants and the anti-formants.
- in the following, an anti-formant is described as being present only between two adjacent formants, but the invention is not limited to this case; it can also be applied when an anti-formant exists at a frequency lower than the lowest-order formant or higher than the highest-order formant.
- the illustrated speech decoding device 50 includes a formant / anti-formant estimation unit 51 and an amplification factor calculation unit 52 in place of the formant estimation unit 41 and the amplification factor calculation unit 42 in the speech decoding device 40 of FIG. 3.
- the configuration other than the above is substantially the same as that of the speech decoding device 40.
- the formant / anti-formant estimation unit 51 receives the LPC spectrum sp1(l) and estimates the formant frequencies fp(k), (1 ≤ k ≤ kpmax) and their amplitudes ampp(k), (1 ≤ k ≤ kpmax), and in addition, the anti-formant frequencies fv(k), (1 ≤ k ≤ kvmax) and their amplitudes ampv(k), (1 ≤ k ≤ kvmax).
- the method of estimating the anti-formants is arbitrary. For example, the peak-picking method may be applied to the reciprocal of the spectrum sp1(l).
- the obtained anti-formants are denoted fv(1), fv(2), ..., fv(kvmax) in ascending order.
- kvmax is the number of antiformants.
- the amplitude value at fv (k) is assumed to be ampv (k).
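The reciprocal-spectrum trick mentioned above can be sketched as below: spectral minima become peaks of 1/sp, so the same peak picking finds the anti-formants. The toy spectrum is illustrative.

```python
def pick_antiformants(sp, kmax):
    """Anti-formant estimation: apply peak picking to the reciprocal of
    the spectrum, so that spectral valleys become peaks.  Returns the
    kmax deepest valleys as (index, amplitude) pairs, ascending in
    frequency."""
    inv = [1.0 / s for s in sp]
    valleys = [(l, sp[l]) for l in range(1, len(sp) - 1)
               if inv[l - 1] < inv[l] >= inv[l + 1]]
    valleys.sort(key=lambda p: p[1])       # deepest (smallest amp) first
    return sorted(valleys[:kmax])          # ascending frequency

sp = [4, 2, 1, 3, 6, 3, 2, 5, 4]          # valleys at l = 2 and l = 6
antiformants = pick_antiformants(sp, kmax=2)
```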
- the estimation result of the formant / antiformant obtained by the formant / antiformant estimation unit 51 is input to the amplification factor calculation unit 52.
- FIG. 8 is a processing flowchart of the amplification factor calculation unit 52.
- the amplification factor calculation unit 52 performs processing in the order of calculation of the formant amplification reference power (step S21), determination of the formant amplification factors (step S22), calculation of the anti-formant amplification reference power (step S23), determination of the anti-formant amplification factors (step S24), and interpolation of the amplification factor (step S25).
- the processing in steps S21 and S22 is the same as that in steps S11 and S12 in the first embodiment, and a description thereof will be omitted.
- first, the calculation of the anti-formant amplification reference power in step S23 will be described.
- the anti-formant amplification reference power Pow_refv is obtained from the LPC spectrum sp1(l).
- the calculation method is arbitrary. For example, the formant amplification reference power Pow_ref multiplied by a constant less than 1 may be used, or the amplitude taking the minimum value among the anti-formant amplitudes ampv(k), (1 ≤ k ≤ kvmax) may be used as the reference power.
- the following equation (12) shows the calculation when the formant amplification reference power Pow_ref multiplied by a constant is used as the anti-formant reference power:
- Pow_refv = λ · Pow_ref    (12), where λ is an arbitrary constant satisfying 0 < λ < 1.
- next, the process of determining the anti-formant amplification factors in step S24 will be described.
- Figure 9 shows how to determine the amplification factor Gv (k) of the antiformant.
- as shown in Fig. 9, the anti-formant amplification factor Gv(k) is determined so that each anti-formant amplitude ampv(k), (1 ≤ k ≤ kvmax) is adjusted to the anti-formant amplification reference power Pow_refv obtained in step S23.
- the following equation (13) gives the anti-formant amplification factor: Gv(k) = Pow_refv / ampv(k).
- in step S25, the amplification factor at frequencies between an adjacent formant and anti-formant is obtained from interpolation curves. Let R1(k, l) be the interpolation curve between fp(k) and fv(k), and R2(k, l) the interpolation curve between fv(k) and fp(k+1).
- the interpolation curve may be obtained by any method.
- the following shows an example of calculating the interpolation curve R1(k, l) using a quadratic curve.
- the quadratic curve is defined as one that passes through {fp(k), Gp(k)} and takes its minimum value at {fv(k), Gv(k)}.
- this quadratic curve can be expressed as in equation (14).
- R1(k, l) = a · (l − fv(k))² + Gv(k)    (14)
- substituting the point {fp(k), Gp(k)} into equation (14) and solving for a gives equation (15), from which the quadratic curve R1(k, l) can be obtained: a = (Gp(k) − Gv(k)) / (fp(k) − fv(k))²    (15)
- the interpolation curve R2(k, l) between fv(k) and fp(k+1) can be obtained similarly.
- the overall amplification factor β(l) is expressed by the above equation (9).
- the amplification factor calculation unit 52 outputs this amplification factor β(l) to the spectrum emphasis unit 43, which uses it to obtain the emphasized spectrum sp2(l) according to the above equation (10).
- the anti-formant is attenuated in addition to the amplification of the formant.
- the formants are further emphasized relatively, and the clarity can be further increased as compared with the first embodiment.
- by attenuating the anti-formants, it is possible to suppress the unpleasant sense of noise in the decoded speech after speech encoding processing. It is known that so-called quantization noise tends to occur at the anti-formants of speech encoded and decoded by speech encoding methods such as CELP used in mobile phones and other devices. According to the present invention, since the anti-formants are attenuated, the quantization noise is reduced, making it possible to provide easy-to-hear speech with a low noise level.
- FIG. 10 is a block diagram of the configuration of the voice decoding device 60 according to the third embodiment.
- the third embodiment is characterized in that, in addition to the configuration of the first embodiment, pitch enhancement is applied to the sound source signal. That is, it further has a pitch enhancement filter construction unit 62 and a pitch emphasis unit 63.
- the ACB vector decoding unit 61 not only decodes the ACB vector p(n), (0 ≤ n < N) from the ACB code, but also obtains the integer part T of the pitch lag from the ACB code and outputs it to the pitch enhancement filter construction unit 62.
- the method of pitch enhancement is arbitrary; for example, the following method can be used.
- the pitch enhancement filter construction unit 62 uses the integer part T of the pitch lag output from the ACB vector decoding unit 61 to obtain the autocorrelation functions rscor(T−1), rscor(T), and rscor(T+1) of the sound source signal r(n) near T by the following equation (16).
- the pitch emphasis unit 63 filters the sound source signal r(n) with a pitch enhancement filter (whose transfer function is the following equation (17), where gp is a weighting coefficient) constructed from the pitch prediction coefficients pc(i), and outputs the pitch-enhanced residual signal (sound source signal) r'(n).
- the synthesis filter 30 substitutes the sound source signal r'(n) obtained as described above into equation (11) in place of r(n) to obtain the output speech s(n).
- in this example, a three-tap IIR filter is used as the pitch enhancement filter.
- the tap length may be changed, or an arbitrary filter such as an FIR filter may be used.
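One plausible reading of the three-tap pitch enhancement filter is sketched below. Since equations (16) and (17) are not reproduced in the text, both the recursion and the way the coefficients pc(i) enter are assumptions; pc(i) is supplied directly rather than derived from the autocorrelation near T.

```python
def pitch_enhance(r, T, pc, g):
    """Sketch of a 3-tap pitch-enhancement (long-term) IIR filter: each
    sample is reinforced by weighted samples roughly one pitch period T
    earlier, r'(n) = r(n) + g * sum_i pc[i] * r'(n - (T - 1 + i))."""
    out = []
    for n, x in enumerate(r):
        acc = x
        for i, c in enumerate(pc):      # taps at lags T-1, T, T+1
            lag = T - 1 + i
            if n - lag >= 0:
                acc += g * c * out[n - lag]
        out.append(acc)
    return out

r = [1, 0, 0, 0, 0]                     # an impulse as a toy excitation
r2 = pitch_enhance(r, T=2, pc=[0.0, 1.0, 0.0], g=0.5)
# Energy recurs every T = 2 samples with weight g = 0.5 per period.
```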
- FIG. 11 is a hardware configuration diagram of a mobile phone / PHS, which is one application destination of the speech decoding device of the present example. Note that a mobile phone can be treated as a type of computer because it can execute arbitrary processing by executing programs and the like.
- the illustrated mobile phone / PHS 70 has an antenna 71, a radio unit 72, an AD/DA conversion unit 73, a DSP (Digital Signal Processor) 74, a CPU 75, a memory 76, a display unit 77, a speaker 78, and a microphone 79.
- the DSP 74 executes a predetermined program stored in the memory 76 on the speech code received via the antenna 71, the radio unit 72, and the AD/DA conversion unit 73, thereby performing the speech decoding processing described with reference to FIGS. 1 to 10 and producing the output speech.
- the application destination of the speech decoding apparatus of the present invention is not limited to a mobile phone; it may be, for example, a Voice over IP (VoIP) terminal, a video conference system, or the like.
- any computer that has a function of performing wireless / wired communication using a speech coding method for compressing speech, and that can execute the speech decoding processing described with reference to FIGS. 1 to 10, may be used.
- FIG. 12 is a diagram showing an example of a schematic hardware configuration of such a computer.
- the computer 80 shown in the figure has a CPU 81, a memory 82, an input device 83, an output device 84, an external storage device 85, a medium drive device 86, a network connection device 87, and the like, which are connected to a bus 88. The configuration shown in the figure is an example, and the computer is not limited to this.
- the memory 82 is a memory such as a RAM for temporarily storing a program or data stored in the external storage device 85 (or on the portable recording medium 89) when executing the program, updating data, or the like.
- the CPU 81 executes the program read into the memory 82 to realize the various processes and functions described above (the processes shown in FIGS. 4 and 8, and the processes described with reference to FIGS. 1 to 10).
- the input device 83 is, for example, a keyboard, a mouse, a touch panel, or the like.
- the output device 84 is, for example, a display, a speaker, or the like.
- the external storage device 85 is, for example, a magnetic disk device, an optical disk device, a magneto-optical disk device, or the like, and stores the programs / data for realizing the various functions of the speech decoding device described above.
- the medium driving device 86 reads out programs / data and the like stored in the portable recording medium 89.
- the portable recording medium 89 is, for example, an FD (flexible disk), a CD-ROM, a DVD, a magneto-optical disk, or the like.
- the network connection device 87 is configured to be connected to a network to enable transmission / reception of programs / data to / from an external information processing device.
- FIG. 13 is a diagram showing a recording medium on which the above-mentioned program is recorded, and an example of downloading the program.
- the program / data for realizing the functions of the present invention is read out from the portable recording medium 89 storing the program / data into the computer 80, stored in the memory 82, and executed.
- the program / data may also be downloaded from the storage unit 2 of an external server 1 via a network 3 (such as the Internet) connected by the network connection device 87.
- the present invention is not limited to an apparatus or a method, and may also be configured as a recording medium (the portable recording medium 89 or the like) storing the program / data itself, or as the program itself.
- the applicant of the present application has already filed a prior application (international application number JP02/11332).
- FIG. 14 shows the basic configuration of the speech enhancement device 90 proposed in the prior application.
- the illustrated speech enhancement device 90 first analyzes the input speech x in the signal analysis / separation unit 91, separating it into a sound source signal r and vocal tract characteristics sp1.
- the vocal tract characteristic modification unit 92 modifies the vocal tract characteristics sp1 (for example, emphasizing the formants) and outputs the modified (emphasized) vocal tract characteristics sp2.
- the signal synthesis unit 93 re-synthesizes the sound source signal r with the corrected (emphasized) vocal tract characteristics sp2, so that formant-enhanced speech is output.
- since the input speech x is separated into the sound source signal r and the vocal tract characteristics sp1, and only the vocal tract characteristics are emphasized, the sound source signal is not distorted. Therefore, formant emphasis that improves clarity can be performed without increasing the sense of noise.
- when the speech enhancement device described in the prior application is applied to a mobile phone equipped with a CELP decoder, the result is as shown in FIG. 15. Since the speech enhancement device 90 of the prior application takes the speech x as input, as shown in FIG. 15, a decoding processing device 100 is placed in front of the speech enhancement device 90; the speech code is decoded by the decoding processing device 100, and the decoded speech s is input to the speech enhancement device 90.
- the decoding processing device 100 generates, for example, a sound source signal r1 and vocal tract characteristics sp1 from the speech code by a code separation / decoding unit 101, and synthesizes them in a signal synthesis unit 102 to generate and output the decoded speech s. At this time, because the decoded speech s is decoded from the speech code, its amount of speech information is reduced compared with the speech before encoding, and its quality is poorer.
- in the speech enhancement device 90, which receives the degraded decoded speech s as input, the degraded speech is re-analyzed and separated into the sound source signal and the vocal tract characteristics.
- as a result, the accuracy of the separation is degraded: sound source signal components may remain in the vocal tract characteristics sp1' separated from the decoded speech s, or vocal tract characteristic components may remain in the sound source signal r'. Therefore, when the vocal tract characteristics are emphasized, sound source signal components remaining in the vocal tract characteristics may be emphasized, or vocal tract characteristic components remaining in the sound source signal may fail to be emphasized. For this reason, the sound quality of the output speech s', re-synthesized from the sound source signal and the vocal tract characteristics after formant emphasis, could be degraded.
- in contrast, since the speech decoding apparatus of the present invention uses the vocal tract characteristics decoded from the speech code, no quality degradation due to re-analysis of degraded speech occurs. Furthermore, since re-analysis is not required, the amount of processing can be reduced.
Industrial Applicability
- according to the present invention, in a communication device such as a mobile phone that uses an analysis-synthesis speech encoding method, when speech is generated and output based on the speech code, the vocal tract characteristics and the sound source signal are restored from the speech code, formant enhancement processing is applied to the restored vocal tract characteristics, and the result is synthesized with the sound source signal.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2003/005582 WO2004097798A1 (ja) | 2003-05-01 | 2003-05-01 | 音声復号化装置、音声復号化方法、プログラム、記録媒体 |
JP2004571323A JP4786183B2 (ja) | 2003-05-01 | 2003-05-01 | 音声復号化装置、音声復号化方法、プログラム、記録媒体 |
DE60330715T DE60330715D1 (de) | 2003-05-01 | 2003-05-01 | Sprachdecodierer, sprachdecodierungsverfahren, programm,aufzeichnungsmedium |
EP03721013A EP1619666B1 (en) | 2003-05-01 | 2003-05-01 | Speech decoder, speech decoding method, program, recording medium |
US11/115,478 US7606702B2 (en) | 2003-05-01 | 2005-04-27 | Speech decoder, speech decoding method, program and storage media to improve voice clarity by emphasizing voice tract characteristics using estimated formants |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2003/005582 WO2004097798A1 (ja) | 2003-05-01 | 2003-05-01 | 音声復号化装置、音声復号化方法、プログラム、記録媒体 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/115,478 Continuation US7606702B2 (en) | 2003-05-01 | 2005-04-27 | Speech decoder, speech decoding method, program and storage media to improve voice clarity by emphasizing voice tract characteristics using estimated formants |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2004097798A1 true WO2004097798A1 (ja) | 2004-11-11 |
Family
ID=33398154
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2003/005582 WO2004097798A1 (ja) | 2003-05-01 | 2003-05-01 | 音声復号化装置、音声復号化方法、プログラム、記録媒体 |
Country Status (5)
Country | Link |
---|---|
US (1) | US7606702B2 (ja) |
EP (1) | EP1619666B1 (ja) |
JP (1) | JP4786183B2 (ja) |
DE (1) | DE60330715D1 (ja) |
WO (1) | WO2004097798A1 (ja) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010191302A (ja) * | 2009-02-20 | 2010-09-02 | Sharp Corp | 音声出力装置 |
JP2021064009A (ja) * | 2014-07-28 | 2021-04-22 | フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン | 高調波ポストフィルタを使用してオーディオ信号を処理するための装置および方法 |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008108082A1 (ja) * | 2007-03-02 | 2008-09-12 | Panasonic Corporation | 音声復号装置および音声復号方法 |
US9031834B2 (en) | 2009-09-04 | 2015-05-12 | Nuance Communications, Inc. | Speech enhancement techniques on the power spectrum |
US9536534B2 (en) * | 2011-04-20 | 2017-01-03 | Panasonic Intellectual Property Corporation Of America | Speech/audio encoding apparatus, speech/audio decoding apparatus, and methods thereof |
EP2951814B1 (en) * | 2013-01-29 | 2017-05-10 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Low-frequency emphasis for lpc-based coding in frequency domain |
HRP20231248T1 (hr) | 2013-03-04 | 2024-02-02 | Voiceage Evs Llc | Uređaj i postupak za smanјenјe šuma kvantizacije u dekoderu vremenskog domena |
CN107851433B (zh) * | 2015-12-10 | 2021-06-29 | 华侃如 | 基于谐波模型和声源-声道特征分解的语音分析合成方法 |
JP2018159759A (ja) | 2017-03-22 | 2018-10-11 | 株式会社東芝 | 音声処理装置、音声処理方法およびプログラム |
JP6646001B2 (ja) * | 2017-03-22 | 2020-02-14 | 株式会社東芝 | 音声処理装置、音声処理方法およびプログラム |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08248996A (ja) * | 1995-03-10 | 1996-09-27 | Nippon Telegr & Teleph Corp <Ntt> | ディジタルフィルタのフィルタ係数決定方法 |
JPH0981192A (ja) * | 1995-09-14 | 1997-03-28 | Toshiba Corp | ピッチ強調方法および装置 |
JP2000099094A (ja) * | 1998-09-25 | 2000-04-07 | Matsushita Electric Ind Co Ltd | 時系列信号処理装置 |
JP2001117573A (ja) * | 1999-10-20 | 2001-04-27 | Toshiba Corp | 音声スペクトル強調方法/装置及び音声復号化装置 |
JP2001242899A (ja) * | 2000-02-29 | 2001-09-07 | Toshiba Corp | 音声符号化方法及び装置並びに及び音声復号方法及び装置 |
Family Cites Families (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0738118B2 (ja) * | 1987-02-04 | 1995-04-26 | 日本電気株式会社 | マルチパルス符号化装置 |
JPH05323997A (ja) * | 1991-04-25 | 1993-12-07 | Matsushita Electric Ind Co Ltd | 音声符号化器、音声復号化器、音声符号化装置 |
WO1993018505A1 (en) * | 1992-03-02 | 1993-09-16 | The Walt Disney Company | Voice transformation system |
JPH0738118A (ja) | 1992-12-22 | 1995-02-07 | Korea Electron Telecommun | 薄膜トランジスタの製造方法 |
JPH06202695A (ja) | 1993-01-07 | 1994-07-22 | Sony Corp | 音声信号処理装置 |
JP3510643B2 (ja) * | 1993-01-07 | 2004-03-29 | 株式会社東芝 | 音声信号のピッチ周期処理方法 |
JP3360423B2 (ja) * | 1994-06-21 | 2002-12-24 | 三菱電機株式会社 | 音声強調装置 |
JPH08272394A (ja) | 1995-03-30 | 1996-10-18 | Olympus Optical Co Ltd | 音声符号化装置 |
JP2993396B2 (ja) * | 1995-05-12 | 1999-12-20 | 三菱電機株式会社 | 音声加工フィルタ及び音声合成装置 |
DE69628103T2 (de) * | 1995-09-14 | 2004-04-01 | Kabushiki Kaisha Toshiba, Kawasaki | Verfahren und Filter zur Hervorbebung von Formanten |
JP3319556B2 (ja) * | 1995-09-14 | 2002-09-03 | 株式会社東芝 | ホルマント強調方法 |
EP0788091A3 (en) * | 1996-01-31 | 1999-02-24 | Kabushiki Kaisha Toshiba | Speech encoding and decoding method and apparatus therefor |
JP3357795B2 (ja) * | 1996-08-16 | 2002-12-16 | 株式会社東芝 | 音声符号化方法および装置 |
JPH10105200A (ja) * | 1996-09-26 | 1998-04-24 | Toshiba Corp | 音声符号化/復号化方法 |
US6003000A (en) * | 1997-04-29 | 1999-12-14 | Meta-C Corporation | Method and system for speech processing with greatly reduced harmonic and intermodulation distortion |
US6098036A (en) * | 1998-07-13 | 2000-08-01 | Lockheed Martin Corp. | Speech coding system and method including spectral formant enhancer |
US6665638B1 (en) * | 2000-04-17 | 2003-12-16 | At&T Corp. | Adaptive short-term post-filters for speech coders |
JP4413480B2 (ja) | 2002-08-29 | 2010-02-10 | 富士通株式会社 | 音声処理装置及び移動通信端末装置 |
CN100369111C (zh) * | 2002-10-31 | 2008-02-13 | 富士通株式会社 | 话音增强装置 |
-
2003
- 2003-05-01 EP EP03721013A patent/EP1619666B1/en not_active Expired - Fee Related
- 2003-05-01 DE DE60330715T patent/DE60330715D1/de not_active Expired - Lifetime
- 2003-05-01 WO PCT/JP2003/005582 patent/WO2004097798A1/ja active Application Filing
- 2003-05-01 JP JP2004571323A patent/JP4786183B2/ja not_active Expired - Fee Related
-
2005
- 2005-04-27 US US11/115,478 patent/US7606702B2/en not_active Expired - Fee Related
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08248996A (ja) * | 1995-03-10 | 1996-09-27 | Nippon Telegr & Teleph Corp <Ntt> | ディジタルフィルタのフィルタ係数決定方法 |
JPH0981192A (ja) * | 1995-09-14 | 1997-03-28 | Toshiba Corp | ピッチ強調方法および装置 |
JP2000099094A (ja) * | 1998-09-25 | 2000-04-07 | Matsushita Electric Ind Co Ltd | 時系列信号処理装置 |
JP2001117573A (ja) * | 1999-10-20 | 2001-04-27 | Toshiba Corp | 音声スペクトル強調方法/装置及び音声復号化装置 |
JP2001242899A (ja) * | 2000-02-29 | 2001-09-07 | Toshiba Corp | 音声符号化方法及び装置並びに及び音声復号方法及び装置 |
Non-Patent Citations (1)
Title |
---|
See also references of EP1619666A4 |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010191302A (ja) * | 2009-02-20 | 2010-09-02 | Sharp Corp | 音声出力装置 |
JP2021064009A (ja) * | 2014-07-28 | 2021-04-22 | フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン | 高調波ポストフィルタを使用してオーディオ信号を処理するための装置および方法 |
US11694704B2 (en) | 2014-07-28 | 2023-07-04 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for processing an audio signal using a harmonic post-filter |
JP7340553B2 (ja) | 2014-07-28 | 2023-09-07 | フラウンホーファー-ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン | 高調波ポストフィルタを使用してオーディオ信号を処理するための装置および方法 |
Also Published As
Publication number | Publication date |
---|---|
JP4786183B2 (ja) | 2011-10-05 |
EP1619666A4 (en) | 2007-08-01 |
US7606702B2 (en) | 2009-10-20 |
EP1619666A1 (en) | 2006-01-25 |
JPWO2004097798A1 (ja) | 2006-07-13 |
DE60330715D1 (de) | 2010-02-04 |
EP1619666B1 (en) | 2009-12-23 |
US20050187762A1 (en) | 2005-08-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP3881943B2 (ja) | Acoustic encoding apparatus and acoustic encoding method | |
JP5226777B2 (ja) | Recovery of hidden data embedded in an audio signal | |
JP5942358B2 (ja) | Encoding device and method, decoding device and method, and program | |
JP3881946B2 (ja) | Acoustic encoding apparatus and acoustic encoding method | |
US7606702B2 (en) | Speech decoder, speech decoding method, program and storage media to improve voice clarity by emphasizing voice tract characteristics using estimated formants | |
JPWO2009057327A1 (ja) | Encoding device and decoding device | |
JP2008519990A (ja) | Method for signal encoding | |
KR20060135699A (ko) | Signal decoding apparatus and signal decoding method | |
WO2005106850A1 (ja) | Hierarchical encoding apparatus and hierarchical encoding method | |
JP2004302259A (ja) | Hierarchical encoding method and hierarchical decoding method for acoustic signals | |
JPH1083200A (ja) | Encoding and decoding method, and encoding and decoding apparatus | |
JP4373693B2 (ja) | Hierarchical encoding method and hierarchical decoding method for acoustic signals | |
JP4343302B2 (ja) | Pitch emphasis method and apparatus | |
JP3785363B2 (ja) | Audio signal encoding device, audio signal decoding device, and audio signal encoding method | |
WO2004040552A1 (ja) | Transcoder and code conversion method | |
JP2002149198A (ja) | Speech encoding device and speech decoding device | |
JP3770901B2 (ja) | Wideband speech restoration method and wideband speech restoration apparatus | |
JP3748081B2 (ja) | Wideband speech restoration method and wideband speech restoration apparatus | |
JP4447546B2 (ja) | Wideband speech restoration method and wideband speech restoration apparatus | |
JP3560964B2 (ja) | Wideband speech restoration apparatus, wideband speech restoration method, speech transmission system, and speech transmission method | |
JP3770899B2 (ja) | Wideband speech restoration method and wideband speech restoration apparatus | |
JP3748080B2 (ja) | Wideband speech restoration method and wideband speech restoration apparatus | |
JP3770900B2 (ja) | Wideband speech restoration method and wideband speech restoration apparatus | |
JP3598112B2 (ja) | Wideband speech restoration method and wideband speech restoration apparatus | |
JP3773509B2 (ja) | Wideband speech restoration apparatus and wideband speech restoration method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 |
Designated state(s): JP US |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 |
Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2004571323 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 11115478 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2003721013 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 2003721013 Country of ref document: EP |