WO2006093019A1 - Speech processing method and apparatus, storage medium, and speech system - Google Patents
- Publication number
- WO2006093019A1 (PCT/JP2006/303290, JP2006303290W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- spectrum
- envelope
- deformed
- spectral
- high frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10K—SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
- G10K11/00—Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/16—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/175—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
- G10K11/1752—Masking
- G10K11/1754—Speech masking
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/06—Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0364—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
Definitions
- the present invention relates to an audio system that prevents a third party from hearing the contents of conversational speech, and an audio processing method, apparatus, and storage medium used for the system.
- conversational speech may leak to the surroundings and cause problems. For example, when a customer and a clerk talk in a bank, or when an outpatient talks with a receptionist or a doctor in a hospital, the conversation may be overheard by a third party, and privacy may be impaired.
- a sound such as pink noise or background music (BGM) is superimposed on the original voice as a masking sound.
- An object of the present invention is to prevent a third party from perceiving the contents of conversational speech without making the surrounding people feel that the sound is loud.
- a spectral envelope and a spectral fine structure of an input speech signal are extracted, and the spectral envelope is deformed to generate a deformed spectral envelope.
- the deformed spectral envelope and the spectral fine structure are combined to generate a deformed spectrum, and an output speech signal is generated based on the deformed spectrum.
- in another aspect, the high-frequency component of the spectrum of the input speech signal is extracted, the high-frequency component included in the deformed spectrum is replaced by the extracted high-frequency component, and an output speech signal is generated based on the deformed spectrum after this replacement.
- FIG. 1 is a view schematically showing an audio system according to an embodiment of the present invention.
- FIG. 2A is a diagram showing an example of a spectrum of a conversational voice collected by a microphone in the audio system of FIG. 1.
- FIG. 2B is a diagram showing an example of the spectrum of the disturbance sound radiated from the speaker in the audio system of FIG. 1.
- FIG. 2C is a diagram showing an example of the spectrum of the fused sound of the disturbance sound and the speech in the audio system of FIG. 1.
- FIG. 3 is a block diagram showing the configuration of the speech processing apparatus according to the first embodiment of the present invention.
- FIG. 4 is a flow chart showing an example of spectral analysis and processing associated with spectral analysis.
- FIG. 5A is a diagram showing an example of an audio spectrum of an input audio signal.
- FIG. 5B is a diagram showing an example of a spectral envelope of the speech spectrum of FIG. 5A.
- FIG. 5C shows an example of a modified spectral envelope obtained by modifying the spectral envelope of FIG. 5B.
- FIG. 5D is a diagram showing an example of the spectral fine structure of the speech spectrum of FIG. 5A.
- FIG. 5E is a view showing an example of a deformed spectrum generated by combining the deformed spectral envelope of FIG. 5C and the spectral fine structure of FIG. 5D.
- FIG. 6 is a flowchart showing the overall flow of voice processing in the first embodiment.
- FIG. 7A is a diagram showing an example of a spectral envelope of a speech spectrum.
- FIG. 7B is a diagram for explaining a first example of a method of performing spectrum deformation in the amplitude direction on the spectrum envelope in the first embodiment.
- FIG. 7C is a diagram for explaining a second example of the method of performing spectrum deformation in the amplitude direction on the spectrum envelope in the first embodiment.
- FIG. 7D is a diagram for explaining a third example of a method of performing spectrum deformation in the amplitude direction on the spectrum envelope in the first embodiment.
- FIG. 7E is a diagram for explaining a fourth example of the method of performing spectral deformation in the amplitude direction with respect to the spectral envelope in the first embodiment.
- FIG. 8A is a diagram showing an example of a spectral envelope of a speech spectrum.
- FIG. 8B is a diagram for explaining a first example of a method of performing spectrum deformation in the frequency axis direction on the spectral envelope in the first embodiment.
- FIG. 8C is a diagram for explaining a second example of the method of performing spectrum deformation in the frequency axis direction on the spectral envelope in the first embodiment.
- FIG. 9A is a diagram showing an example of a spectrum of a fricative.
- FIG. 9B is a diagram showing an example of a spectral envelope of a fricative.
- FIG. 9C is a diagram for explaining a first example of a method of applying spectral deformation in the amplitude direction to the spectral envelope of the fricative in the first embodiment.
- FIG. 9D is a view for explaining a second example of the method of applying spectral deformation in the amplitude direction to the spectral envelope of the fricative in the first embodiment.
- FIG. 10 is a block diagram showing the configuration of the speech processing apparatus according to the second embodiment of the present invention.
- FIG. 11 is a flowchart showing a part of the process of the spectrum envelope deformation unit and the process of the high frequency component extraction unit in the second embodiment.
- FIG. 12A is a diagram showing an example of the speech spectrum of an input speech signal with a strong low-frequency component.
- FIG. 12B is a diagram showing a spectral envelope of the speech spectrum of FIG. 12A.
- FIG. 12C is a view showing an example of a deformed spectrum obtained by modifying the speech spectrum of FIG. 12A in the second embodiment.
- FIG. 12D is a diagram showing an example of the spectrum of the interference sound generated by replacing the high-frequency component of the modified spectrum of FIG. 12C in the second embodiment.
- FIG. 13A is a diagram showing an example of the speech spectrum of an input speech signal with a strong high frequency component.
- FIG. 13B is a diagram showing a spectral envelope of the speech spectrum of FIG. 13A.
- FIG. 13C is a diagram showing an example of a deformed spectrum obtained by modifying the speech spectrum of FIG. 13A in the second embodiment.
- FIG. 13D is a diagram showing an example of the spectrum of an interference sound generated by replacing high-frequency components of the modified spectrum of FIG. 13C in the second embodiment.
- FIG. 14 is a flowchart showing the overall flow of audio processing in the second embodiment.
- FIG. 1 shows a conceptual diagram of an audio system including an audio processing device 10 according to an embodiment of the present invention.
- the voice processing device 10 processes an input voice signal, obtained by collecting the conversational voice with a microphone 11 placed at a position A near the place where persons 1 and 2 are talking in the figure, and produces an output audio signal.
- the output audio signal output from the audio processing device 10 is supplied to the speaker 20 placed at the position B, and the speaker 20 emits a sound.
- if the phonological property of the input speech signal is broken while its sound source information is maintained, and the sound emitted from the speaker 20 fuses with the conversational voice, the person 3 at position C cannot make out the conversation between the person 1 and the person 2.
- the sound emitted from the speaker 20 is called interference sound since it is intended to prevent the third party from listening to the speech in this manner.
- the sound emitted from the speaker 20 may be referred to as “hearing sound” because the purpose is to prevent the speech from being heard by a third party.
- the voice processing device 10 processes the input voice signal to generate an output voice signal in which the phonological property is broken while the sound source information of the input voice signal is maintained, as described above.
- the speaker 20 emits a disturbing sound in which the phonological property of the conversational voice is broken.
- the spectrum of the conversational speech collected by the microphone 11 is as shown in FIG. 2A.
- the spectrum of the interference sound radiated from the speaker 20 through the voice processing device 10 is as shown in FIG. 2B, for example.
- the third person hears a sound having a spectrum as shown in FIG. 2C in which the disturbance sound and the direct sound of the speech sound are fused.
- FIG. 3 shows the configuration of the speech processing apparatus according to the first embodiment.
- the microphone 11 is installed, for example, in a place near a bank window or a hospital's outpatient reception desk, and collects speech sound and outputs a speech signal.
- An audio signal from the microphone 11 is input to the audio input processing unit 12.
- the voice input processing unit 12 has, for example, an amplifier and an A/D converter. It amplifies the voice signal from the microphone 11 (hereinafter referred to as the input voice signal), digitizes the amplified signal, and outputs it.
- the digitized input speech signal from the speech input processing unit 12 is input to the spectrum analysis unit 13.
- the spectrum analysis unit 13 analyzes the input speech signal by, for example, FFT cepstral analysis or the processing of a vocoder-type speech analysis and synthesis system.
- the low-quefrency part of the cepstrum is input to the spectral envelope extraction unit 14.
- the high-quefrency part of the cepstrum is input to the spectral fine structure extraction unit 16.
- the spectral envelope extraction unit 14 extracts the spectral envelope of the speech spectrum of the input speech signal.
- the spectral envelope represents phonological information of the input speech signal. For example, given the speech spectrum of the input speech signal shown in FIG. 5A, the spectral envelope is as shown in FIG. 5B. The spectral envelope is extracted, for example, by applying an FFT (step S6) to the low-quefrency part of the cepstral coefficients as shown in FIG. 4.
- the extracted spectral envelope is deformed by the spectral envelope deformation unit 15 to generate a deformed spectral envelope.
- the spectral envelope deformation section 15 applies a deformation to the spectral envelope by inverting the spectral envelope as shown in FIG. 5C.
- the spectrum envelope is expressed by lower order cepstrum coefficients.
- the spectral envelope deformation unit 15 performs sign inversion on such low-order cepstral coefficients. A more specific example of the spectral envelope deformation unit 15 will be described in detail later.
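A minimal sketch of this sign inversion, assuming the low-quefrency part is held as a symmetric NumPy array produced by a full-length inverse FFT. The function name `invert_low_cepstrum`, the cutoff `n_low`, and the choice to leave the zeroth coefficient untouched are assumptions for illustration and are not taken from the source. Under that choice, the resulting log envelope is the original one mirrored about its mean level.

```python
import numpy as np

def invert_low_cepstrum(low_cepstrum, n_low=30, keep_c0=True):
    """Flip the sign of the low-order cepstral coefficients.

    low_cepstrum: real cepstrum of length N whose bins n_low..N-n_low are zero.
    n_low and keep_c0 are illustrative assumptions, not values from the source.
    """
    n = len(low_cepstrum)
    out = np.array(low_cepstrum, dtype=float)
    out[1:n_low] *= -1.0              # positive quefrencies 1..n_low-1
    out[n - n_low + 1:] *= -1.0       # their mirror images at the end of the array
    if not keep_c0:
        out[0] *= -1.0                # also flip the overall level (coefficient 0)
    return out

# Build a low-quefrency cepstrum from a synthetic frame and compare envelopes.
rng = np.random.default_rng(1)
log_mag = np.log(np.abs(np.fft.fft(rng.standard_normal(256))) + 1e-10)
cep = np.fft.ifft(log_mag).real
low = np.zeros(256)
low[:30] = cep[:30]
low[227:] = cep[227:]
envelope = np.fft.fft(low).real
inverted = np.fft.fft(invert_low_cepstrum(low)).real
```

With `keep_c0=True`, `inverted` equals `2 * envelope.mean() - envelope`, so peaks of the envelope become valleys and vice versa.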
- the spectral fine structure extraction unit 16 extracts the spectral fine structure of the speech spectrum of the input speech signal.
- the spectral fine structure represents the sound source information of the input speech signal. For example, given the speech spectrum of the input speech signal as in FIG. 5A, the spectral fine structure is shown in FIG. 5D.
- the spectral fine structure is extracted, for example, by applying an FFT (step S7) to the high-quefrency part of the cepstral coefficients as shown in FIG. 4.
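The split into envelope and fine structure can be sketched as follows, assuming a standard FFT cepstrum computed on one windowed frame. The function name `cepstral_split`, the Hann window, the lifter cutoff `n_low`, and the small constant added before the logarithm are assumptions for illustration; the source only states that an FFT is applied to the low-quefrency and high-quefrency parts.

```python
import numpy as np

def cepstral_split(frame, n_low=30):
    """Return (log spectrum, log envelope, log fine structure) for one frame.

    n_low is an assumed lifter cutoff in quefrency bins.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    log_mag = np.log(np.abs(np.fft.fft(frame * np.hanning(n))) + 1e-10)
    cepstrum = np.fft.ifft(log_mag).real          # real because log_mag is even

    # Low-quefrency part: bins 0..n_low-1 plus their mirror images.
    low = np.zeros(n)
    low[:n_low] = cepstrum[:n_low]
    low[n - n_low + 1:] = cepstrum[n - n_low + 1:]
    high = cepstrum - low                         # high-quefrency remainder

    envelope = np.fft.fft(low).real               # FFT of low part: envelope
    fine = np.fft.fft(high).real                  # FFT of high part: fine structure
    return log_mag, envelope, fine

rng = np.random.default_rng(0)
log_mag, envelope, fine = cepstral_split(rng.standard_normal(512))
```

Because the two parts partition the cepstrum, the envelope and fine structure add back up to the log spectrum in this sketch.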
- the deformed spectral envelope generated by the spectral envelope deformation unit 15 and the spectral fine structure extracted by the spectral fine structure extraction unit 16 are input to a deformed spectrum generation unit 17.
- the deformed spectrum generation unit 17 combines the deformed spectral envelope and the spectral fine structure to generate a deformed spectrum, which is a deformed version of the speech spectrum of the input speech signal. For example, if the deformed spectral envelope is as shown in FIG. 5C and the spectral fine structure is as shown in FIG. 5D, the deformed spectrum generated by combining them is as shown in FIG. 5E.
- the deformed spectrum generated by the deformed spectrum generation unit 17 is input to the sound generation unit 18.
- the sound generation unit 18 generates an output sound signal digitized based on the deformed spectrum.
- the digitized output audio signal is input to the audio output processing unit 19.
- the voice output processing unit 19 converts the output voice signal into an analog signal with a D/A converter, amplifies it with a power amplifier, and supplies the amplified signal to the speaker 20. As a result, the disturbance sound is emitted from the speaker 20.
- the number of microphones and the number of speakers may be two or more.
- the audio processing device may process the input audio signals of a plurality of channels from a plurality of microphones individually and emit interference noise from a plurality of speakers.
- the voice processing device 10 shown in FIG. 3 can be realized by hardware such as a digital signal processor (DSP), but can also be implemented as a program executed by a computer.
- the processing procedure in the case where the processing of the speech processing device 10 is realized by a computer will be described below with reference to FIG. 6.
- the digital input speech signal input in step S101 undergoes spectral analysis (step S102), extraction of the spectral envelope (step S103), deformation of the spectral envelope (step S104), and extraction of the spectral fine structure (step S105), as described above.
- the order of the processes in steps S103 and S104 and step S105 is arbitrary. Further, the processing of steps S103 and S104 and the processing of step S105 may be performed in parallel.
- a deformed spectrum is generated by combining the deformed spectral envelope generated through steps S103 and S104 and the spectral fine structure generated by step S105 (step S106).
- after step S106, a speech signal is generated from the deformed spectrum and output (steps S107 to S108).
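Steps S102 to S107 can be sketched for a single frame as below. This is an illustration under stated assumptions, not the patented implementation: the function name `process_frame`, the Hann window, the cutoff `n_low`, the use of inversion about the mean level as the deformation, and the reuse of the original phase for resynthesis are all assumptions. Framing, overlap-add, and audio I/O (steps S101 and S108) are omitted.

```python
import numpy as np

def process_frame(frame, n_low=30):
    """Deform one frame's spectral envelope and resynthesize it (sketch)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    spectrum = np.fft.fft(frame * np.hanning(n))           # S102: spectral analysis
    log_mag = np.log(np.abs(spectrum) + 1e-10)
    cepstrum = np.fft.ifft(log_mag).real

    low = np.zeros(n)
    low[:n_low] = cepstrum[:n_low]
    low[n - n_low + 1:] = cepstrum[n - n_low + 1:]
    envelope = np.fft.fft(low).real                        # S103: spectral envelope
    deformed_env = 2.0 * envelope.mean() - envelope        # S104: assumed deformation
    fine = log_mag - envelope                              # S105: fine structure
    deformed_log_mag = deformed_env + fine                 # S106: combine (log domain)

    # S107: assumption: keep the original phase when rebuilding the waveform.
    deformed = np.exp(deformed_log_mag) * np.exp(1j * np.angle(spectrum))
    return np.fft.ifft(deformed).real

rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
out = process_frame(frame)
```

Working in the log-magnitude domain makes the combination in step S106 a simple addition, which is why the fine structure can be taken as the difference between the log spectrum and the envelope.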
- Spectral envelope deformation is basically achieved by changing the formant frequencies of the spectral envelope (i.e., the positions of its peaks and valleys).
- the deformation of the spectral envelope here is aimed at breaking the phonological property. Since the positional relationship between the peaks and valleys of the spectral envelope is important for phonological perception, the positions of these peaks and valleys should differ from those before deformation. Specifically, this can be achieved by deforming the spectral envelope in at least one of the amplitude direction and the frequency axis direction.
- FIGS. 7A, 7B, 7C, 7D and 7E show a method of changing the positions of peaks and valleys by applying deformation in the amplitude direction to the spectral envelope.
- the spectrum envelope deformation unit 15 sets an inversion axis with respect to the spectrum envelope shown in FIG. 7A, and inverts the spectrum envelope around the inversion axis.
- Various approximation functions can be used as the inversion axis.
- FIG. 7B is an example in which the inversion axis is set by a cos function.
- FIG. 7C is an example in which the inversion axis is set by a straight line.
- FIG. 7D is an example in which the inversion axis is set by logarithm.
- FIG. 7E is an example in which the inversion axis is set to the average of the amplitude of the spectral envelope, that is, parallel to the frequency axis.
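Inversion about an axis can be sketched as a reflection, `2 * axis - envelope`, evaluated per frequency bin. The source names the axis shapes (cos function, straight line, logarithm, average amplitude) but not how they are parameterized, so the parameterizations below, which anchor the line, cos, and log axes at the first and last envelope samples, and the names `inversion_axis` and `invert_about` are assumptions for illustration.

```python
import math

def inversion_axis(envelope, kind="mean"):
    """Return an inversion axis sampled at the same bins as `envelope`."""
    n = len(envelope)
    a, b = envelope[0], envelope[-1]
    if kind == "mean":   # horizontal axis at the average amplitude (FIG. 7E)
        return [sum(envelope) / n] * n
    if kind == "line":   # straight line between the end points (cf. FIG. 7C)
        return [a + (b - a) * i / (n - 1) for i in range(n)]
    if kind == "cos":    # half-period cosine between the end points (cf. FIG. 7B)
        return [(a + b) / 2 + (a - b) / 2 * math.cos(math.pi * i / (n - 1))
                for i in range(n)]
    if kind == "log":    # logarithmic curve between the end points (cf. FIG. 7D)
        return [a + (b - a) * math.log(1 + i) / math.log(n) for i in range(n)]
    raise ValueError(f"unknown axis kind: {kind}")

def invert_about(envelope, axis):
    """Reflect each envelope sample about the axis value at the same bin."""
    return [2 * ax - e for e, ax in zip(envelope, axis)]

envelope = [4.0, 6.0, 3.0, 5.0, 1.0, 2.0]
inverted = invert_about(envelope, inversion_axis(envelope, "mean"))
```

In this toy example the peak at index 1 becomes the lowest point of `inverted`, which is the change in peak and valley positions that the deformation aims for.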
- in FIGS. 7B, 7C, 7D and 7E, it can be seen that the positions (frequencies) of the peaks and valleys change with respect to the original spectral envelope of FIG. 7A.
- FIGS. 8A, 8B and 8C show a method of changing the positions of peaks and valleys by applying deformation in the frequency axis direction to the spectral envelope.
- the spectral envelope shown in FIG. 8A is shifted to the low band side as shown in FIG. 8B or is shifted to the high band side as shown in FIG. 8C.
- as a deformation method of the spectral envelope in the frequency axis direction, linear or non-linear expansion or contraction along the frequency axis may also be considered.
- the deformation along the frequency axis need not be applied to the entire band of the spectral envelope; it may be applied to only part of the band.
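A shift along the frequency axis, optionally restricted to part of the band, can be sketched as below. The function name `shift_envelope`, the bin-based shift amount, and the edge handling (repeating the value at the band edge) are assumptions for illustration; the source does not specify them.

```python
def shift_envelope(envelope, shift_bins, band=None):
    """Shift the envelope by `shift_bins` bins inside `band` (start, stop).

    Positive shifts move peaks toward higher frequencies, negative shifts
    toward lower frequencies. Bins outside the band are left unchanged.
    """
    n = len(envelope)
    start, stop = band if band is not None else (0, n)
    out = list(envelope)
    for i in range(start, stop):
        # Read from the bin `shift_bins` below, clamped to stay inside the band.
        src = min(max(i - shift_bins, start), stop - 1)
        out[i] = envelope[src]
    return out

envelope = [0, 1, 5, 1, 0, 0, 0, 0]
to_high = shift_envelope(envelope, 2)                # peak moves from bin 2 to bin 4
to_low = shift_envelope(envelope, -1)                # peak moves from bin 2 to bin 1
partial = shift_envelope(envelope, 1, band=(0, 4))   # only bins 0..3 are shifted
```

Expansion or contraction of the frequency axis would replace the fixed offset with a bin-dependent mapping, which is not shown here.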
- the deformation methods described above process the low-frequency components of the spectrum of the input speech signal, so they are effective for phonemes such as vowels, whose first and second formants lie in the low range.
- however, these deformation methods are less effective for /e/ and /i/, whose second formant lies in the high range, for the fricative /s/, which is characterized in the high range, and for the plosive /k/. For this reason, it is desirable to dynamically control the frequency band to be deformed and the inversion axis in accordance with the shape of the phoneme spectrum.
- FIG. 9A shows the spectrum of a fricative.
- FIG. 9B shows the spectral envelope of a fricative.
- for such a fricative, the change in characteristics can be made pronounced by inverting the spectral envelope about an inversion axis set to the average of the amplitude of the spectral envelope, as in FIG. 7E. This is only an example; any deformation that significantly changes the characteristics of the spectral envelope may be used.
- as described above, in the first embodiment the spectral envelope of the input speech signal is deformed to generate a deformed spectral envelope, this deformed spectral envelope is combined with the spectral fine structure of the input speech signal to generate a deformed spectrum, and an output speech signal is generated based on the deformed spectrum.
- when the above-described processing is performed on the input voice signal obtained by collecting the conversational voice with the microphone 11 placed at position A as shown in FIG. 1, and the resulting output voice signal is emitted from the speaker 20 as a disturbance sound, the third party perceives the disturbance sound and the direct sound of the speech as fused, so the speech becomes unclear. As a result, the contents of the conversational speech are less likely to be perceived by third parties.
- FIG. 10 shows a speech processing apparatus according to the second embodiment, in which a spectral high-frequency component extraction unit 21 and a high-frequency component replacing unit 22 are added to the speech processing apparatus according to the first embodiment shown in FIG. 3.
- the spectral high-frequency component extraction unit 21 extracts, via the spectrum analysis unit 13, the high-frequency component of the spectrum of the input speech signal.
- the high-frequency component of the spectrum represents individuality information of the speaker, and can be extracted from, for example, the FFT result (the spectrum of the input speech signal) in step S2 in FIG. 4.
- the extracted high frequency component is input to the high frequency component replacing unit 22.
- the high-frequency component replacing unit 22 is inserted between the output of the deformed spectrum generation unit 17 and the input of the voice generation unit 18, and replaces the high-frequency component in the deformed spectrum generated by the deformed spectrum generation unit 17 with the high-frequency component extracted by the spectral high-frequency component extraction unit 21.
- the voice generation unit 18 generates an output voice signal based on the deformed spectrum after the high frequency component has been replaced.
- FIG. 11 shows a process when the spectrum envelope deformation unit 15 performs the spectrum envelope deformation shown in FIG. 7B, FIG. 7C and FIG. 7D, and a part of the process of the high frequency component replacement unit 22.
- the spectrum envelope deformation unit 15 detects the slope of the spectrum envelope (step S201).
- the spectral envelope deformation unit 15 then determines an approximation function, for example a cos function, a straight line, or a logarithm (step S202).
- the spectral envelope is inverted according to this approximation function (step S203).
- the processing of the spectrum envelope deformation unit 15 is the same as that of the first embodiment.
- the high-frequency component replacing unit 22 determines the replacement band from the slope of the spectral envelope detected in step S201, and replaces the high-frequency component, that is, the frequency component in this replacement band, with the high-frequency component extracted by the spectral high-frequency component extraction unit 21.
- this is explained with reference to FIGS. 12A to 12D and FIGS. 13A to 13D. For example, as shown in FIG. 12A, when the input speech signal has a spectrum with a strong low-frequency component, as in a vowel part, the spectral envelope of the input speech signal has a negative slope as shown in FIG. 12B. In such a case, for example, the deformed spectral envelope, obtained by inverting the spectral envelope about the inversion axis given by an approximation function such as the above-mentioned cos function, straight line, or logarithm, is combined with the spectral fine structure of the input speech signal to generate the deformed spectrum shown in FIG. 12C.
- the low-frequency component containing phonological information is, for example, the frequency component at or below 2.5 to 3 kHz.
- by replacing the high-frequency component containing individuality information, for example the frequency component of 3 kHz or more, with the high-frequency component of the original speech spectrum of FIG. 12A, an interference sound with a spectrum as shown in FIG. 12D is generated.
- the lower limit frequency of the replacement band can be made variable according to the position of the valley of the spectral envelope. In this way, the band containing individuality information can be determined regardless of the gender and voice quality of the speaker.
- on the other hand, when the input speech signal has a spectrum with a strong high-frequency component as shown in FIG. 13A, the spectral envelope of the input speech signal has a positive slope as shown in FIG. 13B. In such a case, for example, the deformed spectral envelope, obtained by inverting the spectral envelope about an inversion axis set to the average of the amplitude of the spectral envelope as described above, is combined with the spectral fine structure of the input speech signal to generate the deformed spectrum shown in FIG. 13C.
- the replacement band is set to a higher frequency side, for example, a frequency band of 6 kHz or more.
- the lower limit frequency of the replacement band can be made variable according to the position of the peak of the spectral envelope. In this way, the band containing individuality information can be determined regardless of the gender and voice quality of the speaker.
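The slope-dependent choice of replacement band can be sketched as follows. The 3 kHz and 6 kHz lower limits are the example values given in the text; the least-squares slope estimate, the function names `envelope_slope` and `replace_high_band`, and the fixed (rather than peak- or valley-dependent) limits are assumptions for illustration.

```python
def envelope_slope(freqs_hz, envelope):
    """Least-squares slope of the envelope against frequency (assumed estimator)."""
    n = len(envelope)
    mean_f = sum(freqs_hz) / n
    mean_e = sum(envelope) / n
    num = sum((f - mean_f) * (e - mean_e) for f, e in zip(freqs_hz, envelope))
    den = sum((f - mean_f) ** 2 for f in freqs_hz)
    return num / den

def replace_high_band(freqs_hz, original, deformed, envelope):
    """Put the original spectrum back into the replacement band.

    Negative envelope slope (strong low band): replace from 3 kHz upward.
    Otherwise (strong high band): replace from 6 kHz upward.
    """
    lower_hz = 3000.0 if envelope_slope(freqs_hz, envelope) < 0 else 6000.0
    out = [o if f >= lower_hz else d
           for f, o, d in zip(freqs_hz, original, deformed)]
    return out, lower_hz

freqs = [0.0, 1000.0, 2000.0, 3000.0, 4000.0, 5000.0, 6000.0, 7000.0]
original = [10, 11, 12, 13, 14, 15, 16, 17]
deformed = [0] * 8
falling, limit_falling = replace_high_band(freqs, original, deformed,
                                           [8, 7, 6, 5, 4, 3, 2, 1])
rising, limit_rising = replace_high_band(freqs, original, deformed,
                                         [1, 2, 3, 4, 5, 6, 7, 8])
```

A variable lower limit, as the text suggests, would replace the two constants with a value derived from the position of a valley or peak of the envelope.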
- the speech processing device shown in FIG. 10 can also be realized by hardware such as a DSP, but can also be executed by a program using a computer. Furthermore, according to the present invention, a storage medium storing the program can be provided.
- the processing procedure in the case of realizing the processing of the voice processing device by a computer will be described using FIG. 14.
- the processing from step S101 to step S106 is the same as that in the first embodiment.
- in the second embodiment, extraction of the spectral high-frequency component (step S109) and replacement of the high-frequency component (step S110) are added.
- a speech signal is generated from the deformed spectrum after high-frequency component replacement and output (steps S107 to S108).
- the processing order of steps S103 to S105 and step S109 is arbitrary. The processing of steps S103 and S104 and the processing of step S105 may be performed in parallel, and the processing of step S109 may also be performed in parallel with them.
- in the second embodiment, the output speech signal is generated from the deformed spectrum obtained by replacing the high-frequency component of the deformed spectrum, which is generated by combining the deformed spectral envelope and the spectral fine structure, with the high-frequency component of the input speech signal.
- the deformation of the spectral envelope destroys the phonological property of the conversational speech, while the disturbing sound preserves the individuality information carried by the high-frequency component of the speech spectrum. That is, inversion of the spectral envelope does not increase the power of the high-frequency band of the disturbing sound, so the sound quality is not degraded.
- nor is the individuality information of the speech broken, which would leave the fusion of the disturbing sound and the speech insufficient. This makes it possible to prevent the third party from hearing the contents of the conversational voice more effectively, without making the surroundings feel that the sound is loud.
- in the above description, the high-frequency component of the deformed spectrum is replaced to generate a deformed spectrum with a substituted high-frequency component.
- the same result can be obtained by selectively performing the deformation of the spectral envelope only on the frequency bands other than the high-frequency component (the low and middle bands).
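This alternative can be sketched by deforming only the bins below a cutoff and passing the rest through unchanged. The function name `invert_below`, the 3 kHz default cutoff (borrowed from the example value earlier in the text), and the use of the band's own mean as the inversion axis are assumptions for illustration.

```python
def invert_below(freqs_hz, envelope, cutoff_hz=3000.0):
    """Invert the envelope about its mean, but only for bins below cutoff_hz."""
    band = [e for f, e in zip(freqs_hz, envelope) if f < cutoff_hz]
    if not band:                       # nothing below the cutoff: leave as is
        return list(envelope)
    axis = sum(band) / len(band)       # assumed axis: mean of the deformed band
    return [2 * axis - e if f < cutoff_hz else e
            for f, e in zip(freqs_hz, envelope)]

freqs = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
envelope = [1.0, 3.0, 2.0, 7.0, 9.0]
selective = invert_below(freqs, envelope)
```

The bins at 3 kHz and above keep their original envelope values, so no separate replacement step is needed in this variant.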
- according to the aspect of the present invention, it is possible to generate, from the input speech signal of conversational speech, an output speech signal whose phonological property is broken by the deformation of the spectral envelope. Therefore, by emitting an interference sound using this output voice signal, the contents of the conversation can be kept from being heard by a third party, which is effective for confidentiality and privacy protection. That is, since the output speech signal is generated from the deformed spectrum obtained by combining the spectral fine structure of the input speech signal with the deformed spectral envelope, the sound source information of the speaker is maintained, and the original speech and the disturbing sound are perceptually fused even given the human auditory characteristic known as the cocktail party effect. This makes the speech unclear and hard for third parties to perceive. Therefore, the confidentiality and privacy of conversations can be protected.
- the present invention can be applied to a technology for preventing the surrounding third parties from hearing the contents of conversational speech or the contents of the conversation of a caller in a cellular phone or other telephone.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Spectroscopy & Molecular Physics (AREA)
- General Health & Medical Sciences (AREA)
- Soundproofing, Sound Blocking, And Sound Damping (AREA)
- Telephone Function (AREA)
Priority Applications (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020077019988A KR100931419B1 (ko) | 2005-03-01 | 2006-02-23 | Speech processing method and apparatus, storage medium, and speech system |
| DE602006014096T DE602006014096D1 (de) | 2005-03-01 | 2006-02-23 | Speech processing method and device, storage medium, and speech system |
| EP06714430A EP1855269B1 (en) | 2005-03-01 | 2006-02-23 | Speech processing method and device, storage medium, and speech system |
| CN2006800066680A CN101138020B (zh) | 2005-03-01 | 2006-02-23 | Speech processing method and apparatus, storage medium, and speech system |
| US11/849,106 US8065138B2 (en) | 2005-03-01 | 2007-08-31 | Speech processing method and apparatus, storage medium, and speech system |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2005-056342 | 2005-03-01 | ||
| JP2005056342A JP4761506B2 (ja) | 2005-03-01 | 2005-03-01 | 音声処理方法と装置及びプログラム並びに音声システム |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US11/849,106 Continuation US8065138B2 (en) | 2005-03-01 | 2007-08-31 | Speech processing method and apparatus, storage medium, and speech system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2006093019A1 (ja) | 2006-09-08 |
Family ID=36941053
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2006/303290 Ceased WO2006093019A1 (ja) | 2005-03-01 | 2006-02-23 | 音声処理方法と装置及び記憶媒体並びに音声システム |
Country Status (7)
| Country | Link |
|---|---|
| US (1) | US8065138B2 |
| EP (1) | EP1855269B1 |
| JP (1) | JP4761506B2 |
| KR (1) | KR100931419B1 |
| CN (1) | CN101138020B |
| DE (1) | DE602006014096D1 |
| WO (1) | WO2006093019A1 |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2008245203A (ja) * | 2007-03-29 | 2008-10-09 | Yamaha Corp | 拡声装置、拡声装置の遅延時間決定方法および拡声装置のフィルタ係数決定方法 |
Families Citing this family (28)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP4757158B2 (ja) * | 2006-09-20 | 2011-08-24 | 富士通株式会社 | 音信号処理方法、音信号処理装置及びコンピュータプログラム |
| US8229130B2 (en) * | 2006-10-17 | 2012-07-24 | Massachusetts Institute Of Technology | Distributed acoustic conversation shielding system |
| US8140326B2 (en) * | 2008-06-06 | 2012-03-20 | Fuji Xerox Co., Ltd. | Systems and methods for reducing speech intelligibility while preserving environmental sounds |
| JP5511342B2 (ja) * | 2009-12-09 | 2014-06-04 | 日本板硝子環境アメニティ株式会社 | 音声変更装置、音声変更方法および音声情報秘話システム |
| JP5489778B2 (ja) * | 2010-02-25 | 2014-05-14 | キヤノン株式会社 | 情報処理装置およびその処理方法 |
| JP5605062B2 (ja) * | 2010-08-03 | 2014-10-15 | 大日本印刷株式会社 | 騒音源の快音化方法および快音化装置 |
| JP5569291B2 (ja) * | 2010-09-17 | 2014-08-13 | 大日本印刷株式会社 | 騒音源の快音化方法および快音化装置 |
| JP6007481B2 (ja) * | 2010-11-25 | 2016-10-12 | ヤマハ株式会社 | マスカ音生成装置、マスカ音信号を記憶した記憶媒体、マスカ音再生装置、およびプログラム |
| WO2012128678A1 (en) | 2011-03-21 | 2012-09-27 | Telefonaktiebolaget L M Ericsson (Publ) | Method and arrangement for damping of dominant frequencies in an audio signal |
| JP2014513320A (ja) * | 2011-03-21 | 2014-05-29 | テレフオンアクチーボラゲット エル エム エリクソン(パブル) | オーディオ信号におけるドミナント周波数を減衰する方法及び装置 |
| US8972251B2 (en) | 2011-06-07 | 2015-03-03 | Qualcomm Incorporated | Generating a masking signal on an electronic device |
| US8583425B2 (en) * | 2011-06-21 | 2013-11-12 | Genband Us Llc | Methods, systems, and computer readable media for fricatives and high frequencies detection |
| WO2013012312A2 (en) * | 2011-07-19 | 2013-01-24 | Jin Hem Thong | Wave modification method and system thereof |
| JP5849508B2 (ja) * | 2011-08-09 | 2016-01-27 | 株式会社大林組 | Bgmのマスキング効果評価方法及びbgmのマスキング効果評価装置 |
| JP5925493B2 (ja) * | 2012-01-11 | 2016-05-25 | グローリー株式会社 | 会話保護システム及び会話保護方法 |
| EP2862169A4 (en) * | 2012-06-15 | 2016-03-02 | Jemardator Ab | DIFFERENCE OF CEPSTRAL SEPARATION |
| US8670986B2 (en) | 2012-10-04 | 2014-03-11 | Medical Privacy Solutions, Llc | Method and apparatus for masking speech in a private environment |
| CN103818290A (zh) * | 2012-11-16 | 2014-05-28 | 黄金富 | 一种用于汽车司机与老板的隔声装置 |
| CN103826176A (zh) * | 2012-11-16 | 2014-05-28 | 黄金富 | 一种用于汽车司机与乘客之间的司机专用保密耳筒 |
| JP2014130251A (ja) * | 2012-12-28 | 2014-07-10 | Glory Ltd | 会話保護システム及び会話保護方法 |
| JP5929786B2 (ja) * | 2013-03-07 | 2016-06-08 | ソニー株式会社 | 信号処理装置、信号処理方法及び記憶媒体 |
| JP6371516B2 (ja) * | 2013-11-15 | 2018-08-08 | キヤノン株式会社 | 音響信号処理装置および方法 |
| JP6098654B2 (ja) * | 2014-03-10 | 2017-03-22 | ヤマハ株式会社 | マスキング音データ生成装置およびプログラム |
| JP7145596B2 (ja) * | 2017-09-15 | 2022-10-03 | 株式会社Lixil | 擬音装置 |
| CN108540680B (zh) * | 2018-02-02 | 2021-03-02 | 广州视源电子科技股份有限公司 | 讲话状态的切换方法及装置、通话系统 |
| US10757507B2 (en) * | 2018-02-13 | 2020-08-25 | Ppip, Llc | Sound shaping apparatus |
| WO2019245916A1 (en) * | 2018-06-19 | 2019-12-26 | Georgetown University | Method and system for parametric speech synthesis |
| US12556927B2 (en) | 2024-01-19 | 2026-02-17 | Cisco Technology, Inc. | Speech confidentiality monitoring and alerting |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2000003197A (ja) * | 1998-06-16 | 2000-01-07 | Yamaha Corp | 音声変換装置、音声変換方法、および音声変換プログラムを記録した記録媒体 |
| JP2002123298A (ja) * | 2000-10-18 | 2002-04-26 | Nippon Telegr & Teleph Corp <Ntt> | 信号符号化方法、装置及び信号符号化プログラムを記録した記録媒体 |
| WO2002054732A1 (fr) | 2001-01-05 | 2002-07-11 | Travere Rene | Attenuateur brouilleur de conversation applique au telephone |
| JP2002215198A (ja) * | 2001-01-16 | 2002-07-31 | Sharp Corp | 声質変換装置および声質変換方法およびプログラム記憶媒体 |
| JP2002251199A (ja) | 2001-02-27 | 2002-09-06 | Ricoh Co Ltd | 音声入力情報処理装置 |
| JP2003514265A (ja) * | 1999-11-16 | 2003-04-15 | ロイヤルカレッジ オブ アート | 音環境を改善するための装置及びその方法 |
| WO2004010627A1 (en) * | 2002-07-24 | 2004-01-29 | Applied Minds, Inc. | Method and system for masking speech |
| JP2005084645A (ja) * | 2003-09-11 | 2005-03-31 | Glory Ltd | マスキング装置 |
Family Cites Families (16)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US3681530A (en) * | 1970-06-15 | 1972-08-01 | Gte Sylvania Inc | Method and apparatus for signal bandwidth compression utilizing the fourier transform of the logarithm of the frequency spectrum magnitude |
| US4827516A (en) * | 1985-10-16 | 1989-05-02 | Toppan Printing Co., Ltd. | Method of analyzing input speech and speech analysis apparatus therefor |
| JPH0522391A (ja) | 1991-07-10 | 1993-01-29 | Sony Corp | 音声マスキング装置 |
| JP3557662B2 (ja) * | 1994-08-30 | 2004-08-25 | ソニー株式会社 | 音声符号化方法及び音声復号化方法、並びに音声符号化装置及び音声復号化装置 |
| JPH09319389A (ja) * | 1996-03-28 | 1997-12-12 | Matsushita Electric Ind Co Ltd | 環境音発生装置 |
| US6904404B1 (en) * | 1996-07-01 | 2005-06-07 | Matsushita Electric Industrial Co., Ltd. | Multistage inverse quantization having the plurality of frequency bands |
| JP3246715B2 (ja) * | 1996-07-01 | 2002-01-15 | 松下電器産業株式会社 | オーディオ信号圧縮方法,およびオーディオ信号圧縮装置 |
| JP3266819B2 (ja) * | 1996-07-30 | 2002-03-18 | 株式会社エイ・ティ・アール人間情報通信研究所 | 周期信号変換方法、音変換方法および信号分析方法 |
| JP3707153B2 (ja) * | 1996-09-24 | 2005-10-19 | ソニー株式会社 | ベクトル量子化方法、音声符号化方法及び装置 |
| US6073100A (en) * | 1997-03-31 | 2000-06-06 | Goodridge, Jr.; Alan G | Method and apparatus for synthesizing signals using transform-domain match-output extension |
| SE512719C2 (sv) * | 1997-06-10 | 2000-05-02 | Lars Gustaf Liljeryd | En metod och anordning för reduktion av dataflöde baserad på harmonisk bandbreddsexpansion |
| FR2813722B1 (fr) * | 2000-09-05 | 2003-01-24 | France Telecom | Procede et dispositif de dissimulation d'erreurs et systeme de transmission comportant un tel dispositif |
| AU2003213439A1 (en) * | 2002-03-08 | 2003-09-22 | Nippon Telegraph And Telephone Corporation | Digital signal encoding method, decoding method, encoding device, decoding device, digital signal encoding program, and decoding program |
| JP4195267B2 (ja) * | 2002-03-14 | 2008-12-10 | インターナショナル・ビジネス・マシーンズ・コーポレーション | 音声認識装置、その音声認識方法及びプログラム |
| US20030187663A1 (en) * | 2002-03-28 | 2003-10-02 | Truman Michael Mead | Broadband frequency translation for high frequency regeneration |
| US7451082B2 (en) * | 2003-08-27 | 2008-11-11 | Texas Instruments Incorporated | Noise-resistant utterance detector |
2005
- 2005-03-01: JP application JP2005056342A, patent JP4761506B2 (ja), status: Expired - Lifetime
2006
- 2006-02-23: CN application CN2006800066680A, patent CN101138020B (zh), status: Expired - Fee Related
- 2006-02-23: KR application KR1020077019988A, patent KR100931419B1 (ko), status: Expired - Fee Related
- 2006-02-23: DE application DE602006014096T, patent DE602006014096D1 (de), status: Expired - Lifetime
- 2006-02-23: WO application PCT/JP2006/303290, publication WO2006093019A1 (ja), status: Ceased
- 2006-02-23: EP application EP06714430A, patent EP1855269B1 (en), status: Expired - Lifetime
2007
- 2007-08-31: US application US11/849,106, patent US8065138B2 (en), status: Expired - Fee Related
Non-Patent Citations (2)
| Title |
|---|
| See also references of EP1855269A4 |
| TETSURO SAEKI ET AL.: "Selection of Meaningless Steady Noise for Masking of Speech", INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, vol. J86-A, no. 2, 2003, pages 187 - 191 |
Also Published As
| Publication number | Publication date |
|---|---|
| DE602006014096D1 (de) | 2010-06-17 |
| US8065138B2 (en) | 2011-11-22 |
| CN101138020A (zh) | 2008-03-05 |
| EP1855269A4 (en) | 2009-04-22 |
| EP1855269A1 (en) | 2007-11-14 |
| JP4761506B2 (ja) | 2011-08-31 |
| US20080281588A1 (en) | 2008-11-13 |
| KR20070099681A (ko) | 2007-10-09 |
| CN101138020B (zh) | 2010-10-13 |
| KR100931419B1 (ko) | 2009-12-11 |
| JP2006243178A (ja) | 2006-09-14 |
| EP1855269B1 (en) | 2010-05-05 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| JP4761506B2 (ja) | 音声処理方法と装置及びプログラム並びに音声システム | |
| JP5665134B2 (ja) | ヒアリングアシスタンス装置 | |
| AU771444B2 (en) | Noise reduction apparatus and method | |
| US8085941B2 (en) | System and method for dynamic sound delivery | |
| JP5955340B2 (ja) | 音響システム | |
| JP4649546B2 (ja) | 補聴器 | |
| KR100643310B1 (ko) | 음성 데이터의 포먼트와 유사한 교란 신호를 출력하여송화자 음성을 차폐하는 방법 및 장치 | |
| Nathwani et al. | Speech intelligibility improvement in car noise environment by voice transformation | |
| Kusumoto et al. | Modulation enhancement of speech by a pre-processing algorithm for improving intelligibility in reverberant environments | |
| JP2014130251A (ja) | 会話保護システム及び会話保護方法 | |
| JP5662711B2 (ja) | 音声変更装置、音声変更方法および音声情報秘話システム | |
| JP5741175B2 (ja) | 秘匿化データ生成装置、秘匿化データ生成方法、秘匿化装置、秘匿化方法及びプログラム | |
| JP5707944B2 (ja) | 快音化データ生成装置、快音化データ生成方法、快音化装置、快音化方法及びプログラム | |
| JP2012008393A (ja) | 音声変更装置、音声変更方法および音声情報秘話システム | |
| JP4680099B2 (ja) | 音声処理装置および音声処理方法 | |
| JP4785563B2 (ja) | 音声処理装置および音声処理方法 | |
| RU2589298C1 (ru) | Способ повышения разборчивости и информативности звуковых сигналов в шумовой обстановке | |
| Jokinen et al. | Phase modification for increasing the intelligibility of telephone speech in near-end noise conditions–evaluation of two methods | |
| Gvozdeva et al. | Joint Changes in First and Second Formants of/a/,/i/,/u/Vowels in Babble Noise-a New Statistical Approach | |
| JP5662712B2 (ja) | 音声変更装置、音声変更方法および音声情報秘話システム | |
| Song et al. | Smart Wristwatches Employing Finger-Conducted Voice Transmission System | |
| Aharonson et al. | Harnessing Music to Enhance Speech Recognition | |
| JPH07129194A (ja) | 音声合成方法及び音声合成装置 | |
| Heimgärtner | Adding voice to whisper using a simple heuristic algorithm inferred from empirical observation | |
| KR20030016199A (ko) | 음성신호의 발성변환 처리에 의한 소프트사운드 기능의헤드폰 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | WWE | Wipo information: entry into national phase | Ref document number: 200680006668.0; Country of ref document: CN |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | WWE | Wipo information: entry into national phase | Ref document number: 2006714430; Country of ref document: EP. Ref document number: 1020077019988; Country of ref document: KR |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | NENP | Non-entry into the national phase | Ref country code: RU |
| | WWP | Wipo information: published in national office | Ref document number: 2006714430; Country of ref document: EP |