WO2011048813A1 - Acoustic processing device, acoustic processing method, and hearing aid - Google Patents
- Publication number: WO2011048813A1 (application PCT/JP2010/006231)
- Authority: WIPO (PCT)
- Prior art keywords: unit, level, speaker, voice, signal
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R25/00—Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
- H04R25/40—Arrangements for obtaining a desired directivity characteristic
- H04R25/407—Circuits for combining signals of a plurality of transducers
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0364—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02165—Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L2021/065—Aids for the handicapped in understanding
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2225/00—Details of deaf aids covered by H04R25/00, not provided for in any of its subgroups
- H04R2225/43—Signal processing in hearing aids to enhance the speech intelligibility
Definitions
- The present invention relates to an acoustic processing device, an acoustic processing method, and a hearing aid that make the voice of a nearby speaker easier to hear by enhancing the voice of a speaker near the user relative to the voice of a speaker far from the user.
- Patent Document 1 describes an example of an acoustic processing device that emphasizes only the voice of a speaker near the user.
- In Patent Document 1, near-field sound is enhanced based on a weight function calculated in advance in association with the amplitude ratio of the speech input to two microphones arranged at a distance of about 50 [cm] to 1 [m].
- FIG. 30 is a block diagram illustrating an internal configuration of the sound processing apparatus disclosed in Patent Document 1.
- The divider 1614 receives the amplitude value of the microphone 1601A calculated by the first amplitude extractor 1613A and the amplitude value of the microphone 1601B calculated by the second amplitude extractor 1613B. The divider 1614 then obtains the amplitude ratio between the microphones 1601A and 1601B from these two amplitude values.
- the coefficient calculator 1615 calculates a weighting coefficient corresponding to the amplitude ratio calculated by the divider 1614.
- The near-field sound source separation device 1602 performs near-field speech enhancement processing using the weighting coefficient calculated by the coefficient calculator 1615 according to the amplitude ratio.
- When the near-field sound source separation device 1602 described above is used to enhance the sound of a sound source or speaker near the user, a large amplitude ratio must be obtained between the microphones 1601A and 1601B. For this reason, the two microphones need to be arranged at a considerable interval, and the technique is difficult to apply to a small acoustic processing device in which the microphone spacing is in the range of several [mm] (millimeters) to several [cm] (centimeters).
- When the amplitude ratio between the two microphones is small, it is difficult to appropriately discriminate between a sound source or speaker near the user and one far from the user.
- The present invention has been made in view of the above circumstances, and its purpose is to provide an acoustic processing device, an acoustic processing method, and a hearing aid that efficiently emphasize the voice of a speaker near the user regardless of the microphone spacing.
- The acoustic processing apparatus of the present invention includes: a first directivity forming unit that outputs, using the output signals of a plurality of omnidirectional microphones, a first directional signal in which the main axis of directivity is formed in the direction of the speaker; a second directivity forming unit that outputs, using the same output signals, a second directional signal in which a directional blind spot is formed in the direction of the speaker; a first level calculation unit that calculates the level of the first directional signal; a second level calculation unit that calculates the level of the second directional signal; a speaker distance determination unit that determines the distance to the speaker based on the calculated levels of the first and second directional signals; a gain deriving unit that derives a gain to be applied to the first directional signal according to the determination result of the speaker distance determination unit; and a level control unit that controls the level of the first directional signal using the derived gain.
- The acoustic processing method of the present invention includes: a step of outputting, using the output signals of a plurality of omnidirectional microphones, a first directional signal in which the main axis of directivity is formed in the direction of the speaker; a step of outputting, using the same output signals, a second directional signal in which a directional blind spot is formed in the direction of the speaker; a step of calculating the level of the output first directional signal; a step of calculating the level of the output second directional signal; a step of determining the distance to the speaker from the calculated levels of the first and second directional signals; a step of deriving a gain to be applied to the first directional signal according to the determined distance; and a step of controlling the level of the first directional signal using the derived gain.
- the hearing aid of the present invention includes the above sound processing device.
- According to the acoustic processing device, the acoustic processing method, and the hearing aid of the present invention, the voice of a speaker near the user can be efficiently enhanced regardless of the microphone spacing.
- Block diagram showing the internal configuration of the sound processing apparatus in the first embodiment.
- Diagrams showing the voice waveform output by the second directional microphone and the level calculated by the second level calculation unit, as a function of time.
- Diagram showing an example of the relationship between the calculated level difference and the instantaneous gain; flowchart explaining the operation of the sound processing apparatus in the first embodiment.
- Flowchart explaining the gain derivation process by the gain deriving unit of the sound processing apparatus in the first embodiment; block diagram showing the internal configuration of the sound processing apparatus in the second embodiment.
- Block diagram showing the internal configuration of the first and second directivity forming units; diagram showing an example of the time change of the audio signal.
- FIG. 4A is a diagram showing the time change of the waveform of the voice signal output by the first directivity forming unit.
- FIG. 4B is a diagram showing the time change of the voice segment detection result detected by the voice segment detection unit; FIG. 4C is a diagram comparing the level of the waveform of the audio signal output by the first directivity forming unit with the estimated noise level calculated by the voice segment detection unit.
- Diagram showing an example in which the distance determination result information and the self-speech determination result information are represented on the same time axis; diagram showing another example of the same.
- Diagram of the input/output characteristics of the level compensation for the user's auditory characteristics; flowchart explaining the operation.
- FIG. 1 is a block diagram illustrating an internal configuration of the sound processing apparatus 10 according to the first embodiment.
- The sound processing apparatus 10 includes a first directional microphone 101, a second directional microphone 102, a first level calculation unit 103, a second level calculation unit 104, a speaker distance determination unit 105, a gain deriving unit 106, and a level control unit 107.
- the first directional microphone 101 is a unidirectional microphone having a directional main axis in the direction of the speaker, and mainly collects the direct sound of the speaker's voice.
- the first directional microphone 101 outputs the collected sound signal x1 (t) to the first level calculation unit 103 and the level control unit 107, respectively.
- The second directional microphone 102 is a unidirectional or bidirectional microphone having a directional blind spot in the direction of the speaker. It does not pick up the direct sound of the speaker's voice, but mainly collects the reverberant sound of the speaker's voice generated by reflections from the walls of the room.
- the second directional microphone 102 outputs the collected sound signal x2 (t) to the second level calculation unit 104.
- the arrangement interval between the first directional microphone 101 and the second directional microphone 102 is a distance of several [mm] to several [cm].
- The first level calculation unit 103 acquires the audio signal x1(t) output from the first directional microphone 101 and calculates the level Lx1(t) [dB] of the acquired audio signal x1(t).
- the first level calculation unit 103 outputs the calculated level Lx1 (t) of the audio signal x1 (t) to the speaker distance determination unit 105.
- Formula (1) shows an example of a calculation formula for the level Lx1 (t) calculated by the first level calculation unit 103.
- N is the number of samples necessary for level calculation.
- For example, N = 160 when the sampling frequency is 8 [kHz] and the analysis time for level calculation is 20 [msec].
- τ represents a time constant, takes a value 0 < τ < 1, and is determined in advance. When the relationship shown in the following formula (2) holds, a small time constant is used so that the calculated level quickly follows the rising edge of the speech.
- When the relationship shown in formula (2) does not hold (formula (3)), a large time constant is used in order to reduce the drop in level between the consonant sections or phrases of the speech.
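The level calculation with a fast attack and slow release can be sketched as follows. This is a minimal illustration, not the patent's exact formulas (1) to (3): the first-order smoothing form, the tau values, and the helper name `frame_level_db` are assumptions; only the frame size N = 160 (8 kHz sampling, 20 ms analysis) comes from the text.

```python
import numpy as np

def frame_level_db(x, prev_level, tau_attack=0.5, tau_release=0.95, eps=1e-12):
    """Smoothed level in dB for one frame of N samples.

    A small time constant is used while the level rises (fast attack),
    a large one while it falls, so dips between consonants or phrases
    do not pull the level down too quickly.
    """
    inst = 10.0 * np.log10(np.mean(x ** 2) + eps)   # instantaneous frame level
    tau = tau_attack if inst > prev_level else tau_release
    return tau * prev_level + (1.0 - tau) * inst    # first-order smoothing

# 8 kHz sampling, 20 ms analysis frames -> N = 160 samples per frame
fs, N = 8000, 160
signal = np.random.randn(fs)      # 1 s of stand-in audio
level = -60.0
levels = []
for i in range(len(signal) // N):
    level = frame_level_db(signal[i * N:(i + 1) * N], level)
    levels.append(level)
```

With the asymmetric time constants, `levels` rises within a few frames but decays slowly after speech stops, which is the behavior FIG. 2 illustrates.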
- FIG. 2 shows the sound waveform output by the first directional microphone 101 and the level Lx1(t) calculated by the first level calculation unit 103.
- This is an example in which the level Lx1(t) is calculated with a time constant of 100 [msec] in the case of formula (2) and 400 [msec] in the case of formula (3).
- FIG. 2A is a diagram showing the time change of the sound waveform output from the first directional microphone 101, and FIG. 2B is a diagram showing the time change of the level calculated by the first level calculation unit 103.
- In FIG. 2A, the vertical axis represents amplitude and the horizontal axis represents time [seconds].
- In FIG. 2B, the vertical axis represents level and the horizontal axis represents time [seconds].
- The second level calculation unit 104 acquires the audio signal x2(t) output from the second directional microphone 102 and calculates the level Lx2(t) of the acquired audio signal x2(t).
- the second level calculation unit 104 outputs the calculated level Lx2 (t) of the audio signal x2 (t) to the speaker distance determination unit 105.
- the equation for calculating the level Lx2 (t) calculated by the second level calculator 104 is the same as the equation (1) for calculating the level Lx1 (t).
- FIG. 3 shows the sound waveform output from the second directional microphone 102 and the level Lx2(t) calculated by the second level calculation unit 104.
- This is an example in which the level Lx2(t) is calculated with a time constant of 100 [msec] in the case of formula (2) and 400 [msec] in the case of formula (3).
- FIG. 3A is a diagram showing a time change of a speech waveform output from the second directional microphone 102.
- FIG. 3B is a diagram showing a temporal change in the level calculated by the second level calculation unit 104.
- the vertical axis represents amplitude and the horizontal axis represents time [seconds].
- the vertical axis indicates the level, and the horizontal axis indicates time [seconds].
- The speaker distance determination unit 105 acquires the level Lx1(t) of the audio signal x1(t) calculated by the first level calculation unit 103 and the level Lx2(t) of the audio signal x2(t) calculated by the second level calculation unit 104. The speaker distance determination unit 105 determines whether or not the speaker is close to the user based on the acquired levels Lx1(t) and Lx2(t), and outputs the distance determination result information, which is the determination result, to the gain deriving unit 106.
- The level Lx1(t) of the audio signal x1(t) calculated by the first level calculation unit 103 and the level Lx2(t) of the audio signal x2(t) calculated by the second level calculation unit 104 are input to the speaker distance determination unit 105.
- The speaker distance determination unit 105 determines whether or not the speaker is near the user based on the calculated level difference ΔLx(t) = Lx1(t) − Lx2(t). A distance indicating that the speaker is close to the user is, for example, a speaker-to-user distance within 2 [m], although the distance is not limited to this value.
- When the level difference ΔLx(t) is equal to or greater than a first threshold β1, the speaker distance determination unit 105 determines that the speaker is close to the user.
- The first threshold β1 is, for example, 12 [dB].
- When the level difference ΔLx(t) is less than a second threshold β2, the speaker distance determination unit 105 determines that the speaker is far from the user.
- The second threshold β2 is, for example, 8 [dB].
- When the level difference ΔLx(t) lies between the two thresholds, the speaker distance determination unit 105 determines that the speaker is slightly away from the user.
- When ΔLx(t) ≥ β1, the speaker distance determination unit 105 outputs distance determination result information "1", indicating that the speaker is close to the user, to the gain deriving unit 106.
- The distance determination result information "1" indicates that the direct sound collected by the first directional microphone 101 is large and the reverberant sound collected by the second directional microphone 102 is small.
- When ΔLx(t) < β2, the speaker distance determination unit 105 outputs distance determination result information "−1", indicating that the speaker is far from the user.
- The distance determination result information "−1" indicates that the direct sound collected by the first directional microphone 101 is small and the reverberant sound collected by the second directional microphone 102 is large.
- When β2 ≤ ΔLx(t) < β1, the speaker distance determination unit 105 outputs distance determination result information "0", indicating that the speaker is slightly away from the user.
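The three-way decision above can be summarized in a short sketch. The thresholds β1 = 12 dB and β2 = 8 dB and the result codes 1 / −1 / 0 follow the text; the function name and the dB values in the usage lines are made-up examples.

```python
def judge_speaker_distance(level_front_db, level_rear_db, beta1=12.0, beta2=8.0):
    """Near/far decision from the level difference delta_Lx = Lx1 - Lx2.

    level_front_db is assumed to be the smoothed level Lx1(t) of the
    direct-sound microphone, level_rear_db the level Lx2(t) of the
    reverberant-sound microphone.
    """
    diff = level_front_db - level_rear_db
    if diff >= beta1:
        return 1     # mostly direct sound: speaker close to the user
    if diff < beta2:
        return -1    # mostly reverberant sound: speaker far from the user
    return 0         # in between: speaker slightly away

# Made-up example levels
near = judge_speaker_distance(-20.0, -35.0)   # 15 dB difference
far = judge_speaker_distance(-20.0, -25.0)    # 5 dB difference
```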
- Determining the distance of the speaker based only on the level Lx1(t) calculated by the first level calculation unit 103 is not reliable. Because of the characteristics of the first directional microphone 101, the level Lx1(t) alone cannot distinguish a person far from the user speaking at a loud volume from a person close to the user speaking at a normal volume.
- The characteristics of the first and second directional microphones 101 and 102 are as follows. When the speaker is near the user, the audio signal x1(t) output from the first directional microphone 101 is relatively large compared with the audio signal x2(t) output from the second directional microphone 102.
- When the speaker is far from the user, the audio signal x1(t) output from the first directional microphone 101 is almost the same level as the audio signal x2(t) output from the second directional microphone 102. This tendency is particularly noticeable when the device is used in a room with a lot of reverberation.
- Therefore, the speaker distance determination unit 105 does not determine whether the speaker is near or far from the user by the level Lx1(t) calculated by the first level calculation unit 103 alone. Instead, the speaker distance determination unit 105 determines the distance of the speaker based on the difference between the level Lx1(t) of the audio signal x1(t), which mainly contains the direct sound, and the level Lx2(t) of the audio signal x2(t), which mainly contains the reverberant sound.
- The gain deriving unit 106 derives a gain α(t) for the audio signal x1(t) output by the first directional microphone 101, based on the distance determination result information output by the speaker distance determination unit 105.
- The gain deriving unit 106 outputs the derived gain α(t) to the level control unit 107.
- FIG. 4 is a diagram illustrating an example of the relationship between the level difference ΔLx(t) calculated by the speaker distance determination unit 105 and the gain α(t).
- When the speaker is determined to be close to the user, a gain α1 is given as the gain α(t). For example, by setting α1 to "2.0", the audio signal x1(t) is relatively emphasized.
- When the speaker is determined to be far from the user, a gain α2 is given as the gain α(t). For example, by setting α2 to "0.5", the audio signal x1(t) is relatively attenuated.
- Otherwise, the audio signal x1(t) is neither emphasized nor attenuated, and "1.0" is given as the gain α(t).
- The value derived as the gain in the above description is given as the instantaneous gain α′(t).
- The gain deriving unit 106 finally calculates the gain α(t) according to the following formula (4).
- τ represents a time constant, takes a value 0 < τ < 1, and is determined in advance.
- The level control unit 107 acquires the gain α(t) derived by the gain deriving unit 106 according to the above formula (4) and the audio signal x1(t) output by the first directional microphone 101.
- The level control unit 107 generates the output signal y(t) by multiplying the audio signal x1(t) output from the first directional microphone 101 by the gain α(t) derived by the gain deriving unit 106.
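As a sketch, the gain derivation and level control steps might look like the following. The instantaneous gains 2.0 / 0.5 / 1.0 and the relation y(t) = α(t)·x1(t) come from the text; the exact smoothing form of formula (4) is not reproduced in this excerpt, so a first-order recursion with an assumed τ = 0.9 is used here.

```python
import numpy as np

def derive_gain(prev_gain, distance_result, tau=0.9):
    """Derive alpha(t) from the distance determination result (1/-1/0).

    Assumed smoothing form for formula (4):
        alpha(t) = tau * alpha(t-1) + (1 - tau) * alpha'(t), 0 < tau < 1
    """
    inst = {1: 2.0, -1: 0.5, 0: 1.0}[distance_result]   # instantaneous gain
    return tau * prev_gain + (1.0 - tau) * inst

def control_level(x1_frame, gain):
    """Level control: y(t) = alpha(t) * x1(t)."""
    return gain * np.asarray(x1_frame)

gain = 1.0
gain = derive_gain(gain, 1)           # speaker judged near: gain moves toward 2.0
y = control_level([0.1, -0.2], gain)
```

The smoothing keeps the gain from jumping between 2.0 and 0.5 on every frame, so the emphasized output does not pump audibly when the decision flickers.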
- FIG. 5 is a flowchart for explaining the operation of the sound processing apparatus 10 according to the first embodiment.
- the first directional microphone 101 picks up the direct sound of the speaker's voice (S101).
- the second directional microphone 102 collects the reverberant sound of the speaker's voice (S102).
- the sound collection processing of each sound by the first directional microphone 101 and the second directional microphone 102 is performed at the same timing.
- the first directional microphone 101 outputs the collected audio signal x1 (t) to the first level calculation unit 103 and the level control unit 107, respectively. Further, the second directional microphone 102 outputs the collected audio signal x2 (t) to the second level calculation unit 104.
- the first level calculation unit 103 acquires the audio signal x1 (t) output from the first directional microphone 101, and calculates the level Lx1 (t) of the acquired audio signal x1 (t) ( S103).
- Similarly, the second level calculation unit 104 acquires the audio signal x2(t) output from the second directional microphone 102 and calculates the level Lx2(t) of the acquired audio signal x2(t) (S104).
- the first level calculation unit 103 outputs the calculated level Lx1 (t) to the speaker distance determination unit 105. Also, the second level calculation unit 104 outputs the calculated level Lx2 (t) to the speaker distance determination unit 105.
- the speaker distance determination unit 105 acquires the level Lx1 (t) calculated by the first level calculation unit 103 and the level Lx2 (t) calculated by the second level calculation unit 104.
- The speaker distance determination unit 105 determines whether or not the speaker is close to the user based on the level difference ΔLx(t) between the acquired level Lx1(t) and level Lx2(t) (S105).
- the speaker distance determination unit 105 outputs the distance determination result information, which is the determined result, to the gain deriving unit 106.
- The gain deriving unit 106 acquires the distance determination result information output by the speaker distance determination unit 105.
- The gain deriving unit 106 derives a gain α(t) for the audio signal x1(t) output by the first directional microphone 101 based on the distance determination result information (S106).
- The gain deriving unit 106 outputs the derived gain α(t) to the level control unit 107.
- The level control unit 107 acquires the gain α(t) derived by the gain deriving unit 106 and the audio signal x1(t) output by the first directional microphone 101.
- The level control unit 107 generates the output signal y(t) by multiplying the audio signal x1(t) output from the first directional microphone 101 by the gain α(t) derived by the gain deriving unit 106 (S107).
- FIG. 6 is a flowchart illustrating details of the operation of the gain deriving unit 106.
- If the distance determination result information is "1", that is, if the level difference ΔLx(t) ≥ β1 (S1061, YES), "2.0" is derived as the instantaneous gain α′(t) for the audio signal x1(t) (S1062).
- If the distance determination result information is "−1", that is, if the level difference ΔLx(t) < β2 (S1063, YES), "0.5" is derived as the instantaneous gain α′(t) for the audio signal x1(t) (S1064).
- Otherwise, "1.0" is derived as the instantaneous gain α′(t).
- The gain deriving unit 106 then calculates the gain α(t) according to the above formula (4) (S1066).
- As described above, in the present embodiment, it is determined whether the speaker is near or far from the user. Specifically, the distance of the speaker is determined based on the level difference ΔLx(t) between the audio signals x1(t) and x2(t) collected by the first and second directional microphones, which are arranged at an interval of about several [mm] to several [cm].
- The gain derived according to the determination result is multiplied by the audio signal output from the first directional microphone, which picks up the direct sound of the speaker, and the level is controlled accordingly.
- As a result, the voice of a speaker close to the user, such as a conversation partner, is emphasized, while the voice of a speaker far from the user is attenuated or suppressed.
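Putting the first embodiment together, one frame of processing could be sketched end to end as below. This is a simplified illustration: the per-channel level smoothing of formula (1) is collapsed into plain frame RMS levels, the gain time constant is an assumed value, and the stand-in frames are synthetic.

```python
import numpy as np

def process_frame(x1, x2, state, beta1=12.0, beta2=8.0, tau_gain=0.9):
    """One frame: level difference -> distance decision -> gain -> output."""
    eps = 1e-12
    lx1 = 10 * np.log10(np.mean(x1 ** 2) + eps)   # level of direct-sound mic
    lx2 = 10 * np.log10(np.mean(x2 ** 2) + eps)   # level of reverberant mic
    diff = lx1 - lx2                               # delta_Lx(t)
    inst_gain = 2.0 if diff >= beta1 else (0.5 if diff < beta2 else 1.0)
    # assumed first-order smoothing for formula (4)
    state["gain"] = tau_gain * state["gain"] + (1 - tau_gain) * inst_gain
    return state["gain"] * x1                      # y(t) = alpha(t) * x1(t)

state = {"gain": 1.0}
frame_front = 0.5 * np.ones(160)    # stand-in direct-sound frame
frame_rear = 0.05 * np.ones(160)    # stand-in reverberant frame (20 dB lower)
y = process_frame(frame_front, frame_rear, state)
```

With a 20 dB level difference the speaker is judged near, so the smoothed gain moves from 1.0 toward 2.0 and the direct-sound frame is emphasized.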
- FIG. 7 is a block diagram illustrating an internal configuration of the sound processing apparatus 11 according to the second embodiment.
- The sound processing apparatus 11 includes a directivity sound collection unit 1101, a first level calculation unit 103, a second level calculation unit 104, a speaker distance determination unit 105, a gain deriving unit 106, and a level control unit 107.
- the directivity sound collection unit 1101 includes a microphone array 1102, a first directivity formation unit 1103, and a second directivity formation unit 1104.
- the microphone array 1102 is an array in which a plurality of omnidirectional microphones are arranged.
- the configuration of FIG. 7 is an example of an array configured by two omnidirectional microphones.
- The first directivity forming unit 1103 uses the audio signals output from the two omnidirectional microphones of the microphone array 1102 to form directivity having its main axis in the direction of the speaker, and thereby picks up the direct sound of the speaker's voice.
- the first directivity forming unit 1103 outputs the sound signal x1 (t) on which directivity is formed to the first level calculation unit 103 and the level control unit 107, respectively.
- The second directivity forming unit 1104 forms directivity having a directional blind spot in the direction of the speaker, using the audio signals output from the two omnidirectional microphones of the microphone array 1102.
- the second directivity forming unit 1104 does not pick up the direct sound of the speaker's voice, but picks up the reverberant sound of the speaker's voice generated mainly by reflection of the wall surface of the room.
- the second directivity forming unit 1104 outputs the audio signal x2 (t) on which directivity is formed to the second level calculating unit 104.
- FIG. 8 is a block diagram showing an internal configuration of the directivity sound collecting unit 1101 shown in FIG. 7, and is a diagram for explaining a method of forming a sound pressure gradient type directivity.
- the microphone array 1102 uses two omnidirectional microphones 1201-1 and 1201-2.
- the first directivity forming unit 1103 includes a delay unit 1202, an arithmetic unit 1203, and an EQ 1204.
- the delay unit 1202 acquires the audio signal output from the omnidirectional microphone 1201-2, and delays the acquired audio signal by a predetermined amount.
- the amount of delay by the delay device 1202 is a value corresponding to the delay time D / c [s], for example, where the microphone interval is D [m] and the sound speed is c [m / s].
- the delay unit 1202 outputs the audio signal delayed by a predetermined amount to the arithmetic unit 1203.
- the computing unit 1203 acquires the audio signal output from the omnidirectional microphone 1201-1 and the audio signal delayed by the delay unit 1202.
- the computing unit 1203 calculates a difference obtained by subtracting the audio signal delayed by the delay unit 1202 from the audio signal output by the omnidirectional microphone 1201-1, and outputs the calculated audio signal to the EQ 1204.
- the equalizer EQ1204 compensates mainly for the low frequency band of the audio signal output by the computing unit 1203.
- this is because the difference signal calculated by the arithmetic unit 1203 between the audio signal output from the omnidirectional microphone 1201-1 and the audio signal delayed by the delay unit 1202 is attenuated in the low frequency band.
- EQ 1204 is inserted in order to flatten the frequency characteristics in the direction of the speaker.
- the second directivity forming unit 1104 includes a delay unit 1205, a computing unit 1206, and an EQ 1207.
- the inputs of the second directivity forming unit 1104 are reversed with respect to those of the first directivity forming unit 1103.
- the delay unit 1205 acquires the audio signal output from the omnidirectional microphone 1201-1 and delays the acquired audio signal by a predetermined amount.
- the delay amount by the delay unit 1205 is a value corresponding to, for example, the delay time D / c [s], where D [m] is the microphone interval and c [m / s] is the sound speed.
- the delay unit 1205 outputs the audio signal delayed by a predetermined amount to the computing unit 1206.
- the computing unit 1206 acquires the audio signal output from the omnidirectional microphone 1201-2 and the audio signal delayed by the delay unit 1205, respectively.
- the arithmetic unit 1206 calculates the difference between the audio signal output from the omnidirectional microphone 1201-2 and the audio signal delayed by the delay unit 1205, and outputs the calculated audio signal to the EQ 1207.
- the equalizer EQ 1207 compensates mainly for the low frequency band of the audio signal output by the computing unit 1206.
- this is because the difference signal calculated by the arithmetic unit 1206 between the audio signal output from the omnidirectional microphone 1201-2 and the audio signal delayed by the delay unit 1205 is attenuated in the low frequency band.
- EQ1207 is inserted in order to flatten the frequency characteristic in the direction of the speaker.
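The sound pressure gradient (delay-and-subtract) structure of FIG. 8 can be sketched as follows. This is an illustrative sketch, not the patented implementation: the microphone spacing `d`, the sampling rate, and the omission of the EQ 1204/1207 low-band compensation are assumptions made for brevity.

```python
import numpy as np

def directivity_pair(mic1, mic2, fs, d=0.01, c=340.0):
    """Delay-and-subtract (pressure-gradient) beamformer sketch.

    mic1, mic2: signals from omnidirectional mics 1201-1 / 1201-2.
    d: assumed mic spacing [m]; c: speed of sound [m/s].
    Returns (x1, x2): x1 has its null toward the rear (main lobe
    toward the speaker), x2 has its null toward the speaker.
    The EQ 1204/1207 low-band compensation is omitted here.
    """
    delay = int(round(fs * d / c))          # delay time D/c in samples
    z = np.zeros(delay)
    # x1: subtract the delayed rear microphone (first unit 1103)
    x1 = mic1 - np.concatenate([z, mic2])[:len(mic1)]
    # x2: inputs swapped, as in the second unit 1104
    x2 = mic2 - np.concatenate([z, mic1])[:len(mic2)]
    return x1, x2
```

Swapping the inputs, as the second directivity forming unit 1104 does, moves the null from the rear to the speaker direction, so x2 carries mainly reverberant sound.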
- the first level calculation unit 103 acquires the audio signal x1 (t) output by the first directivity forming unit 1103, and calculates the level Lx1 (t) [dB] of the acquired audio signal x1 (t) according to Equation (1) above.
- the first level calculation unit 103 outputs the calculated level Lx1 (t) of the audio signal x1 (t) to the speaker distance determination unit 105.
- N is the number of samples necessary for level calculation.
- τ represents a time constant, takes a value of 0 < τ < 1, and is determined in advance.
- a small time constant τ is used when the relationship shown in the above formula (2) is established, so as to quickly follow the rise of the voice.
- when the relationship shown in formula (2) is not established (formula (3)), a large time constant is used in order to reduce the drop in level between consonant sections and between phrases of speech.
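Since equation (1) is not reproduced in this excerpt, the following sketch assumes a first-order recursive smoother over N-sample frame power; the frame size `n` and the concrete time-constant values are illustrative, not the values of the embodiment.

```python
import numpy as np

def frame_levels(x, n=160, tau_attack=0.4, tau_release=0.9, eps=1e-12):
    """Smoothed level Lx(t) [dB], one value per N-sample frame.

    Sketch of the level calculation of units 103/104: a small time
    constant (tau_attack) is used when the instantaneous level rises
    above the previous smoothed level (the case of formula (2)), so
    the level quickly follows voice onsets; a large one (tau_release)
    is used otherwise (formula (3)), bridging consonant sections and
    pauses between phrases.
    """
    levels, prev = [], -100.0
    for i in range(0, len(x) - n + 1, n):
        inst = 10.0 * np.log10(np.mean(x[i:i + n] ** 2) + eps)
        tau = tau_attack if inst > prev else tau_release
        prev = tau * prev + (1.0 - tau) * inst   # recursive smoothing
        levels.append(prev)
    return np.array(levels)
```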
- FIG. 9 shows the speech waveform output by the first directivity forming unit 1103 and the level Lx1 (t) calculated by the first level calculation unit 103.
- the level Lx1 (t) shown is an example calculated by the first level calculation unit 103 with a time constant of 100 [msec] for equation (2) and 400 [msec] for equation (3).
- FIG. 9A is a diagram showing the time change of the speech waveform output by the first directivity forming unit 1103, and FIG. 9B is a diagram showing the time change of the level calculated by the first level calculation unit 103.
- in FIG. 9A, the vertical axis represents amplitude and the horizontal axis represents time [seconds]; in FIG. 9B, the vertical axis represents level and the horizontal axis represents time [seconds].
- the second level calculation unit 104 acquires the audio signal x2 (t) output by the second directivity forming unit 1104, and calculates the level Lx2 (t) of the acquired audio signal x2 (t).
- the second level calculation unit 104 outputs the calculated level Lx2 (t) of the audio signal x2 (t) to the speaker distance determination unit 105.
- the equation for calculating the level Lx2 (t) calculated by the second level calculator 104 is the same as the equation (1) for calculating the level Lx1 (t).
- FIG. 10 shows the speech waveform output by the second directivity forming unit 1104 and the level Lx2 (t) calculated by the second level calculation unit 104.
- the level Lx2 (t) shown is an example calculated by the second level calculation unit 104 with a time constant of 100 [msec] for equation (2) and 400 [msec] for equation (3).
- FIG. 10A is a diagram showing the time change of the speech waveform output by the second directivity forming unit 1104; its vertical axis represents amplitude and its horizontal axis represents time [seconds].
- FIG. 10B is a diagram showing the time change of the level calculated by the second level calculation unit 104; its vertical axis represents level and its horizontal axis represents time [seconds].
- the speaker distance determination unit 105 acquires the level Lx1 (t) of the audio signal x1 (t) calculated by the first level calculation unit 103 and the level Lx2 (t) of the audio signal x2 (t) calculated by the second level calculation unit 104. The speaker distance determination unit 105 determines whether or not the speaker is close to the user based on the acquired level Lx1 (t) and level Lx2 (t), and outputs the distance determination result information, which is the determination result, to the gain derivation unit 106.
- the level Lx1 (t) of the audio signal x1 (t) calculated by the first level calculation unit 103 and the level Lx2 (t) of the audio signal x2 (t) calculated by the second level calculation unit 104 are input to the speaker distance determination unit 105, which calculates their level difference ΔLx (t) = Lx1 (t) − Lx2 (t).
- the speaker distance determination unit 105 determines whether or not the speaker is near the user based on the calculated level difference ΔLx (t). As a distance indicating that the speaker is close to the user, a distance between the speaker and the user of within 2 [m] is applicable, for example; however, the distance indicating that the speaker is close to the user is not limited to within 2 [m].
- when the level difference ΔLx (t) is equal to or greater than the first threshold value β1, the speaker distance determination unit 105 determines that the speaker is close to the user.
- the first threshold value β1 is, for example, 12 [dB].
- when the level difference ΔLx (t) is less than the second threshold value β2, the speaker distance determination unit 105 determines that the speaker is far from the user.
- the second threshold value β2 is, for example, 8 [dB].
- when the level difference ΔLx (t) is equal to or greater than the second threshold value β2 and less than the first threshold value β1, the speaker distance determination unit 105 determines that the speaker is slightly away from the user.
- FIG. 11 is a graph showing the relationship between the level difference ΔLx (t), calculated by the above method using data recorded by two actual omnidirectional microphones, and the distance between the user and the speaker. From FIG. 11, it can be confirmed that the level difference ΔLx (t) decreases as the speaker moves farther from the user.
- the voice of a speaker within about 2 [m] is thus emphasized, and the voice of a speaker at about 4 [m] or more can be attenuated.
- when ΔLx (t) ≥ β1, the speaker distance determination unit 105 outputs distance determination result information "1", indicating that the speaker is close to the user, to the gain derivation unit 106.
- the distance determination result information "1" indicates that the direct sound collected by the first directivity forming unit 1103 is large and the reverberant sound collected by the second directivity forming unit 1104 is small.
- when ΔLx (t) < β2, the speaker distance determination unit 105 outputs distance determination result information "−1", indicating that the speaker is far from the user, to the gain derivation unit 106.
- the distance determination result information "−1" indicates that the direct sound collected by the first directivity forming unit 1103 is small and the reverberant sound collected by the second directivity forming unit 1104 is large.
- when β2 ≤ ΔLx (t) < β1, the speaker distance determination unit 105 outputs distance determination result information "0", indicating that the speaker is slightly away from the user, to the gain derivation unit 106.
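The three-way determination above can be summarized as a small sketch; the threshold symbols β1/β2 and the example values 12 [dB] / 8 [dB] follow the text.

```python
def judge_distance(delta_lx, beta1=12.0, beta2=8.0):
    """Distance determination of unit 105 (sketch).

    delta_lx: level difference ΔLx(t) = Lx1(t) - Lx2(t) [dB].
    Returns  1 when ΔLx(t) >= beta1        (speaker near the user),
            -1 when ΔLx(t) <  beta2        (speaker far from the user),
             0 when beta2 <= ΔLx(t) < beta1 (speaker slightly away).
    """
    if delta_lx >= beta1:
        return 1
    if delta_lx < beta2:
        return -1
    return 0
```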
- determining the distance of the speaker based only on the level Lx1 (t) calculated by the first level calculation unit 103 is not reliable. Owing to the characteristics of the first directivity forming unit 1103, the level Lx1 (t) alone cannot distinguish a person far from the user speaking at a loud volume from a person close to the user speaking at a normal volume.
- the characteristics of the first and second directivity forming units 1103 and 1104 are as follows: when the speaker is near the user, the audio signal x1 (t) output from the first directivity forming unit 1103 is relatively large compared to the audio signal x2 (t) output from the second directivity forming unit 1104.
- when the speaker is far from the user, the audio signal x1 (t) output by the first directivity forming unit 1103 is almost the same as the audio signal x2 (t) output by the second directivity forming unit 1104. This tendency is particularly noticeable when the device is used in a room with much reverberation.
- therefore, the speaker distance determination unit 105 does not determine whether the speaker is near or far from the user by the level Lx1 (t) alone; it determines the distance of the speaker based on the difference between the level Lx1 (t) of the audio signal x1 (t), in which mainly the direct sound is collected, and the level Lx2 (t) of the audio signal x2 (t), in which mainly the reverberant sound is collected.
- the gain derivation unit 106 derives a gain α (t) for the audio signal x1 (t) output by the first directivity forming unit 1103, based on the distance determination result information output by the speaker distance determination unit 105.
- the gain derivation unit 106 outputs the derived gain α (t) to the level control unit 107.
- the gain α (t) is determined based on the distance determination result information or the level difference ΔLx (t).
- the relationship between the level difference ΔLx (t) calculated by the speaker distance determination unit 105 and the gain α (t) is the same as the relationship illustrated in FIG. 4 in the first embodiment.
- when the distance determination result information is "1", a gain α1 is given as the gain α (t); for example, by setting "2.0" as the gain α1, the audio signal x1 (t) is relatively emphasized.
- when the distance determination result information is "−1", a low gain α2 is given as the gain α (t); for example, by setting "0.5" as the gain α2, the audio signal x1 (t) is relatively attenuated.
- when the distance determination result information is "0", the audio signal x1 (t) is neither emphasized nor attenuated, and thus "1.0" is given as the gain α (t).
- the values derived as the gain in the above description are given as the instantaneous gain α′ (t).
- the gain derivation unit 106 calculates the smoothed gain α (t) from the instantaneous gain α′ (t) according to equation (4).
- the time constant used in equation (4) takes a value between 0 and 1 and is predetermined.
- the level control unit 107 acquires the gain α (t) derived by the gain derivation unit 106 according to the above equation (4) and the audio signal x1 (t) output by the first directivity forming unit 1103.
- the level control unit 107 multiplies the audio signal x1 (t) output by the first directivity forming unit 1103 by the gain α (t) derived by the gain derivation unit 106 to generate the output signal y (t).
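The gain derivation and level control steps can be sketched together. Equation (4) is not reproduced in this excerpt, so first-order smoothing of the instantaneous gain is assumed for it; the gain values 2.0 / 1.0 / 0.5 follow the examples in the text, while the time constant value is illustrative.

```python
import numpy as np

class LevelController:
    """Gain derivation (unit 106) and level control (unit 107), sketched.

    The instantaneous gain alpha'(t) is chosen from the distance
    determination result (1 -> 2.0, 0 -> 1.0, -1 -> 0.5).  For
    equation (4), the smoothing
        alpha(t) = tau * alpha(t-1) + (1 - tau) * alpha'(t)
    with 0 < tau < 1 is assumed.
    """
    GAINS = {1: 2.0, 0: 1.0, -1: 0.5}   # example values from the text

    def __init__(self, tau=0.99):
        self.tau, self.alpha = tau, 1.0

    def process(self, x1_frame, decision):
        inst = self.GAINS[decision]                       # alpha'(t)
        self.alpha = self.tau * self.alpha + (1.0 - self.tau) * inst
        return self.alpha * x1_frame                      # y(t) = alpha(t)*x1(t)
```

The smoothing keeps the output level from jumping when the distance decision flips between frames.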
- FIG. 12 is a flowchart for explaining the operation of the sound processing apparatus 11 according to the second embodiment.
- the first directivity forming unit 1103 forms directivity related to the direct sound component from the speaker with respect to the audio signals output from the microphone array 1102 of the directivity sound collecting unit 1101 (S651).
- the first directivity forming unit 1103 outputs the sound signal having the directivity formed to the first level calculation unit 103 and the level control unit 107, respectively.
- the second directivity forming unit 1104 forms directivity related to the reverberant sound component from the speaker with respect to the audio signals respectively output from the microphone array 1102 of the directivity sound collecting unit 1101 (S652).
- the second directivity forming unit 1104 outputs the audio signal in which the directivity is formed to the second level calculating unit 104.
- the first level calculation unit 103 acquires the audio signal x1 (t) output from the first directivity forming unit 1103, and calculates the level Lx1 (t) of the acquired audio signal x1 (t) (S103).
- the second level calculation unit 104 acquires the audio signal x2 (t) output by the second directivity forming unit 1104, and calculates the level Lx2 (t) of the acquired audio signal x2 (t) (S104).
- the first level calculation unit 103 outputs the calculated level Lx1 (t) to the speaker distance determination unit 105. Also, the second level calculation unit 104 outputs the calculated level Lx2 (t) to the speaker distance determination unit 105.
- the speaker distance determination unit 105 acquires the level Lx1 (t) calculated by the first level calculation unit 103 and the level Lx2 (t) calculated by the second level calculation unit 104.
- the speaker distance determination unit 105 determines whether or not the speaker is close to the user based on the level difference ΔLx (t) between the acquired level Lx1 (t) and level Lx2 (t) (S105).
- the speaker distance determination unit 105 outputs the distance determination result information, which is the determined result, to the gain deriving unit 106.
- the gain derivation unit 106 acquires the distance determination result information output by the speaker distance determination unit 105.
- the gain derivation unit 106 derives a gain α (t) for the audio signal x1 (t) output by the first directivity forming unit 1103, based on the distance determination result information (S106).
- the details of deriving the gain α (t) have been described with reference to FIG. 6 in the first embodiment, and the description is therefore omitted.
- the gain derivation unit 106 outputs the derived gain α (t) to the level control unit 107.
- the level control unit 107 acquires the gain α (t) derived by the gain derivation unit 106 and the audio signal x1 (t) output by the first directivity forming unit 1103.
- the level control unit 107 multiplies the audio signal x1 (t) output by the first directivity forming unit 1103 by the gain α (t) derived by the gain derivation unit 106 to generate the output signal y (t) (S107).
- as described above, in the sound processing device according to the second embodiment, sound is collected by a microphone array in which a plurality of omnidirectional microphones are arranged at intervals of several [mm] to several [cm].
- the device determines whether the speaker is near or far from the user according to the magnitude of the level difference ΔLx (t) between the audio signals x1 (t) and x2 (t) whose directivities are formed by the first and second directivity forming units, respectively.
- the audio signal output by the first directivity forming unit, which picks up the direct sound of the speaker, is multiplied by the gain calculated according to the determination result, and its level is thereby controlled.
- the voice of a speaker who is close to the user such as a conversation partner is emphasized, and conversely, the voice of a speaker who is far from the user is attenuated or suppressed.
- sharp directivity can be formed in the direction of the speaker by increasing the number of omnidirectional microphones constituting the microphone array, so that the distance of the speaker can be determined with higher accuracy.
- FIG. 13 is a block diagram illustrating an internal configuration of the sound processing apparatus 12 according to the third embodiment.
- the sound processing device 12 of the third embodiment differs from the sound processing device 11 of the second embodiment in that, as shown in FIG. 13, it further includes a voice section detection unit 501.
- the same components as those in FIG. 7 are denoted by the same reference numerals, and description of the components is omitted.
- the voice section detection unit 501 acquires the voice signal x1 (t) output by the first directivity forming unit 1103.
- the voice section detection unit 501 uses the voice signal x1 (t) output from the first directivity forming unit 1103 to detect sections in which a speaker other than the user of the sound processing device 12 is speaking.
- the voice segment detection unit 501 outputs the detected voice segment detection result information to the speaker distance determination unit 105.
- FIG. 14 is a block diagram showing an example of the internal configuration of the speech section detection unit 501.
- the speech segment detection unit 501 includes a third level calculation unit 601, an estimated noise level calculation unit 602, a level comparison unit 603, and a speech segment determination unit 604.
- the third level calculation unit 601 calculates the level Lx3 (t) of the audio signal x1 (t) output by the first directivity forming unit 1103 according to the above equation (1). Note that the level Lx1 (t) of the audio signal x1 (t) calculated by the first level calculation unit 103 may be input to the estimated noise level calculation unit 602 and the level comparison unit 603 instead of the level Lx3 (t).
- the third level calculation unit 601 outputs the calculated level Lx3 (t) to the estimated noise level calculation unit 602 and the level comparison unit 603, respectively.
- the estimated noise level calculation unit 602 acquires the level Lx3 (t) output by the third level calculation unit 601.
- the estimated noise level calculation unit 602 calculates an estimated noise level Nx (t) [dB] with respect to the acquired level Lx3 (t).
- Formula (5) shows an example of a calculation formula for the estimated noise level Nx (t) calculated by the estimated noise level calculation unit 602.
- τN is a time constant, takes a value of 0 < τN < 1, and is predetermined.
- a large time constant τN is used when Lx3 (t) > Nx (t − 1), so that the estimated noise level Nx (t) does not rise during speech sections.
- the estimated noise level calculation unit 602 outputs the calculated estimated noise level Nx (t) to the level comparison unit 603.
- the level comparison unit 603 acquires the estimated noise level Nx (t) calculated by the estimated noise level calculation unit 602 and the level Lx3 (t) calculated by the third level calculation unit 601. The level comparison unit 603 compares the level Lx3 (t) with the noise level Nx (t), and outputs the compared comparison result information to the speech section determination unit 604.
- the voice section determination unit 604 acquires the comparison result information output by the level comparison unit 603 and, based on it, determines the sections of the voice signal x1 (t) output by the first directivity forming unit 1103 in which the speaker utters voice.
- the speech segment determination unit 604 outputs speech segment detection result information, which is a speech segment detection result determined as a speech segment, to the speaker distance determination unit 105.
- in the comparison between the level Lx3 (t) and the estimated noise level Nx (t), the level comparison unit 603 outputs, as a "voice section", the sections in which the difference between the level Lx3 (t) and the estimated noise level Nx (t) is equal to or greater than the third threshold value βN, to the voice section determination unit 604.
- the third threshold value βN is, for example, 6 [dB].
- the level comparison unit 603 compares the level Lx3 (t) with the estimated noise level Nx (t), and outputs the sections in which the difference is less than the third threshold value βN to the voice section determination unit 604 as "non-speech sections".
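The voice section detection of units 601 to 604 can be sketched as follows. Equation (5) is not reproduced in this excerpt, so a first-order recursion with a condition-dependent time constant is assumed for the estimated noise level; the concrete time-constant values are illustrative, while the 6 [dB] threshold βN follows the text.

```python
import numpy as np

def detect_voice(levels_db, tau_up=0.999, tau_down=0.96, beta_n=6.0):
    """Voice section detection (units 601-604), sketched.

    levels_db: per-frame levels Lx3(t) [dB].
    Assumed form of equation (5):
        Nx(t) = tauN * Nx(t-1) + (1 - tauN) * Lx3(t),
    with a large time constant (tau_up) while Lx3(t) > Nx(t-1) so the
    noise estimate does not rise during speech, and a smaller one
    (tau_down) otherwise.  A frame is a "voice section" when
    Lx3(t) - Nx(t) >= beta_n (third threshold, 6 dB in the text).
    Returns (voice flags, noise estimate) per frame.
    """
    nx = levels_db[0]
    flags, noise = [], []
    for lx in levels_db:
        tau = tau_up if lx > nx else tau_down
        nx = tau * nx + (1.0 - tau) * lx
        noise.append(nx)
        flags.append(lx - nx >= beta_n)
    return np.array(flags), np.array(noise)
```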
- FIG. 15 shows the time changes of the waveform of the audio signal output from the first directivity forming unit 1103, the detection result of the voice section determination unit 604, and the comparison between the level calculated by the third level calculation unit 601 and the estimated noise level.
- FIG. 15A is a diagram showing the time change of the waveform of the audio signal x1 (t) output by the first directivity forming unit 1103.
- FIG. 15A the vertical axis represents amplitude, and the horizontal axis represents time [seconds].
- FIG. 15 (b) is a diagram showing a change over time in the speech segment detection result detected by the speech segment determination unit 604.
- the vertical axis indicates the voice section detection result
- the horizontal axis indicates time [seconds].
- FIG. 15C is a diagram showing, for the waveform of the audio signal x1 (t) output by the first directivity forming unit 1103, the comparison between the level Lx3 (t) and the estimated noise level Nx (t) used by the voice section determination unit 604; the vertical axis indicates level and the horizontal axis indicates time [seconds].
- FIG. 15C shows an example in which the time constant for Lx3 (t) ≤ Nx (t − 1) is 1 [second] and the time constant for Lx3 (t) > Nx (t − 1) is 120 [seconds]. FIGS. 15B and 15C show the level Lx3 (t), the estimated noise level Nx (t), (Nx (t) + βN) with the third threshold value βN set to 6 [dB], and the voice detection result.
- the speaker distance determination unit 105 acquires the voice segment detection result information output by the voice segment determination unit 604 of the voice segment detection unit 501.
- the speaker distance determination unit 105 determines whether or not the speaker is close to the user only in the voice section detected by the voice section detection unit 501 based on the acquired voice section detection result information.
- the speaker distance determination unit 105 outputs the determined distance determination result information to the gain deriving unit 106.
- FIG. 16 is a flowchart for explaining the operation of the sound processing apparatus 12 according to the third embodiment.
- the description of the same operation as that of the sound processing apparatus 11 of the second embodiment shown in FIG. 12 is omitted, and the processes related to the above-described components are mainly described.
- the first directivity forming unit 1103 outputs the audio signal x1 (t) formed in step S651 to the audio section detection unit 501 and the level control unit 107, respectively.
- the voice section detection unit 501 acquires the voice signal x1 (t) output by the first directivity forming unit 1103.
- the speech section detection unit 501 detects a section where the speaker is speaking using the speech signal x1 (t) output by the first directivity forming unit 1103 in step S651 (S321).
- the voice segment detection unit 501 outputs the detected voice segment detection result information to the speaker distance determination unit 105.
- the third level calculation unit 601 calculates the level Lx3 (t) of the voice signal x1 (t) output by the first directivity forming unit 1103 according to the above equation (1).
- the third level calculation unit 601 outputs the calculated level Lx3 (t) to the estimated noise level calculation unit 602 and the level comparison unit 603, respectively.
- the estimated noise level calculation unit 602 acquires the level Lx3 (t) output by the third level calculation unit 601. The estimated noise level calculation unit 602 calculates an estimated noise level Nx (t) for the acquired level Lx3 (t). The estimated noise level calculation unit 602 outputs the calculated estimated noise level Nx (t) to the level comparison unit 603.
- the level comparison unit 603 acquires the estimated noise level Nx (t) calculated by the estimated noise level calculation unit 602 and the level Lx3 (t) calculated by the third level calculation unit 601. The level comparison unit 603 compares the level Lx3 (t) with the noise level Nx (t), and outputs the compared comparison result information to the speech section determination unit 604.
- the voice section determination unit 604 acquires the comparison result information output by the level comparison unit 603 and, based on it, determines the sections of the voice signal x1 (t) output by the first directivity forming unit 1103 in which the speaker utters voice.
- the speech segment determination unit 604 outputs speech segment detection result information, which is a speech segment detection result determined as a speech segment, to the speaker distance determination unit 105.
- the speaker distance determination unit 105 acquires the voice segment detection result information output by the voice segment determination unit 604 of the voice segment detection unit 501.
- the speaker distance determination unit 105 determines whether or not the speaker is close to the user only in the voice sections detected by the voice section detection unit 501, based on the acquired voice section detection result information (S105). Since the subsequent processes are the same as those of the second embodiment (see FIG. 12), their description is omitted.
- as described above, in the sound processing device of the third embodiment, the voice section detection unit 501 added to the internal configuration of the sound processing device according to the second embodiment detects the speech sections of the signal formed by the first directivity forming unit, and only in the detected voice sections is it determined whether the speaker is near or far from the user.
- the audio signal output by the first directivity forming unit, which picks up the direct sound of the speaker, is multiplied by the gain calculated according to the determination result, and its level is thereby controlled.
- as a result, the voice of a speaker who is close to the user, such as a conversation partner, is emphasized, and conversely, the voice of a speaker who is far from the user is attenuated or suppressed.
- since the distance to the speaker is determined only in the voice sections of the voice signal x1 (t) output by the first directivity forming unit, the distance to the speaker can be determined with high accuracy.
- FIG. 17 is a block diagram illustrating an internal configuration of the sound processing apparatus 13 according to the fourth embodiment.
- the sound processing device 13 of the fourth embodiment differs from the sound processing device 12 of the third embodiment in that, as shown in FIG. 17, it further includes a self-uttered voice determination unit 801 and a distance determination threshold setting unit 802.
- the same components as those in FIG. 13 are denoted by the same reference numerals and description thereof is omitted.
- the self-spoken voice represents the voice uttered by the user wearing the hearing aid equipped with the acoustic processing device 13 of the fourth embodiment.
- the voice section detection unit 501 acquires the voice signal x1 (t) output by the first directivity forming unit 1103.
- the voice section detection unit 501 detects a section in which the user of the sound processing device 13 or the speaker is speaking using the voice signal x1 (t) output from the first directivity forming unit 1103.
- the speech segment detection unit 501 outputs the detected speech segment detection result information to the speaker distance determination unit 105 and the self-speech speech determination unit 801, respectively.
- Specific components of the voice section detection unit 501 are the same as those shown in FIG.
- the self-speech voice determination unit 801 acquires the voice segment detection result information output from the voice segment detection unit 501.
- based on the acquired voice section detection result information, the self-uttered voice determination unit 801 determines whether the voice detected by the voice section detection unit 501 is the user's own voice, using the absolute sound pressure level of the level Lx3 (t) in the voice section.
- when the level Lx3 (t) is equal to or greater than the fourth threshold value β4, the self-uttered voice determination unit 801 determines that the voice corresponding to the level Lx3 (t) is the user's own voice.
- the fourth threshold value β4 is, for example, 74 [dB (SPL)].
- the self-speech speech determination unit 801 outputs the self-speech speech determination result information corresponding to the determined result to the distance determination threshold setting unit 802 and the speaker distance determination unit 105, respectively.
- the self-uttered voice may be input to the user's ear at an unnecessarily high level, which is not preferable from the viewpoint of protecting the user's ear. Accordingly, when the speech corresponding to the level Lx3 (t) is determined to be a self-speech speech, the self-speech speech determination unit 801 outputs “0” or “ ⁇ 1” as the self-speech speech determination result information.
- the self-uttered voice is thus not emphasized by the level control unit 107.
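The self-uttered voice determination can be sketched as a simple threshold test. The comparison "Lx3(t) at or above β4 within a detected voice section" is an assumed reading of the determination rule; the 74 [dB SPL] example value follows the text.

```python
def is_self_voice(lx3_spl, in_voice_section, beta4=74.0):
    """Self-uttered voice determination (unit 801), sketched.

    lx3_spl: absolute sound pressure level Lx3(t) [dB SPL].
    in_voice_section: result of the voice section detection unit 501.
    The user's mouth is far closer to the hearing aid than any other
    speaker, so the user's own voice arrives at a high absolute level;
    within a voice section, a level of at least beta4 (74 dB SPL in
    the text's example) is therefore judged to be self-uttered voice.
    """
    return in_voice_section and lx3_spl >= beta4
```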
- the distance determination threshold setting unit 802 acquires the self-uttered voice determination result information output by the self-uttered voice determination unit 801.
- using the audio signals x1 (t) and x2 (t) of the voice sections determined to be self-uttered voice by the self-uttered voice determination unit 801, the distance determination threshold setting unit 802 removes the direct sound component included in the audio signal x2 (t).
- the distance determination threshold setting unit 802 then calculates the reverberation level included in the audio signal x2 (t).
- the distance determination threshold setting unit 802 sets the first threshold value β1 and the second threshold value β2 according to the calculated reverberation level.
- FIG. 18 is a block diagram showing an example of the internal configuration of the distance determination threshold setting unit 802, configured using an adaptive filter.
- the distance determination threshold setting unit 802 includes an adaptive filter 901, a delay unit 902, a difference signal calculation unit 903, and a determination threshold setting unit 904.
- the adaptive filter 901 convolves the coefficient of the adaptive filter 901 with the audio signal x1 (t) output by the first directivity forming unit 1103. Next, the adaptive filter 901 outputs the convoluted audio signal yh (t) to the difference signal calculation unit 903 and the determination threshold setting unit 904, respectively.
- the delay unit 902 delays the audio signal x2 (t) output from the second directivity forming unit 1104 by a predetermined amount, and outputs the delayed audio signal x2 (t − D) to the difference signal calculation unit 903.
- the parameter D represents the number of samples delayed by the delay unit 902.
- the difference signal calculation unit 903 acquires the audio signal yh(t) output from the adaptive filter 901 and the audio signal x2(t−D) delayed by the delay unit 902.
- the difference signal calculation unit 903 calculates the difference signal e(t), which is the difference between the audio signal x2(t−D) and the audio signal yh(t).
- the difference signal calculation unit 903 outputs the calculated difference signal e (t) to the determination threshold setting unit 904.
- the adaptive filter 901 uses the difference signal e (t) calculated by the difference signal calculation unit 903 to update the filter coefficient.
- the filter coefficient is adjusted so that the direct sound component included in the audio signal x2 (t) output by the second directivity forming unit 1104 is removed.
- the tap length of the adaptive filter 901 is relatively short, because only the direct sound component of the audio signal x2(t) output by the second directivity forming unit 1104 is to be removed, while the reverberant sound component of x2(t) is output as the difference signal.
- the tap length of the adaptive filter 901 corresponds to several milliseconds to several tens of milliseconds.
- the delay unit 902, which delays the audio signal x2(t) output by the second directivity forming unit 1104, is inserted to preserve causality with respect to the first directivity forming unit 1103. This is because the audio signal x1(t) output from the first directivity forming unit 1103 always incurs a certain delay when it passes through the adaptive filter 901.
- the number of samples to be delayed is set to a value about half the tap length of the adaptive filter 901.
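The arrangement of units 901 to 903 described above (convolve x1(t) with the filter coefficients, subtract the result from the delayed x2(t), and update the coefficients from the residual) can be sketched as follows. This is a minimal NLMS variant written for illustration only; the update rule, step size, and tap/delay values are assumptions, not taken from the patent.

```python
import numpy as np

def remove_direct_sound(x1, x2, taps=32, mu=0.5, eps=1e-8):
    """Sketch of units 901-903: estimate the direct-sound path from
    x1(t) to x2(t) with an NLMS adaptive filter and return the
    residual e(t), which retains mainly the reverberant component."""
    D = taps // 2                      # delay unit 902: about half the tap length
    x2d = np.concatenate([np.zeros(D), x2])[:len(x2)]  # x2(t - D)
    h = np.zeros(taps)                 # adaptive filter 901 coefficients
    e = np.zeros(len(x1))
    for t in range(len(x1)):
        frame = x1[max(0, t - taps + 1):t + 1][::-1]   # x1(t), x1(t-1), ...
        frame = np.pad(frame, (0, taps - len(frame)))
        yh = h @ frame                 # convolved signal yh(t)
        e[t] = x2d[t] - yh             # difference signal e(t) (unit 903)
        h += mu * e[t] * frame / (frame @ frame + eps)  # NLMS coefficient update
    return e
```

With a well-chosen tap length, the residual e(t) retains mainly the reverberant component of x2(t), which is what the determination threshold setting unit 904 evaluates next.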
- the determination threshold value setting unit 904 acquires the difference signal e (t) output from the difference signal calculation unit 903 and the audio signal yh (t) output from the adaptive filter 901, respectively.
- the determination threshold setting unit 904 calculates a level Le(t) using the acquired difference signal e(t) and audio signal yh(t), and sets the first threshold β1 and the second threshold β2.
- Level Le (t) [dB] is calculated according to Equation (6).
- the parameter L is the number of samples for level calculation.
- in Equation (6), in order to reduce the dependence of the difference signal e(t) on the absolute level, normalization is performed with the level of the audio signal yh(t) output by the adaptive filter 901, which corresponds to the direct sound estimation signal.
- the level Le (t) increases when the reverberant component is large, and decreases when the reverberant component is small.
- when the reverberant component is small, Le(t) takes a value close to −∞ [dB].
- when the reverberant component is large, the denominator and the numerator in Equation (6) are at the same level, and thus Le(t) takes a value close to 0 [dB].
- when the level Le(t) is smaller than a predetermined amount, for example −10 [dB], the first threshold β1 is set to a small value,
- because the second directivity forming unit 1104 is not collecting much reverberant sound.
- when the level Le(t) is larger than the predetermined amount of, for example, −10 [dB], the first threshold β1 and the second threshold β2 are set to large values.
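Equation (6) itself is not reproduced in this excerpt; from the surrounding description, it normalizes the energy of the difference signal e(t) by that of the direct-sound estimate yh(t). The sketch below shows one plausible reading, with the −10 dB decision point taken from the text and the returned threshold values purely illustrative.

```python
import math

def reverberation_level_db(e, yh):
    """Le(t) [dB]: energy of the difference signal e over L samples,
    normalized by the energy of the direct-sound estimate yh
    (one plausible reading of Equation (6))."""
    num = sum(v * v for v in e) + 1e-12
    den = sum(v * v for v in yh) + 1e-12
    return 10.0 * math.log10(num / den)

def set_thresholds(le_db, boundary_db=-10.0):
    """Determination threshold setting unit 904 sketch: small thresholds
    when little reverberation is collected, large thresholds otherwise.
    The returned (beta1, beta2) values are hypothetical."""
    if le_db < boundary_db:   # little reverberant sound collected
        return 6.0, 3.0       # (beta1, beta2): small values
    return 12.0, 9.0          # (beta1, beta2): large values
```

Equal-energy inputs give Le(t) near 0 dB (strong reverberation), while a small residual drives Le(t) far below the −10 dB boundary.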
- the speaker distance determination unit 105 receives the voice section detection result information from the voice section detection unit 501, the self-speech voice determination result information from the self-speech voice determination unit 801, and the first and second thresholds β1 and β2 set by the perspective determination threshold setting unit 802. Next, the speaker distance determination unit 105 determines whether the speaker is close to the user, based on the input voice section detection result information, the self-speech voice determination result information, and the set first and second thresholds β1 and β2. The speaker distance determination unit 105 outputs the resulting perspective determination result information to the gain derivation unit 106.
- FIG. 19 is a flowchart for explaining the operation of the sound processing apparatus 13 according to the fourth embodiment.
- in FIG. 19, description of the same operations as those of the sound processing apparatus 12 of the third embodiment shown in FIG. 16 is omitted, and the processing related to the components described above is mainly explained.
- the speech segment detection unit 501 outputs the detected speech segment detection result information to the speaker distance determination unit 105 and the self-speech speech determination unit 801, respectively.
- the self-speech voice determination unit 801 acquires the voice segment detection result information output from the voice segment detection unit 501.
- the self-speech voice determination unit 801 determines whether the speech detected by the voice section detection unit 501 is the user's own speech, using the absolute sound pressure level Lx3(t) in the voice section based on the acquired voice section detection result information (S431).
- the self-speech voice determination unit 801 outputs the self-speech voice determination result information corresponding to the determination result to the perspective determination threshold setting unit 802 and the speaker distance determination unit 105, respectively.
- the perspective determination threshold setting unit 802 acquires the self-speech voice determination result information output by the self-speech voice determination unit 801.
- the perspective determination threshold setting unit 802 calculates the reverberation level included in the audio signal x2(t), using the audio signals x1(t) and x2(t) of the voice section determined by the self-speech voice determination unit 801 to be the user's own speech.
- the perspective determination threshold setting unit 802 sets the first threshold β1 and the second threshold β2 according to the calculated reverberation level (S432).
- the speaker distance determination unit 105 receives the voice section detection result information from the voice section detection unit 501, the self-speech voice determination result information from the self-speech voice determination unit 801, and the first and second thresholds β1 and β2 set by the perspective determination threshold setting unit 802. Next, the speaker distance determination unit 105 determines whether the speaker is close to the user, based on the input voice section detection result information, the self-speech voice determination result information, and the set first and second thresholds β1 and β2 (S105).
- the speaker distance determination unit 105 outputs the resulting perspective determination result information to the gain derivation unit 106. Since the subsequent processing is the same as in the first embodiment (see FIG. 5), its description is omitted.
- in the present embodiment, the self-speech voice determination unit, added to the internal configuration of the sound processing device according to the third embodiment, determines whether the audio signal x1(t) collected by the first directivity forming unit contains the user's own speech.
- the reverberation level included in the audio signal collected by the second directivity forming unit is then calculated. Further, the perspective determination threshold setting unit sets the first threshold β1 and the second threshold β2 according to the calculated reverberation level.
- whether the speaker is close to or far from the user is then determined.
- the gain calculated according to the determination result is multiplied by the audio signal output by the first directivity forming unit 1103, which picks up the direct sound of the speaker, and the level is thereby controlled.
- the voice of a speaker who is close to the user, such as a conversation partner, is emphasized.
- the voice of a speaker who is far from the user is attenuated or suppressed.
- since the distance of the speaker is determined only in the voice sections of the audio signal x1(t) output by the first directivity forming unit 1103, the distance of the speaker can be determined with high accuracy.
- the threshold for the perspective determination can be set dynamically according to the degree of the reverberation level. Therefore, in the present embodiment, the distance between the user and the speaker can be determined with high accuracy.
- FIG. 20 is a block diagram illustrating an internal configuration of the sound processing apparatus 14 according to the fifth embodiment.
- the acoustic processing device 14 according to the fifth embodiment differs from the acoustic processing device 12 according to the third embodiment in that, as shown in FIG. 20, it further includes constituent elements such as a self-speech voice determination unit 801 and a conversation partner determination unit 1001. In FIG. 20, the same components as those in FIG. 7 are denoted by the same reference numerals, and their description is omitted.
- the self-speech voice determination unit 801 acquires the voice segment detection result information output from the voice segment detection unit 501.
- the self-speech voice determination unit 801 determines whether the speech detected by the voice section detection unit 501 is the user's own speech, using the absolute sound pressure level Lx3(t) in the voice section based on the acquired voice section detection result information.
- when the level Lx3(t) exceeds the fourth threshold β4, the self-speech voice determination unit 801 determines that the voice corresponding to the level Lx3(t) is the user's own speech.
- the fourth threshold β4 is, for example, 74 [dB (SPL)].
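Because the user's mouth is far closer to the device than any other speaker, the self-speech decision reduces to an absolute-level comparison. A minimal sketch, with only the 74 dB(SPL) figure taken from the text (the threshold symbol is rendered here as beta4):

```python
def is_self_speech(lx3_db_spl, beta4=74.0):
    """Self-speech voice determination unit 801 sketch: speech whose
    absolute sound pressure level Lx3(t) exceeds the fourth threshold
    beta4 (74 dB SPL in the text) is treated as the user's own speech,
    the mouth being closest to the microphones."""
    return lx3_db_spl > beta4
```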
- the self-speech voice determination unit 801 outputs the self-speech voice determination result information corresponding to the determined result to the conversation partner determination unit 1001.
- the self-speech voice determination unit 801 may output the self-speech voice determination result information to the speaker distance determination unit 105 and the conversation partner determination unit 1001, respectively.
- the speaker distance determination unit 105 determines whether the speaker is close to the user based on the voice section detection result information from the voice section detection unit 501. Further, the speaker distance determination unit 105 may acquire the self-speech voice determination result information output by the self-speech voice determination unit 801.
- in that case, the speaker distance determination unit 105 determines the distance to the speaker after excluding, from the detected voice sections, the sections determined to be the user's own speech.
- the speaker distance determination unit 105 outputs the resulting perspective determination result information to the conversation partner determination unit 1001, based on the voice section detection result information.
- alternatively, the speaker distance determination unit 105 may output the resulting perspective determination result information to the conversation partner determination unit 1001 based on the voice section detection result information and the self-speech voice determination result information.
- the conversation partner determination unit 1001 acquires the self-speech speech determination result information by the self-speech speech determination unit 801 and the perspective determination result information by the speaker distance determination unit 105, respectively.
- the conversation partner determination unit 1001 determines whether the speaker is the user's conversation partner, using the voice of the speaker near the user and the own speech determined by the self-speech voice determination unit 801.
- the case where the speaker distance determination unit 105 determines that the speaker is nearby is the case where the perspective determination result information indicates “1”.
- when it is determined that the speaker is the user's conversation partner, the conversation partner determination unit 1001 outputs conversation partner determination result information of “1” to the gain derivation unit 106. On the other hand, when it is determined that the speaker is not the user's conversation partner, the conversation partner determination unit 1001 sets the conversation partner determination result information to “0” or “−1” and outputs it to the gain derivation unit 106.
- how the conversation partner determination unit 1001 determines whether the speaker is the user's conversation partner based on the self-speech voice determination result information and the perspective determination result information will be described with reference to FIGS. 21 and 22.
- FIG. 21 is a diagram illustrating an example in which the perspective determination result information and the spontaneous speech determination result information are represented on the same time axis.
- FIG. 22 is a diagram illustrating another example in which the perspective determination result information and the self-uttered speech determination result information are represented on the same time axis.
- the perspective determination result information and the spontaneous speech determination result information shown in FIGS. 21 and 22 are referred to by the conversation partner determination unit 1001.
- FIG. 21 is a diagram when the self-speech voice determination result information is not output to the speaker distance determination unit 105.
- the self-speech voice determination result information is output to the conversation partner determination unit 1001.
- the perspective determination result information is also “1” when the self-speech voice determination result information is “1”.
- in this case, the conversation partner determination unit 1001 treats the perspective determination result information as “0”.
- the conversation partner determination unit 1001 then determines that the speaker is the user's conversation partner.
- FIG. 22 is a diagram for the case where the self-speech voice determination result information is output to the speaker distance determination unit 105. As shown in FIG. 22, the state where the perspective determination result information is “1” and the state where the self-speech voice determination result information is “1” occur alternately and almost continuously in time.
- the conversation partner determination unit 1001 determines that the speaker is the user's conversation partner.
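The turn-taking test of FIGS. 21 and 22 can be sketched as follows: given frame-wise flags for "nearby speaker" and "own speech", the speaker is accepted as the conversation partner when the two kinds of sections alternate with short gaps. The gap limit and the handling of overlapping frames are assumptions made for illustration.

```python
def true_runs(flags):
    """Return (start, end) index pairs of consecutive True runs."""
    out, start = [], None
    for t, f in enumerate(flags):
        if f and start is None:
            start = t
        elif not f and start is not None:
            out.append((start, t - 1))
            start = None
    if start is not None:
        out.append((start, len(flags) - 1))
    return out

def is_conversation_partner(near_flags, self_flags, max_gap=3):
    """Conversation partner determination unit 1001 sketch: the speaker
    is accepted as the conversation partner when own-speech sections
    and nearby-speaker sections alternate with short gaps (FIGS. 21
    and 22). Frames where both flags are raised are treated as own
    speech, i.e. the perspective result is forced to '0' as in FIG. 21."""
    near = [n and not s for n, s in zip(near_flags, self_flags)]
    events = [('self', s, e) for s, e in true_runs(self_flags)]
    events += [('near', s, e) for s, e in true_runs(near)]
    events.sort(key=lambda ev: ev[1])
    # Look for a self->near or near->self turn change with a short gap.
    for (la, _, end_a), (lb, start_b, _) in zip(events, events[1:]):
        if la != lb and start_b - end_a - 1 <= max_gap:
            return True
    return False
```

For instance, three frames of own speech followed shortly by two frames of a nearby speaker counts as a conversation; a nearby speaker with no own speech does not.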
- the gain derivation unit 106 derives the gain α(t) using the conversation partner determination result information from the conversation partner determination unit 1001. Specifically, when the conversation partner determination result information is “1”, the gain derivation unit 106 determines that the speaker is the user's conversation partner, and therefore sets the instantaneous gain α′(t) to “2.0”.
- the gain derivation unit 106 derives the gain α(t) according to the above Equation (4) using the derived instantaneous gain α′(t), and outputs the derived gain α(t) to the level control unit 107.
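Equation (4) is not reproduced in this excerpt. A common form for turning the instantaneous gain α′(t) into a smoothed applied gain α(t) is first-order recursive averaging, shown here purely as an assumption:

```python
def smooth_gain(alpha_prev, alpha_inst, tau=0.98):
    """One plausible form for Equation (4): first-order recursive
    smoothing of the instantaneous gain alpha'(t) into the applied
    gain alpha(t). The coefficient tau is an illustrative assumption."""
    return tau * alpha_prev + (1.0 - tau) * alpha_inst
```

Repeatedly applying this with a constant instantaneous gain of 2.0 makes the applied gain converge smoothly toward 2.0 instead of jumping, which avoids audible level steps.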
- FIG. 23 is a flowchart for explaining the operation of the sound processing apparatus 14 according to the fifth embodiment.
- the description about the same operation as that of the sound processing apparatus 12 of the third embodiment shown in FIG. 16 is omitted, and the processing related to the above-described components will be mainly described.
- the speech segment detection unit 501 outputs the detected speech segment detection result information to the speaker distance determination unit 105 and the self-speech speech determination unit 801, respectively.
- the self-speech voice determination unit 801 acquires the voice segment detection result information output from the voice segment detection unit 501.
- the self-speech voice determination unit 801 determines whether the voice detected by the voice section detection unit 501 is the user's own speech, using the absolute sound pressure level Lx3(t) in the voice section based on the voice section detection result information (S431).
- the self-speech speech determination unit 801 outputs the self-speech speech determination result information corresponding to the determined result to the conversation partner determination unit 1001.
- the self-speech voice determination unit 801 may output the self-speech voice determination result information to the conversation partner determination unit 1001 and the speaker distance determination unit 105.
- the speaker distance determination unit 105 determines whether or not the speaker is close to the user based on the voice section detection result information by the voice section detection unit 501 (S105).
- the conversation partner determination unit 1001 determines whether the speaker is the user's conversation partner (S542). Specifically, the conversation partner determination unit 1001 makes this determination using the voice of the speaker near the user and the own speech determined by the self-speech voice determination unit 801.
- the gain derivation unit 106 performs the gain derivation processing (S106).
- the gain derivation unit 106 derives the gain α(t) using the conversation partner determination result information from the conversation partner determination unit 1001 (S106). Since the subsequent processing is the same as in the first embodiment (see FIG. 5), its description is omitted.
- in the present embodiment, the self-speech voice determination unit, added to the internal configuration of the sound processing device according to the third embodiment, determines whether the audio signal x1(t) collected by the first directivity forming unit contains the user's own speech.
- in the present embodiment, the conversation partner determination unit determines whether the speaker is the user's conversation partner based on the temporal order of occurrence of the self-speech voice determination result information and the perspective determination result information in the voice sections in which the speaker is determined to be near the user.
- the gain calculated based on the resulting conversation partner determination result information is multiplied by the audio signal output by the first directivity forming unit, which picks up the direct sound of the speaker, and the level is thereby controlled.
- the voice of a speaker who is close to the user, such as a conversation partner, is emphasized.
- the voice of a speaker who is far from the user is attenuated or suppressed.
- since the distance of the speaker is determined only in the voice sections of the audio signal x1(t) output by the first directivity forming unit, the distance to the speaker can be determined with high accuracy.
- the voice of the speaker can be emphasized only when the speaker near the user is the conversation partner, and the voice of only the user's conversation partner can be clearly heard.
- FIG. 24 is a block diagram illustrating an internal configuration of the sound processing device 15 according to the sixth embodiment.
- the sound processing device 15 of the sixth embodiment is obtained by applying the sound processing device 11 of the second embodiment to a hearing aid.
- the difference from the sound processing device 11 of the second embodiment is that, as shown in FIG. 24, the gain derivation unit 106 and the level control unit 107 shown in FIG. 7 are replaced by a nonlinear amplification unit 3101, and a speaker 3102 is further included as a constituent element.
- the nonlinear amplifying unit 3101 acquires the audio signal x1 (t) output from the first directivity forming unit 1103 and the perspective determination result information output from the speaker distance determination unit 105.
- the nonlinear amplification unit 3101 amplifies the audio signal x1(t) output from the first directivity forming unit 1103 based on the perspective determination result information output from the speaker distance determination unit 105, and outputs the amplified signal to the speaker 3102.
- FIG. 25 is a block diagram illustrating an example of the internal configuration of the nonlinear amplification unit 3101.
- the nonlinear amplification unit 3101 includes a band dividing unit 3201, a plurality of band signal control units (# 1 to #N) 3202, and a band synthesizing unit 3203.
- the band dividing unit 3201 divides the audio signal x1(t) from the first directivity forming unit 1103 into signals x1n(t) in N frequency bands, using filters or the like.
- a DFT (Discrete Fourier Transform) may be used for the band division, for example.
- each band signal control unit (#1 to #N) 3202 sets a gain to be multiplied by the signal x1n(t) of each frequency band, based on the perspective determination result information from the speaker distance determination unit 105 and the level of the signal x1n(t) of each frequency band from the band dividing unit 3201. Next, each band signal control unit (#1 to #N) 3202 controls the level of the signal x1n(t) in each frequency band using the set gain.
- FIG. 25 shows the internal configuration of the band signal control unit (#n) 3202 in the frequency band #n among the band signal control units (# 1 to #N) 3202.
- the band signal control unit (#n) 3202 includes a band level calculation unit 3202-1, a band gain setting unit 3202-2, and a band gain control unit 3202-3.
- the band signal control units 3202 for the other frequency bands have the same internal configuration.
- the band level calculation unit 3202-1 calculates the level Lx1n (t) [dB] of the frequency band signal x1n (t).
- the level Lx1n(t) is calculated, for example, by the method of Equation (1) above.
- the band gain setting unit 3202-2 receives the band level Lx1n(t) calculated by the band level calculation unit 3202-1 and the perspective determination result information output by the speaker distance determination unit 105. Next, the band gain setting unit 3202-2 sets the band gain αn(t) to be multiplied by the band signal x1n(t) controlled by the band signal control unit 3202, based on the band level Lx1n(t) and the perspective determination result information.
- the band gain setting unit 3202-2 sets a band gain αn(t) that compensates for the user's auditory characteristics as shown in FIG. 26, using the band level Lx1n(t) of the signal.
- FIG. 26 is an explanatory diagram showing the level input/output characteristics for compensating the user's auditory characteristics.
- for example, when the input band level is 60 [dB], the band gain setting unit 3202-2 sets the output band level to 80 [dB], and therefore increases the band gain by 20 [dB].
- when the speaker is not determined to be near the user, the band gain setting unit 3202-2 sets “1.0” as the band gain αn(t) for the band signal x1n(t) to be controlled.
- the band gain control unit 3202-3 multiplies the band signal x1n(t) to be controlled by the band gain αn(t), and calculates the band signal yn(t) controlled by the band signal control unit 3202.
- the band synthesis unit 3203 synthesizes the band signals yn(t) by a method corresponding to the band dividing unit 3201, and calculates the band-synthesized signal y(t).
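The FIG. 25 structure (band division 3201, per-band level and gain 3202, synthesis 3203) can be sketched as follows. Only the point that the band gain is raised by 20 dB for an output level of 80 dB comes from the text; the FFT-based band split, the straight-line compensation curve and its end points, and the dB calibration offset are all assumptions made for illustration.

```python
import numpy as np

def band_gain_db(band_level_db, near):
    """Band gain setting unit 3202-2 sketch. When the speaker is near,
    apply a FIG. 26 style compensation curve (assumed linear here,
    +20 dB at a 60 dB input band level, tapering to 0 dB at 100 dB).
    When far, the gain is 1.0, i.e. 0 dB."""
    if not near:
        return 0.0
    return float(np.clip(20.0 * (100.0 - band_level_db) / 40.0, 0.0, 40.0))

def nonlinear_amplify(x, near, n_bands=8, calib_db=90.0):
    """Nonlinear amplification unit 3101 sketch: FFT band division
    (3201), per-band level and gain (3202), band synthesis (3203).
    calib_db is an assumed digital-to-SPL calibration offset."""
    X = np.fft.rfft(x)
    edges = np.linspace(0, len(X), n_bands + 1, dtype=int)
    for lo, hi in zip(edges[:-1], edges[1:]):
        level_db = 10.0 * np.log10(np.mean(np.abs(X[lo:hi]) ** 2) + 1e-12)
        gain = 10.0 ** (band_gain_db(level_db + calib_db, near) / 20.0)
        X[lo:hi] *= gain                       # band gain control 3202-3
    return np.fft.irfft(X, n=len(x))           # band synthesis 3203
```

When the perspective result indicates "far", every band gain is 0 dB and the output reproduces the input; when it indicates "near", quiet bands are boosted toward the compensated output level.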
- the speaker 3102 outputs the band-synthesized signal y(t) whose band gains have been set by the nonlinear amplification unit 3101.
- FIG. 27 is a flowchart for explaining the operation of the sound processing device 15 according to the sixth embodiment.
- the description about the same operation as that of the sound processing apparatus 11 of the second embodiment shown in FIG. 12 is omitted, and the processing related to the above-described components will be mainly described.
- the nonlinear amplification unit 3101 acquires the audio signal x1(t) output from the first directivity forming unit 1103 and the perspective determination result information output from the speaker distance determination unit 105. Next, the nonlinear amplification unit 3101 amplifies the audio signal x1(t) based on the perspective determination result information, and outputs the amplified signal to the speaker 3102 (S3401).
- FIG. 28 is a flowchart illustrating details of the operation of the non-linear amplification unit 3101.
- the band dividing unit 3201 divides the audio signal x1 (t) output from the first directivity forming unit 1103 into signals x1n (t) in N frequency bands (S3501).
- the band level calculation unit 3202-1 calculates the level Lx1n (t) of the signal x1n (t) in each frequency band (S3502).
- the band gain setting unit 3202-2 sets the band gain αn(t) to be multiplied by the band signal x1n(t), based on the band level Lx1n(t) and the perspective determination result information output by the speaker distance determination unit 105 (S3503).
- FIG. 29 is a flowchart illustrating details of the operation of the band gain setting unit 3202-2.
- the band gain setting unit 3202-2 uses the band level Lx1n(t) to set the band gain αn(t) for compensating the user's auditory characteristics as shown in FIG. 26 (S3602).
- otherwise, the band gain setting unit 3202-2 sets “1.0” as the band gain αn(t) for the band signal x1n(t) (S3603).
- the band gain control unit 3202-3 multiplies the band signal x1n(t) by the band gain αn(t) to calculate the band signal yn(t) after control by the band signal control unit 3202 (S3504).
- the band synthesis unit 3203 synthesizes each band signal yn (t) by a method corresponding to the band division unit 3201, and calculates the signal y (t) after band synthesis (S3505).
- the speaker 3102 outputs the band-combined signal y (t) whose gain has been adjusted (S3402).
- the gain derivation unit 106 and the level control unit 107 of the internal configuration of the sound processing device 11 of the second embodiment are integrated into the nonlinear amplification unit 3101.
- in the sound processing device 15 of the sixth embodiment, by further including the speaker 3102 as a sound output unit, only the voice of the conversation partner can be amplified, so that the user can clearly hear the voice of the conversation partner alone.
- the value of the instantaneous gain α′(t) described above was given specifically as “2.0” or “0.5”, but it is not limited to these numbers.
- the value of the instantaneous gain α′(t) can be set individually in advance according to the degree of hearing loss of the user when the device is used as a hearing aid.
- the conversation partner determination unit of the fifth embodiment described above determines whether the speaker is the user's conversation partner using the voice of the speaker and the own speech determined by the self-speech voice determination unit, when the speaker distance determination unit determines that the speaker is near the user.
- alternatively, the conversation partner determination unit 1001 may recognize the voices of the speaker and of the user's own speech. In this case, the conversation partner determination unit 1001 may extract predetermined keywords from the recognized speech and determine that the speaker is the user's conversation partner when the keywords are determined to belong to the same field.
- the predetermined keywords are keywords such as “airplane” and “car”, or “Hokkaido” and “Kyushu”, which relate to the same field.
- alternatively, the conversation partner determination unit 1001 may perform specific speaker recognition for a speaker near the user.
- when the recognized person is a specific speaker registered in advance, or when there is only one speaker around the user, that person is determined to be the user's conversation partner.
- the first level calculation process is shown to be performed after the voice segment detection process. However, the first level calculation process may be performed before the voice segment detection process.
- the first level calculation process is shown to be performed after each of the voice section detection process and the self-speech voice determination process, and before the perspective determination threshold setting process.
- however, the first level calculation process may be performed before the voice section detection process or the self-speech voice determination process, or after the perspective determination threshold setting process.
- the second level calculation process is shown to be performed before the perspective determination threshold setting process. However, the second level calculation process may be performed after setting the perspective determination threshold.
- the first level calculation process is shown to be performed after each of the voice section detection and self-speech voice determination processes. However, as long as the condition that the self-speech voice determination process is performed after the voice section detection process is satisfied, the first level calculation process may be performed before the voice section detection process or the self-speech voice determination process.
- Each processing unit excluding the microphone array 1102 described above is specifically implemented as a computer system including a microprocessor, a ROM, a RAM, and the like.
- Each processing unit includes first and second directivity forming units 1103 and 1104, first and second level calculation units 103 and 104, a speaker distance determination unit 105, a gain derivation unit 106, a level control unit 107, A speech section detection unit 501, a self-speech speech determination unit 801, a perspective determination threshold setting unit 802, a conversation partner determination unit 1001, and the like are included.
- the RAM stores a computer program. Each device achieves its functions by the microprocessor operating according to the computer program.
- the computer program is configured by combining a plurality of instruction codes indicating instructions for the computer in order to achieve a predetermined function.
- the system LSI is a super multifunctional LSI manufactured by integrating a plurality of components on one chip, and is specifically a computer system including a microprocessor, a ROM, a RAM, and the like.
- the RAM stores computer programs.
- the system LSI achieves its functions by the microprocessor operating according to the computer program.
- each of the processing units described above may be constituted by an IC card or a single module that can be attached to and detached from any of the acoustic processing apparatuses 10 to 60.
- the IC card or module is a computer system composed of a microprocessor, ROM, RAM, and the like. Further, the IC card or module may include the above-described super multifunctional LSI. The IC card or the module achieves its function by the microprocessor operating according to the computer program. This IC card or this module may have tamper resistance.
- the embodiment of the present invention may be a sound processing method performed by the sound processing apparatus described above.
- the present invention may be a computer program that realizes these methods by a computer, or may be a digital signal composed of a computer program.
- the present invention may also be the computer program or the digital signal recorded on a computer-readable recording medium, such as a flexible disk, hard disk, CD-ROM, MO, DVD, DVD-ROM, DVD-RAM, BD (Blu-ray Disc), or semiconductor memory.
- the present invention may be digital signals recorded on these recording media.
- a computer program or a digital signal may be transmitted via an electric communication line, a wireless or wired communication line, a network represented by the Internet, a data broadcast, or the like.
- the present invention is a computer system including a microprocessor and a memory, and the memory stores the above-described computer program, and the microprocessor may operate according to the computer program.
- the present invention may also be implemented by another independent computer system, by recording the program or digital signal on a recording medium and transferring it, or by transferring the program or digital signal via a network or the like.
- The sound processing apparatus has a speaker distance determination unit that operates on the level difference between two directional microphones, and is useful as a hearing aid or the like intended to pick up only the voice of a nearby conversation partner.
Landscapes
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Quality & Reliability (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Neurosurgery (AREA)
- Otolaryngology (AREA)
- Circuit For Audible Band Transducer (AREA)
- Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)
Abstract
Description
FIG. 1 is a block diagram showing the internal configuration of the sound processing apparatus 10 according to the first embodiment. As shown in FIG. 1, the sound processing apparatus 10 includes a first directional microphone 101, a second directional microphone 102, a first level calculation unit 103, a second level calculation unit 104, a speaker distance determination unit 105, a gain derivation unit 106, and a level control unit 107.
The first directional microphone 101 is a unidirectional microphone having its main axis of directivity in the direction of the speaker, and mainly picks up the direct sound of the speaker's voice. The first directional microphone 101 outputs the picked-up sound signal x1(t) to the first level calculation unit 103 and the level control unit 107.
Next, the operation of the sound processing apparatus 10 of the first embodiment will be described with reference to FIG. 5. FIG. 5 is a flowchart explaining the operation of the sound processing apparatus 10 of the first embodiment.
Details of the process in which the gain derivation unit 106 derives the gain α(t) for the sound signal x1(t), based on the distance determination result information output by the speaker distance determination unit 105, will be described with reference to FIG. 6. FIG. 6 is a flowchart detailing the operation of the gain derivation unit 106.
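The near/far decision and the gain derivation described above can be sketched as follows. This is an illustrative sketch only: the function names, the 10 dB level-difference threshold, and the gain values are assumptions for the example and are not taken from the patent.

```python
def determine_distance(lx1_db, lx2_db, threshold_db=10.0):
    """Return 'near' when the front/null level difference is large.

    A close speaker excites the front-directional signal (level Lx1) much
    more strongly than the null-directional signal (level Lx2), so a large
    difference suggests a nearby speaker.
    """
    return "near" if (lx1_db - lx2_db) >= threshold_db else "far"


def derive_gain(distance):
    """Map the near/far result to a gain alpha(t) for the first
    directional signal. The gains are illustrative: pass a near speaker
    through unchanged and attenuate a far one."""
    return 1.0 if distance == "near" else 0.25


# A nearby speaker (large level difference) is passed at full gain.
alpha = derive_gain(determine_distance(-20.0, -35.0))
```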
FIG. 7 is a block diagram showing the internal configuration of the sound processing apparatus 11 according to the second embodiment. In FIG. 7, the same reference numerals are used for the same components as in FIG. 1, and descriptions of those components are omitted. As shown in FIG. 7, the sound processing apparatus 11 includes a directional sound pickup unit 1101, a first level calculation unit 103, a second level calculation unit 104, a speaker distance determination unit 105, a gain derivation unit 106, and a level control unit 107.
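The directional sound pickup unit 1101 forms its directional signals from omnidirectional microphones (1201-1, 1201-2) by way of a delay unit (1202), an arithmetic unit (1203), and an EQ (1204), as listed in the reference numerals. A minimal sketch of that delay-and-subtract structure is given below; the sampling rate, microphone spacing, and speed of sound are assumed values, and the EQ stage is omitted for brevity.

```python
def directional_signal(front_mic, rear_mic, fs=16000, spacing_m=0.02, c=340.0):
    """Form a front-pointing directional signal from two omni microphones.

    The rear microphone is delayed by the acoustic travel time across the
    array and subtracted from the front microphone, which cancels sound
    arriving from behind and yields a cardioid-like pattern whose main
    axis points toward the front microphone.
    """
    delay_samples = max(1, round(spacing_m / c * fs))
    out = []
    for n in range(len(front_mic)):
        delayed_rear = rear_mic[n - delay_samples] if n >= delay_samples else 0.0
        out.append(front_mic[n] - delayed_rear)
    return out
```

Swapping the roles of the two microphones would instead place the directivity null toward the speaker, as the second directivity forming unit does.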
On the other hand, when the relationship shown in Equation (2) does not hold (Equation (3) above), a large time constant is used to reduce the drop in level in consonant intervals of the speech and between phrases.
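The two-time-constant level tracking described above can be sketched as follows. The concrete smoothing constants are assumed values, and the assumption that Equation (2) corresponds to the rising-input case is the author's reading of the passage, not a value from the patent.

```python
def track_level(samples, fast=0.5, slow=0.99):
    """Track the signal level with two time constants.

    A small (fast) constant is used while the input rises above the
    tracked level; a large (slow) constant is used while it falls, so the
    level does not collapse during consonants or short pauses.
    """
    level = 0.0
    out = []
    for x in samples:
        mag = abs(x)
        tau = fast if mag > level else slow  # assumed: Eq. (2) = rising case
        level = tau * level + (1.0 - tau) * mag
        out.append(level)
    return out
```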
Next, the operation of the sound processing apparatus 11 of the second embodiment will be described with reference to FIG. 12. FIG. 12 is a flowchart explaining the operation of the sound processing apparatus 11 of the second embodiment.
FIG. 13 is a block diagram showing the internal configuration of the sound processing apparatus 12 of the third embodiment. As shown in FIG. 13, the sound processing apparatus 12 of the third embodiment differs from the sound processing apparatus 11 of the second embodiment in that it further includes a voice activity detection unit 501. In FIG. 13, the same reference numerals are used for the same components as in FIG. 7, and descriptions of those components are omitted.
The voice activity detection unit 501 acquires the sound signal x1(t) output by the first directivity forming unit 1103. Using this sound signal x1(t), the voice activity detection unit 501 detects intervals in which a speaker other than the user of the sound processing apparatus 12 is uttering speech. The voice activity detection unit 501 outputs the detected voice-interval detection result information to the speaker distance determination unit 105.
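The reference numerals suggest that the voice-interval decision is made by comparing a signal level against an estimated noise level (third level calculation unit 601, estimated noise level calculation unit 602, level comparison unit 603, voice interval determination unit 604). A hypothetical sketch of such a scheme follows; the 6 dB margin and the noise-floor update factor are assumed values, not taken from the patent.

```python
def detect_voice(levels_db, margin_db=6.0, noise_update=0.95):
    """Flag frames whose level exceeds a slowly tracked noise floor.

    The noise estimate is updated only in frames judged to be non-voice,
    so speech does not inflate the noise floor.
    """
    noise_db = levels_db[0]
    flags = []
    for l in levels_db:
        is_voice = l > noise_db + margin_db
        if not is_voice:
            noise_db = noise_update * noise_db + (1 - noise_update) * l
        flags.append(is_voice)
    return flags
```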
Next, the operation of the sound processing apparatus 12 of the third embodiment will be described with reference to FIG. 16. FIG. 16 is a flowchart explaining the operation of the sound processing apparatus 12 of the third embodiment. In FIG. 16, descriptions of operations identical to those of the sound processing apparatus 11 of the second embodiment shown in FIG. 12 are omitted, and the processing related to the components described above is mainly explained.
FIG. 17 is a block diagram showing the internal configuration of the sound processing apparatus 13 of the fourth embodiment. As shown in FIG. 17, the sound processing apparatus 13 of the fourth embodiment differs from the sound processing apparatus 12 of the third embodiment in that it further includes an own-speech determination unit 801 and a distance determination threshold setting unit 802.
The voice activity detection unit 501 acquires the sound signal x1(t) output by the first directivity forming unit 1103. Using this sound signal x1(t), the voice activity detection unit 501 detects intervals in which the user of the sound processing apparatus 13 or another speaker is uttering speech.
Next, the operation of the sound processing apparatus 13 of the fourth embodiment will be described with reference to FIG. 19. FIG. 19 is a flowchart explaining the operation of the sound processing apparatus 13 of the fourth embodiment. In FIG. 19, descriptions of operations identical to those of the sound processing apparatus 12 of the third embodiment shown in FIG. 16 are omitted, and the processing related to the components described above is mainly explained.
FIG. 20 is a block diagram showing the internal configuration of the sound processing apparatus 14 of the fifth embodiment. As shown in FIG. 20, the sound processing apparatus 14 of the fifth embodiment differs from the sound processing apparatus 12 of the third embodiment in that it further includes an own-speech determination unit 801 and a conversation partner determination unit 1001. In FIG. 20, the same reference numerals are used for the same components as in FIG. 7, and their descriptions are omitted.
The own-speech determination unit 801 acquires the voice-interval detection result information output from the voice activity detection unit 501. In the voice interval based on this acquired voice-interval detection result information, the own-speech determination unit 801 uses the absolute sound pressure level of the level Lx3(t) to determine whether the speech detected by the voice activity detection unit 501 is the user's own speech.
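The own-speech decision rests on the fact that the user's own voice reaches the device's microphones at a high absolute sound pressure level. A minimal sketch follows; the 74 dB SPL threshold is an assumed value for illustration, not taken from the patent.

```python
def is_own_speech(lx3_spl_db, in_voice_interval, threshold_spl_db=74.0):
    """Classify a detected voice interval as the user's own speech.

    Own speech originates centimeters from the microphones, so its
    absolute sound pressure level Lx3(t) exceeds what a conversation
    partner's voice can produce; a fixed SPL threshold separates the two.
    """
    return in_voice_interval and lx3_spl_db >= threshold_spl_db
```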
Next, the operation of the sound processing apparatus 14 of the fifth embodiment will be described with reference to FIG. 23. FIG. 23 is a flowchart explaining the operation of the sound processing apparatus 14 of the fifth embodiment. In FIG. 23, descriptions of operations identical to those of the sound processing apparatus 12 of the third embodiment shown in FIG. 16 are omitted, and the processing related to the components described above is mainly explained.
FIG. 24 is a block diagram showing the internal configuration of the sound processing apparatus 15 of the sixth embodiment. The sound processing apparatus 15 of the sixth embodiment applies the sound processing apparatus 11 of the second embodiment to a hearing aid. As shown in FIG. 24, it differs from the sound processing apparatus 11 of the second embodiment in that the gain derivation unit 106 and the level control unit 107 of FIG. 7 are integrated into a nonlinear amplification unit 3101, and a speaker 3102 is further provided as a sound output unit. In the sixth embodiment, the same reference numerals are used for the same components as in FIG. 7, and descriptions of those components are omitted.
The nonlinear amplification unit 3101 acquires the sound signal x1(t) output by the first directivity forming unit 1103 and the distance determination result information output by the speaker distance determination unit 105. Based on the distance determination result information output by the speaker distance determination unit 105, the nonlinear amplification unit 3101 amplifies the sound signal x1(t) output by the first directivity forming unit 1103 and outputs it to the speaker 3102.
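The reference numerals indicate that the nonlinear amplification unit 3101 follows a band division / per-band gain / band synthesis structure (units 3201 to 3203). The per-band gain rule below is a hypothetical sketch of one such compressive scheme, with the knee level, compression ratio, and far-speaker attenuation all being assumed values rather than values from the patent.

```python
def band_gain(band_level_db, near, knee_db=-40.0, ratio=2.0):
    """Compressive per-band gain in dB.

    Below the knee the band passes unchanged; above it, output level
    grows at 1/ratio of the input level (compression), which suits a
    hearing aid's reduced dynamic range. A speaker judged to be far away
    is additionally attenuated, following the near/far result.
    """
    if band_level_db <= knee_db:
        gain_db = 0.0
    else:
        compressed = knee_db + (band_level_db - knee_db) / ratio
        gain_db = compressed - band_level_db
    if not near:
        gain_db -= 12.0  # assumed attenuation for far speakers
    return gain_db
```

In a full band-split amplifier, this gain would be computed per band from each band level, applied to the band signal, and the bands summed back together by the band synthesis stage.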
Next, the operation of the sound processing apparatus 15 of the sixth embodiment will be described with reference to FIG. 27. FIG. 27 is a flowchart explaining the operation of the sound processing apparatus 15 of the sixth embodiment. In FIG. 27, descriptions of operations identical to those of the sound processing apparatus 11 of the second embodiment shown in FIG. 12 are omitted, and the processing related to the components described above is mainly explained.
20 Sound processing apparatus
30 Sound processing apparatus
40 Sound processing apparatus
50 Sound processing apparatus
1101 Directional sound pickup unit
1102 Microphone array
1103 First directivity forming unit
1104 Second directivity forming unit
103 First level calculation unit
104 Second level calculation unit
105 Speaker distance determination unit
106 Gain derivation unit
107 Level control unit
1201-1 Omnidirectional microphone
1201-2 Omnidirectional microphone
1202 Delay unit
1203 Arithmetic unit
1204 EQ
501 Voice activity detection unit
601 Third level calculation unit
602 Estimated noise level calculation unit
603 Level comparison unit
604 Voice interval determination unit
801 Own-speech determination unit
802 Distance determination threshold setting unit
901 Adaptive filter
902 Delay unit
903 Difference signal calculation unit
904 Determination threshold setting unit
1001 Conversation partner determination unit
3101 Nonlinear amplification unit
3201 Band division unit
3202 Band signal control unit
3202-1 Band level calculation unit
3202-2 Band gain setting unit
3202-3 Band gain control unit
3203 Band synthesis unit
Claims (6)
- A first directivity forming unit that outputs a first directional signal in which a main axis of directivity is formed in the direction of a speaker, using respective output signals of a plurality of omnidirectional microphones;
a second directivity forming unit that outputs a second directional signal in which a directivity null is formed in the direction of the speaker, using the respective output signals of the plurality of omnidirectional microphones;
a first level calculation unit that calculates a level of the first directional signal output by the first directivity forming unit;
a second level calculation unit that calculates a level of the second directional signal output by the second directivity forming unit;
a speaker distance determination unit that determines whether the speaker is near or far based on the level of the first directional signal and the level of the second directional signal calculated by the first and second level calculation units;
a gain derivation unit that derives a gain to be applied to the first directional signal according to the result of the speaker distance determination unit; and
a level control unit that controls the level of the first directional signal using the gain derived by the gain derivation unit,
a sound processing apparatus characterized by comprising the above. - The sound processing apparatus according to claim 1,
further comprising a voice activity detection unit that detects a voice interval of the first directional signal,
wherein the speaker distance determination unit determines whether the speaker is near or far based on the sound signal in the voice interval detected by the voice activity detection unit. - The sound processing apparatus according to claim 1 or 2, further comprising:
an own-speech determination unit that determines whether a sound is the user's own speech based on the level of the first directional signal in the voice interval detected by the voice activity detection unit; and
a distance determination threshold setting unit that estimates reverberant sound contained in the own speech determined by the own-speech determination unit and, based on the estimated reverberant sound, sets a determination threshold used when the speaker distance determination unit determines whether the speaker is near or far,
wherein the speaker distance determination unit determines whether the speaker is near or far using the determination threshold set by the distance determination threshold setting unit. - The sound processing apparatus according to claim 3,
further comprising a conversation partner determination unit that determines, based on the result of the speaker distance determination unit and the result of the own-speech determination unit, whether the speaker's voice judged by the speaker distance determination unit was uttered by a conversation partner,
wherein the gain derivation unit derives the gain to be applied to the first directional signal according to the result of the conversation partner determination unit. - A sound processing method comprising: a step of outputting a first directional signal in which a main axis of directivity is formed in the direction of a speaker, using respective output signals of a plurality of omnidirectional microphones;
a step of outputting a second directional signal in which a directivity null is formed in the direction of the speaker, using the respective output signals of the plurality of omnidirectional microphones;
a step of calculating a level of the output first directional signal;
a step of calculating a level of the output second directional signal;
a step of determining whether the speaker is near or far based on the calculated level of the first directional signal and level of the second directional signal;
a step of deriving a gain to be applied to the first directional signal according to the determined distance to the speaker; and
a step of controlling the level of the first directional signal using the derived gain. - A hearing aid comprising the sound processing apparatus according to any one of claims 1 to 4.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP10824665.3A EP2492912B1 (en) | 2009-10-21 | 2010-10-20 | Sound processing apparatus, sound processing method and hearing aid |
CN2010800449129A CN102549661B (zh) | 2009-10-21 | 2010-10-20 | 音响处理装置、音响处理方法及助听器 |
US13/499,027 US8755546B2 (en) | 2009-10-21 | 2010-10-20 | Sound processing apparatus, sound processing method and hearing aid |
JP2011537143A JP5519689B2 (ja) | 2009-10-21 | 2010-10-20 | 音響処理装置、音響処理方法及び補聴器 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2009242602 | 2009-10-21 | ||
JP2009-242602 | 2009-10-21 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011048813A1 true WO2011048813A1 (ja) | 2011-04-28 |
Family
ID=43900057
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2010/006231 WO2011048813A1 (ja) | 2009-10-21 | 2010-10-20 | 音響処理装置、音響処理方法及び補聴器 |
Country Status (5)
Country | Link |
---|---|
US (1) | US8755546B2 (ja) |
EP (1) | EP2492912B1 (ja) |
JP (1) | JP5519689B2 (ja) |
CN (1) | CN102549661B (ja) |
WO (1) | WO2011048813A1 (ja) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013061421A (ja) * | 2011-09-12 | 2013-04-04 | Oki Electric Ind Co Ltd | 音声信号処理装置、方法及びプログラム |
CN103124165A (zh) * | 2011-11-14 | 2013-05-29 | 谷歌公司 | 自动增益控制 |
JP2014186295A (ja) * | 2013-02-21 | 2014-10-02 | Nippon Telegr & Teleph Corp <Ntt> | 音声区間検出装置、音声認識装置、その方法、及びプログラム |
JP2016505896A (ja) * | 2013-01-08 | 2016-02-25 | フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン | 背景ノイズにおけるスピーチ了解度を増幅及び圧縮により向上させる装置と方法 |
JP2016039398A (ja) * | 2014-08-05 | 2016-03-22 | 沖電気工業株式会社 | 残響判定装置及びプログラム |
JP2019198073A (ja) * | 2018-05-11 | 2019-11-14 | シバントス ピーティーイー リミテッド | 補聴器の作動方法および補聴器 |
WO2020026727A1 (ja) * | 2018-08-02 | 2020-02-06 | 日本電信電話株式会社 | 集音装置 |
WO2020148859A1 (ja) * | 2019-01-17 | 2020-07-23 | Toa株式会社 | マイクロホン装置 |
WO2022137806A1 (ja) * | 2020-12-25 | 2022-06-30 | パナソニックIpマネジメント株式会社 | 耳装着型デバイス、及び、再生方法 |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140112483A1 (en) * | 2012-10-24 | 2014-04-24 | Alcatel-Lucent Usa Inc. | Distance-based automatic gain control and proximity-effect compensation |
US9685171B1 (en) * | 2012-11-20 | 2017-06-20 | Amazon Technologies, Inc. | Multiple-stage adaptive filtering of audio signals |
JP6326071B2 (ja) | 2013-03-07 | 2018-05-16 | アップル インコーポレイテッド | 部屋およびプログラム反応型ラウドスピーカシステム |
DE102013207149A1 (de) * | 2013-04-19 | 2014-11-06 | Siemens Medical Instruments Pte. Ltd. | Steuerung der Effektstärke eines binauralen direktionalen Mikrofons |
EP2876900A1 (en) | 2013-11-25 | 2015-05-27 | Oticon A/S | Spatial filter bank for hearing system |
CN105474610B (zh) * | 2014-07-28 | 2018-04-10 | 华为技术有限公司 | 通信设备的声音信号处理方法和设备 |
JP6450458B2 (ja) * | 2014-11-19 | 2019-01-09 | シバントス ピーティーイー リミテッド | 自身の声を迅速に検出する方法と装置 |
CN105100413B (zh) * | 2015-05-27 | 2018-08-07 | 努比亚技术有限公司 | 一种信息处理方法及装置、终端 |
DE102015210652B4 (de) * | 2015-06-10 | 2019-08-08 | Sivantos Pte. Ltd. | Verfahren zur Verbesserung eines Aufnahmesignals in einem Hörsystem |
KR20170035504A (ko) * | 2015-09-23 | 2017-03-31 | 삼성전자주식회사 | 전자 장치 및 전자 장치의 오디오 처리 방법 |
JP6828804B2 (ja) | 2017-03-24 | 2021-02-10 | ヤマハ株式会社 | 収音装置および収音方法 |
DE102017215823B3 (de) * | 2017-09-07 | 2018-09-20 | Sivantos Pte. Ltd. | Verfahren zum Betrieb eines Hörgerätes |
WO2019160006A1 (ja) * | 2018-02-16 | 2019-08-22 | 日本電信電話株式会社 | ハウリング抑圧装置、その方法、およびプログラム |
US10939202B2 (en) * | 2018-04-05 | 2021-03-02 | Holger Stoltze | Controlling the direction of a microphone array beam in a video conferencing system |
CN112712790B (zh) * | 2020-12-23 | 2023-08-15 | 平安银行股份有限公司 | 针对目标说话人的语音提取方法、装置、设备及介质 |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05207587A (ja) * | 1992-01-24 | 1993-08-13 | Matsushita Electric Ind Co Ltd | マイクロホン装置 |
JPH09311696A (ja) * | 1996-05-21 | 1997-12-02 | Nippon Telegr & Teleph Corp <Ntt> | 自動利得調整装置 |
JP2004226656A (ja) * | 2003-01-22 | 2004-08-12 | Fujitsu Ltd | マイクロホンアレイを用いた話者距離検出装置及び方法並びに当該装置を用いた音声入出力装置 |
JP2008312002A (ja) * | 2007-06-15 | 2008-12-25 | Yamaha Corp | テレビ会議装置 |
JP2009036810A (ja) | 2007-07-31 | 2009-02-19 | National Institute Of Information & Communication Technology | 近傍場音源分離プログラム、及びこのプログラムを記録したコンピュータ読取可能な記録媒体、並びに近傍場音源分離方法 |
JP2009242602A (ja) | 2008-03-31 | 2009-10-22 | Panasonic Corp | 粘着シート |
JP2010112996A (ja) * | 2008-11-04 | 2010-05-20 | Sony Corp | 音声処理装置、音声処理方法およびプログラム |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0511696A (ja) * | 1991-07-05 | 1993-01-22 | Sumitomo Electric Ind Ltd | 地図表示装置 |
WO1999037118A1 (fr) | 1998-01-16 | 1999-07-22 | Sony Corporation | Haut-parleur et appareil electronique utilisant un haut-parleur |
US6243322B1 (en) * | 1999-11-05 | 2001-06-05 | Wavemakers Research, Inc. | Method for estimating the distance of an acoustic signal |
US8326611B2 (en) * | 2007-05-25 | 2012-12-04 | Aliphcom, Inc. | Acoustic voice activity detection (AVAD) for electronic systems |
EP1413168A2 (en) * | 2001-07-20 | 2004-04-28 | Koninklijke Philips Electronics N.V. | Sound reinforcement system having an echo suppressor and loudspeaker beamformer |
JP4195267B2 (ja) * | 2002-03-14 | 2008-12-10 | インターナショナル・ビジネス・マシーンズ・コーポレーション | 音声認識装置、その音声認識方法及びプログラム |
JP5207587B2 (ja) * | 2005-02-18 | 2013-06-12 | 三洋電機株式会社 | 回路装置 |
US8180067B2 (en) * | 2006-04-28 | 2012-05-15 | Harman International Industries, Incorporated | System for selectively extracting components of an audio input signal |
CN101779476B (zh) * | 2007-06-13 | 2015-02-25 | 爱利富卡姆公司 | 全向性双麦克风阵列 |
-
2010
- 2010-10-20 JP JP2011537143A patent/JP5519689B2/ja active Active
- 2010-10-20 EP EP10824665.3A patent/EP2492912B1/en active Active
- 2010-10-20 CN CN2010800449129A patent/CN102549661B/zh active Active
- 2010-10-20 US US13/499,027 patent/US8755546B2/en active Active
- 2010-10-20 WO PCT/JP2010/006231 patent/WO2011048813A1/ja active Application Filing
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05207587A (ja) * | 1992-01-24 | 1993-08-13 | Matsushita Electric Ind Co Ltd | マイクロホン装置 |
JPH09311696A (ja) * | 1996-05-21 | 1997-12-02 | Nippon Telegr & Teleph Corp <Ntt> | 自動利得調整装置 |
JP2004226656A (ja) * | 2003-01-22 | 2004-08-12 | Fujitsu Ltd | マイクロホンアレイを用いた話者距離検出装置及び方法並びに当該装置を用いた音声入出力装置 |
JP2008312002A (ja) * | 2007-06-15 | 2008-12-25 | Yamaha Corp | テレビ会議装置 |
JP2009036810A (ja) | 2007-07-31 | 2009-02-19 | National Institute Of Information & Communication Technology | 近傍場音源分離プログラム、及びこのプログラムを記録したコンピュータ読取可能な記録媒体、並びに近傍場音源分離方法 |
JP2009242602A (ja) | 2008-03-31 | 2009-10-22 | Panasonic Corp | 粘着シート |
JP2010112996A (ja) * | 2008-11-04 | 2010-05-20 | Sony Corp | 音声処理装置、音声処理方法およびプログラム |
Non-Patent Citations (1)
Title |
---|
See also references of EP2492912A4 |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013061421A (ja) * | 2011-09-12 | 2013-04-04 | Oki Electric Ind Co Ltd | 音声信号処理装置、方法及びプログラム |
US9426566B2 (en) | 2011-09-12 | 2016-08-23 | Oki Electric Industry Co., Ltd. | Apparatus and method for suppressing noise from voice signal by adaptively updating Wiener filter coefficient by means of coherence |
CN103124165A (zh) * | 2011-11-14 | 2013-05-29 | 谷歌公司 | 自动增益控制 |
JP2013109346A (ja) * | 2011-11-14 | 2013-06-06 | Google Inc | 自動利得制御 |
JP2016505896A (ja) * | 2013-01-08 | 2016-02-25 | フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン | 背景ノイズにおけるスピーチ了解度を増幅及び圧縮により向上させる装置と方法 |
US10319394B2 (en) | 2013-01-08 | 2019-06-11 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for improving speech intelligibility in background noise by amplification and compression |
JP2014186295A (ja) * | 2013-02-21 | 2014-10-02 | Nippon Telegr & Teleph Corp <Ntt> | 音声区間検出装置、音声認識装置、その方法、及びプログラム |
JP2016039398A (ja) * | 2014-08-05 | 2016-03-22 | 沖電気工業株式会社 | 残響判定装置及びプログラム |
JP2019198073A (ja) * | 2018-05-11 | 2019-11-14 | シバントス ピーティーイー リミテッド | 補聴器の作動方法および補聴器 |
WO2020026727A1 (ja) * | 2018-08-02 | 2020-02-06 | 日本電信電話株式会社 | 集音装置 |
JP2020022115A (ja) * | 2018-08-02 | 2020-02-06 | 日本電信電話株式会社 | 集音装置 |
JP7210926B2 (ja) | 2018-08-02 | 2023-01-24 | 日本電信電話株式会社 | 集音装置 |
WO2020148859A1 (ja) * | 2019-01-17 | 2020-07-23 | Toa株式会社 | マイクロホン装置 |
JPWO2020148859A1 (ja) * | 2019-01-17 | 2021-11-25 | Toa株式会社 | マイクロホン装置 |
JP7422683B2 (ja) | 2019-01-17 | 2024-01-26 | Toa株式会社 | マイクロホン装置 |
WO2022137806A1 (ja) * | 2020-12-25 | 2022-06-30 | パナソニックIpマネジメント株式会社 | 耳装着型デバイス、及び、再生方法 |
Also Published As
Publication number | Publication date |
---|---|
JP5519689B2 (ja) | 2014-06-11 |
CN102549661A (zh) | 2012-07-04 |
US8755546B2 (en) | 2014-06-17 |
US20120189147A1 (en) | 2012-07-26 |
EP2492912B1 (en) | 2018-12-05 |
EP2492912A1 (en) | 2012-08-29 |
EP2492912A4 (en) | 2016-10-19 |
CN102549661B (zh) | 2013-10-09 |
JPWO2011048813A1 (ja) | 2013-03-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5519689B2 (ja) | 音響処理装置、音響処理方法及び補聴器 | |
US11109163B2 (en) | Hearing aid comprising a beam former filtering unit comprising a smoothing unit | |
US9591410B2 (en) | Hearing assistance apparatus | |
CN107872762B (zh) | 话音活动检测单元及包括话音活动检测单元的听力装置 | |
US10154353B2 (en) | Monaural speech intelligibility predictor unit, a hearing aid and a binaural hearing system | |
US9959886B2 (en) | Spectral comb voice activity detection | |
US8842861B2 (en) | Method of signal processing in a hearing aid system and a hearing aid system | |
CN108235181B (zh) | 在音频处理装置中降噪的方法 | |
US9241223B2 (en) | Directional filtering of audible signals | |
JP2023159381A (ja) | 音声認識オーディオシステムおよび方法 | |
EP2823482A2 (en) | Voice activity detection and pitch estimation | |
US9437213B2 (en) | Voice signal enhancement | |
JP5115818B2 (ja) | 音声信号強調装置 | |
JP2009075160A (ja) | コミュニケーション音声処理方法とその装置、及びそのプログラム | |
JP3411648B2 (ja) | 車載用オーディオ装置 | |
JP4098647B2 (ja) | 音響信号の残響除去方法、装置、及び音響信号の残響除去プログラム、そのプログラムを記録した記録媒体 | |
JP2005303574A (ja) | 音声認識ヘッドセット | |
JP6794887B2 (ja) | 音声処理用コンピュータプログラム、音声処理装置及び音声処理方法 | |
JP2020053841A (ja) | 音源方向判定装置、音源方向判定方法、及び音源方向判定プログラム | |
WO2023104215A1 (en) | Methods for synthesis-based clear hearing under noisy conditions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 201080044912.9 Country of ref document: CN |
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 10824665 Country of ref document: EP Kind code of ref document: A1 |
WWE | Wipo information: entry into national phase |
Ref document number: 2011537143 Country of ref document: JP |
WWE | Wipo information: entry into national phase |
Ref document number: 13499027 Country of ref document: US |
WWE | Wipo information: entry into national phase |
Ref document number: 2010824665 Country of ref document: EP |