WO2022259589A1 - Ear-worn device and reproduction method
- Publication number
- WO2022259589A1 (PCT/JP2022/000697)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sound
- signal
- sound signal
- ear
- human voice
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/10—Earpieces; Attachments therefor ; Earphones; Monophonic headphones
- H04R1/1041—Mechanical or electronic switches, or control elements
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61F—FILTERS IMPLANTABLE INTO BLOOD VESSELS; PROSTHESES; DEVICES PROVIDING PATENCY TO, OR PREVENTING COLLAPSING OF, TUBULAR STRUCTURES OF THE BODY, e.g. STENTS; ORTHOPAEDIC, NURSING OR CONTRACEPTIVE DEVICES; FOMENTATION; TREATMENT OR PROTECTION OF EYES OR EARS; BANDAGES, DRESSINGS OR ABSORBENT PADS; FIRST-AID KITS
- A61F11/00—Methods or devices for treatment of the ears or hearing sense; Non-electric hearing aids; Methods or devices for enabling ear patients to achieve auditory perception through physiological senses other than hearing sense; Protective devices for the ears, carried on the body or in the hand
- A61F11/06—Protective devices for the ears
- A61F11/08—Protective devices for the ears internal, e.g. earplugs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10K—SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
- G10K11/00—Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/16—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/175—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
- G10K11/178—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10K—SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
- G10K11/00—Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/16—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/175—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
- G10K11/178—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
- G10K11/1785—Methods, e.g. algorithms; Devices
- G10K11/17853—Methods, e.g. algorithms; Devices of the filter
- G10K11/17854—Methods, e.g. algorithms; Devices of the filter the filter being an adaptive filter
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10K—SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
- G10K11/00—Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/16—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/175—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
- G10K11/178—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
- G10K11/1787—General system configurations
- G10K11/17873—General system configurations using a reference signal without an error signal, e.g. pure feedforward
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10K—SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
- G10K11/00—Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/16—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/175—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
- G10K11/178—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
- G10K11/1787—General system configurations
- G10K11/17885—General system configurations additionally using a desired external signal, e.g. pass-through audio such as music or speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/10—Earpieces; Attachments therefor ; Earphones; Monophonic headphones
- H04R1/1016—Earpieces of the intra-aural type
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/10—Earpieces; Attachments therefor ; Earphones; Monophonic headphones
- H04R1/1083—Reduction of ambient noise
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R25/00—Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2460/00—Details of hearing devices, i.e. of ear- or headphones covered by H04R1/10 or H04R5/033 but not provided for in any of their subgroups, or of hearing aids covered by H04R25/00 but not provided for in any of its subgroups
- H04R2460/01—Hearing devices using active noise cancellation
Definitions
- the present disclosure relates to an ear-worn device and a reproduction method.
- Japanese Patent Laid-Open No. 2002-200002 discloses a technology related to audio reproducing headphones.
- the present disclosure provides an ear-mounted device capable of reproducing the voices of people heard in the surroundings according to the surrounding noise environment.
- An ear-worn device includes: a microphone that acquires sound and outputs a first sound signal of the acquired sound; a signal processing circuit that outputs a second sound signal based on the first sound signal when it determines that the sound satisfies a predetermined requirement related to a noise component included in the sound and that the sound includes a human voice; a speaker that outputs reproduced sound based on the output second sound signal; and a housing that houses the microphone, the signal processing circuit, and the speaker.
- An ear-mounted device can reproduce the voices of people heard in the surroundings according to the surrounding noise environment.
- FIG. 1 is an external view of a device that constitutes a sound signal processing system according to an embodiment.
- FIG. 2 is a block diagram showing the functional configuration of the sound signal processing system according to the embodiment.
- FIG. 3 is a flow chart of Example 1 of the ear-mounted device according to the embodiment.
- FIG. 4 is a first flowchart of the operation in the external sound capture mode of the ear-mounted device according to the embodiment.
- FIG. 5 is a second flowchart of the operation of the ear-mounted device according to the embodiment in the external sound capture mode.
- FIG. 6 is a flowchart of operations in the noise canceling mode of the ear-worn device according to the embodiment.
- FIG. 7 is a flow chart of Example 2 of the ear-mounted device according to the embodiment.
- FIG. 8 is a diagram showing an example of an operation mode selection screen.
- FIG. 9 is a flow chart of Example 3 of the ear-mounted device according to the embodiment.
- FIG. 10 is a first diagram showing temporal changes in spectral flatness.
- FIG. 11 is a second diagram showing temporal changes in spectral flatness.
- FIG. 12 is a third diagram showing temporal changes in spectral flatness.
- FIG. 13 is a fourth diagram showing temporal changes in spectral flatness.
- FIG. 14 is a first diagram showing the spectrogram of the first sound signal.
- FIG. 15 is a second diagram showing the spectrogram of the first sound signal.
- FIG. 16 is a third diagram showing the spectrogram of the first sound signal.
- FIG. 17 is a fourth diagram showing the spectrogram of the first sound signal.
- FIG. 18 is a block diagram showing the functional configuration of a noise removal filter that functions as an adaptive filter.
- each figure is a schematic diagram and is not necessarily strictly illustrated. Moreover, in each figure, the same code
- FIG. 1 is an external view of a device that constitutes a sound signal processing system according to an embodiment.
- FIG. 2 is a block diagram showing the functional configuration of the sound signal processing system according to the embodiment.
- the sound signal processing system 10 includes an ear-worn device 20 and a mobile terminal 30.
- the ear-worn device 20 is an earphone-type device that reproduces the fourth sound signal provided from the mobile terminal 30 .
- the fourth sound signal is, for example, a sound signal of music content.
- the ear-worn device 20 has an external sound capturing function (also referred to as an external sound capturing mode) that captures the surrounding sounds of the user during reproduction of the fourth sound signal.
- the surrounding sounds here are, for example, announcement sounds.
- The announcement sound is, for example, a sound output inside a moving body such as a train, a bus, or an airplane from a speaker provided in that moving body.
- the announcement sound includes human voice.
- The ear-worn device 20 can operate in a normal mode, in which it reproduces the fourth sound signal provided from the mobile terminal 30, and in an external sound capture mode, in which it captures and reproduces the sounds around the user. For example, suppose that the user wearing the ear-worn device 20 is riding a moving body in motion and is listening to music content in the normal mode. If an announcement sound is output inside the moving body and that announcement sound contains a human voice, the ear-worn device 20 automatically transitions from the normal mode to the external sound capture mode. This prevents the user from missing the announcement sound.
- the ear-worn device 20 specifically includes a microphone 21, a DSP 22, a communication circuit 27a, a mixing circuit 27b, and a speaker 28.
- the communication circuit 27a and the mixing circuit 27b may be included in the DSP 22.
- Microphone 21, DSP 22, communication circuit 27a, mixing circuit 27b, and speaker 28 are housed in housing 29 (shown in FIG. 1).
- the microphone 21 is a sound pickup device that acquires sounds around the ear-mounted device 20 and outputs a first sound signal based on the acquired sounds.
- the microphone 21 is specifically a condenser microphone, a dynamic microphone, or a MEMS (Micro Electro Mechanical Systems) microphone, but is not particularly limited. Also, the microphone 21 may be omnidirectional or directional.
- the DSP 22 implements an external sound capture function by performing signal processing on the first sound signal output from the microphone 21 .
- the DSP 22 realizes an external sound capturing function by outputting a second sound signal based on the first sound signal to the speaker 28, for example.
- the DSP 22 also has a noise canceling function and can output a third sound signal obtained by performing phase inversion processing on the first sound signal.
- DSP22 is an example of a signal processing circuit.
- The DSP 22 specifically has a filter circuit 23, a CPU (Central Processing Unit) 24, and a memory 26.
- The filter circuit 23 includes a noise removal filter 23a, a high-pass filter 23b, and a low-pass filter 23c.
- The noise removal filter 23a is a filter for removing noise contained in the first sound signal output from the microphone 21.
- The noise removal filter 23a is, for example, a non-linear digital filter, but may be a filter using a spectral subtraction method that removes noise in the frequency domain.
- The noise removal filter 23a may also be a Wiener filter.
- The high-pass filter 23b attenuates components in the band of 512 Hz or lower contained in the noise-removed first sound signal output from the noise removal filter 23a.
- The low-pass filter 23c attenuates components in the band of 512 Hz or higher contained in the first sound signal output from the microphone 21.
- these cutoff frequencies are examples, and the cutoff frequencies may be determined empirically or experimentally. The cutoff frequency is determined, for example, according to the type of mobile object in which the ear-worn device 20 is supposed to be used.
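- The split into a low band (used for the noise feature) and a high band (used for the voice feature) can be sketched as follows. This is only an illustration: the text does not specify the filter design, so the first-order filters, the 16 kHz sample rate, and all function names here are assumptions.

```python
import math


def one_pole_low_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order IIR low-pass: attenuates components above cutoff_hz."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out, state = [], 0.0
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out


def one_pole_high_pass(samples, cutoff_hz, sample_rate_hz):
    """Complementary high-pass: the input minus its low-passed version."""
    low = one_pole_low_pass(samples, cutoff_hz, sample_rate_hz)
    return [x - lo for x, lo in zip(samples, low)]


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def sine(freq_hz, sample_rate_hz, n):
    """n samples of a unit-amplitude sine at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate_hz) for t in range(n)]


CUTOFF_HZ = 512.0  # example cutoff taken from the text
FS = 16000         # assumed sample rate, not stated in the text
low_tone = sine(50.0, FS, 1600)     # stands in for low-frequency rumble
high_tone = sine(4000.0, FS, 1600)  # stands in for higher-frequency content

print("low-pass :", rms(one_pole_low_pass(low_tone, CUTOFF_HZ, FS)),
      rms(one_pole_low_pass(high_tone, CUTOFF_HZ, FS)))
print("high-pass:", rms(one_pole_high_pass(low_tone, CUTOFF_HZ, FS)),
      rms(one_pole_high_pass(high_tone, CUTOFF_HZ, FS)))
```

- A first-order filter rolls off gently, so a real device would likely use a steeper design; the sketch only shows which band each path keeps.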
- The CPU 24 includes, as functional components, a sound feature amount calculation unit 24a, a noise feature amount calculation unit 24b, a determination unit 24c, and a switching unit 24d.
- the functions of the sound feature amount calculation unit 24a, the noise feature amount calculation unit 24b, the determination unit 24c, and the switching unit 24d are realized by the CPU 24 executing a computer program stored in the memory 26, for example.
- the details of the functions of the sound feature amount calculation unit 24a, the noise feature amount calculation unit 24b, the determination unit 24c, and the switching unit 24d will be described later.
- the memory 26 is a storage device that stores computer programs executed by the CPU 24 and various information necessary for realizing the external sound capturing function.
- The memory 26 is implemented by a semiconductor memory or the like. Note that the memory 26 may be realized as an external memory of the DSP 22 instead of an internal memory of the DSP 22.
- The communication circuit 27a receives the fourth sound signal from the mobile terminal 30.
- The communication circuit 27a is, for example, a wireless communication circuit, and communicates with the mobile terminal 30 based on a communication standard such as Bluetooth (registered trademark) or BLE (Bluetooth (registered trademark) Low Energy).
- The mixing circuit 27b mixes the fourth sound signal received by the communication circuit 27a with one of the second sound signal and the third sound signal output by the DSP 22, and outputs the result to the speaker 28.
- the communication circuit 27a and the mixing circuit 27b may be realized as one SoC (System-on-a-Chip).
- the speaker 28 outputs reproduced sound based on the mixed sound signal obtained from the mixing circuit 27b.
- the speaker 28 is a speaker that emits sound waves toward the ear canal (eardrum) of the user wearing the ear-worn device 20, but may be a bone conduction speaker.
- the mobile terminal 30 is an information terminal that functions as a user interface device in the sound signal processing system 10 by installing a predetermined application program.
- The mobile terminal 30 also functions as a sound source that provides the ear-worn device 20 with the fourth sound signal (music content). Specifically, by operating the mobile terminal 30, the user can select the music content to be reproduced by the speaker 28, switch the operation mode of the ear-worn device 20, and so on.
- The mobile terminal 30 includes a UI (User Interface) 31, a communication circuit 32, a CPU 33, and a memory 34.
- The UI 31 is a user interface device that receives user operations and presents images to the user.
- The UI 31 is implemented by an operation reception unit such as a touch panel and a display unit such as a display panel.
- The communication circuit 32 transmits the fourth sound signal, which is the sound signal of the music content selected by the user, to the ear-worn device 20.
- The communication circuit 32 is, for example, a wireless communication circuit, and communicates with the ear-worn device 20 based on a communication standard such as Bluetooth (registered trademark) or BLE (Bluetooth (registered trademark) Low Energy).
- The CPU 33 performs information processing related to image display on the display unit, transmission of the fourth sound signal using the communication circuit 32, and the like.
- The CPU 33 is implemented by, for example, a microcomputer, but may be implemented by a processor.
- The image display function, the fourth sound signal transmission function, and the like are realized by the CPU 33 executing a computer program stored in the memory 34.
- the memory 34 is a storage device that stores various information necessary for the CPU 33 to process information, a computer program executed by the CPU 33, a fourth sound signal (music content), and the like.
- the memory 34 is implemented by, for example, a semiconductor memory.
- Example 1
- As described above, the ear-worn device 20 can automatically operate in the external sound capture mode when the moving body on which the user is riding is moving and an announcement sound is output inside the moving body.
- Several examples of the ear-worn device 20 are described below, taking specific situations as examples.
- Example 1 of the ear-mounted device 20 will be described.
- FIG. 3 is a flow chart of Example 1 of the ear-worn device 20 . It should be noted that Example 1 shows an operation that is assumed to be used when the user wearing the ear-mounted device 20 is on a mobile object.
- the microphone 21 acquires sound and outputs a first sound signal of the acquired sound (S11).
- The noise feature amount calculation unit 24b calculates the spectral flatness by performing signal processing on the first sound signal that is output from the microphone 21 and to which the low-pass filter 23c has been applied (S12).
- Spectral flatness is an example of a feature quantity of noise included in the first sound signal, and specifically, a feature quantity indicating the flatness of the signal.
- Spectral flatness indicates, for example, how close the first sound signal is to noise, such as white noise, pink noise, or brown noise.
- Since the cutoff frequency of the low-pass filter 23c is 512 Hz, the spectral flatness calculated in step S12 can be said to indicate the flatness of the noise at 512 Hz and below.
- Let S_k be the complex spectrum of the first sound signal to which the low-pass filter 23c has been applied, and let N_FFT be the number of frequency bins of the Fourier transform (in other words, the number of FFT calculation points or sampling points). The spectral flatness SF is then calculated by the following formula. Note that exp[x] means e to the power of x, and ln(x) means log_e(x).
- The numerator on the right side of the formula corresponds to the entropy calculation, and the denominator corresponds to the calculation for normalizing the entropy.
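- The formula itself is not reproduced in this text. As a hedged reconstruction, the standard definition of spectral flatness (geometric mean over arithmetic mean), written with the symbols defined above, would read as follows. The use of the magnitude |S_k| rather than the power |S_k|^2 is an assumption; both conventions are common.

```latex
SF = \frac{\exp\left[\dfrac{1}{N_{FFT}} \displaystyle\sum_{k=0}^{N_{FFT}-1} \ln\left(\lvert S_k \rvert\right)\right]}
          {\dfrac{1}{N_{FFT}} \displaystyle\sum_{k=0}^{N_{FFT}-1} \lvert S_k \rvert}
```

- By the inequality of arithmetic and geometric means, this ratio lies between 0 and 1 and reaches 1 only when all |S_k| are equal, that is, when the spectrum is perfectly flat.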
- The sound feature amount calculation unit 24a calculates MFCCs (Mel-Frequency Cepstral Coefficients) by performing signal processing on the first sound signal that is output from the microphone 21 and to which the noise removal filter 23a and the high-pass filter 23b have been applied (S13).
- MFCCs are cepstral coefficients used as feature quantities in speech recognition and similar tasks.
- the determination unit 24c determines whether or not the sound acquired by the microphone 21 satisfies predetermined requirements related to noise components included in the sound (S14). Specifically, the determination unit 24c determines whether or not the value of the spectral flatness SF output from the noise feature amount calculation unit 24b is equal to or greater than a threshold.
- The spectral flatness SF takes a value between 0 and 1, and the closer the value is to 1, the closer the sound acquired by the microphone 21 is considered to be to white noise. In other words, when the value of the spectral flatness SF is equal to or greater than the threshold, the moving body on which the user is riding can be considered to be moving. Step S14 can therefore be rephrased as a step of determining whether the moving body is moving.
- When the value of the spectral flatness SF is equal to or greater than the threshold, the determination unit 24c determines that the sound acquired by the microphone 21 satisfies the predetermined requirement (Yes in S14) and performs the process of step S15.
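- The check in step S14 can be sketched as below. This is a minimal illustration, not the device's implementation: it assumes the standard geometric-mean over arithmetic-mean definition of spectral flatness on the magnitude spectrum, uses a naive DFT to stay dependency-free, and the threshold of 0.5 and all function names are hypothetical.

```python
import cmath
import math
import random


def dft_magnitudes(frame):
    """Magnitudes |S_k| of a naive O(N^2) DFT of a real-valued frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n)
    ]


def spectral_flatness(frame, eps=1e-12):
    """Geometric mean over arithmetic mean of |S_k| (eps avoids log(0))."""
    mags = [m + eps for m in dft_magnitudes(frame)]
    geometric = math.exp(sum(math.log(m) for m in mags) / len(mags))
    arithmetic = sum(mags) / len(mags)
    return geometric / arithmetic


def satisfies_noise_requirement(frame, threshold=0.5):
    """Step S14 sketch: True when SF is at or above the assumed threshold."""
    return spectral_flatness(frame) >= threshold


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(128)]                 # noise-like frame
tone = [math.sin(2 * math.pi * 8 * t / 128) for t in range(128)]  # single pure tone

print("noise SF:", spectral_flatness(noise), satisfies_noise_requirement(noise))
print("tone  SF:", spectral_flatness(tone), satisfies_noise_requirement(tone))
```

- The noise-like frame yields a value much closer to 1 than the pure tone, which is the property the threshold comparison relies on.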
- the determination unit 24c determines whether or not the sound acquired by the microphone 21 includes a human voice, based on the MFCC output from the sound feature amount calculation unit 24a (S15).
- The determination unit 24c includes, for example, a machine learning model (neural network) that receives the MFCC as an input and outputs a determination result as to whether the sound contains a human voice. Using such a machine learning model, the determination unit 24c determines whether the sound acquired by the microphone 21 includes a human voice.
- the human voice here is assumed to be the human voice included in the announcement sound.
- When it is determined that the sound acquired by the microphone 21 includes a human voice (Yes in S15), the switching unit 24d operates in the external sound capture mode (S16). That is, the ear-worn device 20 (switching unit 24d) operates in the external sound capture mode (S16) when the moving body is moving (Yes in S14) and a human voice is being output (Yes in S15).
- FIG. 4 is a first flow chart of operations in the ambient sound capture mode.
- In the external sound capture mode, the switching unit 24d generates a second sound signal by performing equalizing processing that emphasizes a specific frequency component of the first sound signal output by the microphone 21, and outputs the generated second sound signal (S16a).
- The specific frequency component is, for example, a frequency component of 100 Hz or more and 2 kHz or less. Emphasizing the band corresponding to the frequency band of the human voice in this way emphasizes the human voice, and therefore the announcement sound (more specifically, the human voice included in the announcement sound).
- The mixing circuit 27b mixes the second sound signal with the fourth sound signal (music content) received by the communication circuit 27a and outputs the result to the speaker 28 (S16b), and the speaker 28 outputs reproduced sound based on the fourth sound signal mixed with the second sound signal (S16c).
- the announcement sound is emphasized, so that the user of the ear-worn device 20 can easily hear the announcement sound.
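- The equalizing step S16a can be illustrated with a frequency-domain gain applied to the 100 Hz to 2 kHz band. This is a sketch under stated assumptions: the text does not say how the equalizer is built, so the DFT-based approach, the gain of 2.0, the 8 kHz sample rate, and the function name are all hypothetical.

```python
import cmath
import math


def emphasize_band(frame, sample_rate_hz, low_hz=100.0, high_hz=2000.0, gain=2.0):
    """Scale DFT bins whose frequency lies in [low_hz, high_hz] by gain."""
    n = len(frame)
    # Forward DFT (naive O(N^2) form, adequate for a short demo frame).
    spectrum = [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]
    for k in range(n):
        # Bin k and its mirror n - k represent the same physical frequency.
        freq_hz = min(k, n - k) * sample_rate_hz / n
        if low_hz <= freq_hz <= high_hz:
            spectrum[k] *= gain
    # Inverse DFT; the result is real because the gain is applied symmetrically.
    return [
        (sum(s * cmath.exp(2j * cmath.pi * k * t / n) for k, s in enumerate(spectrum)) / n).real
        for t in range(n)
    ]


FS = 8000  # assumed sample rate
N = 160    # frame length giving 50 Hz bin spacing
in_band = [math.sin(2 * math.pi * 500 * t / FS) for t in range(N)]    # inside the band
out_band = [math.sin(2 * math.pi * 3000 * t / FS) for t in range(N)]  # outside the band

print("500 Hz peak after EQ :", max(emphasize_band(in_band, FS)))
print("3000 Hz peak after EQ:", max(emphasize_band(out_band, FS)))
```

- Only the tone inside the emphasized band is amplified; the tone outside it passes through at its original level.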
- On the other hand, when it is determined that the sound acquired by the microphone 21 does not satisfy the predetermined requirement (the value of the spectral flatness SF is less than the threshold) (No in S14 of FIG. 3), or when it is determined that the sound does not include a human voice (Yes in S14 and No in S15), the switching unit 24d operates in the normal mode (S17).
- In the normal mode, the reproduced sound (music content) of the fourth sound signal received by the communication circuit 27a is output from the speaker 28, and the reproduced sound based on the second sound signal is not output. That is, the switching unit 24d does not cause the speaker 28 to output the reproduced sound based on the second sound signal.
- the processing shown in the flowchart of FIG. 3 above is repeated at predetermined time intervals. That is, it is determined in which mode, the normal mode or the external sound capturing mode, the operation is to be performed at predetermined time intervals.
- the predetermined time is, for example, 1/60 second.
- The ear-worn device 20 operates in the external sound capture mode only when the condition that the moving body is moving and a human voice is being output is satisfied (that is, only when Yes in step S14 and Yes in step S15), and otherwise operates in the normal mode.
- When the DSP 22 determines that the noise contained in the sound acquired by the microphone 21 satisfies the predetermined requirement and that the sound contains a human voice, it outputs a second sound signal based on the first sound signal.
- More specifically, when the DSP 22 determines that the sound acquired by the microphone 21 satisfies the predetermined requirement related to the noise component included in the sound and that the sound includes a human voice, it outputs the second sound signal obtained by applying signal processing to the first sound signal.
- This signal processing includes equalizing processing for emphasizing specific frequency components of the sound.
- When the DSP 22 determines that the sound acquired by the microphone 21 does not satisfy the predetermined requirement, or determines that the sound does not include a human voice, it does not cause the speaker 28 to output reproduced sound based on the second sound signal.
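- The two-stage decision of steps S14 to S17 reduces to a small function. The sketch below assumes the spectral flatness value and the voice-detection result are already available; the names `Mode` and `select_mode` and the threshold of 0.5 are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    EXTERNAL_SOUND_CAPTURE = "external_sound_capture"


def select_mode(spectral_flatness, contains_voice, threshold=0.5):
    """Return the operating mode for one decision cycle (FIG. 3 sketch)."""
    noise_requirement_met = spectral_flatness >= threshold  # S14
    if noise_requirement_met and contains_voice:            # S15
        return Mode.EXTERNAL_SOUND_CAPTURE                  # S16
    return Mode.NORMAL                                      # S17


# Moving vehicle with an announcement, moving without one, and stationary.
print(select_mode(0.8, True))
print(select_mode(0.8, False))
print(select_mode(0.2, True))
```

- In the device this decision is repeated at predetermined time intervals, so the mode follows the current noise and voice conditions.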
- the ear-worn device 20 can assist the user on the mobile body to hear the announcement sound while the mobile body is moving. Even if the user is immersed in the music content, it becomes difficult for the user to miss the announcement sound.
- The operation in the external sound capture mode is not limited to the operation shown in FIG. 4.
- For example, performing the equalizing processing in step S16a is not essential; the second sound signal may instead be generated by signal processing that increases the gain (amplitude) of the first sound signal.
- Also, in the external sound capture mode, it is not essential that the first sound signal be subjected to signal processing.
- FIG. 5 is a second flowchart of the operation in the ambient sound capturing mode.
- the switching unit 24d outputs the first sound signal output by the microphone 21 as the second sound signal (S16d). That is, the switching unit 24d outputs the first sound signal substantially as it is as the second sound signal.
- the switching unit 24d also instructs the mixing circuit 27b to attenuate the fourth sound signal (gain down, amplitude attenuation) during mixing.
- The mixing circuit 27b mixes the second sound signal with the fourth sound signal (music content) whose amplitude is attenuated compared with the normal mode, and outputs the result to the speaker 28 (S16e). The speaker 28 outputs reproduced sound based on the attenuated fourth sound signal mixed with the second sound signal (S16f).
- In this way, after output of the second sound signal has started, the fourth sound signal may be mixed with the second sound signal at an amplitude attenuated relative to normal-mode operation before output of the second sound signal started.
- the operation in the external sound capturing mode is not limited to the operation shown in FIGS. 4 and 5.
- the process of attenuating the fourth sound signal may be omitted, and the unattenuated fourth sound signal may be mixed with the second sound signal.
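- The mixing variants described for the external sound capture mode differ only in the gains applied before summation. A minimal sketch, assuming sample-wise addition of equal-length frames; the function name `mix` and the gain values are hypothetical.

```python
def mix(music, ambient, music_gain=1.0, ambient_gain=1.0):
    """Sample-wise weighted sum of the music (fourth) and ambient (second) signals."""
    if len(music) != len(ambient):
        raise ValueError("frames must have the same length")
    return [music_gain * m + ambient_gain * a for m, a in zip(music, ambient)]


music = [1.0, -1.0]   # fourth sound signal (music content)
ambient = [0.5, 0.5]  # second sound signal (captured external sound)

# FIG. 5 style: pass the ambient sound as is and attenuate the music.
print(mix(music, ambient, music_gain=0.25))

# Gain-up style: amplify the ambient sound and leave the music unattenuated.
print(mix(music, ambient, ambient_gain=2.0))
```

- Attenuating the music rather than boosting the ambient path keeps the summed signal within a smaller range, which is one reason a designer might prefer the FIG. 5 variant.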
- The ear-worn device 20 has a noise canceling function (hereinafter also referred to as a noise canceling mode) that reduces environmental sounds around the user wearing the ear-worn device 20 during reproduction of the fourth sound signal (music content).
- the noise cancellation mode will be explained.
- The CPU 33 uses the communication circuit 32 to send the ear-worn device 20 a setting command for setting the noise canceling mode.
- When the setting command is received by the communication circuit 27a of the ear-worn device 20, the switching unit 24d operates in the noise canceling mode.
- FIG. 6 is a flowchart of operations in noise cancellation mode.
- the switching unit 24d performs phase inversion processing on the first sound signal output by the microphone 21 and outputs it as a third sound signal (S18a).
- a specific frequency component is, for example, a frequency component of 100 Hz or more and 2 kHz or less.
- the mixing circuit 27b mixes the fourth sound signal (music content) received by the communication circuit 27a with the third sound signal and outputs the result to the speaker 28 (S18b).
- the speaker 28 outputs a reproduced sound based on the third sound signal mixed with the fourth sound signal (S18c).
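As a rough sketch of steps S18a and S18b, the phase inversion and the subsequent mixing might look like the code below. The function name is hypothetical, and the sketch ignores the filtering and latency compensation that a practical noise canceller would need; it only shows the sign flip and the sample-wise mix.

```python
from typing import List, Sequence


def noise_cancel_mix(first_signal: Sequence[float], music: Sequence[float]) -> List[float]:
    """Invert the microphone signal (third sound signal) and mix it with the music."""
    if len(first_signal) != len(music):
        raise ValueError("both signals must have the same number of samples")
    # S18a: phase inversion of the first sound signal gives the third sound signal.
    third_signal = [-s for s in first_signal]
    # S18b: mix the fourth sound signal (music content) with the third sound signal.
    return [m + t for m, t in zip(music, third_signal)]
```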
- FIG. 7 is a flow chart of Example 2 of the ear-worn device 20 .
- Example 2 shows the operation when the user wearing the ear-worn device 20 rides on a moving object.
- the processing of steps S11 to S13 in FIG. 7 is the same as the processing of steps S11 to S13 in the first embodiment (FIG. 3).
- the determination unit 24c determines whether or not the sound acquired by the microphone 21 satisfies a predetermined requirement related to noise components included in the sound (S14). Details of the processing in step S14 are the same as in step S14 of the first embodiment (FIG. 3). Specifically, the determination unit 24c determines whether or not the value of the spectral flatness SF is equal to or greater than a threshold.
- the determination unit 24c determines that the sound acquired by the microphone 21 satisfies a predetermined requirement (Yes in S14), and performs the process of step S15.
- the determination unit 24c determines whether or not the sound acquired by the microphone 21 includes a human voice, based on the MFCC output from the voice feature amount calculation unit 24a (S15).
- the details of the processing of step S15 are the same as those of step S15 of the first embodiment (FIG. 3).
- when it is determined that the sound acquired by the microphone 21 includes a human voice (Yes in S15), the switching unit 24d operates in the external sound capture mode (S16). That is, the ear-worn device 20 (switching unit 24d) operates in the external sound capture mode (S16) when the moving body is moving (Yes in S14) and a human voice is being output (Yes in S15). The operation in the external sound capture mode is as described with reference to FIGS. 4 and 5 and the like. Since the announcement sound is emphasized by the operation in the external sound capture mode, the user of the ear-worn device 20 can easily hear the announcement sound.
- when it is determined that the sound does not satisfy the predetermined requirement (No in S14) or that the sound does not include a human voice (No in S15), the switching unit 24d operates in the noise canceling mode (S18).
- the noise canceling mode operation is as described with reference to FIG. 6.
- the above processing shown in the flowchart of FIG. 7 is repeated at predetermined time intervals. That is, at each predetermined time interval it is determined whether to operate in the noise canceling mode or in the external sound capture mode.
- the predetermined time is, for example, 1/60 second.
- the ear-worn device 20 operates in the external sound capture mode only when the condition that the moving object is moving and a human voice is being output is satisfied (that is, only when Yes in step S14 and Yes in step S15), and operates in the noise canceling mode in all other cases.
- when the DSP 22 determines that the sound acquired by the microphone 21 does not satisfy the predetermined requirement related to the noise component included in the sound, or that the sound does not include a human voice, it outputs the third sound signal obtained by performing the phase inversion process on the first sound signal.
- the speaker 28 outputs reproduced sound based on the outputted third sound signal.
- the ear-worn device 20 can help the user on a mobile object to clearly listen to music content while the mobile object is moving.
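The decision flow of Example 2 (FIG. 7) reduces to a small amount of branching logic. The sketch below is an illustration under assumptions: the names `Mode` and `select_mode_example2` are invented here, the voice decision is passed in as a precomputed boolean, and the threshold value is left to the caller because the source does not fix one.

```python
from enum import Enum


class Mode(Enum):
    EXTERNAL_SOUND_CAPTURE = "external sound capture"
    NOISE_CANCEL = "noise canceling"


def select_mode_example2(spectral_flatness: float, voice_detected: bool, sf_threshold: float) -> Mode:
    """Pick the operating mode for one decision interval, following steps S14 to S18."""
    # S14: the noise requirement is met when SF is at or above the threshold
    # (taken in the text as a sign that the moving body is moving).
    noise_requirement_met = spectral_flatness >= sf_threshold
    # S15: a human voice (for example, an announcement) must also be present.
    if noise_requirement_met and voice_detected:
        return Mode.EXTERNAL_SOUND_CAPTURE  # S16
    return Mode.NOISE_CANCEL  # S18
```

Calling this function once per decision interval mirrors the repeated evaluation described for the flowchart.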
- the UI 31 of the mobile terminal 30 displays, for example, a selection screen as shown in FIG. 8.
- FIG. 8 is a diagram showing an example of an operation mode selection screen.
- the user-selectable operating modes include, for example, three modes: normal mode, noise cancellation mode, and ambient sound capture mode. That is, the ear-worn device 20 may operate in the external sound capturing mode based on the user's operation on the mobile terminal 30 .
- Example 3: the ear-worn device 20 may determine whether or not noise satisfies a predetermined requirement (whether or not the moving object is moving) based on the spectral flatness SF calculated using the portion of the first sound signal that does not contain a human voice.
- FIG. 9 is a flow chart of Example 3 of such an ear-worn device 20 .
- Example 3 shows the operation when the user wearing the ear-mounted device 20 is riding a mobile object.
- the first sound signal includes a portion corresponding to the first period and a portion corresponding to the second period after the first period.
- the first period corresponds to a first partial signal indicating the first sound (that is, a signal that is part of the first sound signal).
- the second period corresponds to a second partial signal indicating the second sound (that is, a signal that is another part of the first sound signal).
- the second period is, for example, a fixed period immediately after the first period.
- the processing of steps S11 to S13 is the same as the processing of steps S11 to S13 of the first embodiment (FIG. 3).
- the determination unit 24c determines whether or not the first sound acquired by the microphone 21 includes a human voice based on the MFCC output from the voice feature amount calculation unit 24a (S19).
- when it is determined that the first sound does not include a human voice (No in S19), the determination unit 24c determines whether or not the first sound acquired by the microphone 21 satisfies a predetermined requirement related to the noise component included in the first sound (S20). Specifically, the determination unit 24c determines whether or not the value of the spectral flatness SF is equal to or greater than the threshold.
- the determination unit 24c determines that the first sound acquired by the microphone 21 satisfies the predetermined requirement (Yes in S20), and performs the process of step S21.
- the determination unit 24c determines whether or not the second sound acquired by the microphone 21 includes a human voice based on the MFCC output from the voice feature amount calculation unit 24a (S21).
- when it is determined that the second sound acquired by the microphone 21 includes a human voice (Yes in S21), the switching unit 24d operates in the external sound capture mode (S16).
- the operation in the external sound capture mode is as described with reference to FIGS. 4 and 5 and the like. Since the announcement sound is emphasized according to the operation in the external sound capture mode, the user of the ear-worn device 20 can easily hear the announcement sound.
- in all other cases, the switching unit 24d operates in the normal mode (S17). Note that the noise canceling mode operation of step S18 may be performed instead of step S17.
- the noise canceling mode operation is as described with reference to FIG. 6.
- the processing shown in the flowchart of FIG. 9 above is repeated every predetermined time. That is, it is determined in which mode, the normal mode or the external sound capturing mode, the operation is to be performed at predetermined time intervals.
- the predetermined time is, for example, 1/60 second.
- the ear-worn device 20 operates in the external sound capture mode only when the condition that the moving object is moving and a human voice is being output is satisfied (that is, only when Yes in step S20 and Yes in step S21), and otherwise operates in the normal mode.
- the DSP 22 outputs the second sound signal when it determines that the first sound satisfies the predetermined requirement related to the noise component contained in the sound, that the first sound does not contain a human voice, and that the second sound contains a human voice.
- the ear-worn device 20 uses the portion of the first sound signal that does not contain a human voice to determine whether or not the noise satisfies the predetermined requirement, thereby improving the accuracy of the determination.
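The two-period decision of Example 3 (FIG. 9) can be sketched in the same way. All names below are hypothetical, the voice decisions for the first and second periods are assumed to be available as booleans, and the fallback mode is a parameter because the text allows either the normal mode (S17) or the noise canceling mode (S18).

```python
from enum import Enum


class Mode3(Enum):
    EXTERNAL_SOUND_CAPTURE = "external sound capture"
    NORMAL = "normal"
    NOISE_CANCEL = "noise canceling"


def select_mode_example3(
    first_has_voice: bool,
    first_spectral_flatness: float,
    second_has_voice: bool,
    sf_threshold: float,
    fallback: Mode3 = Mode3.NORMAL,
) -> Mode3:
    """Follow S19 -> S20 -> S21: judge noise on a voice-free period, then look for a voice."""
    if first_has_voice:  # S19: the first period must be free of voice
        return fallback
    if first_spectral_flatness < sf_threshold:  # S20: noise requirement not met
        return fallback
    if second_has_voice:  # S21: voice present in the following period
        return Mode3.EXTERNAL_SOUND_CAPTURE  # S16
    return fallback  # S17 (or S18 if the caller passes NOISE_CANCEL)
```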
- in the above embodiment, the determination unit 24c determined whether or not noise satisfies a predetermined requirement (whether or not the spectral flatness SF is equal to or greater than a threshold) based on the first sound signal to which the low-pass filter 23c is applied.
- the validity of this way of applying the low-pass filter 23c will be supplemented below with reference to the waveform of the spectral flatness SF.
- FIG. 10 shows a case in which the spectrum flatness SF is calculated for the components of 512 Hz or higher of the first sound signal acquired by the microphone 21 while the moving object is moving and an announcement sound is being output within the moving object.
- FIG. 11 shows the case where the spectrum flatness SF is calculated for the component below 512 Hz of the first sound signal acquired by the microphone 21 while the moving object is moving and the announcement sound is being output within the moving object.
- FIG. 12 shows a case where the spectrum flatness SF is calculated for the components of 512 Hz or higher of the first sound signal acquired by the microphone 21 when the moving object is stopped and an announcement sound is being output within the moving object.
- FIG. 13 shows a case where the spectrum flatness SF is calculated for the component below 512 Hz of the first sound signal acquired by the microphone 21 when the moving object is stopped and an announcement sound is being output within the moving object.
- the spectral flatness SF calculated based on the components of 512 Hz or higher of the first sound signal fluctuates greatly and is not suitable for determining whether or not the moving object is moving (whether or not the spectral flatness SF is equal to or greater than a threshold).
- the spectral flatness SF calculated based on the components of less than 512 Hz of the first sound signal has relatively small fluctuations and is suitable for determining whether or not the moving object is moving (whether or not the spectral flatness SF is equal to or greater than the threshold).
- that is, determining whether or not the moving object is moving (whether or not the spectral flatness SF is equal to or greater than the threshold) based on the first sound signal to which the low-pass filter 23c is applied improves the determination accuracy.
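For readers unfamiliar with the measure, spectral flatness is commonly defined as the ratio of the geometric mean to the arithmetic mean of the power spectrum. The sketch below uses that common definition with the standard library only; the source does not give its exact formula, frame length, or sample rate, so those are assumptions, and the band restriction here selects DFT bins below 512 Hz rather than applying a low-pass filter such as 23c.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Return |X[k]|^2 for k = 0..N/2 using a direct O(N^2) DFT (fine for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def low_band(power: Sequence[float], sample_rate_hz: float, frame_len: int,
             cutoff_hz: float = 512.0) -> List[float]:
    """Keep only the bins whose centre frequency is below the cutoff."""
    return [p for k, p in enumerate(power) if k * sample_rate_hz / frame_len < cutoff_hz]


def spectral_flatness(power: Sequence[float], eps: float = 1e-12) -> float:
    """Geometric mean divided by arithmetic mean; eps avoids log(0) on empty bins."""
    if not power:
        raise ValueError("power spectrum must not be empty")
    values = [p + eps for p in power]
    log_mean = sum(math.log(v) for v in values) / len(values)
    return math.exp(log_mean) / (sum(values) / len(values))
```

With this definition a flat, noise-like spectrum gives a value near 1 and a strongly tonal spectrum gives a value near 0, which is why comparing the value against a threshold can separate the two conditions.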
- the determination unit 24c can determine whether the moving body is moving or stopped.
- the threshold value is an example and may be determined empirically or experimentally by the designer as appropriate.
- the determination unit 24c may determine whether or not the noise satisfies a predetermined requirement based on whether or not the moving average value or moving median value of the spectral flatness SF is equal to or greater than a threshold.
- the threshold is also set to a value corresponding to the moving average value or the moving median value.
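One way to realize the moving average or moving median variant is a small stateful detector that smooths the per-frame SF values before thresholding. The class name, the window length of 5, and the interface are assumptions made for this sketch; the source only states that a moving average or moving median may be compared with a correspondingly chosen threshold.

```python
from collections import deque
from statistics import mean, median


class SmoothedFlatnessDetector:
    """Compare a moving average (or moving median) of SF against a threshold."""

    def __init__(self, threshold: float, window: int = 5, use_median: bool = False) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self._threshold = threshold
        self._use_median = use_median
        self._history = deque(maxlen=window)  # keeps only the most recent SF values

    def update(self, spectral_flatness: float) -> bool:
        """Add one SF value and report whether the smoothed value meets the threshold."""
        self._history.append(spectral_flatness)
        stat = median(self._history) if self._use_median else mean(self._history)
        return stat >= self._threshold
```

The median variant is less sensitive to a single outlying frame than the mean, at the cost of reacting more slowly to a genuine change.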
- the determination unit 24c determines whether or not the sound acquired by the microphone 21 includes human voice based on the first sound signal to which the high-pass filter 23b is applied. The validity of the method of applying the high-pass filter 23b will be supplemented below with reference to spectrograms.
- FIG. 14 is a diagram showing a spectrogram of the first sound signal acquired by the microphone 21 while the moving object is moving and an announcement sound is being output within the moving object.
- FIG. 15 is a diagram showing a spectrogram of the first sound signal acquired by the microphone 21 while the mobile body is moving and no announcement sound is output within the mobile body.
- FIG. 16 is a diagram showing a spectrogram of the first sound signal acquired by the microphone 21 when the moving object is stopped and an announcement sound is being output within the moving object.
- FIG. 17 is a diagram showing a spectrogram of the first sound signal acquired by the microphone 21 when the moving object is stationary and no announcement sound is being output within the moving object.
- the determination unit 24c can determine whether or not the sound acquired by the microphone 21 includes a human voice based on at least the 512 Hz or higher component of the first sound signal. Further, by making this determination based on the first sound signal to which the high-pass filter 23b is applied, the determination unit 24c can improve the determination accuracy.
- the noise removal filter 23a may be an adaptive filter. Specifically, as indicated by the dashed arrow pointing from the noise feature quantity calculation unit 24b to the noise removal filter 23a in FIG. 2, the noise removal filter 23a may update its filter coefficients using the value of the spectral flatness SF output from the noise feature quantity calculation unit 24b.
- FIG. 18 is a block diagram showing the functional configuration of the noise removal filter 23a that functions as an adaptive filter.
- the noise removal filter 23a implemented as an adaptive filter includes a filter coefficient updating section 23a1 and an adaptive filter section 23a2.
- the filter coefficient update unit 23a1 successively updates the coefficients of the adaptive filter based on the following update formula.
- w is the filter coefficient
- x is the first sound signal
- e is the error signal.
- the error signal is a signal corresponding to the difference between the first sound signal and the target signal after the filter coefficients have been applied.
- μ is a parameter (hereinafter also referred to as a step size parameter) indicating the update amount (step size) of the filter coefficient, and is a positive coefficient.
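The update formula itself is not reproduced in this text. A common least-mean-squares (LMS) form that is consistent with the symbols defined above is shown here as an assumption, not as the formula from the original publication; the time index n and the symbol μ for the step size parameter are introduced for the illustration.

```latex
w(n+1) = w(n) + \mu \, e(n) \, x(n)
```

Here w(n) is the filter coefficient vector at step n, x(n) is the vector of recent samples of the first sound signal, e(n) is the error signal, and μ > 0 controls how far the coefficients move per update.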
- the adaptive filter unit 23a2 applies a filter configured by the filter coefficients calculated by the filter coefficient updating unit 23a1 to the first sound signal, and outputs the filtered first sound signal (that is, the noise-removed first sound signal) to the high-pass filter 23b.
- the filter coefficient updating unit 23a1 may change the step size parameter using the value of the spectral flatness SF.
- the filter coefficient updating unit 23a1 changes the value of the step size parameter to a value larger than the current value as the value of the spectral flatness SF increases.
- the filter coefficient updating unit 23a1 changes the value of the step size parameter as follows using a first threshold and a second threshold larger than the first threshold.
- when the value of the spectral flatness SF is less than the first threshold, the filter coefficient updating unit 23a1 changes the step size parameter to a value smaller than the current value; when the value of the spectral flatness SF is equal to or greater than the first threshold and less than the second threshold, it maintains the step size parameter at its current value; and when the value of the spectral flatness SF is equal to or greater than the second threshold, it changes the step size parameter to a value larger than the current value.
- the noise removal filter 23a (filter coefficient updating unit 23a1) can accelerate adaptive learning when the noise is closer to white noise.
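The two-threshold rule for the step size parameter can be written compactly as below. The function name, the multiplicative factor of 1.5, and the clamping bounds are assumptions added for the sketch; the source only says that the value becomes smaller, stays the same, or becomes larger.

```python
def update_step_size(
    mu: float,
    spectral_flatness: float,
    first_threshold: float,
    second_threshold: float,
    factor: float = 1.5,   # hypothetical growth/shrink ratio
    mu_min: float = 1e-6,  # hypothetical lower bound
    mu_max: float = 1.0,   # hypothetical upper bound
) -> float:
    """Shrink, keep, or grow the step size depending on where SF falls."""
    if first_threshold >= second_threshold:
        raise ValueError("first_threshold must be smaller than second_threshold")
    if spectral_flatness < first_threshold:
        mu = mu / factor  # less noise-like input: adapt more slowly
    elif spectral_flatness >= second_threshold:
        mu = mu * factor  # closer to white noise: adapt faster
    # Between the thresholds the current value is kept; clamp to a safe range.
    return min(max(mu, mu_min), mu_max)
```

Clamping is a defensive choice for the sketch, since an unbounded step size can make an adaptive filter diverge.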
- the filter coefficient updating unit 23a1 does not have to change the step size parameter in the external sound capture mode. That is, the filter coefficient updating unit 23a1 may fix the step size parameter to a constant value in the external sound capture mode.
- the noise removal filter 23a is implemented as a feedforward control type adaptive filter using the first sound signal output by the microphone 21, but the noise removal filter 23a may also be implemented as a feedback control type adaptive filter.
- the noise removal filter 23a is not limited to a filter with fixed coefficients or an adaptive filter.
- the noise removal filter 23a may be a filter that has a plurality of filters of different types and that can switch among the plurality of filters based on the value of the spectral flatness SF.
- as described above, the ear-worn device 20 includes: the microphone 21 that acquires a sound and outputs the first sound signal of the acquired sound; the DSP 22 that outputs a second sound signal based on the first sound signal when it determines that the sound satisfies a predetermined requirement related to the noise component included in the sound and that the sound includes a human voice; the speaker 28 that outputs a reproduced sound based on the output second sound signal; and the housing 29 that houses the microphone 21, the DSP 22, and the speaker 28.
- the DSP 22 is an example of a signal processing circuit.
- Such an ear-mounted device 20 can reproduce the voices of people heard in the surroundings according to the surrounding noise environment.
- the ear-worn device 20 can output a reproduced sound including the announcement sound from the speaker 28 when an announcement sound is output inside the mobile object while the mobile object is moving.
- the DSP 22 outputs the first sound signal as the second sound signal when determining that the sound satisfies predetermined requirements and that the sound includes a human voice.
- Such an ear-mounted device 20 can reproduce the voice of a person who can be heard in the surroundings based on the first sound signal.
- when the DSP 22 determines that the sound satisfies the predetermined requirement and that the sound includes a human voice, the DSP 22 outputs a second sound signal obtained by performing signal processing on the first sound signal.
- Such an ear-mounted device 20 can reproduce the voices of people heard around it based on the signal-processed first sound signal.
- the signal processing includes equalizing processing for emphasizing a specific frequency component of the sound.
- Such an ear-mounted device 20 can emphasize and reproduce the voices of people heard in the surroundings.
- when the DSP 22 determines that the sound does not satisfy the predetermined requirement, or determines that the sound does not include a human voice, the DSP 22 does not cause the speaker 28 to output a reproduced sound based on the second sound signal.
- Such an ear-mounted device 20 can stop outputting the reproduced sound based on the second sound signal when, for example, no human voice can be heard in the surroundings.
- when the DSP 22 determines that the sound does not satisfy the predetermined requirement, or determines that the sound does not include a human voice, the DSP 22 outputs a third sound signal obtained by performing phase inversion processing on the first sound signal, and the speaker 28 outputs a reproduced sound based on the output third sound signal.
- Such an ear-mounted device 20 can make it difficult to hear surrounding sounds when, for example, people's voices cannot be heard around them.
- the ear-worn device 20 further includes a mixing circuit 27b that mixes the outputted second sound signal with the fourth sound signal provided from the sound source. When the DSP 22 starts outputting the second sound signal, the fourth sound signal, with its amplitude attenuated compared with before that output started, is mixed with the second sound signal.
- Such an ear-mounted device 20 can emphasize and reproduce the voices of people heard in the surroundings.
- the DSP 22 determines whether or not the sound satisfies a predetermined requirement based on the first sound signal to which the low-pass filter 23c is applied, and determines whether or not the sound includes a human voice based on the first sound signal to which the high-pass filter 23b is applied.
- Such an ear-mounted device 20 can improve the determination accuracy by applying a filter to the first sound signal for determination.
- the DSP 22 determines whether or not the sound includes a human voice based on the first sound signal to which the adaptive filter is applied, and changes the update amount of the filter coefficients of the adaptive filter based on the noise included in the sound.
- Such an ear-worn device 20 can change the effect of the adaptive filter according to the surrounding noise environment.
- the sound includes the first sound acquired during the first period and the second sound acquired during the second period after the first period.
- when the DSP 22 determines that the first sound satisfies the predetermined requirement, that the first sound does not contain a human voice, and that the second sound contains a human voice, it outputs the second sound signal.
- Such an ear-mounted device 20 can improve the accuracy of determining whether the sound satisfies predetermined requirements.
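- the reproduction method executed by a computer such as the DSP 22 includes: an output step of outputting, based on the first sound signal of the sound output by the microphone 21 that acquires the sound, a second sound signal based on the first sound signal when it is determined that the sound satisfies a predetermined requirement related to the noise component included in the sound and that the sound includes a human voice; and a reproduction step of outputting a reproduced sound from the speaker 28 based on the output second sound signal.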
- Such a reproduction method can reproduce the voices of people heard in the surroundings according to the surrounding noise environment.
- the ear-mounted device was described as an earphone-type device, but it may be a headphone-type device. Further, in the above embodiments, the ear-mounted device has the function of reproducing music content, but may not have the function of reproducing music content (communication circuit and mixing circuit).
- the ear-worn device may be earplugs or hearing aids with noise cancellation and ambient sound capture capabilities.
- in the above embodiment, a machine learning model was used to determine whether or not the sound acquired by the microphone contains a human voice, but this determination may instead be based on another algorithm that does not use a machine learning model. Likewise, whether the sound acquired by the microphone satisfies the predetermined requirement related to the noise component of that sound was determined using the spectral flatness, but this determination may instead be made using a machine learning model.
- the predetermined requirement related to the noise component was the requirement corresponding to whether or not the moving body is moving.
- the predetermined requirement related to the noise component may be other requirements such as, for example, a requirement corresponding to whether the ambient noise level is greater than a predetermined value.
- the configuration of the ear-mounted device according to the above embodiment is an example.
- the ear worn device may include components not shown such as D/A converters, filters, power amplifiers, or A/D converters.
- the sound signal processing system is implemented by a plurality of devices, but may be implemented as a single device.
- the functional components included in the sound signal processing system may be distributed to the plurality of devices in any way.
- the mobile terminal may include some or all of the functional components included in the ear-worn device.
- the communication method between devices in the above embodiment is not particularly limited.
- a relay device (not shown) may intervene between the two devices.
- the order of processing described in the above embodiment is an example.
- the order of multiple processes may be changed, and multiple processes may be executed in parallel.
- a process executed by a specific processing unit may be executed by another processing unit.
- part of the digital signal processing described in the above embodiments may be realized by analog signal processing.
- each component may be realized by executing a software program suitable for each component.
- Each component may be realized by reading and executing a software program recorded in a recording medium such as a hard disk or a semiconductor memory by a program execution unit such as a CPU or processor.
- each component may be realized by hardware.
- each component may be a circuit (or integrated circuit). These circuits may form one circuit as a whole, or may be separate circuits. These circuits may be general-purpose circuits or dedicated circuits.
- general or specific aspects of the present disclosure may be implemented in a system, apparatus, method, integrated circuit, computer program, or recording medium such as a computer-readable CD-ROM.
- any combination of systems, devices, methods, integrated circuits, computer programs and recording media may be implemented.
- the present disclosure may be implemented as a reproduction method executed by a computer such as an ear-worn device or a mobile terminal, or may be implemented as a program for causing a computer to execute such a reproduction method.
- the present disclosure may be implemented as a computer-readable non-transitory recording medium in which such a program is recorded.
- the program here includes an application program for causing a general-purpose mobile terminal to function as the mobile terminal of the above embodiment.
- the ear-mounted device of the present disclosure can output reproduced sounds including the voices of surrounding people according to the surrounding noise environment.
Landscapes
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Otolaryngology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Vascular Medicine (AREA)
- Public Health (AREA)
- Veterinary Medicine (AREA)
- Neurosurgery (AREA)
- Animal Behavior & Ethology (AREA)
- Heart & Thoracic Surgery (AREA)
- Biomedical Technology (AREA)
- Psychology (AREA)
- Biophysics (AREA)
- Quality & Reliability (AREA)
- Soundproofing, Sound Blocking, And Sound Damping (AREA)
Abstract
Description
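[Configuration]
First, the configuration of the sound signal processing system according to the embodiment will be described. FIG. 1 is an external view of the devices constituting the sound signal processing system according to the embodiment. FIG. 2 is a block diagram showing the functional configuration of the sound signal processing system according to the embodiment.
As described above, the ear-worn device 20 can automatically operate in the external sound capture mode when the moving body carrying the user is moving and an announcement sound is output within the moving body. Several examples of the ear-worn device 20 are described below using specific situations. First, Example 1 of the ear-worn device 20 is described. FIG. 3 is a flowchart of Example 1 of the ear-worn device 20. Note that Example 1 shows an operation intended for use when the user wearing the ear-worn device 20 is riding a moving body.
The ear-worn device 20 may have a noise canceling function (hereinafter also referred to as a noise canceling mode) that reduces environmental sounds around the user wearing the ear-worn device 20 during reproduction of the fourth sound signal (music content).
The ear-worn device 20 may determine whether or not noise satisfies the predetermined requirement (whether or not the moving body is moving) based on the spectral flatness SF calculated using the portion of the first sound signal that does not contain a human voice. FIG. 9 is a flowchart of Example 3 of such an ear-worn device 20.
In the above embodiment, the determination unit 24c determined whether or not noise satisfies the predetermined requirement (whether or not the spectral flatness SF is equal to or greater than the threshold) based on the first sound signal to which the low-pass filter 23c is applied. The validity of this way of applying the low-pass filter 23c is supplemented below with reference to the waveform of the spectral flatness SF.
Also, in the above embodiment, the determination unit 24c determined whether or not the sound acquired by the microphone 21 includes a human voice based on the first sound signal to which the high-pass filter 23b is applied. The validity of this way of applying the high-pass filter 23b is supplemented below with reference to spectrograms.
The noise removal filter 23a may be an adaptive filter. Specifically, as indicated by the dashed arrow pointing from the noise feature quantity calculation unit 24b to the noise removal filter 23a in FIG. 2 above, the noise removal filter 23a may update its filter coefficients using the value of the spectral flatness SF output from the noise feature quantity calculation unit 24b. FIG. 18 is a block diagram showing the functional configuration of the noise removal filter 23a functioning as an adaptive filter.
As described above, the ear-worn device 20 includes: the microphone 21 that acquires a sound and outputs a first sound signal of the acquired sound; the DSP 22 that outputs a second sound signal based on the first sound signal when it determines that the sound satisfies a predetermined requirement related to a noise component included in the sound and that the sound includes a human voice; the speaker 28 that outputs a reproduced sound based on the output second sound signal; and the housing 29 that houses the microphone 21, the DSP 22, and the speaker 28. The DSP 22 is an example of a signal processing circuit.
Although the embodiment has been described above, the present disclosure is not limited to the above embodiment.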
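20 Ear-worn device
21 Microphone
22 DSP
23 Filter unit
23a Noise removal filter
23a1 Filter coefficient updating unit
23a2 Adaptive filter unit
23b High-pass filter
23c Low-pass filter
24 Signal processing unit
24a Voice feature amount calculation unit
24b Noise feature amount calculation unit
24c Determination unit
24d Switching unit
26 Memory
27a Communication circuit
27b Mixing circuit
28 Speaker
29 Housing
30 Mobile terminal
31 UI
32 Communication circuit
33 CPU
34 Memory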
Claims (12)
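- An ear-worn device comprising:
a microphone that acquires a sound and outputs a first sound signal of the acquired sound;
a signal processing circuit that outputs a second sound signal based on the first sound signal when it determines that the sound satisfies a predetermined requirement related to a noise component included in the sound and that the sound includes a human voice;
a speaker that outputs a reproduced sound based on the output second sound signal; and
a housing that houses the microphone, the signal processing circuit, and the speaker.
- The ear-worn device according to claim 1, wherein the signal processing circuit outputs the first sound signal as the second sound signal when it determines that the sound satisfies the predetermined requirement and that the sound includes a human voice.
- The ear-worn device according to claim 1, wherein the signal processing circuit outputs the second sound signal obtained by performing signal processing on the first sound signal when it determines that the sound satisfies the predetermined requirement and that the sound includes a human voice.
- The ear-worn device according to claim 3, wherein the signal processing includes equalizing processing for emphasizing a specific frequency component of the sound.
- The ear-worn device according to any one of claims 1 to 4, wherein the signal processing circuit does not cause the speaker to output a reproduced sound based on the second sound signal when it determines that the sound does not satisfy the predetermined requirement, and when it determines that the sound does not include a human voice.
- The ear-worn device according to any one of claims 1 to 4, wherein the signal processing circuit outputs a third sound signal obtained by performing phase inversion processing on the first sound signal when it determines that the sound does not satisfy the predetermined requirement, and when it determines that the sound does not include a human voice, and
the speaker outputs a reproduced sound based on the output third sound signal.
- The ear-worn device according to any one of claims 1 to 6, further comprising a mixing circuit that mixes a fourth sound signal provided from a sound source with the output second sound signal, wherein,
when the signal processing circuit starts outputting the second sound signal, the fourth sound signal whose amplitude is attenuated compared with before the output of the second sound signal was started is mixed with the second sound signal.
- The ear-worn device according to any one of claims 1 to 7, wherein the signal processing circuit:
determines whether or not the sound satisfies the predetermined requirement based on the first sound signal to which a low-pass filter has been applied; and
determines whether or not the sound includes a human voice based on the first sound signal to which a high-pass filter has been applied.
- The ear-worn device according to any one of claims 1 to 8, wherein the signal processing circuit:
determines whether or not the sound includes a human voice based on the first sound signal to which an adaptive filter has been applied; and
changes the update amount of the filter coefficients of the adaptive filter based on noise included in the sound.
- The ear-worn device according to any one of claims 1 to 9, wherein the sound includes a first sound acquired in a first period and a second sound acquired in a second period after the first period, and
the signal processing circuit outputs the second sound signal when it determines that the first sound satisfies the predetermined requirement, that the first sound does not include a human voice, and that the second sound includes a human voice.
- A reproduction method comprising:
an output step of outputting, based on a first sound signal of a sound output by a microphone that acquires the sound, a second sound signal based on the first sound signal when it is determined that the sound satisfies a predetermined requirement related to a noise component included in the sound and that the sound includes a human voice; and
a reproduction step of outputting a reproduced sound from a speaker based on the output second sound signal.
- A program for causing a computer to execute the reproduction method according to claim 11.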
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2022552401A JPWO2022259589A1 (ja) | 2021-06-08 | 2022-01-12 | |
EP22800559.1A EP4354898A1 (en) | 2021-06-08 | 2022-01-12 | Ear-mounted device and reproduction method |
US17/925,242 US20230320903A1 (en) | 2021-06-08 | 2022-01-12 | Ear-worn device and reproduction method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021096075 | 2021-06-08 | ||
JP2021-096075 | 2021-06-08 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022259589A1 true WO2022259589A1 (ja) | 2022-12-15 |
Family
ID=84425607
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/000697 WO2022259589A1 (ja) | 2021-06-08 | 2022-01-12 | 耳装着型デバイス、及び、再生方法 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20230320903A1 (ja) |
EP (1) | EP4354898A1 (ja) |
JP (1) | JPWO2022259589A1 (ja) |
WO (1) | WO2022259589A1 (ja) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP4117310A1 (en) * | 2021-07-09 | 2023-01-11 | Starkey Laboratories, Inc. | Method and apparatus for automatic correction of real ear measurements |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11345000A (ja) * | 1998-06-03 | 1999-12-14 | Nec Corp | 雑音消去方法及び雑音消去装置 |
JP2006093792A (ja) | 2004-09-21 | 2006-04-06 | Yamaha Corp | 特定音声再生装置、及び特定音声再生ヘッドホン |
JP2011199699A (ja) * | 2010-03-23 | 2011-10-06 | Yamaha Corp | ヘッドフォン |
JP2021511755A (ja) * | 2017-12-07 | 2021-05-06 | エイチイーディ・テクノロジーズ・エスアーエルエル | 音声認識オーディオシステムおよび方法 |
2022
- 2022-01-12 US US17/925,242 patent/US20230320903A1/en active Pending
- 2022-01-12 EP EP22800559.1A patent/EP4354898A1/en active Pending
- 2022-01-12 WO PCT/JP2022/000697 patent/WO2022259589A1/ja active Application Filing
- 2022-01-12 JP JP2022552401A patent/JPWO2022259589A1/ja active Pending
Also Published As
Publication number | Publication date |
---|---|
US20230320903A1 (en) | 2023-10-12 |
EP4354898A1 (en) | 2024-04-17 |
JPWO2022259589A1 (ja) | 2022-12-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11710473B2 (en) | | Method and device for acute sound detection and reproduction |
CN106664473B (zh) | | Information processing device, information processing method, and program |
US9595252B2 (en) | | Noise reduction audio reproducing device and noise reduction audio reproducing method |
US8315400B2 (en) | | Method and device for acoustic management control of multiple microphones |
JP4640461B2 (ja) | | Volume adjustment device and program |
US10049653B2 (en) | | Active noise cancelation with controllable levels |
WO2009136953A1 (en) | | Method and device for acoustic management control of multiple microphones |
JP2009530950A (ja) | | Data processing for a wearable device |
JP2008099163A (ja) | | Noise-canceling headphones and noise-canceling method for headphones |
JP2013065039A (ja) | | Headphones, headphone noise reduction method, and noise reduction processing program |
WO2022259589A1 (ja) | | Ear-worn device and reproduction method |
WO2012098856A1 (ja) | | Hearing aid and hearing aid control method |
JP2009020143A (ja) | | Noise-canceling headphones |
CN115250397A (zh) | | TWS earphone, and playback method and device for TWS earphone |
JPWO2016059878A1 (ja) | | Signal processing device, signal processing method, and computer program |
WO2023119764A1 (ja) | | Ear-worn device and reproduction method |
CN114501211A (zh) | | Active noise reduction circuit, method, device, and storage medium with clear-sound transparency |
WO2022137806A1 (ja) | | Ear-worn device and reproduction method |
JP5880753B2 (ja) | | Headphones, headphone noise reduction method, and noise reduction processing program |
WO2023220918A1 (zh) | | Audio signal processing method and apparatus, storage medium, and vehicle |
CN117392994B (zh) | | Audio signal processing method, apparatus, device, and storage medium |
JP2019016851A (ja) | | Audio processing device, audio processing method, and program |
Patel | | Acoustic Feedback Cancellation and Dynamic Range Compression for Hearing Aids and Its Real-Time Implementation |
JP2018207313A (ja) | | Audio processing device and control method therefor |
CN115580678A (zh) | | Data processing method, apparatus, and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | ENP | Entry into the national phase | Ref document number: 2022552401; Country of ref document: JP; Kind code of ref document: A |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22800559; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2022800559; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2022800559; Country of ref document: EP; Effective date: 20240108 |