WO2011070971A1 - Control device and method, and program - Google Patents
- Publication number
- WO2011070971A1 (international application PCT/JP2010/071606)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sound
- value
- collection unit
- unit
- sound collection
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10K—SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
- G10K11/00—Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/16—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/175—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
- G10K11/178—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/06—Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/10—Earpieces; Attachments therefor ; Earphones; Monophonic headphones
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2200/00—Indexing scheme relating to G06F1/04 - G06F1/32
- G06F2200/16—Indexing scheme relating to G06F1/16 - G06F1/18
- G06F2200/163—Indexing scheme relating to constructional details of the computer
- G06F2200/1636—Sensing arrangement for detection of a tap gesture on the housing
Definitions
- the present invention relates to a control device, method, and program, and more particularly, to a control device, method, and program that can improve operability with a simpler configuration.
- a controller is provided on a cord connecting an earphone worn on the user's ear to the main body of an electronic device, and music playback on the electronic device can be controlled by operating the controller.
- a camera is provided in the controller, so the user can also take pictures by operating the controller and the main body of the electronic device.
- such a device needs as many buttons as it has functions, and the configuration of the electronic device or the like becomes complicated.
- each button becomes small or it becomes difficult to find the target button, so operability deteriorates.
- when buttons are provided on an electronic device or the like, operation becomes difficult.
- the present invention has been made in view of such a situation, and is intended to improve operability with a simpler configuration.
- the control device includes a sound collection unit that picks up ambient sound, determination means for determining, using the maximum value and the effective value of the sound collected by the sound collection unit, whether or not the sound collection unit has been hit, and execution means for executing a predetermined process when it is determined that the sound collection unit has been hit.
- the execution means can identify, based on the determination result of the determination means, the number of times the sound collection unit has been hit within a predetermined time, and execute the process assigned to that number of hits.
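A minimal sketch of counting hits within a predetermined time and dispatching on the count. The window length, the `min_gap_s` merge interval (a single tap may be detected on several consecutive analysis frames), and the mapping from counts to processes are all hypothetical; the text does not specify them.

```python
from typing import Callable, Dict, Sequence

def count_hits(detection_times: Sequence[float], start: float,
               window_s: float, min_gap_s: float = 0.1) -> int:
    """Count distinct hits whose positive results fall in [start, start + window_s).

    Positive results closer together than min_gap_s are merged into one hit,
    because one tap can be detected on several consecutive frames.
    """
    count = 0
    last = None
    for t in sorted(detection_times):
        if not (start <= t < start + window_s):
            continue
        if last is None or t - last >= min_gap_s:
            count += 1
        last = t  # extend the current run so a long burst counts once
    return count

# Hypothetical assignment of processes to hit counts (not taken from the source).
ACTIONS: Dict[int, Callable[[], str]] = {
    1: lambda: "play/pause",
    2: lambda: "next track",
    3: lambda: "volume up",
}

def dispatch(hit_count: int) -> str:
    """Run the process assigned to hit_count, or do nothing if none is assigned."""
    action = ACTIONS.get(hit_count)
    return action() if action is not None else "no-op"
```

For example, positive results at 10.00 s, 10.01 s and 10.30 s inside a 0.6 s window starting at 10.0 s are counted as two hits and dispatched to the process assigned to a double tap.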
- the determination means can determine whether or not the sound collection unit has been hit based on the result of threshold processing on the maximum value and the result of threshold processing on the effective value.
- the threshold value used in threshold processing on the maximum value and the threshold value used in threshold processing on the effective value can be determined in advance by discriminant analysis or an SVM (support vector machine).
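As an illustration of fixing a threshold in advance by discriminant analysis, the sketch below applies one-dimensional linear discriminant analysis to labeled training values of a single feature (for example, maximum values recorded for real taps and for other sounds). The Gaussian model with a pooled variance and the use of training-set proportions as priors are assumptions of this sketch; the text does not describe its training procedure, and the SVM route is not shown.

```python
import math
from statistics import mean
from typing import Sequence

def lda_threshold(hit: Sequence[float], not_hit: Sequence[float]) -> float:
    """Decision boundary of 1-D linear discriminant analysis.

    Each class is modelled as a Gaussian with its own mean and a shared
    (pooled) variance; priors are the training-set class proportions.
    """
    m1, m0 = mean(hit), mean(not_hit)
    if m1 == m0:
        raise ValueError("class means must differ")
    n1, n0 = len(hit), len(not_hit)
    pooled_ss = sum((x - m1) ** 2 for x in hit) + sum((x - m0) ** 2 for x in not_hit)
    var = pooled_ss / (n1 + n0 - 2)
    # delta_k(x) = x*m_k/var - m_k**2/(2*var) + ln(p_k); solve delta_1(x) = delta_0(x).
    return (m1 + m0) / 2 + var * math.log(n0 / n1) / (m1 - m0)
```

With equal class sizes the boundary is simply the midpoint of the two class means; with more non-hit examples it shifts toward the hit class. Which side of the boundary counts as a hit depends on the feature.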
- the determination means can determine that the sound collection unit is not hit when the maximum value of the high-frequency component of the sound exceeds a first threshold value.
- when the maximum value of the low-frequency component, whose frequency is lower than that of the high-frequency component, is less than a second threshold value, it can be determined that the sound collection unit is not hit.
- for each of a plurality of sections of the high-frequency component in the time direction, the determination means determines whether or not the effective value of that section is equal to or less than a third threshold value set for that section, and when there is a section of the high-frequency component whose effective value exceeds the third threshold value, it determines that the sound collection unit is not hit. Likewise, for each of a plurality of sections of the low-frequency component in the time direction, it determines whether or not the effective value of that section is equal to or greater than a fourth threshold value set for that section, and when there is a section of the low-frequency component whose effective value is less than the fourth threshold value, it can determine that the sound collection unit is not hit.
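The per-section tests above can be sketched as follows, assuming the section effective values and one threshold per section are already available as lists in the same section order. The function name is hypothetical.

```python
from typing import Sequence

def section_rms_checks_pass(rms_high: Sequence[float], th_high: Sequence[float],
                            rms_low: Sequence[float], th_low: Sequence[float]) -> bool:
    """Third/fourth-threshold tests on per-section effective values.

    Returns False ("not hit") if any high-frequency section has an effective
    value above its own threshold, or any low-frequency section has an
    effective value below its own threshold; otherwise returns True.
    """
    if len(rms_high) != len(th_high) or len(rms_low) != len(th_low):
        raise ValueError("one threshold is needed per section")
    if any(r > t for r, t in zip(rms_high, th_high)):
        return False
    return all(r >= t for r, t in zip(rms_low, th_low))
```

The asymmetry mirrors the text: high-frequency energy is bounded from above, low-frequency energy from below.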
- each of the plurality of sections of the high-frequency component may have a different length.
- each of the plurality of sections of the low-frequency component may have a different length.
- the determination means further determines whether or not the absolute value of the high-frequency component is maximized at a specific position in the time direction, and if the absolute value is not maximized at the specific position, it can determine that the sound collection unit is not hit.
- the determination means further determines whether or not the zero cross value of the sound is equal to or less than a fifth threshold value, and if the zero cross value exceeds the fifth threshold value, it can determine that the sound collection unit is not hit.
- the determination means determines whether or not a linear sum of effective values of the plurality of sections in the time direction of the high-frequency component is equal to or less than a sixth threshold value, and the linear sum exceeds the sixth threshold value. In this case, it can be determined that the sound collection unit is not hit.
- the determination means determines whether or not a linear sum of the logarithmic values of the respective effective values of the plurality of sections in the time direction of the high frequency component is equal to or less than a seventh threshold value, and when the linear sum exceeds the seventh threshold value, it can be determined that the sound collection unit is not hit.
- the determination means determines whether or not the linear sum of the effective values of the plurality of sections in the time direction of the low frequency component is equal to or less than an eighth threshold value, and the linear sum exceeds the eighth threshold value. In this case, it can be determined that the sound collection unit is not hit.
- the determination means determines whether or not a linear sum of the logarithmic values of the respective effective values of the plurality of sections in the time direction of the low frequency component is equal to or less than a ninth threshold value, and when the linear sum exceeds the ninth threshold value, it can be determined that the sound collection unit is not hit.
- the determination means determines whether or not the sum of the linear sum of the logarithmic values of the respective effective values of the plurality of sections of the high frequency component in the time direction and the linear sum of the logarithmic values of the respective effective values of the plurality of sections of the low frequency component in the time direction is equal to or less than a tenth threshold value, and when that sum exceeds the tenth threshold value, it can be determined that the sound collection unit is not hit.
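The log-domain linear sums above can be sketched as follows. The weights of the linear sum and the base of the logarithm are not given in the text; natural logarithms and caller-supplied (hypothetical) weights are assumed here. The non-logarithmic variants for the sixth and eighth thresholds are the same computation with the logarithm removed.

```python
import math
from typing import Sequence

def log_linear_sum(rms_values: Sequence[float], weights: Sequence[float],
                   floor: float = 1e-12) -> float:
    """sum_m w_m * ln(rms_m); `floor` guards against log(0) on silent sections."""
    if len(rms_values) != len(weights):
        raise ValueError("one weight is needed per section")
    return sum(w * math.log(max(r, floor)) for r, w in zip(rms_values, weights))

def tenth_threshold_check(rms_high: Sequence[float], w_high: Sequence[float],
                          rms_low: Sequence[float], w_low: Sequence[float],
                          threshold: float) -> bool:
    """True if the combined high+low log-domain sum is at most the threshold;
    False corresponds to 'not hit'."""
    total = log_linear_sum(rms_high, w_high) + log_linear_sum(rms_low, w_low)
    return total <= threshold
```

Working in the log domain turns products of section energies into sums, so a single linear weighting can balance sections whose effective values differ by orders of magnitude.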
- when a plurality of sound collection units are provided, the execution means can execute the process assigned to the sound collection unit that was hit.
- the control method or program includes the steps of causing the sound collection unit to pick up ambient sound, determining, using the maximum value and the effective value of the sound collected by the sound collection unit, whether or not the sound collection unit has been hit, and executing a predetermined process when it is determined that the sound collection unit has been hit.
- ambient sound is collected by the sound collection unit, the maximum value and effective value of the collected sound are used to determine whether or not the sound collection unit has been hit, and when it is determined that the sound collection unit has been hit, a predetermined process is executed.
- operability can be improved with a simpler configuration.
- FIG. 1 is a diagram showing a configuration example of an embodiment of a playback apparatus to which the present invention is applied.
- the playback device 11 is a portable music player that plays back sound such as music, for example.
- the playback device 11 includes an earphone 21 that is worn on the user's ear, and a main body 22 that is connected to the earphone 21 and is carried by the user.
- the earphone 21 is provided with a sound collection unit 31-1, a sound collection unit 31-2, a speaker 32-1, and a speaker 32-2.
- the sound collection unit 31-1 and the sound collection unit 31-2 are configured by, for example, a microphone, and collect sound around the playback device 11 and supply the sound signal obtained as a result to the main body 22.
- the speaker 32-1 and the speaker 32-2 reproduce sound based on a sound signal such as music supplied from the main body 22.
- the sound collection unit 31-1 and the sound collection unit 31-2 are also simply referred to as the sound collection unit 31 when it is not necessary to distinguish between them. Further, when it is not necessary to distinguish between the speaker 32-1 and the speaker 32-2, they are also simply referred to as the speaker 32.
- the sound collection unit 31 and the speaker 32 are integrated to form an earphone that is worn on the right or left ear of the user.
- the sound collected by the sound collection unit 31 and supplied to the main body 22 is used for so-called noise canceling and reproduction control of sound such as music.
- the main body 22 identifies, from the collected sound, a tap operation performed by the user on the sound collection unit 31, and executes the process corresponding to that operation.
- the main body 22 of the reproducing apparatus 11 includes an A / D (Analog / Digital) conversion unit 33-1, an A / D conversion unit 33-2, a determination unit 34-1, a determination unit 34-2, a control unit 35, An audio supply unit 36, a noise removal processing unit 37, an addition unit 38, and a reproduction control unit 39 are provided.
- the A / D conversion unit 33-1 and the A / D conversion unit 33-2 convert the audio signal, which is an analog signal supplied from the sound collection unit 31-1 and the sound collection unit 31-2, into a digital signal.
- the audio signal converted into the digital signal is supplied from the A / D conversion unit 33-1 to the determination unit 34-1 and the noise removal processing unit 37.
- the audio signal converted into the digital signal is supplied from the A / D conversion unit 33-2 to the determination unit 34-2 and the noise removal processing unit 37.
- the determination unit 34-1 and the determination unit 34-2 determine, based on the sound supplied from the A / D conversion unit 33-1 and the A / D conversion unit 33-2 respectively, whether or not the sound collection unit 31 has been hit directly, and supply the determination results to the control unit 35. That is, they identify whether or not the collected sound is a sound generated when the user hits the sound collection unit 31.
- when it is not necessary to distinguish between the A / D conversion unit 33-1 and the A / D conversion unit 33-2, they are also simply referred to as the A / D conversion unit 33, and when it is not necessary to distinguish between the determination unit 34-1 and the determination unit 34-2, they are also simply referred to as the determination unit 34.
- the control unit 35 controls the operation of the entire playback device 11. For example, the control unit 35 causes the audio supply unit 36 to output sound such as music based on the determination result supplied from the determination unit 34 or controls the reproduction of sound in the reproduction control unit 39.
- the audio supply unit 36 records audio data such as music, decodes the audio data according to an instruction from the control unit 35, and supplies the decoded data to the addition unit 38.
- based on the sound supplied from the A / D conversion unit 33, the noise removal processing unit 37 generates sound having a phase opposite to that of the sound around the playback device 11 and supplies it to the addition unit 38.
- the addition unit 38 adds the sound supplied from the noise removal processing unit 37 to the sound supplied from the audio supply unit 36 and supplies the sum to the reproduction control unit 39.
- the reproduction control unit 39 supplies the sound from the addition unit 38 to the speakers 32-1 and 32-2 in accordance with instructions from the control unit 35, and causes them to output it.
- because the addition unit 38 adds sound in antiphase to the collected ambient sound to the music being played, the surrounding environmental noise is canceled and the user hears only the music.
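The antiphase addition described above can be illustrated with an idealised sketch. It assumes that the ambient sound reaching the ear is identical to the picked-up signal; a real noise canceller also has to model the acoustic path and processing delay, which the text does not detail.

```python
from typing import List, Sequence

def mix_with_antiphase(music: Sequence[float], ambient: Sequence[float]) -> List[float]:
    """Speaker feed = music + (-ambient), sample by sample (idealised model)."""
    if len(music) != len(ambient):
        raise ValueError("signals must have the same length")
    return [m - a for m, a in zip(music, ambient)]
```

Under this idealisation, adding the ambient sound that leaks to the ear back onto the speaker feed leaves only the music.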
- [Configuration of the determination unit] More specifically, the determination unit 34 of FIG. 1 is configured as shown in FIG. 2.
- the determination unit 34 includes a low-pass filter processing unit 61, a low-frequency maximum value calculation unit 62, a low-frequency effective value calculation unit 63, a high-pass filter processing unit 64, a high-frequency maximum value calculation unit 65, a high-frequency effective value calculation unit 66, a zero cross value calculation unit 67, and a discrimination processing unit 68.
- the low-pass filter processing unit 61 filters the audio signal supplied from the A / D conversion unit 33 to extract its low-frequency component, and supplies the resulting low-frequency signal to the low-frequency maximum value calculation unit 62 and the low-frequency effective value calculation unit 63.
- the low frequency maximum value calculation unit 62 calculates the maximum value of the low frequency signal supplied from the low frequency filter processing unit 61 (hereinafter also referred to as a low frequency maximum value), and supplies the maximum value to the discrimination processing unit 68.
- the low-frequency effective value calculation unit 63 calculates the effective value (hereinafter also referred to as a low-frequency effective value) of the low-frequency signal supplied from the low-frequency filter processing unit 61 and supplies it to the discrimination processing unit 68.
- the high-pass filter processing unit 64 filters the audio signal supplied from the A / D conversion unit 33 to extract its high-frequency component, and supplies the resulting high-frequency signal to the high-frequency maximum value calculation unit 65 and the high-frequency effective value calculation unit 66.
- the high frequency maximum value calculation unit 65 calculates the maximum value of the high frequency signal supplied from the high frequency filter processing unit 64 (hereinafter also referred to as a high frequency maximum value) and supplies it to the discrimination processing unit 68.
- the high frequency effective value calculation unit 66 calculates the effective value (hereinafter also referred to as a high frequency effective value) of the high frequency signal supplied from the high frequency filter processing unit 64, and supplies it to the discrimination processing unit 68.
- the zero cross value calculation unit 67 calculates the zero cross value of the audio signal supplied from the A / D conversion unit 33 and supplies the zero cross value to the discrimination processing unit 68.
- the discrimination processing unit 68 identifies the collected sound using the maximum values, effective values, and zero cross value supplied from the low-frequency maximum value calculation unit 62, the low-frequency effective value calculation unit 63, the high-frequency maximum value calculation unit 65, the high-frequency effective value calculation unit 66, and the zero cross value calculation unit 67. That is, the discrimination processing unit 68 determines whether or not the sound collection unit 31 has been hit directly by the user, and supplies the determination result to the control unit 35.
- when the user operates the playback device 11 and instructs playback of music or the like, the playback device 11 starts playing the music. That is, the audio supply unit 36 supplies the audio (audio signal) of the designated piece of music to the addition unit 38 in accordance with an instruction from the control unit 35. Further, the noise removal processing unit 37 uses the sound (environmental sound) supplied from the sound collection unit 31 via the A / D conversion unit 33 to generate sound with the opposite phase, and supplies it to the addition unit 38.
- the addition unit 38 adds the audio from the audio supply unit 36 and from the noise removal processing unit 37, and the reproduction control unit 39 supplies the audio obtained by the addition unit 38 to the speaker 32 for output.
- the user wears the earphone 21 on the ear and puts the main body 22 in a pocket of the clothes being worn.
- when performing an operation such as playing the next piece of music or adjusting the volume, the user directly taps the sound collection unit 31 of the earphone 21 with the pad of a finger to instruct execution of the desired process.
- in order to perform processing according to the user's operation, while the playback device 11 is powered on, it repeatedly performs a reproduction control process, which detects the user's operation on the sound collection unit 31 and performs processing according to that operation.
- in step S11, the sound collection unit 31 collects ambient sound and supplies the sound signal obtained as a result to the A / D conversion unit 33.
- the A / D conversion unit 33 converts the audio signal from the sound collection unit 31 from an analog signal to a digital signal, and outputs the signal to the low-pass filter processing unit 61, the high-pass filter processing unit 64, and the zero-cross value calculation unit 67. Supply.
- in the value x(n) of the audio signal, which is a digital signal, n represents a time index, that is, the ordinal number of the sample.
- the sampling frequency of the audio signal is not limited to 44.1 kHz and may be any value of about 16 kHz or higher, because at sampling frequencies of about 16 kHz or higher there is almost no effect on the performance of detecting the sound produced when the sound collection unit 31 is hit directly. Furthermore, if the sound collection bandwidth of the sound collection unit 31 is 8 kHz or more, an audio signal sufficient for determining whether or not the sound collection unit 31 has been hit directly can be obtained.
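The 16 kHz and 8 kHz figures are linked by the Nyquist limit: a signal sampled at 16 kHz can represent content up to 8 kHz. The sketch below also shows how much time the 2048-sample and 512-sample windows used later cover. It assumes, which the text does not state, that the sample counts stay fixed when the sampling rate changes.

```python
def nyquist_hz(fs_hz: float) -> float:
    """Highest frequency representable at sampling rate fs_hz."""
    return fs_hz / 2.0

def window_ms(num_samples: int, fs_hz: float) -> float:
    """Time span covered by num_samples at sampling rate fs_hz, in milliseconds."""
    return 1000.0 * num_samples / fs_hz

for fs in (44100.0, 16000.0):
    print(f"fs={fs:.0f} Hz: Nyquist={nyquist_hz(fs):.0f} Hz, "
          f"2048 samples={window_ms(2048, fs):.1f} ms, "
          f"512 samples={window_ms(512, fs):.1f} ms")
```

At 44.1 kHz the 2048-sample window spans about 46 ms, and the 5512.5 Hz filter cut-off used below is exactly one eighth of that sampling rate.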
- in step S12, the low-pass filter processing unit 61 extracts a low-frequency signal from the audio signal supplied from the A / D conversion unit 33 by filtering with a low-pass filter, and supplies it to the low-frequency maximum value calculation unit 62 and the low-frequency effective value calculation unit 63.
- the low-pass filter processing unit 61 extracts the low-frequency signal xl(n) from the audio signal by calculating the following equation (1):
  $$x_l(n) = \sum_{i=0}^{N_l-1} h_l(i)\,x(n-i) \qquad (1)$$
- Nl represents the number of taps of the low-pass filter.
- hl(i) represents the coefficients of the low-pass filter. Therefore, the Nl temporally consecutive values of the audio signal, from the value x(n) obtained by the most recent sampling back to the value x(n-Nl+1), are weighted and added to obtain the low-frequency signal xl(n).
- in step S13, the high-pass filter processing unit 64 extracts a high-frequency signal from the audio signal supplied from the A / D conversion unit 33 by filtering with a high-pass filter, and supplies it to the high-frequency maximum value calculation unit 65 and the high-frequency effective value calculation unit 66.
- the high-pass filter processing unit 64 extracts the high-frequency signal xh(n) from the audio signal by calculating the following equation (2):
  $$x_h(n) = \sum_{i=0}^{N_h-1} h_h(i)\,x(n-i) \qquad (2)$$
- Nh represents the number of taps of the high-pass filter.
- hh(i) represents the coefficients of the high-pass filter. Therefore, the Nh temporally consecutive values of the audio signal, from the value x(n) obtained by the most recent sampling back to the value x(n-Nh+1), are weighted and added to obtain the high-frequency signal xh(n).
- the coefficients hl(i) and hh(i) in equations (1) and (2) are the coefficients of linear-phase FIR (Finite Impulse Response) low-pass and high-pass filters, respectively.
- the cut-off frequency of the filter is 5512.5 Hz. That is, in the audio signal, a frequency component of 5512.5 Hz or less is a low-frequency signal, and a frequency component larger than 5512.5 Hz is a high-frequency signal.
- the number of taps Nl of the low-pass filter and the number of taps Nh of the high-pass filter are both 128.
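A minimal sketch of 128-tap linear-phase FIR filters with a 5512.5 Hz cut-off at 44.1 kHz, applied in the form of equations (1) and (2). The text does not say how hl(i) and hh(i) were designed; the Hamming-windowed sinc used here is a swapped-in design method that gives roughly 50 dB of stopband attenuation, below the -60 dB in the figure but within the about -40 dB the text says is sufficient. Because a symmetric FIR with an even number of taps always has a zero at the Nyquist frequency, the high-pass filter is built by multiplying a low-pass prototype by (-1)^i, which yields an antisymmetric, still linear-phase, filter.

```python
import math
from typing import List, Sequence

def windowed_sinc_lowpass(num_taps: int, cutoff: float) -> List[float]:
    """Hamming-windowed sinc low-pass filter; cutoff in cycles/sample (0 < cutoff < 0.5)."""
    center = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        t = i - center
        # Ideal low-pass impulse response 2*fc*sinc(2*fc*t) at offset t from the centre.
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        # Hamming window written around the centre so the taps are exactly symmetric.
        window = 0.54 + 0.46 * math.cos(2 * math.pi * t / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [h / gain for h in taps]  # normalise the DC gain to 1

def highpass_by_modulation(num_taps: int, cutoff: float) -> List[float]:
    """High-pass with the given cutoff: a low-pass with cutoff (0.5 - cutoff)
    multiplied by (-1)^i, which shifts its response by half the sampling rate."""
    proto = windowed_sinc_lowpass(num_taps, 0.5 - cutoff)
    return [h if i % 2 == 0 else -h for i, h in enumerate(proto)]

def fir_filter(h: Sequence[float], x: Sequence[float]) -> List[float]:
    """y(n) = sum_{i=0}^{N-1} h(i) * x(n - i), taking x(k) = 0 for k < 0."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]

FS = 44100.0
CUTOFF = 5512.5 / FS  # 0.125 cycles/sample = 0.25 of the Nyquist frequency
h_low = windowed_sinc_lowpass(128, CUTOFF)
h_high = highpass_by_modulation(128, CUTOFF)
```

`fir_filter(h_low, x)` then plays the role of xl(n) and `fir_filter(h_high, x)` that of xh(n): a constant input passes the low-pass filter unchanged and is removed by the high-pass filter, and the reverse holds for a signal alternating at the Nyquist frequency.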
- the low-pass filter and the high-pass filter have the frequency amplitude characteristics shown in FIG.
- the frequency amplitude characteristic of the low-pass filter is shown on the upper side in the figure
- the frequency amplitude characteristic of the high-pass filter is shown on the lower side in the figure.
- the vertical axis indicates the amplitude (dB)
- the horizontal axis indicates the normalized frequency.
- for the low-pass filter, the amplitude is almost 0 dB for normalized frequencies from 0 to about 0.25 and drops sharply near a normalized frequency of 0.25.
- at normalized frequencies of 0.3 and above, the amplitude decreases as the normalized frequency increases.
- for the high-pass filter, the amplitude is approximately -60 dB for normalized frequencies from 0 to about 0.2 and rises sharply near a normalized frequency of 0.2.
- at normalized frequencies of 0.25 and above, the amplitude is almost 0 dB.
- the stopband attenuation is set to -60 dB.
- the stopband attenuation need only be about -40 dB or less. If the stopband attenuation is about -40 dB or less, the performance of detecting the sound produced when the sound collection unit 31 is hit directly is hardly affected.
- the low-pass filter may be an all-pass filter. Furthermore, although the example described uses a cut-off frequency of 5512.5 Hz for the low-pass and high-pass filters, the cut-off frequency may be anywhere from about 2 kHz to 10 kHz; within that range there is almost no effect on the performance of detecting the sound produced when the sound collection unit 31 is hit directly.
- an IIR (Infinite Impulse Response) type filter may be used as the low-pass filter and the high-pass filter.
- once the low-frequency signal and the high-frequency signal have been extracted, the process proceeds from step S13 to step S14.
- in step S14, the low-frequency maximum value calculation unit 62 calculates the low-frequency maximum value Pl(n) based on the low-frequency signal supplied from the low-pass filter processing unit 61 and supplies it to the discrimination processing unit 68. Specifically, the low-frequency maximum value calculation unit 62 calculates Pl(n) by the following equation (3):
  $$P_l(n) = \max_{0 \le i \le 2047} \lvert x_l(n-i) \rvert \qquad (3)$$
- that is, the 2048 samples of the low-frequency signal from the current time n back to the past time (n-2047) are processed to calculate the low-frequency maximum value Pl(n).
- the number of samples processed need only be about 512 or more. If about 512 or more samples are processed when calculating the low-frequency maximum value Pl(n), the performance of detecting the sound produced when the sound collection unit 31 is hit directly is not affected.
- for example, when 512 samples of the low-frequency signal are processed, the 512 samples from the past time (n-1536) to the past time (n-2047) are used, the absolute value of each is obtained, and the maximum of these absolute values is set as the low-frequency maximum value Pl(n).
- the low frequency effective value calculation unit 63 divides the low frequency signal of 2048 samples from the current time n to the past time (n-2047) into four sections at equal intervals.
- the four sections LS0 to LS3 obtained by the division are each composed of low-band signal values of continuous 512 samples.
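- for each section LSm (m = 0 to 3), the low-frequency effective value calculation unit 63 calculates the root mean square of the 512 low-frequency signal values constituting the section by equation (4), and sets the calculated root mean square value as the low-frequency effective value rmsl(n, m) of the section LSm:
  $$\mathrm{rms}_l(n,m) = \sqrt{\frac{1}{512}\sum_{i=0}^{511} x_l\bigl(n-512(3-m)-i\bigr)^{2}} \qquad (4)$$
- for example, the low-frequency effective value rmsl(n, 3) of the section LS3 is the root mean square of the values from xl(n) at time n to xl(n-511) at time (n-511).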
- the low-frequency effective value rmsl (n, m) thus obtained is an effective value in each section of the low-frequency component of the audio signal.
- to improve the performance of detecting the sound produced when the sound collection unit 31 is hit and to reduce the amount of computation, the mean absolute value (first-order mean) of the low-frequency signal or its Euclidean norm may be used as the low-frequency effective value.
- the low-frequency signal of 2048 samples from the current time n to the past time (n-2047) is processed, and the low-frequency effective value rmsl (n, m) is calculated.
- the number of samples to be processed may be about 1024 samples or more. If the number of samples to be processed is about 1024 samples or more, the sound detection performance when the sound collection unit 31 is directly hit is not affected.
- for example, the low-frequency signal from the current time n to the past time (n-1023) is divided into two sections at equal intervals. Then, for each section obtained by the division, the root mean square of the 512 low-frequency signal values constituting the section is calculated as the low-frequency effective value.
- the above describes dividing the section to be processed (the samples) at equal intervals.
- however, the low-frequency signal may instead be divided at unequal intervals according to the characteristics of its waveform. Making the sections that cover characteristic parts of the low-frequency waveform narrower improves performance when the low-frequency effective value is used to detect the sound produced when the sound collection unit 31 is hit directly.
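A sketch of the per-section effective values described above, for both equal and unequal section lengths. It assumes the analysis window is stored oldest sample first, so the first returned value corresponds to LS0 (times n-2047 to n-1536) and the last to LS3 (times n-511 to n), consistent with the text; the unequal lengths in the usage note are hypothetical. The mean absolute value alternative is included as an option.

```python
import math
from typing import List, Sequence

def section_effective_values(window: Sequence[float], lengths: Sequence[int],
                             use_rms: bool = True) -> List[float]:
    """Split `window` (oldest sample first) into consecutive sections of the
    given lengths and return one effective value per section: the root mean
    square by default, or the mean absolute value when use_rms is False."""
    if sum(lengths) != len(window):
        raise ValueError("section lengths must cover the window exactly")
    values, start = [], 0
    for length in lengths:
        seg = window[start:start + length]
        if use_rms:
            values.append(math.sqrt(sum(v * v for v in seg) / length))
        else:
            values.append(sum(abs(v) for v in seg) / length)
        start += length
    return values
```

`section_effective_values(xl_window, [512] * 4)` gives rmsl(n, 0) to rmsl(n, 3); a hypothetical unequal split such as `[1024, 512, 256, 256]` narrows the most recent sections. For sections of fixed length, the Euclidean norm differs from the root mean square only by the constant factor sqrt(length).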
- in step S16, the high-frequency maximum value calculation unit 65 calculates the high-frequency maximum value Ph(n) based on the high-frequency signal supplied from the high-pass filter processing unit 64. Specifically, the high-frequency maximum value calculation unit 65 calculates Ph(n) by the following equation (5):
  $$P_h(n) = \max_{0 \le i \le 2047} \lvert x_h(n-i) \rvert \qquad (5)$$
- that is, the absolute value is obtained for each value from xh(n) at the current time n of the high-frequency signal to xh(n-2047) at the past time (n-2047).
- the maximum of these absolute values is set as the high-frequency maximum value Ph(n).
- the high-frequency maximum value calculation unit 65 also supplies to the discrimination processing unit 68, together with the high-frequency maximum value Ph(n), the time (time index) hi of the sample whose absolute value was set as Ph(n), that is, the sample with the largest absolute value among the samples processed.
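A sketch of computing Ph(n) together with the time index hi, plus one possible form of the "specific position" test mentioned earlier. The text does not say what the specific position is; the `expected_lag` and `tolerance` parameters below are hypothetical.

```python
from typing import Sequence, Tuple

def max_abs_with_time_index(x: Sequence[float], n: int,
                            length: int = 2048) -> Tuple[float, int]:
    """Return (Ph(n), hi): the maximum |x(k)| over k = n-length+1 .. n and the
    time index k at which it occurs (the oldest such k on ties)."""
    first = n - length + 1
    if first < 0 or n >= len(x):
        raise ValueError("window extends outside the signal")
    hi = max(range(first, n + 1), key=lambda k: abs(x[k]))
    return abs(x[hi]), hi

def peak_at_expected_position(n: int, hi: int, expected_lag: int, tolerance: int) -> bool:
    """Hypothetical position test: the peak must lie expected_lag samples
    before n, give or take tolerance samples; False means 'not hit'."""
    return abs((n - hi) - expected_lag) <= tolerance
```

Checking where the peak falls lets the detector require that the impulsive part of a tap sits at a consistent place in the analysis window rather than anywhere within it.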
- the high frequency effective value calculation unit 66 divides the high frequency signal of 2048 samples from the current time n to the past time (n-2047) into 32 sections at equal intervals.
- the 32 sections HS0 to HS31 obtained by the division are each composed of 64 samples of high-frequency signal values.
- for each section HSm, the high-frequency effective value calculation unit 66 calculates the root mean square of the 64 high-frequency signal values constituting the section, and sets the obtained root mean square value as the high-frequency effective value rmsh(n, m) of the section HSm.
- to improve the performance of detecting the sound and to reduce the amount of computation, the mean absolute value (first-order mean) of the high-frequency signal or its Euclidean norm may be used as the high-frequency effective value.
- the high frequency signal of 2048 samples from the current time n to the past time (n-2047) is processed, and the high frequency effective value rmsh (n, m) is calculated.
- the number of samples to be processed may be about 1024 samples or more. If the number of samples to be processed is about 1024 samples or more, the sound detection performance when the sound collection unit 31 is directly hit is not affected.
- a high frequency signal from the past time (n-1024) to the past time (n-2047) is divided into 16 sections at equal intervals. Then, for each section obtained by the division, the root mean square value of the values of the high-frequency signal of 64 samples constituting the section is calculated as a high-frequency effective value.
- the above describes dividing the section to be processed (the samples) at equal intervals.
- however, the high-frequency signal may instead be divided at unequal intervals according to the characteristics of its waveform. Making the sections that cover characteristic parts of the high-frequency waveform narrower improves performance when the high-frequency effective value is used to detect the sound produced when the sound collection unit 31 is hit directly.
- In step S18, the zero-cross value calculation unit 67 calculates the zero-cross value zcr(n) by evaluating equation (7) on the audio signal x(n) supplied from the A/D conversion unit 33, and supplies it to the discrimination processing unit 68.
- Here, negative(A) is a function that equals 1 when its argument A is negative and 0 otherwise. The zero-cross value zcr(n) therefore indicates the rate at which the audio signal (audio waveform) crosses 0 between the current time n and the past time (n-2048).
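Equation (7) itself is not reproduced in this text. The sketch below assumes a common form in which a crossing is counted whenever negative(x(k) · x(k-1)) = 1 for adjacent samples between times (n-2048) and n, normalized by the number of adjacent pairs; both the product form and the normalization are assumptions.

```python
from typing import Sequence


def negative(a: float) -> int:
    """1 when the argument is negative, 0 otherwise (as defined in the text)."""
    return 1 if a < 0 else 0


def zero_cross_value(x: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose product is negative.

    `x` is assumed to hold x(n-2048) ... x(n), i.e. 2049 samples and 2048
    adjacent pairs; pairs involving a sample exactly equal to 0 are not counted.
    """
    if len(x) < 2:
        raise ValueError("need at least two samples")
    crossings = sum(negative(x[k] * x[k - 1]) for k in range(1, len(x)))
    return crossings / (len(x) - 1)
```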
- The low-frequency maximum value, low-frequency effective value, high-frequency maximum value, high-frequency effective value, and zero-cross value are supplied to the discrimination processing unit 68 as feature amounts representing features of the audio signal.
- When there is no need to distinguish among them, these five values are also simply referred to as feature amounts of the audio signal.
- In step S19, the discrimination unit 34 performs a discrimination process to determine whether the sound collected by the sound collection unit 31 is the sound produced when the sound collection unit 31 is directly hit with the pad of the user's finger, and supplies the discrimination result to the control unit 35.
- In the discrimination process, if every feature amount of the audio signal satisfies its predetermined condition, a discrimination result indicating that the sound collection unit 31 has been directly hit is output; if any feature amount fails its condition, a discrimination result indicating that the sound collection unit 31 has not been directly hit is output.
- A discrimination result indicating that the sound collection unit 31 has been directly hit is also referred to as a positive discrimination result.
- A discrimination result indicating that the sound collection unit 31 has not been directly hit is also referred to as a negative discrimination result.
- When the discrimination process is performed, discrimination results are supplied to the control unit 35 from the discrimination unit 34-1 and the discrimination unit 34-2, respectively. That is, the processing of steps S11 to S19 is performed both by the sound collection unit 31-1, the A/D conversion unit 33-1, and the discrimination unit 34-1, and by the sound collection unit 31-2, the A/D conversion unit 33-2, and the discrimination unit 34-2.
- In step S20, the control unit 35 identifies the process the user has instructed it to execute, based on the discrimination results supplied from the discrimination processing unit 68 of each discrimination unit 34.
- For example, specific processes are associated in advance with the number of times each sound collection unit 31 is hit within a predetermined time. If only the sound collection unit 31-1 is hit once within the predetermined time, the volume of the music being played is raised; if only the sound collection unit 31-2 is hit once, the volume is lowered; and if only the sound collection unit 31-1 is hit twice, playback of the music is stopped.
- Based on the discrimination results sequentially supplied from the discrimination units 34, the control unit 35 identifies which sound collection unit 31 was struck (operated), and how many times, within the predetermined time, and then identifies the process predetermined for that result.
- The functions (processes) assigned to operations on the sound collection unit 31 are not limited to raising and lowering the volume, playing and stopping music, and skipping to the next or previous track; any process executed by the playback device 11, such as switching settings or turning the power off, may be assigned.
- Processes may also be assigned to combined operations on the two sound collection units 31, such as striking the sound collection unit 31-1 and the sound collection unit 31-2 alternately within a predetermined time, or striking them simultaneously.
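A sketch of how the control unit 35 could map the positive discrimination results within one time window to a process, following the examples above. The unit labels, action names, window handling, and the function `identify_process` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical mapping from (unit, tap count) pairs in one window to a process,
# following the examples in the text.
ACTIONS = {
    (("31-1", 1),): "volume_up",
    (("31-2", 1),): "volume_down",
    (("31-1", 2),): "stop_playback",
}


def identify_process(taps: Iterable[Tuple[str, float]], window_start: float,
                     window_length: float) -> Optional[str]:
    """Return the process for the positive results inside one time window.

    `taps` holds (unit_id, time_in_seconds) pairs, one per positive
    discrimination result; None means no process is assigned.
    """
    counts = Counter(unit for unit, t in taps
                     if window_start <= t < window_start + window_length)
    key = tuple(sorted(counts.items()))
    return ACTIONS.get(key)
```

A simultaneous operation could be assigned by adding a key such as `(("31-1", 1), ("31-2", 1))`; distinguishing alternating strikes would additionally require the tap times, not just the counts.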
- In step S21, the control unit 35 executes the process identified in step S20, and the reproduction control process ends.
- For example, when the identified process is stopping playback, the control unit 35 controls the playback control unit 39 to temporarily stop supplying sound from the playback control unit 39 to the speaker 32.
- When the identified process is raising the volume, the control unit 35 controls the playback control unit 39 so that the volume of the sound supplied from the playback control unit 39 to the speaker 32 is increased.
- In this way, the playback device 11 calculates feature amounts of the sound collected by the sound collection unit 31, determines from those feature amounts whether the collected sound is the sound produced when the sound collection unit 31 is directly hit, and executes a process according to the discrimination result.
- This improves the operability of the playback device 11 with a simpler configuration. That is, in the playback device 11, the surrounding sound is captured by the sound collection unit 31 provided for so-called noise canceling, a feature amount is obtained for each feature of that sound, and the user's operation is identified from them.
- The user therefore does not need to take the playback device out of a pocket or the like and touch buttons or a touch panel on the device body; playback of music and the like by the playback device 11 can be controlled simply by tapping the sound collection unit 31.
- Moreover, because the user's operation is identified from the sound collected by the sound collection unit 31, there is no need to provide playback control buttons or the like on the playback device 11, so the configuration of the playback device 11 can be made simpler.
- In step S51, the discrimination processing unit 68 determines whether the time index hi supplied from the high-frequency maximum value calculation unit 65 satisfies expression (8).
- hi_peak is a predetermined constant, for example, 1791.
- the time index hi is a time at which the absolute value of the high frequency signal becomes maximum. Therefore, in step S51, it is determined whether or not the absolute value of the high frequency signal is maximum at time (n-hi_peak).
- When the sound collection unit 31 is directly hit by the user, the absolute value of the high-frequency signal should reach its maximum at the moment of the hit. Therefore, restricting the discrimination target to audio signals whose absolute value peaks a predetermined time (here, hi_peak) before the current time n, which serves as the processing reference, improves discrimination accuracy. In other words, the audio-signal waveform in the period around the moment the sound collection unit 31 is hit is processed in a synchronized manner, that is, at a specific phase, so the sound can be discriminated more easily and accurately.
- As long as the predetermined time hi_peak lies roughly in the range (1791 − 128) ≤ hi_peak ≤ (1791 + 128), the performance of detecting the sound produced when the sound collection unit 31 is hit is hardly affected, so hi_peak is not limited to 1791 and may be any value in that range.
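A sketch of the step S51 check, assuming the time index hi is an absolute sample time and the high-frequency window is stored oldest sample first; the function name is hypothetical. With hi_peak = 1791 and a 2048-sample window, the expected peak falls at window index 256.

```python
from typing import Sequence

HI_PEAK = 1791  # example value from the text


def peak_at_expected_time(high: Sequence[float], n: int,
                          hi_peak: int = HI_PEAK) -> bool:
    """Step S51 sketch: does |high| reach its maximum at time (n - hi_peak)?

    `high` is assumed to hold the high-frequency signal from time
    (n - len(high) + 1) up to time n, oldest sample first.
    """
    start_time = n - len(high) + 1
    # max() returns the first index on ties, i.e. the earliest peak.
    index = max(range(len(high)), key=lambda i: abs(high[i]))
    hi = start_time + index  # time at which |high| is largest
    return hi == n - hi_peak
```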
- If it is determined in step S51 that the absolute value of the high-frequency signal is not at its maximum at time (n-hi_peak), then in step S52 the discrimination processing unit 68 supplies the control unit 35 with a discrimination result indicating that the sound collection unit 31 has not been hit, that is, a negative discrimination result. Once the discrimination result is output, the discrimination process ends and the processing proceeds to step S20 in FIG. 3.
- If it is determined in step S51 that the absolute value of the high-frequency signal is at its maximum at time (n-hi_peak), then in step S53 the discrimination processing unit 68 determines whether the high-frequency maximum value Ph(n) supplied from the high-frequency maximum value calculation unit 65 satisfies equation (9).
- ph_low is a predetermined threshold value, and in step S53, it is determined whether or not the high frequency maximum value Ph (n) is greater than or equal to the threshold value ph_low.
- If it is determined in step S53 that the high-frequency maximum value is less than the threshold ph_low, a negative discrimination result is output in step S52 and the discrimination process ends. The processing then proceeds to step S20 in FIG. 3.
- When the sound collection unit 31 is directly hit, the high-frequency component of the collected sound should have at least a certain intensity (amplitude). Therefore, when the high-frequency maximum value is less than the threshold ph_low, the sound (audio signal) being processed is determined not to be the sound of the sound collection unit 31 being directly struck, and a negative discrimination result is output.
- If it is determined in step S53 that the high-frequency maximum value is greater than or equal to the threshold ph_low, then in step S54 the discrimination processing unit 68 determines whether each high-frequency effective value rmsh(n, m) supplied from the high-frequency effective value calculation unit 66 satisfies equation (10).
- That is, it is determined whether each high-frequency effective value rmsh(n, m) is less than or equal to the threshold rmsh_high(m): rmsh(n, 0) to rmsh(n, 31) are compared with the thresholds rmsh_high(0) to rmsh_high(31), respectively.
- If it is determined in step S54 that any high-frequency effective value is not less than or equal to its threshold rmsh_high(m), a negative discrimination result is output in step S52 and the discrimination process ends. The processing then proceeds to step S20 in FIG. 3.
- When the sound collection unit 31 is directly hit, the high-frequency component of the collected sound has a large effective value in the sections near the moment of the hit and a relatively small effective value in the other sections.
- The threshold rmsh_high(m) for each section is determined in advance according to this characteristic. If even one of the sections' high-frequency effective values exceeds its threshold rmsh_high(m), the sound being processed is determined not to be the sound of the sound collection unit 31 being directly hit, and a negative discrimination result is output.
- If every high-frequency effective value is determined in step S54 to be less than or equal to its threshold, then in step S55 the discrimination processing unit 68 determines whether the low-frequency maximum value Pl(n) supplied from the low-frequency maximum value calculation unit 62 satisfies expression (11).
- Here, pl_low is a predetermined threshold.
- That is, in step S55 it is determined whether the low-frequency maximum value Pl(n) is greater than or equal to the threshold pl_low.
- If it is determined in step S55 that the low-frequency maximum value is less than the threshold pl_low, a negative discrimination result is output in step S52 and the discrimination process ends. The processing then proceeds to step S20 in FIG. 3.
- When the sound collection unit 31 is directly hit, the low-frequency component of the collected sound should also have at least a certain intensity (amplitude). Therefore, when the low-frequency maximum value is less than the threshold pl_low, the sound being processed is determined not to be the sound of the sound collection unit 31 being directly hit, and a negative discrimination result is output.
- If it is determined in step S55 that the low-frequency maximum value is greater than or equal to the threshold pl_low, then in step S56 the discrimination processing unit 68 determines whether each low-frequency effective value rmsl(n, m) supplied from the low-frequency effective value calculation unit 63 satisfies equation (12).
- If it is determined in step S56 that any low-frequency effective value is not greater than or equal to its threshold rmsl_low(m), a negative discrimination result is output in step S52 and the discrimination process ends. The processing then proceeds to step S20 in FIG. 3.
- When the sound collection unit 31 is directly hit, the low-frequency component of the collected sound has the characteristic that its effective value stays large for a certain period even after the moment of the hit.
- The threshold rmsl_low(m) for each section is determined in advance according to this characteristic. If even one of the sections' low-frequency effective values is less than its threshold rmsl_low(m), the sound being processed is determined not to be the sound of the sound collection unit 31 being directly hit, and a negative discrimination result is output.
- If every low-frequency effective value is determined in step S56 to be greater than or equal to its threshold, then in step S57 the discrimination processing unit 68 determines whether the zero-cross value zcr(n) supplied from the zero-cross value calculation unit 67 satisfies expression (13).
- Here, zcr_high is a predetermined threshold.
- That is, in step S57 it is determined whether the zero-cross value zcr(n) is less than or equal to the threshold zcr_high.
- If it is determined in step S57 that the zero-cross value is greater than the threshold zcr_high, a negative discrimination result is output in step S52 and the discrimination process ends. The processing then proceeds to step S20 in FIG. 3.
- When the sound collection unit 31 is directly hit, the zero-cross value of the collected sound should be fairly small. Therefore, when the zero-cross value zcr(n) exceeds the threshold zcr_high, the sound being processed is determined not to be the sound of the sound collection unit 31 being directly hit, and a negative discrimination result is output.
- If it is determined in step S57 that the zero-cross value zcr(n) is less than or equal to the threshold zcr_high, the processing proceeds to step S58.
- In step S58, the discrimination processing unit 68 supplies the control unit 35 with a discrimination result indicating that the sound collection unit 31 has been hit, that is, a positive discrimination result.
- The discrimination process then ends, and the processing proceeds to step S20 in FIG. 3.
- In this way, the discrimination processing unit 68 determines whether the collected sound is the sound produced when the sound collection unit 31 is directly hit by checking whether the feature amount of each feature of the collected sound satisfies the condition that such a sound should satisfy. Checking a condition for every feature of the collected sound allows the sound to be discriminated more reliably.
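The cascade of steps S51 to S58 can be summarized as a sequence of early rejections. This sketch assumes the feature amounts have already been computed; the dataclass and function names are hypothetical, and the threshold values in any real use would be learned in advance as the text explains.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Features:
    hi: int                  # time at which |high-frequency signal| is largest
    ph: float                # high-frequency maximum value Ph(n)
    rmsh: Sequence[float]    # high-frequency effective values rmsh(n, m)
    pl: float                # low-frequency maximum value Pl(n)
    rmsl: Sequence[float]    # low-frequency effective values rmsl(n, m)
    zcr: float               # zero-cross value zcr(n)


@dataclass
class Thresholds:
    hi_peak: int
    ph_low: float
    rmsh_high: Sequence[float]
    pl_low: float
    rmsl_low: Sequence[float]
    zcr_high: float


def is_hit(n: int, f: Features, t: Thresholds) -> bool:
    """True for a positive result (S58), False for a negative one (S52)."""
    if len(f.rmsh) != len(t.rmsh_high) or len(f.rmsl) != len(t.rmsl_low):
        raise ValueError("one threshold per section is required")
    if f.hi != n - t.hi_peak:                                   # S51
        return False
    if f.ph < t.ph_low:                                         # S53
        return False
    if any(v > th for v, th in zip(f.rmsh, t.rmsh_high)):       # S54
        return False
    if f.pl < t.pl_low:                                         # S55
        return False
    if any(v < th for v, th in zip(f.rmsl, t.rmsl_low)):        # S56
        return False
    if f.zcr > t.zcr_high:                                      # S57
        return False
    return True                                                 # S58
```

Ordering the checks this way means most environmental sounds are rejected by the first failed condition, so later features are only compared when earlier ones pass.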
- The thresholds ph_low, rmsh_high(m), pl_low, rmsl_low(m), and zcr_high used in the discrimination process described above are obtained in advance from a large number of samples and recorded in the discrimination processing unit 68.
- Specifically, a large number of sounds produced when the sound collection unit 31 is directly hit and sounds recorded when it is not hit are collected.
- These are used as learning data for positive and negative discrimination, respectively, and a discrimination boundary in the feature space formed by the feature amounts is obtained as a threshold.
- FIG. 6 is a diagram illustrating the appearance probability of the high frequency maximum value Ph (n) under various environments such as a train, a bus, and walking.
- the horizontal axis indicates the maximum high frequency (dB) of each sampled voice
- the vertical axis indicates the appearance probability.
- The left side of the figure shows the appearance probability for sounds recorded when the sound collection unit 31 is not hit (hereinafter referred to as "environmental sounds"), and the right side shows the appearance probability for sounds produced when the sound collection unit 31 is directly struck (hereinafter referred to as "operation sounds").
- The high-frequency maximum values of environmental sounds are distributed around -45 dB, whereas those of operation sounds are distributed around 0 dB.
- There are almost no environmental sounds whose high-frequency maximum value is around 0 dB. That is, the high-frequency maximum values of operation sounds are distributed at larger values than those of environmental sounds.
- This difference in the statistical distribution of the high-frequency maximum value Ph(n) between environmental sounds and operation sounds is used to predict (discriminate) whether a sound is an environmental sound or an operation sound.
- linear discriminant analysis is used to discriminate between environmental sounds and operation sounds.
- When linear discriminant analysis is performed with the high-frequency maximum value Ph(n) as the explanatory variable and the two groups of data, environmental sounds and operation sounds, as the target variable, the discriminant of equation (14) is obtained.
- The constant term ph_low corresponds to the midpoint between the centroid of the environmental-sound distribution and the centroid of the operation-sound distribution in FIG. 6.
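For a single explanatory variable and equal prior probabilities for the two groups, the linear-discriminant boundary (which assumes a shared variance) reduces to the midpoint of the two group means, matching the description of ph_low. A minimal sketch; the sample values are made up for illustration and are not data from the figure.

```python
from statistics import fmean
from typing import Sequence


def midpoint_threshold(environmental: Sequence[float],
                       operation: Sequence[float]) -> float:
    """Midpoint between the environmental-sound and operation-sound means.

    With one variable, a pooled (shared) variance, and equal priors, this is
    where the two linear discriminant scores are equal.
    """
    return (fmean(environmental) + fmean(operation)) / 2.0


# Illustrative (made-up) high-frequency maximum values in dB.
ph_low = midpoint_threshold([-46.0, -45.0, -44.0], [-1.0, 0.0, 1.0])
```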
- Missed detections and false detections can be tuned by changing the threshold ph_low: shifting ph_low in the positive direction (making it larger) reduces false detections but increases missed detections.
- Here, ph_low is shifted in the negative direction so that missed detections are reduced. False detections are then progressively reduced by the subsequent checks using the high-frequency effective value, the low-frequency maximum value, the low-frequency effective value, and the zero-cross value, so that the operation sound can be discriminated more reliably.
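The trade-off can be checked by counting both error types on labelled samples for candidate thresholds. A sketch for a rule of the form feature ≥ threshold, such as Ph(n) ≥ ph_low; the sample values are hypothetical.

```python
from typing import Sequence, Tuple


def detection_errors(environmental: Sequence[float], operation: Sequence[float],
                     threshold: float) -> Tuple[int, int]:
    """Return (missed detections, false detections) for feature >= threshold."""
    missed = sum(1 for v in operation if v < threshold)             # hits rejected
    false_alarms = sum(1 for v in environmental if v >= threshold)  # noise accepted
    return missed, false_alarms


env = [-50.0, -45.0, -40.0, -10.0]   # hypothetical environmental sounds (dB)
ops = [-20.0, -5.0, 0.0, 5.0]        # hypothetical operation sounds (dB)
```

On this toy data, raising the threshold from -22.5 to -8.0 trades one false detection for one missed detection.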
- FIG. 7 is a diagram showing high-frequency effective values rmsh (n, m) under various environments such as trains, buses, and walks.
- In the figure, the high-frequency effective value (dB) of each section HSm is shown.
- the operation sound has a characteristic that the high-frequency effective value in the section near the time when the sound collection unit 31 is directly struck is large, and the high-frequency effective value in a section different from the section is relatively small.
- the high frequency effective value of the environmental sound has a certain level in any section.
- linear discriminant analysis is used to discriminate between environmental sounds and operation sounds.
- When linear discriminant analysis is performed with the high-frequency effective values rmsh(n, m) as explanatory variables and the two groups of data, environmental sounds and operation sounds, as the target variable, the discriminant of equation (15) is obtained.
- The values shown in FIG. 8 are obtained as the constant terms rmsh_high(m), that is, rmsh_high(0) to rmsh_high(31), one for each m.
- The constant terms rmsh_high(3) to rmsh_high(5) are particularly large, corresponding to the high-frequency effective values of the operation sound shown in FIG. 7.
- When the discrimination score zrmsh of equation (16) is 0 or more, the sound being processed is determined to be an operation sound; when zrmsh is less than 0, it is determined to be an environmental sound.
- Missed detections and false detections can be tuned by changing the constant term bl_rmsh.
- When this constant term bl_rmsh is used as a threshold, in step S54 of FIG. 5 the sum of products of the linear discriminant coefficients al_rmsh(m) and the base-10 logarithms of the high-frequency effective values rmsh(n, m) is obtained and compared with the threshold (−bl_rmsh) to discriminate the operation sound. That is, when this sum exceeds −bl_rmsh, the sound being processed is determined to be an environmental sound.
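A sketch of the step S54 variant just described. The coefficients al_rmsh(m) and the constant bl_rmsh would come from the linear discriminant analysis; the numbers in the checks are hypothetical. The effective values must be positive for the logarithm to be defined.

```python
import math
from typing import Sequence


def log_linear_sum(values: Sequence[float], coefficients: Sequence[float]) -> float:
    """Sum over m of coefficients[m] * log10(values[m]); values must be > 0."""
    if len(values) != len(coefficients):
        raise ValueError("one coefficient per section is required")
    return sum(a * math.log10(v) for a, v in zip(coefficients, values))


def is_environmental_high(rmsh: Sequence[float], al_rmsh: Sequence[float],
                          bl_rmsh: float) -> bool:
    """Step S54 variant: environmental sound if the sum exceeds -bl_rmsh."""
    return log_linear_sum(rmsh, al_rmsh) > -bl_rmsh
```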
- Which of the discriminants of equations (15) to (17) to use may be decided according to the balance among the amount of computation, missed detections, and false detections.
- FIG. 9 is a diagram illustrating the appearance probability of the low frequency maximum value Pl (n) under various environments such as a train, a bus, and walking.
- the horizontal axis indicates the low frequency maximum value (dB) of each sampled voice
- the vertical axis indicates the appearance probability.
- the appearance probability for the environmental sound is shown on the left side in the figure, and the appearance probability for the operation sound is shown on the right side in the figure.
- The low-frequency maximum values of environmental sounds are widely distributed around -28 dB, whereas those of operation sounds are widely distributed around -10 dB. That is, the low-frequency maximum values of operation sounds are distributed at larger values than those of environmental sounds.
- This difference in the statistical distribution of the low-frequency maximum value Pl(n) between environmental sounds and operation sounds is used to predict (discriminate) whether a sound is an environmental sound or an operation sound so that false detections are reduced.
- linear discriminant analysis is used to discriminate between environmental sounds and operation sounds.
- When linear discriminant analysis is performed with the low-frequency maximum value Pl(n) as the explanatory variable and the two groups of data, environmental sounds and operation sounds, as the target variable, the discriminant of equation (18) is obtained.
- The constant term pl_low corresponds to the midpoint between the centroid of the environmental-sound distribution and the centroid of the operation-sound distribution in FIG. 9.
- Next, the threshold rmsl_low(m) of the low-frequency effective value rmsl(n, m) will be described.
- FIG. 10 is a diagram showing the low-frequency effective value rmsl (n, m) under various environments such as trains, buses, and walking.
- In the figure, the low-frequency effective value (dB) is shown.
- the low-frequency effective value of the environmental sound is shown on the upper side in the figure, and the low-frequency effective value of the operation sound is shown on the lower side in the figure.
- linear discriminant analysis is used to discriminate between environmental sounds and operation sounds.
- When linear discriminant analysis is performed with the low-frequency effective values rmsl(n, m) as explanatory variables and the two groups of data, environmental sounds and operation sounds, as the target variable, the discriminant of equation (19) is obtained.
- The values shown in FIG. 11 are obtained as the constant terms rmsl_low(m), that is, rmsl_low(0) to rmsl_low(3), one for each m.
- The constant terms rmsl_low(0) and rmsl_low(1) are particularly large, corresponding to the low-frequency effective values of the operation sound shown in FIG. 10.
- When the discrimination score zrmsl of equation (20) is 0 or more, the sound being processed is determined to be an operation sound.
- When the discrimination score zrmsl is less than 0, the sound being processed is determined to be an environmental sound.
- The operation sound can also be discriminated by comparing a weighted sum with −b_rmsl: when the sum of the low-frequency effective values multiplied by the linear discriminant coefficients exceeds −b_rmsl, the sound being processed is determined to be an environmental sound.
- Missed detections and false detections can be tuned by changing the constant term bl_rmsl.
- When bl_rmsl is used as a threshold, the sum of products of the linear discriminant coefficients al_rmsl(m) and the base-10 logarithms of the low-frequency effective values rmsl(n, m) is compared with the threshold (−bl_rmsl) to discriminate the operation sound. That is, when this sum exceeds −bl_rmsl, the sound being processed is determined to be an environmental sound.
- Here, bl_rms is the constant term of the linear discriminant. In the discriminant of equation (22) as well, a sound is determined to be an operation sound if the discrimination score zrms is 0 or more, and an environmental sound if zrms is less than 0.
- In this case, in step S56 of FIG. 5, the sum of products of the linear discriminant coefficients al_rmsh(m) and the base-10 logarithms of the high-frequency effective values rmsh(n, m) is obtained, together with the sum of products of the linear discriminant coefficients al_rmsl(m) and the base-10 logarithms of the low-frequency effective values rmsl(n, m).
- The total of these two sums is compared with a threshold (−bl_rms) to discriminate the operation sound. That is, when the total exceeds −bl_rms, the sound being processed is determined to be an environmental sound. In this case, the balance between the low-frequency and high-frequency effective values is taken into account when discriminating the sound.
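The combined rule can be sketched the same way, adding the high-frequency and low-frequency log-linear sums before comparing with −bl_rms. Function names and numeric values are hypothetical.

```python
import math
from typing import Sequence


def log_linear_sum(values: Sequence[float], coefficients: Sequence[float]) -> float:
    """Sum over m of coefficients[m] * log10(values[m]); values must be > 0."""
    if len(values) != len(coefficients):
        raise ValueError("one coefficient per section is required")
    return sum(a * math.log10(v) for a, v in zip(coefficients, values))


def is_environmental_combined(rmsh: Sequence[float], al_rmsh: Sequence[float],
                              rmsl: Sequence[float], al_rmsl: Sequence[float],
                              bl_rms: float) -> bool:
    """Environmental sound if the high- plus low-band sums exceed -bl_rms."""
    total = log_linear_sum(rmsh, al_rmsh) + log_linear_sum(rmsl, al_rmsl)
    return total > -bl_rms
```

Because both bands feed one score, a strong low band can offset a borderline high band and vice versa, which is the balance the text refers to.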
- Which of the discriminants of equations (19) to (22) to use may be decided according to the balance among the amount of computation, missed detections, and false detections.
- FIG. 12 is a diagram illustrating the appearance probability of the zero-cross value zcr (n) under various environments such as a train, a bus, and walking.
- the horizontal axis indicates the zero-cross value of each sampled voice
- the vertical axis indicates the appearance probability.
- the appearance probability for the environmental sound is shown on the left side in the figure, and the appearance probability for the operation sound is shown on the right side in the figure.
- The zero-cross values of environmental sounds are relatively small and widely distributed, whereas those of operation sounds are widely distributed around 0, with more of them concentrated near 0 than for environmental sounds. This difference in the statistical distribution of the zero-cross value between environmental sounds and operation sounds is used to predict (discriminate) whether a sound is an environmental sound or an operation sound so that false detections are reduced.
- linear discriminant analysis is used to discriminate between environmental sounds and operation sounds.
- When linear discriminant analysis is performed with the zero-cross value zcr(n) as the explanatory variable and the two groups of data, environmental sounds and operation sounds, as the target variable, the discriminant of equation (23) is obtained.
- The constant term zcr_high corresponds to the midpoint between the centroid of the environmental-sound distribution and the centroid of the operation-sound distribution in FIG. 12.
- Using the thresholds obtained in this way, the operation sound produced when the sound collection unit 31 is directly hit can be distinguished from the environmental sound recorded when it is not hit.
- The method of creating the discriminant function used to obtain each threshold, and which feature amounts are used to create it, may be decided according to the balance between discrimination performance (missed and false detections) and the amount of computation.
- When the thresholds are learned, the environmental sounds used may be limited to those similar to the operation sound.
- For example, only environmental sounds whose discrimination score is near zero are regarded as similar to the operation sound and adopted as learning data.
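A sketch of that selection step, assuming a linear discriminant score of the form w·x + b computed on each environmental sample's feature vector. The margin, weights, and bias are hypothetical tuning values, and the function names are illustrative.

```python
from typing import Sequence


def linear_score(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Linear discriminant score w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_confusable(samples: Sequence[Sequence[float]], w: Sequence[float],
                      b: float, margin: float) -> list[Sequence[float]]:
    """Keep only environmental samples whose score lies within +/- margin of 0."""
    return [x for x in samples if abs(linear_score(x, w, b)) <= margin]
```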
- the series of processes described above can be executed by hardware or software.
- When the series of processes is executed by software, a program constituting the software is installed from a program recording medium onto a computer incorporated in dedicated hardware, or onto a computer capable of executing various functions when various programs are installed, such as a general-purpose personal computer.
- FIG. 13 is a block diagram showing an example of the hardware configuration of a computer that executes the above-described series of processing by a program.
- In the computer, a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, and a RAM (Random Access Memory) 203 are connected to one another by a bus 204.
- An input / output interface 205 is further connected to the bus 204.
- Connected to the input/output interface 205 are an input unit 206 including a keyboard, a mouse, and a microphone; an output unit 207 including a display and a speaker; a recording unit 208 including a hard disk and a nonvolatile memory; and a communication unit 209 including a network interface.
- Also connected to the input/output interface 205 is a drive 210 that drives a removable medium 211 such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory.
- In the computer configured as described above, the CPU 201 loads, for example, the program recorded in the recording unit 208 into the RAM 203 via the input/output interface 205 and the bus 204 and executes it, whereby the series of processes described above is performed.
- The program executed by the computer (CPU 201) is provided recorded on the removable medium 211, which is a package medium such as a magnetic disk (including a flexible disk), an optical disc (CD-ROM (Compact Disc-Read Only Memory), DVD (Digital Versatile Disc), etc.), a magneto-optical disk, or a semiconductor memory, or is provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.
- the program can be installed in the recording unit 208 via the input / output interface 205 by attaching the removable medium 211 to the drive 210. Further, the program can be received by the communication unit 209 via a wired or wireless transmission medium and installed in the recording unit 208. In addition, the program can be installed in the ROM 202 or the recording unit 208 in advance.
- The program executed by the computer may be a program whose processes are performed in time series in the order described in this specification, or a program whose processes are performed in parallel or at necessary timing, such as when a call is made.
- 11 playback device, 21 earphone, 22 body, 31-1, 31-2, 31 sound collection unit, 34-1, 34-2, 34 discrimination unit, 35 control unit, 39 playback control unit, 61 low-pass filter processing unit, 62 low-frequency maximum value calculation unit, 63 low-frequency effective value calculation unit, 64 high-frequency filter processing unit, 65 high-frequency maximum value calculation unit, 66 high-frequency effective value calculation unit, 67 zero-cross value calculation unit, 68 discrimination processing unit
Abstract
Description
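FIG. 1 is a diagram showing a configuration example of an embodiment of a playback device to which the present invention is applied.
In more detail, the discrimination unit 34 in FIG. 1 is configured as shown in FIG. 2.
When the user operates the playback device 11 and instructs it to play music or the like, the playback device 11 starts playing the music. That is, the audio supply unit 36 supplies the audio (audio signal) of the designated music to the addition unit 38 in accordance with instructions from the control unit 35. Meanwhile, the noise removal processing unit 37 uses the sound (environmental sound) supplied from the sound collection unit 31 via the A/D conversion unit 33 to generate a sound with the opposite phase, and supplies it to the addition unit 38.
Next, the discrimination process corresponding to step S19 in FIG. 3 will be described with reference to the flowchart of FIG. 5. This discrimination process is performed in each of the discrimination unit 34-1 and the discrimination unit 34-2.
The thresholds ph_low, rmsh_high(m), pl_low, rmsl_low(m), and zcr_high used in the discrimination process described above are obtained in advance from a large number of samples and recorded in the discrimination processing unit 68.
Next, the threshold rmsh_high(m) of the high-frequency effective value rmsh(n, m) will be described. FIG. 7 is a diagram showing the high-frequency effective values rmsh(n, m) under various environments such as trains, buses, and walking.
Next, the threshold pl_low of the low-frequency maximum value Pl(n) will be described. FIG. 9 is a diagram showing the appearance probability of the low-frequency maximum value Pl(n) under various environments such as trains, buses, and walking. In the figure, the horizontal axis indicates the low-frequency maximum value (dB) of each sampled sound, and the vertical axis indicates the appearance probability.
Further, the threshold rmsl_low(m) of the low-frequency effective value rmsl(n, m) will be described.
Finally, the threshold zcr_high of the zero-cross value zcr(n) will be described. FIG. 12 is a diagram showing the appearance probability of the zero-cross value zcr(n) under various environments such as trains, buses, and walking. In the figure, the horizontal axis indicates the zero-cross value of each sampled sound, and the vertical axis indicates the appearance probability.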
Claims (18)
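- 1. A control device comprising:
a sound collection unit that collects ambient sound;
discrimination means for determining, using a maximum value and an effective value of the sound collected by the sound collection unit, whether the sound collection unit has been tapped; and
execution means for executing predetermined processing when it is determined that the sound collection unit has been tapped.
- 2. The control device according to claim 1, wherein the execution means identifies, based on the result of discrimination by the discrimination means, the number of times the sound collection unit has been tapped within a predetermined time, and executes processing defined for the identified number of times.
- 3. The control device according to claim 1, wherein the execution means executes processing determined by which of a plurality of the sound collection units has been tapped.
- 4. The control device according to claim 1, wherein the discrimination means determines whether the sound collection unit has been tapped based on the result of threshold processing on the maximum value and the result of threshold processing on the effective value.
- 5. The control device according to claim 4, wherein the threshold used in the threshold processing on the maximum value and the threshold used in the threshold processing on the effective value are determined in advance by discriminant analysis or an SVM.
- 6. The control device according to claim 4, wherein the discrimination means determines that the sound collection unit has not been tapped when the maximum value of a high-frequency component of the sound, at frequencies higher than a predetermined frequency, is less than a first threshold, and determines that the sound collection unit has not been tapped when the maximum value of a low-frequency component of the sound, at frequencies lower than those of the high-frequency component, is less than a second threshold.
- 7. The control device according to claim 6, wherein the discrimination means:
determines, for each of a plurality of sections of the high-frequency component in the time direction, whether the effective value of that section is equal to or less than a third threshold defined for each section, and determines that the sound collection unit has not been tapped if any section of the high-frequency component has an effective value exceeding the third threshold; and
determines, for each of a plurality of sections of the low-frequency component in the time direction, whether the effective value of that section is equal to or greater than a fourth threshold defined for each section, and determines that the sound collection unit has not been tapped if any section of the low-frequency component has an effective value less than the fourth threshold.
- 8. The control device according to claim 7, wherein the plurality of sections of the high-frequency component have mutually different lengths, and the plurality of sections of the low-frequency component have mutually different lengths.
- 9. The control device according to claim 7 or claim 8, wherein the discrimination means further determines whether the absolute value of the high-frequency component is at its maximum at a specific position in the time direction, and determines that the sound collection unit has not been tapped if the absolute value is not at its maximum at the specific position.
- 10. The control device according to claim 9, wherein the discrimination means further determines whether a zero-cross value of the sound is equal to or less than a fifth threshold, and determines that the sound collection unit has not been tapped if the zero-cross value exceeds the fifth threshold.
- 11. The control device according to claim 6, wherein the discrimination means determines whether the linear sum of the effective values of the plurality of sections of the high-frequency component in the time direction is equal to or less than a sixth threshold, and determines that the sound collection unit has not been tapped if the linear sum exceeds the sixth threshold.
- 12. The control device according to claim 6, wherein the discrimination means determines whether the linear sum of the logarithms of the effective values of the plurality of sections of the high-frequency component in the time direction is equal to or less than a seventh threshold, and determines that the sound collection unit has not been tapped if the linear sum exceeds the seventh threshold.
- 13. The control device according to claim 6, wherein the discrimination means determines whether the linear sum of the effective values of the plurality of sections of the low-frequency component in the time direction is equal to or less than an eighth threshold, and determines that the sound collection unit has not been tapped if the linear sum exceeds the eighth threshold.
- 14. The control device according to claim 6, wherein the discrimination means determines whether the linear sum of the logarithms of the effective values of the plurality of sections of the low-frequency component in the time direction is equal to or less than a ninth threshold, and determines that the sound collection unit has not been tapped if the linear sum exceeds the ninth threshold.
- 15. The control device according to claim 6, wherein the discrimination means determines whether the sum of the linear sum of the logarithms of the effective values of the plurality of sections of the high-frequency component in the time direction and the linear sum of the logarithms of the effective values of the plurality of sections of the low-frequency component in the time direction is equal to or less than a tenth threshold, and determines that the sound collection unit has not been tapped if the sum exceeds the tenth threshold.
- 16. The control device according to claim 1, wherein the sound collection unit is provided in an earphone.
- 17. A control method for a control device including:
a sound collection unit that collects ambient sound;
discrimination means for determining, using a maximum value and an effective value of the sound collected by the sound collection unit, whether the sound collection unit has been tapped; and
execution means for executing predetermined processing when it is determined that the sound collection unit has been tapped,
the control method including the steps of:
the sound collection unit collecting the sound;
the discrimination means determining whether the sound collection unit has been tapped; and
the execution means executing the predetermined processing in accordance with the result of discrimination by the discrimination means.
- 18. A program causing a computer to execute processing including the steps of:
causing a sound collection unit to collect ambient sound;
determining, using a maximum value and an effective value of the sound collected by the sound collection unit, whether the sound collection unit has been tapped; and
executing predetermined processing when it is determined that the sound collection unit has been tapped.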
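Claims 2 and 3 describe what happens after a tap is detected: the number of taps within a predetermined time, and which of several sound collection units was tapped, select the processing to run. The sketch below assumes per-unit tap timestamps in seconds, a 0.5 s window, and a made-up action table keyed by the reference signs 31-1 and 31-2; none of these values, names, or actions come from the text.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical mapping from (sound collection unit, tap count) to an action.
ActionTable = Dict[Tuple[str, int], Callable[[], str]]


def count_taps(tap_times: List[float], window_s: float) -> int:
    """Count taps that occur within window_s seconds of the first tap."""
    if not tap_times:
        return 0
    times = sorted(tap_times)
    return sum(1 for t in times if t - times[0] <= window_s)


def dispatch(unit: str, tap_times: List[float], window_s: float,
             actions: ActionTable) -> Optional[str]:
    """Run the action registered for (unit, tap count), if any."""
    action = actions.get((unit, count_taps(tap_times, window_s)))
    return action() if action is not None else None


ACTIONS: ActionTable = {
    ("31-1", 1): lambda: "play/pause",
    ("31-1", 2): lambda: "next track",
    ("31-2", 1): lambda: "toggle noise cancelling",
}

if __name__ == "__main__":
    print(dispatch("31-1", [0.00, 0.30], 0.5, ACTIONS))
```

Taps that fall outside the window are simply ignored here; a fuller implementation would probably start a new counting window for them.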
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP10835892.0A EP2386943B1 (en) | 2009-12-11 | 2010-12-02 | Mobile audio reproducing apparatus, corresponding method and computer program |
BRPI1007881A BRPI1007881A2 (pt) | 2009-12-11 | 2010-12-02 | Control device, method of controlling a control device, and program |
CN201080006668.7A CN102308277B (zh) | 2009-12-11 | 2010-12-02 | Control device, control method, and program |
KR1020117018142A KR101669302B1 (ko) | 2009-12-11 | 2010-12-02 | Control device |
US13/147,858 US9053709B2 (en) | 2009-12-11 | 2010-12-12 | Control device, control method, and program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2009-281964 | 2009-12-11 | | |
JP2009281964A JP5515709B2 (ja) | 2009-12-11 | 2009-12-11 | Control device and method, and program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011070971A1 (ja) | 2011-06-16 |
Family ID: 44145516
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2010/071606 WO2011070971A1 (ja) | Control device and method, and program | 2009-12-11 | 2010-12-02 |
Country Status (7)
Country | Link |
---|---|
US (1) | US9053709B2 (ja) |
EP (1) | EP2386943B1 (ja) |
JP (1) | JP5515709B2 (ja) |
KR (1) | KR101669302B1 (ja) |
CN (1) | CN102308277B (ja) |
BR (1) | BRPI1007881A2 (ja) |
WO (1) | WO2011070971A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102435885A (zh) * | 2011-10-09 | 2012-05-02 | 绵阳市维博电子有限责任公司 | Method, device, and system for detecting the operating voltage of a railway switch machine |
Families Citing this family (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9599981B2 (en) | 2010-02-04 | 2017-03-21 | Echostar Uk Holdings Limited | Electronic appliance status notification via a home entertainment system |
JP5352634B2 (ja) * | 2011-07-11 | 2013-11-27 | 株式会社エヌ・ティ・ティ・ドコモ | Input device |
KR102018654B1 (ko) * | 2013-06-07 | 2019-09-05 | 엘지전자 주식회사 | Mobile terminal equipped with an ear microphone and operating method thereof |
US9772612B2 (en) | 2013-12-11 | 2017-09-26 | Echostar Technologies International Corporation | Home monitoring and control |
US9900177B2 (en) | 2013-12-11 | 2018-02-20 | Echostar Technologies International Corporation | Maintaining up-to-date home automation models |
US9769522B2 (en) | 2013-12-16 | 2017-09-19 | Echostar Technologies L.L.C. | Methods and systems for location specific operations |
US9723393B2 (en) | 2014-03-28 | 2017-08-01 | Echostar Technologies L.L.C. | Methods to conserve remote batteries |
KR101486194B1 (ko) * | 2014-06-09 | 2015-02-11 | 박미경 | Input method and device using an earphone |
US9621959B2 (en) | 2014-08-27 | 2017-04-11 | Echostar Uk Holdings Limited | In-residence track and alert |
US9824578B2 (en) | 2014-09-03 | 2017-11-21 | Echostar Technologies International Corporation | Home automation control using context sensitive menus |
US9989507B2 (en) | 2014-09-25 | 2018-06-05 | Echostar Technologies International Corporation | Detection and prevention of toxic gas |
US9511259B2 (en) | 2014-10-30 | 2016-12-06 | Echostar Uk Holdings Limited | Fitness overlay and incorporation for home automation system |
US9983011B2 (en) | 2014-10-30 | 2018-05-29 | Echostar Technologies International Corporation | Mapping and facilitating evacuation routes in emergency situations |
US9967614B2 (en) | 2014-12-29 | 2018-05-08 | Echostar Technologies International Corporation | Alert suspension for home automation system |
US9729989B2 (en) | 2015-03-27 | 2017-08-08 | Echostar Technologies L.L.C. | Home automation sound detection and positioning |
CN106067996B (zh) * | 2015-04-24 | 2019-09-17 | 松下知识产权经营株式会社 | Voice reproduction method and voice dialogue device |
US9948477B2 (en) | 2015-05-12 | 2018-04-17 | Echostar Technologies International Corporation | Home automation weather detection |
US9946857B2 (en) | 2015-05-12 | 2018-04-17 | Echostar Technologies International Corporation | Restricted access for home automation system |
US9632746B2 (en) * | 2015-05-18 | 2017-04-25 | Echostar Technologies L.L.C. | Automatic muting |
US9960980B2 (en) | 2015-08-21 | 2018-05-01 | Echostar Technologies International Corporation | Location monitor and device cloning |
US10589051B2 (en) | 2015-10-20 | 2020-03-17 | Steven Salter | CPAP compliance notification apparatus and method |
US9996066B2 (en) | 2015-11-25 | 2018-06-12 | Echostar Technologies International Corporation | System and method for HVAC health monitoring using a television receiver |
US10101717B2 (en) | 2015-12-15 | 2018-10-16 | Echostar Technologies International Corporation | Home automation data storage system and methods |
US9743170B2 (en) | 2015-12-18 | 2017-08-22 | Bose Corporation | Acoustic noise reduction audio system having tap control |
US10091573B2 (en) | 2015-12-18 | 2018-10-02 | Bose Corporation | Method of controlling an acoustic noise reduction audio system by user taps |
US10110987B2 (en) * | 2015-12-18 | 2018-10-23 | Bose Corporation | Method of controlling an acoustic noise reduction audio system by user taps |
US9798309B2 (en) | 2015-12-18 | 2017-10-24 | Echostar Technologies International Corporation | Home automation control based on individual profiling using audio sensor data |
US9930440B2 (en) | 2015-12-18 | 2018-03-27 | Bose Corporation | Acoustic noise reduction audio system having tap control |
US10091017B2 (en) | 2015-12-30 | 2018-10-02 | Echostar Technologies International Corporation | Personalized home automation control based on individualized profiling |
US10060644B2 (en) | 2015-12-31 | 2018-08-28 | Echostar Technologies International Corporation | Methods and systems for control of home automation activity based on user preferences |
US10073428B2 (en) | 2015-12-31 | 2018-09-11 | Echostar Technologies International Corporation | Methods and systems for control of home automation activity based on user characteristics |
US9628286B1 (en) | 2016-02-23 | 2017-04-18 | Echostar Technologies L.L.C. | Television receiver and home automation system and methods to associate data with nearby people |
US9882736B2 (en) | 2016-06-09 | 2018-01-30 | Echostar Technologies International Corporation | Remote sound generation for a home automation system |
US10294600B2 (en) | 2016-08-05 | 2019-05-21 | Echostar Technologies International Corporation | Remote detection of washer/dryer operation/fault condition |
US10049515B2 (en) | 2016-08-24 | 2018-08-14 | Echostar Technologies International Corporation | Trusted user identification and management for home automation systems |
WO2018167901A1 (ja) * | 2017-03-16 | 2018-09-20 | ヤマハ株式会社 | Headphones |
CN106814670A (zh) * | 2017-03-22 | 2017-06-09 | 重庆高略联信智能技术有限公司 | Intelligent supervision method and system for river sand mining |
US10354641B1 (en) | 2018-02-13 | 2019-07-16 | Bose Corporation | Acoustic noise reduction audio system having tap control |
CN112468918A (zh) * | 2020-11-13 | 2021-03-09 | 北京安声浩朗科技有限公司 | Active noise reduction method and device, electronic equipment, and active noise reduction earphones |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004340706A (ja) * | 2003-05-15 | 2004-12-02 | Toshiba Mitsubishi-Electric Industrial System Corp | Equipment diagnosis device |
JP2006323943A (ja) * | 2005-05-19 | 2006-11-30 | Sony Corp | Playback device, program, and playback control method |
JP2008054103A (ja) * | 2006-08-25 | 2008-03-06 | Nec Corp | Portable electronic device and control method therefor |
JP2008166897A (ja) * | 2006-12-27 | 2008-07-17 | Sony Corp | Audio output device, audio output method, audio output processing program, and audio output system |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6772094B2 (en) * | 2000-10-23 | 2004-08-03 | James Tyson | Sound-based vessel cleaner inspection |
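JP2005250584A (ja) * | 2004-03-01 | 2005-09-15 | Sharp Corp | Input device |
KR100677613B1 (ko) * | 2005-09-09 | 2007-02-02 | 삼성전자주식회사 | Method and apparatus for controlling the operation of a multimedia device |
CN100555353C (zh) * | 2006-08-28 | 2009-10-28 | 日本胜利株式会社 | Control device for electronic equipment and control method for electronic equipment |
JP4671055B2 (ja) * | 2007-11-26 | 2011-04-13 | セイコーエプソン株式会社 | Tap command processing system, electronic device operating system, and electronic device |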
2009
- 2009-12-11 JP JP2009281964A, patent JP5515709B2 (ja): Active
2010
- 2010-12-02 KR KR1020117018142A, patent KR101669302B1 (ko): IP Right Grant
- 2010-12-02 EP EP10835892.0A, patent EP2386943B1 (en): Not-in-force
- 2010-12-02 WO PCT/JP2010/071606, patent WO2011070971A1 (ja): Application Filing
- 2010-12-02 CN CN201080006668.7A, patent CN102308277B (zh): Expired - Fee Related
- 2010-12-02 BR BRPI1007881A, patent BRPI1007881A2 (pt): IP Right Cessation
- 2010-12-12 US US13/147,858, patent US9053709B2 (en): Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
KR20120111917A (ko) | 2012-10-11 |
JP5515709B2 (ja) | 2014-06-11 |
EP2386943B1 (en) | 2018-11-14 |
US20110295396A1 (en) | 2011-12-01 |
US9053709B2 (en) | 2015-06-09 |
EP2386943A1 (en) | 2011-11-16 |
JP2011123751A (ja) | 2011-06-23 |
CN102308277B (zh) | 2015-03-25 |
KR101669302B1 (ko) | 2016-10-25 |
CN102308277A (zh) | 2012-01-04 |
BRPI1007881A2 (pt) | 2016-02-23 |
EP2386943A4 (en) | 2012-08-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5515709B2 (ja) | | Control device and method, and program |
JP4640461B2 (ja) | | Volume adjustment device and program |
US9998081B2 (en) | | Method and apparatus for processing an audio signal based on an estimated loudness |
JP5493611B2 (ja) | | Information processing device, information processing method, and program |
CN104246877B (zh) | | Systems and methods for audio signal processing |
JP4640463B2 (ja) | | Playback device, display method, and display program |
CN102668374B (zh) | | Adaptive dynamic range enhancement of audio recordings |
US8804976B2 (en) | | Content reproduction device and method, and program |
WO2006075275A1 (en) | | Audio entertainment system, method, computer program product |
JP2012027186A (ja) | | Audio signal processing device, audio signal processing method, and program |
JP4623124B2 (ja) | | Music playback device, music playback method, and music playback program |
JP3810257B2 (ja) | | Audio band extension device and audio band extension method |
DE102012103553A1 (de) | | Audio system and method using adaptive intelligence to distinguish the information content of audio signals in consumer audio and to control a signal processing function |
JP2010021627A (ja) | | Volume adjustment device, volume adjustment method, and volume adjustment program |
JP2011237753A (ja) | | Signal processing device and method, and program |
CA2869884C (en) | | A processing apparatus and method for estimating a noise amplitude spectrum of noise included in a sound signal |
CN106066782B (zh) | | Data processing method and electronic device |
JP3933909B2 (ja) | | Speech/music mixing ratio estimation device and audio device using the same |
JP4495704B2 (ja) | | Sound image localization enhancement reproduction method, and device, program, and storage medium therefor |
JP5126281B2 (ja) | | Music playback device |
US8242836B2 (en) | | Acoustic characteristic control apparatus |
JP2012095254A (ja) | | Volume adjustment device, volume adjustment method, volume adjustment program, and acoustic equipment |
KR100406248B1 (ko) | | Method and apparatus for discriminating the playing of a new note |
JPH05183522A (ja) | | Speech/musical sound discrimination circuit |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | WWE | WIPO information: entry into national phase | Ref document number: 201080006668.7; country of ref document: CN |
| | ENP | Entry into the national phase | Ref document number: 20117018142; country of ref document: KR; kind code of ref document: A |
| | WWE | WIPO information: entry into national phase | Ref document number: 13147858, country of ref document: US; ref document number: 5660/CHENP/2011, country of ref document: IN; ref document number: 2010835892, country of ref document: EP |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 10835892; country of ref document: EP; kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | REG | Reference to national code | Ref country code: BR; ref legal event code: B01A; ref document number: PI1007881; country of ref document: BR |
| | ENP | Entry into the national phase | Ref document number: PI1007881; country of ref document: BR; kind code of ref document: A2; effective date: 2011-08-04 |