WO2016203753A1 - Noise detection device, noise suppression device, noise detection method, noise suppression method, and recording medium - Google Patents

Info

Publication number
WO2016203753A1
Authority
WO
WIPO (PCT)
Prior art keywords
section
frame
impact sound
signal
feature amount
Prior art date
Application number
PCT/JP2016/002839
Other languages
French (fr)
Japanese (ja)
Inventor
旭美 梅松
亮輔 磯谷
剛範 辻川
秀治 古明地
Original Assignee
日本電気株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社 filed Critical 日本電気株式会社
Priority to JP2017524606A priority Critical patent/JPWO2016203753A1/en
Publication of WO2016203753A1 publication Critical patent/WO2016203753A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain

Definitions

  • The present invention relates to a noise detection device, a noise suppression device, a noise detection method, a noise suppression method, and a recording medium.
  • Patent Document 1 and Non-Patent Document 1 describe a technique for determining whether sudden noise is present and reducing it if so.
  • Patent Document 2 describes determining the presence of a sudden change in an input signal based on the linearity of the phase component signal in the frequency domain.
  • Patent Document 3 describes extracting audio information from reproduction information that includes music information.
  • Patent Document 4 describes an example of improving voice quality.
  • Sudden noise is, for example, an impact sound.
  • An impact sound is a sound generated when one object collides with another, an explosion sound, or a sound generated when an instantaneous, sudden force is applied to an object.
  • noise reduction processing (noise suppression processing)
  • The present invention has been made in view of the above problems, and its object is to provide a technique for more suitably detecting an impact sound section from an acoustic signal.
  • The noise detection device includes: calculation means for calculating, from an acoustic signal including an impact sound, a feature amount representing the steepness of change in the acoustic signal for each frame obtained by dividing the acoustic signal into segments of a predetermined time length; first detection means for detecting, based on the feature amount, a frame in which the signal changes more steeply than a speech signal as the start time of an impact sound section in which the impact sound is present; and second detection means for detecting, based on the feature amount, the last of the frames in which the signal continuously changes more steeply than the speech signal from the start time onward as the end time of the impact sound section.
  • One noise suppression device includes: detection means for detecting, from an acoustic signal including an impact sound, an initial section of the impact sound in which the power is greater than in the subsequent section following the initial section and in which the power exists over a wide band; and means for using first information, related to a frame other than the frames included in the initial section, either to replace second information related to a frame included in the initial section with the first information, or to interpolate a frame included in the initial section with information based on the first information.
  • Another noise suppression device includes: detection means for detecting, from an acoustic signal including an impact sound, an initial section of the impact sound in which the power is greater than in the subsequent section following the initial section and in which the power exists over a wide band; and replacement means for replacing the signal in the initial section with a predetermined signal prepared in advance, or deleting it.
  • The noise detection method calculates, from an acoustic signal including an impact sound, a feature amount representing the steepness of change in the acoustic signal for each frame obtained by dividing the acoustic signal into segments of a predetermined time length; detects, based on the feature amount, a frame in which the signal changes more steeply than a speech signal as the start time of the impact sound section in which the impact sound exists; and detects, based on the feature amount, the last of the frames in which the signal continuously changes more steeply than the speech signal from the start time onward as the end time of the impact sound section.
  • The noise suppression method detects, from an acoustic signal including an impact sound, an initial section of the impact sound in which the power is greater than in the subsequent section following the initial section and in which the power exists over a wide band; and, using first information related to a frame other than the frames included in the initial section, either replaces second information related to a frame included in the initial section with the first information, or interpolates a frame included in the initial section with information based on the first information.
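The replacement and deletion options summarized above can be sketched as follows. This is only an illustration under stated assumptions: the signal is taken to be already split into frames, the initial section is given as inclusive frame indices, and the function name `suppress_initial_section`, the `mode` values, and the choice of the frame just before the section as the "first information" are all hypothetical rather than taken from the text.

```python
def suppress_initial_section(frames, start, end, mode="replace"):
    """Suppress frames[start..end] (inclusive), a detected initial section.

    frames: list of frames, each a list of samples (hypothetical layout).
    mode="replace": overwrite each frame with the frame just before the section.
    mode="silence": overwrite with an all-zero signal prepared in advance.
    mode="delete":  drop the frames entirely.
    """
    if mode == "delete":
        return [list(f) for i, f in enumerate(frames) if not start <= i <= end]
    out = [list(f) for f in frames]
    if mode == "replace":
        # Assumption: the "first information" is the last frame before the section.
        donor = out[start - 1] if start > 0 else [0.0] * len(out[start])
    elif mode == "silence":
        donor = [0.0] * len(out[start])
    else:
        raise ValueError(f"unknown mode: {mode}")
    for i in range(start, end + 1):
        out[i] = list(donor)
    return out
```

Replacing only the short initial section, rather than the whole impact sound, keeps the modified span small, which is the motivation the description gives for detecting that section separately.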
  • The sudden noise is, for example, an impact sound.
  • An impact sound is a sound generated when one object collides with another, an explosion sound, or a sound generated when an instantaneous, sudden force is applied to an object.
  • The impact sound in each embodiment of the present invention is not limited to the above. It may be, for example, applause, the sound of falling coins, the sound of castanets, the sound of wooden clappers, or the sound of striking glass, plastic, metal, ceramic, wood, pottery, or cans.
  • FIG. 1 is a diagram illustrating an example of a spectrogram of an impact sound.
  • The horizontal axis indicates time (seconds), and the vertical axis indicates frequency (kHz).
  • The impact sound includes a section in which the signal power is large and present over a wide band, and a section in which the signal power is small and present in a narrow band.
  • The former section is referred to as the hitting part or the hitting section.
  • The latter section is referred to as the attenuation part or the attenuation section.
  • The impact sound thus consists of the hitting section and the attenuation section.
  • The hitting section has large signal power that is present over a wide band. Therefore, detecting the hitting section and suppressing noise only in that section can improve the recognition rate of the acoustic signal more than detecting the entire impact sound section and suppressing noise across all of it. This is because the higher-power noise is suppressed while the section subjected to noise suppression processing is kept short.
  • FIG. 2 is a functional block diagram illustrating an example of a functional configuration of the noise detection apparatus 10 according to the present embodiment.
  • The noise detection apparatus 10 includes a calculation unit 11, a first detection unit 12, and a second detection unit 13.
  • The calculation unit 11 calculates, from the acoustic signal including the impact sound, a feature amount indicating the steepness of change in the acoustic signal for each frame obtained by dividing the acoustic signal into segments of a predetermined time length.
  • The calculation unit 11 outputs the calculated feature amount to the first detection unit 12 and the second detection unit 13.
  • The first detection unit 12 receives the feature amount calculated for each frame from the calculation unit 11. Based on the received feature amount, the first detection unit 12 detects a frame in which the signal changes more steeply than a speech signal as the start time of the impact sound section, that is, the section of the acoustic signal in which the impact sound exists. The first detection unit 12 outputs the detected start time of the impact sound section to the second detection unit 13.
  • The second detection unit 13 receives the feature amount calculated for each frame from the calculation unit 11.
  • The second detection unit 13 also receives the start time of the impact sound section from the first detection unit 12. Based on the received feature amount, the second detection unit 13 detects the last of the frames in which the signal continuously changes more steeply than the speech signal from the start time onward as the end time of the impact sound section.
  • In this way, the first detection unit 12 of the noise detection apparatus 10 detects the start time of the impact sound section, and the second detection unit 13 detects its end time.
  • The noise detection apparatus 10 according to the present embodiment can therefore detect the impact sound section, that is, the section of the acoustic signal in which the impact sound exists.
  • The impact sound section here is the hitting section, in which the signal power is large and present over a wide band; in that section, the signal changes more steeply than an acoustic signal containing only speech, or than a section in which only speech exists.
  • The noise detection apparatus 10 according to the present embodiment can therefore more suitably detect the hitting section of an impact sound within the acoustic signal.
  • FIG. 3 is a functional block diagram illustrating an example of a functional configuration of the noise detection apparatus 100 according to the present embodiment.
  • the noise detection apparatus 100 includes a calculation unit 110, a first detection unit 120, and a second detection unit 130.
  • The calculation unit 110 includes a conversion unit 111 and an index calculation unit (linearity calculation unit) 112. The conversion unit 111 includes a frame division unit 1111, a windowing processing unit 1112, and a Fourier transform unit 1113.
  • The index calculation unit 112 includes a change amount calculation unit 1121, a difference calculation unit 1122, and a feature amount calculation unit 1123.
  • The frame division unit 1111 of the conversion unit 111 receives an acoustic signal (also referred to as an input signal), for example from outside the noise detection apparatus 100.
  • The frame division unit 1111 divides the received acoustic signal into frames of K samples each.
  • K is a positive even number.
  • The frame division unit 1111 outputs the signal samples, that is, the acoustic signal divided into frames, to the windowing processing unit 1112.
  • The windowing processing unit 1112 receives the signal samples from the frame division unit 1111.
  • The windowing processing unit 1112 multiplies the received signal samples by the window function w(t).
  • The signal samples windowed by the window function w(t) (also referred to as the windowed signal) can be calculated by the following equation (1).
  • The windowing processing unit 1112 may apply the window so that parts of two consecutive frames overlap.
  • As the window function, for example, a Hanning window represented by the following equation (2) can be used.
  • The windowing processing unit 1112 may also use other window functions, such as a Hamming window or a triangular window.
  • The windowing processing unit 1112 outputs the windowed signal to the Fourier transform unit 1113.
  • The Fourier transform unit 1113 receives the windowed signal from the windowing processing unit 1112.
  • The Fourier transform unit 1113 performs a Fourier transform on the received windowed signal.
  • Here, j represents the imaginary unit, |X_n(k)| represents the amplitude spectrum, and p_n(k) represents the phase spectrum.
  • The Fourier transform unit 1113 separates the signal spectrum X_n(k) into the phase spectrum p_n(k) and the amplitude spectrum |X_n(k)|.
  • The Fourier transform unit 1113 outputs the phase spectrum p_n(k) obtained by separating the signal spectrum X_n(k) to the index calculation unit 112 for each frame.
  • The phase spectrum output by the Fourier transform unit 1113 for each frame is also referred to as the phase component signal.
  • The amplitude spectrum output by the Fourier transform unit 1113 for each frame is also referred to as the amplitude component signal. By performing the Fourier transform on the windowed signal in this way, the Fourier transform unit 1113 can extract the frequency-domain phase component signal from the acoustic signal.
  • The Fourier transform unit 1113 has been described as applying a Fourier transform to the windowed signal, but the present embodiment is not limited to this.
  • Instead of the Fourier transform, the Fourier transform unit 1113 may apply, for example, a Hadamard transform, a Haar transform, or a wavelet transform to the windowed signal.
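The frame division, windowing, and Fourier transform steps described above can be sketched with NumPy as follows. Equations (1) to (3) are not reproduced in this text, so the details here are assumptions: the function name `phase_and_amplitude_spectra`, the 50% overlap (`hop`), and the use of `np.hanning` as the Hanning window are illustrative choices, and `np.fft.rfft` keeps only the non-negative frequency indexes of a real signal.

```python
import numpy as np

def phase_and_amplitude_spectra(signal, K=512, hop=256):
    """Split `signal` into overlapping K-sample frames, window, and transform.

    K (an even frame length) follows the text; `hop` and np.hanning are
    illustrative assumptions.
    Returns (phase, amplitude), each of shape (n_frames, K // 2 + 1).
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(K)                       # window function w(t)
    n_frames = max(0, 1 + (len(signal) - K) // hop)
    phase = np.empty((n_frames, K // 2 + 1))
    amplitude = np.empty((n_frames, K // 2 + 1))
    for n in range(n_frames):
        frame = signal[n * hop : n * hop + K]    # frame division
        spectrum = np.fft.rfft(frame * window)   # windowing, then Fourier transform
        phase[n] = np.angle(spectrum)            # phase spectrum p_n(k)
        amplitude[n] = np.abs(spectrum)          # amplitude spectrum |X_n(k)|
    return phase, amplitude
```

For a 2048-sample input with the default settings this yields 7 frames of 257 frequency indexes each. The phase values returned by `np.angle` are wrapped to the interval from -π to π.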
  • The change amount calculation unit 1121 of the index calculation unit 112 receives the phase component signal from the Fourier transform unit 1113 of the conversion unit 111 for each frame.
  • The change amount calculation unit 1121, the difference calculation unit 1122, and the feature amount calculation unit 1123 described below perform their processing frame by frame.
  • The change amount calculation unit 1121 calculates the change amount Δp_n(k) of the phase component, which is the phase difference between adjacent frequency indexes (adjacent frequency bands), using the following equation (4).
  • The change amount calculation unit 1121 outputs the calculated change amount Δp_n(k) of the phase component to the difference calculation unit 1122.
  • The difference calculation unit 1122 receives the change amount Δp_n(k) of the phase component from the change amount calculation unit 1121.
  • Using the received change amount Δp_n(k), the difference calculation unit 1122 calculates the change Δ²p_n(k) in the phase component change amount between adjacent frequency indexes using the following equation (5).
  • The difference calculation unit 1122 can thereby obtain the variation of the change amount Δp_n(k) of the phase component along the frequency axis.
  • The change Δ²p_n(k) in the change amount of the phase component is also referred to as the change amount difference Δ²p_n(k).
  • The difference calculation unit 1122 outputs the calculated change amount difference Δ²p_n(k) to the feature amount calculation unit 1123.
  • The feature amount calculation unit 1123 receives the change amount difference Δ²p_n(k) from the difference calculation unit 1122. It then calculates the average of the change amount differences Δ²p_n(k) over all frequency indexes in the frame (here, the n-th frame) for which they were obtained.
  • The calculated average value is the phase feature amount of the frame for which it was calculated. The average value of the change amount difference Δ²p_n(k) can also be regarded as the degree of variation (an index of the variation) of the phase component change amount Δp_n(k) within the frame.
  • The feature amount calculation unit 1123 calculates the phase feature amount PL_n, which is the average value of the change amount difference Δ²p_n(k), using the following equation (6).
  • Specifically, the feature amount calculation unit 1123 takes the cosine of the change amount difference Δ²p_n(k), averages it by summing over the frequency indexes and dividing by the number N of frequency indexes, and subtracts that average from 1.
  • The resulting value is defined as the phase feature amount PL_n.
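Equations (4) to (6) are not reproduced in this text. Read literally, the verbal definitions above suggest the following forms. This is a reconstruction from the prose rather than the published equations; in particular, the notation for the second difference, the index range of the sum, and the exact meaning of N (taken here as the number of frequency indexes over which the second difference is defined) are assumptions.

```latex
\begin{align}
\Delta p_n(k)   &= p_n(k+1) - p_n(k) \\
\Delta^2 p_n(k) &= \Delta p_n(k+1) - \Delta p_n(k) \\
PL_n            &= 1 - \frac{1}{N} \sum_{k} \cos\!\left( \Delta^2 p_n(k) \right)
\end{align}
```

Because the cosine lies between -1 and 1, this form confines the value to the range from 0 to 2, and a perfectly linear phase spectrum (zero second difference at every index) gives a value of 0.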
  • The phase feature amount PL_n is also an index representing the variation of the phase component change amount Δp_n(k) within the frame, and is also referred to as the index PL_n.
  • The phase feature amount PL_n takes a value from 0 to 2.
  • The phase feature amount PL_n represents how close the phase spectrum p_n(k) is to a straight line; in other words, it is an index of the linearity of the phase spectrum p_n(k).
  • The closer the phase spectrum p_n(k) is to a straight line, the closer the average value of the change amount difference Δ²p_n(k) (the phase feature amount PL_n) is to 0.
  • Conversely, the closer the value of the phase feature amount PL_n is to 0, the higher the linearity of the phase spectrum p_n(k).
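The relationship between phase linearity and the phase feature amount can be checked with a small sketch. It follows the verbal definition (one minus the mean cosine of the second difference of the phase along the frequency axis); the function name `phase_feature` and the choice to average over the indexes where the second difference exists are assumptions.

```python
import math

def phase_feature(phase):
    """Return the phase feature amount for one frame's phase spectrum (radians)."""
    d1 = [b - a for a, b in zip(phase, phase[1:])]   # change amount of the phase
    d2 = [b - a for a, b in zip(d1, d1[1:])]         # change amount difference
    if not d2:
        raise ValueError("need at least three frequency indexes")
    return 1.0 - sum(math.cos(x) for x in d2) / len(d2)

linear = [-0.3 * k for k in range(64)]                # straight-line phase
curved = [(math.pi / 4) * k * k for k in range(64)]   # second difference is pi/2
print(phase_feature(linear))   # very close to 0
print(phase_feature(curved))   # very close to 1
```

Since only the cosine of the second difference is used, adding a multiple of 2π to any phase value leaves the result unchanged, so wrapped phases from an FFT routine can be passed in directly without unwrapping.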
  • The feature amount calculation unit 1123 may obtain a variance instead of the average value as the index representing the variation of the phase component change amount Δp_n(k) along the frequency axis. In this case as well, a phase feature amount PL_n closer to 0 indicates higher linearity of the phase spectrum p_n(k).
  • The index calculation unit 112 has been described as obtaining the phase feature amount PL_n by calculating the average or the variance of the change amount difference Δ²p_n(k), but the present embodiment is not limited to this.
  • The index calculation unit 112 may instead obtain a regression line of the phase spectrum p_n(k) and calculate the deviation from that line. The index calculation unit 112 can then use the deviation from the regression line as the phase feature amount PL_n.
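The regression-line alternative can be sketched as below. The text does not say how the deviation is measured or whether the phase is unwrapped first, so both choices here (unwrapping with `np.unwrap`, and the root-mean-square residual as the deviation) are assumptions, as is the function name `regression_deviation`.

```python
import numpy as np

def regression_deviation(phase):
    """RMS deviation of an unwrapped phase spectrum from its least-squares line."""
    p = np.unwrap(np.asarray(phase, dtype=float))   # remove 2*pi jumps
    k = np.arange(len(p))
    slope, intercept = np.polyfit(k, p, 1)          # regression line over k
    residual = p - (slope * k + intercept)
    return float(np.sqrt(np.mean(residual ** 2)))
```

As with the average-based index, a value near 0 indicates a nearly linear phase spectrum. Unlike that index, this measure is not bounded above by 2, so thresholds would need to be chosen on its own scale.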
  • The feature amount calculation unit 1123 has been described as calculating the phase feature amount PL_n as an index representing the variation of the phase component change amount Δp_n(k) along the frequency axis.
  • This index may be the phase feature amount itself or information including the phase feature amount.
  • The feature amount calculation unit 1123 of the index calculation unit 112 outputs the calculated phase feature amount PL_n to the first detection unit 120 and the second detection unit 130.
  • Information indicating the frame for which the phase feature amount PL_n was calculated is associated with the phase feature amount PL_n output by the index calculation unit 112.
  • The information indicating the frame is, for example, a frame number. In the present embodiment, the description assumes that a frame number is associated with the phase feature amount PL_n.
  • The storage unit 140 stores a threshold Th_start (first threshold) and a threshold Th_end (second threshold).
  • The storage unit 140 may be built into the noise detection apparatus 100 or may be realized by a storage device separate from it. The threshold Th_start and the threshold Th_end may be stored in different storage units. They may also be stored in storage units (not shown) within the first detection unit 120 and the second detection unit 130, respectively.
  • The threshold Th_start is the value used when the first detection unit 120, described later, detects the first detection point.
  • The first detection point is the point in time at which the power of the acoustic signal rises suddenly and begins to occupy a wide band, within a short time of about several milliseconds to several tens of milliseconds.
  • The threshold Th_end is the value used when the second detection unit 130, described later, detects the second detection point.
  • The second detection point is the point in time at which the power of the acoustic signal drops suddenly and begins to occupy only a narrow band.
  • The second detection point is later in time than the first detection point.
  • The phase feature amount PL_n of a speech-only section is referred to as PL_speech.
  • PL_speech is not limited to the phase feature amount PL_n of a speech-only section and may be, for example, the average of the phase feature amounts PL_n calculated from each item of a large amount of learning data.
  • The learning data may be, for example, data containing the phase feature amounts PL_n calculated for frames of the acoustic signal in which no impact sound is present, data containing the phase feature amounts PL_n calculated for frames that include a speech section, or other data.
  • PL_speech may also be, for example, a phase feature amount PL_n calculated in advance for a background noise section of an acoustic signal.
  • The background noise is, for example, vehicle sound, mechanical sound such as that of an air conditioner, babble noise, or noise in which several of these sounds overlap.
  • PL_speech may also be a phase feature amount PL_n calculated from, for example, white noise or pink noise.
  • Next, a value indicating the degree of change (steepness) of the acoustic signal is described.
  • This value is smaller the greater the degree of change (the greater the steepness).
  • The phase feature amount is such a value indicating the degree of change (steepness) of the acoustic signal.
  • The acoustic signal in the hitting section of an impact sound changes more steeply than an acoustic signal containing only speech, or than the signal in a section where only speech exists.
  • Within the hitting section, the signal changes more steeply at the start time than at the end time. Therefore, the phase feature amount PL_n is smaller at the end time of the hitting section than for a speech signal, and smaller still at the start time of the hitting section.
  • Accordingly, the following formula (7) holds for the threshold Th_start, the threshold Th_end, and PL_speech.
  • The threshold Th_start may be calculated based on phase feature amounts PL_n calculated using acoustic signals that include impact sounds as learning data. Alternatively, an arbitrary value close to 0, for example 0.1, may be set as the threshold Th_start.
  • The threshold Th_end is a value calculated so as to satisfy formula (7), using the previously calculated threshold Th_start and PL_speech.
  • The threshold Th_start and the threshold Th_end, calculated by the calculation unit 110 so as to satisfy formula (7), are stored in advance in the storage unit 140.
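Formula (7) is not reproduced in this text. From the surrounding description (the phase feature amount is smaller at the end of the hitting section than for speech, and smaller still at its start), a plausible reading is that Th_start is less than Th_end, which is in turn less than PL_speech. The sketch below picks thresholds under that assumed ordering; the function name `choose_thresholds` and the `ratio` parameter are hypothetical, and the text does not say how Th_end is actually positioned between the other two values.

```python
def choose_thresholds(pl_speech_samples, th_start=0.1, ratio=0.5):
    """Pick (Th_start, Th_end, PL_speech) with Th_start < Th_end < PL_speech.

    pl_speech_samples: phase feature amounts from frames without impact sound.
    th_start: small value close to 0 (0.1 is the example given in the text).
    ratio: hypothetical knob placing Th_end between Th_start and PL_speech.
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    # PL_speech as the average over learning data, one option named in the text.
    pl_speech = sum(pl_speech_samples) / len(pl_speech_samples)
    if th_start >= pl_speech:
        raise ValueError("Th_start must be smaller than PL_speech")
    th_end = th_start + ratio * (pl_speech - th_start)
    return th_start, th_end, pl_speech
```

With speech-frame values of 0.8, 1.0, and 0.9 and the defaults, this places Th_end halfway between 0.1 and the average of about 0.9, that is, at about 0.5.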
  • The first detection unit 120 receives the phase feature amount PL_n from the calculation unit 110.
  • The first detection unit 120 detects the first detection point from the received phase feature amount PL_n.
  • Specifically, the first detection unit 120 compares the value of the received phase feature amount PL_n with the threshold Th_start stored in the storage unit 140.
  • When the phase feature amount PL_n is smaller than the threshold Th_start, that is, when PL_n < Th_start is satisfied, the first detection unit 120 determines that the frame indicated by the frame number associated with that PL_n is the start frame of the hitting section.
  • The first detection unit 120 acquires time information indicating the time of the frame from the frame determined to be the start frame.
  • The time information acquired by the first detection unit 120 may be a frame number, the start time of the frame, or another time within the frame.
  • The first detection unit 120 detects the frame number or time indicated by the acquired time information as the first detection point. In the following description, the first detection point is a frame number.
  • When the acoustic signal is a signal in which no impact sound is superimposed on the speech signal or background noise, that is, when the acoustic signal is the speech signal or background noise alone, the phase spectrum p_n(k) does not form a straight line. The value of the phase feature amount PL_n for such an acoustic signal is therefore larger than the phase feature amount PL_n obtained when an impact sound is superimposed on the speech signal or background noise.
  • The first detection unit 120 can thus determine the start time of the hitting section of the impact sound by comparing the phase feature amount PL_n with the threshold Th_start.
  • The first detection unit 120 outputs the time information representing the detected first detection point to the second detection unit 130 as the start time information of the hitting section.
  • The start time information of the hitting section represents a frame number. Since the time information indicating the first detection point is the start time information of the hitting section, the first detection point is hereinafter also referred to as the start time of the hitting section.
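The start-frame test described above amounts to finding the first frame whose phase feature amount falls below Th_start. A minimal sketch, with the function name `find_start_frame` and the list-of-values input as illustrative assumptions:

```python
def find_start_frame(pl_values, th_start):
    """Return the first frame number n whose value is below th_start, or None."""
    for n, pl in enumerate(pl_values):
        if pl < th_start:      # steep change: phase spectrum is nearly linear
            return n
    return None

pl_values = [0.9, 0.8, 0.05, 0.2, 0.7]
print(find_start_frame(pl_values, 0.1))   # 2
```

Returning the frame number matches the convention in the description, where the first detection point is expressed as a frame number rather than as a time in seconds.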
  • The second detection unit 130 receives the phase feature amount PL_n from the calculation unit 110. It also receives the start time information of the hitting section from the first detection unit 120. The second detection unit 130 then detects the second detection point based on the received phase feature amount PL_n and the start time information of the hitting section.
  • Specifically, the second detection unit 130 compares the phase feature amount PL_n calculated for each frame later in time than the frame number represented by the start time information of the hitting section with the threshold Th_end stored in the storage unit 140. The second detection unit 130 determines whether the value of the phase feature amount PL_n is larger than the threshold Th_end, that is, whether Th_end < PL_n is satisfied, and identifies the frame whose phase feature amount PL_n is larger than the threshold Th_end.
  • The second detection unit 130 determines that the frame immediately before the identified frame is the end frame of the hitting section.
  • The second detection unit 130 acquires time information indicating the time of the frame from the frame determined to be the end frame.
  • The time information acquired by the second detection unit 130 may be a frame number, the end time of the frame, or another time within the frame.
  • The second detection unit 130 detects the frame number or time indicated by the acquired time information as the second detection point. In the following description, the second detection point is a frame number.
  • In the hitting section, the signal continues to change steeply, so the value of the phase feature amount PL_n remains low. When the hitting section ends, the change in the acoustic signal becomes gradual and the value of the phase feature amount increases.
  • The second detection unit 130 can therefore identify the frame at which the hitting section ends by comparing the phase feature amount PL_n with the threshold Th_end.
  • The second detection unit 130 treats the time information representing the detected second detection point as the end time information of the hitting section.
  • The second detection point is also referred to as the end time of the hitting section.
  • The second detection unit 130 may take as the end time the earlier of the end time detected as described above and the time at which an arbitrary period (for example, one second) has elapsed from the start time. This is because, in a real environment, the end time detected by the second detection unit 130 may be considerably delayed relative to the start time owing to reverberation and similar effects. In general, the hitting section of an impact sound is often one second or shorter. Therefore, when no end time has been detected by, for example, one second after the start time, the second detection unit 130 may set the time one second after the start time as the end time. The noise detection apparatus 100 can thereby reduce speech recognition errors caused by the detected hitting section becoming too long.
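The end-frame rule, including the optional cap on section length, can be sketched as follows. Expressing the cap as a number of frames (`max_frames`, for example the number of frames in one second) is an assumption, as is the function name `find_end_frame`.

```python
def find_end_frame(pl_values, start, th_end, max_frames=None):
    """Return the end frame of the hitting section that begins at `start`.

    The end frame is the frame just before the first later frame whose value
    exceeds th_end. If max_frames is given, the section is cut off once it
    spans that many frames, whichever comes first. Returns None if neither
    condition is met within pl_values.
    """
    cap = None if max_frames is None else start + max_frames - 1
    for n in range(start + 1, len(pl_values)):
        if cap is not None and n - 1 >= cap:
            return cap                     # length cap reached first
        if pl_values[n] > th_end:
            return n - 1                   # frame before the change became gradual
    if cap is not None and cap <= len(pl_values) - 1:
        return cap
    return None

pl_values = [0.9, 0.05, 0.2, 0.3, 0.8, 0.9]
print(find_end_frame(pl_values, start=1, th_end=0.5))                # 3
print(find_end_frame(pl_values, start=1, th_end=0.5, max_frames=2))  # 2
```

The second call shows the cap taking effect: the threshold test alone would end the section at frame 3, but a two-frame limit ends it at frame 2.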
  • The second detection unit 130 determines the section from the start time of the hitting section to its end time to be the hitting section.
  • FIG. 4 is a flowchart showing an example of the operation of the noise detection apparatus 100 according to the present embodiment.
  • The frame division unit 1111 of the conversion unit 111 of the calculation unit 110 divides the acoustic signal into frames of a predetermined time length (step S41).
  • The noise detection apparatus 100 sets flag to 0 and n to 0 as initial values (step S42). flag takes a value of 0 or 1.
  • n is a variable indicating the frame number; its upper limit is the number of frames produced by the division in step S41 (denoted DIV) minus 1.
  • The windowing processing unit 1112 of the conversion unit 111 performs windowing processing on the signal samples included in the divided frame (step S43).
  • The Fourier transform unit 1113 of the conversion unit 111 calculates the phase spectrum p_n(k) by performing a Fourier transform on the windowed signal samples for each frame (step S44).
  • The change amount calculation unit 1121 of the index calculation unit 112 of the calculation unit 110 calculates the change amount Δp_n(k) of the phase component (step S45).
  • The difference calculation unit 1122 of the index calculation unit 112 calculates the change amount difference Δ²p_n(k), which is the change in the change amount of the phase component (step S46).
  • The feature amount calculation unit 1123 of the index calculation unit 112 calculates the phase feature amount PL_n, which is an index of the linearity of the phase spectrum p_n(k) (step S47).
  • The noise detection apparatus 100 determines whether flag is 0 (step S48). If flag is not 0 (NO in step S48), the process proceeds to step S54. If flag is 0 (YES in step S48), the process proceeds to step S49. This flag indicates whether the start time of the hitting section has been detected: 0 means it has not been detected, and 1 means it has.
  • If flag is 0 (YES in step S48), the first detection unit 120 determines whether the phase feature amount PL_n calculated in step S47 is smaller than the threshold Th_start (step S49).
  • If the phase feature amount PL_n is equal to or greater than the threshold Th_start (NO in step S49), the noise detection apparatus 100 increments n (step S52) and determines whether the incremented n is smaller than DIV (step S53).
  • If n is greater than or equal to DIV (NO in step S53), the noise detection apparatus 100 ends the process. If n is smaller than DIV (YES in step S53), the noise detection apparatus 100 returns the process to step S43 and performs steps S43 to S48 for the next frame.
  • If the phase feature amount PL_n is smaller than the threshold Th_start (YES in step S49), the first detection unit 120 detects the frame indicated by the frame number associated with that phase feature amount PL_n as the start frame (start time) of the hitting section (step S50). The noise detection apparatus 100 then sets flag to 1 (step S51) and advances the process to step S52. The noise detection apparatus 100 increments n (step S52), and if the incremented n is smaller than DIV (YES in step S53), steps S43 to S48 are executed for the next frame.
  • If flag is not 0 (NO in step S48), the second detection unit 130 determines whether the phase feature amount PL_n calculated in step S47 is greater than the threshold Th_end (step S54).
  • If the phase feature amount PL_n is equal to or smaller than the threshold Th_end (NO in step S54), the noise detection apparatus 100 advances the process to step S52.
  • If the phase feature amount PL_n is greater than the threshold Th_end (YES in step S54), the second detection unit 130 detects the frame one frame before the frame indicated by the frame number associated with that phase feature amount PL_n as the end frame (end time) of the hitting section (step S55).
  • the 2nd detection part 130 determines a hit
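The flag-based loop of steps S48 to S55 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `detect_hitting_sections` and its arguments are hypothetical, the per-frame phase feature amounts are assumed to be precomputed, and resetting the flag after an end frame (so that later sections can also be found) is an assumption the text does not state.

```python
def detect_hitting_sections(pl, th_start, th_end):
    """Return (start_frame, end_frame) pairs found with two thresholds.

    pl       : phase feature amount per frame (smaller = steeper change)
    th_start : start threshold (step S49)
    th_end   : end threshold (step S54)
    """
    sections = []
    flag = 0          # 0: start not yet detected, 1: start detected (step S48)
    start = None
    for n, value in enumerate(pl):
        if flag == 0:
            if value < th_start:                 # YES in step S49
                start = n                        # step S50: start frame
                flag = 1                         # step S51
        elif value > th_end:                     # YES in step S54
            sections.append((start, n - 1))      # step S55: previous frame is the end
            flag = 0                             # assumption: look for further sections
    # Assumption: a section still open at the end of the signal is not reported.
    return sections
```

For example, `detect_hitting_sections([5, 5, 1, 1, 2, 3, 6, 5], th_start=2, th_end=4)` reports a single section that starts at frame 2 and ends at frame 5, the frame before the feature amount first exceeds the end threshold.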
  • the noise detection apparatus 100 may sequentially receive acoustic signals and perform noise detection processing in real time. And the noise detection apparatus 100 may complete
  • The first detection unit 120 compares the feature amount calculated by the calculation unit 110 with the first threshold value. When the steepness of the change in the acoustic signal represented by the feature amount is larger than the steepness represented by the first threshold value, the first detection unit 120 detects the frame for which the feature amount was calculated as the start time of the impact sound section. Further, the second detection unit 130 compares the feature amount calculated by the calculation unit 110 with the second threshold value. The second threshold value represents a steepness smaller than the steepness of the change in the acoustic signal represented by the first threshold value.
  • When the steepness of the change in the acoustic signal represented by the feature amount is equal to or less than the steepness represented by the second threshold value, the second detection unit 130 detects the frame immediately before the frame for which the feature amount was calculated as the end time of the impact sound section.
  • the noise detection apparatus 100 can detect the time when the steep change of the acoustic signal starts and the time when it ends more accurately.
  • the point in time when the abrupt change of the acoustic signal starts corresponds to the start time of the hitting section shown in FIG.
  • the time when the steep change of the acoustic signal ends corresponds to the end time of the hitting section. Therefore, the noise detection apparatus 100 can more accurately detect the start time and the end time of the hitting section shown in FIG. 1 among the impact sounds.
  • the noise detection apparatus 100 can detect the start time and the end time of the hitting section. Thereby, the noise detection apparatus 100 can determine a hit
  • The noise detection apparatus 100 can further improve recognition performance when speech recognition is performed. As an example, consider a scene in which speech is recognized while a store clerk serves a customer at a store counter or the like. When the store clerk is the target speaker and the clerk's voice is the target voice, the clerk may talk while showing a catalog to the customer, or while operating a keyboard and mouse to enter customer information. In this case, the sounds of objects and the work sounds produced by the target speaker are superimposed on the target voice, so the speech recognition accuracy for the collected acoustic signal may be lowered.
  • With the noise detection apparatus 100, a section of impact sound such as work sound, in particular the hitting section, can be determined, so an apparatus that suppresses noise can perform noise suppression processing in the determined section. As a result, a voice with suppressed noise can be extracted, which improves the recognition accuracy for this voice.
  • The determination of the hitting section by the noise detection device 100 can be suitably applied to the noise detection field.
  • The present invention can be applied to cases where a user listens to collected sound in which noise, for example the sound of touching a microphone or a loud impact sound, has been reduced by noise suppression.
  • the noise detection apparatus 100 can detect an event such as door opening / closing and applause.
  • The noise detection apparatus 100 can be applied to, for example, detecting the section in which to suppress noise, such as sounds produced by the speaker, when the target speaker's voice is what is wanted.
  • The noise detection apparatus 100 can be used to suppress the superimposed noise in a signal that consists of a target signal, such as voice or music, and a noise signal superimposed on it.
  • the noise detection apparatus 100 can be applied to any other signal processing apparatus that is required to determine whether or not an input signal includes a rapidly changing section.
  • the feature amount calculation unit 1123 has been described as obtaining the phase feature amount PL n by calculating the average value of the change amount difference ⁇ p n (k) as the feature amount.
  • description will be given of the case where the feature quantity calculating unit 1123 obtains the phase feature quantity PL n by calculating the distribution of the change amount difference ⁇ p n (k) as the feature quantity.
  • the feature amount calculation unit 1123 uses the change amount difference ⁇ p n (k) received from the difference calculation unit 1122 to obtain a histogram in the frame in which the change amount difference ⁇ p n (k) is calculated. At this time, the feature amount calculation unit 1123 obtains a histogram using the value of the change amount difference ⁇ p n (k) as a bin.
  • the feature amount calculation unit 1123 can determine that the linearity of the phase spectrum is high. Then, the feature amount calculation unit 1123 may calculate the index PL n based on this histogram.
  • the feature amount calculation unit 1123 determines an arbitrary frequency index range, for example, k-100 to k-1, k to k + 99, etc., and uses the frequency index value as a bin to calculate the change amount difference ⁇ p n (k). A distribution may be obtained. Then, the feature quantity calculation unit 1123 may calculate the inter-distribution distance based on this distribution and calculate the index PL n .
  • the feature amount calculation unit 1123 can calculate the feature amount based on the distribution, not the feature amount based on the average value of the change amount difference ⁇ p n (k). And the noise detection apparatus 100 can determine a hit
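As one possible reading of the distribution-based variant above, the following Python sketch builds a histogram of the change amount differences of one frame and derives an index from how concentrated the histogram is. The function name `histogram_feature`, the bin count, the value range, and the choice of "1 minus the share of the fullest bin" as the index are all assumptions made for illustration; the text only states that the index may be calculated from the histogram.

```python
import math


def histogram_feature(delta_p, num_bins=16, low=-math.pi, high=math.pi):
    """Index in [0, 1): small when the change amount differences concentrate
    in one bin (high phase linearity), large when they are spread out.

    delta_p : change amount differences of one frame, one value per frequency index
    """
    if not delta_p:
        raise ValueError("delta_p must not be empty")
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for value in delta_p:
        index = int((value - low) / width)
        index = min(max(index, 0), num_bins - 1)   # clamp out-of-range values
        counts[index] += 1
    # Share of values that fall into the fullest bin.
    peak_share = max(counts) / len(delta_p)
    return 1.0 - peak_share
```

With this convention a small index corresponds to a steep change, so the index could be compared against the start and end thresholds in the same direction as the average-based feature amount. Other concentration measures, such as the entropy of the histogram, would serve the same purpose.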
  • FIG. 5 is a functional block diagram showing an example of a functional configuration of the noise suppression apparatus 200 according to the present embodiment.
  • members having the same functions as those included in the drawings described in the second embodiment described above are given the same reference numerals, and descriptions thereof are omitted.
  • the noise suppression apparatus 200 includes the noise detection apparatus 10 described in the first embodiment or the noise detection apparatus 100 described in the second embodiment, and a replacement unit 210.
  • The functional configurations of the noise detection device 10 and the noise detection device 100 are the same as the functional configurations described with reference to FIG. 2 and FIG. 3. In the following description, it is assumed that the noise suppression device 200 includes the noise detection device 100, but the noise suppression device 200 may of course be configured to include the noise detection device 10 instead.
  • The noise detection apparatus 100 outputs information indicating the hitting section to the replacement unit 210. Specifically, the noise detection apparatus 100 outputs, as the information indicating the hitting section, information indicating the start time detected by the first detection unit 120 and the end time detected by the second detection unit 130 to the replacement unit 210, together with information indicating the acoustic signal for which the hitting section was determined.
  • The replacement unit 210 receives the information indicating the hitting section from the noise detection device 100 together with the information indicating the acoustic signal. The replacement unit 210 then receives the acoustic signal identified by that information, for example, from outside the noise suppression device 200. The replacement unit 210 associates the time information of the received acoustic signal with the time information represented by the information indicating the hitting section received from the noise detection apparatus 100, and specifies the start time and the end time of the hitting section in the received acoustic signal.
  • The replacement unit 210 then replaces the signal of each frame included in the hitting section with the signal of the frame immediately before the frame indicated by the specified start time.
  • FIG. 6 is a diagram for explaining the operation of the replacement unit 210.
  • the horizontal axis shown in FIG. 6 indicates the frame number, and the vertical axis indicates the frequency (kHz).
  • the upper diagram in FIG. 6 shows the acoustic signal before replacement, and the lower diagram in FIG. 6 shows the acoustic signal after replacement.
  • the hitting section determined by the noise detection device 100 is a section from the nth frame to the n + 1th frame.
  • the start time of the hitting section is the nth frame and the end time is the (n + 1) th frame.
  • The replacement unit 210 replaces the signal samples x_n(t) and x_{n+1}(t) of the nth frame and the (n+1)th frame with the signal samples x_{n-1}(t) of the (n-1)th frame, which is the frame immediately before the start time.
  • As a result, the signals of the nth frame and the (n+1)th frame are replaced with the same signal as that of the (n-1)th frame.
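The replacement illustrated in FIG. 6 can be sketched as below. This is a minimal example under stated assumptions: the acoustic signal is assumed to be already split into frames held as lists of samples, and the function name `replace_with_preceding_frame` is hypothetical.

```python
def replace_with_preceding_frame(frames, start, end):
    """Replace frames start..end (inclusive) with a copy of frame start-1.

    frames : list of frames, each a list of signal samples
    start  : index of the start frame of the hitting section
    end    : index of the end frame of the hitting section
    """
    if start < 1 or end < start or end >= len(frames):
        raise ValueError("hitting section must lie inside the signal and have a preceding frame")
    result = [list(frame) for frame in frames]      # leave the input untouched
    for index in range(start, end + 1):
        result[index] = list(frames[start - 1])     # copy of the frame before the start time
    return result
```

With the hitting section spanning the nth and (n+1)th frames, calling the function with `start=n` and `end=n + 1` leaves every other frame unchanged and fills both frames with the samples of frame n-1.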
  • In the above description, the replacement unit 210 replaces the signal of the hitting section with the signal of the frame immediately before the start time of the hitting section, but the present embodiment is not limited to this.
  • the replacement unit 210 may replace the feature amount of the hitting section with the feature amount of the frame immediately before the start time of the hitting section. This feature amount may be, for example, a mel frequency cepstrum coefficient generally used for speech recognition, a mel logarithmic spectrum, or the like, or other feature amount.
  • The replacement unit 210 uses information related to a frame different from the frames of the hitting section (for example, the signal or feature amount of a frame outside the hitting section) to replace the information related to the frames of the hitting section.
  • The signal with which the replacement unit 210 replaces the signal of the hitting section may be the signal of the frame immediately before the start time of the hitting section, or the signal of the frame immediately after the end time of the hitting section.
  • the replacement signal may be a signal of a frame immediately before the start time of the hitting section and a signal of a frame immediately after the end time of the hitting section.
  • For example, the replacement unit 210 may calculate the center time of the hitting section, replace the signal of the frames before the calculated center time with the signal of the frame immediately before the start time of the hitting section, and replace the signal of the frames after the calculated center time with the signal of the frame immediately after the end time of the hitting section. The time calculated by the replacement unit 210 need not be the center time and may be an arbitrary time.
  • The replacement unit 210 may also calculate the signal of the hitting section using the signal of the frame immediately before the start time of the hitting section and the signal of the frame immediately after the end time of the hitting section, and interpolate the hitting section with the calculated signal. For example, the replacement unit 210 may calculate the signal of the hitting section by applying arbitrary weights to the signal of the frame immediately before the start time and the signal of the frame immediately after the end time and adding them, and then interpolate the hitting section with the calculated signal.
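The weighted interpolation described above might look like the following sketch. The function name `interpolate_section` and the default weights of 0.5 are assumptions chosen for illustration; the text only says that the weights are arbitrary.

```python
def interpolate_section(frames, start, end, w_before=0.5, w_after=0.5):
    """Fill frames start..end (inclusive) with a weighted sum of the frame
    just before the section and the frame just after it.

    frames   : list of equal-length frames, each a list of samples
    w_before : weight for the frame immediately before the start time
    w_after  : weight for the frame immediately after the end time
    """
    if start < 1 or end < start or end + 1 >= len(frames):
        raise ValueError("the section needs a frame on each side")
    before = frames[start - 1]
    after = frames[end + 1]
    blended = [w_before * b + w_after * a for b, a in zip(before, after)]
    result = [list(frame) for frame in frames]
    for index in range(start, end + 1):
        result[index] = list(blended)
    return result
```

Setting `w_before=1.0, w_after=0.0` reduces this to replacement with the preceding frame, and the opposite setting to replacement with the following frame, so the earlier variants are special cases of the same operation.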
  • the replacement unit 210 may replace the signal of the hitting section with a noise such as a zero signal or white noise.
  • the replacement unit 210 may delete the signal of the hitting section and generate a signal that connects the frame immediately before the start time of the hitting section and the frame immediately after the end time of the hitting section.
  • the replacement unit 210 may detect a predetermined number of frames from the frame immediately after the end time of the hitting section as the impact sound attenuation section and perform further noise suppression processing.
  • FIG. 7 is a flowchart showing an example of the operation of the noise suppression apparatus 200 according to the present embodiment.
  • First, the noise detection device 100 of the noise suppression device 200 performs hitting section determination processing to determine the hitting section (step S71).
  • This step S71 indicates that the processes of steps S41 to S56 described with reference to FIG. 4 are performed.
  • The replacement unit 210 identifies the frame immediately before the start time of the hitting section determined in step S71 (step S72). Then, the replacement unit 210 replaces the signal of the frames of the acoustic signal that correspond to the hitting section with the signal of the identified frame (step S73). The replacement unit 210 can thereby suppress noise in the hitting section of the acoustic signal. The noise suppression device 200 then ends the process.
  • In this way, the noise in the hitting section can be suppressed by replacing the signal in the hitting section with the signal of a frame different from the frames in the hitting section.
  • The hitting section over which the replacement unit 210 performs the replacement is preferably short. This is because, when speech recognition is performed on an acoustic signal subjected to noise suppression processing, the speech recognition rate improves as the replaced section becomes shorter.
  • the noise suppression apparatus 200 can obtain an effect that noise can be further suppressed in addition to the effect according to the second embodiment described above.
  • In the above description, the replacement unit 210 included in the noise suppression device 200 has been described as a component separate from the noise detection device 100.
  • However, the present embodiment is not limited to this.
  • the replacement unit 210 may be built in the noise detection apparatus 100.
  • the noise detection apparatus 100 includes a calculation unit 110, a first detection unit 120, a second detection unit 130, a storage unit 140, and a replacement unit 210.
  • Such a noise detection apparatus 100 can obtain the same effect as the noise suppression apparatus 200 according to the present embodiment.
  • FIG. 8 is a functional block diagram illustrating an example of a functional configuration of the noise suppression apparatus 300 according to the present embodiment.
  • members having the same functions as those included in the drawings described in the second and third embodiments described above are denoted by the same reference numerals and description thereof is omitted.
  • The noise suppression apparatus 300 includes the noise detection apparatus 10 described in the first embodiment or the noise detection apparatus 100 described in the second embodiment, a replacement unit 210, and a waveform conversion unit 310.
  • The functional configurations of the noise detection device 10 and the noise detection device 100 are the same as the functional configurations described with reference to FIG. 2 and FIG. 3. In the following description, it is assumed that the noise suppression device 300 includes the noise detection device 100, but the noise suppression device 300 may of course be configured to include the noise detection device 10 instead.
  • The waveform conversion unit 310 receives from the replacement unit 210 the signal on which the replacement unit 210 has performed suppression processing. Specifically, the waveform conversion unit 310 receives the signal obtained after the replacement unit 210 has replaced the signal of the frames of the acoustic signal that correspond to the hitting section with the signal of the identified frame.
  • the specified frame is, for example, a frame immediately before the start time of the hitting section in the acoustic signal.
  • the waveform converter 310 converts the received signal into a form usable by the user. Specifically, the waveform converter 310 converts the received signal into a waveform that can be viewed and heard by the user.
  • For example, the waveform conversion unit 310 converts the received signal into a waveform by performing an inverse Fourier transform on it.
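As a rough illustration of this conversion, the sketch below applies a direct inverse discrete Fourier transform to the spectrum of each frame and then joins the frames by overlap-add. The function names `inverse_dft` and `overlap_add` are hypothetical, the O(N²) transform is written out only to keep the example free of external libraries (a real system would normally use an FFT routine), and the overlap-add step with no window compensation is an assumption, since the text does not describe how frames are recombined.

```python
import cmath


def inverse_dft(spectrum):
    """Inverse DFT of one frame; returns the real part of each time sample."""
    n = len(spectrum)
    samples = []
    for t in range(n):
        acc = 0j
        for k, coeff in enumerate(spectrum):
            acc += coeff * cmath.exp(2j * cmath.pi * k * t / n)
        samples.append((acc / n).real)
    return samples


def overlap_add(frames, hop):
    """Join equal-length time-domain frames whose starts are `hop` samples apart."""
    if not frames:
        return []
    length = hop * (len(frames) - 1) + len(frames[0])
    waveform = [0.0] * length
    for i, frame in enumerate(frames):
        for j, sample in enumerate(frame):
            waveform[i * hop + j] += sample
    return waveform
```

A waveform would then be obtained with something like `overlap_add([inverse_dft(s) for s in spectra], hop)`, where `spectra` holds the complex spectrum of each frame after replacement.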
  • the waveform converter 310 can display the waveform on a display device (not shown).
  • The noise suppression apparatus 300 can thus present to the user a noise-suppressed acoustic signal in a form that the user can use.
  • In the above description, the waveform conversion unit 310 included in the noise suppression device 300 has been described as a component separate from the noise detection device 100.
  • However, the present embodiment is not limited to this.
  • the waveform conversion unit 310 may be built in the noise detection apparatus 100. Such a noise detection apparatus 100 can obtain the same effect as the noise suppression apparatus 300 according to the present embodiment.
  • In order to improve the speech signal recognition rate, processing for reducing noise from the speech signal (noise suppression processing) must be performed appropriately. If the noise suppression processing is insufficient, noise remains superimposed on the speech signal, and the recognition rate of the speech signal is reduced. If the noise suppression processing is excessive, even necessary speech is suppressed as noise, and the recognition rate of the speech signal is likewise reduced.
  • an object of the present embodiment is to more effectively perform noise suppression processing of an audio signal.
  • FIG. 9 is a functional block diagram illustrating an example of a functional configuration of the noise suppression device 400 according to the present embodiment.
  • noise suppression apparatus 400 according to the present embodiment includes detection section 410 and replacement section 420.
  • the detection unit 410 detects the first section of the impact sound from the acoustic signal including the impact sound.
  • This first section is a section where the power is larger than the subsequent section following the first section and the power exists in a wide band.
  • the first section detected by the detection unit 410 is the hitting section described with reference to FIG.
  • the detection unit 410 is realized by, for example, the noise detection apparatus 100 in each of the above-described embodiments.
  • the noise detection apparatus 100 may detect the hitting section using an index (phase feature amount PL n ) indicating the linearity of the phase spectrum.
  • the detection unit 410 is not limited to that realized by the noise detection device in each of the above-described embodiments.
  • For example, the detection unit 410 may calculate, as a feature amount, a sudden change in volume, a change in the magnitude of an amplitude feature, a power spectrum feature, a time change of these, or the flatness of the spectrum, and detect the hitting section using the calculated feature amount.
  • the detection part 410 may detect a hit
  • damage part area is not specifically limited.
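Of the alternative feature amounts listed above, spectral flatness is simple to show. The sketch below uses the common definition of spectral flatness as the ratio of the geometric mean to the arithmetic mean of the power spectrum; the function name `spectral_flatness` and the small floor `eps` are assumptions, and the text does not say which definition of flatness is intended.

```python
import math


def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a power spectrum.

    Values near 1 mean power is spread evenly over frequency (wide band);
    values near 0 mean power is concentrated in a few bins.
    """
    if not power_spectrum:
        raise ValueError("power_spectrum must not be empty")
    values = [max(p, eps) for p in power_spectrum]   # floor avoids log(0)
    log_mean = sum(math.log(v) for v in values) / len(values)
    arithmetic_mean = sum(values) / len(values)
    return math.exp(log_mean) / arithmetic_mean
```

Because the hitting section is described as a section whose power exists in a wide band, a frame in that section would be expected to give a flatness closer to 1 than a frame of voiced speech, whose power is concentrated around harmonics.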
  • the detection unit 410 outputs the detected information indicating the hitting section to the replacement unit 420.
  • the replacement unit 420 acquires section information indicating the hitting section from the detection unit 410. Then, the replacement unit 420 specifies a hitting unit section indicated by the received section information in the acoustic signal. Then, a frame different from the frame included in the specified section is specified as a frame for replacing information.
  • the frame that the replacement unit 420 specifies as a frame for replacing information may be, for example, the frame immediately before the start time of the hitting unit section, similarly to the replacement unit 210 in the third embodiment described above. Further, the frame specified by the replacement unit 420 as a frame for replacing information may be, for example, a frame immediately after the end time of the hitting section.
  • Using first information related to the specified frame, the replacement unit 420 replaces second information related to the frames included in the hitting section with the first information.
  • the information related to the frame is, for example, an acoustic signal (signal sample) included in the frame
  • the replacement unit 420 replaces the signal sample of the frame included in the striking unit section with the signal sample of the specified frame.
  • Alternatively, the replacement unit 420 may use the first information related to the specified frame to interpolate the frames included in the hitting section with information based on the first information.
  • the replacement unit 420 may replace the signal of the hitting section with a noise such as a zero signal or white noise.
  • the replacement unit 420 may delete the signal of the hitting section and generate a signal connecting the frame immediately before the start time of the hitting section and the frame immediately after the end time of the hitting section.
  • FIG. 10 is a flowchart showing an example of the operation of the noise suppression apparatus 400 according to the present embodiment.
  • First, the detection unit 410 detects the hitting section (step S101).
  • The process of step S101 may be the same process as step S71 of FIG. 7.
  • the replacement unit 420 specifies a frame for replacing information with the frame in the section detected in step S101 (step S102). Then, the replacement unit 420 replaces the second information related to the frame corresponding to the detected section of the acoustic signal with the first information related to the identified frame (step S103). Thereby, the replacement part 420 can suppress the noise of the impact part hit
  • In this way, the second information related to the frames of the hitting section of the impact sound can be replaced with first information related to a frame different from the frames of the hitting section. The noise suppression apparatus 400 can thereby suppress the noise in the hitting section of the impact sound.
  • the noise suppression device 400 can obtain an effect that the noise suppression processing of the voice signal can be performed more effectively.
  • Each part of the noise detection devices (10, 100) shown in FIGS. 2 and 3 and the noise suppression devices (200, 300, 400) shown in FIGS. 5, 8, and 9 may be realized with the hardware resources shown in FIG. 11. That is, the configuration shown in FIG. 11 includes a RAM (Random Access Memory) 91, a ROM (Read Only Memory) 92, a communication interface 93, a storage medium 94, and a CPU (Central Processing Unit) 95. The CPU 95 reads various software programs (computer programs) stored in the ROM 92 or the storage medium 94 into the RAM 91 and executes them, thereby governing the overall operation of the noise detection devices (10, 100) and the noise suppression devices (200, 300, 400).
  • That is, the CPU 95 executes the software programs that implement each function (each unit) of the noise detection devices (10, 100) and the noise suppression devices (200, 300, 400), referring to the ROM 92 or the storage medium 94 as appropriate.
  • The present invention, described by taking each embodiment as an example, is achieved by supplying a computer program capable of realizing the functions described above to the noise detection devices (10, 100) and the noise suppression devices (200, 300, 400), and then having the CPU 95 read the computer program into the RAM 91 and execute it.
  • the supplied computer program may be stored in a computer-readable storage device such as a readable / writable memory (temporary storage medium) or a hard disk device.
  • the present invention can be understood as being configured by a code representing the computer program or a storage medium storing the computer program.
  • In each of the embodiments described above, the case where the functions shown in each block of the noise detection devices (10, 100) shown in FIGS. 2 and 3 and the noise suppression devices (200, 300, 400) shown in FIGS. 5, 8, and 9 are realized by a software program executed by the CPU 95 shown in FIG. 11 has been described as an example.
  • some or all of the functions shown in the blocks shown in FIGS. 2, 3, 5, 8, and 9 may be realized as hardware circuits.
  • calculation means for calculating, from an acoustic signal containing an impact sound, a feature amount representing the steepness of the change of the acoustic signal for each frame
  • The noise detection device according to supplementary note 1, wherein the first detection means compares the feature amount with a first threshold value and, when the steepness of the change in the acoustic signal represented by the feature amount is larger than the steepness represented by the first threshold value, detects the frame for which the feature amount was calculated as the start time of the impact sound section, and the second detection means compares the feature amount with a second threshold value representing a steepness smaller than the steepness of the change of the acoustic signal represented by the first threshold value and, when the steepness of the change of the acoustic signal represented by the feature amount is equal to or less than the steepness represented by the second threshold value, detects the frame immediately before the frame for which the feature amount was calculated as the end time of the impact sound section.
  • The noise detection device according to supplementary note 1 or 2, wherein the calculation means comprises conversion means for converting the acoustic signal into a phase spectrum and linearity calculation means for calculating the linearity of the phase spectrum, and calculates, as the feature amount, an index representing the linearity of the phase spectrum calculated by the linearity calculation means.
  • the said linearity calculation means calculates the linearity of the said phase spectrum using the value based on the dispersion
  • the noise detection device according to supplementary note 3, wherein the noise detection device is calculated.
  • the second information related to the frame included in the shock sound section is replaced with the first information.
  • noise detection according to any one of appendices 1 to 4, further comprising replacement means for interpolating a frame included in the impact sound section with information based on the first information. apparatus.
  • a noise suppression device comprising: replacement means for replacing with information or interpolating a frame included in the first section with information based on the first information.
  • a noise suppression apparatus comprising: a detection unit that detects the signal and a replacement unit that replaces or deletes the signal in the first section with a predetermined signal prepared in advance.
  • the detection means calculates, from the acoustic signal, a feature amount representing the steepness of the change of the acoustic signal for each frame
  • the said characteristic First detection means for detecting a frame having a greater signal change steepness than the audio signal based on the amount as the start time of the first section, and continuing from the start time based on the feature amount Or a second detection means for detecting the last frame of the frames whose signal change is sharper than the audio signal as the end time of the first section.
  • the noise suppression device according to 7.
  • the first detection unit compares the feature quantity with a first threshold value, and the steepness of the change in the acoustic signal represented by the feature quantity is represented by the first threshold value.
  • a frame in which the feature amount is calculated is detected as a start time of the first section, and the second detection unit is configured to detect the feature amount and the first
  • the second threshold value representing a steepness smaller than the steepness of the change of the acoustic signal represented by the threshold value is compared, and the steepness of the change of the acoustic signal represented by the feature amount is the second threshold value.
  • The noise suppression device according to supplementary note 8 or 9, wherein the calculation means comprises conversion means for converting the acoustic signal into a phase spectrum and linearity calculation means for calculating the linearity of the phase spectrum, and calculates, as the feature amount, an index representing the linearity of the phase spectrum calculated by the linearity calculation means.
  • the said linearity calculation means calculates
  • a feature amount representing the steepness of the change of the acoustic signal is calculated for each frame obtained by dividing the acoustic signal into a predetermined time length, and based on the feature amount, A frame in which the change of the signal is sharper than that of the audio signal is detected as a start time of the impact sound section where the impact sound exists, and based on the feature amount, the frame is continuously detected from the start time.
  • a noise detection method comprising: detecting a last frame of frames having a large signal change steepness as an end time of the impact sound section.
  • From an acoustic signal containing an impact sound, a first section of the impact sound, in which the power is larger than in the subsequent section following the first section and the power exists in a wide band, is detected; and, using first information related to a frame different from the frames included in the first section, second information related to the frames included in the first section is replaced with the first information, or the frames included in the first section are interpolated with information based on the first information.
  • 10 Noise detection apparatus
  • 11 Calculation unit
  • 12 First detection unit
  • 13 Second detection unit
  • 100 Noise detection apparatus
  • 110 Calculation unit
  • 111 Conversion unit
  • 1111 Frame division unit
  • 1112 Windowing processing unit
  • 1113 Fourier transform unit
  • 112 Index calculation unit
  • 1121 Change amount calculation unit
  • 1122 Difference calculation unit
  • 1123 Feature amount calculation unit
  • 120 First detection unit
  • 130 Second detection unit
  • 140 Storage unit
  • 200 Noise suppression device
  • 210 Replacement unit
  • 300 Noise suppression device
  • 310 Waveform conversion unit
  • 400 Noise suppression device
  • 410 Detection unit
  • 420 Replacement unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

Provided is a technology for suitably detecting an interval including an impact sound from an acoustic signal. A noise detection device comprises: a calculation unit for calculating, from the acoustic signal including the impact sound, a feature amount representing a gradient in an acoustic signal for each frame with a prescribed time length into which the acoustic signal is divided; a first detection unit for detecting, on the basis of the feature amount, a frame that has a larger gradient in the signal than a speech signal as the start time of an impact sound interval during which the impact sound exists; and a second detection unit for detecting, on the basis of the feature amount, the last frame having a larger gradient in the signal than the speech signal continuously from the start time as the end of the impact sound interval.

Description

Noise detection device, noise suppression device, noise detection method, noise suppression method, and recording medium
The present invention relates to a noise detection device, a noise suppression device, a noise detection method, a noise suppression method, and a recording medium.
Techniques for reducing noise in speech signals are currently being studied. For example, Patent Document 1 and Non-Patent Document 1 describe a technique that determines whether sudden noise is present and, if it is, reduces it.
Patent Document 2 describes determining the presence of a sudden change in an input signal based on the linearity of the phase component signal in the frequency domain.
As an example of a method for extracting an audio file, Patent Document 3 describes extracting speech information from reproduction information that includes music information.
An example of improving speech quality is described in Patent Document 4.
Japanese Patent No. 4456504; International Publication No. 2014/136628; JP 2011-248202 A; Japanese Patent No. 4098817; JP-A-9-331310
Sudden noise is, for example, an impact sound. An impact sound is a sound produced when objects collide, an explosion, or a sound produced when an abrupt force acts momentarily on an object.
To improve the recognition rate of a speech signal, the section in which noise reduction processing (noise suppression processing) is applied to the speech signal must have an appropriate length. If noise suppression is applied for too long, sections of genuine speech may also be suppressed as noise, which can lower the recognition rate.
However, none of the patent documents and non-patent documents cited above discloses detecting the section in which an impact sound exists.
The present invention has been made in view of the above problem, and its object is to provide a technique for more suitably detecting the section of an impact sound from an acoustic signal.
A noise detection device according to one aspect of the present invention includes: calculation means for calculating, from an acoustic signal including an impact sound, a feature amount representing the steepness of change of the acoustic signal for each frame obtained by dividing the acoustic signal into frames of a predetermined time length; first detection means for detecting, based on the feature amount, a frame in which the signal changes more steeply than a speech signal as the start time of an impact sound section in which the impact sound exists; and second detection means for detecting, based on the feature amount, the last of the frames in which, continuously from the start time, the signal changes more steeply than the speech signal as the end time of the impact sound section.
A noise suppression device according to one aspect of the present invention includes: detection means for detecting, from an acoustic signal including an impact sound, an initial section of the impact sound in which the power is larger than in the subsequent section that follows the initial section and in which the power is present over a wide band; and replacement means for, using first information related to a frame different from the frames included in the initial section, replacing second information related to a frame included in the initial section with the first information, or interpolating a frame included in the initial section with information based on the first information.
A noise suppression device according to another aspect of the present invention includes: detection means for detecting, from an acoustic signal including an impact sound, an initial section of the impact sound in which the power is larger than in the subsequent section that follows the initial section and in which the power is present over a wide band; and replacement means for replacing the signal in the initial section with a predetermined signal prepared in advance, or deleting it.
A noise detection method according to one aspect of the present invention includes: calculating, from an acoustic signal including an impact sound, a feature amount representing the steepness of change of the acoustic signal for each frame obtained by dividing the acoustic signal into frames of a predetermined time length; detecting, based on the feature amount, a frame in which the signal changes more steeply than a speech signal as the start time of an impact sound section in which the impact sound exists; and detecting, based on the feature amount, the last of the frames in which, continuously from the start time, the signal changes more steeply than the speech signal as the end time of the impact sound section.
A noise suppression method according to one aspect of the present invention includes: detecting, from an acoustic signal including an impact sound, an initial section of the impact sound in which the power is larger than in the subsequent section that follows the initial section and in which the power is present over a wide band; and, using first information related to a frame different from the frames included in the initial section, replacing second information related to a frame included in the initial section with the first information, or interpolating a frame included in the initial section with information based on the first information.
A computer program that causes a computer to implement any of the above devices or methods, and a computer-readable non-transitory recording medium storing that computer program, are also within the scope of the present invention.
According to the present invention, the section of an impact sound can be detected more suitably from an acoustic signal.
  • A diagram showing an example of a spectrogram of an impact sound.
  • A functional block diagram showing an example of the functional configuration of the noise detection device according to the first embodiment of the present invention.
  • A functional block diagram showing an example of the functional configuration of the noise detection device according to the second embodiment of the present invention.
  • A flowchart showing an example of the operation of the noise detection device according to the second embodiment of the present invention.
  • A functional block diagram showing an example of the functional configuration of the noise suppression device according to the third embodiment of the present invention.
  • A diagram for explaining the operation of the replacement unit in the noise suppression device according to the third embodiment of the present invention.
  • A flowchart showing an example of the operation of the noise suppression device according to the third embodiment of the present invention.
  • A functional block diagram showing an example of the functional configuration of the noise suppression device according to the fourth embodiment of the present invention.
  • A functional block diagram showing an example of the functional configuration of the noise suppression device according to the fifth embodiment of the present invention.
  • A flowchart showing an example of the operation of the noise suppression device according to the fifth embodiment of the present invention.
  • A diagram illustrating by example the hardware configuration of a computer capable of implementing each embodiment of the present invention.
Embodiments of the present invention are described in detail below with reference to the drawings. The components described in the following embodiments are examples only and are not intended to limit the technical scope of the present invention to them.
First, sudden noise is described. Sudden noise is, for example, an impact sound. An impact sound is a sound produced when objects collide, an explosion, or a sound produced when an abrupt force acts momentarily on an object. In the embodiments of the present invention, impact sounds are not limited to these and may be, for example, hand clapping, a coin dropping, castanets being struck, disposable chopsticks being snapped, or glass, plastic, metal, ceramic, wood, pottery, or a can being struck or bumped.
The impact sound is described with reference to FIG. 1. FIG. 1 is a diagram showing an example of a spectrogram of an impact sound. In FIG. 1, the horizontal axis indicates time (seconds) and the vertical axis indicates frequency (kHz). As FIG. 1 shows, an impact sound includes a section in which the signal power is large and present over a wide band, followed by a section in which the signal power is small and present in a narrow band. In this specification, the former section is called the striking part or striking-part section, and the latter is called the attenuation part or attenuation-part section.
An impact sound thus includes a striking-part section and an attenuation-part section. As described above, in the striking-part section the signal power is large and present over a wide band. Detecting the striking-part section and suppressing the noise in that section therefore improves the recognition rate of the acoustic signal more than detecting the whole impact sound and suppressing noise over its entire section, because the noise with the larger power is suppressed while the section subjected to noise suppression processing is kept shorter.
The following therefore describes a method for detecting the striking-part section of an impact sound.
<First Embodiment>
A first embodiment of the present invention is described with reference to the drawings. This embodiment describes the basic configuration that solves the problem addressed by the present invention. FIG. 2 is a functional block diagram showing an example of the functional configuration of the noise detection device 10 according to this embodiment. As shown in FIG. 2, the noise detection device 10 includes a calculation unit 11, a first detection unit 12, and a second detection unit 13.
The calculation unit 11 calculates, from an acoustic signal including an impact sound, a feature amount representing the steepness of change of the acoustic signal for each frame obtained by dividing the acoustic signal into frames of a predetermined time length. The calculation unit 11 outputs the calculated feature amounts to the first detection unit 12 and the second detection unit 13.
The first detection unit 12 receives the per-frame feature amounts from the calculation unit 11. Based on the received feature amounts, the first detection unit 12 detects a frame in which the signal changes more steeply than a speech signal as the start time of the impact sound section, that is, the section of the acoustic signal in which the impact sound exists. The first detection unit 12 outputs the detected start time of the impact sound section to the second detection unit 13.
The second detection unit 13 receives the per-frame feature amounts from the calculation unit 11 and the start time of the impact sound section from the first detection unit 12. Based on the received feature amounts, the second detection unit 13 detects, as the end time of the impact sound section, the last of the frames in which the signal continues, from the start time onward, to change more steeply than a speech signal.
As described above, the first detection unit 12 of the noise detection device 10 according to this embodiment detects the start time of the impact sound section, and the second detection unit 13 detects its end time. The noise detection device 10 can thereby detect the impact sound section, that is, the section of the acoustic signal in which the impact sound exists.
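The start and end detection described above can be sketched as follows. This is a minimal illustration, not the device's actual implementation: the function name, the convention that a larger score means a steeper change, and the single `speech_level` reference value are all assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple


def detect_impact_interval(
    steepness: Sequence[float],
    speech_level: float,
) -> Optional[Tuple[int, int]]:
    """Return (start_frame, end_frame) of the first impact sound section, or None.

    steepness[n] is a per-frame score; a larger value is assumed to mean a
    steeper signal change. speech_level is an assumed reference score for
    speech-only frames.
    """
    start = None
    # Role of the first detection unit: first frame steeper than speech.
    for n, value in enumerate(steepness):
        if value > speech_level:
            start = n
            break
    if start is None:
        return None

    end = start
    # Role of the second detection unit: extend the section while the
    # frames stay steeper than speech, and stop at the first frame that is not.
    for n in range(start + 1, len(steepness)):
        if steepness[n] > speech_level:
            end = n
        else:
            break
    return start, end
```

Only the run of frames that continues unbroken from the start frame is kept, which mirrors the "continuously from the start time" condition in the text.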
Here, the impact sound section is the striking-part section, in which the signal power is large and present over a wide band; in that section the signal changes more steeply than in an acoustic signal, or a section, that contains only speech. The noise detection device 10 according to this embodiment can therefore more suitably detect the striking-part section of an impact sound in the acoustic signal.
<Second Embodiment>
Next, a second embodiment of the present invention, which builds on the first embodiment described above, is described with reference to the drawings. FIG. 3 is a functional block diagram showing an example of the functional configuration of the noise detection device 100 according to this embodiment. As shown in FIG. 3, the noise detection device 100 includes a calculation unit 110, a first detection unit 120, and a second detection unit 130.
(Calculation unit 110)
As shown in FIG. 3, the calculation unit 110 includes a conversion unit 111 and an index calculation unit (linearity calculation unit) 112. The conversion unit 111 includes a frame division unit 1111, a windowing processing unit 1112, and a Fourier transform unit 1113. The index calculation unit 112 includes a change amount calculation unit 1121, a difference calculation unit 1122, and a feature amount calculation unit 1123.
(Conversion unit 111)
The frame division unit 1111 of the conversion unit 111 receives an acoustic signal (also called an input signal), for example from outside the noise detection device 100. The frame division unit 1111 divides the received acoustic signal into frames each containing K samples, where K is a positive even number. The frame division unit 1111 outputs the signal samples, that is, the acoustic signal divided into frames, to the windowing processing unit 1112.
The windowing processing unit 1112 receives the signal samples from the frame division unit 1111 and multiplies them by a window function $w(t)$, where $t$ denotes the time sample index. Multiplying by the window function is hereinafter also called windowing or performing windowing processing. Let the signal samples of the $n$-th frame ($n$ is an integer of 0 or more indicating the frame number) be $x_n(t)$ ($t = 0, 1, \ldots, K-1$). The signal samples obtained by windowing $x_n(t)$ with $w(t)$ (also called the windowed signal) can be calculated by the following equation (1).
$$\bar{x}_n(t) = w(t)\,x_n(t) \tag{1}$$
The windowing processing unit 1112 may also window two consecutive frames so that they partly overlap.
As the window function $w(t)$, for example, the Hanning window shown in the following equation (2) can be used.
$$w(t) = 0.5 - 0.5\cos\!\left(\frac{2\pi t}{K-1}\right), \quad 0 \le t \le K-1 \tag{2}$$
The windowing processing unit 1112 may also use various other window functions, such as a Hamming window or a triangular window. The windowing processing unit 1112 outputs the windowed signal to the Fourier transform unit 1113.
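As a rough illustration of frame division and windowing, the sketch below splits a signal into frames of K samples and multiplies each frame by a Hann window. The function names and the `hop` parameter (the frame shift, which controls overlap) are assumptions introduced for the example, and the window uses one common symmetric definition that may differ in detail from the one the device uses.

```python
import math
from typing import List, Sequence


def split_into_frames(signal: Sequence[float], K: int, hop: int) -> List[List[float]]:
    """Split signal into frames of K samples; hop < K gives overlapping frames."""
    if K <= 0 or K % 2 != 0:
        raise ValueError("K must be a positive even number")
    if hop <= 0:
        raise ValueError("hop must be positive")
    # Trailing samples that do not fill a whole frame are dropped.
    return [list(signal[s:s + K]) for s in range(0, len(signal) - K + 1, hop)]


def hann_window(K: int) -> List[float]:
    """Symmetric Hann window of length K (one common definition)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * t / (K - 1)) for t in range(K)]


def apply_window(frame: Sequence[float], window: Sequence[float]) -> List[float]:
    """Multiply each sample x_n(t) by w(t), as in equation (1)."""
    return [x * w for x, w in zip(frame, window)]
```

With `hop = K // 2`, consecutive frames overlap by half, which is one common way to realise the overlapping mentioned above.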
The Fourier transform unit 1113 receives the windowed signal from the windowing processing unit 1112 and applies a Fourier transform to it. The signal spectrum $X_n(k)$ ($k$ is the frequency index, $k = 1, 2, \ldots, N$), which is the Fourier-transformed windowed signal, is expressed by the following equation (3).
$$X_n(k) = |X_n(k)|\,e^{j\,p_n(k)} \tag{3}$$
In the above equation, $j$ is the imaginary unit, $|X_n(k)|$ is the amplitude spectrum, and $p_n(k)$ is the phase spectrum.
The Fourier transform unit 1113 separates the signal spectrum $X_n(k)$ into the phase spectrum $p_n(k)$ and the amplitude spectrum $|X_n(k)|$, and outputs the resulting phase spectrum $p_n(k)$ to the index calculation unit 112 for each frame. In the following, the phase spectrum that the Fourier transform unit 1113 outputs per frame is also called the phase component signal, and the amplitude spectrum that it outputs per frame is also called the amplitude component signal. By applying a Fourier transform to the windowed signal in this way, the Fourier transform unit 1113 can extract the phase component signal in the frequency domain from the acoustic signal.
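The separation into amplitude and phase can be sketched with the standard library alone. The naive transform below is written for clarity rather than speed, the function names are assumptions made for the example, and the frequency index is 0-based here whereas the text counts from 1.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def dft(frame: Sequence[float]) -> List[complex]:
    """Naive O(K^2) discrete Fourier transform of one windowed frame."""
    K = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / K) for t in range(K))
        for k in range(K)
    ]


def split_spectrum(spectrum: Sequence[complex]) -> Tuple[List[float], List[float]]:
    """Separate X_n(k) into amplitudes |X_n(k)| and phases p_n(k) in radians."""
    amplitudes = [abs(X) for X in spectrum]
    phases = [cmath.phase(X) for X in spectrum]  # wrapped to [-pi, pi]
    return amplitudes, phases
```

A frame holding a single delayed impulse gives a flat amplitude spectrum and a phase that falls linearly with frequency (up to wrapping), which is the kind of linear phase that the index calculation unit looks for.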
Although this embodiment describes the Fourier transform unit 1113 as applying a Fourier transform to the windowed signal, the embodiment is not limited to this. Instead of a Fourier transform, the Fourier transform unit 1113 may apply, for example, a Hadamard transform, a Haar transform, or a wavelet transform to the windowed signal.
(Index calculation unit 112)
Next, the index calculation unit 112 is described. The change amount calculation unit 1121 of the index calculation unit 112 receives the phase component signal for each frame from the Fourier transform unit 1113 of the conversion unit 111. The change amount calculation unit 1121, the difference calculation unit 1122, and the feature amount calculation unit 1123 described below all perform their processing frame by frame.
Using the received phase component signal, the change amount calculation unit 1121 calculates the change amount of the phase component $\Delta p_n(k)$, which is the phase difference between adjacent frequency indexes (adjacent frequency bands), using the following equation (4).
$$\Delta p_n(k) = p_n(k) - p_n(k-1) \tag{4}$$
The change amount calculation unit 1121 then outputs the calculated change amount of the phase component $\Delta p_n(k)$ to the difference calculation unit 1122.
The difference calculation unit 1122 receives the change amount of the phase component $\Delta p_n(k)$ from the change amount calculation unit 1121. Using it, the difference calculation unit 1122 calculates $\Delta\Delta p_n(k)$, the change between adjacent frequency indexes in the change amount of the phase component, using the following equation (5).
$$\Delta\Delta p_n(k) = \Delta p_n(k) - \Delta p_n(k-1) \tag{5}$$
The difference calculation unit 1122 can thereby obtain the variation of the change amount of the phase component $\Delta p_n(k)$ along the frequency axis. In the following, $\Delta\Delta p_n(k)$ is also called the change amount difference.
When there is no difference between the change amount $\Delta p_n(k)$ at a given frequency index and the change amount $\Delta p_n(k-1)$ at the adjacent frequency index, that is, when $p_n(k)$ is a linear function of frequency, the change amount difference $\Delta\Delta p_n(k)$ is 0.
The difference calculation unit 1122 outputs the calculated change amount difference $\Delta\Delta p_n(k)$ to the feature amount calculation unit 1123.
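The two differencing steps can be illustrated as below. This is a sketch under stated assumptions: the function names are hypothetical, the differences are taken backwards (each entry minus its lower-frequency neighbour), and the ends of the frequency axis simply yield shorter lists because the text does not say how the boundaries are handled.

```python
from typing import List, Sequence


def first_difference(values: Sequence[float]) -> List[float]:
    """Difference between adjacent entries: values[k] - values[k - 1]."""
    return [values[k] - values[k - 1] for k in range(1, len(values))]


def phase_change_difference(phases: Sequence[float]) -> List[float]:
    """Second difference of the phase spectrum along the frequency axis."""
    delta_p = first_difference(phases)   # change amount of the phase component
    return first_difference(delta_p)     # change amount difference


# A phase spectrum that is linear in the frequency index has a constant
# change amount, so every change amount difference is (numerically) zero.
linear_phase = [0.5 + 0.3 * k for k in range(6)]
linear_result = phase_change_difference(linear_phase)
```

For a non-linear phase such as `[0, 1, 4, 9]`, the change amounts are `[1, 3, 5]` and the change amount differences are `[2, 2]`.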
The feature amount calculation unit 1123 receives the change amount difference $\Delta\Delta p_n(k)$ from the difference calculation unit 1122. It then calculates the average of $\Delta\Delta p_n(k)$ over all frequency indexes in the frame for which the change amount difference was obtained (here, the $n$-th frame). This average is the phase feature amount of that frame. It can also be regarded as the degree of variation (an index of the variation) of the change amount of the phase component $\Delta p_n(k)$ within the frame.
The way the feature amount calculation unit 1123 calculates this average is described further. The feature amount calculation unit 1123 calculates the phase feature amount $PL_n$, which is the average of the change amount difference $\Delta\Delta p_n(k)$, using the following equation (6).
$$PL_n = 1 - \frac{1}{N}\sum_{k=1}^{N}\cos\bigl(\Delta\Delta p_n(k)\bigr) \tag{6}$$
As equation (6) shows, the feature amount calculation unit 1123 sums the cosines of the change amount differences $\Delta\Delta p_n(k)$, divides the sum by the number $N$ of frequency indexes to obtain their average, subtracts that average from 1, and takes the result as the phase feature amount $PL_n$. Because, as described above, $PL_n$ is an index of the variation of the change amount of the phase component $\Delta p_n(k)$ within the frame, it is also called the index $PL_n$.
The phase feature amount $PL_n$ takes a value from 0 to 2. Because it expresses how close the phase spectrum $p_n(k)$ is to a straight line, it can also be regarded as an index of the linearity of $p_n(k)$. The smaller the variation of $\Delta p_n(k)$ along the frequency axis, the closer $p_n(k)$ is to a linear function of frequency, that is, the higher its linearity. In that case the change amount differences $\Delta\Delta p_n(k)$ are close to 0, their cosines are close to 1, and $PL_n$ takes a value close to 0. Thus, the closer $PL_n$ is to 0, the higher the linearity of the phase spectrum $p_n(k)$.
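A compact sketch of the phase feature amount follows. The function name is an assumption, and the average is taken over the second differences that can actually be formed from the given phases rather than over exactly N terms, because the text does not specify the boundary handling. Phases wrapped to one period can be passed directly: wrapping shifts each second difference by a multiple of 2π, which the cosine ignores.

```python
import math
from typing import Sequence


def phase_feature(phases: Sequence[float]) -> float:
    """PL_n = 1 - mean(cos(ddp)), where ddp is the second difference of the phase."""
    ddp = [
        phases[k] - 2.0 * phases[k - 1] + phases[k - 2]
        for k in range(2, len(phases))
    ]
    if not ddp:
        raise ValueError("need at least three phase values")
    return 1.0 - sum(math.cos(d) for d in ddp) / len(ddp)
```

A perfectly linear phase gives a value of essentially 0, while second differences equal to π give the maximum value of 2, matching the range stated above.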
The feature amount calculation unit 1123 may also obtain a variance rather than an average as the index of the variation of $\Delta p_n(k)$ along the frequency axis. In that case too, a phase feature amount $PL_n$ closer to 0 indicates higher linearity of the phase spectrum $p_n(k)$.
Although the above describes the index calculation unit 112 obtaining the phase feature amount $PL_n$ by calculating the average or variance of the change amount difference $\Delta\Delta p_n(k)$, this embodiment is not limited to this. The index calculation unit 112 may instead obtain a regression line for the phase spectrum $p_n(k)$ and calculate the deviation from that line, using the deviation as the phase feature amount $PL_n$.
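The regression-line alternative might look like the sketch below. The text does not define how the deviation is measured, so the mean squared residual of an ordinary least-squares fit is used here purely as an assumption, as is the function name. The phases are also assumed to be already unwrapped, since a wrapped phase would show jumps that a straight-line fit would count as deviation.

```python
from typing import Sequence


def regression_deviation(phases: Sequence[float]) -> float:
    """Mean squared deviation of p_n(k) from its least-squares regression line."""
    n = len(phases)
    if n < 2:
        raise ValueError("need at least two phase values")
    ks = range(n)
    mean_k = sum(ks) / n
    mean_p = sum(phases) / n
    var_k = sum((k - mean_k) ** 2 for k in ks)
    slope = sum((k - mean_k) * (p - mean_p) for k, p in zip(ks, phases)) / var_k
    intercept = mean_p - slope * mean_k
    # Residual of each phase value from the fitted line, squared and averaged.
    return sum((p - (slope * k + intercept)) ** 2 for k, p in zip(ks, phases)) / n
```

As with the cosine-based index, a value near 0 means the phase spectrum is close to a straight line.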
 なお、本実施の形態では、特徴量算出部1123は、周波数軸に沿った位相成分の変化量Δp(k)のばらつきを表す指標として、上述した位相特徴量PLを算出するとして説明した。この指標は、位相特徴量そのものであってもよいし、位相特徴量を含む情報であってもよい。 In the present embodiment, the feature amount calculation unit 1123 has been described as calculating the above-described phase feature amount PL n as an index representing the variation of the phase component variation Δp n (k) along the frequency axis. . This index may be the phase feature amount itself or information including the phase feature amount.
 指標算出部112の特徴量算出部1123は、算出した位相特徴量PLを、第1の検出部120および第2の検出部130に出力する。この指標算出部112が送信する位相特徴量PLには、該位相特徴量PLの算出の対象となったフレームを示す情報が関連付けられている。フレームを示す情報とは、例えば、フレーム番号である。本実施の形態では、位相特徴量PLにフレーム番号が関連付けられているとして説明を行う。 The feature amount calculation unit 1123 of the index calculation unit 112 outputs the calculated phase feature amount PL n to the first detection unit 120 and the second detection unit 130. Information indicating the frame for which the phase feature amount PL n is calculated is associated with the phase feature amount PL n transmitted by the index calculation unit 112. The information indicating the frame is, for example, a frame number. In the present embodiment, description will be made assuming that a frame number is associated with the phase feature amount PL n .
(Storage unit 140)
The storage unit 140 stores a threshold Th_start (first threshold) and a threshold Th_end (second threshold). The storage unit 140 may be built into the noise detection apparatus 100 or may be realized by a storage device separate from the noise detection apparatus 100. The thresholds Th_start and Th_end may also be stored in different storage units; for example, they may be stored in storage units (not shown) inside the first detection unit 120 and the second detection unit 130, respectively.
The threshold Th_start is the value that the first detection unit 120, described later, uses to detect the first detection point. The first detection point indicates the point in time at which the power of the acoustic signal, within a short time of several milliseconds to several tens of milliseconds, rises sharply and begins to occupy a wide band.
The threshold Th_end is the value that the second detection unit 130, described later, uses to detect the second detection point. The second detection point indicates the point in time at which the power of the acoustic signal drops sharply and begins to occupy only a narrow band. The second detection point is later in time than the first detection point.
Suppose that the acoustic signal contains a section in which only speech is present and that the above-described phase feature amount PL_n has been calculated for that section. The phase feature amount PL_n for this speech-only section is hereinafter called PL_speech. PL_speech is not limited to the phase feature amount PL_n of a speech-only section; for example, it may be the average of the phase feature amounts PL_n calculated from each item of a large amount of learning data. The learning data may be, for example, data containing phase feature amounts PL_n calculated for frames in which no impact sound is present and phase feature amounts PL_n calculated for frames containing a speech section, or it may be other data.
PL_speech may also be, for example, a phase feature amount PL_n calculated in advance for a background-noise section of the acoustic signal. Background noise here means, for example, the running noise of a vehicle, mechanical sounds such as the sound of an air conditioner, babble noise, or noise in which several of these overlap. PL_speech may also be a phase feature amount PL_n calculated from, for example, white noise or pink noise.
The value indicating the degree of change (steepness) of the acoustic signal is described next. In the present embodiment, this value is smaller the larger the degree of change (the greater the steepness).
When the degree of change of the acoustic signal is large, the linearity of its phase spectrum p_n(k) is generally high and the value of the phase feature amount PL_n is small. The phase feature amount therefore serves as a value indicating the degree of change (steepness) of the acoustic signal.
In general, the acoustic signal of the hitting portion of an impact sound changes more abruptly than an acoustic signal in which only speech is present or the signal of a section in which only speech is present. In addition, the signal changes more abruptly at the start time of the hitting portion than at its end time. The phase feature amount PL_n is therefore smaller at the end time of the hitting portion than for a speech signal, and smaller still at the start time of the hitting portion. Consequently, the following formula (7) holds for the threshold Th_start, the threshold Th_end, and PL_speech.
$$\mathrm{Th}_{\mathrm{start}} < \mathrm{Th}_{\mathrm{end}} < \mathrm{PL}_{\mathrm{speech}} \tag{7}$$
The threshold Th_start may be calculated from phase feature amounts PL_n obtained by using acoustic signals containing impact sounds as learning data. Alternatively, an arbitrary value close to 0, for example 0.1, may be used as the threshold Th_start.
The threshold Th_end is a value calculated from the previously calculated threshold Th_start and PL_speech so that formula (7) is satisfied. Making both Th_start and Th_end smaller than PL_speech prevents changes in the speech signal, in particular signal changes in consonants, from being falsely detected as impact sounds.
As described above, the storage unit 140 stores in advance the thresholds Th_start and Th_end that satisfy formula (7), calculated, for example, by the calculation unit 110.
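The text only requires that Th_end lie between Th_start and PL_speech; it does not say how the value is chosen. As a minimal sketch, one could interpolate between the two. The function name `choose_th_end` and the `ratio` parameter are assumptions for illustration and do not appear in the source.

```python
def choose_th_end(th_start, pl_speech, ratio=0.5):
    """Pick Th_end strictly between Th_start and PL_speech.

    `ratio` is a hypothetical tuning parameter: values near 0 place Th_end
    close to Th_start, values near 1 place it close to PL_speech.
    """
    if not th_start < pl_speech:
        raise ValueError("Th_start must be smaller than PL_speech")
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    return th_start + ratio * (pl_speech - th_start)
```

For instance, with Th_start = 0.1 (the example value mentioned above) and a hypothetical PL_speech of 0.9, the default ratio places Th_end at the midpoint of the two.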
(First detection unit 120)
The first detection unit 120 receives the phase feature amount PL_n from the calculation unit 110 and detects the first detection point from it. Specifically, the first detection unit 120 compares the value of the received phase feature amount PL_n with the threshold Th_start stored in the storage unit 140. When the phase feature amount PL_n is smaller than the threshold Th_start, that is, when PL_n < Th_start is satisfied, the first detection unit 120 determines that the frame indicated by the frame number associated with that PL_n is the start frame of the hitting section.
The first detection unit 120 then acquires, from the frame determined to be the start frame, time information indicating the time of that frame. The time information acquired by the first detection unit 120 may be the frame number, the start time of the frame, or any other time within the frame. The first detection unit 120 detects the frame number or time indicated by the acquired time information as the first detection point. In the following description, the first detection point is a frame number.
When the acoustic signal is a speech signal or background noise on which no impact sound is superimposed, that is, when it consists only of speech or background noise, the phase spectrum p_n(k) is not a straight line. The phase feature amount PL_n of such a signal is therefore larger than the phase feature amount PL_n obtained when an impact sound is superimposed on speech or background noise.
In this way, the first detection unit 120 can determine the start time of the hitting section of an impact sound by comparing the phase feature amount PL_n with the threshold Th_start.
The first detection unit 120 then outputs the time information representing the detected first detection point to the second detection unit 130 as the start time information of the hitting section. As described above, the first detection point in the present embodiment is a frame number, so the start time information of the hitting section represents a frame number. Because the time information representing the first detection point is the start time information of the hitting section, the first detection point is hereinafter also called the start time of the hitting section.
(Second detection unit 130)
The second detection unit 130 receives the phase feature amount PL_n from the calculation unit 110. It also receives the start time information of the hitting section from the first detection unit 120. The second detection unit 130 then detects the second detection point based on the received phase feature amount PL_n and the start time information of the hitting section.
Specifically, the second detection unit 130 compares the threshold Th_end stored in the storage unit 140 with the phase feature amount PL_n calculated for each frame that is later in time than the frame number represented by the start time information of the hitting section. The second detection unit 130 determines whether the value of the phase feature amount PL_n is larger than the threshold Th_end, that is, whether Th_end < PL_n is satisfied. When it is, the second detection unit 130 identifies the frame indicated by the frame number associated with that PL_n and determines that the frame immediately preceding the identified frame is the end frame of the hitting section.
The second detection unit 130 then acquires, from the frame determined to be the end frame, time information indicating the time of that frame. The time information acquired by the second detection unit 130 may be the frame number, the end time of the frame, or any other time within the frame. The second detection unit 130 detects the frame number or time indicated by the acquired time information as the second detection point. In the following description, the second detection point is a frame number.
Within the hitting section of the acoustic signal, the signal continues to change steeply and the phase feature amount PL_n stays low. Once the hitting section ends, the acoustic signal changes more gradually and the phase feature amount rises. The second detection unit 130 can therefore identify the frame under evaluation as the point immediately after the end of the hitting section, and can determine that the immediately preceding frame is the end of the hitting section.
The second detection unit 130 takes the time information representing the detected second detection point as the end time information of the hitting section. The second detection point is also called the end time of the hitting section.
The second detection unit 130 may use, as the end time, the earlier of the end time detected as described above and the time at which an arbitrary period (for example, one second) has elapsed from the start time. This is because, in a real environment, reverberation and similar effects can cause the end time detected by the second detection unit 130 to lag considerably behind the start time. In general, the hitting section of an impact sound is often one second or shorter. Therefore, if no end time has been detected by, for example, one second after the start time, the second detection unit 130 may take that time as the end time. This allows the noise detection apparatus 100 to reduce speech recognition errors caused by an overly long hitting section.
The second detection unit 130 then determines the hitting section from the start time of the hitting section, indicated by the first detection point received from the first detection unit 120, and the end time of the hitting section, indicated by the second detection point.
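The optional cap on the section length described above can be sketched in terms of frame numbers. The function name and the `max_frames` parameter are hypothetical; converting "one second" into a frame count depends on the frame shift, which this passage does not specify.

```python
def cap_end_frame(start_frame, detected_end_frame, max_frames):
    """Return the earlier of the detected end frame and start + max_frames.

    `detected_end_frame` is None when no end has been detected yet.
    `max_frames` is the assumed frame count equivalent to the time limit
    (for example, 100 frames if a 10 ms frame shift were used for 1 s).
    """
    limit = start_frame + max_frames
    if detected_end_frame is None:
        return limit
    return min(detected_end_frame, limit)
```

With a start frame of 20 and a limit of 100 frames, a detected end at frame 35 is kept as is, while a reverberation-delayed end at frame 400, or no detected end at all, is replaced by frame 120.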
(Operation of the noise detection apparatus 100)
Next, the operation of the noise detection apparatus 100 according to the present embodiment is described. FIG. 4 is a flowchart showing an example of the operation of the noise detection apparatus 100 according to the present embodiment.
The frame division unit 1111 of the conversion unit 111 of the calculation unit 110 divides the acoustic signal into frames of a predetermined time length (step S41). The noise detection apparatus 100 then sets flag to 0 and n to 0 as initial values (step S42). The variable flag takes the value 0 or 1. The variable n indicates the frame number; its upper limit is the number of frames produced in step S41 (denoted DIV) minus 1.
Next, the windowing processing unit 1112 of the conversion unit 111 applies windowing to the signal samples contained in the divided frame (step S43). The Fourier transform unit 1113 of the conversion unit 111 then applies a Fourier transform to the windowed signal samples of each frame to calculate the phase spectrum p_n(k) (step S44).
The change amount calculation unit 1121 of the index calculation unit 112 of the calculation unit 110 then calculates the phase-component change amount Δp_n(k) (step S45). The difference calculation unit 1122 of the index calculation unit 112 calculates the change amount difference ΔΔp_n(k), which is the change in the change amount of the phase component (step S46). After that, the feature amount calculation unit 1123 of the index calculation unit 112 calculates the phase feature amount PL_n, an index of the linearity of the phase spectrum p_n(k) (step S47).
The first detection unit 120 then determines whether flag is 0 (step S48). If flag is not 0 (NO in step S48), the process proceeds to step S54. If flag is 0 (YES in step S48), the process proceeds to step S49. The flag indicates whether the start time of a hitting section has been detected: 0 means it has not been detected, and 1 means it has.
If YES in step S48, that is, if the start time of a hitting section has not been detected, the first detection unit 120 determines whether the phase feature amount PL_n calculated in step S47 is smaller than the threshold Th_start (step S49). When the phase feature amount PL_n is greater than or equal to the threshold Th_start (NO in step S49), the noise detection apparatus 100 increments n (step S52) and determines whether the incremented n is smaller than DIV (step S53). If n is greater than or equal to DIV (NO in step S53), the noise detection apparatus 100 ends the process. If n is smaller than DIV (YES in step S53), the noise detection apparatus 100 returns the process to step S43 and executes steps S43 to S48 for the next frame.
When the phase feature amount PL_n is smaller than the threshold Th_start (YES in step S49), the first detection unit 120 detects the frame indicated by the frame number associated with that PL_n as the start frame (start time) of the hitting section (step S50). The noise detection apparatus 100 then sets flag to 1 (step S51) and advances the process to step S52. It increments n (step S52) and, if the incremented n is smaller than DIV (YES in step S53), executes steps S43 to S48 for the next frame.
If NO in step S48, that is, if the start time of a hitting section has been detected, the second detection unit 130 determines whether the phase feature amount PL_n calculated in step S47 is larger than the threshold Th_end (step S54). When the phase feature amount PL_n is less than or equal to the threshold Th_end (NO in step S54), the noise detection apparatus 100 advances the process to step S52.
When the phase feature amount PL_n is larger than the threshold Th_end (YES in step S54), the second detection unit 130 detects the frame immediately preceding the frame indicated by the frame number associated with that PL_n as the end frame (end time) of the hitting section (step S55).
The second detection unit 130 then determines the hitting section from the start time of the hitting section detected by the first detection unit 120 in step S50 and the end time of the hitting section detected in step S55 (step S56). The noise detection apparatus 100 then assigns 0 to flag (step S57) and advances the process to step S52.
The noise detection apparatus 100 may also receive the acoustic signal sequentially and perform the noise detection process in real time. In that case, the noise detection apparatus 100 may end the process when no unprocessed acoustic signal remains.
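The decision part of the flowchart (steps S48 to S57) can be sketched as a small loop over per-frame phase feature amounts. This is an illustrative sketch only: it assumes the values PL_n have already been computed for every frame (steps S41 to S47), and the function name is hypothetical. The `start` variable plays the role of flag, with None corresponding to flag = 0.

```python
def detect_hitting_sections(pl_values, th_start, th_end):
    """Return (start_frame, end_frame) pairs found by the two-threshold rule.

    pl_values[n] is the phase feature amount of frame n (assumed precomputed).
    A section that has started but not ended when the input runs out is not
    reported, mirroring the flowchart, which simply ends at n >= DIV.
    """
    sections = []
    start = None  # None corresponds to flag == 0 (no start detected yet)
    for n, pl in enumerate(pl_values):
        if start is None:
            if pl < th_start:                # step S49
                start = n                    # steps S50 and S51
        elif pl > th_end:                    # step S54
            sections.append((start, n - 1))  # steps S55 and S56
            start = None                     # step S57
    return sections
```

Note that the frame that starts a section is not itself tested against Th_end; the end test begins with the following frame, as in the flowchart.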
(Effects)
As described above, the noise detection apparatus 100 according to the present embodiment provides the same effects as the noise detection apparatus 10 according to the first embodiment described above.
In the noise detection apparatus 100 according to the present embodiment, the first detection unit 120 compares the feature amount calculated by the calculation unit 110 with the first threshold. When the steepness of the change in the acoustic signal represented by the feature amount is greater than the steepness represented by the first threshold, the first detection unit 120 detects the frame for which the feature amount was calculated as the start time of the impact sound section. The second detection unit 130 compares the feature amount calculated by the calculation unit 110 with the second threshold, which represents a smaller steepness than the first threshold. When the steepness of the change in the acoustic signal represented by the feature amount is less than or equal to the steepness represented by the second threshold, the second detection unit 130 detects the frame immediately preceding the frame for which the feature amount was calculated as the end time of the impact sound section.
The noise detection apparatus 100 can thereby detect more accurately the points in time at which a steep change in the acoustic signal starts and ends. The point at which the steep change starts corresponds to the start time of the hitting section shown in FIG. 1, and the point at which it ends corresponds to the end time of the hitting section. The noise detection apparatus 100 can therefore detect more accurately the start time and end time of the hitting section of an impact sound shown in FIG. 1.
Because the calculation unit 110 calculates an index representing the linearity of the phase spectrum as the phase feature amount, the noise detection apparatus 100 can detect the start time and the end time of the hitting section, and can thus determine the hitting section more suitably.
Suppressing the hitting section determined in this way further improves recognition performance when speech recognition is performed. Consider, for example, recognizing speech while a clerk is serving a customer at a store counter. Here the clerk is the target speaker and the clerk's voice is the target speech. At the counter, the clerk may speak while showing the customer a catalog, or while operating a keyboard or mouse to enter customer information. In this case, object sounds and work sounds produced by the target speaker are superimposed on the target speech, so the accuracy of speech recognition on the collected acoustic signal may decrease.
By applying the noise detection apparatus 100 according to the present embodiment, however, the sections of impact sounds such as work sounds, and in particular their hitting sections, can be determined, so a noise suppression apparatus can apply noise suppression to the determined sections. Speech with the noise suppressed can then be extracted, which improves recognition accuracy for that speech. Because work sounds produced by the target speaker are noise that can occur wherever speech is collected, the determination of hitting sections by the noise detection apparatus 100 is well suited to the field of noise detection.
Noise suppression is also applicable when a user listens to the collected audio, for example by reducing the sound of a microphone being touched or loud impact sounds.
In addition, because impact sounds include, for example, the sound of a door opening or closing and applause, the noise detection apparatus 100 according to the present embodiment can also detect events such as a door opening or closing and applause.
The noise detection apparatus 100 according to the present embodiment described above can be applied, for example, to detecting sections in which noise such as object sounds produced by the speaker should be suppressed when the target speaker's voice is to be captured. It can also be used to suppress the superimposed noise in a signal consisting of a target signal, such as speech or music, and a noise signal superimposed on it. More generally, the noise detection apparatus 100 is applicable to any signal processing apparatus that needs to determine whether an input signal contains a rapidly changing section.
(Modification)
In the second embodiment, the feature amount calculation unit 1123 was described as obtaining the phase feature amount PL_n by calculating the average value of the change amount difference ΔΔp_n(k) as the feature amount. This modification describes the case where the feature amount calculation unit 1123 obtains the phase feature amount PL_n by calculating the distribution of the change amount difference ΔΔp_n(k) as the feature amount.
First, the feature amount calculation unit 1123 uses the change amount difference ΔΔp_n(k) received from the difference calculation unit 1122 to obtain a histogram for the frame in which ΔΔp_n(k) was calculated. The feature amount calculation unit 1123 builds this histogram using the values of the change amount difference ΔΔp_n(k) as bins.
When the distribution is concentrated at small values of the change amount difference ΔΔp_n(k), the feature amount calculation unit 1123 can determine that the linearity of the phase spectrum is high. The feature amount calculation unit 1123 may calculate the index PL_n based on this histogram.
The feature amount calculation unit 1123 may also define arbitrary ranges of frequency indices, for example k-100 to k-1 and k to k+99, and obtain the distribution of the change amount difference ΔΔp_n(k) using the frequency index values as bins. The feature amount calculation unit 1123 may then calculate an inter-distribution distance based on these distributions and use it to calculate the index PL_n.
In this way, the feature amount calculation unit 1123 can calculate a feature amount based on the distribution of the change amount difference ΔΔp_n(k) rather than on its average value. The noise detection apparatus 100 can suitably determine the hitting section using a feature amount calculated in this way as well.
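The modification does not specify how an index is derived from the histogram. As one hedged illustration, the sketch below bins the magnitudes of ΔΔp_n(k) and returns the fraction of values that fall outside the bin nearest zero, so that a distribution concentrated at small values yields an index near 0. The function name, the `bin_width` parameter, and this particular index are assumptions, not the method of the source.

```python
from collections import Counter


def histogram_index(dd_phase, bin_width=0.1):
    """Fraction of |ddp(k)| values falling outside the lowest histogram bin.

    dd_phase holds the change amount differences of one frame.
    Returns 0.0 when every value lies in [0, bin_width), i.e. when the
    distribution is fully concentrated at small values.
    """
    if not dd_phase:
        raise ValueError("dd_phase must not be empty")
    # Bin index of each magnitude; bin 0 covers [0, bin_width).
    bins = Counter(int(abs(v) // bin_width) for v in dd_phase)
    return 1.0 - bins[0] / len(dd_phase)
```

An index built this way decreases as linearity increases, which keeps it compatible with the threshold comparisons PL_n < Th_start and Th_end < PL_n used in the second embodiment.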
<Third Embodiment>
Next, a third embodiment of the present invention is described with reference to the drawings. FIG. 5 is a functional block diagram showing an example of the functional configuration of the noise suppression apparatus 200 according to the present embodiment. For convenience of explanation, members having the same functions as members shown in the drawings of the second embodiment described above are given the same reference numerals, and their description is omitted.
As shown in FIG. 5, the noise suppression apparatus 200 includes a replacement unit 210 and either the noise detection apparatus 10 described in the first embodiment or the noise detection apparatus 100 described in the second embodiment.
 雑音検出装置10および雑音検出装置100の機能構成は、図2および図3を用いて説明した機能構成と同様であるため、説明を省略する。以下では、雑音抑圧装置200が雑音検出装置100を備えるとして説明するが、雑音抑圧装置200が雑音検出装置10を備える構成であってもよいことは言うまでもない。 The functional configurations of the noise detection device 10 and the noise detection device 100 are the same as the functional configuration described with reference to FIG. 2 and FIG. In the following description, it is assumed that the noise suppression device 200 includes the noise detection device 100, but it goes without saying that the noise suppression device 200 may be configured to include the noise detection device 10.
The noise detection device 100 outputs information indicating the hitting section to the replacement unit 210. Specifically, the noise detection device 100 outputs, as the information indicating the hitting section, information indicating the start time calculated by the first detection unit 120 and the end time calculated by the second detection unit 130, together with information indicating the acoustic signal for which the hitting section was determined.
The replacement unit 210 receives the information indicating the hitting section from the noise detection device 100 together with the information indicating the acoustic signal. The replacement unit 210 then receives the acoustic signal identified by that information, for example from outside the noise suppression device 200. The replacement unit 210 associates the time information of the received acoustic signal with the time information represented by the information indicating the hitting section, and identifies the start time and the end time of the hitting section in the received acoustic signal.
The replacement unit 210 then replaces the signal of each frame included in the hitting section with the signal of the frame immediately preceding the frame indicated by the identified start time.
The operation of the replacement unit 210 is described further with reference to FIG. 6, which is a diagram for explaining that operation. In FIG. 6, the horizontal axis indicates the frame number and the vertical axis indicates frequency (kHz). The upper part of FIG. 6 shows the acoustic signal before replacement, and the lower part shows the acoustic signal after replacement.
As shown in the upper part of FIG. 6, suppose for example that the hitting section determined by the noise detection device 100 is the section from the n-th frame to the (n+1)-th frame. The start time of the hitting section is then the n-th frame, and the end time is the (n+1)-th frame.
The replacement unit 210 then replaces the signal samples x_n(t) and x_{n+1}(t) of the n-th and (n+1)-th frames with the signal samples x_{n-1}(t) of the (n-1)-th frame, which is the frame immediately before the start time. As a result, as shown in the lower part of FIG. 6, the signals of the n-th and (n+1)-th frames are replaced with a signal that is the same as that of the (n-1)-th frame.
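The replacement illustrated in FIG. 6 can be sketched as follows. This is a minimal sketch rather than the actual implementation of the replacement unit 210: it assumes the acoustic signal has already been split into frames held as lists of samples, and the function name `replace_with_previous_frame` and its parameters are hypothetical.

```python
def replace_with_previous_frame(frames, start, end):
    """Copy frame start-1 over frames start..end (inclusive); return a new list.

    frames: list of frames, each a list of signal samples (assumed layout).
    start, end: frame indices of the hitting section (hypothetical parameters).
    """
    if start < 1:
        raise ValueError("no frame precedes the hitting section")
    if end < start or end >= len(frames):
        raise ValueError("invalid hitting section")
    donor = frames[start - 1]          # frame n-1 in the notation of FIG. 6
    out = [list(f) for f in frames]    # leave the input untouched
    for i in range(start, end + 1):
        out[i] = list(donor)
    return out


# Frames n and n+1 (indices 2 and 3 here) carry the impact sound.
frames = [[0.1, 0.2], [0.1, 0.3], [9.0, -9.0], [8.0, -8.0], [0.2, 0.1]]
cleaned = replace_with_previous_frame(frames, start=2, end=3)
```

After the call, indices 2 and 3 of `cleaned` both hold a copy of the frame at index 1, while the frames outside the hitting section are unchanged.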
In the above description, the replacement unit 210 replaces the signal of the hitting section with the signal of the frame immediately before the start time of the hitting section, but the present embodiment is not limited to this. The replacement unit 210 may instead replace the feature amount of the hitting section with the feature amount of the frame immediately before the start time of the hitting section. This feature amount may be, for example, one based on mel-frequency cepstral coefficients or a mel log spectrum, which are commonly used in speech recognition, or it may be some other feature amount. In this way, the replacement unit 210 uses information related to a frame different from the frames of the hitting section (for example, the signal or feature amount of a frame outside the hitting section) to replace the information related to the frames of the hitting section.
The signal with which the replacement unit 210 replaces the signal of the hitting section may be the signal of the frame immediately before the start time of the hitting section or the signal of the frame immediately after its end time. Both signals may also be used. For example, the replacement unit 210 may calculate the center time of the hitting section, replace the signals of the frames before the center time with the signal of the frame immediately before the start time, and replace the signals of the frames after the center time with the signal of the frame immediately after the end time. The time calculated by the replacement unit 210 need not be the center time and may be any time.
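The two-sided replacement described above might look like the following sketch. The function name `replace_around_pivot`, the frame layout (lists of samples), and the handling of a frame that lies exactly on the pivot are assumptions made for illustration; the source does not specify them.

```python
def replace_around_pivot(frames, start, end, pivot=None):
    """Replace frames before the pivot with frame start-1, the rest with frame end+1.

    pivot is a frame-index position; by default it is the center of the section.
    """
    if start < 1 or end < start or end + 1 >= len(frames):
        raise ValueError("hitting section needs a neighbouring frame on both sides")
    if pivot is None:
        pivot = (start + end) / 2      # center of the section in frame-index units
    before, after = frames[start - 1], frames[end + 1]
    out = [list(f) for f in frames]
    for i in range(start, end + 1):
        # Assumption: a frame lying exactly on the pivot takes the later donor.
        out[i] = list(before if i < pivot else after)
    return out


frames = [[1.0], [50.0], [60.0], [70.0], [2.0]]
centered = replace_around_pivot(frames, start=1, end=3)            # pivot = 2.0
shifted = replace_around_pivot(frames, start=1, end=3, pivot=3)    # arbitrary pivot
```

Passing `pivot` explicitly corresponds to the remark that the split point need not be the center time.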
The replacement unit 210 may also calculate a signal for the hitting section from the signal of the frame immediately before the start time of the hitting section and the signal of the frame immediately after its end time, and interpolate the hitting section with the calculated signal. For example, the replacement unit 210 may multiply each of these two signals by an arbitrary weight, add the results to obtain the signal for the hitting section, and interpolate the hitting section with that signal.
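The weighted combination can be sketched as below. This is only an illustration under stated assumptions: frames are lists of samples of equal length, the same combined frame is written to every frame of the hitting section, and the name `interpolate_section` and the default weights of 0.5 are hypothetical (the source only says the weights are arbitrary).

```python
def interpolate_section(frames, start, end, w_prev=0.5, w_next=0.5):
    """Fill frames start..end with w_prev * frame[start-1] + w_next * frame[end+1]."""
    if start < 1 or end < start or end + 1 >= len(frames):
        raise ValueError("hitting section needs a neighbouring frame on both sides")
    prev_frame, next_frame = frames[start - 1], frames[end + 1]
    # Sample-by-sample weighted sum of the two neighbouring frames.
    filler = [w_prev * a + w_next * b for a, b in zip(prev_frame, next_frame)]
    out = [list(f) for f in frames]
    for i in range(start, end + 1):
        out[i] = list(filler)
    return out


frames = [[0.0, 2.0], [9.0, 9.0], [4.0, 6.0]]
blended = interpolate_section(frames, start=1, end=1)
only_prev = interpolate_section(frames, start=1, end=1, w_prev=1.0, w_next=0.0)
```

With weights 1.0 and 0.0 the sketch reduces to the plain replacement by the preceding frame described earlier.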
The replacement unit 210 may also replace the signal of the hitting section with a zero signal or with noise such as white noise.
The replacement unit 210 may also delete the signal of the hitting section and generate a signal in which the frame immediately before the start time of the hitting section is joined to the frame immediately after its end time.
The replacement unit 210 may also detect a predetermined number of frames, starting from the frame immediately after the end time of the hitting section, as the attenuation section of the impact sound and perform further noise suppression processing.
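The zero-signal, white-noise, and deletion variants can be sketched as follows. The function names, the `mode` strings, and the use of uniformly distributed random samples with a small `amplitude` as the white noise are all assumptions for illustration; the source does not say how the noise is generated.

```python
import random


def fill_section(frames, start, end, mode="zero", amplitude=0.01, seed=None):
    """Overwrite frames start..end with zeros or with low-level white noise."""
    if start < 0 or end < start or end >= len(frames):
        raise ValueError("invalid hitting section")
    rng = random.Random(seed)
    out = [list(f) for f in frames]
    for i in range(start, end + 1):
        if mode == "zero":
            out[i] = [0.0] * len(frames[i])
        elif mode == "white":
            # Independent uniform samples: one simple form of white noise.
            out[i] = [rng.uniform(-amplitude, amplitude) for _ in frames[i]]
        else:
            raise ValueError("mode must be 'zero' or 'white'")
    return out


def delete_section(frames, start, end):
    """Drop frames start..end so the neighbouring frames become adjacent."""
    if start < 0 or end < start or end >= len(frames):
        raise ValueError("invalid hitting section")
    return [list(f) for f in frames[:start] + frames[end + 1:]]


frames = [[0.1, 0.2], [9.0, -9.0], [0.3, 0.4]]
zeroed = fill_section(frames, 1, 1, mode="zero")
noisy = fill_section(frames, 1, 1, mode="white", seed=0)
joined = delete_section(frames, 1, 1)
```

Note that deletion shortens the signal, whereas the other two variants keep its length.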
(Operation of noise suppression device 200)
Next, the operation of the noise suppression device 200 according to the present embodiment is described with reference to FIG. 7. FIG. 7 is a flowchart showing an example of the operation of the noise suppression device 200 according to the present embodiment.
First, the noise detection device 100 of the noise suppression device 200 performs hitting section determination processing to determine the hitting section (step S71). Step S71 represents the processing of steps S41 to S56 described with reference to FIG. 4.
Next, the replacement unit 210 identifies the frame immediately before the start time of the hitting section determined in step S71 (step S72). The replacement unit 210 then replaces the signals of the frames of the acoustic signal that correspond to the hitting section with the signal of the identified frame (step S73). The replacement unit 210 can thereby suppress noise in the hitting section of the acoustic signal. The noise suppression device 200 then ends the processing.
As described above, the noise suppression device 200 according to the present embodiment can suppress noise in the hitting section by replacing the signal of the hitting section with the signal of a frame other than the frames of the hitting section.
The length of the hitting section when the replacement unit 210 performs replacement is preferably about 0.05 seconds. This is because, when speech recognition is performed on an acoustic signal that has undergone noise suppression processing, a shorter replaced section yields a higher speech recognition rate.
Accordingly, in addition to the effects of the second embodiment described above, the noise suppression device 200 achieves the effect of suppressing noise further.
In the present embodiment, the replacement unit 210 included in the noise suppression device 200 has been described as a component separate from the noise detection device 100, but the present embodiment is not limited to this. The replacement unit 210 may be built into the noise detection device 100. In that case, the noise detection device 100 includes the calculation unit 110, the first detection unit 120, the second detection unit 130, the storage unit 140, and the replacement unit 210. Such a noise detection device 100 can obtain the same effects as the noise suppression device 200 according to the present embodiment.
<Fourth Embodiment>
Next, a fourth embodiment of the present invention is described with reference to the drawings. FIG. 8 is a functional block diagram showing an example of the functional configuration of the noise suppression device 300 according to the present embodiment. For convenience of explanation, members having the same functions as members included in the drawings described in the second and third embodiments are given the same reference numerals, and their description is omitted.
As shown in FIG. 8, the noise suppression device 300 includes a replacement unit 210, a waveform conversion unit 310, and either the noise detection device 10 described in the first embodiment or the noise detection device 100 described in the second embodiment.
The functional configurations of the noise detection device 10 and the noise detection device 100 are the same as those described with reference to FIG. 2 and FIG. 3, so their description is omitted. The following description assumes that the noise suppression device 300 includes the noise detection device 100, but the noise suppression device 300 may of course include the noise detection device 10 instead.
The waveform conversion unit 310 receives from the replacement unit 210 the signal on which the replacement unit 210 has performed suppression processing. Specifically, the waveform conversion unit 310 receives the signal obtained after the replacement unit 210 has replaced the signals of the frames corresponding to the hitting section of the acoustic signal with the signal of the identified frame. The identified frame is, for example, the frame immediately before the start time of the hitting section in the acoustic signal.
The waveform conversion unit 310 then converts the received signal into a form that the user can use. Specifically, the waveform conversion unit 310 converts the received signal into a waveform that the user can see or hear.
For example, when the received signal is a frequency-domain spectrum signal obtained by applying a Fourier transform, the waveform conversion unit 310 converts the received signal into a waveform by applying an inverse Fourier transform to it.
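As an illustration of the inverse Fourier transform step, the sketch below converts the spectrum of a single frame back into time-domain samples. It uses a direct, unoptimized discrete Fourier transform written with the standard library; the function names `dft` and `inverse_dft` are hypothetical, and a practical implementation would normally use an FFT routine instead.

```python
import cmath


def dft(samples):
    """Discrete Fourier transform of one frame (direct O(n^2) evaluation)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def inverse_dft(spectrum):
    """Inverse transform back to a waveform.

    Taking the real part assumes the spectrum came from a real-valued signal.
    """
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


frame = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
restored = inverse_dft(dft(frame))   # matches frame up to rounding error
```

The round trip recovers the original samples up to floating-point rounding, which is the property the waveform conversion relies on.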
This allows the waveform conversion unit 310 of the noise suppression device 300 to display the waveform on a display device (not shown). The noise suppression device 300 can thus present to the user an acoustic signal that is in a usable form and in which noise has been suppressed.
In the present embodiment, the waveform conversion unit 310 included in the noise suppression device 300 has been described as a component separate from the noise detection device 100, but the present embodiment is not limited to this. The waveform conversion unit 310 may be built into the noise detection device 100. Such a noise detection device 100 can obtain the same effects as the noise suppression device 300 according to the present embodiment.
<Fifth Embodiment>
Next, the noise suppression device 400 according to the present embodiment is described with reference to FIG. 9 and FIG. 10.
To improve the recognition rate of a speech signal, processing that reduces noise in the speech signal (noise suppression processing) must be performed appropriately. If the noise suppression processing is insufficient, noise remains superimposed on the speech signal and the recognition rate falls. If the noise suppression processing is excessive, even necessary speech is suppressed as noise, and the recognition rate likewise falls.
The present embodiment therefore aims to perform noise suppression processing on a speech signal more effectively.
FIG. 9 is a functional block diagram showing an example of the functional configuration of the noise suppression device 400 according to the present embodiment. As shown in FIG. 9, the noise suppression device 400 according to the present embodiment includes a detection unit 410 and a replacement unit 420.
The detection unit 410 detects the initial section of an impact sound from an acoustic signal containing the impact sound. This initial section is a section in which the power is greater than in the subsequent section that follows it and in which the power is present over a wide band. The initial section detected by the detection unit 410 is the hitting section described with reference to FIG. 1. The detection unit 410 is realized, for example, by the noise detection device 100 of the embodiments described above. In that case, as in the second embodiment, the noise detection device 100 may detect the hitting section using an index indicating the linearity of the phase spectrum (the phase feature amount PL_n). The detection unit 410 is not limited to one realized by the noise detection device of the embodiments described above. For example, it may calculate, as a feature amount, a sudden change in volume, a change in the magnitude of an amplitude feature, a power spectrum feature or its change over time, or the flatness of the spectrum, and detect the hitting section using the calculated feature amount. The detection unit 410 may also detect the hitting section using a combination of several of these feature amounts as the feature amount. The method by which the detection unit 410 detects the hitting section is thus not particularly limited.
The detection unit 410 outputs information indicating the detected hitting section to the replacement unit 420.
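One of the alternative feature amounts mentioned above, the flatness of the spectrum, is commonly defined as the ratio of the geometric mean to the arithmetic mean of the power spectrum. The sketch below uses that common definition as an assumption; the source does not give a formula, and the function name `spectral_flatness` and the small constant `eps` are hypothetical.

```python
import math


def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a frame's power spectrum.

    Values near 1 suggest power spread over a wide band; values near 0 suggest
    power concentrated in a few bins. eps avoids log(0) and division by zero.
    """
    n = len(power_spectrum)
    if n == 0:
        raise ValueError("empty power spectrum")
    log_mean = sum(math.log(p + eps) for p in power_spectrum) / n
    arithmetic_mean = sum(power_spectrum) / n + eps
    return math.exp(log_mean) / arithmetic_mean


broadband = spectral_flatness([1.0, 1.0, 1.0, 1.0])       # flat spectrum
tonal = spectral_flatness([100.0, 1e-6, 1e-6, 1e-6])      # one dominant bin
```

Because the hitting section is described as having power over a wide band, a frame with flatness close to 1 together with high power would be a plausible candidate, although how such a feature is thresholded or combined with others is left open by the source.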
The replacement unit 420 acquires section information indicating the hitting section from the detection unit 410. The replacement unit 420 then identifies, within the acoustic signal, the hitting section indicated by the received section information, and identifies a frame different from the frames included in that section as the frame whose information will be used for replacement. This frame may be, for example, the frame immediately before the start time of the hitting section, as with the replacement unit 210 in the third embodiment described above. It may also be, for example, the frame immediately after the end time of the hitting section.
The replacement unit 420 then uses first information related to the identified frame to replace second information related to the frames included in the hitting section with the first information. When the information related to a frame is, for example, the acoustic signal (signal samples) contained in the frame, the replacement unit 420 replaces the signal samples of the frames included in the hitting section with the signal samples of the identified frame.
As with the replacement unit 210 in the third embodiment described above, the replacement unit 420 may instead interpolate the frames included in the hitting section with information based on the first information related to the identified frame.
The replacement unit 420 may also replace the signal of the hitting section with a zero signal or with noise such as white noise.
The replacement unit 420 may also delete the signal of the hitting section and generate a signal in which the frame immediately before the start time of the hitting section is joined to the frame immediately after its end time.
(Operation of noise suppression device 400)
Next, the operation of the noise suppression device 400 according to the present embodiment is described with reference to FIG. 10. FIG. 10 is a flowchart showing an example of the operation of the noise suppression device 400 according to the present embodiment.
As shown in FIG. 10, the detection unit 410 first detects the hitting section (step S101). The processing of step S101 may be the same as that of step S71 in FIG. 7.
Next, the replacement unit 420 identifies the frame whose information will replace that of the frames in the section detected in step S101 (step S102). The replacement unit 420 then replaces the second information, which is related to the frames of the acoustic signal corresponding to the detected section, with the first information, which is related to the identified frame (step S103). The replacement unit 420 can thereby suppress noise in the hitting section of the impact sound within the acoustic signal. The noise suppression device 400 then ends the processing.
As described above, the noise suppression device 400 according to the present embodiment can replace the second information, related to the frames of the hitting section of the impact sound, with the first information, related to a frame other than the frames of the hitting section. The noise suppression device 400 can thereby suppress noise in the hitting section of the impact sound.
As a result, the noise suppression device 400 achieves the effect of performing noise suppression processing on a speech signal more effectively.
(Hardware configuration)
Each unit of the noise detection devices (10, 100) shown in FIGS. 2 and 3 and of the noise suppression devices (200, 300, 400) shown in FIGS. 5, 8, and 9 may be realized with the hardware resources illustrated in FIG. 11. The configuration shown in FIG. 11 includes a RAM (Random Access Memory) 91, a ROM (Read Only Memory) 92, a communication interface 93, a storage medium 94, and a CPU (Central Processing Unit) 95. The CPU 95 controls the overall operation of the noise detection devices (10, 100) and the noise suppression devices (200, 300, 400) by reading various software programs (computer programs) stored in the ROM 92 or the storage medium 94 into the RAM 91 and executing them. That is, in each of the above embodiments, the CPU 95 executes the software programs that carry out the functions (units) of the noise detection devices (10, 100) and the noise suppression devices (200, 300, 400) while referring to the ROM 92 or the storage medium 94 as appropriate.
The present invention, described above using the embodiments as examples, is achieved by supplying a computer program capable of realizing the functions described above to the noise detection devices (10, 100) and the noise suppression devices (200, 300, 400), and then having the CPU 95 read the computer program into the RAM 91 and execute it.
The supplied computer program may be stored in a computer-readable storage device such as a readable and writable memory (temporary storage medium) or a hard disk device. In such a case, the present invention can be regarded as being constituted by code representing the computer program or by a storage medium storing the computer program.
In each of the embodiments described above, the functions shown in the blocks of the noise detection devices (10, 100) shown in FIGS. 2 and 3 and of the noise suppression devices (200, 300, 400) shown in FIGS. 5, 8, and 9 are realized by a software program, as one example of what the CPU 95 shown in FIG. 11 executes. However, some or all of the functions shown in the blocks of FIGS. 2, 3, 5, 8, and 9 may be realized as hardware circuits.
The embodiments described above are preferred embodiments of the present invention, and the scope of the present invention is not limited to them. Those skilled in the art can modify or substitute the above embodiments and construct variously altered forms without departing from the gist of the present invention.
Some or all of the above embodiments can also be described as in the following supplementary notes, but are not limited to them.
(Supplementary note 1) A noise detection device comprising: calculation means for calculating, from an acoustic signal containing an impact sound, a feature amount representing the steepness of change of the acoustic signal for each frame obtained by dividing the acoustic signal into a predetermined time length; first detection means for detecting, based on the feature amount, a frame in which the steepness of signal change is greater than that of a speech signal as the start time of an impact sound section in which the impact sound exists; and second detection means for detecting, based on the feature amount, the last of the frames in which, continuing from the start time, the steepness of signal change is greater than that of the speech signal, as the end time of the impact sound section.
(Supplementary note 2) The noise detection device according to supplementary note 1, wherein the first detection means compares the feature amount with a first threshold and, when the steepness of change of the acoustic signal represented by the feature amount is greater than the steepness represented by the first threshold, detects the frame for which the feature amount was calculated as the start time of the impact sound section, and the second detection means compares the feature amount with a second threshold representing a steepness smaller than the steepness represented by the first threshold and, when the steepness of change of the acoustic signal represented by the feature amount is smaller than the steepness represented by the second threshold, detects the frame immediately before the frame for which the feature amount was calculated as the end time of the impact sound section.
(Supplementary note 3) The noise detection device according to supplementary note 1 or 2, wherein the calculation means comprises conversion means for converting the acoustic signal into a phase spectrum and linearity calculation means for calculating the linearity of the phase spectrum, and calculates, as the feature amount, an index representing the linearity of the phase spectrum calculated by the linearity calculation means.
(Supplementary note 4) The noise detection device according to supplementary note 3, wherein the linearity calculation means calculates the linearity of the phase spectrum using a value based on the variation of the difference between the phase spectrum and the phase spectrum in an adjacent frequency band adjacent to the frequency band of that phase spectrum.
(Supplementary note 5) The noise detection device according to any one of supplementary notes 1 to 4, further comprising replacement means for using first information related to a frame different from the frames included in the impact sound section to replace second information related to the frames included in the impact sound section with the first information, or to interpolate the frames included in the impact sound section with information based on the first information.
 (Supplementary Note 6) A noise suppression device comprising: detection means for detecting, from an acoustic signal including an impact sound, an initial section of the impact sound in which the power is greater than in a subsequent section following the initial section and in which the power is present over a wide band; and replacement means for, using first information related to a frame different from the frames included in the initial section, replacing second information related to a frame included in the initial section with the first information, or interpolating a frame included in the initial section with information based on the first information.
 (Supplementary Note 7) A noise suppression device comprising: detection means for detecting, from an acoustic signal including an impact sound, an initial section of the impact sound in which the power is greater than in a subsequent section following the initial section and in which the power is present over a wide band; and replacement means for replacing the signal of the initial section with a predetermined signal prepared in advance, or deleting the signal of the initial section.
 (Supplementary Note 8) The noise suppression device according to Supplementary Note 6 or 7, wherein the detection means comprises: calculation means for calculating, from the acoustic signal, a feature amount representing the steepness of the change in the acoustic signal for each frame obtained by dividing the acoustic signal into predetermined time lengths; first detection means for detecting, based on the feature amount, a frame in which the signal changes more steeply than a speech signal as the start time of the initial section; and second detection means for detecting, based on the feature amount, the last of the frames in which, continuing from the start time, the signal changes more steeply than the speech signal as the end time of the initial section.
 (Supplementary Note 9) The noise suppression device according to Supplementary Note 8, wherein the first detection means compares the feature amount with a first threshold and, when the steepness of the change in the acoustic signal represented by the feature amount is greater than the steepness represented by the first threshold, detects the frame for which the feature amount was calculated as the start time of the initial section; and the second detection means compares the feature amount with a second threshold, which represents a steepness smaller than the steepness represented by the first threshold, and, when the steepness of the change in the acoustic signal represented by the feature amount is smaller than the steepness represented by the second threshold, detects the frame immediately preceding the frame for which the feature amount was calculated as the end time of the initial section.
 (Supplementary Note 10) The noise suppression device according to Supplementary Note 8 or 9, wherein the calculation means comprises conversion means for converting the acoustic signal into a phase spectrum and linearity calculation means for calculating the linearity of the phase spectrum, and calculates, as the feature amount, an index representing the linearity of the phase spectrum calculated by the linearity calculation means.
 (Supplementary Note 11) The noise suppression device according to Supplementary Note 10, wherein the linearity calculation means calculates the linearity of the phase spectrum using a value based on the variation in the differences between the phase spectrum and the phase spectrum in an adjacent frequency band adjacent to the frequency band of that phase spectrum.
 (Supplementary Note 12) A noise detection method comprising: calculating, from an acoustic signal including an impact sound, a feature amount representing the steepness of the change in the acoustic signal for each frame obtained by dividing the acoustic signal into predetermined time lengths; detecting, based on the feature amount, a frame in which the signal changes more steeply than a speech signal as the start time of an impact sound section in which the impact sound is present; and detecting, based on the feature amount, the last of the frames in which, continuing from the start time, the signal changes more steeply than the speech signal as the end time of the impact sound section.
 (Supplementary Note 13) A noise suppression method comprising: detecting, from an acoustic signal including an impact sound, an initial section of the impact sound in which the power is greater than in a subsequent section following the initial section and in which the power is present over a wide band; and, using first information related to a frame different from the frames included in the initial section, replacing second information related to a frame included in the initial section with the first information, or interpolating a frame included in the initial section with information based on the first information.
 (Supplementary Note 14) A noise suppression method comprising: detecting, from an acoustic signal including an impact sound, an initial section of the impact sound in which the power is greater than in a subsequent section following the initial section and in which the power is present over a wide band; and replacing the signal of the initial section with a predetermined signal prepared in advance, or deleting the signal of the initial section.
 (Supplementary Note 15) A program that causes a computer to execute: a process of calculating, from an acoustic signal including an impact sound, a feature amount representing the steepness of the change in the acoustic signal for each frame obtained by dividing the acoustic signal into predetermined time lengths; a process of detecting, based on the feature amount, a frame in which the signal changes more steeply than a speech signal as the start time of an impact sound section in which the impact sound is present; and a process of detecting, based on the feature amount, the last of the frames in which, continuing from the start time, the signal changes more steeply than the speech signal as the end time of the impact sound section.
 (Supplementary Note 16) A program that causes a computer to execute: a process of detecting, from an acoustic signal including an impact sound, an initial section of the impact sound in which the power is greater than in a subsequent section following the initial section and in which the power is present over a wide band; and a process of, using first information related to a frame different from the frames included in the initial section, replacing second information related to a frame included in the initial section with the first information, or interpolating a frame included in the initial section with information based on the first information.
 (Supplementary Note 17) A program that causes a computer to execute: a process of detecting, from an acoustic signal including an impact sound, an initial section of the impact sound in which the power is greater than in a subsequent section following the initial section and in which the power is present over a wide band; and a process of replacing the signal of the initial section with a predetermined signal prepared in advance, or deleting the signal of the initial section.
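 As a minimal illustrative sketch, not part of the supplementary notes themselves, the two-threshold detection of Supplementary Notes 1 and 2 and a phase-linearity feature in the spirit of Supplementary Notes 3 and 4 could be written as follows. The function names, the naive DFT, and the use of the mean resultant length of adjacent-bin phase differences as the linearity index are assumptions made for illustration only; the notes require only a value based on the variation of those differences. In this sketch a larger index is taken to mean a steeper signal change.

```python
import cmath
import math
from typing import List, Optional, Sequence, Tuple


def phase_linearity(frame: Sequence[float]) -> float:
    """Hypothetical linearity index in [0, 1] for one frame's phase spectrum.

    1.0 means the phase advances by the same amount between every pair of
    adjacent frequency bins (as for a single click); values near 0 mean the
    adjacent-bin phase differences are widely scattered.
    """
    n = len(frame)
    # Naive DFT over the non-negative frequency bins (adequate for short frames).
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]
    # Unit phasors of the phase difference between adjacent bins.
    steps: List[complex] = []
    for lower, upper in zip(spectrum, spectrum[1:]):
        product = upper * lower.conjugate()
        if abs(product) > 1e-12:  # skip bins with no usable phase
            steps.append(product / abs(product))
    if not steps:
        return 0.0
    # Mean resultant length: small spread of the differences gives a value near 1.
    return abs(sum(steps)) / len(steps)


def detect_impact_section(
    features: Sequence[float], first_threshold: float, second_threshold: float
) -> Optional[Tuple[int, int]]:
    """Return (start_frame, end_frame) of the first detected section, or None."""
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must represent a smaller steepness")
    start: Optional[int] = None
    for index, value in enumerate(features):
        if start is None:
            if value > first_threshold:
                start = index  # frame exceeding the first threshold is the start
        elif value < second_threshold:
            return start, index - 1  # frame just before the drop is the end
    if start is None:
        return None
    return start, len(features) - 1  # section runs to the end of the input


if __name__ == "__main__":
    click = [0.0] * 16
    click[3] = 1.0
    print(phase_linearity(click))
    print(detect_impact_section([0.1, 0.2, 0.9, 0.7, 0.5, 0.2, 0.1], 0.8, 0.4))  # (2, 4)
```

 With a single threshold of 0.8 the section in the usage example would end at frame 2; the lower second threshold keeps frames 3 and 4 inside the section, which is the effect the two-threshold arrangement is meant to achieve.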
 This application claims priority based on Japanese Patent Application No. 2015-121229, filed on June 16, 2015, the entire disclosure of which is incorporated herein.
DESCRIPTION OF SYMBOLS
 10  Noise detection device
 11  Calculation unit
 12  First detection unit
 13  Second detection unit
 100  Noise detection device
 110  Calculation unit
 111  Conversion unit
 1111  Frame division unit
 1112  Windowing processing unit
 1113  Fourier transform unit
 112  Index calculation unit
 1121  Change amount calculation unit
 1122  Difference calculation unit
 1123  Feature amount calculation unit
 120  First detection unit
 130  Second detection unit
 140  Storage unit
 200  Noise suppression device
 210  Replacement unit
 300  Noise suppression device
 310  Waveform conversion unit
 400  Noise suppression device
 410  Detection unit
 420  Replacement unit

Claims (10)

  1.  A noise detection device comprising:
     calculation means for calculating, from an acoustic signal including an impact sound, a feature amount representing the steepness of the change in the acoustic signal for each frame obtained by dividing the acoustic signal into predetermined time lengths;
     first detection means for detecting, based on the feature amount, a frame in which the signal changes more steeply than a speech signal as the start time of an impact sound section in which the impact sound is present; and
     second detection means for detecting, based on the feature amount, the last of the frames in which, continuing from the start time, the signal changes more steeply than the speech signal as the end time of the impact sound section.
  2.  The noise detection device according to claim 1, wherein
     the first detection means compares the feature amount with a first threshold and, when the steepness of the change in the acoustic signal represented by the feature amount is greater than the steepness represented by the first threshold, detects the frame for which the feature amount was calculated as the start time of the impact sound section, and
     the second detection means compares the feature amount with a second threshold, which represents a steepness smaller than the steepness represented by the first threshold, and, when the steepness of the change in the acoustic signal represented by the feature amount is smaller than the steepness represented by the second threshold, detects the frame immediately preceding the frame for which the feature amount was calculated as the end time of the impact sound section.
  3.  The noise detection device according to claim 1 or 2, wherein the calculation means comprises:
      conversion means for converting the acoustic signal into a phase spectrum; and
      linearity calculation means for calculating the linearity of the phase spectrum,
     and calculates, as the feature amount, an index representing the linearity of the phase spectrum calculated by the linearity calculation means.
  4.  The noise detection device according to any one of claims 1 to 3, further comprising replacement means for, using first information related to a frame different from the frames included in the impact sound section, replacing second information related to a frame included in the impact sound section with the first information, or interpolating a frame included in the impact sound section with information based on the first information.
  5.  A noise suppression device comprising:
     detection means for detecting, from an acoustic signal including an impact sound, an initial section of the impact sound in which the power is greater than in a subsequent section following the initial section and in which the power is present over a wide band; and
     replacement means for, using first information related to a frame different from the frames included in the initial section, replacing second information related to a frame included in the initial section with the first information, or interpolating a frame included in the initial section with information based on the first information.
  6.  A noise suppression device comprising:
     detection means for detecting, from an acoustic signal including an impact sound, an initial section of the impact sound in which the power is greater than in a subsequent section following the initial section and in which the power is present over a wide band; and
     replacement means for replacing the signal of the initial section with a predetermined signal prepared in advance, or deleting the signal of the initial section.
  7.  A noise detection method comprising:
     calculating, from an acoustic signal including an impact sound, a feature amount representing the steepness of the change in the acoustic signal for each frame obtained by dividing the acoustic signal into predetermined time lengths;
     detecting, based on the feature amount, a frame in which the signal changes more steeply than a speech signal as the start time of an impact sound section in which the impact sound is present; and
     detecting, based on the feature amount, the last of the frames in which, continuing from the start time, the signal changes more steeply than the speech signal as the end time of the impact sound section.
  8.  A noise suppression method comprising:
     detecting, from an acoustic signal including an impact sound, an initial section of the impact sound in which the power is greater than in a subsequent section following the initial section and in which the power is present over a wide band; and
     using first information related to a frame different from the frames included in the initial section, replacing second information related to a frame included in the initial section with the first information, or interpolating a frame included in the initial section with information based on the first information.
  9.  A computer-readable recording medium storing a program that causes a computer to execute:
     a process of calculating, from an acoustic signal including an impact sound, a feature amount representing the steepness of the change in the acoustic signal for each frame obtained by dividing the acoustic signal into predetermined time lengths;
     a process of detecting, based on the feature amount, a frame in which the signal changes more steeply than a speech signal as the start time of an impact sound section in which the impact sound is present; and
     a process of detecting, based on the feature amount, the last of the frames in which, continuing from the start time, the signal changes more steeply than the speech signal as the end time of the impact sound section.
  10.  A computer-readable recording medium storing a program that causes a computer to execute:
     a process of detecting, from an acoustic signal including an impact sound, an initial section of the impact sound in which the power is greater than in a subsequent section following the initial section and in which the power is present over a wide band; and
     a process of, using first information related to a frame different from the frames included in the initial section, replacing second information related to a frame included in the initial section with the first information, or interpolating a frame included in the initial section with information based on the first information.
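The replacement or interpolation recited in claims 5, 8 and 10 can be illustrated with a short sketch. It is not taken from the claims: the function name, the representation of each frame as a list of values (for example an amplitude spectrum), and the choice of linear interpolation between the nearest frames outside the section are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def suppress_section(
    frames: Sequence[Sequence[float]], start: int, end: int
) -> List[List[float]]:
    """Return a copy of `frames` with frames start..end (inclusive) rebuilt.

    Frames inside the section are interpolated from the frame just before and
    the frame just after it. If only one neighbour exists, its values are
    copied instead (plain replacement). With no neighbour, nothing is changed.
    """
    if not 0 <= start <= end < len(frames):
        raise ValueError("section must lie inside the frame sequence")
    result = [list(frame) for frame in frames]
    before: Optional[List[float]] = result[start - 1] if start > 0 else None
    after: Optional[List[float]] = result[end + 1] if end + 1 < len(result) else None
    if before is None and after is None:
        return result  # no frame outside the section to borrow from
    span = end - start + 2  # number of steps from `before` to `after`
    for offset, index in enumerate(range(start, end + 1), start=1):
        if before is None:
            result[index] = list(after)  # replacement by the following frame
        elif after is None:
            result[index] = list(before)  # replacement by the preceding frame
        else:
            weight = offset / span
            result[index] = [
                (1.0 - weight) * b + weight * a for b, a in zip(before, after)
            ]
    return result


if __name__ == "__main__":
    spectra = [[1.0, 1.0], [9.0, 9.0], [9.0, 9.0], [9.0, 9.0], [5.0, 5.0]]
    print(suppress_section(spectra, 1, 3))
    # [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0], [5.0, 5.0]]
```

Copying the input before modifying it keeps the original frames available, which is convenient when the same section is to be processed with a different strategy such as substitution of a predetermined signal.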
PCT/JP2016/002839 2015-06-16 2016-06-13 Noise detection device, noise suppression device, noise detection method, noise suppression method, and recording medium WO2016203753A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2017524606A JPWO2016203753A1 (en) 2015-06-16 2016-06-13 Noise detection device, noise suppression device, noise detection method, noise suppression method, and program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2015-121229 2015-06-16
JP2015121229 2015-06-16

Publications (1)

Publication Number Publication Date
WO2016203753A1 (en) 2016-12-22

Family

ID=57545516

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/002839 WO2016203753A1 (en) 2015-06-16 2016-06-13 Noise detection device, noise suppression device, noise detection method, noise suppression method, and recording medium

Country Status (2)

Country Link
JP (1) JPWO2016203753A1 (en)
WO (1) WO2016203753A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020039598A1 (en) * 2018-08-24 2020-02-27 日本電気株式会社 Signal processing device, signal processing method, and signal processing program

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06130984A (en) * 1992-10-21 1994-05-13 Sanyo Electric Co Ltd Voice recognizing device
JP2001236085A (en) * 2000-02-25 2001-08-31 Matsushita Electric Ind Co Ltd Sound domain detecting device, stationary noise domain detecting device, nonstationary noise domain detecting device and noise domain detecting device
JP2008102551A (en) * 2007-12-27 2008-05-01 Sony Corp Apparatus for processing voice signal and processing method thereof
JP2011100082A (en) * 2009-11-09 2011-05-19 Nec Corp Signal processing method, information processor, and signal processing program
JP2012027186A (en) * 2010-07-22 2012-02-09 Sony Corp Sound signal processing apparatus, sound signal processing method and program
JP2012127701A (en) * 2010-12-13 2012-07-05 Sogo Keibi Hosho Co Ltd Device and method for sound detection
WO2015029546A1 (en) * 2013-08-30 2015-03-05 日本電気株式会社 Signal processing device, signal processing method, and signal processing program

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020039598A1 (en) * 2018-08-24 2020-02-27 日本電気株式会社 Signal processing device, signal processing method, and signal processing program
JPWO2020039598A1 (en) * 2018-08-24 2021-08-12 日本電気株式会社 Signal processing equipment, signal processing methods and signal processing programs
JP7152112B2 (en) 2018-08-24 2022-10-12 日本電気株式会社 Signal processing device, signal processing method and signal processing program
US11769517B2 (en) 2018-08-24 2023-09-26 Nec Corporation Signal processing apparatus, signal processing method, and signal processing program

Also Published As

Publication number Publication date
JPWO2016203753A1 (en) 2018-04-19

Similar Documents

Publication Publication Date Title
US10411669B2 (en) Volume leveler controller and controlling method
JP5127754B2 (en) Signal processing device
US8775173B2 (en) Erroneous detection determination device, erroneous detection determination method, and storage medium storing erroneous detection determination program
JP4818335B2 (en) Signal band expander
EP3598448A1 (en) Apparatuses and methods for audio classifying and processing
US20140177853A1 (en) Sound processing device, sound processing method, and program
KR101616112B1 (en) Speaker separation system and method using voice feature vectors
JP2016180839A (en) Noise-suppressed speech recognition device and program therefor
JP5443547B2 (en) Signal processing device
JP4445460B2 (en) Audio processing apparatus and audio processing method
JP2006126859A5 (en)
WO2016203753A1 (en) Noise detection device, noise suppression device, noise detection method, noise suppression method, and recording medium
US9697848B2 (en) Noise suppression device and method of noise suppression
JP6071944B2 (en) Speaker speed conversion system and method, and speed conversion apparatus
JP2020190606A (en) Sound noise removal device and program
CN113316075B (en) Howling detection method and device and electronic equipment
US20070269056A1 (en) Method and Apparatus for Audio Signal Expansion and Compression
JP6599408B2 (en) Acoustic signal processing apparatus, method, and program
JP6930089B2 (en) Sound processing method and sound processing equipment
JP2017009657A (en) Voice enhancement device and voice enhancement method
JP5272141B2 (en) Voice processing apparatus and program
JP2019029861A (en) Acoustic signal processing device, method and program
US11348596B2 (en) Voice processing method for processing voice signal representing voice, voice processing device for processing voice signal representing voice, and recording medium storing program for processing voice signal representing voice
JP6790851B2 (en) Speech processing program, speech processing method, and speech processor
JP6559576B2 (en) Noise suppression device, noise suppression method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16811228

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2017524606

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16811228

Country of ref document: EP

Kind code of ref document: A1