WO2016203753A1

WO2016203753A1 - Noise detection device, noise suppression device, noise detection method, noise suppression method, and recording medium

Info

Publication number: WO2016203753A1
Application number: PCT/JP2016/002839
Authority: WO
Inventors: 旭美梅松; 亮輔磯谷; 剛範辻川; 秀治古明地
Original assignee: 日本電気株式会社
Priority date: 2015-06-16
Filing date: 2016-06-13
Publication date: 2016-12-22
Also published as: JPWO2016203753A1

Abstract

Provided is a technology for suitably detecting an interval including an impact sound from an acoustic signal. A noise detection device comprises: a calculation unit for calculating, from the acoustic signal including the impact sound, a feature amount representing a gradient in an acoustic signal for each frame with a prescribed time length into which the acoustic signal is divided; a first detection unit for detecting, on the basis of the feature amount, a frame that has a larger gradient in the signal than a speech signal as the start time of an impact sound interval during which the impact sound exists; and a second detection unit for detecting, on the basis of the feature amount, the last frame having a larger gradient in the signal than the speech signal continuously from the start time as the end of the impact sound interval.

Description

Noise detection device, noise suppression device, noise detection method, noise suppression method, and recording medium

The present invention relates to a noise detection device, a noise suppression device, a noise detection method, a noise suppression method, and a recording medium.

Currently, technologies for reducing noise from audio signals are being considered. For example, Patent Document 1 and Non-Patent Document 1 describe a technique for determining whether or not there is a sudden noise, and reducing the sudden noise if it exists.

Patent Document 2 describes that the presence of a sudden change in the input signal is determined based on the linearity of the phase component signal in the frequency domain.

As an example of a method for extracting audio, Patent Document 3 describes extracting audio information from reproduction information that includes music information.

An example of improving the quality of voice is described in Patent Document 4, for example.

Japanese Patent No. 4456504
International Publication No. 2014/136628
JP 2011-248202 A
Japanese Patent No. 4098817
JP-A-9-331310

Sudden noise is, for example, an impact sound. An impact sound is a sound generated when objects collide with each other, an explosion sound, or a sound generated when an instantaneous, sudden force is applied to an object.

In order to improve the recognition rate of a speech signal, the section of the speech signal in which noise reduction processing (noise suppression processing) is performed must be set to an appropriate length. This is because, if noise suppression processing is applied over too long a section, even a genuine speech section may be suppressed as noise, and the recognition rate may be reduced.

However, none of the above-mentioned patent documents and non-patent document discloses detecting a section in which an impact sound exists.

The present invention has been made in view of the above problems, and an object of the present invention is to provide a technique for more suitably detecting an impact sound section from an acoustic signal.

The noise detection device according to an aspect of the present invention includes: calculation means for calculating, from an acoustic signal including an impact sound, a feature amount representing the steepness of change of the acoustic signal for each frame obtained by dividing the acoustic signal into a predetermined time length; first detection means for detecting, based on the feature amount, a frame in which the signal changes more steeply than a speech signal as the start time of an impact sound section in which the impact sound is present; and second detection means for detecting, based on the feature amount, the last frame among the frames in which the signal changes more steeply than the speech signal continuously from the start time as the end time of the impact sound section.

Further, the noise suppression device according to one aspect of the present invention includes: detection means for detecting, from an acoustic signal including an impact sound, an initial section of the impact sound in which the power is greater than in the subsequent section following the initial section and in which the power exists in a wide band; and replacement means for using first information related to a frame different from the frames included in the initial section to replace second information related to a frame included in the initial section with the first information, or to interpolate a frame included in the initial section with information based on the first information.

Further, the noise suppression device according to one aspect of the present invention includes: detection means for detecting, from an acoustic signal including an impact sound, an initial section of the impact sound in which the power is greater than in the subsequent section following the initial section and in which the power exists in a wide band; and replacement means for replacing the signal in the initial section with a predetermined signal prepared in advance, or deleting the signal in the initial section.

In addition, the noise detection method according to one aspect of the present invention calculates, from an acoustic signal including an impact sound, a feature amount representing the steepness of change of the acoustic signal for each frame obtained by dividing the acoustic signal into a predetermined time length; detects, based on the feature amount, a frame in which the signal changes more steeply than a speech signal as the start time of an impact sound section in which the impact sound exists; and detects, based on the feature amount, the last frame among the frames in which the signal changes more steeply than the speech signal continuously from the start time as the end time of the impact sound section.

Further, the noise suppression method according to one aspect of the present invention detects, from an acoustic signal including an impact sound, an initial section of the impact sound in which the power is greater than in the subsequent section following the initial section and in which the power exists in a wide band; and, using first information related to a frame different from the frames included in the initial section, replaces second information related to a frame included in the initial section with the first information, or interpolates a frame included in the initial section with information based on the first information.

Note that a computer program that realizes each of the above apparatuses or methods by a computer and a computer-readable non-transitory recording medium in which the computer program is stored are also included in the scope of the present invention.

According to the present invention, it is possible to more suitably detect the section of the impact sound from the acoustic signal.

A diagram showing an example of the spectrogram of an impact sound.
A functional block diagram showing an example of the functional configuration of the noise detection apparatus according to the first embodiment of the present invention.
A functional block diagram showing an example of the functional configuration of the noise detection apparatus according to the second embodiment of the present invention.
A flowchart showing an example of the operation of the noise detection apparatus according to the second embodiment of the present invention.
A functional block diagram showing an example of the functional configuration of the noise suppression apparatus according to the third embodiment of the present invention.
A diagram for explaining the operation of the replacement unit in the noise suppression apparatus according to the third embodiment of the present invention.
A flowchart showing an example of the operation of the noise suppression apparatus according to the third embodiment of the present invention.
A functional block diagram showing an example of the functional configuration of the noise suppression apparatus according to the fourth embodiment of the present invention.
A functional block diagram showing an example of the functional configuration of the noise suppression apparatus according to the fifth embodiment of the present invention.
A flowchart showing an example of the operation of the noise suppression apparatus according to the fifth embodiment of the present invention.
A diagram illustrating an example of the hardware configuration of a computer that can realize each embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. However, the components described in the following embodiments are merely examples, and are not intended to limit the technical scope of the present invention only to them.

First, sudden noise will be described. Sudden noise is, for example, an impact sound. An impact sound is a sound generated when objects collide with each other, an explosion sound, or a sound generated when an instantaneous, sudden force is applied to an object. The impact sound in each embodiment of the present invention is not limited to the above and may be, for example, applause, the sound of falling coins, the sound of castanets, the sound of wooden clappers, or the sound of striking glass, plastic, metal, ceramic, wood, pottery, or cans.

This impact sound will be described with reference to FIG. 1. FIG. 1 is a diagram illustrating an example of a spectrogram of an impact sound. In FIG. 1, the horizontal axis indicates time (seconds), and the vertical axis indicates frequency (kHz). As shown in FIG. 1, the impact sound includes a section in which the signal power is large and the power is present in a wide band, and a section in which the signal power is small and the power is present in a narrow band. In the present specification, the former section is referred to as a hitting section (or hitting part), and the latter section is referred to as an attenuation section (or attenuation part).

Thus, the impact sound includes the hitting section and the attenuation section. As described above, in the hitting section the signal power is large and exists in a wide band. Therefore, compared with detecting the entire impact sound section and suppressing noise over that entire section, detecting the hitting section and suppressing noise only in the hitting section can improve the recognition rate of the acoustic signal further. This is because the noise with higher power can be suppressed while the section in which noise suppression processing is performed is kept short.

Therefore, in the following, a method for detecting the hitting section of the impact sound will be described.

<First Embodiment>
A first embodiment of the present invention will be described with reference to the drawings. In this embodiment, a basic configuration for solving the problems of the present invention will be described. FIG. 2 is a functional block diagram illustrating an example of a functional configuration of the noise detection apparatus 10 according to the present embodiment. As shown in FIG. 2, the noise detection apparatus 10 includes a calculation unit 11, a first detection unit 12, and a second detection unit 13.

The calculation unit 11 calculates, from the acoustic signal including the impact sound, a feature amount indicating the steepness of the change of the acoustic signal for each frame obtained by dividing the acoustic signal into a predetermined time length. The calculation unit 11 outputs the calculated feature amount to the first detection unit 12 and the second detection unit 13.

The first detection unit 12 receives the feature amount calculated for each frame from the calculation unit 11. Based on the received feature amount, the first detection unit 12 detects a frame in which the signal changes more steeply than a speech signal as the start time of an impact sound section, that is, a section of the acoustic signal in which the impact sound exists. The first detection unit 12 outputs the detected start time of the impact sound section to the second detection unit 13.

The second detection unit 13 receives the feature amount calculated for each frame from the calculation unit 11. The second detection unit 13 also receives the start time of the impact sound section from the first detection unit 12. Based on the received feature amount, the second detection unit 13 detects, as the end time of the impact sound section, the last frame among the frames in which the signal changes more steeply than the speech signal continuously from the start time.

As described above, the first detection unit 12 of the noise detection apparatus 10 according to the present embodiment detects the start time of the impact sound section, and the second detection unit 13 detects the end time of the impact sound section. The noise detection apparatus 10 according to the present embodiment can thereby detect the impact sound section, which is the section of the acoustic signal in which the impact sound exists.

Here, the impact sound section is the hitting section, in which the signal power is large and the power exists in a wide band; in that section, the signal changes more steeply than an acoustic signal containing only speech or a section in which only speech exists. The noise detection apparatus 10 according to the present embodiment can therefore more suitably detect the hitting section of an impact sound in an acoustic signal.

<Second Embodiment>
Next, a second embodiment of the present invention based on the above-described first embodiment will be described with reference to the drawings. FIG. 3 is a functional block diagram illustrating an example of a functional configuration of the noise detection apparatus 100 according to the present embodiment. As illustrated in FIG. 3, the noise detection apparatus 100 includes a calculation unit 110, a first detection unit 120, and a second detection unit 130.

(Calculation unit 110)
As shown in FIG. 3, the calculation unit 110 includes a conversion unit 111 and an index calculation unit (linearity calculation unit) 112. Further, the conversion unit 111 includes a frame division unit 1111, a windowing processing unit 1112, and a Fourier transform unit 1113. The index calculation unit 112 includes a change amount calculation unit 1121, a difference calculation unit 1122, and a feature amount calculation unit 1123.

(Conversion unit 111)
The frame division unit 1111 of the conversion unit 111 receives an acoustic signal (also referred to as an input signal) from the outside of the noise detection device 100, for example. The frame dividing unit 1111 divides the received acoustic signal into frames in which one frame includes K samples. Here, K is assumed to be a positive even number. The frame division unit 1111 outputs signal samples, which are acoustic signals divided into frames, to the windowing processing unit 1112.

The windowing processing unit 1112 receives the signal samples from the frame division unit 1111 and multiplies them by the window function w(t). Hereinafter, multiplication by the window function is also referred to as windowing or windowing processing. Here, t denotes a time sample index. If the signal samples of the n-th frame (n is an integer of 0 or more indicating the frame number) are x_n(t) (t = 0, 1, ..., K−1), the signal samples obtained by windowing x_n(t) with the window function w(t) (also referred to as a windowed signal) can be calculated by the following equation (1).
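From the definitions above, equation (1) is the sample-wise product of the frame and the window. It may be written as follows, where the symbol \bar{x}_n(t) for the windowed signal is a notation assumed here for illustration.

```latex
\bar{x}_n(t) = w(t)\, x_n(t), \qquad t = 0, 1, \ldots, K-1 \tag{1}
```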

Further, the windowing processing unit 1112 may apply the window so that two consecutive frames partially overlap.

As the window function w(t), for example, a Hanning window represented by the following equation (2) can be used.
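As a non-limiting illustration, one standard form of a Hanning window of length K is given below. The exact normalization of the argument in equation (2) cannot be determined from the surrounding text, so the denominator K−1 is an assumption.

```latex
w(t) = 0.5 - 0.5\cos\!\left(\frac{2\pi t}{K-1}\right), \qquad t = 0, 1, \ldots, K-1 \tag{2}
```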

In addition to this, the windowing processing unit 1112 may perform windowing using various other window functions, such as a Hamming window or a triangular window. The windowing processing unit 1112 outputs the windowed signal to the Fourier transform unit 1113.

The Fourier transform unit 1113 receives the windowed signal from the windowing processing unit 1112 and performs a Fourier transform on it. The signal spectrum X_n(k) (k is a frequency index, k = 1, 2, ..., N), which is the Fourier transform of the windowed signal, is expressed by the following equation (3).
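In terms of the amplitude spectrum and the phase spectrum named in this description, equation (3) is the polar form of the complex spectrum, which may be written as:

```latex
X_n(k) = \lvert X_n(k) \rvert \, e^{\, j\, p_n(k)} \tag{3}
```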

In the above equation, j represents the imaginary unit, |X_n(k)| represents the amplitude spectrum, and p_n(k) represents the phase spectrum.

The Fourier transform unit 1113 separates the signal spectrum X_n(k) into the phase spectrum p_n(k) and the amplitude spectrum |X_n(k)|. The Fourier transform unit 1113 outputs the phase spectrum p_n(k) obtained by this separation to the index calculation unit 112 for each frame. Hereinafter, the phase spectrum output by the Fourier transform unit 1113 in units of frames is also referred to as a phase component signal, and the amplitude spectrum output in units of frames is also referred to as an amplitude component signal. In this way, by performing a Fourier transform on the windowed signal, the Fourier transform unit 1113 can extract the phase component signal in the frequency domain from the acoustic signal.
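The processing of the conversion unit 111 described above (frame division, windowing, and Fourier transform) can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `frame_spectra`, the default frame length of 256 samples, the 50% overlap, and the use of a real-input FFT (which yields K/2 + 1 frequency indexes) are all assumptions made for the example.

```python
import numpy as np

def frame_spectra(signal, K=256, hop=128):
    """Split `signal` into frames of K samples, window each frame, and
    return (amplitude, phase) arrays of shape (n_frames, K // 2 + 1).

    K, hop and the Hanning window are illustrative choices only.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < K:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - K) // hop
    window = np.hanning(K)                      # w(t)
    # Frame division: frame n starts at sample n * hop.
    frames = np.stack([signal[n * hop:n * hop + K] for n in range(n_frames)])
    # Windowing followed by the Fourier transform of each frame.
    spectra = np.fft.rfft(frames * window, axis=1)
    # Separate amplitude |X_n(k)| and phase p_n(k).
    return np.abs(spectra), np.angle(spectra)
```

For a frame containing a single impulse, the amplitude returned by this sketch is the same at every frequency index, which corresponds to power that exists in a wide band.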

In the present embodiment, the Fourier transform unit 1113 has been described as performing a Fourier transform on the windowed signal, but the present embodiment is not limited to this. The Fourier transform unit 1113 may instead apply, for example, a Hadamard transform, a Haar transform, or a wavelet transform to the windowed signal.

(Index calculation unit 112)
Next, the index calculation unit 112 will be described. The change amount calculation unit 1121 of the index calculation unit 112 receives the phase component signal from the Fourier transform unit 1113 of the conversion unit 111 for each frame. The change amount calculation unit 1121, the difference calculation unit 1122, and the feature amount calculation unit 1123 described below each perform processing in units of frames.

Using the received phase component signal, the change amount calculation unit 1121 calculates the phase component change amount Δp_n(k), which is the phase difference between adjacent frequency indexes (adjacent frequency bands), by the following equation (4).
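Since Δp_n(k) is defined as the phase difference between adjacent frequency indexes, equation (4) may be written as below. The direction of the index offset (k and k−1 rather than k+1 and k) is assumed here so as to agree with the later reference to Δp_n(k−1).

```latex
\Delta p_n(k) = p_n(k) - p_n(k-1) \tag{4}
```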

Then, the change amount calculation unit 1121 outputs the calculated phase component change amount Δp_n(k) to the difference calculation unit 1122.

The difference calculation unit 1122 receives the phase component change amount Δp_n(k) from the change amount calculation unit 1121. Using the received Δp_n(k), the difference calculation unit 1122 calculates ΔΔp_n(k), the change in the phase component change amount between adjacent frequency indexes, by the following equation (5).
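Consistent with the statement in this description that ΔΔp_n(k) is zero when Δp_n(k) and Δp_n(k−1) do not differ, equation (5) may be written as the following second difference of the phase along the frequency axis.

```latex
\Delta\Delta p_n(k) = \Delta p_n(k) - \Delta p_n(k-1) \tag{5}
```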

Thereby, the difference calculation unit 1122 can obtain the variation of the phase component change amount Δp_n(k) along the frequency axis. Hereinafter, the change ΔΔp_n(k) in the phase component change amount is also referred to as the change amount difference ΔΔp_n(k).

When there is no difference between the phase component change amount Δp_n(k) at a certain frequency index and the phase component change amount Δp_n(k−1) at the adjacent frequency index, that is, when p_n(k) is a linear function in the frequency direction, the change amount difference ΔΔp_n(k) is zero.

The difference calculation unit 1122 outputs the calculated change amount difference ΔΔp_n(k) to the feature amount calculation unit 1123.

The feature amount calculation unit 1123 receives the change amount difference ΔΔp_n(k) from the difference calculation unit 1122. The feature amount calculation unit 1123 then calculates the average of the change amount differences ΔΔp_n(k) over all frequency indexes in the frame for which they were obtained (in this case, the n-th frame). The calculated average value is the phase feature amount of that frame. The average value of the change amount difference ΔΔp_n(k) can also be regarded as the degree of variation (an index indicating the variation) of the phase component change amount Δp_n(k) in the frame.

The method by which the feature amount calculation unit 1123 calculates the average value of the change amount difference ΔΔp_n(k) will be described further. The feature amount calculation unit 1123 calculates the phase feature amount PL_n, which is the average value of the change amount difference ΔΔp_n(k), using the following equation (6).
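Based on the verbal definition given in this description (one minus the cosine of the change amount difference averaged over the N frequency indexes), equation (6) may be written as below. The summation limits k = 1, ..., N are assumed from the stated range of the frequency index.

```latex
PL_n = 1 - \frac{1}{N} \sum_{k=1}^{N} \cos\bigl(\Delta\Delta p_n(k)\bigr) \tag{6}
```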

As shown in equation (6), the feature amount calculation unit 1123 takes the cosine of the change amount difference ΔΔp_n(k) at each frequency index, averages the cosines by dividing their sum by the number N of frequency indexes, and subtracts this average from 1. The resulting value is defined as the phase feature amount PL_n. As described above, the phase feature amount PL_n is also an index representing the variation of the phase component change amount Δp_n(k) in the frame, and is therefore also referred to as the index PL_n.

The phase feature amount PL_n takes a value from 0 to 2. PL_n represents how close the phase spectrum p_n(k) is to a straight line, and can therefore be regarded as an index of the linearity of the phase spectrum p_n(k). The smaller the variation of the phase component change amount Δp_n(k) along the frequency axis, the closer p_n(k) is to a linear function in the frequency axis direction; that is, p_n(k) has high linearity. In this case, the average value (phase feature amount PL_n) of the change amount difference ΔΔp_n(k) takes a value closer to 0. Thus, the closer the value of PL_n is to 0, the higher the linearity of the phase spectrum p_n(k).
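The calculation of the index PL_n from one frame of the phase spectrum can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `phase_feature` is hypothetical, and the average is taken over the second differences that are actually available (two fewer than the number of frequency indexes) rather than over exactly N terms.

```python
import numpy as np

def phase_feature(phase):
    """Return PL_n = 1 - mean(cos(ddp)) for one frame's phase spectrum.

    `phase` holds p_n(k) along the frequency axis (radians).
    """
    p = np.asarray(phase, dtype=float)
    dp = np.diff(p)     # change amount between adjacent frequency indexes
    ddp = np.diff(dp)   # change amount difference (second difference)
    # The cosine makes the result insensitive to 2*pi phase wrapping.
    return float(1.0 - np.mean(np.cos(ddp)))

k = np.arange(64)
pl_linear = phase_feature(-0.3 * k)                 # straight-line phase
pl_curved = phase_feature((np.pi / 2) * k ** 2)     # second difference = pi
```

In this sketch, a phase spectrum that is exactly linear in k yields a value of essentially 0, and a phase spectrum whose second difference equals π at every index yields 2, which matches the stated range of 0 to 2.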

In addition, the feature amount calculation unit 1123 may obtain a variance value instead of the average value as the index representing the variation of the phase component change amount Δp_n(k) along the frequency axis. In this case as well, the closer the phase feature amount PL_n is to 0, the higher the linearity of the phase spectrum p_n(k).

Further, in the above description, the index calculation unit 112 has been described as obtaining the phase feature amount PL_n by calculating the average value or the variance value of the change amount difference ΔΔp_n(k), but the present embodiment is not limited to this. The index calculation unit 112 may obtain a regression line of the phase spectrum p_n(k) and calculate the deviation from the regression line. The index calculation unit 112 can thereby use the deviation from the regression line as the phase feature amount PL_n.
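The regression-line variant can be sketched as follows. The description does not specify how the deviation is measured, so the choices here are assumptions made for illustration: the phase is unwrapped before fitting, the line is fitted by least squares, and the deviation is taken as the root-mean-square residual. The function name `regression_deviation` is hypothetical.

```python
import numpy as np

def regression_deviation(phase):
    """RMS deviation of a frame's phase spectrum from its regression line."""
    # Unwrap so that 2*pi jumps do not count as deviation (an assumption).
    p = np.unwrap(np.asarray(phase, dtype=float))
    k = np.arange(len(p))
    # Least-squares straight line p ~ slope * k + intercept.
    slope, intercept = np.polyfit(k, p, 1)
    residual = p - (slope * k + intercept)
    return float(np.sqrt(np.mean(residual ** 2)))
```

As with the index based on the change amount difference, a value near 0 in this sketch corresponds to a phase spectrum with high linearity.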

In the present embodiment, the feature amount calculation unit 1123 has been described as calculating the above-described phase feature amount PL_n as the index representing the variation of the phase component change amount Δp_n(k) along the frequency axis. This index may be the phase feature amount itself or information that includes the phase feature amount.

The feature amount calculation unit 1123 of the index calculation unit 112 outputs the calculated phase feature amount PL_n to the first detection unit 120 and the second detection unit 130. Information indicating the frame for which the phase feature amount PL_n was calculated is associated with the phase feature amount PL_n output by the index calculation unit 112. The information indicating the frame is, for example, a frame number. In the present embodiment, the description assumes that a frame number is associated with the phase feature amount PL_n.

(Storage unit 140)
The storage unit 140 stores a threshold Th_start (first threshold) and a threshold Th_end (second threshold). The storage unit 140 may be built into the noise detection device 100 or may be realized by a storage device separate from the noise detection device 100. The threshold Th_start and the threshold Th_end may also be stored in different storage units. For example, the threshold Th_start and the threshold Th_end may be stored in storage units (not shown) in the first detection unit 120 and the second detection unit 130, respectively.

The threshold Th_start is a value used when the first detection unit 120, described later, detects the first detection point. The first detection point is the point in time at which, within a short time of about several milliseconds to several tens of milliseconds, the power of the acoustic signal suddenly increases and begins to exist in a wide band.

The threshold Th_end is a value used when the second detection unit 130, described later, detects the second detection point. The second detection point is the point in time at which the power of the acoustic signal suddenly decreases and begins to exist only in a narrow band. The second detection point is later in time than the first detection point described above.

Here, assume that the acoustic signal contains a section in which only speech is present, and that the above-described phase feature amount PL_n is calculated for that section. Hereinafter, the phase feature amount PL_n for the speech-only section is referred to as PL_speech. Note that PL_speech is not limited to the phase feature amount PL_n of a speech-only section and may be, for example, the average of the phase feature amounts PL_n calculated from each item of a large amount of learning data. The learning data may be, for example, data consisting of phase feature amounts PL_n calculated for frames of speech in which no impact sound is present, data consisting of phase feature amounts PL_n calculated for frames that include a speech section, or other data.

PL_speech may also be, for example, a phase feature amount PL_n calculated in advance for a background noise section of an acoustic signal. Here, the background noise is, for example, the sound of a vehicle, a mechanical sound such as that of an air conditioner, babble noise, or noise in which several of these sounds overlap. PL_speech may also be a phase feature amount PL_n calculated from, for example, white noise or pink noise.

Here, a value indicating the degree of change (steepness) of the acoustic signal will be described. In the present embodiment, it is assumed that the value indicating the degree of change in the acoustic signal indicates a smaller value as the degree of change is larger (steepness is greater).

When the degree of change of the acoustic signal is large, the linearity of the phase spectrum p_n(k) is generally high, and the value of the phase feature amount PL_n is small. The phase feature amount is therefore a value indicating the degree of change (steepness) of the acoustic signal.

In general, the acoustic signal in the hitting section of an impact sound changes more steeply than an acoustic signal in which only speech exists or the signal in a section in which only speech exists. In addition, the change in the signal is steeper at the start time of the hitting section than at its end time. Therefore, the phase feature amount PL_n at the end time of the hitting section is smaller than that of a speech signal, and the phase feature amount PL_n at the start time of the hitting section is smaller still. As a result, the following formula (7) holds for the threshold Th_start, the threshold Th_end, and PL_speech.
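From the ordering described above (smallest at the start of the hitting section, larger at its end, and largest for speech), formula (7) may be written as below. Whether the inequalities are strict is not stated in this text and is assumed here.

```latex
Th_{\mathrm{start}} < Th_{\mathrm{end}} < PL_{\mathrm{speech}} \tag{7}
```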

Note that the threshold Th_start may be calculated based on the phase feature amount PL_n calculated using an acoustic signal including an impact sound as learning data. Alternatively, an arbitrary value close to 0, for example 0.1, may be set as the threshold Th_start.

The threshold Th_end is a value calculated so as to satisfy the above formula (7) using the previously calculated threshold Th_start and PL_speech. By making the threshold Th_start and the threshold Th_end smaller than PL_speech, it is possible to prevent erroneous detection of an impact sound caused by a change in a speech signal, in particular the signal change in a consonant.

As described above, the storage unit 140 stores in advance the threshold Th_start and the threshold Th_end that are calculated, for example, by the calculation unit 110 and that satisfy formula (7).

(First detection unit 120)
The first detection unit 120 receives the phase feature amount PL_n from the calculation unit 110 and detects the first detection point from it. Specifically, the first detection unit 120 compares the value of the received phase feature amount PL_n with the threshold Th_start stored in the storage unit 140. When the phase feature amount PL_n is smaller than the threshold Th_start, that is, when PL_n < Th_start is satisfied, the first detection unit 120 determines that the frame indicated by the frame number associated with that PL_n is the start frame of the hitting section.

Then, the first detection unit 120 acquires time information indicating the time of the frame determined to be the start frame. Here, the time information acquired by the first detection unit 120 may be a frame number, the start time of the frame, or another time included in the frame. The first detection unit 120 detects the frame number or time indicated by the acquired time information as the first detection point. In the following description, the first detection point is assumed to be a frame number.

When the acoustic signal is a signal in which no impact sound is superimposed on the speech signal or background noise, that is, when the acoustic signal consists only of the speech signal or background noise, the phase spectrum p_n(k) does not form a straight line. The value of the phase feature amount PL_n for such an acoustic signal is therefore larger than the phase feature amount PL_n obtained when an impact sound is superimposed on the speech signal or background noise.

As described above, the first detection unit 120 can determine the start time of the hitting section of the impact sound by comparing the phase feature amount PL_n with the threshold Th_start.
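The comparison performed by the first detection unit 120 can be sketched as follows. This is an illustration only: the function name `detect_start`, the representation of the per-frame values PL_n as a list indexed by frame number, and the optional `begin` argument are assumptions made for the example.

```python
def detect_start(pl, th_start, begin=0):
    """Return the first frame number n >= begin with pl[n] < th_start.

    `pl` is a sequence of phase feature amounts indexed by frame number.
    Returns None when no frame satisfies the condition.
    """
    for n in range(begin, len(pl)):
        if pl[n] < th_start:   # PL_n < Th_start: start of hitting section
            return n
    return None

# Illustrative values only; 0.1 is the example threshold given in the text.
start = detect_start([0.9, 0.8, 0.05, 0.2, 0.3, 0.9], th_start=0.1)
```

In this example the third frame (frame number 2) is the first whose value falls below the threshold, so it is reported as the first detection point.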

The first detection unit 120 then outputs the time information representing the detected first detection point to the second detection unit 130 as start time information of the hitting section. As described above, in the present embodiment the first detection point is a frame number, so the start time information of the hitting section represents a frame number. Since the time information indicating the first detection point is the start time information of the hitting section, the first detection point is hereinafter also referred to as the start time of the hitting section.

(Second detection unit 130)
The second detection unit 130 receives the phase feature amount PL_n from the calculation unit 110. The second detection unit 130 also receives the start time information of the hitting section from the first detection unit 120. The second detection unit 130 then detects the second detection point based on the received phase feature amount PL_n and the start time information of the hitting section.

Specifically, the second detection unit 130 compares the phase feature amount PL_n calculated for each frame that is later in time than the frame number represented by the start time information of the hitting section with the threshold Th_end stored in the storage unit 140. The second detection unit 130 determines whether the value of PL_n is larger than the threshold Th_end, that is, whether Th_end < PL_n is satisfied, and, when PL_n is larger than Th_end, identifies the frame indicated by the frame number associated with that PL_n. The second detection unit 130 then determines that the frame immediately before the identified frame is the end frame of the hitting section.

Then, the second detection unit 130 acquires, from the frame determined to be the end frame, time information indicating the time of that frame. The time information acquired by the second detection unit 130 may be a frame number, the end time of the frame, or another time included in the frame. The second detection unit 130 detects the frame number or time indicated by the acquired time information as the second detection point. In the following description, the second detection point is assumed to be a frame number.

In the hitting section of the acoustic signal, the signal keeps changing steeply, so the value of the phase feature amount PL_n remains low. When the hitting section ends, the change in the acoustic signal becomes gradual and the value of the phase feature amount increases. The second detection unit 130 can thereby identify that the frame under determination lies immediately after the end of the hitting section, and can determine the frame one before it as the end point of the hitting section.

The second detection unit 130 then treats the time information representing the detected second detection point as end time information of the hitting section. The second detection point is also referred to as the end time of the hitting section.

Note that the second detection unit 130 may set, as the end time, the earlier of the end time detected as described above and the time at which an arbitrary period (for example, one second) has elapsed from the start time. This is because, in an actual environment, the end time detected by the second detection unit 130 may be considerably later than the start time due to reverberation and similar effects. In general, the hitting section of an impact sound is often one second or shorter. Therefore, when the end time has not been detected by, for example, one second after the start time, the second detection unit 130 may set the time one second after the start time as the end time. The noise detection apparatus 100 can thereby reduce speech recognition errors caused by the hitting section of the impact sound becoming long.

The second detection unit 130 then determines the hitting section from the start time of the hitting section indicated by the first detection point received from the first detection unit 120 and the end time of the hitting section indicated by the second detection point.
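The optional upper limit on the length of the hitting section can be illustrated with a minimal Python sketch. The function name cap_end_frame and the parameter frame_shift_sec (the time by which consecutive frames advance) are hypothetical and are not taken from the embodiment; frame numbers are used as time information, as in the description above.

```python
def cap_end_frame(start_frame, detected_end_frame, frame_shift_sec, max_len_sec=1.0):
    """Return the earlier of the detected end frame and the frame reached
    max_len_sec after the start frame.

    detected_end_frame may be None when no end frame has been found yet.
    frame_shift_sec is an assumed constant advance between frames.
    """
    # Number of frame shifts that fit into the maximum section length.
    max_frames = int(round(max_len_sec / frame_shift_sec))
    capped = start_frame + max_frames
    if detected_end_frame is None:
        return capped
    return min(detected_end_frame, capped)
```

With a 10 ms frame shift, a section starting at frame 10 whose end is detected at frame 14 keeps frame 14, while one whose end would only be detected at frame 500 is cut off at frame 110.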

(Operation of the noise detection apparatus 100)
Next, the operation of the noise detection apparatus 100 according to the present embodiment will be described. FIG. 4 is a flowchart showing an example of the operation of the noise detection apparatus 100 according to the present embodiment.

The frame dividing unit 1111 of the conversion unit 111 of the calculation unit 110 divides the acoustic signal into frames having a predetermined time length (step S41). The noise detection apparatus 100 sets flag to 0 and n to 0 as initial values (step S42). Here, flag takes a value of 0 or 1, and n is a variable indicating a frame number whose upper limit is the number of frames produced in step S41 (denoted DIV) minus 1.

Next, the windowing processing unit 1112 of the conversion unit 111 performs windowing processing on the signal samples included in each divided frame (step S43). After that, the Fourier transform unit 1113 of the conversion unit 111 calculates the phase spectrum p_n(k) by performing a Fourier transform on the windowed signal samples of each frame (step S44).

Thereafter, the change amount calculation unit 1121 of the index calculation unit 112 of the calculation unit 110 calculates the change amount Δp_n(k) of the phase component (step S45). Then, the difference calculation unit 1122 of the index calculation unit 112 calculates the change amount difference ΔΔp_n(k), which is the change amount of the change amount of the phase component (step S46). Thereafter, the feature amount calculation unit 1123 of the index calculation unit 112 calculates the phase feature amount PL_n, which is an index indicating the linearity of the phase spectrum p_n(k) (step S47).
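Steps S44 to S47 can be sketched in Python as follows. This passage states only that PL_n is an index of the linearity of the phase spectrum obtained via the change amount difference; taking the mean of its absolute values and wrapping each phase difference into [-π, π) are assumptions of this sketch, and all function names are hypothetical. A naive DFT is used so that only the standard library is needed.

```python
import cmath
import math
import random


def phase_spectrum(frame):
    """Phase p_n(k) of the DFT of one (already windowed) frame, k = 0..N/2."""
    n = len(frame)
    phases = []
    for k in range(n // 2 + 1):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        phases.append(cmath.phase(x_k))
    return phases


def wrap(angle):
    """Wrap an angle in radians into [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def phase_feature(frame):
    """Mean absolute second difference of the phase (assumed form of PL_n)."""
    p = phase_spectrum(frame)
    dp = [wrap(p[k + 1] - p[k]) for k in range(len(p) - 1)]      # change amount
    ddp = [wrap(dp[k + 1] - dp[k]) for k in range(len(dp) - 1)]  # change amount difference
    return sum(abs(v) for v in ddp) / len(ddp)


# Illustrative frames: a single impulse and uniform random noise.
rng = random.Random(0)
impulse = [1.0 if t == 5 else 0.0 for t in range(64)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(64)]
```

Under these assumptions, the impulse frame has an almost exactly linear phase spectrum, so phase_feature(impulse) is close to zero, whereas phase_feature(noise) is much larger. This matches the description that PL_n stays low while the signal changes steeply.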

Then, the first detection unit 120 determines whether or not the flag is 0 (step S48). If the flag is not 0 (NO in step S48), the process proceeds to step S54. If flag is 0 (YES in step S48), the process proceeds to step S49. This flag indicates whether or not the start time of the hitting section is detected. When it is 0, it indicates that it is not detected, and when it is 1, it indicates that it is detected.

If YES in step S48, that is, if the start time of the hitting section has not been detected, the first detection unit 120 determines whether the phase feature amount PL_n calculated in step S47 is smaller than the threshold Th_start (step S49). When the phase feature amount PL_n is equal to or greater than the threshold Th_start (NO in step S49), the noise detection apparatus 100 increments n (step S52) and determines whether the incremented n is smaller than DIV (step S53). If n is greater than or equal to DIV (NO in step S53), the noise detection apparatus 100 ends the process. If n is smaller than DIV (YES in step S53), the noise detection apparatus 100 returns the process to step S43 and performs the processing of steps S43 to S48 on the next frame.

When the phase feature amount PL_n is smaller than the threshold Th_start (YES in step S49), the first detection unit 120 detects the frame indicated by the frame number associated with that phase feature amount PL_n as the start frame (start time) of the hitting section (step S50). The noise detection apparatus 100 then sets flag to 1 (step S51) and advances the process to step S52. The noise detection apparatus 100 increments n (step S52), and when the incremented n is smaller than DIV (YES in step S53), the processing of steps S43 to S48 is executed for the next frame.

If NO in step S48, that is, if the start time of the hitting section has been detected, the second detection unit 130 determines whether the phase feature amount PL_n calculated in step S47 is larger than the threshold Th_end (step S54). When the phase feature amount PL_n is equal to or smaller than the threshold Th_end (NO in step S54), the noise detection apparatus 100 advances the process to step S52.

When the phase feature amount PL_n is larger than the threshold Th_end (YES in step S54), the second detection unit 130 detects the frame one before the frame indicated by the frame number associated with that phase feature amount PL_n as the end frame (end time) of the hitting section (step S55).

The second detection unit 130 then determines the hitting section from the start time of the hitting section detected by the first detection unit 120 in step S50 and the end time of the hitting section detected in step S55 (step S56). The noise detection apparatus 100 then assigns 0 to flag (step S57) and advances the process to step S52.

Note that the noise detection apparatus 100 may receive acoustic signals sequentially and perform the noise detection processing in real time. The noise detection apparatus 100 may end the process when no acoustic signal remains on which the above processing has not yet been performed.
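The control flow of FIG. 4 from step S48 onward can be summarized in a short Python sketch. It assumes that the phase feature amounts PL_n of all frames have already been calculated and are passed in as a list; the function name detect_hitting_sections is hypothetical.

```python
def detect_hitting_sections(pl_values, th_start, th_end):
    """Return (start_frame, end_frame) pairs found in a sequence of PL_n values."""
    sections = []
    flag = 0       # 0: start time not yet detected, 1: start time detected (S42)
    start = None
    for n, pl in enumerate(pl_values):      # the loop plays the role of S52/S53
        if flag == 0:                        # S48
            if pl < th_start:                # S49
                start = n                    # S50: start frame
                flag = 1                     # S51
        elif pl > th_end:                    # S54
            end = n - 1                      # S55: frame before the current one
            sections.append((start, end))    # S56
            flag = 0                         # S57
    return sections
```

For example, with th_start = 0.5 and th_end = 1.0, the sequence [2.0, 2.0, 0.1, 0.2, 0.3, 1.5, 2.0, 0.1, 1.9] yields the sections (2, 4) and (7, 7). A section whose end is never detected before the input runs out is not reported by this sketch.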

(Effect)
As described above, according to the noise detection apparatus 100 according to the present embodiment, the same effects as those of the noise detection apparatus 10 according to the first embodiment described above can be achieved.

Moreover, according to the noise detection apparatus 100 according to the present embodiment, the first detection unit 120 compares the feature amount calculated by the calculation unit 110 with the first threshold value. When the steepness of the change in the acoustic signal represented by the feature amount is larger than the steepness represented by the first threshold value, the first detection unit 120 detects the frame for which that feature amount was calculated as the start time of the impact sound section. Further, the second detection unit 130 compares the feature amount calculated by the calculation unit 110 with the second threshold value. The second threshold value represents a steepness smaller than the steepness of the change in the acoustic signal represented by the first threshold value. When the steepness of the change in the acoustic signal represented by the feature amount is equal to or less than the steepness represented by the second threshold value, the second detection unit 130 detects the frame immediately before the frame for which that feature amount was calculated as the end time of the impact sound section.

Thereby, the noise detection apparatus 100 can detect more accurately the time at which a steep change in the acoustic signal starts and the time at which it ends. The time at which the steep change starts corresponds to the start time of the hitting section shown in FIG. 1, and the time at which the steep change ends corresponds to the end time of the hitting section. Therefore, the noise detection apparatus 100 can more accurately detect the start time and the end time of the hitting section of the impact sound shown in FIG. 1.

Further, because the calculation unit 110 calculates an index representing the linearity of the phase spectrum as the phase feature amount, the noise detection apparatus 100 can detect the start time and the end time of the hitting section. Thereby, the noise detection apparatus 100 can determine the hitting section more suitably.

By suppressing the hitting section determined in this way, the noise detection apparatus 100 according to the present embodiment can further improve recognition performance when speech recognition is performed. As an example, consider a scene in which speech is recognized while a store clerk is serving a customer at a store counter or the like. In this scene, when the store clerk is the target speaker and the store clerk's voice is the target voice, the store clerk may speak while showing a catalog to the customer or while operating a keyboard and mouse to enter customer information. In this case, the sounds of objects and the work sounds generated by the target speaker are superimposed on the target voice, so the speech recognition accuracy for the collected acoustic signal may be lowered.

However, by applying the noise detection apparatus 100 according to the present embodiment, it is possible to determine a section of an impact sound such as a work sound, in particular its hitting section, so that a device that suppresses noise can perform noise suppression processing on the determined section. As a result, a voice with suppressed noise can be extracted, and the recognition accuracy for this voice can be improved. As described above, the work sound generated by the target speaker is noise that can occur in any scene where voice is collected, so the determination of the hitting section by the noise detection apparatus 100 can be suitably applied to the field of noise detection.

In addition, the present invention can be applied to cases where a user listens to collected sound after noise, for example the sound of touching a microphone or a loud impact sound, has been reduced by noise suppression.

Further, for example, since the impact sound is a sound such as door opening / closing sound and applause, the noise detection apparatus 100 according to the present embodiment can detect an event such as door opening / closing and applause.

The noise detection apparatus 100 according to the present embodiment described above can be applied to, for example, detecting a section in which noise, such as a sound generated by the speaker, is to be suppressed when the target speaker's voice is desired. In addition, the noise detection apparatus 100 can be applied to suppressing the superimposed noise in a signal consisting of a target signal, such as voice or music, and a noise signal superimposed on it. The noise detection apparatus 100 can also be applied to any other signal processing apparatus that is required to determine whether an input signal includes a rapidly changing section.

(Modification)
In the second embodiment, the feature amount calculation unit 1123 has been described as obtaining the phase feature amount PL_n by calculating the average value of the change amount difference ΔΔp_n(k) as the feature amount. In the present modification, a case will be described in which the feature amount calculation unit 1123 obtains the phase feature amount PL_n by calculating the distribution of the change amount difference ΔΔp_n(k) as the feature amount.

First, the feature amount calculation unit 1123 uses the change amount difference ΔΔp_n(k) received from the difference calculation unit 1122 to obtain a histogram for the frame in which the change amount difference ΔΔp_n(k) was calculated. At this time, the feature amount calculation unit 1123 obtains the histogram using the value of the change amount difference ΔΔp_n(k) as the bin.

Accordingly, when the distribution of the change amount difference ΔΔp_n(k) is biased toward small values, the feature amount calculation unit 1123 can determine that the linearity of the phase spectrum is high. The feature amount calculation unit 1123 may then calculate the index PL_n based on this histogram.

In addition, the feature amount calculation unit 1123 may determine arbitrary frequency index ranges, for example k-100 to k-1 and k to k+99, and obtain the distribution of the change amount difference ΔΔp_n(k) using the frequency index value as the bin. The feature amount calculation unit 1123 may then calculate an inter-distribution distance based on these distributions and calculate the index PL_n from it.

As described above, the feature amount calculation unit 1123 can calculate a feature amount based on the distribution of the change amount difference ΔΔp_n(k) instead of a feature amount based on its average value. The noise detection apparatus 100 can suitably determine the hitting section using a feature amount calculated in this way as well.
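The modification does not specify how the index is derived from the histogram, so the following Python sketch is only one hypothetical choice: the absolute values of the change amount differences are counted into equal-width bins over [0, π], and the index is one minus the share of values in the lowest bin, so that a low value again indicates high linearity. The function names, the number of bins, and the index itself are assumptions.

```python
import math


def histogram(values, num_bins=8, upper=math.pi):
    """Count |value| into num_bins equal-width bins covering [0, upper]."""
    counts = [0] * num_bins
    for v in values:
        idx = min(int(abs(v) / upper * num_bins), num_bins - 1)
        counts[idx] += 1
    return counts


def histogram_feature(ddp, num_bins=8):
    """Hypothetical index: 1 minus the share of values in the lowest bin."""
    counts = histogram(ddp, num_bins)
    return 1.0 - counts[0] / len(ddp)
```

For the four values 0.0, 0.1, 3.0 and -3.1, two fall into the lowest bin and two into the highest, giving an index of 0.5; if all values are close to zero the index is 0.0.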

<Third Embodiment>
Next, a third embodiment of the present invention will be described with reference to the drawings. FIG. 5 is a functional block diagram showing an example of a functional configuration of the noise suppression apparatus 200 according to the present embodiment. For convenience of explanation, members having the same functions as those included in the drawings described in the second embodiment described above are given the same reference numerals, and descriptions thereof are omitted.

As shown in FIG. 5, the noise suppression apparatus 200 includes the noise detection apparatus 10 described in the first embodiment or the noise detection apparatus 100 described in the second embodiment, and a replacement unit 210.

The functional configurations of the noise detection device 10 and the noise detection device 100 are the same as the functional configurations described with reference to FIG. 2 and FIG. 3, respectively. In the following description, it is assumed that the noise suppression device 200 includes the noise detection device 100, but the noise suppression device 200 may of course be configured to include the noise detection device 10 instead.

The noise detection apparatus 100 outputs information indicating the hitting section to the replacement unit 210. Specifically, the noise detection apparatus 100 outputs the start time calculated by the first detection unit 120 and the end time calculated by the second detection unit 130 to the replacement unit 210 as the information indicating the hitting section, together with information identifying the acoustic signal for which the hitting section was determined.

The replacement unit 210 receives the information indicating the hitting section from the noise detection device 100 together with the information identifying the acoustic signal. The replacement unit 210 then receives the acoustic signal identified by that information, for example from outside the noise suppression device 200. The replacement unit 210 associates the time information of the received acoustic signal with the time information represented by the information indicating the hitting section received from the noise detection apparatus 100, and identifies the start time and the end time of the hitting section in the received acoustic signal.

Then, the replacement unit 210 replaces the signal of each frame included in the hitting section with the signal of the frame immediately before the frame indicated by the identified start time.

The operation of the replacement unit 210 will be further described with reference to FIG. 6. FIG. 6 is a diagram for explaining the operation of the replacement unit 210. The horizontal axis in FIG. 6 indicates the frame number, and the vertical axis indicates the frequency (kHz). The upper diagram in FIG. 6 shows the acoustic signal before replacement, and the lower diagram shows the acoustic signal after replacement.

As shown in the upper part of FIG. 6, suppose for example that the hitting section determined by the noise detection device 100 is the section from the nth frame to the (n+1)th frame. In this case, the start time of the hitting section is the nth frame and the end time is the (n+1)th frame.

Then, the replacement unit 210 replaces the signal samples x_n(t) of the nth frame and x_{n+1}(t) of the (n+1)th frame with the signal samples x_{n-1}(t) of the (n-1)th frame, which is the frame immediately before the start time. As a result, as shown in the lower part of FIG. 6, the signals of the nth frame and the (n+1)th frame are replaced with the same signal as that of the (n-1)th frame.
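The replacement illustrated in FIG. 6 can be written as a few lines of Python. In this sketch each frame is a list of signal samples, and the function name replace_with_previous is hypothetical; the same function applies equally when each frame holds a feature vector instead of samples.

```python
def replace_with_previous(frames, start, end):
    """Return a copy of frames in which frames[start..end] are replaced by
    a copy of frames[start - 1], the frame just before the hitting section."""
    if start < 1:
        raise ValueError("no frame precedes the hitting section")
    out = [list(f) for f in frames]
    for i in range(start, end + 1):
        out[i] = list(frames[start - 1])
    return out
```

With frames [[1, 1], [9, 9], [8, 8], [2, 2]] and a hitting section from frame 1 to frame 2, the result is [[1, 1], [1, 1], [1, 1], [2, 2]].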

In the above description, the replacement unit 210 replaces the signal of the hitting section with the signal of the frame immediately before the start time of the hitting section, but the present embodiment is not limited to this. The replacement unit 210 may replace the feature amount of the hitting section with the feature amount of the frame immediately before the start time of the hitting section. This feature amount may be, for example, a mel-frequency cepstrum coefficient or a mel logarithmic spectrum, which are generally used for speech recognition, or some other feature amount. In this way, the replacement unit 210 replaces the information related to the frames of the hitting section with information related to a frame different from those frames (for example, the signal or a feature amount of a frame outside the hitting section).

Further, the signal with which the replacement unit 210 replaces the signal of the hitting section may be the signal of the frame immediately before the start time of the hitting section or the signal of the frame immediately after the end time of the hitting section. Both may also be used. For example, the replacement unit 210 may calculate the center time of the hitting section, replace the signals of the frames earlier than the calculated center time with the signal of the frame immediately before the start time of the hitting section, and replace the signals of the frames later than the calculated center time with the signal of the frame immediately after the end time of the hitting section. The time calculated by the replacement unit 210 need not be the center time and may be an arbitrary time.

Further, the replacement unit 210 may calculate a signal for the hitting section using the signal of the frame immediately before the start time of the hitting section and the signal of the frame immediately after the end time of the hitting section, and interpolate the hitting section with the calculated signal. For example, the replacement unit 210 may apply arbitrary weights to the signal of the frame immediately before the start time and the signal of the frame immediately after the end time, add the weighted signals to calculate the signal for the hitting section, and interpolate the hitting section with the calculated signal.
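The two variants that use the frames on both sides of the hitting section can be sketched as follows. The function names are hypothetical. Assigning a frame that lies exactly at the split time to the earlier side, and using weights that sum to 1, are assumptions of this sketch; the description allows an arbitrary split time and arbitrary weights.

```python
def replace_split(frames, start, end, split=None):
    """Frames up to the split time take the frame before the section;
    later frames take the frame after the section."""
    before, after = frames[start - 1], frames[end + 1]
    if split is None:
        split = (start + end) / 2   # center time of the hitting section
    out = [list(f) for f in frames]
    for i in range(start, end + 1):
        out[i] = list(before if i <= split else after)
    return out


def replace_weighted(frames, start, end, w_before=0.5):
    """Replace every frame in the section by a weighted sum of the frame
    before the section and the frame after it."""
    before, after = frames[start - 1], frames[end + 1]
    blend = [w_before * b + (1.0 - w_before) * a for b, a in zip(before, after)]
    out = [list(f) for f in frames]
    for i in range(start, end + 1):
        out[i] = list(blend)
    return out
```

For frames [[0.0], [5.0], [6.0], [7.0], [4.0]] with a hitting section from frame 1 to frame 3, replace_split gives [[0.0], [0.0], [0.0], [4.0], [4.0]], and replace_weighted with equal weights gives [[0.0], [2.0], [2.0], [2.0], [4.0]].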

Further, the replacement unit 210 may replace the signal of the hitting section with a zero signal or with noise such as white noise.

Further, the replacement unit 210 may delete the signal of the hitting section and generate a signal that connects the frame immediately before the start time of the hitting section and the frame immediately after the end time of the hitting section.

Further, the replacement unit 210 may detect a predetermined number of frames from the frame immediately after the end time of the hitting section as the impact sound attenuation section and perform further noise suppression processing.

(Operation of noise suppression apparatus 200)
Next, the operation of the noise suppression apparatus 200 according to the present embodiment will be described using FIG. 7. FIG. 7 is a flowchart showing an example of the operation of the noise suppression apparatus 200 according to the present embodiment.

First, the noise detection device 100 of the noise suppression device 200 performs a hitting section determination process for determining the hitting section (step S71). Step S71 represents the processing of steps S41 to S56 described with reference to FIG. 4.

Next, the replacement unit 210 identifies the frame immediately before the start time of the hitting section determined in step S71 (step S72). Then, the replacement unit 210 replaces the signals of the frames of the acoustic signal that correspond to the hitting section with the signal of the identified frame (step S73). The replacement unit 210 can thereby suppress noise in the hitting section of the acoustic signal. The noise suppression device 200 then ends the process.

As described above, the noise suppression apparatus 200 according to the present embodiment can suppress noise in the hitting section by replacing the signal in the hitting section with the signal of a frame different from the frames in the hitting section.

In addition, the length of the hitting section on which the replacement unit 210 performs the replacement is preferably about 0.05 seconds. This is because, when speech recognition is performed on an acoustic signal that has undergone the noise suppression processing, the speech recognition rate is higher when the replaced section is shorter.

Thereby, the noise suppression apparatus 200 can obtain an effect that noise can be further suppressed in addition to the effect according to the second embodiment described above.

In the present embodiment, the replacement unit 210 included in the noise suppression device 200 has been described as a component separate from the noise detection device 100, but the present embodiment is not limited to this. The replacement unit 210 may be built into the noise detection apparatus 100. In this case, the noise detection apparatus 100 includes the calculation unit 110, the first detection unit 120, the second detection unit 130, the storage unit 140, and the replacement unit 210. Such a noise detection apparatus 100 can obtain the same effect as the noise suppression apparatus 200 according to the present embodiment.

<Fourth embodiment>
Next, a fourth embodiment of the present invention will be described with reference to the drawings. FIG. 8 is a functional block diagram illustrating an example of a functional configuration of the noise suppression apparatus 300 according to the present embodiment. For convenience of explanation, members having the same functions as those included in the drawings described in the second and third embodiments described above are denoted by the same reference numerals and description thereof is omitted.

As shown in FIG. 8, the noise suppression apparatus 300 includes the noise detection apparatus 10 described in the first embodiment or the noise detection apparatus 100 described in the second embodiment, a replacement unit 210, and a waveform conversion unit 310.

The waveform conversion unit 310 receives from the replacement unit 210 the signal on which the replacement unit 210 has performed suppression processing. Specifically, the waveform conversion unit 310 receives the signal obtained after the replacement unit 210 has replaced the signals of the frames of the acoustic signal that correspond to the hitting section with the signal of the identified frame. The identified frame is, for example, the frame immediately before the start time of the hitting section in the acoustic signal.

Then, the waveform conversion unit 310 converts the received signal into a form usable by the user. Specifically, the waveform conversion unit 310 converts the received signal into a waveform that the user can view and listen to.

For example, when the received signal is a frequency-domain spectrum signal obtained by a Fourier transform, the waveform conversion unit 310 converts the received signal into a waveform by performing an inverse Fourier transform on it.
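As an illustration of this conversion, the following Python sketch applies a naive inverse discrete Fourier transform to the full complex spectrum of one frame. The function names are hypothetical, and a practical implementation would typically use a fast Fourier transform and also undo the windowing and frame overlap, which are not covered here.

```python
import cmath
import math


def dft(frame):
    """Full complex spectrum of one frame (naive DFT)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def inverse_dft(spectrum):
    """Recover real time-domain samples from a full complex spectrum."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]
```

Applying inverse_dft to the output of dft returns the original samples up to floating-point rounding.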

Thereby, in the noise suppression device 300, the waveform conversion unit 310 can display the waveform on a display device (not shown). The noise suppression apparatus 300 can thus present to the user an acoustic signal in which noise is suppressed, in a form the user can use.

In the present embodiment, the waveform conversion unit 310 included in the noise suppression device 300 has been described as a component separate from the noise detection device 100, but the present embodiment is not limited to this. The waveform conversion unit 310 may be built into the noise detection apparatus 100. Such a noise detection apparatus 100 can obtain the same effect as the noise suppression apparatus 300 according to the present embodiment.

<Fifth embodiment>
Next, the noise suppression apparatus 400 according to the present embodiment will be described with reference to FIG. 9 and FIG. 10.

In order to improve the speech signal recognition rate, it is necessary to appropriately perform processing (noise suppression processing) for reducing noise from the speech signal. This is because if the noise suppression process is insufficient, noise remains superimposed on the audio signal, and the recognition rate of the audio signal is reduced. Moreover, if the noise suppression process is excessively performed, even necessary speech is suppressed as noise, and the recognition rate of the speech signal is reduced.

Therefore, an object of the present embodiment is to more effectively perform noise suppression processing of an audio signal.

FIG. 9 is a functional block diagram illustrating an example of a functional configuration of the noise suppression device 400 according to the present embodiment. As shown in FIG. 9, noise suppression apparatus 400 according to the present embodiment includes detection section 410 and replacement section 420.

The detection unit 410 detects a first section of the impact sound from the acoustic signal including the impact sound. This first section is a section in which the power is larger than in the subsequent section that follows it and in which power is present over a wide band. The first section detected by the detection unit 410 is the hitting section described with reference to FIG. 1. The detection unit 410 is realized by, for example, the noise detection apparatus in each of the above-described embodiments. In that case, as in the second embodiment, the noise detection apparatus 100 may detect the hitting section using an index indicating the linearity of the phase spectrum (the phase feature amount PL_n). The detection unit 410 is not limited to one realized by the noise detection apparatus in each of the above-described embodiments. For example, the detection unit 410 may calculate, as a feature amount, a sudden change in volume, a change in the magnitude of an amplitude feature, a power spectrum feature or its temporal change, or the flatness of the spectrum, and detect the hitting section using the calculated feature amount. The detection unit 410 may also detect the hitting section using a combination of two or more of these feature amounts. Thus, the method by which the detection unit 410 detects the hitting section is not particularly limited.
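As one example of such an alternative feature amount, the flatness of the spectrum can be computed with the commonly used definition of spectral flatness, the geometric mean of the power spectrum divided by its arithmetic mean. The embodiment does not state which definition it intends, so this Python sketch is an assumption, and the function names and the small constant eps are hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """Power |X(k)|^2 of one frame for k = 0..N/2 (naive DFT)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def spectral_flatness(frame, eps=1e-12):
    """Geometric mean of the power spectrum divided by its arithmetic mean.

    Values near 1 mean power spread evenly over all frequencies;
    values near 0 mean power concentrated in a few frequencies.
    """
    power = [p + eps for p in power_spectrum(frame)]   # eps avoids log(0)
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))
```

A frame containing a single impulse has power in every frequency bin and a flatness close to 1, whereas a pure tone has a flatness close to 0. This is consistent with the description of the hitting section as a section in which power is present over a wide band.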

The detection unit 410 outputs the detected information indicating the hitting section to the replacement unit 420.

The replacement unit 420 acquires section information indicating the hitting section from the detection unit 410. The replacement unit 420 then identifies, in the acoustic signal, the hitting section indicated by the received section information, and specifies a frame different from the frames included in the identified section as the frame whose information is used for replacement. The frame that the replacement unit 420 specifies for this purpose may be, for example, the frame immediately before the start time of the hitting section, as with the replacement unit 210 in the third embodiment described above. It may also be, for example, the frame immediately after the end time of the hitting section.

Then, the replacement unit 420 replaces the second information, which relates to the frames included in the hitting section, with the first information, which relates to the frame specified for replacing information. When the information related to a frame is, for example, the acoustic signal (signal samples) included in the frame, the replacement unit 420 replaces the signal samples of the frames included in the hitting section with the signal samples of the specified frame.

Note that, as with the replacement unit 210 in the third embodiment described above, the replacement unit 420 may use the first information related to the frame specified for replacing information to interpolate the frames included in the hitting section with information based on the first information.

Further, the replacement unit 420 may replace the signal of the hitting section with a zero signal or with noise such as white noise.

Further, the replacement unit 420 may delete the signal of the hitting section and generate a signal connecting the frame immediately before the start time of the hitting section and the frame immediately after the end time of the hitting section.

(Operation of noise suppression device 400)
Next, the operation of the noise suppression apparatus 400 according to the present embodiment will be described using FIG. 10. FIG. 10 is a flowchart showing an example of the operation of the noise suppression apparatus 400 according to the present embodiment.

As shown in FIG. 10, first, the detection unit 410 detects the hitting section (step S101). The processing of step S101 may be the same as that of step S71 in FIG. 7.

Next, the replacement unit 420 specifies the frame whose information is used to replace the information of the frames in the section detected in step S101 (step S102). Then, the replacement unit 420 replaces the second information, which relates to the frames of the acoustic signal that correspond to the detected section, with the first information, which relates to the specified frame (step S103). The replacement unit 420 can thereby suppress noise in the hitting section of the impact sound in the acoustic signal. The noise suppression device 400 then ends the process.

As described above, the noise suppression device 400 according to the present embodiment can replace the second information related to the frames of the hitting section of the impact sound with the first information related to a frame different from the frames of the hitting section. Thereby, the noise suppression device 400 can suppress the noise in the hitting section of the impact sound.

As a result, the noise suppression device 400 can perform the noise suppression processing of the voice signal more effectively.

(About hardware configuration)
Each unit of the noise detection devices (10, 100) shown in FIGS. 2 and 3 and the noise suppression devices (200, 300, 400) shown in FIGS. 5, 8, and 9 may be realized with the hardware resources illustrated in FIG. 11. That is, the configuration shown in FIG. 11 includes a RAM (Random Access Memory) 91, a ROM (Read Only Memory) 92, a communication interface 93, a storage medium 94, and a CPU (Central Processing Unit) 95. The CPU 95 reads various software programs (computer programs) stored in the ROM 92 or the storage medium 94 into the RAM 91 and executes them, thereby governing the overall operation of the noise detection devices (10, 100) and the noise suppression devices (200, 300, 400). That is, in each of the above embodiments, the CPU 95 executes the software programs that implement the functions (units) of the noise detection devices (10, 100) and the noise suppression devices (200, 300, 400) while referring to the ROM 92 or the storage medium 94 as appropriate.

Further, the present invention, described by taking each embodiment as an example, is achieved by supplying a computer program capable of realizing the functions described above to the noise detection devices (10, 100) and the noise suppression devices (200, 300, 400), and then having the CPU 95 read the computer program into the RAM 91 and execute it.

The supplied computer program may be stored in a computer-readable storage device such as a readable/writable memory (temporary storage medium) or a hard disk device. In such a case, the present invention can be regarded as being constituted by the code representing the computer program or by a storage medium storing the computer program.

In each of the above-described embodiments, the case where the functions shown in the blocks of the noise detection devices (10, 100) shown in FIGS. 2 and 3 and the noise suppression devices (200, 300, 400) shown in FIGS. 5, 8, and 9 are realized by software programs executed by the CPU 95 shown in FIG. 11 has been described as an example. However, some or all of the functions shown in the blocks of FIGS. 2, 3, 5, 8, and 9 may be realized as hardware circuits.

Each of the above-described embodiments is a preferred embodiment of the present invention, and the scope of the present invention is not limited to the above-described embodiments. Those skilled in the art can construct forms with various modifications by correcting or substituting the above-described embodiments without departing from the gist of the present invention.

Some or all of the above embodiments can be described as in the following supplementary notes, but are not limited thereto.

(Supplementary Note 1) A noise detection device comprising: calculation means for calculating, from an acoustic signal including an impact sound, a feature amount representing the steepness of change of the acoustic signal for each frame obtained by dividing the acoustic signal into predetermined time lengths; first detection means for detecting, based on the feature amount, a frame in which the steepness of change of the signal is greater than that of an audio signal as a start time of an impact sound section in which the impact sound exists; and second detection means for detecting, based on the feature amount, the last of the frames in which the steepness of change of the signal is continuously greater than that of the audio signal from the start time as an end time of the impact sound section.

(Supplementary Note 2) The noise detection device according to Supplementary Note 1, wherein the first detection means compares the feature amount with a first threshold value and, when the steepness of change of the acoustic signal represented by the feature amount is greater than the steepness of change of the acoustic signal represented by the first threshold value, detects the frame for which the feature amount was calculated as the start time of the impact sound section, and the second detection means compares the feature amount with a second threshold value representing a steepness smaller than the steepness of change of the acoustic signal represented by the first threshold value and, when the steepness of change of the acoustic signal represented by the feature amount is smaller than the steepness of change of the acoustic signal represented by the second threshold value, detects the frame immediately before the frame for which the feature amount was calculated as the end time of the impact sound section.

(Supplementary Note 3) The noise detection device according to Supplementary Note 1 or 2, wherein the calculation means includes conversion means for converting the acoustic signal into a phase spectrum and linearity calculation means for calculating the linearity of the phase spectrum, and calculates, as the feature amount, an index representing the linearity of the phase spectrum calculated by the linearity calculation means.

(Supplementary Note 4) The noise detection device according to Supplementary Note 3, wherein the linearity calculation means calculates the linearity of the phase spectrum using a value based on the variation in the difference between the phase spectrum and the phase spectrum in an adjacent frequency band that is adjacent to the frequency band of that phase spectrum.

(Supplementary Note 5) The noise detection device according to any one of Supplementary Notes 1 to 4, further comprising replacement means for, using first information related to a frame different from the frames included in the impact sound section, replacing second information related to a frame included in the impact sound section with the first information, or interpolating a frame included in the impact sound section with information based on the first information.

(Supplementary Note 6) A noise suppression device comprising: detection means for detecting, from an acoustic signal including an impact sound, a first section of the impact sound in which the power is greater than in a subsequent section following the first section and in which the power exists over a wide band; and replacement means for, using first information related to a frame different from the frames included in the first section, replacing second information related to a frame included in the first section with the first information, or interpolating a frame included in the first section with information based on the first information.

(Supplementary Note 7) A noise suppression device comprising: detection means for detecting, from an acoustic signal including an impact sound, a first section of the impact sound in which the power is greater than in a subsequent section following the first section and in which the power exists over a wide band; and replacement means for replacing the signal of the first section with a predetermined signal prepared in advance, or deleting the signal of the first section.

(Supplementary Note 8) The noise suppression device according to Supplementary Note 6 or 7, wherein the detection means includes: calculation means for calculating, from the acoustic signal, a feature amount representing the steepness of change of the acoustic signal for each frame obtained by dividing the acoustic signal into predetermined time lengths; first detection means for detecting, based on the feature amount, a frame in which the steepness of change of the signal is greater than that of an audio signal as a start time of the first section; and second detection means for detecting, based on the feature amount, the last of the frames in which the steepness of change of the signal is continuously greater than that of the audio signal from the start time as an end time of the first section.

(Supplementary Note 9) The noise suppression device according to Supplementary Note 8, wherein the first detection means compares the feature amount with a first threshold value and, when the steepness of change of the acoustic signal represented by the feature amount is greater than the steepness of change of the acoustic signal represented by the first threshold value, detects the frame for which the feature amount was calculated as the start time of the first section, and the second detection means compares the feature amount with a second threshold value representing a steepness smaller than the steepness of change of the acoustic signal represented by the first threshold value and, when the steepness of change of the acoustic signal represented by the feature amount is smaller than the steepness of change of the acoustic signal represented by the second threshold value, detects the frame immediately before the frame for which the feature amount was calculated as the end time of the first section.

(Supplementary Note 10) The noise suppression device according to Supplementary Note 8 or 9, wherein the calculation means includes conversion means for converting the acoustic signal into a phase spectrum and linearity calculation means for calculating the linearity of the phase spectrum, and calculates, as the feature amount, an index representing the linearity of the phase spectrum calculated by the linearity calculation means.

(Supplementary Note 11) The noise suppression device according to Supplementary Note 10, wherein the linearity calculation means calculates the linearity of the phase spectrum using a value based on the variation in the difference between the phase spectrum and the phase spectrum in an adjacent frequency band that is adjacent to the frequency band of that phase spectrum.

(Supplementary Note 12) A noise detection method comprising: calculating, from an acoustic signal including an impact sound, a feature amount representing the steepness of change of the acoustic signal for each frame obtained by dividing the acoustic signal into predetermined time lengths; detecting, based on the feature amount, a frame in which the steepness of change of the signal is greater than that of an audio signal as a start time of an impact sound section in which the impact sound exists; and detecting, based on the feature amount, the last of the frames in which the steepness of change of the signal is continuously greater than that of the audio signal from the start time as an end time of the impact sound section.

(Supplementary Note 13) A noise suppression method comprising: detecting, from an acoustic signal including an impact sound, a first section of the impact sound in which the power is greater than in a subsequent section following the first section and in which the power exists over a wide band; and, using first information related to a frame different from the frames included in the first section, replacing second information related to a frame included in the first section with the first information, or interpolating a frame included in the first section with information based on the first information.

(Supplementary Note 14) A noise suppression method comprising: detecting, from an acoustic signal including an impact sound, a first section of the impact sound in which the power is greater than in a subsequent section following the first section and in which the power exists over a wide band; and replacing the signal of the first section with a predetermined signal prepared in advance, or deleting the signal of the first section.

(Supplementary Note 15) A program that causes a computer to execute: a process of calculating, from an acoustic signal including an impact sound, a feature amount representing the steepness of change of the acoustic signal for each frame obtained by dividing the acoustic signal into predetermined time lengths; a process of detecting, based on the feature amount, a frame in which the steepness of change of the signal is greater than that of an audio signal as a start time of an impact sound section in which the impact sound exists; and a process of detecting, based on the feature amount, the last of the frames in which the steepness of change of the signal is continuously greater than that of the audio signal from the start time as an end time of the impact sound section.

(Supplementary Note 16) A program that causes a computer to execute: a process of detecting, from an acoustic signal including an impact sound, a first section of the impact sound in which the power is greater than in a subsequent section following the first section and in which the power exists over a wide band; and a process of, using first information related to a frame different from the frames included in the first section, replacing second information related to a frame included in the first section with the first information, or interpolating a frame included in the first section with information based on the first information.

(Supplementary Note 17) A program that causes a computer to execute: a process of detecting, from an acoustic signal including an impact sound, a first section of the impact sound in which the power is greater than in a subsequent section following the first section and in which the power exists over a wide band; and a process of replacing the signal of the first section with a predetermined signal prepared in advance, or deleting the signal of the first section.
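The detection described in Supplementary Notes 1 to 4 can be illustrated with the following sketch. It is a simplified reading under stated assumptions, not the processing of the embodiments: the names `dft`, `steepness_index`, and `detect_sections` are hypothetical, the mapping `1 / (1 + variance)` from the variance of adjacent-bin phase differences to an index that grows with steepness is an assumption made for this example, and a section that is still open at the last frame is assumed to end there.

```python
import cmath
import math
import random


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one frame (stdlib only)."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            for k in range(n)]


def steepness_index(frame):
    """Hypothetical feature amount: close to 1.0 when the phase spectrum is linear."""
    spectrum = dft(frame)[: len(frame) // 2 + 1]
    # Wrapped phase difference between each frequency bin and the adjacent bin.
    diffs = [cmath.phase(b * a.conjugate()) for a, b in zip(spectrum, spectrum[1:])]
    mean = sum(diffs) / len(diffs)
    variance = sum((d - mean) ** 2 for d in diffs) / len(diffs)
    return 1.0 / (1.0 + variance)  # assumed mapping: small variance -> large index


def detect_sections(features, start_threshold, end_threshold):
    """Return inclusive (start_frame, end_frame) pairs using two thresholds.

    A section starts at the first frame whose feature exceeds start_threshold
    and ends at the frame just before the first later frame whose feature
    falls below end_threshold (end_threshold <= start_threshold is assumed).
    """
    sections = []
    start = None
    for index, value in enumerate(features):
        if start is None:
            if value > start_threshold:
                start = index
        elif value < end_threshold:
            sections.append((start, index - 1))
            start = None
    if start is not None:  # assumption: an open section ends at the last frame
        sections.append((start, len(features) - 1))
    return sections


# Usage: an isolated impulse has a linear phase spectrum, random noise does not.
impulse = [0.0] * 64
impulse[5] = 1.0
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(64)]
print(steepness_index(impulse), steepness_index(noise))
print(detect_sections([0.1, 0.2, 0.95, 0.7, 0.6, 0.3, 0.1], 0.9, 0.5))
```

Using a lower threshold for the end time than for the start time keeps a section open while the feature decays gradually after the onset, which is the behavior Supplementary Note 2 describes.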

This application claims priority based on Japanese Patent Application No. 2015-121229 filed on June 16, 2015, the entire disclosure of which is incorporated herein.

DESCRIPTION OF SYMBOLS
10 Noise detection device
11 Calculation unit
12 First detection unit
13 Second detection unit
100 Noise detection device
110 Calculation unit
111 Conversion unit
1111 Frame division unit
1112 Windowing processing unit
1113 Fourier transform unit
112 Index calculation unit
1121 Change amount calculation unit
1122 Difference calculation unit
1123 Feature amount calculation unit
120 First detection unit
130 Second detection unit
140 Storage unit
200 Noise suppression device
210 Replacement unit
300 Noise suppression device
310 Waveform conversion unit
400 Noise suppression device
410 Detection unit
420 Replacement unit

Claims

1. A noise detection device comprising:
calculation means for calculating, from an acoustic signal including an impact sound, a feature amount representing the steepness of change of the acoustic signal for each frame obtained by dividing the acoustic signal into predetermined time lengths;
first detection means for detecting, based on the feature amount, a frame in which the steepness of change of the signal is greater than that of an audio signal as a start time of an impact sound section in which the impact sound exists; and
second detection means for detecting, based on the feature amount, the last of the frames in which the steepness of change of the signal is continuously greater than that of the audio signal from the start time as an end time of the impact sound section.
2. The noise detection device according to claim 1, wherein
the first detection means compares the feature amount with a first threshold value and, when the steepness of change of the acoustic signal represented by the feature amount is greater than the steepness of change of the acoustic signal represented by the first threshold value, detects the frame for which the feature amount was calculated as the start time of the impact sound section, and
the second detection means compares the feature amount with a second threshold value representing a steepness smaller than the steepness of change of the acoustic signal represented by the first threshold value and, when the steepness of change of the acoustic signal represented by the feature amount is smaller than the steepness of change of the acoustic signal represented by the second threshold value, detects the frame immediately before the frame for which the feature amount was calculated as the end time of the impact sound section.
3. The noise detection device according to claim 1, wherein
the calculation means includes:
conversion means for converting the acoustic signal into a phase spectrum; and
linearity calculation means for calculating the linearity of the phase spectrum, and
the calculation means calculates, as the feature amount, an index representing the linearity of the phase spectrum calculated by the linearity calculation means.
4. The noise detection device according to claim 1, further comprising replacement means for, using first information related to a frame different from the frames included in the impact sound section, replacing second information related to a frame included in the impact sound section with the first information, or interpolating a frame included in the impact sound section with information based on the first information.
5. A noise suppression device comprising:
detection means for detecting, from an acoustic signal including an impact sound, a first section of the impact sound in which the power is greater than in a subsequent section following the first section and in which the power exists over a wide band; and
replacement means for, using first information related to a frame different from the frames included in the first section, replacing second information related to a frame included in the first section with the first information, or interpolating a frame included in the first section with information based on the first information.
6. A noise suppression device comprising:
detection means for detecting, from an acoustic signal including an impact sound, a first section of the impact sound in which the power is greater than in a subsequent section following the first section and in which the power exists over a wide band; and
replacement means for replacing the signal of the first section with a predetermined signal prepared in advance, or deleting the signal of the first section.
7. A noise detection method comprising:
calculating, from an acoustic signal including an impact sound, a feature amount representing the steepness of change of the acoustic signal for each frame obtained by dividing the acoustic signal into predetermined time lengths;
detecting, based on the feature amount, a frame in which the steepness of change of the signal is greater than that of an audio signal as a start time of an impact sound section in which the impact sound exists; and
detecting, based on the feature amount, the last of the frames in which the steepness of change of the signal is continuously greater than that of the audio signal from the start time as an end time of the impact sound section.
8. A noise suppression method comprising:
detecting, from an acoustic signal including an impact sound, a first section of the impact sound in which the power is greater than in a subsequent section following the first section and in which the power exists over a wide band; and
using first information related to a frame different from the frames included in the first section, replacing second information related to a frame included in the first section with the first information, or interpolating a frame included in the first section with information based on the first information.
9. A computer-readable recording medium storing a program that causes a computer to execute:
a process of calculating, from an acoustic signal including an impact sound, a feature amount representing the steepness of change of the acoustic signal for each frame obtained by dividing the acoustic signal into predetermined time lengths;
a process of detecting, based on the feature amount, a frame in which the steepness of change of the signal is greater than that of an audio signal as a start time of an impact sound section in which the impact sound exists; and
a process of detecting, based on the feature amount, the last of the frames in which the steepness of change of the signal is continuously greater than that of the audio signal from the start time as an end time of the impact sound section.
10. A computer-readable recording medium storing a program that causes a computer to execute:
a process of detecting, from an acoustic signal including an impact sound, a first section of the impact sound in which the power is greater than in a subsequent section following the first section and in which the power exists over a wide band; and
a process of, using first information related to a frame different from the frames included in the first section, replacing second information related to a frame included in the first section with the first information, or interpolating a frame included in the first section with information based on the first information.