JP2012215606A - Sound source separating device, program, and method


Info

Publication number
JP2012215606A
JP2012215606A (application number JP2011079026A)
Authority
JP
Japan
Prior art keywords
sound
target sound
section
target
spectrum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2011079026A
Other languages
Japanese (ja)
Other versions
JP5772151B2 (en)
Inventor
Katsuyuki Takahashi
Shinsuke Takada
克之 高橋
真資 高田
Original Assignee
Oki Electric Ind Co Ltd
沖電気工業株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oki Electric Industry Co., Ltd.
Priority to JP2011079026A
Publication of JP2012215606A
Application granted
Publication of JP5772151B2
Application status: Active
Anticipated expiration

Abstract

PROBLEM TO BE SOLVED: To suppress degradation of sound quality after separation processing when separating, from an input signal, a target sound and noise arriving from arbitrary directions other than the arrival direction of the target sound.
The present invention relates to a sound source separation device that performs sound source separation processing for separating noise and a target sound from an input signal. The sound source separation device includes means for forming a plurality of target sound dominant spectrum candidates from the received sound signal, means for forming a noise dominant spectrum from the received sound signal, means for determining, for at least one frequency component in each section of the received sound signal, the reliability of the result of an interfering sound determination for that section, means for selecting one of the target sound dominant spectrum candidates for each section to form a target sound dominant spectrum, the selection method applied to a section being determined using at least the reliability determination result for that section, and means for separating the target sound component using the noise dominant spectrum and the target sound dominant spectrum.
[Selection] Figure 1

Description

  The present invention relates to a sound source separation device, a program, and a method, and can be used for acoustic signal processing in, for example, a telephone or a video conference system.

  In recent years, an increasing number of voice communication devices such as video conferencing equipment and mobile phones form directivity using a microphone array so that the voice of a desired speaker can be collected selectively in order to improve sound quality.

  Methods of forming directivity using a microphone array are known; a method using delay-and-subtract processing is described below as an example.

  FIG. 10 is a block diagram showing an example of a functional configuration of a conventional delay subtraction type microphone array.

  In this specification, the direction perpendicular to the line connecting the two microphones m1 and m2 is called the 0 degree direction, and directions are expressed with clockwise angles as positive and counterclockwise angles as negative. That is, directions are expressed in the range of −180 degrees to 180 degrees (−180 degrees and 180 degrees denote the same direction). In the following, the 0 degree direction is the front, the 90 degree direction is the right, the −90 degree direction is the left, and the 180 degree (−180 degree) direction is the rear.

  It is assumed that sound waves arrive from the direction θ illustrated in FIG. 10 and that the microphones m1 and m2 are separated by a distance l. In this case, there is an arrival time difference τ between the times at which the sound waves reach the microphones m1 and m2. If the acoustic path difference is d, then d = l × sin θ, so the arrival time difference τ can be expressed by the following equation (1), where c is the speed of sound.

τ = l × sin θ / c   (1)
The signal s1 (t−τ), obtained by delaying s1 (t) by the τ calculated from equation (1), can be regarded as the same signal as s2 (t). Therefore, the difference signal y (t) = s2 (t) − s1 (t−τ) is a signal from which the sound arriving from the θ direction has been removed. As a result, the microphone array shown in FIG. 10 has directivity characteristics as shown in FIG. 11.

  As shown in FIG. 11, the microphone array shown in FIG. 10 functions as a filter (spatial filter) that removes sound arriving from the θ direction. In other words, in this microphone array, sound arriving from the θ direction is suppressed by directing the directivity of the filter in the θ direction. Hereinafter, in the microphone array, the direction in which the sound is suppressed is also referred to as “blind spot”.

Although the calculation in the time domain has been described here, the same effect can be obtained even if it is performed in the frequency domain. The arithmetic expression in this case is as the following expression (2).

  In the following equation (2), Y (f) is a signal obtained by converting y (t) into the frequency domain. X1 (f) is a signal obtained by converting s1 (t) into the frequency domain. Further, X2 (f) is a signal obtained by converting s2 (t) into the frequency domain. Furthermore, S is a sampling frequency. N is the FFT (Fast Fourier Transform) analysis frame length. Further, τ is a difference in sound wave arrival time between microphones. Furthermore, i is an imaginary unit.
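  As a reference, the following Python sketch (not part of the patent) shows a frequency-domain delay-and-subtract operation consistent with the symbols defined above. Equation (2) itself is not reproduced in this text, so the exponential form used here, based on the DFT delay theorem, is an assumption, and all function and variable names are illustrative.

import numpy as np

def delay_subtract_spectrum(X1, X2, tau, S):
    # Y(f) = X2(f) - X1(f) * exp(-2*pi*i * f * (S/N) * tau): cancel sound delayed by tau
    # X1, X2 : complex FFT spectra of the two microphone signals (length N)
    # tau    : arrival time difference in seconds, S : sampling frequency in Hz
    N = len(X1)
    f = np.arange(N)                                  # FFT bin index
    delay = np.exp(-2j * np.pi * f * (S / N) * tau)   # frequency-domain delay of tau seconds
    return X2 - X1 * delay

# Example: blind spot toward theta = 90 degrees for microphones spaced l metres apart
c, l, S, N = 340.0, 0.05, 16000, 1024
theta = np.deg2rad(90.0)
tau = l * np.sin(theta) / c                           # equation (1)
X1 = np.fft.fft(np.random.randn(N))                   # placeholder spectra
X2 = np.fft.fft(np.random.randn(N))
Y = delay_subtract_spectrum(X1, X2, tau, S)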

  However, the conventional microphone array technique shown in FIG. 10 alone is insufficient for suppressing background noise. One technique for improving this point is the sound source separation device of Patent Document 1.

  A configuration example of a conventional sound source separation device will be described with reference to FIG. 12. To simplify the description, the number of input microphones is two (2 ch), but the configuration is not necessarily limited to this setting.

  As shown in FIG. 12, the conventional sound source separation device E10 includes an FFT unit E11, a first directivity forming unit E12, a second directivity forming unit E13, a third directivity forming unit E14, a target sound selection unit E15, a frequency subtraction unit E16, and an IFFT unit E17.

  In this specification, the "target sound" refers to the sound produced by the user (speaker) of the device (sound source separation device), the "interfering sound" refers to speech emitted by persons other than the user of the device, and background noise such as office noise is called the "background sound". The background sound and the interfering sound together are called "noise", and all signals input from the microphones, without distinguishing target sound, interfering sound, and background sound, are called the "input signal". In the following, it is assumed that the target sound arrives approximately from the front (0 degree direction).

  First, the sound source separation device E10 acquires the input signals s1 (n) and s2 (n) from the microphones m1 and m2 through an AD converter (not shown). The acquired 2-ch input signals s1 (n) and s2 (n) are converted into frequency domain signals X1 (f) and X2 (f), respectively, by the FFT unit E11; X1 (f) and X2 (f) are complex numbers. The analysis frame length used in the FFT processing of the FFT unit E11 may be, for example, 1024 samples, but is not limited thereto and may be adjusted to any length desired by the user of the apparatus.

  Next, the process of the first directivity forming unit E12 will be described. The first directivity forming unit E12 performs an operation such as the following equation (3) for X1 (f) and X2 (f) to obtain an output signal B1 (f).

  FIG. 13 is an explanatory diagram showing the directivity of the first directivity forming unit E12.

In the first directivity forming unit E12, a delay is applied to the signal acquired from the microphone m1 in FIG. 13 by the calculation of equation (3), and the signal arriving from the right direction is cancelled. For example, when the arrival direction θ is 90 degrees, the directivity shown by the thick line in FIG. 13 is formed.

  Next, the processing of the second directivity forming unit E13 will be described. The second directivity forming unit E13 performs an operation such as the following equation (4) on X1 (f) and X2 (f) to obtain an output signal B2 (f).

  FIG. 14 is an explanatory diagram showing the directivity of the second directivity forming unit E13.

In the second directivity forming unit E13, a delay is applied to the signal acquired from the microphone m2 in FIG. 14 by the calculation of equation (4), and the signal arriving from the left direction is cancelled. For example, when the arrival direction θ is −90 degrees, the directivity shown by the thick line in FIG. 14 is formed.

  Next, the processing of the third directivity forming unit E14 will be described. The third directivity forming unit E14 performs the operation of the following equation (5) on X1 (f) and X2 (f) to obtain an output signal B3 (f), which is treated as a noise signal.

B3 (f) = X1 (f) − X2 (f)   (5)
FIG. 15 is an explanatory diagram showing the directivity of the third directivity forming unit E14.

  Next, the meaning of equation (5) will be described. Sounds arriving from azimuths where the difference between the acoustic paths from the sound source to the microphones m1 and m2 is small (for example, the front) are picked up at almost the same level by both microphones and are therefore cancelled by equation (5). In contrast, signals arriving from directions with a large path difference (for example, the left or right) are not cancelled, because the levels picked up by the microphone m1 and the microphone m2 differ. In this way, sounds arriving from the front and rear are cancelled while sounds arriving from the left and right remain, so the directivity shown by the thick line in FIG. 15 is formed. Since the target sound is assumed to arrive from the front, the signal obtained by equation (5) can be regarded as containing everything other than the target sound, that is, as a noise signal.

  Next, the processing of the target sound selection unit E15 will be described. The target sound selection unit E15 has the configuration shown in FIG. 16 and operates according to the flowchart shown in FIG. 17. Specifically, the target sound selection unit E15 obtains a signal P (f) by applying the following equation (6) to B1 (f) and B2 (f), and uses this as the target sound signal.

P (f) = MIN [| B1 (f) |, | B2 (f) |] (6)
Note that MIN [x, y] in equation (6) denotes the operation of selecting the smaller of x and y. Equation (6) therefore compares the levels of B1 (f) and B2 (f) at each frequency and selects the smaller one as the target sound component at that frequency. The reason for this calculation is as follows.

  Since the sound collection sensitivities of B1 (f) and B2 (f) toward the front are the same, both contain the target sound to the same extent. On the other hand, their sensitivity to noise arriving from directions other than the front differs: the noise content is smaller in the signal whose blind spot is directed toward the source of the interfering sound or background sound. For example, when the noise source is on the right, B1 (f), which has a blind spot on the right, can remove the noise and therefore contains little of it, whereas B2 (f) cannot remove it and therefore contains much of it. Selecting the lower-level signal of B1 (f) and B2 (f) thus selects, of two signals that contain the target sound at the same level but differ in how much noise they contain, the one with the smaller noise component. The lower-level signal is therefore the more suitable estimate of the target sound. This is the rationale for estimating the target sound component by equation (6).

  Next, the processing of the frequency subtraction unit E16 will be described. The frequency subtraction unit E16 obtains D (f) by performing the operation of the following equation (7) on P (f) and B3 (f). Through this processing, the noise signal B3 (f) is subtracted from the target sound signal P (f), which still includes noise, so that the noise component remaining in P (f) can be removed.

D (f) = P (f) − B3 (f)   (7)
Next, processing of the IFFT unit E17 will be described. The IFFT unit E17 converts D (f) into a time domain signal (inverse Fourier transform), thereby obtaining an output signal y (t) in which noise and the like are suppressed.
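  For reference, the conventional core of equations (5) to (7) can be sketched in Python as follows. Equations (3) and (4) are not reproduced in this text, so B1 (f) and B2 (f) are taken as given here, and the interpretation of the selection and subtraction as operations on complex spectra is an assumption; all names are illustrative.

import numpy as np

def conventional_separation(X1, X2, B1, B2):
    # X1, X2 : spectra of the two microphone signals
    # B1, B2 : spectra with blind spots at +90 / -90 degrees (equations (3) and (4))
    B3 = X1 - X2                                    # equation (5): noise-dominant spectrum
    P = np.where(np.abs(B1) <= np.abs(B2), B1, B2)  # equation (6): pick the lower-level signal per frequency
    D = P - B3                                      # equation (7): frequency subtraction
    return np.fft.ifft(D).real                      # back to the time domain, cf. the IFFT unit E17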

[Patent Document 1] JP 2006-197552 A

  The target sound selection unit E15 in the conventional sound source separation device E10 tends to behave consistently with the actual acoustic environment when the level of the interfering sound is large, but tends to make selections inconsistent with the real acoustic environment when the level of the interfering sound is small. As a first tendency of the operation of the target sound selection unit E15, even though there is only one interfering sound source and it arrives from a single direction, the sound collection direction of the signal selected for each frequency within the same frame may differ. As a second tendency, when the selection result at a specific frequency is observed over time, the sound collection direction of the selected signal may fluctuate frequently even though the position of the interfering sound source is unchanged. These tendencies are considered to be related to the frequency characteristics of the speech to be processed by the sound source separation device E10 and of the background sound. As shown in FIG. 18, the frequency characteristic of the speech component in the input signal has a structure in which maxima and minima alternate. When a noise component is superimposed on the speech component, the characteristic of the noise component may become dominant in the vicinity of the minima, and the tendencies described above may then be observed.

  Due to the above tendencies (characteristics) of the target sound selection unit E15, in the conventional sound source separation device E10 it may happen that, within the same frame, the signal component with a blind spot in the right direction is selected for the 1000 Hz component while the signal component with a blind spot in the left direction is selected for the 1200 Hz component. As a result, although the blind spot should originally face a single direction, the target sound signal is composed of components whose blind spot azimuths differ from frequency to frequency. This impairs the naturalness of the sound quality in the conventional sound source separation device E10.

  Furthermore, in the conventional sound source separation device E10, due to the above tendencies of the target sound selection unit E15, even when a single frequency is observed over a long period while the arrival direction of the interfering sound remains unchanged, the blind spot azimuth may change frequently regardless of the actual acoustic environment, for example selecting the right direction at one moment and the left direction the next. This is another factor that degrades the sound quality of the conventional sound source separation device E10.

  In view of the above problems, a sound source separation device, program, and method are desired that can suppress the degradation of sound quality after separation processing when separating, from an input signal, a target sound and noise arriving from arbitrary directions other than the arrival direction of the target sound.

  A first aspect of the present invention is a sound source separation device that performs sound source separation processing for separating, from an input signal, a target sound and noise that may include an interfering sound in addition to a background sound. The device comprises: (1) target sound dominant spectrum candidate forming means for forming, from the spectra of the received sound signals of two of a plurality of microphones arranged at an interval, a plurality of target sound dominant spectrum candidates in which the target sound component is dominant, by performing processing that forms a blind spot in a direction other than the assumed target sound arrival direction from which the target sound is expected to arrive; (2) noise dominant spectrum forming means for forming a noise dominant spectrum in which the noise component is dominant, by performing, on the spectra of the received sound signals, processing that forms a blind spot in a direction within a predetermined range including the assumed target sound arrival direction; (3) reliability determination means for determining, for at least one frequency component of each section of the received sound signal, the reliability of the determination result of an interfering sound determination that judges whether the section contains an interfering sound component; (4) target sound selection means for selecting, for each section of the received sound signal, one of the target sound dominant spectrum candidates to form a target sound dominant spectrum, and for applying to the selection processing of the section a selection method determined using at least the determination result of the reliability determination means for that section; and (5) separation means for separating the noise component and the target sound component of the received sound signal using the noise dominant spectrum and the target sound dominant spectrum.

  A sound source separation program according to a second aspect of the present invention is installed in a sound source separation device that performs sound source separation processing for separating, from an input signal, a target sound and noise that may include an interfering sound in addition to a background sound, and causes a computer to function as: (1) target sound dominant spectrum candidate forming means for forming, from the spectra of the received sound signals of two of a plurality of microphones arranged at an interval, a plurality of target sound dominant spectrum candidates in which the target sound component is dominant, by performing processing that forms a blind spot in a direction other than the assumed target sound arrival direction from which the target sound is expected to arrive; (2) noise dominant spectrum forming means for forming a noise dominant spectrum in which the noise component is dominant, by performing, on the spectra of the received sound signals, processing that forms a blind spot in a direction within a predetermined range including the assumed target sound arrival direction; (3) reliability determination means for determining, for at least one frequency component of each section of the received sound signal, the reliability of the determination result of an interfering sound determination that judges whether the section contains an interfering sound component; (4) target sound selection means for selecting, for each section of the received sound signal, one of the target sound dominant spectrum candidates to form a target sound dominant spectrum, and for applying to the selection processing of the section a selection method determined using at least the determination result of the reliability determination means for that section; and (5) separation means for separating the noise component and the target sound component of the received sound signal using the noise dominant spectrum and the target sound dominant spectrum.

  A third aspect of the present invention is a sound source separation method for performing sound source separation processing for separating, from an input signal, a target sound and noise that may include an interfering sound in addition to a background sound. (1) The method uses target sound dominant spectrum candidate forming means, noise dominant spectrum forming means, reliability determination means, target sound selection means, and separation means. (2) The target sound dominant spectrum candidate forming means forms, from the spectra of the received sound signals of two of a plurality of microphones arranged at an interval, a plurality of target sound dominant spectrum candidates in which the target sound component is dominant, by performing processing that forms a blind spot in a direction other than the assumed target sound arrival direction from which the target sound is expected to arrive. (3) The noise dominant spectrum forming means forms a noise dominant spectrum in which the noise component is dominant, by performing, on the spectra of the received sound signals, processing that forms a blind spot in a direction within a predetermined range including the assumed target sound arrival direction. (4) The reliability determination means determines, for at least one frequency component of each section of the received sound signal, the reliability of the determination result of an interfering sound determination that judges whether the section contains an interfering sound component. (5) The target sound selection means selects, for each section of the received sound signal, one of the target sound dominant spectrum candidates to form a target sound dominant spectrum, and applies to the selection processing of the section a selection method determined using at least the determination result of the reliability determination means for that section. (6) The separation means separates the noise component and the target sound component of the received sound signal using the noise dominant spectrum and the target sound dominant spectrum.

  ADVANTAGE OF THE INVENTION: According to the present invention, in processing that separates, from an input signal, a target sound and noise arriving from arbitrary directions other than the arrival direction of the target sound, the degradation of sound quality after separation processing can be suppressed.

FIG. 1 is a block diagram showing the functional configuration of the sound source separation device according to the first embodiment.
FIG. 2 is a block diagram showing the functional configuration of the control signal generation unit according to the first embodiment.
FIG. 3 is a block diagram showing the functional configuration of the target sound selection unit according to the first embodiment.
FIG. 4 is a flowchart showing the operation of the control signal generation unit according to the first embodiment.
FIG. 5 is a flowchart showing the overall operation of the target sound selection unit according to the first embodiment.
FIG. 6 is a flowchart showing the operation of the blind spot direction storage processing in the target sound selection unit according to the first embodiment.
FIG. 7 is a block diagram showing the functional configuration of the control signal generation unit according to the second embodiment.
FIG. 8 is a flowchart showing the operation of the control signal generation unit according to the second embodiment.
FIG. 9 is an explanatory diagram showing the characteristics of the speech processed in the sound source separation device according to a modification of the embodiments.
FIG. 10 is a block diagram showing a configuration example of a conventional delay-and-subtract microphone array.
FIG. 11 is an explanatory diagram showing the directivity characteristics formed by a conventional delay-and-subtract microphone array.
FIG. 12 is a block diagram showing the functional configuration of a conventional sound source separation device.
FIG. 13 is an explanatory diagram showing the directivity characteristics of the first directivity forming unit in the conventional sound source separation device.
FIG. 14 is an explanatory diagram showing the directivity characteristics of the second directivity forming unit in the conventional sound source separation device.
FIG. 15 is an explanatory diagram showing the directivity characteristics of the third directivity forming unit in the conventional sound source separation device.
FIG. 16 is a block diagram showing the functional configuration of the target sound selection unit in the conventional sound source separation device.
FIG. 17 is a flowchart showing the operation of the target sound selection unit in the conventional sound source separation device.
FIG. 18 is an explanatory diagram showing the problem in the conventional sound source separation device.

(A) First Embodiment Hereinafter, a first embodiment of a sound source separation device, program, and method according to the present invention will be described in detail with reference to the drawings.

(A-1) Configuration of the First Embodiment FIG. 1 is a block diagram showing the overall configuration of the sound source separation device 10 of the first embodiment. In FIG. 1, the reference numerals in parentheses are used only in the second embodiment described later.

  The sound source separation device 10 separates (suppresses) noise from an input signal input from microphones and extracts the target sound. The use of the sound source separation device 10 is not limited; for example, it may be mounted on a voice recognition device or a telephone device such as a mobile phone and used for voice capture. More specifically, the sound source separation device 10 may, for example, be installed in a teleconference device and used to separate, as the target sound, the voice of an arbitrary speaker from the mixed voices of a plurality of speakers engaged in remote conversation, or to separate, as the target sound, the voice of a speaker engaged in remote conversation from a mixture of that voice and other sounds.

  The sound source separation device 10 includes microphones m1 and m2, an FFT unit 11, a first directivity forming unit 12, a second directivity forming unit 13, a third directivity forming unit 14, a target sound selection unit 15, a frequency subtraction unit 16, an IFFT unit 17, and a control signal generation unit 18.

  The components of the sound source separation device 10 other than hardware such as the microphones may be realized by installing the sound source separation program of the embodiment in an apparatus having a processor (CPU or the like). Some or all of the components of the sound source separation device 10 may also be realized using dedicated hardware (for example, a semiconductor chip).

  The microphones m1 and m2 may be the same as those of the conventional sound source separation device described above, and the microphones m1 and m2 of the sound source separation device 10 are arranged in the same manner.

  In the following, as in the prior art described above, the direction perpendicular to the line connecting the two microphones m1 and m2 is referred to as the 0 degree direction. The 0 degree direction is the front, the 90 degree direction is the right, the −90 degree direction is the left, and the 180 degree (−180 degree) direction is the rear. In the following description, the sound source separation device 10 is configured on the assumption that the target sound arrives approximately from the front (0 degrees).

  The first directivity forming unit 12 and the second directivity forming unit 13 are components for obtaining spectra in which the target sound component is dominant; each is a filter whose blind spot is directed in a direction different from the direction from which the target sound arrives.

  Here, the first directivity forming unit 12 is assumed to be a filter having a blind spot in the right direction (90 degree direction), like the first directivity forming unit E12 in the prior art described above (see FIG. 13). That is, the first directivity forming unit 12 performs an operation such as equation (3) on X1 (f) and X2 (f) to obtain an output signal B1 (f).

  Similarly, the second directivity forming unit 13 is assumed to be a filter having a blind spot in the left direction (−90 degree direction), like the second directivity forming unit E13 in the prior art described above (see FIG. 14). That is, the second directivity forming unit 13 performs an operation such as equation (4) on X1 (f) and X2 (f) to obtain an output signal B2 (f).

  Note that, as described above, since the sound source separation device 10 assumes that the target sound arrives from approximately the 0 degree direction, the first directivity forming unit 12 and the second directivity forming unit 13 direct their blind spots in directions different from the arrival direction of the target sound. However, the number of directivity forming units and the combination of blind spots to be applied may be changed depending on the direction from which the target sound is expected to arrive.

  The third directivity forming unit 14 is a filter whose blind spot is directed in the arrival direction of the target sound in order to extract a spectrum in which the noise component is dominant. Specifically, like the third directivity forming unit E14 in the prior art described above, the third directivity forming unit 14 sets the blind spot of the filter in a direction within a predetermined range including the arrival direction of the target sound and extracts a noise signal.

  Here, the third directivity forming unit 14 is assumed to be a filter whose blind spot includes the front direction (0 degree direction), like the third directivity forming unit E14 described above (see FIG. 15). That is, the third directivity forming unit 14 performs the operation of equation (5) on X1 (f) and X2 (f) to obtain an output signal B3 (f), which is treated as the noise signal.

  In the sound source separation device 10, the third directivity forming unit 14 is used to extract a spectrum in which the noise component is dominant, but the number of directivity forming units used for this purpose and the combination of blind spots to be applied are not limited. For example, a configuration using a plurality of directivity forming units, each having a blind spot in a direction within a predetermined range including the expected arrival direction of the target sound, may be adopted.

  The target sound selection unit 15 selects an appropriate one of B1 (f) and B2 (f) and uses it as the target sound signal P (f). The specific processing of the target sound selection unit 15 will be described later; it differs from the conventional target sound selection unit E15 described above in that the processing is performed under the control of the control signal generation unit 18.

  The frequency subtraction unit 16, like the frequency subtraction unit E16 in the prior art described above, subtracts the noise signal B3 (f) from the target sound signal P (f), which still includes noise, to remove the noise component remaining in P (f). Here, as in the prior art, the frequency subtraction unit 16 performs the operation of equation (7) on P (f) and B3 (f) to obtain D (f).

  The IFFT unit 17, like the IFFT unit E17 in the prior art described above, converts D (f) into a time domain signal (inverse Fourier transform) to obtain an output signal y (t) in which noise and the like are suppressed.

  Next, the control signal generator 18 will be described.

  Before describing the function of the control signal generation unit 18, the relationship between the interfering sound and the noise is first organized. In the sound source separation device 10, "when the level of the interfering sound is low" means either that there is no interfering sound or that the frequency components of the interfering sound (human voices other than the speaker) are weak. Since the signal component in this case has strong characteristics of a background sound, as shown in FIG. 18 described above, performing the target sound selection processing on it is meaningless in the first place. Nevertheless, because such components contribute to the selection result, phenomena contrary to the actual acoustic environment, as described above, occur. Viewed from another angle, the operation of the target sound selection unit amounts to estimating the arrival direction of the interfering sound (that is, an arrival direction different from that of the target sound) and selecting the signal component whose blind spot faces that direction. The above problem can therefore be restated as "the estimation of the arrival direction of the interfering sound fails when the level of the interfering sound is small".

  Therefore, the sound source separation device 10 estimates the arrival direction of the interfering sound using only components whose interfering sound level is high and whose reliability as speech is high in sections containing the interfering sound, and for components with low reliability it reuses selection results obtained where the reliability was high, thereby solving the problem described above. To realize this, the sound source separation device 10 includes a control signal generation unit 18 that extracts signal components suitable for estimating the arrival direction of the interfering sound, generates a control signal for controlling the selection operation of the target sound selection unit 15, and supplies it to the target sound selection unit 15. The target sound selection unit 15 performs its selection operation according to the control signal from the control signal generation unit 18.

  FIG. 2 is an explanatory diagram showing a functional configuration of the control signal generator 18.

  As described above, the control signal generation unit 18 generates the control signal for controlling the selection operation of the target sound selection unit 15, and includes an interfering sound section determination unit 181, a reliability determination unit 182, a control signal update unit 183, and a control signal transmission unit 185.

  Based on the noise signal B3 (f), the interfering sound section determination unit 181 determines whether the current section is a section in which an interfering sound is present (hereinafter referred to as an "interfering sound section") or a section in which no interfering sound is present (hereinafter referred to as a "non-interfering sound section").

  The "section" here is the processing unit period in the time domain used when the FFT unit 11 converts the input signal from the time domain to the frequency domain. Hereinafter, the signal of one section in the time domain is also referred to as a "frame". The interfering sound section determination unit 181 makes the determination for each section.

Here, as an example, the interfering sound section determination unit 181 calculates the noise level (hereinafter referred to as the "noise level Lv") of the noise signal B3 (f) using the following equation (8), and makes the determination by applying the calculated noise level Lv to the following equation (9). That is, as shown in equation (9), the interfering sound section determination unit 181 determines that the section is an interfering sound section when the noise level Lv is equal to or greater than a predetermined threshold (hereinafter referred to as the "detection threshold Ψ"), and that it is a non-interfering sound section otherwise. The noise level Lv can also be regarded as the noise power of the section in the time domain. Note that the calculation method and determination method for the noise level Lv are not limited to these formulas.
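  For reference, the determination can be sketched in Python as follows. Since equations (8) and (9) are not reproduced in this text, the use of the mean spectral power of B3 (f) as the noise level Lv is an assumption suggested by the remark that Lv represents the noise power of the section; all names are illustrative.

import numpy as np

def is_interfering_sound_section(B3, psi):
    # B3  : complex noise-dominant spectrum of one frame (third directivity forming unit)
    # psi : detection threshold for the noise level Lv
    lv = np.mean(np.abs(B3) ** 2)   # assumed form of equation (8): mean spectral power as Lv
    return lv >= psi                # equation (9): True means an interfering sound section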

  The reliability determination unit 182 observes the level | X1 (f) | of each frequency component of the spectrum X1 (f) of the input signal and compares it with a predetermined threshold (hereinafter referred to as the "reliability determination threshold Ξ"). Combining this comparison with the determination result of the interfering sound section determination unit 181, the reliability determination unit 182 determines, for each frequency component, whether it is a high reliability component.

  Here, when the interfering sound section determination unit 181 has determined that the frame is an interfering sound section and | X1 (f) | is equal to or greater than the reliability determination threshold Ξ, the reliability determination unit 182 determines that the frequency component is a high reliability component. The reliability determination unit 182 determines that all other frequency components are low reliability components (not high reliability components). The reliability determination unit 182 then gives the results determined in this manner to the control signal update unit 183.

  Here, the reason why the determination by the reliability determination unit 182 is based on the magnitude of | X1 (f) | is explained. The frequency characteristic of the speech component in the input signal to the sound source separation device 10 has a structure in which maxima and minima alternate, as shown in FIG. 18. When a noise component is superimposed on the speech component, the characteristic of the noise component may be dominant in the vicinity of the minima, so components near the minima cannot be said to have sufficient reliability to contribute to the target sound selection operation of the target sound selection unit 15 in the subsequent stage. In contrast, in the vicinity of the maxima the speech component is not masked (buried) by the noise component and retains the characteristics of speech (the speech component is sufficiently large relative to the noise component), so such components are suitable for contributing to the target sound selection operation. Selecting | X1 (f) | values larger than the predetermined threshold therefore realizes, in a simple way, the operation of selecting components near the maxima of the speech component in the input signal. X1 (f) contains the background noise as well as the interfering sound; however, since it does not have directivity, unlike B3 (f), it reflects the characteristics of the background noise more accurately and is therefore suitable for judging the influence of the background noise component. In this way, the reliability determination unit 182 can select signal components that have sufficient "reliability as speech" to contribute to the estimation of the arrival direction of the interfering sound (the target sound selection in the subsequent stage).

That is, here, as shown in the following equation (10), the reliability determination unit 182 determines that the frequency component is a high reliability component when | X1 (f) | is equal to or greater than the reliability determination threshold Ξ, and that it is a low reliability component when | X1 (f) | is less than the reliability determination threshold Ξ. The above processing is only one example of the reliability determination method of the reliability determination unit 182 and is not limiting; for example, X1 (f) in equation (10) may be replaced with X2 (f).

  In this example, the reliability determination unit 182 supplies, as determination result information, control signals for the respective frequency components of one frame to the control signal update unit 183, associating each frequency component (X1 (f)) with the reliability determination result ("1" or "0") for that component.

  The control signal update unit 183 determines from the information received from the interfering sound section determination unit 181 whether the section is an interfering sound section. If it is, the control signal update unit 183 outputs the reliability determination result received from the reliability determination unit 182 as the control signal C [f] to the target sound selection unit 15 via the control signal transmission unit 185.

  On the other hand, if the determination result received from the interfering sound section determination unit 181 indicates a non-interfering sound section, the control signal update unit 183 discards the reliability determination result received from the reliability determination unit 182 and outputs the control signal C [f] = 0 to the target sound selection unit 15 via the control signal transmission unit 185.
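  Combining the section determination, the reliability determination of equation (10), and the update rule just described, the behaviour of the control signal generation unit 18 for one frame can be sketched in Python as follows. The threshold comparison and the 1/0 encoding are assumptions based on the surrounding text, and the names are illustrative.

import numpy as np

def generate_control_signal(X1, B3, psi, xi):
    # X1  : spectrum of the input signal from microphone m1
    # B3  : noise-dominant spectrum of the same frame
    # psi : detection threshold for the noise level Lv, xi : reliability determination threshold
    interfering = np.mean(np.abs(B3) ** 2) >= psi   # interfering sound section determination unit 181
    reliable = (np.abs(X1) >= xi).astype(int)       # reliability determination unit 182, equation (10)
    if interfering:
        return reliable                             # control signal C[f] follows the reliability result
    return np.zeros_like(reliable)                  # non-interfering section: C[f] = 0 for all f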

  Next, the configuration of the target sound selection unit 15 will be described.

  FIG. 3 is an explanatory diagram showing a functional configuration of the target sound selection unit 15.

  The target sound selection unit 15 includes an acoustic signal and control signal reception unit 151, a control switching unit 152, a minimum value extraction unit 153, a blind spot direction storage unit 154, a blind spot direction reference and signal selection unit 155, a target sound signal generation unit 156, and a target sound signal transmission unit 157.

  The acoustic signal and control signal reception unit 151 receives B1 (f), B2 (f), and the control signal C [f] as inputs and supplies them to the control switching unit 152. The acoustic signal and control signal reception unit 151 associates B1 (f), B2 (f), and the control signal C [f] having the same value of f as one data set and gives it to the control switching unit 152.

  The control switching unit 152 then distributes B1 (f) and B2 (f) to either the minimum value extraction unit 153 or the blind spot direction reference and signal selection unit 155 according to the value of the corresponding control signal C [f]. When the control signal C [f] is 1, the control switching unit 152 distributes the corresponding B1 (f) and B2 (f) to the minimum value extraction unit 153; when the control signal C [f] is 0, it distributes them to the blind spot direction reference and signal selection unit 155.

  When B1 (f) and B2 (f) are given, the minimum value extraction unit 153 adopts the one of B1 (f) and B2 (f) with the lower level according to the following equation (11) and supplies it to the target sound signal generation unit 156 as the signal A (f). The minimum value extraction unit 153 then records in the blind spot direction storage unit 154 the blind spot azimuth φ (f) corresponding to the signal selected from B1 (f) and B2 (f) by equation (11).

  The "blind spot azimuth" indicates the direction in which the filter of the directivity forming unit corresponding to B1 (f) or B2 (f) suppresses sound. For example, since the blind spot azimuth of the first directivity forming unit 12 corresponding to B1 (f) is 90 degrees (right direction), the blind spot azimuth φ (f) is 90 degrees when B1 (f) is selected by the minimum value extraction unit 153. On the other hand, when B2 (f) is selected by the minimum value extraction unit 153, the blind spot azimuth φ (f) is −90 degrees (left direction). Hereinafter, as an example of the recording format in the blind spot direction storage unit 154, the blind spot azimuth of the first directivity forming unit 12 corresponding to B1 (f) is represented as "1", and the blind spot azimuth of the second directivity forming unit 13 corresponding to B2 (f) is represented as "2".

A (f) = MIN [| B1 (f) |, | B2 (f) |] (11)
On the other hand, when B1 (f) and B2 (f) are given to the blind spot direction reference and signal selection unit 155, it refers to the blind spot azimuth φ stored in the blind spot direction storage unit 154 and, based on the reference result, adopts the blind spot azimuth corresponding to one of B1 (f) and B2 (f). The blind spot direction reference and signal selection unit 155 then supplies the one of B1 (f) and B2 (f) corresponding to the adopted blind spot azimuth to the target sound signal generation unit 156 as A (f).

  The method by which the blind spot direction reference and signal selection unit 155 refers to the contents of the blind spot direction storage unit 154 is not limited. For example, the blind spot azimuth of another frequency component of the same frame may be used, or the blind spot azimuth may be recorded for each frequency component of past frames and the past blind spot azimuth of the corresponding frequency component may be referred to.

  The target sound signal generation unit 156 reassembles, in order of frequency, the signals A (f) of the respective frequency components supplied from the minimum value extraction unit 153 or the blind spot direction reference and signal selection unit 155, treats the result as the target sound signal P (f) of one frame, and outputs it to the frequency subtraction unit 16 via the target sound signal transmission unit 157.
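  The selection logic of the target sound selection unit 15 (equation (11) together with the control-signal-driven switching and the blind spot azimuth record) can be sketched in Python as follows. The fallback used when C[f] = 0, namely reusing the azimuth most recently recorded for the same frequency bin, is only one of the reference methods the text allows, and all names are illustrative.

import numpy as np

def select_target_sound(B1, B2, C, stored_direction):
    # B1, B2           : spectra with blind spots at +90 / -90 degrees
    # C                : control signal per frequency bin (1 = high reliability, 0 = low)
    # stored_direction : integer array, 1 if B1 was last selected for the bin, 2 if B2 was
    P = np.empty_like(B1)
    direction = stored_direction.copy()
    for f in range(len(B1)):
        if C[f] == 1:
            # Equation (11): adopt the lower-level signal and record its blind spot azimuth
            if np.abs(B1[f]) <= np.abs(B2[f]):
                P[f], direction[f] = B1[f], 1
            else:
                P[f], direction[f] = B2[f], 2
        else:
            # Low reliability: reuse a stored blind spot azimuth instead of re-estimating it
            P[f] = B1[f] if direction[f] == 1 else B2[f]
    return P, direction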

(A-2) Operation of the First Embodiment Next, the operation of the sound source separation device 10 of the first embodiment having the above configuration (the sound source separation method of the embodiment) will be described.

  In the sound source separation device 10, the signals input from the microphones m1 and m2 are first converted from the time domain to the frequency domain by the FFT unit 11 to form X1 (f) and X2 (f). Then, signals B1 (f), B2 (f), and B3 (f), each having a blind spot in a predetermined direction, are formed by the first directivity forming unit 12, the second directivity forming unit 13, and the third directivity forming unit 14.

  The control signal generator 18 generates a control signal C [f] and supplies it to the target sound selector 15.

  Next, the operation of the control signal generator 18 will be described.

  FIG. 4 is a flowchart showing the operation of the control signal generator 18.

  In the flowchart of FIG. 4, F_INI and F_FIN are constants that control the number of repetitions of the arithmetic processing in the frequency domain, and may be set arbitrarily by the device user. Here, as an example, F_INI = 0 and F_FIN = 1023 are used, but the invention is not limited to this. In the flowchart of FIG. 4, f is used as the loop variable and is incremented by 1 on each iteration; however, the increment unit is not limited to this (for example, it may be the minimum unit of the frequency-domain representation). The flowchart of FIG. 4 shows the processing of one frame (f = F_INI to F_FIN = 0 to 1023) for X1 (f) and X2 (f).

  First, in the control signal generator 18, f is initialized to F_INI (= 0) (S101).

  Next, the control signal generation unit 18 determines whether f is equal to or less than F_FIN (S102). If f is equal to or less than F_FIN, the control signal generation unit 18 proceeds to the processing of step S103 described later; otherwise (f > F_FIN), it ends the processing for the section (frame).

  If it is determined in step S102 that f is equal to or less than F_FIN, the reliability determination unit 182 determines, based on the spectrum X1 (f) of the input signal, whether the frequency component is a high reliability component and supplies the determination result to the control signal update unit 183. The control signal update unit 183 then refers to the determination result of the interfering sound section determination unit 181 for the section (S103). If the result indicates an interfering sound section, the control signal update unit 183 proceeds to step S104 described later; if it indicates a non-interfering sound section, it proceeds to step S105 described later. Note that the determination processing of the interfering sound section determination unit 181 is preferably performed once per section (frame) rather than for each frequency component.

  When the determination result of the interfering sound section determination unit 181 indicates an interfering sound section, the control signal update unit 183 supplies the determination result (1 or 0) received from the reliability determination unit 182 to the target sound selection unit 15, via the control signal transmission unit 185, as the control signal C [f] corresponding to the frequency component (S104).

  On the other hand, when the determination result of the interfering sound section determination unit 181 does not indicate an interfering sound section (that is, in the case of a non-interfering sound section), the control signal update unit 183 discards the reliability determination result received from the reliability determination unit 182 and supplies the control signal C [f] = 0 to the target sound selection unit 15 (S105).

  When the control signal C [f] has been supplied to the target sound selection unit 15 in step S104 or S105, the control signal generation unit 18 increments the variable f (f++, that is, f = f + 1) (S106) and returns to the processing of step S102.

  As described above, the control signal generation unit 18 generates the control signal C [f] for each frequency component and supplies the control signal C [f] to the target sound selection unit 15. Then, the target sound selection unit 15 performs a selection process on B1 (f) and B2 (f) according to the control signal C [f], and generates a target sound signal P (f).

  Next, the operation of the target sound selection unit 15 will be described.

  5 and 6 are flowcharts showing the operation of the target sound selection unit 15.

  The constants F_INI and F_FIN and the variable f in the flowcharts of FIGS. 5 and 6 are the same as those in FIG. 4. The flowcharts of FIGS. 5 and 6 show the processing of B1 (f) and B2 (f) for one frame (section) (f = F_INI to F_FIN = 0 to 1023).

  First, the target sound selection unit 15 initializes f to F_INI (= 0) (S201).

  Next, the target sound selection unit 15 determines whether f is equal to or less than F_FIN (S202). If f is equal to or less than F_FIN, it proceeds to the processing of step S203 described later; otherwise (f > F_FIN), it ends the processing for the section.

  When f is determined to be equal to or less than F_FIN, the control switching unit 152 reads the data set of B1 (f), B2 (f), and the control signal C [f], and first refers to the value of the control signal C [f]. The control switching unit 152 then checks the content of the control signal C [f] (S203). If the control signal C [f] = 1, it proceeds to the processing of step S204 described later; otherwise, it proceeds to the processing of step S206 described later.

  When the control signal C [f] = 1 is confirmed in step S203, the control switching unit 152 supplies the B1 (f) and B2 (f) corresponding to the control signal C [f] to the minimum value extraction unit 153. The minimum value extraction unit 153 then selects either B1 (f) or B2 (f) according to equation (11) and generates the signal A (f) (S204).

  The minimum value extraction unit 153 then records in the blind spot direction storage unit 154 the parameter indicating the blind spot azimuth corresponding to whichever of B1 (f) and B2 (f) was selected as the signal A (f) in step S204 (S205).

  Next, an example of the processing of the minimum value extraction unit 153 in step S205 will be described with reference to FIG. 6.

  First, the minimum value extraction unit 153 determines whether B1 (f) was selected as the signal A (f) in step S204 (S301). If B1 (f) was selected as the signal A (f) in step S204, the minimum value extraction unit 153 determines the parameter "1", which indicates the blind spot azimuth corresponding to B1 (f), as the blind spot azimuth φ (f) for the frequency component (S302). On the other hand, if B1 (f) was not selected as the signal A (f) in step S204 (that is, if B2 (f) was selected), the minimum value extraction unit 153 determines the parameter "2", which indicates the blind spot azimuth corresponding to B2 (f), as the blind spot azimuth φ (f) for the frequency component (S303). The minimum value extraction unit 153 then records the parameter of the blind spot azimuth φ (f) for the frequency component determined in step S302 or S303 in the blind spot direction storage unit 154 (S304).

  Through the processing described above, the minimum value extraction unit 153 records the blind spot azimuth φ (f) in the blind spot direction storage unit 154.

  On the other hand, when it is confirmed in step S203 that the control signal C [f] = 1 does not hold (that is, when the control signal C [f] = 0), the control switching unit 152 supplies the B1 (f) and B2 (f) corresponding to the control signal C [f] to the blind spot direction reference and signal selection unit 155. The blind spot direction reference and signal selection unit 155 refers to the contents of the blind spot direction storage unit 154, selects either B1 (f) or B2 (f) based on the reference result, and outputs the signal A (f) (S206 to S209). As described above, the contents of the blind spot direction storage unit 154 referred to by the blind spot direction reference and signal selection unit 155 and the method of selecting B1 (f) or B2 (f) are not limited. Here, as an example, in step S206 an arbitrary blind spot azimuth within the same frame (for example, the blind spot azimuth corresponding to the frequency closest to the frequency f) is read. Then, in steps S207 to S209, the one of B1 (f) and B2 (f) corresponding to the read blind spot azimuth is selected and output as the signal A (f).

  Then, the signal A (f) generated by the minimum value extraction unit 153 (step S204) or by the blind spot direction reference and signal selection unit 155 (steps S206 to S209) is assembled into the target sound signal P (f) and supplied to the frequency subtraction unit 16 via the target sound signal transmission unit 157 (S210).

  When the target sound signal P (f) has been supplied to the frequency subtraction unit 16 in step S210, the target sound selection unit 15 increments the variable f (f++, that is, f = f + 1) (S211) and returns to the processing of step S202.

  As described above, the target sound selection unit 15 generates the target sound signal P (f) for one frame.

  On receiving the target sound signal P (f) from the target sound selection unit 15, the frequency subtraction unit 16 subtracts the noise signal B3 (f) from the target sound signal P (f) to calculate the noise-removed signal D (f), and supplies it to the IFFT unit 17. The IFFT unit 17 then converts the noise-removed signal D (f) of one frame into the time domain signal y (t), and the sound source separation processing of the sound source separation device 10 for the frame is completed.

(A-3) Effects of First Embodiment According to the first embodiment, the following effects can be achieved.

  The sound source separation device 10 estimates the arrival direction of the interference sound accurately: the control signal generation unit 18 first selects the components suitable for generating an appropriate target sound, generates the control signal C[f] based on that result, and uses it to control the target sound selection processing (estimation of the blind spot azimuth of the interference sound) by the target sound selection unit 15. As a result, the inconsistencies in blind spot azimuth between frequencies within the same frame and the fluctuations in blind spot azimuth unrelated to the actual acoustic environment that occurred in the prior art are reduced, and distortion of the final output sound is eliminated. The sound source separation device 10 of the first embodiment can therefore suppress the degradation of sound quality after separation processing compared with the conventional technique. By applying the sound source separation device 10 of the first embodiment to a communication device such as a video conference system or a mobile phone, for example, an improvement in call sound quality can be expected.

(B) Second Embodiment Hereinafter, a second embodiment of a sound source separation device, program, and method according to the present invention will be described in detail with reference to the drawings.

  The control signal generation unit 18 in the first embodiment generates the control signal based only on the magnitude of the interference sound. However, while the apparatus user (speaker) is speaking, not only the interference sound from the side but also the target sound from the front is input. The selection operation of the target sound selection unit 15 amounts to estimating the arrival direction of the interference sound, and for this direction estimation the target sound itself can be a disturbing factor (the estimation is intended to capture only sound arriving from the side, yet sound also arrives from the front). Therefore, in the first embodiment, the presence of the target sound may affect the selection operation of the target sound selection unit 15, with the result that the same problem as in the prior art recurs and the sound quality is degraded. A configuration for solving this problem is described below as the second embodiment.

(B-1) Configuration of Second Embodiment The functional configuration of the sound source separation device 10A of the second embodiment can also be illustrated using FIG. 1; in FIG. 1, the reference numerals in parentheses are those used only in the second embodiment.

  Hereinafter, the difference between the second embodiment and the first embodiment will be described.

  The second embodiment is different from the first embodiment in that the control signal generator 18 is replaced with a control signal generator 18A.

  FIG. 7 is a block diagram showing a functional configuration of the control signal generator 18A.

  The control signal generation unit 18A differs from the control signal generation unit 18 of the first embodiment in that the interference sound section determination unit 181 and the control signal update unit 183 are replaced with a non-target sound section and interference sound section determination unit 186 and a control signal update unit 183A, respectively.

  When the input signal X1(f) and the noise signal B3(f) for one frame are input to the non-target sound section and interference sound section determination unit 186, the level difference between the two signals is calculated according to equation (12) below as TLv, which indicates the target sound level; an approximate level of the target sound is thereby obtained.

  Here, the reason why the target sound level can be calculated approximately by equation (12) is supplemented. Since X1(f) is a signal that captures sound omnidirectionally from the front, rear, left, and right, while B3(f) is a noise signal arriving from the left and right, taking the difference between them leaves only the front and rear signal components. Because the target sound is assumed to arrive from the front, the remaining signal can be expected to be the target sound.

  That is, the non-target sound section and interference sound section determination unit 186 can determine that a section is a target sound section when the obtained TLv is equal to or greater than a certain value, and a non-target sound section otherwise.

Here, as an example, the determination is made by applying the calculated TLv to equation (13) below. That is, as shown in equation (13), the non-target sound section and interference sound section determination unit 186 determines the section to be a target sound section if the calculated TLv is equal to or greater than a predetermined threshold (hereinafter referred to as the "detection threshold Γ"), and a non-target sound section if it is smaller. Note that the target sound section determination method is not limited to these formulas. For example, the same processing may be performed by replacing X1(f) with X2(f) in equation (12).
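
Because equations (12) and (13) are not reproduced in this text, the sketch below uses an assumed concrete form: TLv as the frame-level power difference in decibels between X1(f) and B3(f), compared against the detection threshold Γ. The formula, the threshold value, and the function name are illustrative assumptions only.

```python
import numpy as np

def is_target_sound_section(X1, B3, gamma_db=6.0):
    """Sketch of the target sound section decision (assumed form of eqs. (12), (13)).

    X1       : input signal X1(f) for one frame (complex spectrum, omnidirectional)
    B3       : noise signal B3(f) for the same frame (complex spectrum, lateral noise)
    gamma_db : detection threshold Γ in dB (the value 6 dB is purely illustrative)
    Returns True for a target sound section, False for a non-target sound section.
    """
    eps = 1e-12
    # Approximate target sound level TLv as the level difference between the
    # omnidirectional input and the lateral-noise-dominant signal.
    tlv = (10.0 * np.log10(np.sum(np.abs(X1) ** 2) + eps)
           - 10.0 * np.log10(np.sum(np.abs(B3) ** 2) + eps))
    # Assumed form of eq. (13): target sound section if TLv >= Γ.
    return tlv >= gamma_db
```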

  In this way, the non-target sound section and interference sound section determination unit 186 estimates information on the target sound section and supplies it to the control signal update unit 183A. The non-target sound section and interference sound section determination unit 186 further performs interference sound section determination processing similar to that of the interference sound section determination unit 181 of the first embodiment, and also supplies that determination result to the control signal update unit 183A.

  As a result, the control signal update unit 183A can detect sections that are both non-target sound sections and interference sound sections.

  The control signal update unit 183A then identifies, from the information supplied by the non-target sound section and interference sound section determination unit 186, whether or not the section corresponds to a "non-target sound section and interference sound section". If it does, the reliability determination result received from the reliability determination unit 182 is output as the control signal C[f] to the target sound selection unit 15 via the control signal transmission unit 185.

  On the other hand, when the information supplied from the non-target sound section and interference sound section determination unit 186 indicates that the section does not correspond to a "non-target sound section and interference sound section", the control signal update unit 183A, as in the first embodiment, rejects the reliability determination result received from the reliability determination unit 182 and outputs the control signal C[f] = 0 to the target sound selection unit 15 via the control signal transmission unit 185.

(B-2) Operation of Second Embodiment Next, the operation of the sound source separation device 10A of the second embodiment having the above configuration (the sound source separation method of the embodiment) will be described.

  Since the sound source separation device 10A of the second embodiment is different from the first embodiment only in the control signal generation unit 18A as described above, only the operation of the control signal generation unit 18A will be described below. Since the operation of other parts is the same as that of the first embodiment, detailed description thereof is omitted.

  FIG. 8 is a flowchart showing the operation of the control signal generator 18A.

  Constants F_INI and F_FIN and the variable f in the flowchart of FIG. 8 are the same as those in FIG. The flowchart of FIG. 8 covers the processing of B1(f) and B2(f) for one frame (f = F_INI to F_FIN = 0 to 1023).

  First, in the control signal generator 18A, f is initialized to F_INI (= 0) (S401).

  Next, the control signal generation unit 18A determines whether f is equal to or less than F_FIN (S402). If so, it proceeds to the processing of step S403 described later; otherwise (f > F_FIN), the processing for the section ends.

  When it is determined in step S402 that f is equal to or less than F_FIN, the reliability determination unit 182 determines, based on the input signal X1(f), whether the frequency component is a high reliability component, and supplies the determination result to the control signal update unit 183A. The control signal update unit 183A then refers to the determination result for the section by the non-target sound section and interference sound section determination unit 186 (S403); if the section is determined to be a "non-target sound section and interference sound section", the operation proceeds to step S404 described later, and otherwise to step S405 described later. Note that the determination processing by the non-target sound section and interference sound section determination unit 186 is preferably performed for each section (frame) rather than for each frequency component (that is, once per section). When the section corresponds to a "non-target sound section and interference sound section", the control signal update unit 183A supplies the determination result (1 or 0) received from the reliability determination unit 182, as the control signal C[f] for the frequency component, to the target sound selection unit 15 via the control signal transmission unit 185 (S404).

  On the other hand, when the section is not a "non-target sound section and interference sound section", the control signal update unit 183A rejects the reliability determination result received from the reliability determination unit 182 and supplies the control signal C[f] = 0 to the target sound selection unit 15 (S405).

  When the control signal C[f] has been supplied to the target sound selection unit 15 in step S404 or S405, the control signal generation unit 18A increments the variable f (f++, that is, f = f + 1) (S406) and resumes the operation from step S402.

  As described above, in the control signal generation unit 18A, the control signal C [f] is generated for each frequency component and supplied to the target sound selection unit 15.
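
A minimal sketch of this per-frame control signal generation (steps S403 to S405) follows; the function name, the representation of the two section decisions as booleans, and the per-frequency reliability results as a list are assumptions made for this illustration.

```python
def generate_control_signal(reliability, is_non_target_section, is_interference_section):
    """Sketch of the control signal generation unit 18A for one frame.

    reliability             : per-frequency sequence of 1 (high reliability) or 0
    is_non_target_section   : True if the frame was judged a non-target sound section
    is_interference_section : True if the frame was judged an interference sound section
    Returns the control signal C[f] for every frequency component of the frame.
    """
    if is_non_target_section and is_interference_section:
        # S404: pass the reliability determination result through as C[f].
        return [int(r) for r in reliability]
    # S405: otherwise reject the reliability result and force C[f] = 0.
    return [0 for _ in reliability]
```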

(B-3) Effects of Second Embodiment According to the second embodiment, the following effects can be obtained in addition to the effects of the first embodiment.

  The sound source separation device 10A of the second embodiment detects the sections in which the apparatus user (speaker) is not speaking (non-target sound sections) and estimates the arrival direction of the interference sound only within such sections. This eliminates the determination errors in the target sound selection unit 15 that occur when the target sound and an interference sound are present simultaneously. As a result, the sound source separation device 10A of the second embodiment can suppress degradation of sound quality after separation processing more effectively than the first embodiment.

(C) Other Embodiments The present invention is not limited to the above-described embodiments, and may include modified embodiments as exemplified below.

(C-1) In each of the above embodiments, the reliability determination unit performs the reliability determination for all frequency components in the frame, but the reliability determination may instead be performed only for some frequency components. A modified example for that case is described below.

  Since the amplitude of the frequency characteristic of sound decreases as the frequency increases, higher frequency components are often buried in the background sound, as shown in FIG. 9, and their reliability is generally low. In addition, a microphone array has the fundamental limitation that, by the spatial sampling theorem, frequency components above a boundary frequency (which depends on the microphone spacing) cannot reproduce the actual acoustic characteristics, so such components cannot be said to be reliable enough to contribute to the determination.

  Therefore, in the sound source separation device of each embodiment described above, the operation of the control signal generation unit may be stopped for components at or above a threshold frequency (hereinafter referred to as the "threshold Tf"), and the result obtained for a reliable component in the same frame may be applied to them instead. This reduces the amount of calculation in the control signal generation unit. Specifically, for example, a "calculation execution determination unit" that performs the above processing may be provided in the control signal generation unit so that the control signal generation calculation is executed when the frequency is smaller than the threshold Tf and stopped otherwise.

  The value to be applied as the threshold value Tf is not limited, but for example, a threshold value corresponding to the background sound level or a threshold value calculated from the spatial sampling theorem may be applied. Note that when the threshold is calculated by the spatial sampling theorem, the following equation (14) may be used. In the following equation (14), l is the distance between microphones, and c indicates the speed of sound.

Tf = c / (2l)   (14)
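
As a worked illustration of equation (14) combined with the "calculation execution determination" described above, the sketch below computes Tf and gates the control signal generation on it; the microphone spacing of 0.05 m and the sound speed of 340 m/s are example values chosen for the illustration, not values given in this document.

```python
def spatial_sampling_threshold(mic_distance_m, sound_speed_m_s=340.0):
    """Equation (14): Tf = c / (2 * l)."""
    return sound_speed_m_s / (2.0 * mic_distance_m)

def should_run_control_signal_calc(freq_hz, mic_distance_m=0.05):
    """Sketch of the 'calculation execution determination unit': run the
    control signal generation calculation only below the threshold Tf."""
    return freq_hz < spatial_sampling_threshold(mic_distance_m)

# Example: with l = 0.05 m and c = 340 m/s, Tf = 340 / (2 * 0.05) = 3400 Hz,
# so components at 3400 Hz and above would reuse the result of a reliable
# lower-frequency component in the same frame instead of running the calculation.
```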
(C-2) In the sound source separation apparatus of each embodiment described above, a part of the processing performed in the frequency domain may be performed in the time domain.

(C-3) The determination thresholds used in the "interference sound section determination unit" and the "reliability determination unit" of the first embodiment, and the determination threshold used in the "non-target sound section and interference sound section determination unit" of the second embodiment, need not be fixed values and may be changed adaptively. For example, a different value may be applied for each frequency as each determination threshold.

(C-4) In each of the above embodiments, the input signal supplied to the sound source separation device has been described as being captured by microphones and subjected to analog-to-digital conversion. However, the microphones may be omitted and the input signal may be supplied by other means; for example, it may be read from a recording medium or the like, or received by communication from another device. That is, as long as X1(f) and X2(f) can be held, the sound source separation device 10 may have a configuration in which the microphones and the FFT unit are omitted.

  Also, the output format of the signal of the sound source separation device is not limited. For example, the IFFT unit may be omitted and the signal expressed in the frequency domain may be output as it is.

(C-5) In the first embodiment, the generation of the control signal C[f] reflects both the determination result of the reliability determination unit and the determination result of the interference sound section determination unit, but the control signal C[f] may instead be generated based only on the determination result of the reliability determination unit. For example, the control signal may be set to C[f] = 1 when the reliability determination unit judges the component to be a high reliability component, and C[f] = 0 otherwise. In this case, it is desirable that the reliability determination unit generate the control signal with reference to B3(f) instead of X1(f), because the influence of the target sound can then be removed.
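
A minimal sketch of this modification follows; because the reliability test itself is not specified in this passage, the simple magnitude threshold on B3(f) used below is a placeholder assumption, as are the function and parameter names.

```python
import numpy as np

def control_signal_reliability_only(B3, reliability_threshold):
    """Sketch of modification (C-5): C[f] = 1 for high-reliability components, else 0.

    B3                    : noise signal B3(f) for one frame (complex spectrum)
    reliability_threshold : placeholder magnitude threshold for the reliability test
    """
    return (np.abs(B3) >= reliability_threshold).astype(int)
```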

(C-6) The target sound selection unit 15 of each of the above embodiments selects either B1(f) or B2(f) for each frequency component in the frame to generate the target sound signal A(f), but whether to adopt B1(f) or B2(f) as the target sound signal A(f) may instead be determined in units of frames.

  For example, for a frame in an interference sound section that contains a high reliability component, the result selected by the minimum value extraction unit 153 for that high reliability component may be applied to all frequency components. Alternatively, for such a frame, the result selected by the minimum value extraction unit 153 (either B1(f) or B2(f)) for any one high reliability component (for example, the component with the largest value of |X1(f)| or |X2(f)|) may be applied to all frequency components. Further, for example, the results selected by the minimum value extraction unit 153 for each of the high reliability components in a frame of an interference sound section containing high reliability components may be tabulated, and the selection result (B1(f) or B2(f)) chosen most often may be applied to all frequency components. In these cases, for a frame in a non-interference sound section, the selection result of another frame of an interference sound section containing a high reliability component (for example, the most recent such frame) may be applied as it is.
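
The tabulation variant described above can be sketched as follows; the smaller-magnitude criterion used for the minimum value extraction, the majority-vote tie-break, and the carry-over of the latest applicable frame's choice are assumptions made for this illustration.

```python
import numpy as np

def select_per_frame(B1, B2, high_reliability_mask, last_choice="1"):
    """Sketch of frame-unit selection (modification (C-6), tabulation variant).

    B1, B2                : complex spectra for one frame, blind spots "1" and "2"
    high_reliability_mask : boolean array marking high reliability components
    last_choice           : choice carried over from the latest applicable frame
    Returns (A, choice) where A is the frame's target sound signal.
    """
    votes = {"1": 0, "2": 0}
    for f in np.where(high_reliability_mask)[0]:
        # Minimum value extraction per high reliability component
        # (smaller magnitude assumed to mean stronger interference suppression).
        votes["1" if abs(B1[f]) <= abs(B2[f]) else "2"] += 1
    if votes["1"] or votes["2"]:
        choice = "1" if votes["1"] >= votes["2"] else "2"
    else:
        # Non-interference frame / no reliable component: reuse the latest choice.
        choice = last_choice
    A = B1 if choice == "1" else B2
    return A, choice
```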

  DESCRIPTION OF SYMBOLS: 10 ... sound source separation device; m1, m2 ... microphones; 11 ... FFT unit; 12 ... first directivity forming unit; 13 ... second directivity forming unit; 14 ... third directivity forming unit; 15 ... target sound selection unit; 151 ... acoustic signal and control signal receiving unit; 152 ... control switching unit; 153 ... minimum value extraction unit; 154 ... blind spot azimuth storage unit; 155 ... blind spot azimuth reference and signal selection unit; 157 ... target sound signal transmission unit; 16 ... frequency subtraction unit; 17 ... IFFT unit; 18 ... control signal generation unit; 181 ... interference sound section determination unit; 182 ... reliability determination unit; 183 ... control signal update unit; 185 ... control signal transmission unit.

Claims (9)

  1. In the sound source separation device that performs sound source separation processing that separates the target sound from noise that may include interference sound in addition to background sound from the input signal,
    A target sound dominant spectrum candidate forming means for performing, on the spectra of the received signals of two microphones among a plurality of microphones arranged at intervals, a process of forming a blind spot in a direction other than the expected arrival direction from which the target sound is supposed to arrive, thereby forming a plurality of target sound dominant spectrum candidates in which the target sound component is dominant;
    Noise dominant spectrum forming means for performing a process of forming a blind spot in a direction within a predetermined range including the intended arrival direction of the target sound with respect to the spectrum of the received sound signal to form a noise dominant spectrum in which a noise component is dominant; ,
    Reliability determination for determining the reliability of the determination result when performing the interference sound determination for determining whether or not the noise component is included in the section for at least one frequency component of each section of the received sound signal Means,
    For each section of the received sound signal, any one of the target sound dominant spectrum candidates is selected to form a target sound dominant spectrum, and at least the determination result of the reliability determination means related to the section is used. Target sound selection means for applying the selection processing method determined in this way to the selection processing of the section;
    A sound source separation device comprising: separation means for separating the noise component and the target sound component from the received signal using the noise spectrum and the target sound dominant spectrum.
  2. The sound source separation device according to claim 1, wherein the target sound selecting means selects one of the target sound dominant spectrum candidates for each frequency component to form a target sound dominant spectrum.
  3. The sound source separation device, wherein the target sound selection means includes a first selection processing unit that performs selection processing for frequency components for which the reliability of the interference sound determination is determined by the reliability determination means to be equal to or higher than a predetermined value, and a second selection processing unit that performs selection processing for frequency components not processed by the first selection processing unit, taking into account processing results previously produced by the first selection processing unit.
  4. The sound source separation device according to claim 2, further comprising interference sound section determination means for performing the interference sound determination for each section of the received signal,
    wherein the target sound selection means includes a first selection processing unit that performs selection processing for frequency components for which the reliability of the interference sound determination is determined by the reliability determination means to be equal to or higher than a predetermined value, within a section determined by the interference sound section determination means to be an interference sound section including interference sound, and a second selection processing unit that performs selection processing for frequency components not processed by the first selection processing unit, taking into account processing results previously produced by the first selection processing unit.
  5. The sound source separation device according to claim 4, wherein the target sound selection means performs selection processing using the second selection processing unit for the frequency components of a section determined by the interference sound section determination means to be a non-interference sound section that does not include interference sound.
  6. The sound source separation device according to claim 2, further comprising target sound section determination means for determining, for each section of the received signal, whether or not the target sound component is included,
    wherein the target sound selection means includes a first selection processing unit that performs selection processing for frequency components for which the reliability of the interference sound determination is determined by the reliability determination means to be equal to or higher than a predetermined value, within a section that is determined by the interference sound section determination means to be an interference sound section including interference sound and is determined by the target sound section determination means to be a non-target sound section not including the target sound component, and a second selection processing unit that performs selection processing for frequency components not processed by the first selection processing unit, taking into account processing results previously produced by the first selection processing unit.
  7. The sound source separation device according to any one of claims 3 to 6, wherein the target sound selection means applies, to selection processing for frequency components at or above a predetermined frequency, the selection result of a frequency component below the predetermined frequency within the same section.
  8. A program for a computer installed in a sound source separation device that performs sound source separation processing for separating a target sound from noise, which may include interference sound in addition to background sound, from an input signal, the program causing the computer to function as:
    target sound dominant spectrum candidate forming means for performing, on the spectra of the received signals of two microphones among a plurality of microphones arranged at intervals, a process of forming a blind spot in a direction other than the expected arrival direction from which the target sound is supposed to arrive, thereby forming a plurality of target sound dominant spectrum candidates in which the target sound component is dominant;
    Noise dominant spectrum forming means for performing a process of forming a blind spot in a direction within a predetermined range including the intended arrival direction of the target sound with respect to the spectrum of the received sound signal to form a noise dominant spectrum in which a noise component is dominant; ,
    Reliability determination for determining the reliability of the determination result when performing the interference sound determination for determining whether or not the noise component is included in the section for at least one frequency component of each section of the received sound signal Means,
    For each section of the received sound signal, any one of the target sound dominant spectrum candidates is selected to form a target sound dominant spectrum, and at least the determination result of the reliability determination means related to the section is used. Target sound selection means for applying the selection processing method determined in this way to the selection processing of the section;
    separation means for separating the noise component and the target sound component from the received signal using the noise spectrum and the target sound dominant spectrum.
  9. In the sound source separation method for performing sound source separation processing for separating the target sound from the noise that may include interference sound in addition to the background sound from the input signal,
    the method using target sound dominant spectrum candidate forming means, noise dominant spectrum forming means, reliability determination means, target sound selection means, and separation means, wherein the target sound dominant spectrum candidate forming means performs, on the spectra of the received signals of two microphones among a plurality of microphones arranged at intervals, a process of forming a blind spot in a direction other than the expected arrival direction from which the target sound is supposed to arrive, thereby forming a plurality of target sound dominant spectrum candidates in which the target sound component is dominant;
    the noise dominant spectrum forming means performs, on the spectrum of the received signal, a process of forming a blind spot in a direction within a predetermined range including the expected arrival direction of the target sound, thereby forming a noise dominant spectrum in which a noise component is dominant;
    the reliability determination means determines, for at least one frequency component of each section of the received signal, the reliability of the determination result when interference sound determination is performed to determine whether or not an interference sound component is included;
    the target sound selection means selects, for each section of the received signal, one of the target sound dominant spectrum candidates to form a target sound dominant spectrum, applying to the selection processing of the section a selection processing method determined using at least the determination result of the reliability determination means for that section; and
    The sound source separation method, wherein the separation means separates the noise component and the target sound component from the received signal using the noise spectrum and the target sound dominant spectrum.
JP2011079026A 2011-03-31 2011-03-31 sound source separation apparatus, program and method Active JP5772151B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2011079026A JP5772151B2 (en) 2011-03-31 2011-03-31 sound source separation apparatus, program and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2011079026A JP5772151B2 (en) 2011-03-31 2011-03-31 sound source separation apparatus, program and method

Publications (2)

Publication Number Publication Date
JP2012215606A true JP2012215606A (en) 2012-11-08
JP5772151B2 JP5772151B2 (en) 2015-09-02

Family

ID=47268443

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2011079026A Active JP5772151B2 (en) 2011-03-31 2011-03-31 sound source separation apparatus, program and method

Country Status (1)

Country Link
JP (1) JP5772151B2 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014128013A (en) * 2012-12-27 2014-07-07 Canon Inc Noise rejection device and control method thereof
JP2014168188A (en) * 2013-02-28 2014-09-11 Fujitsu Ltd Microphone sensitivity correction device, method, program, and noise suppression device
WO2015125567A1 (en) * 2014-02-20 2015-08-27 ソニー株式会社 Sound signal processing device, sound signal processing method, and program
JP2016051081A (en) * 2014-08-29 2016-04-11 本田技研工業株式会社 Device and method of sound source separation
JP2018506228A (en) * 2015-01-12 2018-03-01 ユウトウ・テクノロジー(ハンジョウ)・カンパニー・リミテッド Multichannel digital microphone
US10049157B2 (en) 2014-09-04 2018-08-14 Aisin Seiki Kabushiki Kaisha Siren signal source detection, recognition and localization
WO2019116889A1 (en) * 2017-12-12 2019-06-20 ソニー株式会社 Signal processing device and method, learning device and method, and program

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006197552A (en) * 2004-12-17 2006-07-27 Univ Waseda Sound source separation system and method, and acoustic signal acquisition device
JP2009020471A (en) * 2007-07-13 2009-01-29 Yamaha Corp Sound processor and program
JP2009086055A (en) * 2007-09-27 2009-04-23 Sony Corp Sound source direction detecting apparatus, sound source direction detecting method, and sound source direction detecting camera
JP2010217552A (en) * 2009-03-17 2010-09-30 Yamaha Corp Sound processing device and program

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006197552A (en) * 2004-12-17 2006-07-27 Univ Waseda Sound source separation system and method, and acoustic signal acquisition device
JP2009020471A (en) * 2007-07-13 2009-01-29 Yamaha Corp Sound processor and program
JP2009086055A (en) * 2007-09-27 2009-04-23 Sony Corp Sound source direction detecting apparatus, sound source direction detecting method, and sound source direction detecting camera
JP2010217552A (en) * 2009-03-17 2010-09-30 Yamaha Corp Sound processing device and program

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CSNG200700044225; Kazunori Kobayashi et al.: "Adaptive Array for Specific-Direction Sound Pickup Using Sound Source Direction Estimation", Proceedings of the 2004 Autumn Meeting of the Acoustical Society of Japan -I-, 2004-09, pp. 629-630 *
CSNJ201110039309; Mitsunori Mizumachi: "Noise Reduction by Adaptive Integration of Spatial Filtering and Frequency Filtering", Proceedings of the 2011 Spring Meeting of the Acoustical Society of Japan (CD-ROM), 2011-03, pp. 681-682 *
JPN6014035570; Kazunori Kobayashi et al.: "Adaptive Array for Specific-Direction Sound Pickup Using Sound Source Direction Estimation", Proceedings of the 2004 Autumn Meeting of the Acoustical Society of Japan -I-, 2004-09, pp. 629-630 *
JPN6014035571; Mitsunori Mizumachi: "Noise Reduction by Adaptive Integration of Spatial Filtering and Frequency Filtering", Proceedings of the 2011 Spring Meeting of the Acoustical Society of Japan (CD-ROM), 2011-03, pp. 681-682 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014128013A (en) * 2012-12-27 2014-07-07 Canon Inc Noise rejection device and control method thereof
JP2014168188A (en) * 2013-02-28 2014-09-11 Fujitsu Ltd Microphone sensitivity correction device, method, program, and noise suppression device
WO2015125567A1 (en) * 2014-02-20 2015-08-27 ソニー株式会社 Sound signal processing device, sound signal processing method, and program
US10013998B2 (en) 2014-02-20 2018-07-03 Sony Corporation Sound signal processing device and sound signal processing method
JP2016051081A (en) * 2014-08-29 2016-04-11 本田技研工業株式会社 Device and method of sound source separation
US10049157B2 (en) 2014-09-04 2018-08-14 Aisin Seiki Kabushiki Kaisha Siren signal source detection, recognition and localization
JP2018506228A (en) * 2015-01-12 2018-03-01 ユウトウ・テクノロジー(ハンジョウ)・カンパニー・リミテッド Multichannel digital microphone
WO2019116889A1 (en) * 2017-12-12 2019-06-20 ソニー株式会社 Signal processing device and method, learning device and method, and program

Also Published As

Publication number Publication date
JP5772151B2 (en) 2015-09-02

Similar Documents

Publication Publication Date Title
US8660281B2 (en) Method and system for a multi-microphone noise reduction
JP4163294B2 (en) Noise suppression processing apparatus and noise suppression processing method
CN1664610B (en) The method of using a microphone array bunching
US7464029B2 (en) Robust separation of speech signals in a noisy environment
US8675890B2 (en) Speaker localization
DK2701145T3 (en) Noise cancellation for use with noise reduction and echo cancellation in personal communication
JP4588966B2 (en) Method for noise reduction
JP5762956B2 (en) System and method for providing noise suppression utilizing nulling denoising
US8867759B2 (en) System and method for utilizing inter-microphone level differences for speech enhancement
US9438992B2 (en) Multi-microphone robust noise suppression
JP2008512888A (en) Telephone device with improved noise suppression
JP5855571B2 (en) Audio zoom
RU2483439C2 (en) Robust two microphone noise suppression system
JP2007535853A (en) Adaptive beamformer, sidelobe canceller, hands-free communication device
KR20130035990A (en) Enhanced blind source separation algorithm for highly correlated mixtures
US8180067B2 (en) System for selectively extracting components of an audio input signal
JP4225430B2 (en) Sound source separation device, voice recognition device, mobile phone, sound source separation method, and program
Yousefian et al. A dual-microphone speech enhancement algorithm based on the coherence function
KR20090017435A (en) Noise reduction by combined beamforming and post-filtering
KR101339592B1 (en) Sound source separator device, sound source separator method, and computer readable recording medium having recorded program
JP5675848B2 (en) Adaptive noise suppression by level cue
JP4378170B2 (en) Acoustic device, system and method based on cardioid beam with desired zero point
JP3484112B2 (en) Noise component suppression processing apparatus and noise component suppression processing method
JP2004187283A (en) Microphone unit and reproducing apparatus
JP4162604B2 (en) Noise suppression device and noise suppression method

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20131115

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20140728

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20140826

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20150317

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20150515

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20150602

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20150615

R150 Certificate of patent or registration of utility model

Ref document number: 5772151

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150