WO2019176153A1

WO2019176153A1 - Sound pickup device, storage medium, and method

Info

Publication number: WO2019176153A1
Application number: PCT/JP2018/039209
Authority: WO
Inventors: 隆矢頭
Original assignee: 沖電気工業株式会社
Priority date: 2018-03-12
Filing date: 2018-10-22
Publication date: 2019-09-19
Also published as: JP7067146B2; JP2019161400A

Abstract

Provided is a sound pickup device that performs area sound pickup efficiently and effectively. The sound pickup device comprises a sound pickup unit that picks up a target area sound in a target area on the basis of input signals input from each of two microphone arrays. The two microphone arrays correspond to any combination of two, other than mutually parallel ones, of the sides or diagonals of an N-gon (N is an integer of 3 or more) constituting a microphone array unit in which N microphones are arranged at the respective vertices of the N-gon.

Description

Sound collecting device, storage medium and method

The present invention relates to a sound collection device, a storage medium, and a method, and can be applied to, for example, a voice communication system used in a noisy environment.

When a voice communication system or a voice recognition application system is used in a noisy environment, ambient noise mixed with the desired target voice hinders good communication and lowers the voice recognition rate. Conventionally, in such an environment where a plurality of sound sources exist, a beamformer (hereinafter referred to as “BF”; see Patent Document 2) using a microphone array has been used as a technique for separating and collecting only sound from a specific direction, thereby obtaining the desired target sound without mixing in unnecessary sound. BF is a technique for forming directivity by using the time difference between signals reaching each microphone. However, with BF alone, when another sound source exists around the area targeted for sound collection (hereinafter referred to as “target area”), it is difficult to pick up only the sound existing in the target area (hereinafter referred to as “target area sound”). For this reason, area sound pickup methods that pick up the target area sound using a plurality of microphone arrays have been proposed in the prior art.

FIG. 13 is an explanatory diagram showing a process of collecting the target area sound from a sound source in the target area using the two microphone arrays MA1 and MA2. FIG. 13A is an explanatory diagram showing a configuration example of the microphone arrays MA1 and MA2. FIGS. 13B and 13C are diagrams (graph-format image diagrams) showing, in the frequency domain, the BF outputs of the microphone arrays MA1 and MA2 shown in FIG. 13A, respectively. In FIG. 13, each of the microphone arrays MA1 and MA2 is composed of two microphones MC1 and MC2.

In conventional area sound collection, as shown in FIG. 13A, the directivities of the microphone arrays MA1 and MA2 are made to cross, from different directions, at the area where sound collection is desired (target area). In the state of FIG. 13A, the directivity of each of the microphone arrays MA1 and MA2 captures not only the sound existing in the target area (target area sound) but also noise in the target area direction (non-target area sound). However, as shown in FIGS. 13B and 13C, when the BF outputs of the microphone arrays MA1 and MA2 are compared in the frequency domain, the target area sound component is included in both outputs, whereas the non-target area sound component differs for each microphone array. The conventional area sound collection technique uses this property: by suppressing components other than those commonly included in the BF outputs of the two microphone arrays MA1 and MA2, only the target area sound can be extracted.

Patent Document 1: JP 2014-072708 A; Patent Document 2: JP 2005-195955 A

Incidentally, mobile communication devices such as smart phones are a typical example of voice communication in a noisy environment. In recent years, the use of voice recognition applications has spread, and voice input is often performed while viewing the screen. In this case, since the smart phone is held away from the mouth, it is more susceptible to external noise. In addition, with a handset provided in an emergency vehicle, loud noise such as a siren is an obstacle that prevents the transmission of urgent information. In such usage environments, area sound collection technology is expected to be an effective solution. That is, two microphone arrays are installed on the front of the smart phone or around the mouthpiece of the handset, and the directivities of the two microphone arrays are crossed in front of the mouthpiece so that area sound collection functions. This makes it possible to eliminate ambient noise and accurately transmit only the voice of the speaker.

However, it is not always easy to mount many microphone arrays in the limited space of these devices in order to realize area sound collection.

Therefore, a sound collection device, a storage medium, and a method that can perform area sound collection efficiently and effectively are desired.

A sound collection device according to a first aspect of the present invention includes a sound collection unit that collects a target area sound in a target area based on input signals input from each of two microphone arrays corresponding to any combination of two, other than those parallel to each other, of the sides and/or diagonals of an N-gon constituting a microphone array unit in which N (N is an integer of 3 or more) microphones are arranged at the positions of the respective vertices of the N-gon.

A computer-readable non-transitory storage medium according to a second aspect of the present invention stores a sound collection program that causes a computer to function as sound collection means for collecting a target area sound in a target area based on input signals input from each of two microphone arrays corresponding to any combination of two, other than those parallel to each other, of the sides and/or diagonals of an N-gon constituting a microphone array unit in which N (N is an integer of 3 or more) microphones are arranged at the positions of the respective vertices of the N-gon.

According to a third aspect of the present invention, in a sound collection method performed by a sound collection device, a target area sound in a target area is collected based on input signals input from each of two microphone arrays corresponding to any combination of two, other than those parallel to each other, of the sides and/or diagonals of an N-gon constituting a microphone array unit in which N (N is an integer of 3 or more) microphones are arranged at the positions of the respective vertices of the N-gon.

According to the present invention, it is possible to provide a sound collection device that performs area sound collection efficiently and effectively.

A block diagram showing the configuration of each device according to the first embodiment (including the functional configuration of the sound collection unit (sound collection device) according to the first embodiment). A plan view showing the external appearance of the communication device provided with the sound collection unit (sound collection device) according to the first embodiment. An explanatory diagram (image diagram) of a microphone array arrangement in which the BF directivities of two microphone arrays are directed at a target area from different directions. An explanatory diagram (image diagram) of the microphone array arrangement when collecting sound from an area close to the microphone arrays. An explanatory diagram (image diagram) of the microphone array arrangement when collecting sound from an area close to the microphone arrays. An explanatory diagram (image diagram) of the microphone array arrangement when collecting sound from an area close to the microphone arrays. An explanatory diagram (image diagram) outlining the area sound collection process using three microphones. An explanatory diagram (image diagram) showing a configuration example of the microphone arrays formed by three microphones. An explanatory diagram (image diagram) showing the area sound collection process corresponding to each combination of the microphone arrays formed by three microphones. An explanatory diagram (image diagram) showing the area sound collection process corresponding to each combination of the microphone arrays formed by three microphones. An explanatory diagram (image diagram) showing the area sound collection process corresponding to each combination of the microphone arrays formed by three microphones.
A block diagram showing the configuration of the subtraction-type BF when the number of microphones is two. A diagram showing the directivity characteristic formed by the subtraction-type BF using two microphones. A diagram showing the directivity characteristic formed by the subtraction-type BF using two microphones. A block diagram showing the configuration of each device according to the second embodiment (including the functional configuration of the sound collection unit (sound collection device) according to the second embodiment). A diagram showing the configuration of the polygonal (pentagon, N = 5) microphone array unit according to the second embodiment. An explanatory diagram (image diagram) showing an example of the area sound collection process using the polygonal (pentagon, N = 5) microphone array unit according to the second embodiment. An explanatory diagram showing a configuration example in which the beamformer (BF) directivities of two microphone arrays are directed at a target area from different directions in a conventional sound collection device.

(A) First Embodiment

Hereinafter, a first embodiment of a sound collection device, a program, and a method according to the present invention will be described in detail with reference to the drawings. In this embodiment, an example in which the sound collection device, program, and method of the present invention are applied to a sound collection unit will be described.

First, the basic principle of area sound collection processing using the microphone array in the first embodiment will be described with reference to FIGS.

As shown in FIG. 3, at least two microphone arrays are usually required to realize area sound collection.

In FIG. 3, the directivities of the two microphone arrays MA100 and MA200 cross at the sound collection area (target area) from different directions. In FIG. 3, the microphone arrays MA100 and MA200 are each composed of two microphones ch1 and ch2. Furthermore, in FIG. 3, the directivity of the microphone array MA100 is illustrated by a broken line, and the directivity of the microphone array MA200 is illustrated by a one-dot chain line. In FIG. 3, the sound collection area where the directivities of the two microphone arrays MA100 and MA200 intersect is hatched.

On the other hand, in a device including a handset, a smart phone, or other transmitter (a device including a microphone that captures the voice of a speaker; hereinafter also referred to as a “sending device”), the position of the speaker's mouth is limited to a narrow area very close to the mouthpiece (the portion where the microphone is disposed), for example within several centimeters.

Therefore, when the two microphone arrays MA100 and MA200 are used in a sending device such as a handset or a smart phone to perform area sound collection at the speaker's mouth, the two microphone arrays MA100 and MA200 must be arranged close to each other, as shown in FIG. 4A. When the distance between the two microphone arrays MA100 and MA200 is reduced to the limit, as shown in FIG. 4B, some microphones of the two arrays (in FIG. 4B, the microphones ch1 disposed on the inner side) almost overlap. Therefore, in the state of FIG. 4B, the microphone ch1 of the microphone array MA100 and the microphone ch1 of the microphone array MA200 can be shared, and the arrangement can be replaced with a configuration of three microphones ch1 to ch3 as shown in FIG. 4C.

That is, in a sending device that needs to perform area sound collection in a narrow area closest to its mouthpiece, a microphone array including a minimum of three microphones ch1 to ch3, as shown in FIG. 4C, can realize area sound collection of speech uttered from the speaker's mouth.

Therefore, in a sending device that needs to perform area sound collection in a narrow area closest to the device, area sound collection can be realized using three microphones ch1 to ch3 as shown in FIG. In FIG. 5, the three microphones ch1 to ch3 are arranged so that the line segment L101 connecting the microphone ch1 and the microphone ch2 and the line segment L102 connecting the microphone ch2 and the microphone ch3 form an angle (that is, ∠ ≠ 180°). Further, in FIG. 5, a first microphone array consisting of the pair of microphones ch1 and ch2 at both ends of the line segment L101 and a second microphone array consisting of the pair of microphones ch2 and ch3 at both ends of the line segment L102 are configured, and the directivities of the two microphone arrays are crossed toward the interior-angle side (sound collection area). In FIG. 5, the directivity of the first microphone array is indicated by a broken line, and the directivity of the second microphone array is indicated by a one-dot chain line. Further, in FIG. 5, the region where the directivities of the first microphone array and the second microphone array overlap is hatched.

With the microphones ch1 to ch3, as shown in FIG. 6, a maximum of three microphone arrays (three microphone arrays with different directivity directions) can be set by combining the microphones. As shown in FIG. 6, these are a microphone array MA301 consisting of the pair of microphones ch1 and ch2, a microphone array MA302 consisting of the pair of microphones ch2 and ch3, and a microphone array MA303 consisting of the pair of microphones ch3 and ch1.

With the microphones ch1 to ch3, as shown in FIG. 7, area sound collection can be performed for each combination (three combinations) of the three microphone arrays MA301, MA302, and MA303.

In FIG. 7A, the directivity of the microphone array MA301 is illustrated by a one-dot chain line, and the directivity of the microphone array MA302 is illustrated by a two-dot chain line. In FIG. 7B, the directivity of the microphone array MA302 is illustrated by a one-dot chain line, and the directivity of the microphone array MA303 is illustrated by a two-dot chain line. In FIG. 7C, the directivity of the microphone array MA301 is illustrated by a one-dot chain line, and the directivity of the microphone array MA303 is illustrated by a two-dot chain line. Furthermore, in FIG. 7A, the sound collection area A301 corresponding to the combination of the microphone arrays MA301 and MA302 is hatched. In FIG. 7B, the sound collection area A302 corresponding to the combination of the microphone arrays MA302 and MA303 is hatched. In FIG. 7C, the sound collection area A303 corresponding to the combination of the microphone arrays MA301 and MA303 is hatched.

As shown in FIG. 7, with the microphones ch1 to ch3, every pair of the microphone arrays MA301 to MA303 forms an angle (the lines connecting the positions of the two microphones constituting each microphone array are not parallel). By crossing the directivities of the two arrays in a pair, a different area sound collection (area sound collection in a different area) can therefore be achieved for each combination.
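The pairing described above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and variable names are assumptions, while the array and microphone labels follow FIGS. 6 and 7.

```python
from itertools import combinations

# Microphone pairs forming each two-microphone array, as labelled in FIG. 6.
arrays = {
    "MA301": ("ch1", "ch2"),
    "MA302": ("ch2", "ch3"),
    "MA303": ("ch3", "ch1"),
}

# Every unordered pair of arrays is a candidate for one area sound collection.
array_pairs = list(combinations(sorted(arrays), 2))

for first, second in array_pairs:
    # With three microphones, the two arrays in a pair always share one microphone.
    shared = set(arrays[first]) & set(arrays[second])
    print(first, second, "share", sorted(shared))
```

Three microphones therefore yield three arrays and three array pairs, one pair per sound collection area in FIGS. 7A to 7C.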

As shown in FIGS. 6 and 7, the sound collection unit (sound collection device) of the first embodiment, described later, is configured to perform area sound collection using three microphones arranged at the vertices of a triangle.

(A-1) Configuration of First Embodiment

FIG. 1 is a block diagram showing the configuration of each device according to the first embodiment.

FIG. 1 illustrates a communication device 100 including the sound collection unit 110 according to the first embodiment and a communication device 200. In FIG. 1, the communication devices 100 and 200 can communicate with each other via the communication path P.

The communication device 100 is a device that captures and collects voice (sound) uttered by the speaker U1, and transmits voice data of the collected voice to the communication device 200 via the communication path P. The communication device 200 is a device that outputs a sound based on the sound data received from the communication device 100 to the listener U2.

The speaker U1 corresponds to, for example, a worker at a disaster relief site (that is, a worker who works in a noisy environment), and the listener U2 corresponds to, for example, a person at a remote location (for example, a person in charge of command at the command center for the disaster relief site).

Next, an outline of the configuration of the communication apparatus 100 will be described with reference to FIGS.

The communication device 100 includes a microphone array unit 1 including three microphones MC1 to MC3, a sound collection unit 110 that collects speech uttered by the speaker U1 based on the acoustic signals captured by the microphone array unit 1, and a communication unit 120 that transmits audio data based on the sound collected by the sound collection unit 110 to the communication device 200 (communication via the communication path P).

The communication path P may be wired or wireless, and various connection means and connection configurations (network configurations) can be applied.

Although the hardware configuration of the communication device 100 is not limited, in the example of this embodiment, as shown in FIG. 2, the communication device 100 is assumed to be configured, in terms of hardware, as a smart phone (a smart phone owned by the speaker U1).

In the example of this embodiment, as shown in FIG. 1, the microphone array unit 1 is composed of three microphones MC1 to MC3 (that is, N = 3 in FIG. 1).

Next, the detailed configuration of the sound collection unit 110 will be described with reference to FIG.

The sound collection unit 110 includes a signal input unit 2, a frequency conversion unit 3, a directivity forming unit 4, and a target area sound extraction unit 5.

For example, the sound collection unit 110 may be realized by causing a computer including a processor, a memory, and the like to execute a program (including the sound collection program according to the embodiment); even in that case, it can be functionally represented as in FIG. Further, the communication device 100 may be provided with a computer-readable non-transitory storage medium that stores such a program.

Next, an outline of the configuration of the communication apparatus 200 will be described with reference to FIG.

Although the hardware configuration of the communication device 200 is not limited, for example, various telephone devices (for example, speakerphones) can be applied.

The communication device 200 includes a communication unit 210 that receives audio data from the communication device 100 via the communication path P, and a speaker 6 that outputs sound based on the audio data received by the communication unit 210 to the listener U2.

Next, the configuration of the microphone array unit 1 will be described with reference to FIG.

In the example of this embodiment, it is assumed that the microphone array section 1 has three microphones MC1 to MC3.

As shown in FIG. 2, in the example of this embodiment, the communication device 100 has a smart phone configuration, so the three microphones MC1 to MC3 are desirably arranged around the usual mouthpiece portion of the smart phone (the end opposite to the portion where the speaker SP is arranged). In other words, in the communication device 100, the three microphones MC1 to MC3 are desirably arranged around the portion that faces the mouth of the speaker U1 when the communication device 100 is in use (the portion closest to the mouth of the speaker U1). In FIG. 2, the three microphones MC1 to MC3 are arranged around the portion where the mouth of the speaker U1 is located when the speaker U1 holds the communication device 100 in a hand and presses the speaker SP to the ear (the lower portion when viewed from the direction of FIG. 2), that is, around the portion closest to the mouth of the speaker U1.

In the communication device 100 (microphone array unit 1) shown in FIG. 2, as in the microphone arrays shown in FIGS. 6 and 7, the three microphones MC1 to MC3 are arranged so that their positions (the center positions of the microphones) form the vertices of an equilateral triangle. In FIG. 2, for simplicity of explanation, the sides of the triangle formed by the microphones MC1 to MC3 have the same length (the triangle is equilateral), but this is not necessarily required.

As shown in FIG. 2, in the communication device 100 (microphone array unit 1), the microphone array formed by the pair of microphones MC1 and MC2 is hereinafter referred to as MA1, the microphone array formed by the pair of microphones MC2 and MC3 as MA2, and the microphone array formed by the pair of microphones MC3 and MC1 as MA3.

(A-2) Operation of the First Embodiment

Next, the operation of the first embodiment having the above-described configuration (the sound collection method according to the embodiment) will be described.

In the communication apparatus 100, the sound collection unit 110 performs a target area sound collection process for collecting the target area sound in the target area using the acoustic signals supplied from the microphones MC1 to MC3 of the microphone array unit 1.

Hereinafter, the operation inside the sound collection unit 110 constituting the communication apparatus 100 will be mainly described.

The signal input unit 2 converts an acoustic signal collected by each of the microphones MC1 to MC3 from an analog signal to a digital signal, and supplies it to the frequency conversion unit 3. Thereafter, the frequency conversion unit 3 converts the microphone signal from the time domain to the frequency domain using, for example, fast Fourier transform. The directivity forming unit 4 forms directivity by BF.
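As a rough illustration of the time-to-frequency conversion, the sketch below transforms one frame per microphone. A naive O(N²) discrete Fourier transform is used here in place of a fast Fourier transform purely to keep the example free of dependencies; the function name `dft`, the frame length, and the test signal are assumptions, not part of the embodiment.

```python
import cmath
import math

def dft(frame):
    """Direct DFT of one real-valued frame (naive stand-in for an FFT)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# One 8-sample frame per microphone: a cosine with exactly one cycle per frame.
frame_len = 8
frames = {
    mic: [math.cos(2 * math.pi * t / frame_len) for t in range(frame_len)]
    for mic in ("MC1", "MC2", "MC3")
}

# The transform is run once per microphone, not once per microphone array.
spectra = {mic: dft(frame) for mic, frame in frames.items()}
print([round(abs(value), 6) for value in spectra["MC1"]])
```

Because the transform is applied per microphone, sharing microphones between arrays avoids repeating this step.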

Here, directivity formation by BF will be described with reference to FIGS.

BF is a technique for forming the directivity of sound collection using the time difference between signals reaching each microphone in the microphone array (see Non-Patent Document 1). The BF is roughly classified into two types, an addition type and a subtraction type. Here, a subtraction type BF that can form directivity with a small number of microphones will be described.

FIG. 8 is a block diagram showing a configuration related to the subtraction type BF300 when the number of microphones is two (MC1, MC2).

The subtraction-type BF 300 first uses the delay unit 310 to calculate the time difference with which sound existing in the target direction (hereinafter referred to as “target sound”) arrives at the microphones MC1 and MC2, and adds a delay to match the phase of the target sound. The time difference is calculated by equation (1). Here, d is the distance between the microphones MC1 and MC2, c is the speed of sound, and τ_i is the amount of delay. θ_L represents the angle from the direction perpendicular to the straight line connecting the positions of the microphones MC1 and MC2 to the target direction.
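Equation (1) is not reproduced in this text. The following is a minimal sketch, assuming it has the usual far-field form τ = d·sin(θ_L)/c implied by the definitions above. The function name and the default speed of sound of 340 m/s are assumptions made for illustration.

```python
import math

def arrival_delay(d, theta_l, c=340.0):
    """Arrival-time difference in seconds between two microphones d metres apart.

    theta_l is the angle in radians measured from the direction perpendicular
    to the line through the two microphones.
    Assumed form of equation (1): tau = d * sin(theta_l) / c.
    """
    return d * math.sin(theta_l) / c

# Sound arriving along the microphone axis gives the largest delay;
# sound arriving from the perpendicular direction gives none.
print(arrival_delay(0.03, math.pi / 2))
print(arrival_delay(0.03, 0.0))
```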

Here, when the blind spot exists in the direction of the microphone MC1 with respect to the center of the microphones MC1 and MC2, the delay unit 310 performs delay processing on the input signal x₁(t) of the microphone MC1. Thereafter, the subtractor 320 performs subtraction processing according to equation (2). The subtractor 320 can perform this subtraction processing in the frequency domain as well, in which case equation (2) becomes equation (3).
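Equations (2) and (3) are not reproduced in this text. The sketch below assumes the common subtraction-type form m(t) = x₂(t) − x₁(t − τ), that is, M(ω) = X₂(ω) − e^(−jωτ)·X₁(ω) in the frequency domain, and checks at a single frequency that a plane wave arriving from the blind-spot direction is cancelled. The plane-wave model, the function names, and the numeric values are illustrative assumptions.

```python
import cmath
import math

C = 340.0   # assumed speed of sound in m/s
D = 0.03    # assumed microphone spacing in m

def mic_spectra(omega, phi):
    """Unit plane wave from angle phi (radians from the perpendicular,
    positive toward MC1): MC2 receives it D*sin(phi)/C later than MC1."""
    lag = D * math.sin(phi) / C
    return 1.0 + 0.0j, cmath.exp(-1j * omega * lag)

def subtraction_bf(x1, x2, omega, tau):
    """Assumed equation (3): delay X1 by tau, then subtract it from X2."""
    return x2 - cmath.exp(-1j * omega * tau) * x1

omega = 2 * math.pi * 1000.0           # 1 kHz test tone
tau = D * math.sin(math.pi / 2) / C    # blind spot on the MC1 side

from_blind_spot = subtraction_bf(*mic_spectra(omega, math.pi / 2), omega, tau)
from_opposite = subtraction_bf(*mic_spectra(omega, -math.pi / 2), omega, tau)
print(abs(from_blind_spot), abs(from_opposite))
```

Under these assumptions the wave from the MC1 side is removed while the wave from the opposite side passes, which is the one-sided behaviour expected of a cardioid pattern.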

Here, when θ_L = ±π/2, the formed directivity is cardioid unidirectional as shown in FIG. 9A, and when θ_L = 0 or π, it is figure-eight bidirectional as shown in FIG. 9B. In addition, the subtractor 320 can form directivity that is strong in the direction of the bidirectional blind spot by using spectral subtraction processing (hereinafter also simply referred to as “SS”). The directivity by SS is formed over all frequencies or over a designated frequency band according to equation (4). Equation (4) uses the input signal X₁ of the microphone MC1, but the same effect can be obtained with the input signal X₂ of the microphone MC2. Here, n represents a frame number, and β represents a coefficient for adjusting the strength of SS. When the value becomes negative in the subtraction, the subtractor 320 may perform flooring processing that replaces it with 0 or with a reduced version of the original value. In this method, sound existing in directions other than the target direction (hereinafter referred to as “non-target sound”) is extracted using the bidirectional characteristic, and the target sound can be emphasized by subtracting the amplitude spectrum of the extracted non-target sound from the amplitude spectrum of the input signal.
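Equation (4) is not reproduced in this text. The sketch below assumes it takes the form |Y(n)| = |X₁(n)| − β·|M(n)| per frequency bin, with negative results handled by flooring. The function name `ss_with_flooring`, the `floor_scale` parameter, and the sample spectra are hypothetical.

```python
def ss_with_flooring(x_amp, m_amp, beta=1.0, floor_scale=0.0):
    """Spectral subtraction on amplitude spectra with flooring.

    x_amp: amplitude spectrum of the input signal (e.g. microphone MC1).
    m_amp: amplitude spectrum of the non-target sound from the bidirectional BF.
    beta: strength of the subtraction.
    floor_scale: a negative result is replaced by floor_scale * original value
                 (0.0 simply clamps it to zero).
    """
    output = []
    for x, m in zip(x_amp, m_amp):
        y = x - beta * m
        if y < 0.0:
            y = floor_scale * x  # flooring: zero or a reduced original value
        output.append(y)
    return output

x_amp = [1.0, 0.5, 0.25]
m_amp = [0.25, 0.5, 0.5]
print(ss_with_flooring(x_amp, m_amp))
print(ss_with_flooring(x_amp, m_amp, floor_scale=0.5))
```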

Incidentally, when it is desired to pick up only the target area sound existing in a specific target area, using the subtraction-type BF alone may also pick up sound from sources that lie on a line in the same direction as that area (hereinafter referred to as “non-target area sound”).

Therefore, in the following description, the target area sound is collected using the area sound collection process proposed in Patent Document 1 (in which a plurality of microphone arrays direct their directivities at the target area from different directions so that the directivities cross in the target area). Specifically, the directivity forming unit 4 illustrated in FIG. 1 may perform area sound collection processing by the following processing.

For example, the directivity forming unit 4 forms, by BF, the directivity of the microphone array MA1 composed of the microphones MC1 and MC2 and the directivity of the microphone array MA3 composed of the microphones MC1 and MC3 toward the inside of the triangle formed by the microphones MC1 to MC3. The directivities of the two microphone arrays thus cross, from different directions, in front of the mouthpiece (the area inside the triangle formed by the microphones MC1 to MC3), which is the area where sound is to be collected (target area) (see FIG. 7C).

The target area sound extraction unit 5 performs SS on the BF output data Y₁(n) and Y₂(n) of the microphone arrays MA1 and MA3 formed by the directivity forming unit 4 according to equation (5) or (6), and extracts the non-target area sounds N₁(n) and N₂(n) existing in the target area direction. In the following, the BF output data of the microphone array MA1 is denoted Y₁(n) and the BF output data of the microphone array MA3 is denoted Y₂(n). Likewise, the non-target area sound output data of the microphone array MA1 is denoted N₁(n) and that of the microphone array MA3 is denoted N₂(n).

Here, α₁ and α₂ are correction coefficients for correcting the difference in signal level caused by the difference in distance between the target area and each microphone array. They should be calculated each time by a predetermined process, which is also described in Patent Document 1. Here, for simplicity, it is assumed that the distances between the target area and the microphone arrays MA1 and MA3 are the same (α₁(n) = α₂(n) = 1), so that equations (7) and (8) are used in place of equations (5) and (6).

Thereafter, the target area sound extraction unit 5 extracts the target area sound by spectrally subtracting the non-target area sound from each BF output according to equations (9) and (10), and generates the target area emphasized sounds Z₁(n) and Z₂(n). Note that γ₁(n) and γ₂(n) are coefficients for changing the strength of SS. The target area sound extraction unit 5 may output both of the target area emphasized sounds Z₁(n) and Z₂(n), or may output only one of them.
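Equations (5) to (10) are not reproduced in this text. A minimal sketch of the two-stage extraction, assuming the simplified forms N₁ = Y₁ − Y₂ and N₂ = Y₂ − Y₁ for equations (7) and (8), and Z₁ = Y₁ − γ₁·N₁ and Z₂ = Y₂ − γ₂·N₂ for equations (9) and (10), all applied bin by bin to amplitude spectra and floored at zero. The function names, the flooring choice, and the toy spectra are assumptions.

```python
def spectral_subtract(a, b, weight=1.0):
    """Bin-wise a - weight * b on amplitude spectra, floored at zero (assumed)."""
    return [max(x - weight * y, 0.0) for x, y in zip(a, b)]

def extract_target_area(y1, y2, gamma1=1.0, gamma2=1.0):
    """Two-stage spectral subtraction for two BF outputs (alpha1 = alpha2 = 1)."""
    n1 = spectral_subtract(y1, y2)           # non-target area sound seen by array 1
    n2 = spectral_subtract(y2, y1)           # non-target area sound seen by array 2
    z1 = spectral_subtract(y1, n1, gamma1)   # target area emphasized sound Z1
    z2 = spectral_subtract(y2, n2, gamma2)   # target area emphasized sound Z2
    return z1, z2

# Toy spectra: the target occupies bins 0 and 2 in both BF outputs, while each
# array additionally captures a different non-target component.
target = [1.0, 0.0, 2.0, 0.0]
y1 = [1.0, 3.0, 2.0, 0.0]   # target plus non-target sound in bin 1
y2 = [1.0, 0.0, 2.0, 4.0]   # target plus non-target sound in bin 3

z1, z2 = extract_target_area(y1, y2)
print(z1, z2)
```

The toy example places target and non-target components in separate bins so that amplitudes add exactly; real spectra overlap, which is why γ₁(n) and γ₂(n) are provided as adjustable strengths.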

The target area emphasized sound (Z₁(n) and/or Z₂(n)) generated by the target area sound extraction unit 5 is supplied to the communication unit 120. The communication unit 120 transmits audio data based on the supplied target area emphasized sound to the communication device 200 via the communication path P.

In the communication device 200, the communication unit 210 receives the audio data, and the speaker 6 outputs an acoustic signal based on the received audio data (sound output toward the listener U2).

(A-3) Effects of the First Embodiment

According to the first embodiment, the following effects can be achieved.

In the first embodiment, area sound collection is realized by forming two microphone arrays from the three microphones MC1 to MC3. In the area sound collection processing performed by the sound collection unit 110, frequency analysis (discrete Fourier transform or the like), which is required for each microphone, accounts for a large share of the processing amount. Reducing the number of microphones to be processed therefore also reduces the processing amount of the area sound collection processing as a whole. Furthermore, since two microphone arrays are formed from three microphones in the first embodiment, the configuration can be mounted efficiently in a device with little space, such as a smart phone or a handset.

In the conventional area sound collection configuration (for example, the configuration shown in FIG. 3), at least four microphones (two microphone arrays of two microphones each) are used to collect sound from one area. In the first embodiment, by contrast, the microphone array unit 1 including the three microphones MC1 to MC3 can realize sound collection in one to three areas. In addition, frequency analysis (discrete Fourier transform or the like) accounts for most of the processing amount of area sound collection and is required for each microphone. Therefore, even when sound is collected from three areas, the number of microphones does not increase with the configuration of the first embodiment, so the increase in processing amount is small.

As described above, the sound collection unit 110 of the first embodiment can collect sound from a plurality of areas with a smaller number of microphones, and makes an effective countermeasure against noisy environments possible with a simple configuration even in a limited mounting space.

(B) Second Embodiment

Hereinafter, a second embodiment of the sound collection device, program, and method according to the present invention will be described in detail with reference to the drawings. In this embodiment, an example in which the sound collection device, program, and method of the present invention are applied to a sound collection unit will be described.

(B-1) Configuration of Second Embodiment

FIG. 11 is a block diagram showing the configuration of each device according to the second embodiment.

The second embodiment is different from the first embodiment in that the communication device 100 is replaced with a communication device 100A.

Further, the communication device 100A of the second embodiment is different from the first embodiment in that the microphone array unit 1 and the sound collection unit 110 are replaced with the microphone array unit 1A and the sound collection unit 110A. Furthermore, the sound collection unit 110A of the second embodiment is different from the first embodiment in that the signal input unit 2, the frequency conversion unit 3, the directivity forming unit 4, and the target area sound extraction unit 5 are replaced with the signal input unit 2A, the frequency conversion unit 3A, the directivity forming unit 4A, and the target area sound extraction unit 5A.

As shown in FIGS. 1 and 2, the microphone array unit 1 of the first embodiment is configured such that the microphones MC1 to MC3 are arranged at the vertices of a triangle; however, the configuration is not limited to a triangle (N = 3) and may be a polygon having more sides. Therefore, the microphone array unit 1A according to the second embodiment includes an arbitrary number N (N is an integer of 3 or more) of microphones MC (MC1 to MCN). That is, the microphone array unit 1A of the second embodiment is configured by N microphones MC1 to MCN arranged at the vertices of an N-gon (a polygon having N sides and N corners).

The N-gonal microphone array unit 1A composed of N microphones can be configured with a minimum of N = 3, and the number of area sound collections that can be realized increases significantly as N increases.

As described above, the sound collection unit 110A performs area sound collection processing based on the acoustic signals captured by the N microphones MC1 to MCN. The sound collection unit 110A can perform area sound collection processing using the two microphone arrays formed by a combination of any two of the sides and diagonals of the N-gon (the N-gon formed by the positions of the N microphones), excluding combinations that are parallel to each other. For example, since there are N combinations of adjacent sides alone in the N-gon, area sound collection at N locations is possible. Furthermore, still more area sounds can be collected by also including combinations of non-adjacent sides and combinations involving the diagonals.
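The count of N adjacent-side combinations can be illustrated with a minimal sketch. The function name `adjacent_side_pairs` and the representation of a microphone array as a pair of 1-based microphone indices are assumptions made here for illustration; they do not appear in the specification.

```python
def adjacent_side_pairs(n):
    """Return the n pairs of adjacent sides of an n-gon.

    Each side is a two-microphone array written as 1-based microphone
    indices (hypothetical representation); adjacent sides share exactly
    one microphone.
    """
    if n < 3:
        raise ValueError("an N-gon needs N >= 3")
    # Sides (MC1, MC2), (MC2, MC3), ..., (MCn, MC1)
    sides = [(i + 1, (i + 1) % n + 1) for i in range(n)]
    # Pair every side with the next one around the polygon
    return [(sides[i], sides[(i + 1) % n]) for i in range(n)]


for pair in adjacent_side_pairs(5):
    print(pair)
```

Each returned pair corresponds to one candidate sound collection area, and the shared index shows which microphone signal would be reused by both arrays.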

FIG. 12 is an explanatory diagram showing a configuration example of the microphone array unit 1A when the number of microphones is 5 (N = 5). As described above, the number (N) of microphones in the microphone array unit 1A is not limited to 3 or 5.

In the microphone array unit 1A shown in FIG. 12, the microphones MC1 to MC5 are arranged at the vertices of a pentagon. As shown in FIG. 12, in the second embodiment, the side between the microphones MC1 and MC2 is called S1, the side between the microphones MC2 and MC3 is called S2, ..., and the side between the microphones MC5 and MC1 is called S5. Likewise, as shown in FIG. 12, the diagonal between the microphones MC1 and MC3 is called L1, the diagonal between the microphones MC2 and MC4 is called L2, ..., and the diagonal between the microphones MC5 and MC2 is called L5.

In the second embodiment, the microphone array composed of the microphones at both ends of the side S1 is called MA11, the microphone array composed of the microphones at both ends of the side S2 is called MA12, ..., and the microphone array composed of the microphones at both ends of the side S5 is called MA15. Similarly, the microphone array composed of the microphones at both ends of the diagonal L1 is called MA21, the microphone array composed of the microphones at both ends of the diagonal L2 is called MA22, ..., and the microphone array composed of the microphones at both ends of the diagonal L5 is called MA25.

Although a minimum of two microphones is required to configure a microphone array, as shown in FIG. 12, the number of microphone arrays MA11 to MA15 and MA21 to MA25 formed by the sides and diagonals of the pentagonal microphone array unit 1A (microphones MC1 to MC5) is ₅C₂ (= 10). This corresponds to the total number of the five sides S1 to S5 and the five diagonals L1 to L5 of the pentagon shown in the figure.
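As a rough check of this count, the following sketch builds the ten arrays from the naming rule given for the pentagon and confirms that they cover every two-microphone combination exactly once. The function name `pentagon_arrays` and the dictionary layout are illustrative assumptions.

```python
from itertools import combinations

N = 5  # pentagon of FIG. 12


def pentagon_arrays():
    """Map array names to 1-based microphone index pairs (hypothetical layout)."""
    arrays = {}
    for i in range(N):
        arrays[f"MA1{i + 1}"] = (i + 1, (i + 1) % N + 1)  # side S(i+1)
        arrays[f"MA2{i + 1}"] = (i + 1, (i + 2) % N + 1)  # diagonal L(i+1)
    return arrays


arrays = pentagon_arrays()
all_pairs = {frozenset(p) for p in combinations(range(1, N + 1), 2)}
# The five sides and five diagonals are exactly the 5C2 microphone pairs.
assert {frozenset(p) for p in arrays.values()} == all_pairs
print(len(arrays))  # 10
```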

Area sound collection is realized by two microphone arrays having different directivity directions, so two microphone arrays that are parallel to each other cannot be combined. In the case of the pentagonal (regular pentagonal) microphone array unit 1A (microphones MC1 to MC5), the ten microphone arrays MA11 to MA15 and MA21 to MA25 (sides S1 to S5 and diagonals L1 to L5) can be paired in ₁₀C₂ (= 45) ways. In a regular pentagon no two sides are parallel and no two diagonals are parallel, but each side is parallel to one diagonal (for example, the side S1 and the diagonal L3), so excluding these five pairs leaves a maximum of 40 combinations with which area sound collection is possible. For directivity formation by BF and for area sound collection processing, it is preferable that the microphone spacing be the same in every array. Considering only the number of combinations, however, the triangular microphone array unit 1 (three microphones MC1 to MC3) of the first embodiment realizes sound collection in three areas, whereas the microphone array unit 1A (microphones MC1 to MC5) of the second embodiment realizes area sound collection with as many as 40 combinations by adding only two microphones. Moreover, even if only the sides, for which all microphone spacings are equal, are used, area sound collection is possible with 10 combinations in the microphone array unit 1A (microphones MC1 to MC5) of the second embodiment.
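The combination count can be checked numerically. The sketch below assumes ideal regular-polygon coordinates on the unit circle; the coordinates, the function names, and the tolerance are illustrative assumptions and are not taken from the specification. It enumerates the sides and diagonals, pairs them, and treats two arrays as parallel when the cross product of their direction vectors is numerically zero.

```python
import math
from itertools import combinations


def regular_polygon(n):
    """Vertices of a regular n-gon on the unit circle (assumed geometry)."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]


def is_parallel(seg_a, seg_b, pts, tol=1e-9):
    """True if the two segments (pairs of vertex indices) are parallel."""
    (a0, a1), (b0, b1) = seg_a, seg_b
    ax, ay = pts[a1][0] - pts[a0][0], pts[a1][1] - pts[a0][1]
    bx, by = pts[b1][0] - pts[b0][0], pts[b1][1] - pts[b0][1]
    return abs(ax * by - ay * bx) < tol


def count_pairs(n, sides_only=False):
    """Return (all pairs, parallel pairs, usable pairs) of microphone arrays."""
    pts = regular_polygon(n)
    if sides_only:
        segments = [(k, (k + 1) % n) for k in range(n)]
    else:
        segments = list(combinations(range(n), 2))  # sides and diagonals
    pairs = list(combinations(segments, 2))
    parallel = sum(is_parallel(a, b, pts) for a, b in pairs)
    return len(pairs), parallel, len(pairs) - parallel


print(count_pairs(5))                   # (45, 5, 40)
print(count_pairs(5, sides_only=True))  # (10, 0, 10)
```

Under these assumptions, the ten arrays of a regular pentagon form 45 pairs, of which the 5 side-diagonal pairs are parallel, and the five sides alone form 10 pairs with none parallel.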

(B-2) Operation of the Second Embodiment Next, the operation of the second embodiment having the above configuration (the sound collection method according to the embodiment) will be described.

In the communication device 100A, the sound collection unit 110A performs target area sound collection processing, which collects the target area sound from the sound source in the target area using the acoustic signals supplied from the microphones MC1 to MCN of the microphone array unit 1A.

Hereinafter, the differences from the first embodiment in the internal configuration of the sound collection unit 110A of the communication device 100A will be described.

The signal input unit 2A converts the acoustic signals picked up by the microphones MC1 to MCi (i = N) from analog signals to digital signals to generate microphone signals x₁ to x_i (i = N), and supplies them to the frequency conversion unit 3A.

The frequency conversion unit 3A converts the microphone signals x₁ to x_i (i = N) from the time domain to the frequency domain to generate microphone signals X₁ to X_i (i = N), and supplies them to the directivity forming unit 4A.
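The per-microphone frequency conversion can be sketched as follows. A plain discrete Fourier transform is used here only as an example of "frequency analysis"; the function names, the frame-based interface, and the absence of windowing are assumptions, and a practical implementation would normally use an FFT.

```python
import cmath


def dft(frame):
    """Naive discrete Fourier transform of one frame of samples."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def frequency_conversion(mic_frames):
    """Convert x_1..x_N (one frame per microphone) into X_1..X_N.

    One transform is computed per microphone, so the cost grows with the
    number of microphones, not with the number of arrays formed from them.
    """
    return [dft(frame) for frame in mic_frames]


# Three hypothetical 4-sample frames, one per microphone
X = frequency_conversion([[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 1, 1]])
print(len(X))  # 3
```

Because the spectra X₁ to X_N are computed once and then shared by every microphone array that uses a given microphone, adding arrays does not add transforms.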

The directivity forming unit 4A selects the microphone signals of the microphone arrays necessary for forming the desired directivity, obtains BF outputs Y₁ to Y_j with directivity directed toward the sound collection area for each microphone array, and supplies them to the target area sound extraction unit 5A. Note that j is the number of BF outputs (microphone arrays) necessary for area sound collection, and is an integer between 2 and the maximum number M of microphone arrays that can be configured by the N-gonal microphone array unit 1A. For example, when N = 3 (triangle), M is 3, and when N = 5 (pentagon), M is 10.

The target area sound extraction unit 5A uses the BF outputs Y₁ to Y_j generated by the directivity forming unit 4A to generate and output target area emphasized sounds Z₁ to Z_k, in which the target area sound of each desired target area is extracted (emphasized). Note that k is the number of desired target area sounds (sound collection areas). If only one target area sound is desired, the target area sound extraction unit 5A outputs only the one target area emphasized sound Z₁.

For example, as shown in FIG. 12, an example of area sound collection when the number of microphones in the microphone array section 1A is 5 (N = 5) will be described.

FIG. 12 is an explanatory diagram showing an example of area sound collection processing of the microphone array section 1A when the number of microphones is 5 (N = 5).

In the example of FIG. 12, the area sound is picked up in the area A401 (hatched area), where the directivity of the microphone array MA11 corresponding to the side S1 (illustrated by a dotted line) and the directivity of the microphone array MA12 corresponding to the side S2 (illustrated by a one-dot chain line) overlap. In this example, the directivity forming unit 4A generates the BF output Y₁ of the microphone array MA11 using the microphone signals X₁ and X₂, and generates the BF output Y₂ of the microphone array MA12 using the microphone signals X₂ and X₃.

In the example of FIG. 12, the target area sound extraction unit 5A uses the BF outputs Y₁ and Y₂ generated by the directivity forming unit 4A to generate and output the target area emphasized sound Z₁, in which the target area sound of the desired target area is extracted (emphasized).
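The extraction of Z₁ from Y₁ and Y₂ can be sketched with two spectral subtractions, in line with the non-target and target area sound extraction recited in the claims. This is a simplified illustration, not the procedure of the embodiment: it works on per-bin amplitude spectra, and the coefficients `alpha` and `gamma`, the function names, and the toy values are all assumptions.

```python
# Frequency-domain microphone signals that feed each array in the FIG. 12 example
ARRAY_INPUTS = {"MA11": ("X1", "X2"), "MA12": ("X2", "X3")}


def spectral_subtract(a, b, scale=1.0):
    """Per-bin amplitude subtraction a - scale * b, floored at zero."""
    return [max(x - scale * y, 0.0) for x, y in zip(a, b)]


def extract_target_area(y1, y2, alpha=1.0, gamma=1.0):
    """Return Z1 from the amplitude spectra of the BF outputs Y1 and Y2.

    alpha and gamma are hypothetical tuning coefficients.
    """
    # Sound that Y1 contains but Y2 does not lies outside the overlap area.
    non_target = spectral_subtract(y1, y2, alpha)
    # Removing it from Y1 leaves the component common to both directivities.
    return spectral_subtract(y1, non_target, gamma)


# Toy spectra: a common target [1.0, 2.0] plus noise seen by only one array
y1 = [1.5, 2.0]  # target + 0.5 of noise in bin 0
y2 = [1.0, 2.5]  # target + 0.5 of noise in bin 1
z1 = extract_target_area(y1, y2)
print(z1)  # [1.0, 2.0]
```

In this toy case the noise that appears in only one beam is removed and the component shared by both beams survives, which is the intuition behind collecting sound only from the overlap area A401.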

(B-3) Effects of Second Embodiment According to the second embodiment, the following effects can be achieved.

In the second embodiment, picking up area sounds using the N-gonal (polygonal) microphone array unit 1A makes it possible to pick up a large number of area sounds with fewer microphones. Conventionally, in order to collect N area sounds, N microphone arrays each including at least two microphones are required (2 × N microphones in total). In contrast, in this embodiment, by performing area sound collection using the N-gonal microphone array unit 1A, at least N area sounds can be collected with N microphones, although there are restrictions on the positions of the sound collection areas that can be set. The number of necessary microphones can therefore be reduced to ½ or less.

In the second embodiment, the target areas can be set more efficiently by making N an odd number in the N-gonal microphone array unit 1A. For example, when a regular octagon is used as the N-gon in the microphone array unit 1A, mutually parallel sides and diagonals arise. Using a regular heptagon instead makes it possible to set more sound collection areas with fewer microphones.

(C) Other Embodiments The present invention is not limited to the above-described embodiments, and may include modified embodiments as exemplified below.

(C-1) In each of the above embodiments, the sound collection units 110 and 110A have been described as constituting a part of the communication devices 100 and 100A, but they may be configured as independent devices. In the above embodiments, the sound collection units 110 and 110A have been described as not including the microphone array units 1 and 1A. However, the sound collection units 110 and 110A and the microphone array units 1 and 1A may be configured integrally as a single apparatus.

(C-2) In each of the above embodiments, an example has been described in which the sound collection device (sound collection unit 110, 110A) of the present invention is applied to a hand-held transmitter such as a smartphone or a handset. However, the sound collection device may also be applied to a headset or a wearable device (for example, a head-mounted display with a microphone, a neckband type headphone with a microphone, etc.). In that case, with the area where the mouth of the speaker U1 is located when the device is worn by the speaker U1 set as the target area, microphones arranged in an N-gon may be installed around that area to perform area sound collection processing.

DESCRIPTION OF SYMBOLS 100 ... Communication device, 1 ... Microphone array unit, MC1 to MC3 ... Microphone, 110 ... Sound collection unit, 2 ... Signal input unit, 3 ... Frequency conversion unit, 4 ... Directivity forming unit, 5 ... Target area sound extraction unit, 120 ... Communication unit, 200 ... Communication device, 210 ... Communication unit, 6 ... Speaker, U1 ... Speaker, U2 ... Listener.

Claims

A sound collection device comprising a sound collection unit that collects a target area sound of a target area based on input signals input from each of two microphone arrays, the two microphone arrays corresponding to a combination of any two of the sides and/or diagonals, excluding those parallel to each other, of an N-gon (N is an integer of 3 or more) at whose vertices N microphones are respectively arranged.
The sound collection device according to claim 1, wherein the sound collection unit includes:
a directivity forming unit that forms directivity by a beamformer in an inner direction of the N-gon for each of the input signals input from the two microphone arrays;
a non-target area sound extraction unit that extracts a non-target area sound existing in the target area direction by performing spectral subtraction on the beamformer outputs of the respective microphone arrays; and
a target area sound extraction unit that extracts the target area sound by spectrally subtracting the non-target area sound from a beamformer output of the microphone array.
The sound collection device according to claim 1, wherein N is an odd number.
A computer-readable non-transitory storage medium storing a sound collection program that causes a computer to function as a sound collection unit that collects a target area sound of a target area based on input signals input from each of two microphone arrays, the two microphone arrays corresponding to a combination of any two of the sides and/or diagonals, excluding those parallel to each other, of an N-gon (N is an integer of 3 or more) at whose vertices N microphones are respectively arranged.
A sound collection method performed by a sound collection device, the method comprising collecting a target area sound of a target area based on input signals input from each of two microphone arrays, the two microphone arrays corresponding to a combination of any two of the sides and/or diagonals, excluding those parallel to each other, of an N-gon (N is an integer of 3 or more) at whose vertices N microphones are respectively arranged.