WO2019176153A1 - Sound pickup device, storage medium, and method - Google Patents


Info

Publication number
WO2019176153A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
target area
microphones
sound collection
microphone
Prior art date
Application number
PCT/JP2018/039209
Other languages
French (fr)
Japanese (ja)
Inventor
隆 矢頭
Original Assignee
沖電気工業株式会社
Priority date
Filing date
Publication date
Application filed by 沖電気工業株式会社
Publication of WO2019176153A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/20 Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R1/40 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones

Definitions

  • the present invention relates to a sound collection device, a storage medium, and a method, and can be applied to, for example, a voice communication system used in a noisy environment.
  • a beamformer (Beam Former; hereinafter referred to as “BF”; see Patent Document 2) using a microphone array is used as a technique for obtaining the required target sound by separating and collecting only sound from a specific direction, which avoids mixing in unnecessary sound.
  • BF is a technique for forming directivity by using the time difference between signals reaching each microphone.
  • with BF alone, when there is another sound source around the area targeted for sound collection (hereinafter referred to as “target area”), it is difficult to pick up only the sound existing in the target area (hereinafter referred to as “target area sound”). For this reason, an area sound collection method that picks up the target area sound using a plurality of microphone arrays has been proposed in the prior art.
  • FIG. 13 is an explanatory diagram showing a process of collecting the target area sound from the sound source in the target area using the two microphone arrays MA1 and MA2.
  • FIG. 13A is an explanatory diagram showing a configuration example of each of the microphone arrays MA1 and MA2.
  • FIGS. 13B and 13C are diagrams (graph format image diagrams) showing the BF outputs of the microphone arrays MA1 and MA2 shown in FIG. 13A in the frequency domain, respectively.
  • each microphone array MA1, MA2 is composed of two microphones MC1, MC2.
  • the directivities of the microphone arrays MA1 and MA2 are directed from different directions so that they intersect in the area where sound collection is desired (the target area).
  • the directivity of each of the microphone arrays MA1 and MA2 captures not only the sound existing in the target area (target area sound) but also noise in the target area direction (non-target area sound).
  • as shown in FIGS. 13B and 13C, when the BF outputs of the microphone arrays MA1 and MA2 are compared in the frequency domain, the target area sound component is included in both outputs, whereas the non-target area sound components differ between the microphone arrays.
  • only the target area sound can be extracted by using such characteristics and suppressing components other than those commonly included in the BF outputs of the two microphone arrays MA1 and MA2.
  • a mobile communication device such as a smart phone is a typical example of voice communication in a noisy environment.
  • in voice recognition applications, the smartphone is held away from the mouth, so it is more susceptible to external noise.
  • loud noise such as a siren is an obstacle that prevents emergency information from being transmitted.
  • the area sound collection technology is expected to be an effective solution. That is, two microphone arrays are installed on the front of the smart phone or around the mouthpiece of the handset, and the directivities of the two microphone arrays are crossed in front of the mouthpiece so that area sound collection functions there. This makes it possible to eliminate the ambient noise and accurately transmit only the voice of the sender.
  • N is an integer of 3 or more
  • a sound collection unit is provided that collects a target area sound in a target area based on input signals input from two microphone arrays corresponding to any two combinations other than those parallel to each other.
  • the computer-readable non-transitory storage medium of the second aspect of the present invention stores a sound collection program that causes a computer to function as sound collection means for collecting the target area sound of a target area based on input signals input from each of two microphone arrays, the two microphone arrays corresponding to any two of the sides and/or diagonals of an N-gon, other than those parallel to each other, in a microphone array unit in which N (N is an integer of 3 or more) microphones are arranged at the positions of the respective vertices of the N-gon.
  • the microphone array unit in which N (N is an integer of 3 or more) microphones are arranged at the positions of the respective vertices of the N-gon.
  • a target area sound of the target area is collected based on input signals input from each of two microphone arrays corresponding to any two of the sides and/or diagonals of the N-gon, other than those parallel to each other.
  • FIG. 1 is a block diagram showing the configuration of each device according to the first embodiment (including the functional configuration of the sound collection unit (sound collection device) according to the first embodiment). A plan view shows the external appearance of the communication device provided with the sound collection unit (sound collection device) according to the first embodiment. An explanatory drawing (image diagram) of the microphone array arrangement
  • At least two microphone arrays are usually required to realize area sound collection.
  • the directivities of the two microphone arrays MA100 and MA200 are directed from different directions and crossed in the sound collection area (target area).
  • the microphone arrays MA100 and MA200 are each composed of two microphones ch1 and ch2.
  • the directivity of microphone array MA100 is illustrated by a broken line, and the directivity of microphone array MA200 is illustrated by a one-dot chain line.
  • the sound collection areas where the directivities of the two microphone arrays MA100 and MA200 intersect are hatched.
  • in a sending device such as a handset, a smart phone, or another transmitter (a device including a microphone that captures the voice of a speaker), the position of the speaker's mouth is limited to a narrow area very close to the mouthpiece (the portion where the microphone is disposed), for example within several centimeters.
  • the two microphone arrays MA100 and MA200 therefore need to be arranged close to each other. When the distance between the two microphone arrays MA100 and MA200 is reduced to the limit, as shown in FIG. 4B, some microphones of the two microphone arrays (in FIG. 4B, the microphones ch1 disposed on the inside) almost overlap. Therefore, in the state of FIG. 4B, the microphone ch1 of the microphone array MA100 and the microphone ch1 of the microphone array MA200 can be shared, and the arrangement can be replaced with the configuration of three microphones ch1 to ch3 shown in FIG. 4C.
  • with a microphone array including a minimum of three microphones ch1 to ch3, it is possible to realize area sound collection of speech uttered from the speaker's mouth.
  • the three microphones ch1 to ch3 are arranged so that the line segment L101 connecting the microphone ch1 and the microphone ch2 and the line segment L102 connecting the microphone ch2 and the microphone ch3 form an angle (that is, θ ≠ 180°).
  • a first microphone array having a pair of microphones ch1 and ch2 at both ends of the line segment L101 and a second microphone array having a pair of microphones ch2 and ch3 at both ends of the line segment L102 are configured.
  • the directivities of the microphone arrays are crossed toward the interior-angle side (the sound collection area).
  • the directivity of the first microphone array is indicated by a broken line
  • the directivity of the second microphone array is indicated by a one-dot chain line.
  • hatching (hatched lines) is applied to the region where the directivities of the first microphone array and the second microphone array overlap.
  • with the microphones ch1 to ch3, three microphone arrays can be set: a microphone array MA301 having the pair of microphones ch1 and ch2, a microphone array MA302 having the pair of microphones ch2 and ch3, and a microphone array MA303 having the pair of microphones ch3 and ch1.
  • the directivity of the microphone array MA301 is illustrated by a one-dot chain line, and the directivity of the microphone array MA302 is illustrated by a two-dot chain line.
  • the directivity of the microphone array MA302 is illustrated by a one-dot chain line, and the directivity of the microphone array MA303 is illustrated by a two-dot chain line.
  • the directivity of microphone array MA301 is illustrated by a one-dot chain line, and the directivity of microphone array MA303 is illustrated by a two-dot chain line.
  • the sound collection area A301 corresponding to the combination of the microphone arrays MA301 and MA302 is hatched.
  • the sound collection area A302 corresponding to the combination of the microphone arrays MA302 and MA303 is hatched. Further, in FIG. 7C, the sound collection area A303 corresponding to the combination of the microphone arrays MA301 and MA303 is hatched.
  • every pair of the microphone arrays MA301 to MA303 forms an angle (between the lines connecting the positions of the two microphones constituting each microphone array), so crossing their directivities achieves a different area sound collection (sound collection in a different area) for each combination.
  • the sound collection unit (sound collection device) of the first embodiment to be described later is configured to perform area sound collection using three microphones arranged at the vertices of a triangle.
  • FIG. 1 is a block diagram showing the configuration of each device related to the first embodiment.
  • FIG. 1 illustrates a communication device 100 including the sound collection unit 110 according to the first embodiment and a communication device 200.
  • the communication apparatuses 100 and 200 can communicate with each other via the communication path P.
  • the communication device 100 is a device that captures and collects voice (sound) uttered by the speaker U1, and transmits voice data of the collected voice to the communication device 200 via the communication path P.
  • the communication device 200 is a device that outputs a sound based on the sound data received from the communication device 100 to the listener U2.
  • the speaker U1 corresponds to, for example, a worker who works at a disaster relief site (for example, a worker who works in a noisy environment), and the listener U2 is a remote site (for example, a disaster rescue site).
  • the communication apparatus 100 includes a microphone array unit 1 including three microphones MC1 to MC3, a sound collection unit 110 that collects speech uttered by the speaker U1 based on the acoustic signals captured by the microphone array unit 1, and a communication unit 120 that transmits audio data based on the sound collected by the sound collection unit 110 to the communication device 200 (communication via the communication path P).
  • the communication path P may be wired or wireless, and various connection means and connection configurations (network configurations) can be applied.
  • the hardware configuration of the communication device 100 is not limited; in the example of this embodiment, as shown in FIG. 2, the communication device 100 is assumed to be configured, in terms of hardware, as a smart phone (a smart phone owned by the speaker U1).
  • the sound collection unit 110 includes a signal input unit 2, a frequency conversion unit 3, a directivity forming unit 4, and a target area sound extraction unit 5.
  • the sound collection unit 110 may be realized by causing a computer including a processor, a memory, and the like to execute a program (including the sound collection program according to the embodiment). It can be shown as in FIG. Further, the communication apparatus 100 may be provided with a computer-readable non-transitory storage medium that stores such a program.
  • the hardware configuration of the communication device 200 is not limited; for example, various telephone devices (for example, speakerphones) can be applied.
  • the communication device 200 includes a communication unit 210 that receives audio data from the communication device 100 via the communication path P, and a speaker 6 that outputs a sound based on the audio data received by the communication unit 210 to the listener U2.
  • the microphone array section 1 has three microphones MC1 to MC3.
  • the three microphones MC1 to MC3 are desirably arranged around the usual mouthpiece portion of the smartphone (the end opposite to the portion where the speaker SP is arranged). In other words, in the communication device 100, the three microphones MC1 to MC3 are desirably arranged around the portion that faces the mouth of the speaker U1 when the communication device 100 is used (the portion closest to the mouth of the speaker U1).
  • the three microphones MC1 to MC3 are arranged around the part where the mouth of the speaker U1 is located (the lower part when viewed from the direction of FIG. 2), that is, around the portion closest to the mouth of the speaker U1.
  • the three microphones MC1 to MC3 are positioned so as to form the vertices of an equilateral triangle.
  • the sides of the triangle formed by the microphones MC1 to MC3 do not necessarily have to be of equal length (the triangle formed by the microphones MC1 to MC3 need not be an equilateral triangle).
  • the microphone array paired with the microphones MC1 and MC2 is MA1
  • the microphone array paired with the microphones MC2 and MC3 is MA2
  • the microphone array paired with the microphones MC3 and MC1 is referred to as MA3.
  • the sound collection unit 110 performs a target area sound collection process for collecting the target area sound in the target area using the acoustic signals supplied from the microphones MC1 to MC3 of the microphone array unit 1.
  • the signal input unit 2 converts an acoustic signal collected by each of the microphones MC1 to MC3 from an analog signal to a digital signal, and supplies it to the frequency conversion unit 3. Thereafter, the frequency conversion unit 3 converts the microphone signal from the time domain to the frequency domain using, for example, fast Fourier transform.
  • the directivity forming unit 4 forms directivity by BF.
  • the BF is a technique for forming the directivity of sound collection using the time difference between signals reaching each microphone in the microphone array (see Non-Patent Document 1).
  • the BF is roughly classified into two types, an addition type and a subtraction type.
  • a subtraction type BF that can form directivity with a small number of microphones will be described.
  • FIG. 8 is a block diagram showing a configuration related to the subtraction type BF300 when the number of microphones is two (MC1, MC2).
  • the subtraction type BF 300 first uses the delay unit 310 to calculate the time difference with which sound existing in the target direction (hereinafter referred to as “target sound”) arrives at the microphones MC1 and MC2, and adds a delay to match the phase of the target sound.
  • the time difference is calculated by equation (1).
  • d is the distance between the microphones MC1 and MC2
  • c is the speed of sound
  • τ_i is the amount of delay.
  • θ_L represents the angle from the direction perpendicular to the straight line connecting the positions of the microphones MC1 and MC2 to the target direction.
  • the delay unit 310 performs a delay process on the input signal x 1 (t) of the microphone MC1.
  • the subtractor 320 performs a subtraction process according to the equation (2). In the subtractor 320, this subtraction process can be similarly performed in the frequency domain. In this case, the expression (2) is changed to the expression (3).
  • the subtractor 320 can form strong directivity in the direction of the blind spot (null) of the bidirectional pattern by using spectral subtraction processing (hereinafter also simply referred to as “SS”).
  • the directivity by SS is formed at all frequencies or a designated frequency band according to the equation (4).
  • equation (4) uses the input signal X_1 of the microphone MC1, but the same effect can be obtained with the input signal X_2 of the microphone MC2.
  • n is the frame number
  • a coefficient for adjusting the strength of SS.
  • when the value becomes negative as a result of the subtraction, flooring processing may be performed in which the value is replaced with 0 or with a reduced version of the original value.
  • sound that exists in a direction other than the target direction (hereinafter referred to as “non-target sound”) is extracted based on the bidirectional characteristic, and the amplitude spectrum of the extracted non-target sound is subtracted from the amplitude spectrum of the input signal, so that the target sound can be emphasized.
  • when only the subtraction type BF is used, sound sources that exist on a line in the same direction as the target area (hereinafter referred to as “non-target area sound”) are also picked up.
  • the following description assumes that the directivity forming unit 4 uses the area sound collection process proposed in Patent Document 1 (a process that collects the target area sound by using a plurality of microphone arrays, directing their directivities toward the target area from different directions, and crossing the directivities in the target area). Specifically, the directivity forming unit 4 illustrated in FIG. 1 may perform area sound collection processing by the following processing.
  • the directivity forming unit 4 forms, by BF, directivity toward the inside of the triangle of the microphones MC1 to MC3 for each of the microphone array MA1 composed of the microphones MC1 and MC2 and the microphone array MA3 composed of the microphones MC1 and MC3, and crosses the directivities of the two microphone arrays from different directions in front of the mouthpiece (the area inside the triangle formed by the microphones MC1 to MC3), which is the area where sound collection is desired (the target area) (FIG. 7C).
  • the target area sound extraction unit 5 performs SS on the BF output data Y_1(n) and Y_2(n) of the microphone arrays MA1 and MA3 formed by the directivity forming unit 4 according to equation (5) or (6), and extracts the non-target area sounds N_1(n) and N_2(n) existing in the target area direction.
  • the BF output data of the microphone array MA1 is Y_1(n) and the BF output data of the microphone array MA3 is Y_2(n).
  • the non-target area sound output data of the microphone array MA1 is N_1(n) and the non-target area sound output data of the microphone array MA3 is N_2(n).
  • the target area sound extraction unit 5 extracts the target area sound by performing SS of the non-target area sound from each BF output according to equations (9) and (10), and generates the target area emphasized sounds Z_1(n) and Z_2(n).
  • γ_1(n) and γ_2(n) are coefficients for changing the strength of the SS.
  • the target area sound extraction unit 5 may output both the target area emphasized sounds Z_1(n) and Z_2(n), or may output only one of them.
  • the target area emphasized sound (Z_1(n) and/or Z_2(n)) generated by the target area sound extraction unit 5 is supplied to the communication unit 120.
  • the communication unit 120 transmits audio data based on the supplied target area emphasized sound to the communication device 200 via the communication path P.
  • the communication unit 210 receives the audio data, and the speaker 6 outputs an acoustic signal based on the received audio data (sound output toward the listener U2).
  • area sound collection is realized by configuring two microphone arrays with three microphones MC1 to MC3. In the area sound collection processing performed by the sound collection unit 110, frequency analysis (discrete Fourier transform or the like), which is necessary for each microphone, accounts for a large share of the processing. Reducing the number of microphones to be processed therefore also reduces the processing amount of the entire area sound collection processing. In the first embodiment, since two microphone arrays are constituted by three microphones, the device can be mounted efficiently in small-space equipment such as a smart phone or a handset.
  • the microphone array unit 1 including the three microphones MC1 to MC3 can realize sound collection in one to three areas.
  • a plurality of area sounds can be collected with a smaller number of microphones, so an effective countermeasure against noisy environments can be achieved with a simple configuration even in a limited mounting space.
  • FIG. 11 is a block diagram showing the configuration of each device related to the second embodiment.
  • the second embodiment is different from the first embodiment in that the communication device 100 is replaced with a communication device 100A.
  • the communication device 100A of the second embodiment is different from the first embodiment in that the microphone array unit 1 and the sound collection unit 110 are replaced with the microphone array unit 1A and the sound collection unit 110A.
  • the sound collection unit 110A differs from the first embodiment in that the signal input unit 2, the frequency conversion unit 3, the directivity forming unit 4, and the target area sound extraction unit 5 are replaced with the signal input unit 2A, the frequency conversion unit 3A, the directivity forming unit 4A, and the target area sound extraction unit 5A.
  • the sound collection unit 110A performs area sound collection processing based on acoustic signals captured by the N microphones MC1 to MCN.
  • the sound collection unit 110A can perform area sound collection processing using the microphone arrays formed by any two sides or diagonals, other than those parallel to each other, of the N-gon formed by the positions of the N microphones. For example, considering only pairs of adjacent sides of the N-gon gives N combinations, so area sound collection at N locations is possible. Including pairs of non-adjacent sides and pairs involving diagonals allows even more areas to be collected.
  • microphones MC1 to MC5 are arranged at the vertices of a pentagon.
  • the side between the microphone MC1 and the microphone MC2 is called S1, the side between the microphone MC2 and the microphone MC3 is called S2, ..., and so on up to S5.
  • the diagonal between the microphone MC1 and the microphone MC3 is called L1, the diagonal between the microphone MC2 and the microphone MC4 is called L2, ..., and the diagonal between the microphone MC5 and the microphone MC2 is called L5.
  • the microphone array composed of the microphones at both ends of the side S1 is called MA11, the microphone array composed of the microphones at both ends of the side S2 is called MA12, ..., and so on up to MA15.
  • the microphone array composed of the microphones at both ends of the diagonal L1 is called MA21, the microphone array composed of the microphones at both ends of the diagonal L2 is called MA22, ..., and so on up to MA25.
  • microphone arrays MA11 to MA11 formed by diagonal lines and sides of the pentagonal microphone array section 1A (microphones MC1 to MC5).
  • the triangular microphone array unit 1 (the three microphones MC1 to MC3) achieves sound collection in three areas, whereas the microphone array unit 1A (microphones MC1 to MC5) of the second embodiment, by adding only two microphones, realizes area sound collection with as many as 28 combinations. Further, in the microphone array unit 1A (microphones MC1 to MC5) of the second embodiment, area sound collection is possible with 10 combinations even if only the sides, along which all the microphone intervals are equal, are used.
  • the sound collection unit 110A performs target area sound collection processing for collecting the target area sound from the sound source in the target area using the acoustic signals supplied from the microphones MC1 to MCN of the microphone array unit 1A.
  • the directivity forming unit 4A selects the microphone signals of the microphone arrays necessary for forming the desired directivity, obtains BF outputs Y_1 to Y_j with directivity directed toward the sound collection area for each microphone array, and supplies them to the target area sound extraction unit 5A.
  • the target area sound extraction unit 5A uses the BF outputs Y_1 to Y_j generated by the directivity forming unit 4A to generate and output target area emphasized sounds Z_1 to Z_k, in which the target area sounds of the desired target areas are extracted (emphasized).
  • k is the number of desired target area sounds (sound collection areas). If the number of desired target area sounds is one, the target area sound extraction unit 5A outputs only one target area emphasized sound Z_1.
  • the figure shows area sound collection in the area A401 (the hatched area), where the directivity of the microphone array MA11 corresponding to the side S1 (illustrated by a dotted line) and the directivity of the microphone array MA12 corresponding to the side S2 (illustrated by a one-dot chain line) overlap.
  • the directivity forming unit 4A generates the BF output Y_1 of the microphone array MA11 using the microphone signals X_1 and X_2, and generates the BF output Y_2 of the microphone array MA12 using the microphone signals X_2 and X_3.
  • the target area sound extraction unit 5A extracts (emphasizes) the target area sound of the desired target area using the BF outputs Y_1 and Y_2 generated by the directivity forming unit 4A, and generates and outputs the target area emphasized sound Z_1.
  • by performing area sound collection with the N-gonal (polygonal) microphone array unit 1A, a large number of area sounds can be picked up with a smaller number of microphones.
  • N microphone arrays including at least two microphones are required (2 × N microphones in total).
  • with the N-gonal microphone array unit 1A, there are restrictions on the positions of the sound collection areas that can be set, but at least N areas can be collected with N microphones. Therefore, the number of necessary microphones can be reduced to 1/2 or less.
  • the target area can be set more efficiently by setting N to an odd number in the N-gonal microphone array unit 1A.
  • N is the number of microphones.
  • for example, when a regular octagon is used as the N-gon of the microphone array unit 1A, parallel sides and diagonals arise. Using a regular heptagon instead makes it possible to set more sound collection areas with a smaller number of microphones.
  • the sound collection units 110 and 110A have been described as constituting a part of the communication devices 100 and 100A, but may be configured as independent devices. In the above embodiments, the sound collection units 110 and 110A have been described as not including the microphone array units 1 and 1A. However, the sound collection units 110 and 110A and the microphone array units 1 and 1A may be configured as a single integrated apparatus.
  • the sound collection device (sound collection unit 110, 110A) of the present invention is applied to a hand-held transmitter such as a smart phone or a handset.
  • the sound collection device may also be applied to a headset or a wearable device (for example, a head-mounted display with a microphone, a neckband type headphone with a microphone, etc.), with the N-gon microphones installed around the area where the mouth of the speaker U1 is located when the device is worn by the speaker U1, to perform area sound collection processing.
  • DESCRIPTION OF SYMBOLS 100 ... Communication apparatus, 1 ... Microphone array part, MC1-MC3 ... Microphone, 110 ... Sound collection part, 2 ... Signal input part, 3 ... Frequency conversion part, 4 ... Directionality formation part, 5 ... Target area sound extraction part, 120 ... Communication part, 200 ... Communication apparatus, 210 ... Communication part, 6 ... Speaker, U1 ... Speaker, U2 ... Listener.
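The subtraction-type BF, spectral subtraction (SS), and target area extraction steps described in the Definitions above can be sketched in Python as follows. Equations (2) to (10) are not reproduced in this text, so the forms used here are assumptions inferred from the surrounding description: the signal of MC1 is delayed and subtracted from that of MC2, the resulting non-target estimate is subtracted in the amplitude domain with flooring to 0, and the two BF outputs are then compared to remove what is not common to both. The function names `subtractive_bf` and `extract_target_area` and the coefficient names `beta`, `alpha1`, `alpha2`, `gamma1`, `gamma2` are hypothetical.

```python
import cmath


def subtractive_bf(X1, X2, freqs_hz, tau, beta=1.0):
    """Amplitude spectrum of one frame after subtraction-type BF with SS.

    X1, X2: complex spectra of the two microphones (assumed inputs).
    tau: delay in seconds applied to X1 to align the target sound.
    """
    Y = []
    for x1, x2, f in zip(X1, X2, freqs_hz):
        # Delay X1 by tau so the target sound is phase-aligned, then subtract:
        # the target cancels and m estimates the non-target sound.
        m = x2 - cmath.exp(-2j * cmath.pi * f * tau) * x1
        # Spectral subtraction on amplitudes, flooring negative values to 0.
        Y.append(max(abs(x1) - beta * abs(m), 0.0))
    return Y


def extract_target_area(Y1, Y2, alpha1=1.0, alpha2=1.0, gamma1=1.0, gamma2=1.0):
    """Emphasize components common to two BF outputs (assumed formulation)."""
    Z1, Z2 = [], []
    for y1, y2 in zip(Y1, Y2):
        # Non-target area sound: what one array's BF output has in excess
        # of the other array's BF output.
        n1 = max(y1 - alpha2 * y2, 0.0)
        n2 = max(y2 - alpha1 * y1, 0.0)
        # Subtract the non-target estimate from each BF output.
        Z1.append(max(y1 - gamma1 * n1, 0.0))
        Z2.append(max(y2 - gamma2 * n2, 0.0))
    return Z1, Z2


# Toy frame: bin 0 holds the target area sound (present in both BF outputs);
# bins 1 and 2 hold non-target area sounds seen by only one array each.
Y1 = [1.0, 2.0, 0.0]
Y2 = [1.0, 0.0, 3.0]
Z1, Z2 = extract_target_area(Y1, Y2)
print(Z1, Z2)
```

In the toy frame, only the bin shared by both BF outputs survives, which mirrors the idea of suppressing components other than those commonly included in the two outputs. Real signals would need per-frame coefficient estimation, which this sketch leaves out.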

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Otolaryngology (AREA)
  • Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

Provided is a sound pickup device that efficiently and effectively performs area sound pickup. The present invention relates to a sound pickup device. This sound pickup device is characterized by comprising a sound pickup unit that picks up a target area sound in a target area on the basis of input signals respectively inputted from two microphone arrays corresponding to a combination of arbitrarily defined two, other than mutually parallel ones, out of sides or diagonal lines of an N-gon (N is an integer of 3 or more) that constitute a microphone array unit in which N microphones are respectively arranged at the apexes of the N-gon.

Description

Sound collection device, storage medium, and method
 The present invention relates to a sound collection device, a storage medium, and a method, and can be applied to, for example, a voice communication system used in a noisy environment.
 When a voice communication system or a speech recognition application is used in a noisy environment, ambient noise mixed in with the desired target speech hinders good communication and lowers the speech recognition rate. Conventionally, in an environment where multiple sound sources exist, a beamformer using a microphone array (hereinafter also called "BF"; see Patent Document 2) has been used as a technique for separating and collecting only sound from a specific direction, thereby avoiding the mixing of unwanted sound and obtaining the desired target sound. A BF forms directivity by using the time differences between the signals arriving at the individual microphones. With a BF alone, however, when other sound sources exist around the area from which sound is to be collected (hereinafter the "target area"), it is difficult to collect only the sound that exists within the target area (hereinafter the "target area sound"). For this reason, area sound collection methods that collect sound from a target area using multiple microphone arrays have been proposed, for example in Patent Document 1.
 FIG. 13 is an explanatory diagram showing processing that collects the target area sound from a sound source in the target area using two microphone arrays MA1 and MA2. FIG. 13A is an explanatory diagram showing a configuration example of the microphone arrays MA1 and MA2. FIGS. 13B and 13C are conceptual graphs showing, in the frequency domain, the BF outputs of the microphone arrays MA1 and MA2 shown in FIG. 13A, respectively. In FIG. 13, each of the microphone arrays MA1 and MA2 consists of two microphones MC1 and MC2.
 In conventional area sound collection, as shown in FIG. 13A, the directivities of the microphone arrays MA1 and MA2 are aimed from different directions so that they intersect in the area from which sound is to be collected (the target area). In the state of FIG. 13A, the directivity of each of the microphone arrays MA1 and MA2 captures not only the sound existing in the target area (target area sound) but also noise arriving from the direction of the target area (non-target area sound). However, as shown in FIGS. 13B and 13C, when the directivities (BF outputs) of the microphone arrays MA1 and MA2 are compared in the frequency domain, the target area sound component is contained in both outputs, whereas the non-target area sound components differ between the microphone arrays. Conventional area sound collection exploits this property: by suppressing every component other than those common to the BF outputs of the two microphone arrays MA1 and MA2, only the target area sound can be extracted.
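The idea of keeping only what both BF outputs share can be illustrated with a deliberately simplified sketch. It assumes each BF output is available as a list of per-bin amplitude values and uses the per-bin minimum as a stand-in for the "common component". That choice is an illustrative assumption, not the suppression rule of the cited method, and the function name and sample spectra are hypothetical.

```python
def common_component(bf_out_1, bf_out_2):
    """Per-bin minimum of two amplitude spectra (illustrative stand-in only)."""
    if len(bf_out_1) != len(bf_out_2):
        raise ValueError("spectra must have the same number of bins")
    return [min(a, b) for a, b in zip(bf_out_1, bf_out_2)]


# Hypothetical amplitude spectra (4 frequency bins) of the two BF outputs.
# Bin 1 holds the target area sound, which both arrays capture.
# Bin 0 holds noise captured only by MA1, bin 3 noise captured only by MA2.
ma1 = [0.8, 1.0, 0.0, 0.0]
ma2 = [0.0, 1.0, 0.0, 0.6]

target_estimate = common_component(ma1, ma2)
```

In this toy case only the bin shared by both outputs survives; real signals overlap in frequency, so a practical method needs a more careful suppression rule.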
Patent Document 1: JP 2014-072708 A
Patent Document 2: JP 2005-195955 A
 Mobile communication devices such as smartphones are typical examples of voice communication in noisy environments. In recent years the use of speech recognition applications has spread, and users increasingly speak while looking at the screen. In that case the smartphone is farther from the mouth and is therefore more susceptible to external noise. Likewise, for handsets installed in emergency vehicles, loud noise such as sirens interferes with the transmission of urgent information. In such usage environments, the area sound collection technique described above is expected to be an effective solution. That is, by installing two microphone arrays on the front of a smartphone or around the mouthpiece of a handset, and making the directivities of the two arrays intersect in front of the mouthpiece so that area sound collection operates, ambient noise can be excluded and only the speaker's voice can be transmitted accurately.
 However, it is not always easy to mount many microphone arrays for area sound collection in the limited space of these devices.
 A sound collection device, a storage medium, and a method capable of performing area sound collection efficiently and effectively are therefore desired.
 A first aspect of the present invention includes a sound collection unit that collects a target area sound in a target area based on input signals input from each of two microphone arrays, the two microphone arrays corresponding to any combination of two, excluding those parallel to each other, of the sides and/or diagonals of an N-gon constituting a microphone array unit in which N microphones (N is an integer of 3 or more) are arranged at the respective vertices of the N-gon.
 A computer-readable non-transitory storage medium according to a second aspect of the present invention stores a sound collection program that causes a computer to function as sound collection means that collects a target area sound in a target area based on input signals input from each of two microphone arrays, the two microphone arrays corresponding to any combination of two, excluding those parallel to each other, of the sides and/or diagonals of an N-gon constituting a microphone array unit in which N microphones (N is an integer of 3 or more) are arranged at the respective vertices of the N-gon.
 A third aspect of the present invention is a sound collection method performed by a sound collection device, in which a target area sound in a target area is collected based on input signals input from each of two microphone arrays, the two microphone arrays corresponding to any combination of two, excluding those parallel to each other, of the sides and/or diagonals of an N-gon constituting a microphone array unit in which N microphones (N is an integer of 3 or more) are arranged at the respective vertices of the N-gon.
 According to the present invention, a sound collection device that performs area sound collection efficiently and effectively can be provided.
A block diagram showing the configuration of each device according to the first embodiment (including the functional configuration of the sound collection unit (sound collection device) according to the first embodiment). A plan view showing the external appearance of a communication device provided with the sound collection unit (sound collection device) according to the first embodiment. An explanatory (conceptual) diagram of a microphone array arrangement in which the BF directivities of two microphone arrays are aimed at the target area from different directions. An explanatory (conceptual) diagram of a microphone array arrangement for collecting sound from an area close to the microphone arrays. An explanatory (conceptual) diagram of a microphone array arrangement for collecting sound from an area close to the microphone arrays. An explanatory (conceptual) diagram of a microphone array arrangement for collecting sound from an area close to the microphone arrays. An explanatory (conceptual) diagram outlining area sound collection processing using three microphones. An explanatory (conceptual) diagram showing a configuration example of the microphone arrays formed by three microphones.
An explanatory (conceptual) diagram showing the area sound collection processing corresponding to each combination of the microphone arrays formed by three microphones. An explanatory (conceptual) diagram showing the area sound collection processing corresponding to each combination of the microphone arrays formed by three microphones. An explanatory (conceptual) diagram showing the area sound collection processing corresponding to each combination of the microphone arrays formed by three microphones. A block diagram showing the configuration of a subtractive BF for the case of two microphones. A diagram showing the directivity pattern formed by a subtractive BF using two microphones. A diagram showing the directivity pattern formed by a subtractive BF using two microphones. A block diagram showing the configuration of each device according to the second embodiment (including the functional configuration of the sound collection unit (sound collection device) according to the first embodiment). A diagram showing the configuration of the polygonal (pentagonal, N = 5) microphone array unit according to the second embodiment. An explanatory (conceptual) diagram showing an example of area sound collection processing using the polygonal (pentagonal, N = 5) microphone array unit according to the second embodiment.
An explanatory diagram showing a configuration example of a conventional sound collection device in which the beamformer (BF) directivities of two microphone arrays are aimed at the target area from different directions.
 (A) First Embodiment
 Hereinafter, a first embodiment of the sound collection device, program, and method according to the present invention will be described in detail with reference to the drawings. This embodiment describes an example in which the sound collection device, program, and method of the present invention are applied to a sound collection unit.
 First, the basic principle of area sound collection processing using microphone arrays in the first embodiment is described with reference to FIGS. 3 to 7.
 As shown in FIG. 3, area sound collection normally requires at least two microphone arrays.
 In FIG. 3, the directivities of the two microphone arrays MA100 and MA200 are aimed in different directions so that they intersect in the sound collection area (target area). Each of the microphone arrays MA100 and MA200 consists of two microphones ch1 and ch2. The directivity of microphone array MA100 is drawn with a broken line and that of microphone array MA200 with a dash-dot line, and the sound collection area where the two directivities intersect is hatched.
 Meanwhile, in a device equipped with a transmitter (a device with a microphone that captures the speaker's voice), such as a handset or a smartphone (hereinafter also called a "transmitting device"), the area from which sound must be collected (the position of the speaker's mouth during use) is limited to a narrow area very close to the mouthpiece (the part where the microphones are arranged), for example within a few centimeters.
 Therefore, when a transmitting device such as a handset or a smartphone collects sound from the area around the speaker's mouth using the two microphone arrays MA100 and MA200, the two arrays must also be placed close to each other, as shown in FIG. 4A. If the distance between the two arrays is reduced to the limit, some of their microphones (in FIG. 4B, the microphones ch1 placed on the inner side) end up at almost the same position, as shown in FIG. 4B. In the state of FIG. 4B, the microphone ch1 of microphone array MA100 and the microphone ch1 of microphone array MA200 can therefore be shared, and the arrangement can be replaced with the configuration of three microphones ch1 to ch3 shown in FIG. 4C.
 That is, in a transmitting device that must collect sound from a narrow area very close to the mouthpiece, area sound collection of the speech uttered from the speaker's mouth can be realized with a microphone array consisting of as few as three microphones ch1 to ch3, as shown in FIG. 4C.
 Therefore, a transmitting device that must collect sound from a narrow area very close to the transmitter can realize area sound collection using three microphones ch1 to ch3, as shown in FIG. 5. In FIG. 5, the three microphones ch1 to ch3 are arranged so that the line segment L101 connecting microphones ch1 and ch2 and the line segment L102 connecting microphones ch2 and ch3 form an angle (that is, the angle between them is not 180°). A first microphone array is formed by the pair of microphones ch1 and ch2 at the ends of line segment L101, and a second microphone array by the pair of microphones ch2 and ch3 at the ends of line segment L102, and the directivity of each array is aimed toward the interior-angle side (the sound collection area) so that the two intersect. In FIG. 5, the directivity of the first microphone array is drawn with a broken line and that of the second with a dash-dot line, and the region where the two directivities overlap is hatched.
 With the microphones ch1 to ch3, up to three microphone arrays (three arrays with different directivity directions) can then be set by combining the microphones, as shown in FIG. 6: microphone array MA301 formed by the pair ch1 and ch2, microphone array MA302 formed by the pair ch2 and ch3, and microphone array MA303 formed by the pair ch3 and ch1.
 As shown in FIG. 7, the microphones ch1 to ch3 then allow area sound collection for each combination of two of the three microphone arrays MA301, MA302, and MA303 (three combinations).
 In FIG. 7A, the directivity of microphone array MA301 is drawn with a dash-dot line and that of microphone array MA302 with a dash-double-dot line. In FIG. 7B, the directivity of MA302 is drawn with a dash-dot line and that of MA303 with a dash-double-dot line. In FIG. 7C, the directivity of MA301 is drawn with a dash-dot line and that of MA303 with a dash-double-dot line. The hatched regions are the sound collection area A301 for the combination of MA301 and MA302 in FIG. 7A, the sound collection area A302 for the combination of MA302 and MA303 in FIG. 7B, and the sound collection area A303 for the combination of MA301 and MA303 in FIG. 7C.
 As shown in FIG. 7, whichever of the microphone arrays MA301 to MA303 are chosen, the arrays (that is, the line segments connecting the two microphones of each array) are at an angle to each other. Their directivities can therefore be made to intersect, and a different sound collection area can be realized for each combination.
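The enumeration of microphone arrays and usable array combinations described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the coordinates, function names, and tolerance are assumptions, and parallelism is tested with a 2-D cross product.

```python
from itertools import combinations
import math


def microphone_arrays(mics):
    """All two-microphone arrays (sides and diagonals of the polygon)."""
    return list(combinations(sorted(mics), 2))


def is_parallel(mics, array_a, array_b, tol=1e-9):
    """True if the segments of the two arrays are parallel (cross product near 0)."""
    (ax0, ay0), (ax1, ay1) = mics[array_a[0]], mics[array_a[1]]
    (bx0, by0), (bx1, by1) = mics[array_b[0]], mics[array_b[1]]
    cross = (ax1 - ax0) * (by1 - by0) - (ay1 - ay0) * (bx1 - bx0)
    return abs(cross) < tol


def usable_array_pairs(mics):
    """Pairs of arrays whose segments are not parallel to each other."""
    arrays = microphone_arrays(mics)
    return [(a, b) for a, b in combinations(arrays, 2)
            if not is_parallel(mics, a, b)]


# Hypothetical coordinates: equilateral triangle with unit side length.
triangle = {
    "ch1": (0.0, 0.0),
    "ch2": (1.0, 0.0),
    "ch3": (0.5, math.sqrt(3) / 2),
}

arrays = microphone_arrays(triangle)   # the three arrays (MA301 to MA303)
pairs = usable_array_pairs(triangle)   # the three usable combinations
```

For a triangle no two sides are parallel, so every combination of two arrays is usable; for polygons with parallel sides or diagonals the filter removes the parallel pairs.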
 The sound collection unit (sound collection device) of the first embodiment, described below, is configured to perform area sound collection using three microphones arranged at the vertices of a triangle, as shown in FIGS. 6 and 7.
 (A-1) Configuration of the First Embodiment
 FIG. 1 is a block diagram showing the configuration of each device related to the first embodiment.
 FIG. 1 shows a communication device 100 including the sound collection unit 110 according to the first embodiment, and a communication device 200. The communication devices 100 and 200 can communicate with each other via a communication path P.
 The communication device 100 captures and collects the voice (sound) uttered by the speaker U1 and transmits the audio data of the collected voice to the communication device 200 via the communication path P. The communication device 200 outputs sound based on the audio data received from the communication device 100 to the listener U2.
 The speaker U1 is, for example, a worker at a disaster relief site or the like (for example, a worker in a noisy environment), and the listener U2 is, for example, a dispatcher at a remote location (for example, a command center that directs the disaster relief site).
 Next, an outline of the configuration of the communication device 100 is described with reference to FIGS. 1 and 2.
 The communication device 100 includes a microphone array unit 1 having three microphones MC1 to MC3, a sound collection unit 110 that collects the voice uttered by the speaker U1 based on the acoustic signals captured by the microphone array unit 1, and a communication unit 120 that transmits audio data based on the voice collected by the sound collection unit 110 to the communication device 200 via the communication path P.
 The communication path P may be wired or wireless, and various connection means and connection configurations (network configurations) can be applied.
 The hardware configuration of the communication device 100 is not limited. In this embodiment, as shown in FIG. 2, the communication device 100 is assumed to be configured in hardware as a smartphone (a smartphone carried by the speaker U1).
 In this embodiment, as shown in FIG. 1, the microphone array unit 1 consists of three microphones MC1 to MC3 (that is, N = 3 in FIG. 1).
 Next, the detailed configuration of the sound collection unit 110 is described with reference to FIG. 1.
 The sound collection unit 110 includes a signal input unit 2, a frequency conversion unit 3, a directivity forming unit 4, and a target area sound extraction unit 5.
 The sound collection unit 110 may be realized, for example, by causing a computer having a processor, memory, and the like to execute a program (including the sound collection program according to the embodiment); even in that case, it can be represented functionally as shown in FIG. 1. A computer-readable non-transitory storage medium storing such a program may be provided in the communication device 100.
 Next, an outline of the configuration of the communication device 200 is described with reference to FIG. 1.
 The hardware configuration of the communication device 200 is likewise not limited; for example, various telephone devices (such as a speakerphone) can be applied.
 The communication device 200 includes a communication unit 210 that receives audio data from the communication device 100 via the communication path P, and a loudspeaker 6 that outputs sound based on the audio data received by the communication unit 210 toward the listener U2.
 Next, the configuration of the microphone array unit 1 is described with reference to FIG. 2.
 In this embodiment, the microphone array unit 1 has three microphones MC1 to MC3.
 As shown in FIG. 2, because the communication device 100 in this embodiment is configured as a smartphone, the three microphones MC1 to MC3 are preferably arranged around the part of the smartphone that normally serves as the mouthpiece (the end opposite the part where the loudspeaker SP is located). In other words, the three microphones MC1 to MC3 are preferably arranged around the part of the communication device 100 that faces the mouth of the speaker U1 during use (the part closest to the mouth of the speaker U1). In FIG. 2, the three microphones MC1 to MC3 are arranged around the part where the mouth of the speaker U1 is located when the speaker U1 holds the communication device 100 in the hand and presses the loudspeaker SP against the ear (the lower part as viewed in FIG. 2).
 In the communication device 100 (microphone array unit 1) shown in FIG. 2, as in the microphone arrays shown in FIGS. 6 and 7, the three microphones MC1 to MC3 are arranged so that their positions (the center positions of the microphones) form the vertices of an equilateral triangle. In FIG. 2, for simplicity of explanation, the sides of the triangle formed by the microphones MC1 to MC3 are of equal length (an equilateral triangle), but the side lengths and angles need not all be the same.
 As shown in FIG. 2, in the communication device 100 (microphone array unit 1), the microphone array formed by the pair of microphones MC1 and MC2 is hereinafter called MA1, the one formed by the pair MC2 and MC3 is called MA2, and the one formed by the pair MC3 and MC1 is called MA3.
 (A-2) Operation of the First Embodiment
 Next, the operation of the first embodiment configured as described above (the sound collection method according to the embodiment) is described.
 In the communication device 100, the sound collection unit 110 performs target area sound collection processing, which collects the target area sound in the target area using the acoustic signals supplied from the microphones MC1 to MC3 of the microphone array unit 1.
 The following description focuses on the internal operation of the sound collection unit 110 of the communication device 100.
 The signal input unit 2 converts the acoustic signals captured by the microphones MC1 to MC3 from analog signals to digital signals and supplies them to the frequency conversion unit 3. The frequency conversion unit 3 then converts the microphone signals from the time domain to the frequency domain using, for example, a fast Fourier transform. The directivity forming unit 4 forms directivity by BF.
 Directivity formation by BF is now described with reference to FIGS. 8 and 9.
 BF is a technique that forms sound collection directivity by using the time differences between the signals arriving at the microphones of a microphone array (see Non-Patent Document 1). BFs are broadly divided into two types, additive and subtractive; the subtractive BF, which can form directivity with a small number of microphones, is described here.
 FIG. 8 is a block diagram showing the configuration of a subtractive BF 300 for the case of two microphones (MC1 and MC2).
 In the subtractive BF 300, the delay unit 310 first calculates the difference in arrival time at the microphones MC1 and MC2 of the sound arriving from the target direction (hereinafter the "target sound"), and applies a delay to align the phase of the target sound. The time difference is calculated by equation (1), where d is the distance between the microphones MC1 and MC2, c is the speed of sound, and τ_L is the delay. θ_L is the angle from the direction perpendicular to the straight line connecting the positions of the microphones MC1 and MC2 to the target direction.
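Equation (1) itself is not reproduced in this text. Under the usual far-field (plane-wave) assumption, a form consistent with the definitions above is the following; the reasoning is an assumption added here for illustration. A wavefront arriving at angle θ_L from the perpendicular travels an extra path of d sin θ_L to reach the farther microphone, and dividing that path by the speed of sound c gives the delay, written here as τ_L.

```latex
\tau_L = \frac{d \sin\theta_L}{c} \tag{1}
```

For example, with the hypothetical values d = 0.03 m, c = 340 m/s and θ_L = π/2, this gives τ_L ≈ 88 µs.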
When the blind spot lies in the direction of the microphone MC1 with respect to the midpoint between the microphones MC1 and MC2, the delay unit 310 delays the input signal x_1(t) of the microphone MC1. The subtractor 320 then performs subtraction according to equation (2). The subtraction can equally be performed in the frequency domain, in which case equation (2) becomes equation (3).
[Equations (1) to (3) appear as an image in the original publication.]
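The images for equations (1) to (3) are not reproduced in this text. From the definitions given above (d, c, the delay, the angle θ_L, and the delay applied to x_1(t)), they would plausibly take the following form. This is a reconstruction under those assumptions, not a copy of the original equations; the subscript L on τ and the symbols m(t) and M(ω) for the subtractor output are assumed names.

```latex
\begin{align}
\tau_L &= \frac{d \sin\theta_L}{c} \tag{1} \\
m(t) &= x_2(t) - x_1(t - \tau_L) \tag{2} \\
M(\omega) &= X_2(\omega) - e^{-j\omega\tau_L}\, X_1(\omega) \tag{3}
\end{align}
```

With this form, a plane wave arriving from the angle θ_L reaches MC2 exactly τ_L after MC1, so the two terms cancel and the output has its blind spot in that direction.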
When θ_L = ±π/2, the resulting directivity is a cardioid (unidirectional) pattern, as shown in FIG. 9A; when θ_L = 0 or π, it is a figure-eight (bidirectional) pattern, as shown in FIG. 9B. By applying spectral subtraction (hereinafter also simply "SS"), the subtractor 320 can also form strong directivity in the direction of the blind spot of the bidirectional pattern. Directivity by SS is formed according to equation (4), either over all frequencies or in a specified frequency band. Equation (4) uses the input signal X_1 of the microphone MC1, but the same effect is obtained with the input signal X_2 of the microphone MC2. Here, n is the frame number and β is a coefficient that adjusts the strength of the SS. If a value becomes negative as a result of the subtraction, the subtractor 320 may apply flooring, replacing the value with 0 or with a scaled-down version of the original value. In this method, the bidirectional pattern is used to extract sound arriving from directions other than the target direction (hereinafter, "non-target sound"), and the amplitude spectrum of the extracted non-target sound is subtracted from the amplitude spectrum of the input signal, thereby emphasizing the target sound.
[Equation (4) appears as an image in the original publication.]
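The behaviour described above can be sketched numerically. The code below assumes that equation (1) has the form τ_L = d·sin(θ_L)/c, that equation (3) has the form M = X_2 − exp(−jωτ_L)·X_1, and that equation (4) has the form |Y(n)| = |X_1(n)| − β|M(n)|. These forms, the plane-wave model, the speed of sound, and all function names are assumptions for illustration only.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the text

def delay_for_angle(d, theta_l, c=SPEED_OF_SOUND):
    """Delay for direction theta_l (assumed form of equation (1))."""
    return d * math.sin(theta_l) / c

def bf_gain(theta, theta_l, d, freq_hz, c=SPEED_OF_SOUND):
    """|M| of the subtractive BF for a unit plane wave arriving from angle theta.

    Assumed model: MC1 is the phase reference (X1 = 1) and MC2 receives the
    wave d*sin(theta)/c later. M = X2 - exp(-j*omega*tau) * X1 is the assumed
    form of equation (3).
    """
    omega = 2 * math.pi * freq_hz
    x1 = 1.0 + 0.0j
    x2 = cmath.exp(-1j * omega * d * math.sin(theta) / c)
    tau = delay_for_angle(d, theta_l, c)
    return abs(x2 - cmath.exp(-1j * omega * tau) * x1)

def spectral_subtraction(x_mag, m_mag, beta=1.0, floor_ratio=0.0):
    """Assumed form of equation (4), with flooring of negative results."""
    y = x_mag - beta * m_mag
    if y < 0.0:
        # Replace a negative value with 0 (floor_ratio = 0) or a scaled-down input.
        return floor_ratio * x_mag
    return y

D, F = 0.03, 1000.0  # 3 cm spacing and 1 kHz, both arbitrary example values
cardioid_null = bf_gain(math.pi / 2, math.pi / 2, D, F)    # blind spot at theta_L = pi/2
cardioid_back = bf_gain(-math.pi / 2, math.pi / 2, D, F)   # opposite direction is passed
figure8_null = bf_gain(0.0, 0.0, D, F)                     # blind spot at theta_L = 0
print(cardioid_null, cardioid_back, figure8_null)
```

With θ_L = ±π/2 the pattern has a single blind spot (cardioid), and with θ_L = 0 the blind spots lie at 0 and π (figure-eight). Subtracting the amplitude of that output from the input amplitude then emphasizes sound from the blind-spot direction.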
When only the target area sound present in a specific target area is to be collected, a subtractive BF alone also picks up sound sources located on the same line of direction as that area (hereinafter, "non-target area sound").
The directivity forming unit 4 is therefore described as performing the area sound collection process proposed in Patent Document 1, in which a plurality of microphone arrays direct their directivity toward the target area from different directions and the target area sound is collected where the directivities intersect. Specifically, the directivity forming unit 4 shown in FIG. 1 may perform area sound collection as follows.
For example, for the microphone array MA1 formed by the microphones MC1 and MC2 and the microphone array MA3 formed by the microphones MC1 and MC3, the directivity forming unit 4 forms directivity by BF toward the inside of the triangle defined by the microphones MC1 to MC3. The directivities of the two arrays thus intersect, from different directions, in front of the mouthpiece (the region inside the triangle formed by the microphones MC1 to MC3), which is the area from which sound is to be collected (the target area) (see FIG. 7C).
The target area sound extraction unit 5 applies SS to the BF output data Y_1(n) and Y_2(n) of the microphone arrays MA1 and MA3 formed by the directivity forming unit 4, according to equation (5) or (6), and extracts the non-target area sounds N_1(n) and N_2(n) present in the direction of the target area. In the following, Y_1(n) denotes the BF output data of the microphone array MA1 and Y_2(n) that of the microphone array MA3. Likewise, N_1(n) denotes the non-target area sound output data of the microphone array MA1 and N_2(n) that of the microphone array MA3.
Here, α_1 and α_2 are correction coefficients that compensate for the difference in signal level caused by the different distances between the target area and each microphone array. They should be calculated successively by a predetermined process, and the method is described in Patent Document 1. For simplicity, the distances from the target area to the microphone arrays MA1 and MA3 are assumed here to be equal (α_1(n) = α_2(n) = 1), so that equations (5) and (6) are replaced by equations (7) and (8).
[Equations (5) to (8) appear as an image in the original publication.]
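The images for equations (5) to (8) are not reproduced in this text. Given that each non-target area sound is obtained by spectrally subtracting one BF output from the other after level correction, they would plausibly read as follows. This is a hedged reconstruction; in particular, which of α_1 and α_2 multiplies which BF output is an assumption.

```latex
\begin{align}
N_1(n) &= Y_1(n) - \alpha_2(n)\, Y_2(n) \tag{5} \\
N_2(n) &= Y_2(n) - \alpha_1(n)\, Y_1(n) \tag{6} \\
N_1(n) &= Y_1(n) - Y_2(n) \tag{7} \\
N_2(n) &= Y_2(n) - Y_1(n) \tag{8}
\end{align}
```

Equations (7) and (8) follow from (5) and (6) by setting α_1(n) = α_2(n) = 1. The target area sound is common to both BF outputs and cancels, leaving only the sound that each array picks up from behind the target area.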
The target area sound extraction unit 5 then applies SS to subtract the non-target area sound from each BF output according to equations (9) and (10), generating the target area emphasized sounds Z_1(n) and Z_2(n), in which the target area sound is extracted. γ_1(n) and γ_2(n) are coefficients for changing the strength of the SS. The target area sound extraction unit 5 may output both Z_1(n) and Z_2(n), or only one of them.
[Equations (9) and (10) appear as an image in the original publication.]
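The two-stage subtraction can be sketched on amplitude spectra as follows. The sketch assumes that equations (7) to (10) have the forms N_1 = Y_1 − Y_2, N_2 = Y_2 − Y_1, Z_1 = Y_1 − γ_1·N_1 and Z_2 = Y_2 − γ_2·N_2, and that negative results are floored to zero. These forms and the function names are illustrative assumptions, not the original equations.

```python
def spectral_subtract(a, b, weight=1.0):
    """Element-wise a - weight * b on amplitude spectra, floored at zero."""
    return [max(x - weight * y, 0.0) for x, y in zip(a, b)]

def extract_target_area(y1, y2, gamma1=1.0, gamma2=1.0):
    """Return (Z1, Z2) for one frame from the BF amplitude spectra Y1 and Y2.

    Assumes equal distances from the target area to both microphone arrays,
    that is, alpha_1 = alpha_2 = 1 as in the simplified case of the text.
    """
    n1 = spectral_subtract(y1, y2)          # non-target area sound seen by array 1
    n2 = spectral_subtract(y2, y1)          # non-target area sound seen by array 2
    z1 = spectral_subtract(y1, n1, gamma1)  # target area emphasized sound Z1
    z2 = spectral_subtract(y2, n2, gamma2)  # target area emphasized sound Z2
    return z1, z2

# Three frequency bins of made-up amplitudes for the two BF outputs.
y1 = [1.5, 0.75, 0.0]
y2 = [1.0, 0.25, 0.5]
z1, z2 = extract_target_area(y1, y2)
print(z1, z2)
```

With γ_1 = γ_2 = 1 and flooring at zero, each output reduces to the bin-wise minimum of the two BF outputs, which is one way to see why only sound present in both directivities (the intersection area) survives.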
The target area emphasized sound generated by the target area sound extraction unit 5 (Z_1(n) and/or Z_2(n)) is supplied to the communication unit 120. The communication unit 120 transmits audio data based on the supplied target area emphasized sound to the communication device 200 via the communication path P.
In the communication device 200, the communication unit 210 receives the audio data, and the speaker 6 outputs sound based on the received audio data toward the listener U2.
(A-3) Effects of the First Embodiment
The first embodiment provides the following effects.
In the first embodiment, area sound collection is realized by forming two microphone arrays from the three microphones MC1 to MC3. In the area sound collection process performed by the sound collection unit 110, frequency analysis (discrete Fourier transform or the like) accounts for most of the processing load, and it must be performed for each microphone. Reducing the number of microphones to be processed therefore reduces the processing load of the area sound collection process as a whole. Because two microphone arrays are formed from three microphones in the first embodiment, the device can be mounted efficiently in space-constrained equipment such as smartphones and handsets.
A conventional area sound collection configuration (for example, the configuration shown in FIG. 3) uses at least four microphones (two microphone arrays of two microphones each) to collect sound from one area, whereas in the first embodiment the microphone array unit 1 with the three microphones MC1 to MC3 can collect sound from one to three areas. Moreover, frequency analysis (discrete Fourier transform or the like) accounts for most of the processing load of area sound collection and is required per microphone. With the configuration of the first embodiment, the number of microphones does not increase even when sound is collected from three areas, so the increase in processing load is small.
As described above, the sound collection unit 110 of the first embodiment collects sound from multiple areas with fewer microphones, enabling effective countermeasures against noisy environments with a simple configuration even in a limited mounting space.
(B) Second Embodiment
A second embodiment of the sound collection device, program, and method according to the present invention is described in detail below with reference to the drawings. In this embodiment, the sound collection device, program, and method of the present invention are applied to a sound collection unit.
(B-1) Configuration of the Second Embodiment
FIG. 11 is a block diagram showing the configuration of each device related to the second embodiment.
The second embodiment differs from the first embodiment in that the communication device 100 is replaced by a communication device 100A.
The communication device 100A of the second embodiment differs from the first embodiment in that the microphone array unit 1 and the sound collection unit 110 are replaced by a microphone array unit 1A and a sound collection unit 110A. Further, in the sound collection unit 110A, the signal input unit 2, the frequency conversion unit 3, the directivity forming unit 4, and the target area sound extraction unit 5 are replaced by a signal input unit 2A, a frequency conversion unit 3A, a directivity forming unit 4A, and a target area sound extraction unit 5A.
As shown in FIGS. 1 and 2, the microphone array unit 1 of the first embodiment has the microphones MC1 to MC3 arranged at the vertices of a triangle. The arrangement is not limited to a triangle (N = 3), however, and may be a polygon with more sides. The microphone array unit 1A of the second embodiment therefore includes an arbitrary number N of microphones MC (MC1 to MCN), where N is an integer of 3 or more. That is, the microphone array unit 1A consists of N microphones MC1 to MCN arranged at the vertices of an N-gon (a polygon with N sides and N corners).
The N-gon microphone array unit 1A requires at least N = 3 microphones, and the number of area sound collections that can be realized grows substantially as N increases.
As described above, the sound collection unit 110A performs area sound collection based on the acoustic signals captured by the N microphones MC1 to MCN. In the N-gon defined by the positions of the N microphones, the sound collection unit 110A can perform area sound collection using the microphone arrays formed by any two sides or diagonals, except for pairs that are parallel to each other. For example, pairs of adjacent sides alone give N combinations, so sound can be collected from N areas. Including pairs of non-adjacent sides and the diagonals allows sound to be collected from still more areas.
FIG. 12 is an explanatory diagram showing a configuration example of the microphone array unit 1A with five microphones (N = 5). As noted above, the number N of microphones in the microphone array unit 1A is not limited to 3 or 5.
In the microphone array unit 1A shown in FIG. 12, the microphones MC1 to MC5 are arranged at the vertices of a pentagon. As shown in FIG. 12, in the second embodiment the side between the microphones MC1 and MC2 is called S1, the side between MC2 and MC3 is called S2, ..., and the side between MC5 and MC1 is called S5. Similarly, the diagonal between the microphones MC1 and MC3 is called L1, the diagonal between MC2 and MC4 is called L2, ..., and the diagonal between MC5 and MC2 is called L5.
In the second embodiment, the microphone array formed by the microphones at both ends of side S1 is called MA11, the one at both ends of side S2 is called MA12, ..., and the one at both ends of side S5 is called MA15. Likewise, the microphone array formed by the microphones at both ends of diagonal L1 is called MA21, the one at both ends of diagonal L2 is called MA22, ..., and the one at both ends of diagonal L5 is called MA25.
A microphone array requires at least two microphones. As shown in FIG. 12, the number of microphone arrays MA11 to MA15 and MA21 to MA25 formed by the sides and diagonals of the pentagonal microphone array unit 1A (microphones MC1 to MC5) is 5C2 (= 10). This equals the sum of the five sides S1 to S5 and the five diagonals L1 to L5 of the pentagon shown in FIG. 12.
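The count generalizes to any N. As a short worked derivation (the symbol M for the number of two-microphone arrays follows its use elsewhere in the description), an N-gon has N sides and N(N − 3)/2 diagonals, so:

```latex
\begin{align}
M &= \underbrace{N}_{\text{sides}} + \underbrace{\frac{N(N-3)}{2}}_{\text{diagonals}}
   = \frac{N(N-1)}{2} = \binom{N}{2}, \\
N = 3 &: \quad M = 3 + 0 = 3, \\
N = 5 &: \quad M = 5 + 5 = 10.
\end{align}
```

This matches the values M = 3 for a triangle and M = 10 for a pentagon given in the description of the directivity forming unit 4A.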
Area sound collection is realized by two microphone arrays whose directivities point in different directions, so pairs of arrays that are parallel to each other cannot be used. The ten microphone arrays MA11 to MA15 and MA21 to MA25 (sides S1 to S5 and diagonals L1 to L5) of the pentagonal microphone array unit 1A (microphones MC1 to MC5) can be paired in 10C2 (= 45) ways. In a regular pentagon each side is parallel to exactly one diagonal, so 5 of these pairs are excluded and area sound collection is possible with up to 40 combinations. It is not desirable for BF directivity formation or area sound collection that the distances between microphones differ. Considering the combinations alone, however, the triangular microphone array unit 1 of the first embodiment (three microphones MC1 to MC3) realizes three area sound collections, whereas the microphone array unit 1A of the second embodiment (microphones MC1 to MC5) realizes as many as 40 combinations with only two additional microphones. Even when only the sides, whose microphone spacings are all equal, are used, the microphone array unit 1A of the second embodiment allows area sound collection with 10 combinations.
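The combinatorial figures for a regular N-gon can be checked by brute force. The sketch below places the microphones on a unit circle, enumerates every two-microphone array (side or diagonal), and counts the pairs of arrays that are not parallel. The function names and the numerical tolerance are assumptions for illustration.

```python
import math
from itertools import combinations

def regular_polygon(n):
    """Vertices of a regular n-gon on the unit circle."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]

def count_area_pairs(n, tol=1e-9):
    """Return (number of two-microphone arrays, number of non-parallel array pairs)."""
    pts = regular_polygon(n)
    arrays = list(combinations(range(n), 2))  # every side and every diagonal
    usable = 0
    for (a, b), (c, d) in combinations(arrays, 2):
        ux, uy = pts[b][0] - pts[a][0], pts[b][1] - pts[a][1]
        vx, vy = pts[d][0] - pts[c][0], pts[d][1] - pts[c][1]
        # A zero cross product means the two segments are parallel.
        if abs(ux * vy - uy * vx) > tol:
            usable += 1
    return len(arrays), usable

for n in (3, 4, 5):
    print(n, count_area_pairs(n))
```

For the triangle this gives 3 arrays and 3 usable pairs, and for the regular pentagon 10 arrays and 40 usable pairs. An irregular polygon with no parallel segments would reach the full count of pairs.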
(B-2) Operation of the Second Embodiment
Next, the operation of the second embodiment configured as described above (the sound collection method according to the embodiment) is described.
In the communication device 100A, the sound collection unit 110A uses the acoustic signals supplied from the microphones MC1 to MCN of the microphone array unit 1A to perform target area sound collection, which collects the target area sound from a sound source in the target area.
The following describes how the internal operation of the sound collection unit 110A of the communication device 100A differs from the first embodiment.
The signal input unit 2A converts the acoustic signals picked up by the microphones MC1 to MCi (i = N) from analog to digital to generate microphone signals x_1 to x_i (i = N), and supplies them to the frequency conversion unit 3A.
The frequency conversion unit 3A converts the microphone signals x_1 to x_i (i = N) from the time domain to the frequency domain to generate microphone signals X_1 to X_i (i = N), and supplies them to the directivity forming unit 4A.
The directivity forming unit 4A selects the microphone signals of the microphone arrays needed to form the desired directivity, obtains BF outputs Y_1 to Y_j with directivity pointed toward the sound collection area for each microphone array, and supplies them to the target area sound extraction unit 5A. Here, j is the number of BF outputs (microphone arrays) needed for area sound collection and is an integer from 2 to M, where M is the maximum number of microphone arrays that can be formed in the N-gon microphone array unit 1A. For example, M = 3 for N = 3 (triangle) and M = 10 for N = 5 (pentagon).
The target area sound extraction unit 5A uses the BF outputs Y_1 to Y_j generated by the directivity forming unit 4A to generate and output target area emphasized sounds Z_1 to Z_k, in which the target area sound of each desired target area is extracted (emphasized). Here, k is the number of desired target area sounds (sound collection areas). When there is only one desired target area sound, the target area sound extraction unit 5A outputs only one target area emphasized sound, Z_1.
As an example, area sound collection with five microphones (N = 5) in the microphone array unit 1A, as shown in FIG. 12, is described.
FIG. 12 is an explanatory diagram showing an example of area sound collection by the microphone array unit 1A with five microphones (N = 5).
The example of FIG. 12 shows area sound collection in the region A401 (hatched), where the directivity of the microphone array MA11 corresponding to side S1 (shown by a dotted line) overlaps the directivity of the microphone array MA12 corresponding to side S2 (shown by a dash-dot line). In this example, the directivity forming unit 4A generates the BF output Y_1 of the microphone array MA11 from the microphone signals X_1 and X_2, and the BF output Y_2 of the microphone array MA12 from the microphone signals X_2 and X_3.
In the example of FIG. 12, the target area sound extraction unit 5A then uses the BF outputs Y_1 and Y_2 generated by the directivity forming unit 4A to generate and output the target area emphasized sound Z_1, in which the target area sound of the desired target area is extracted (emphasized).
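The signal selection in the FIG. 12 example can be sketched as a lookup from array name to microphone pair. The naming rule below (MA1k for side Sk, MA2k for diagonal Lk) follows the description for N = 5; the function names and the placeholder spectra are assumptions for illustration only.

```python
N = 5  # pentagon of FIG. 12; the naming rule below is only grounded for N = 5

def build_microphone_arrays():
    """Map each array name to the 1-based indices of its two microphones."""
    arrays = {}
    for k in range(1, N + 1):
        arrays[f"MA1{k}"] = (k, k % N + 1)        # side Sk: MCk to the next microphone
        arrays[f"MA2{k}"] = (k, (k + 1) % N + 1)  # diagonal Lk: MCk to the one after next
    return arrays

def select_signals(arrays, array_name, spectra):
    """Pick the two microphone spectra that feed the BF of the named array."""
    a, b = arrays[array_name]
    return spectra[f"X{a}"], spectra[f"X{b}"]

arrays = build_microphone_arrays()
# Placeholder strings stand in for the frequency-domain signals X_1 to X_5.
spectra = {f"X{i}": f"spectrum of MC{i}" for i in range(1, N + 1)}
area_a401 = ("MA11", "MA12")  # the two arrays whose directivities overlap in A401
inputs = {name: select_signals(arrays, name, spectra) for name in area_a401}
print(inputs)
```

Here MA11 receives X_1 and X_2 and MA12 receives X_2 and X_3, so the microphone MC2 is shared by both arrays. This sharing is what lets N microphones serve more than N/2 arrays.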
(B-3) Effects of the Second Embodiment
The second embodiment provides the following effects.
In the second embodiment, area sound collection with the N-gon (polygonal) microphone array unit 1A allows sound to be collected from many areas with fewer microphones. Conventionally, collecting sound from N areas requires N microphone arrays of at least two microphones each (2 × N microphones in total). In this embodiment, although the positions at which sound collection areas can be set are constrained, area sound collection with the N-gon microphone array unit 1A allows sound to be collected from at least N areas with N microphones, so the number of microphones required can be cut to half or less.
In the second embodiment, setting N to an odd number in the N-gon microphone array unit 1A allows target areas to be set more efficiently. For example, when a regular octagon is used as the N-gon, some of its sides and diagonals are parallel to one another, so a regular heptagon allows more sound collection areas to be set with fewer microphones.
(C) Other Embodiments
The present invention is not limited to the embodiments described above; modified embodiments such as the following are also possible.
(C-1) In the embodiments above, the sound collection units 110 and 110A are described as forming part of the communication devices 100 and 100A, but they may be configured as independent devices. Also, the sound collection units 110 and 110A are described as not including the microphone array units 1 and 1A, but the sound collection unit and the microphone array unit may be configured as a single integrated device.
(C-2) In the embodiments above, the sound collection device of the present invention (sound collection units 110 and 110A) is applied to a hand-held transmitter such as a smartphone or a handset. The sound collection device may also be applied to a headset or a wearable device (for example, a head-mounted display with microphones or neckband headphones with microphones). In that case, the region where the mouth of the speaker U1 is located when the device is worn is taken as the target area, microphones are arranged in an N-gon around it, and area sound collection is performed.
Description of reference signs: 100 … communication device, 1 … microphone array unit, MC1 to MC3 … microphones, 110 … sound collection unit, 2 … signal input unit, 3 … frequency conversion unit, 4 … directivity forming unit, 5 … target area sound extraction unit, 120 … communication unit, 200 … communication device, 210 … communication unit, 6 … speaker, U1 … speaker (talker), U2 … listener.

Claims (5)

  1.  A sound collection device comprising a sound collection unit that collects a target area sound of a target area based on input signals from each of two microphone arrays, the two microphone arrays corresponding to any combination of two of the sides and/or diagonals of an N-gon, excluding those parallel to each other, the N-gon defining a microphone array unit in which N microphones (N being an integer of 3 or more) are arranged at the respective vertices of the N-gon.
  2.  The sound collection device according to claim 1, wherein the sound collection unit comprises:
     a directivity forming unit that forms directivity by a beamformer toward the inside of the N-gon for each of the input signals from the two microphone arrays;
     a non-target area sound extraction unit that extracts a non-target area sound present in the direction of the target area by spectral subtraction of the beamformer outputs of the respective microphone arrays; and
     a target area sound extraction unit that extracts the target area sound by spectrally subtracting the non-target area sound from the beamformer output of the microphone array.
  3.  The sound collection device according to claim 1, wherein N is an odd number.
  4.  A non-transitory computer-readable storage medium storing a sound collection program that causes a computer to function as a sound collection unit that collects a target area sound of a target area based on input signals from each of two microphone arrays, the two microphone arrays corresponding to any combination of two of the sides and/or diagonals of an N-gon, excluding those parallel to each other, the N-gon defining a microphone array unit in which N microphones (N being an integer of 3 or more) are arranged at the respective vertices of the N-gon.
  5.  A sound collection method performed by a sound collection device, the method comprising collecting a target area sound of a target area based on input signals from each of two microphone arrays, the two microphone arrays corresponding to any combination of two of the sides and/or diagonals of an N-gon, excluding those parallel to each other, the N-gon defining a microphone array unit in which N microphones (N being an integer of 3 or more) are arranged at the respective vertices of the N-gon.
PCT/JP2018/039209 2018-03-12 2018-10-22 Sound pickup device, storage medium, and method WO2019176153A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018044397A JP7067146B2 (en) 2018-03-12 2018-03-12 Sound collectors, programs and methods
JP2018-044397 2018-03-12

Publications (1)

Publication Number Publication Date
WO2019176153A1 true WO2019176153A1 (en) 2019-09-19

Family

ID=67907037

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/039209 WO2019176153A1 (en) 2018-03-12 2018-10-22 Sound pickup device, storage medium, and method

Country Status (2)

Country Link
JP (1) JP7067146B2 (en)
WO (1) WO2019176153A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113301476A (en) * 2021-03-31 2021-08-24 阿里巴巴新加坡控股有限公司 Pickup device and microphone array structure

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022102311A1 (en) * 2020-11-10 2022-05-19 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ Sound pickup device
JP2023050963A (en) * 2021-09-30 2023-04-11 沖電気工業株式会社 Voice processing device, voice processing program, voice processing method, and wearable item

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012004708A (en) * 2010-06-15 2012-01-05 Yamaha Corp Acoustic processing apparatus
JP2014072708A (en) * 2012-09-28 2014-04-21 Oki Electric Ind Co Ltd Sound collecting device and program

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5032960B2 (en) 2007-11-28 2012-09-26 パナソニック株式会社 Acoustic input device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113301476A (en) * 2021-03-31 2021-08-24 阿里巴巴新加坡控股有限公司 Pickup device and microphone array structure
CN113301476B (en) * 2021-03-31 2023-11-14 阿里巴巴(中国)有限公司 Pickup device and microphone array structure

Also Published As

Publication number Publication date
JP7067146B2 (en) 2022-05-16
JP2019161400A (en) 2019-09-19

Similar Documents

Publication Publication Date Title
US9749731B2 (en) Sidetone generation using multiple microphones
US10097921B2 (en) Methods circuits devices systems and associated computer executable code for acquiring acoustic signals
CN110100453B (en) Controlling wind noise in a dual-sided microphone array
CN110089130B (en) Dual-purpose double-side microphone array
CN107533838A (en) Sensed using the voice of multiple microphones
US9521486B1 (en) Frequency based beamforming
WO2019176153A1 (en) Sound pickup device, storage medium, and method
US20180206038A1 (en) Real-time processing of audio data captured using a microphone array
JP2008543143A (en) Acoustic transducer assembly, system and method
US11832072B2 (en) Audio processing using distributed machine learning model
US9723403B2 (en) Wearable directional microphone array apparatus and system
JP5151352B2 (en) Sound emission and collection device
JP6943120B2 (en) Sound collectors, programs and methods
US20210235188A1 (en) Wearable microphone speaker
WO2021129197A1 (en) Voice signal processing method and apparatus
JP7067173B2 (en) Sound collectors, programs and methods
JP7040198B2 (en) Sound collectors, programs and methods
Miyahara et al. A hearing device with an adaptive noise canceller for noise-robust voice input
JP7176316B2 (en) SOUND COLLECTION DEVICE, PROGRAM AND METHOD
JP6973224B2 (en) Sound collectors, programs and methods
US10880642B2 (en) Sound pick-up apparatus, medium, and method
JP7176291B2 (en) SOUND COLLECTION DEVICE, PROGRAM AND METHOD
US20190306618A1 (en) Methods circuits devices systems and associated computer executable code for acquiring acoustic signals
CN111327984A (en) Earphone auxiliary listening method based on null filtering and ear-worn equipment
TW201116077A (en) Earphone integrated with microphone

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18909526; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 18909526; Country of ref document: EP; Kind code of ref document: A1)