US12120490B2 - Filter coefficient optimization apparatus, filter coefficient optimization method, and program - Google Patents

Info

Publication number
US12120490B2
Authority
US
United States
Prior art keywords
filter coefficient
sound
optimization
frequency bin
beamformer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US17/801,754
Other versions
US20230088204A1 (en)
Inventor
Ryotaro Sato
Kenta Niwa
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NTT Inc
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Assigned to NIPPON TELEGRAPH AND TELEPHONE CORPORATION (assignment of assignors' interest; see document for details). Assignors: NIWA, KENTA; SATO, RYOTARO
Publication of US20230088204A1
Application granted
Publication of US12120490B2
Legal status: Active
Adjusted expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00: Circuits for transducers, loudspeakers or microphones
    • H04R3/005: Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/11: Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K: SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00: Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/18: Methods or devices for transmitting, conducting or directing sound
    • G10K11/26: Sound-focusing or directing, e.g. scanning
    • G10K11/34: Sound-focusing or directing, e.g. scanning using electrical steering of transducer arrays, e.g. beam steering
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00: Circuits for transducers, loudspeakers or microphones
    • H04R3/04: Circuits for transducers, loudspeakers or microphones for correcting frequency response
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161: Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166: Microphone arrays; Beamforming
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00: Details of transducers, loudspeakers or microphones
    • H04R1/20: Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32: Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R1/40: Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
    • H04R1/406: Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers microphones
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00: Signal processing covered by H04R, not provided for in its groups
    • H04R2430/20: Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic

Definitions

  • FIG. 5 is a block diagram showing the configuration of a filter coefficient optimization apparatus 100 .
  • FIG. 7 is a block diagram showing the configuration of an optimization unit 120 .
  • FIG. 8 is a flowchart showing the behavior of the optimization unit 120 .
  • FIG. 9 is a diagram showing an example of the functional configuration of a computer that realizes apparatuses in embodiments of the present invention.
  • “_” (underscore) indicates a subscript. For example, “x^{y_z}” shows that “y_z” is a superscript of “x”, and “x_{y_z}” shows that “y_z” is a subscript of “x”.
  • a cost term (hereinafter referred to as a regularization term) in which the relationship of the filter coefficient between adjacent frequency bins is considered can be used for designing a stable beamformer having a good quality.
  • a new cost function is introduced by adding the regularization term to the cost function $\sum_f L_{\mathrm{MV}_f}(w_f)$ described in Background Art, and the filter coefficient is determined by solving an optimization problem of the new cost function.
  • a regularization term using the difference in the phase component related to the filter coefficient will be described as a regularization term by frequency-directional smoothing.
  • the regularization term makes it possible to directly control the group delay and phase delay of the filter constituting the beamformer.
  • the response of the beamformer in the frequency bin f for the angular direction $\theta_d$ is expressed as a complex number $w_f^H a_{f,d}$.
  • the absolute value $|w_f^H a_{f,d}|$ of the response $w_f^H a_{f,d}$ of the beamformer is referred to as an amplitude, and the argument (deflection angle) $\arg(w_f^H a_{f,d})$ is referred to as a phase.
  • Two forms will be shown below as examples of the regularization term by the frequency-directional smoothing.
  • ⁇ ( ⁇ is a predetermined positive number) represents a weight parameter.
  • 2 ⁇ in Expression (7) and Expression (8) is a norm that is defined by the following expression.
  • a complex plane is divided into C sectors that are centered on the origin and have equal central angles, consecutive numbers 1, . . . , C are assigned to the sectors in a counterclockwise manner, and $c_{f,d}$ is the number of the sector in which the complex number $w_f^H a_{f,d}$ is positioned.
  • the discrete variable $c_{f,d}$ takes one of the values 1, . . . , C. Further, the following expression is satisfied among the filter coefficient $w_f$, the array manifold vector $a_{f,d}$ and the discrete variable $c_{f,d}$.
  • ⁇ ( ⁇ is a predetermined positive number) represents a weight parameter.
  • c in Expression (11) is a norm that is defined by the following expression.
  • the algorithm is shown in FIG. 1 .
  • the optimum value of the filter coefficient $w_f$ is determined only by the value of the discrete variable $c_f$, regardless of the values in the other frequency bins. Therefore, by calculating in advance the filter coefficient $w_f$ for all $C^D$ values that the discrete variable $c_f$ can take in each frequency bin f, the optimization problem reduces to a shortest path problem over the discrete variables $c_f$. Accordingly, the optimization problem can be solved at high speed by applying the Dijkstra method. This is used in the algorithm in FIG. 1.
  • the distortionless constraint condition for one angular direction is used, but a distortionless constraint condition for a plurality of angular directions may be used.
  • the constraint sometimes becomes excessively strict, so that the solution is not evaluated.
  • the relaxation of the distortionless constraint condition is possible, but in this case, a non-convex optimization problem is sometimes obtained.
  • a technique for optimizing the filter coefficient by solving a convex optimization problem equivalent to the non-convex optimization problem instead of solving the non-convex optimization problem will be described below.
  • L convex is a strongly convex function relevant to the latent variable ⁇ w
  • the optimization problem in Expression (16) is an optimization problem in which the cost function is a non-convex function, that is, a non-convex optimization problem.
  • the non-convex optimization problem is a difficult problem as described above, and therefore, is intended to result in a convex optimization problem to be solved more easily, by introducing a certain kind of approximation.
  • the newly introduced function ⁇ circumflex over ( ) ⁇ d,c is a convex function on the region S d,c , and is a function for approximating the function L d on the region S d,c .
  • the function L d is a convex function on the region S d,c
  • the approximation can be performed by a more accurate piecewise convex function.
  • Expression (17) is equivalent to the following expression.
  • the non-convex optimization problem in Expression (16) can be transformed into the convex optimization problem in Expression (18) that is equivalent to the non-convex optimization problem in Expression (16), and the convex optimization problem in Expression (18) can be solved by the latent variable optimization algorithm in FIG. 2 .
  • the constraint condition in Expression (19) and the constraint condition in Expression (20) express the constraint that the amplitude of the response of the beamformer is a constant value (specifically, 1) and the constraint that the amplitude of the response of the beamformer only needs to be equal to or more than a constant value (specifically, 1), respectively.
  • Each of the constraint condition in Expression (19) and the constraint condition in Expression (20) is mathematically classified into a non-convex constraint.
  • the constraint condition in Expression (20) shows that the absolute value of the complex number $w_f^H a_{f,d}$ is equal to or more than 1. Geometrically, this means that the complex number $w_f^H a_{f,d}$ needs to be positioned on or outside the unit circle in the complex plane.
  • the complex plane is equally divided into C sectors that are around the origin. The C sectors correspond to the C regions described above. Then, on the border or inside of each sector, Expression (20) that is the original constraint is approximated by C convex functions.
  • the function ⁇ circumflex over ( ) ⁇ (f,d),c_f,d may be a function expressed by the following expression.
  • R(z) represents the real part of a complex number z.
  • $c_f = (c_{f,1}, \ldots, c_{f,D})$ is satisfied.
  • FIG. 3 shows a filter coefficient optimization algorithm that is obtained based on the latent variable optimization algorithm in FIG. 2 .
  • the optimization problem in Expression (23) can be solved at high speed by applying the Dijkstra method.
  • the algorithm is shown in FIG. 4 .
  • the observation signal is an input data that is used for the optimization of the filter coefficient, and therefore, the observation signal is referred to as optimization data, hereinafter.
  • FIG. 6 is a flowchart showing the behavior of the filter coefficient optimization apparatus 100 .
  • the filter coefficient optimization apparatus 100 includes a setup data calculation unit 110 , an optimization unit 120 , and a recording unit 190 .
  • the recording unit 190 is a component unit that appropriately records the information necessary for the processing in the filter coefficient optimization apparatus 100 .
  • the recording unit 190 records the filter coefficient that is an optimized object.
  • the setup data calculation unit 110 calculates setup data that is used at the time of the optimization of the filter coefficient w, using the optimization data.
  • the optimization unit 120 calculates the optimum value w* of the filter coefficient w, using the setup data generated in S 110 .
  • the optimization unit 120 can calculate the optimum value w* based on the optimization problem $\min_w L(w)$ relevant to the filter coefficient w under a predetermined constraint condition.
  • the function L(w) is a cost function relevant to the filter coefficient w f
  • is a predetermined positive value
  • C is an integer equal to or more than 1
  • c under a constraint condition (*).
  • the optimization unit 120 includes an initialization unit 121 , a candidate calculation unit 122 and an optimum value determination unit 123 .
  • the behavior of the optimization unit 120 will be described with FIG. 8 .
  • FIG. 9 is a diagram showing an example of the functional configuration of a computer that realizes the apparatuses described above.
  • the processing in the apparatuses described above can be executed when a recording unit 2020 reads programs for causing a computer to function as the apparatuses described above, and a control unit 2010, an input unit 2030, an output unit 2040 and the like are operated accordingly.
  • the apparatus in the present invention includes an input unit that can be connected with a keyboard and the like, an output unit that can be connected with a liquid crystal display and the like, a communication unit that can be connected with a communication device (for example, a communication cable) capable of communicating with the exterior of the hardware entity, a CPU (Central Processing Unit; a cache memory, a register and the like may be included), a RAM and a ROM that are memories, an external storage device that is a hard disk, and a bus that connects the input unit, the output unit, the communication unit, the CPU, the RAM, the ROM and the external storage device such that data can be exchanged.
  • the hardware entity may be provided with a device (drive) that can perform reading and writing for a record medium such as a CD-ROM.
  • examples of a device including these hardware resources include a general-purpose computer and the like.
  • in the external storage device of the hardware entity, programs necessary for realizing the above functions, data necessary in the processing of the programs, and the like are stored (for example, the program may be stored in a ROM that is a read-only storage without being limited to the external storage device). Further, data and others obtained by the processing of the programs are appropriately stored in the RAM, the external storage device or the like.
  • the programs stored in the external storage device (or the ROM or the like) and the data necessary for the processing of the programs are read in the memory as necessary, and are appropriately interpreted, executed or processed by the CPU.
  • the CPU realizes predetermined functions (the above component units expressed as the . . . unit, the . . . means and the like).
  • when the processing functions in the hardware entity (the apparatus in the present invention) described in the above embodiments are realized by a computer as described above, the processing contents of the functions to be included in the hardware entity are described by programs. Then, the programs are executed by the computer, and thereby, the processing functions in the above hardware entity are realized on the computer.
  • the programs describing the processing contents can be recorded in a computer-readable record medium.
  • as the computer-readable record medium, for example, a magnetic record device, an optical disk, a magneto-optical record medium, a semiconductor memory and others may be used.
  • a hard disk device, a flexible disk, a magnetic tape or the like can be used as the magnetic record device
  • a CD-ROM (Compact Disc Read Only Memory)
  • a CD-R (Recordable)/RW (ReWritable) or the like can be used as the optical disk
  • an MO (Magneto-Optical disc)
  • an EEP-ROM (Electrically Erasable and Programmable-Read Only Memory)
  • the distribution of the programs is performed by sale, transfer, lending or the like of a portable record medium such as a DVD or CD-ROM in which the programs are recorded.
  • the programs may be distributed by storing the programs in a storage device of a server computer and transmitting the programs from the server computer to another computer through a network.
  • the computer that executes the programs first stores, in its own storage device, the programs recorded in the portable record medium or the programs transmitted from the server computer. Then, at the time of the execution of the processing, the computer reads a program stored in its own storage device, and executes a process in accordance with the read program. Further, as another form of the execution of the programs, the computer may read a program directly from the portable record medium, and may execute a process in accordance with the program.
  • the computer may execute a process in accordance with the received program.
  • the above-described processes may be executed by a so-called ASP (Application Service Provider) service in which the processing functions are realized by only execution instruction and result acquisition, without the transmission of the programs from the server computer to the computer.
  • the program in the form includes information that is supplied for the processing by an electronic computer and that is similar to the program (for example, data that is not a direct command to the computer but has a property of prescribing the processing by the computer).
  • the hardware entity is configured by executing predetermined programs on the computer, but at least some of the processing contents may be realized in hardware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Acoustics & Sound (AREA)
  • Mathematical Optimization (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Otolaryngology (AREA)
  • Algebra (AREA)
  • Operations Research (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)

Abstract

Provided is a filter coefficient optimization technology that makes it possible to design a stable, high-quality beamformer by considering the relationship of the filter coefficient between adjacent frequency bins. A filter coefficient optimization apparatus includes an optimization unit that calculates an optimum value of a filter coefficient $w = \{w_1, \ldots, w_F\}$ ($w_f$ is the filter coefficient of a frequency bin f) of a beamformer that emphasizes sound (target sound) from D sound sources, $a_{f,d}$ being an array manifold vector in the frequency bin f corresponding to a sound wave that comes from an angular direction $\theta_d$ in which a sound source d exists, the sound wave being a plane wave. The optimization unit calculates the optimum value based on an optimization problem of a cost function under a predetermined constraint condition, the cost function being defined using the sum of (i) the sum over frequency bins of a cost function $L_{\mathrm{MV}_f}(w_f)$ and (ii) a predetermined regularization term, the predetermined regularization term being defined using a difference in phase between adjacent frequency bins of the response $w_f^H a_{f,d}$ of the beamformer in the frequency bin f for the angular direction $\theta_d$.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a U.S. National Stage Application filed under 35 U.S.C. § 371 claiming priority to International Patent Application No. PCT/JP2020/008233, filed on 28 Feb. 2020, the disclosure of which is hereby incorporated herein by reference in its entirety.
TECHNICAL FIELD
The present invention relates to a technology for optimizing a filter coefficient in target sound emphasis.
BACKGROUND ART
Beamforming using a microphone array is a well-known signal processing technique for emphasizing only the sound that comes from a particular angular direction (hereinafter referred to as target sound) and suppressing the sound that comes from other angular directions (hereinafter referred to as non-target sound). This technique has been put to practical use in teleconferencing systems, in-vehicle communication systems, smart speakers, and the like.
In many conventional beamforming techniques, an optimum filter is derived by solving an optimization problem of a cost function under some sort of constraint. For example, the MVDR (Minimum Variance Distortionless Response) beamformer described in Non Patent Literature 1 is obtained by using the power of the output signal as the cost function and minimizing it under a distortionless constraint condition for the angular direction of the target sound source.
Further, beamformer design techniques based on a minimum variance method, like the technique described in Non Patent Literature 1, have already been proposed for situations where the sound sources to be emphasized lie in a plurality of angular directions. These techniques suppress the non-target sound while imposing constraints on the responses for the plurality of sound source directions. One of them is the LCMV (Linearly Constrained Minimum Variance) beamformer (see Non Patent Literature 2). The LCMV beamformer emphasizes the target sound by imposing equality constraints on the responses of the beamformer for a plurality of angular directions, and suppresses the non-target sound by minimizing the variance of the output signal. A design technique for the LCMV beamformer is described below in detail.
First, various definitions and notations are introduced. Hereinafter, signals are handled as values in the time-frequency domain after a short-time Fourier transform.
The index of a time frame is expressed as t=1, . . . , T, and the index of a frequency bin is expressed as f=1, . . . , F. Further, the complex conjugate transposes of a vector v and a matrix M are denoted by a superscript H, as in $v^H$ and $M^H$.
In the design of the LCMV beamformer, a linear filter (beamformer) is configured that eliminates the non-target sound, as unnecessary sound, from the observation signal of a microphone array constituted by M microphone elements and that emphasizes the target sound, which is the sound from a plurality of preset angular directions. The M-channel observation signal of the microphone array in a time frame t and a frequency bin f is denoted by $x_{f,t} \in \mathbb{C}^M$ (f=1, . . . , F, t=1, . . . , T). It is assumed that D sound sources exist far from the microphone array, so that the sound from each source can be regarded as a plane wave when it reaches the array. Further, it is assumed that all sound sources and all microphone elements lie in the same plane. The signal that is emitted from a sound source d (d=1, . . . , D) and that reaches the microphone array in the time frame t and the frequency bin f is denoted by $s_{d,f,t} \in \mathbb{C}$ (d=1, . . . , D, f=1, . . . , F, t=1, . . . , T). It is assumed that the sound of the sound source d comes from an angular direction $\theta_d$, and that the angular direction $\theta_d$ is known.
When the array manifold vector in the frequency bin f from the sound source d to the M microphone elements of the microphone array (hereinafter referred to as the array manifold vector in the frequency bin f corresponding to a sound wave as a plane wave that comes from the angular direction $\theta_d$) is denoted by $a_{f,d} \in \mathbb{C}^M$ (f=1, . . . , F, d=1, . . . , D), the observation signal $x_{f,t}$ is expressed by the following expression.
[Math. 1]
$$x_{f,t} = \sum_{d=1}^{D} s_{d,f,t}\, a_{f,d} + n_{f,t} \tag{1}$$
Here, $n_{f,t}$ (f=1, . . . , F, t=1, . . . , T) expresses a noise component that includes noise added in the course of the observation as well as echoes and non-directional noise. The array manifold vector $a_{f,d}$ is a quantity that is automatically determined for each frequency bin f from the physical characteristics of the microphone array and the whole system.
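As an illustration of Expression (1), the following sketch synthesizes the observation signal for a single frequency bin. It is not taken from the patent: the free-field plane-wave form of the array manifold vector, its sign convention, the speed of sound, and the names `steering_vector` and `observe` are all assumptions made for this example.

```python
import numpy as np

def steering_vector(mic_xy, theta, freq_hz, speed_of_sound=343.0):
    """Hypothetical free-field array manifold vector for a plane wave.

    mic_xy: (M, 2) microphone positions in metres (all in one plane).
    theta:  arrival angle in radians, measured in that plane.
    The phase sign convention is an assumption; real arrays may differ.
    """
    direction = np.array([np.cos(theta), np.sin(theta)])
    delays = mic_xy @ direction / speed_of_sound  # seconds, relative to the origin
    return np.exp(2j * np.pi * freq_hz * delays)

def observe(sources, manifold, noise):
    """Expression (1) for one bin f: x_t = sum_d s_{d,t} a_d + n_t.

    sources: (D, T), manifold: (M, D), noise: (M, T); returns (M, T).
    """
    return manifold @ sources + noise
```

Each column of `manifold` plays the role of one vector $a_{f,d}$, so the matrix product sums the D source contributions at every microphone.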
Hereinafter, a linear filter in the frequency bin f is expressed as $w_f \in \mathbb{C}^M$ (f=1, . . . , F), and this is referred to as a filter coefficient of the beamformer. The filter coefficient determines the behavior of the beamformer.
An output signal $y_{f,t}$ (f=1, . . . , F, t=1, . . . , T) of the beamformer is expressed by the following expression.
[Math. 2]
$$y_{f,t} = w_f^H x_{f,t} \tag{2}$$
That is, designing the beamformer amounts to designing the filter coefficient $w_f$ (f=1, . . . , F) that appears in Expression (2).
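A minimal sketch of Expression (2) for one frequency bin follows. The function name `beamformer_output` and the array layout (microphones along the first axis, time frames along the second) are assumptions for illustration only.

```python
import numpy as np

def beamformer_output(w, X):
    """Expression (2) for one frequency bin: y_t = w^H x_t.

    w: (M,) complex filter coefficient.
    X: (M, T) complex observations, one column per time frame.
    Returns the (T,) complex output signal.
    """
    # Conjugating w and contracting over the microphone axis gives w^H x_t per frame.
    return w.conj() @ X
```

For example, with a unit-modulus vector `a` of length M and `w = a / M`, the product `w.conj() @ a` equals 1, so a signal arriving along `a` passes through unchanged.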
The inner product $w_f^H a_{f,d}$ of the filter coefficient $w_f$ and the array manifold vector $a_{f,d}$ represents the response characteristic of the beamformer in the frequency bin f for the angular direction $\theta_d$. Accordingly, when the sound that comes from the sound source in the angular direction $\theta_d$ (that is, from the sound source d) should be reliably collected at a constant gain, the following constraint condition (referred to as a distortionless constraint condition) is often imposed on the filter coefficient $w_f$.
[Math. 3]
$$w_f^H a_{f,d} = 1 \quad (f = 1, \ldots, F) \tag{3}$$
It is possible to achieve the emphasis of the sound that comes from the sound source d, by setting the filter coefficient wf such that the distortionless constraint condition is met and gains for signals from unnecessary sound sources are reduced as much as possible.
In the case where it is desirable to concurrently emphasize the sound that comes from a plurality of sound sources, it is only necessary to concurrently impose a plurality of distortionless constraint conditions.
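To make the effect of imposing several distortionless constraints at once concrete, the sketch below builds one filter that satisfies $w_f^H a_{f,d} = 1$ for every direction d. It uses the minimum-norm solution, which is a swapped-in illustration and not the design method described here; the names `min_norm_distortionless` and `responses` are hypothetical, and the columns of `A` are assumed to be linearly independent.

```python
import numpy as np

def min_norm_distortionless(A):
    """Smallest-norm w with w^H a_d = 1 for every column a_d of A (shape (M, D)).

    Since conj(1) = 1, the D constraints are equivalent to A^H w = 1,
    whose minimum-norm solution is w = A (A^H A)^{-1} 1.
    """
    num_dirs = A.shape[1]
    ones = np.ones(num_dirs, dtype=complex)
    return A @ np.linalg.solve(A.conj().T @ A, ones)

def responses(w, A):
    """Beamformer responses w^H a_d for all directions d."""
    return w.conj() @ A
```

This filter ignores the non-target sound entirely; it only shows that D constraints can be met simultaneously when M is at least D.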
Since the beamformer is required to suppress the non-target sound, it is desirable to set the filter coefficient $w_f$ such that the non-target sound is minimized under the constraint of the target sound emphasis. To formulate this mathematically, a cost function expressing the variance of the non-target sound is defined. A desired beamformer can then be expected to be obtained by setting the filter coefficient such that the cost function is minimized.
When a spatial correlation matrix $R_f$ (f=1, . . . , F) of the non-target sound is defined as $R_f := E_t[x_{f,t} x_{f,t}^H]$, a cost function $L_{\mathrm{MV}_f}(w_f)$ expressing the variance of the non-target sound can be defined for each of the frequency bins f=1, . . . , F. Specifically, the cost function $L_{\mathrm{MV}_f}(w_f)$ is given by the following expression.
[Math. 4]
$$L_{MV_f}(w_f) = w_f^{H} R_f w_f \tag{4}$$
It is possible to design the beamformer by setting the filter coefficient wf (f=1, . . . , F) such that the sum of the cost function LMV_f(wf) is minimized under the constraint condition in Expression (3). When this is expressed as a mathematical expression, an optimization problem in the following expression is obtained.
[Math. 5]
$$\min_{w_1,\dots,w_F} \sum_{f} L_{MV_f}(w_f) \quad \text{s.t.} \quad w_f^{H} a_{f,d} = 1 \quad (f=1,\dots,F,\ d=1,\dots,D) \tag{5}$$
By solving the optimization problem in Expression (5), it is possible to obtain the optimum filter coefficient.
The optimization problem in Expression (5) can be divided into individual optimization problems for the respective frequency bins f=1, . . . , F. That is, for the frequency bin f, an optimization problem in the following expression may be solved instead of the optimization problem in Expression (5).
[Math. 6]
$$\min_{w_f} L_{MV_f}(w_f) \quad \text{s.t.} \quad w_f^{H} a_{f,d} = 1 \quad (d=1,\dots,D) \tag{6}$$
By solving the optimization problem in Expression (5) or Expression (6) described above, it is possible to design the LCMV beamformer. This is the conventional design technique for the LCMV beamformer.
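For reference, the per-bin problem in Expression (6) is an equality-constrained quadratic problem, and when Rf is positive definite and the D array manifold vectors are linearly independent, it has the well-known closed-form solution w = R⁻¹A(AᴴR⁻¹A)⁻¹1, where the columns of A are the vectors af,d. The following is a minimal sketch under those assumptions; the function name `lcmv_weights` and the random test data are hypothetical and are not taken from the specification.

```python
import numpy as np

def lcmv_weights(R, A):
    """Minimize w^H R w subject to w^H a_d = 1 for every column a_d of A.

    R: (M, M) Hermitian positive-definite spatial correlation matrix.
    A: (M, D) matrix whose columns are the array manifold vectors.
    """
    Rinv_A = np.linalg.solve(R, A)                # R^{-1} A
    gram = A.conj().T @ Rinv_A                    # A^H R^{-1} A
    ones = np.ones(A.shape[1], dtype=complex)     # all constraint values equal 1
    return Rinv_A @ np.linalg.solve(gram, ones)   # R^{-1} A (A^H R^{-1} A)^{-1} 1

# Hypothetical data: a random positive-definite R and two constraint directions.
rng = np.random.default_rng(0)
M, D = 4, 2
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = B @ B.conj().T + M * np.eye(M)
A = rng.standard_normal((M, D)) + 1j * rng.standard_normal((M, D))
w = lcmv_weights(R, A)
response = w.conj() @ A                           # w^H a_d for d = 1, ..., D
```

Every entry of `response` should equal 1 up to rounding, which is the distortionless constraint in Expression (3).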
CITATION LIST Non-Patent Literature
  • Non-Patent Literature 1: J. Capon, “High-resolution frequency-wavenumber spectrum analysis”, Proceedings of the IEEE, vol. 57, no. 8, pp. 1408-1418, August 1969.
  • Non-Patent Literature 2: Futoshi Asano, “Acoustic Technology Series 16, Array signal processing for acoustics: localization, tracking and separation of sound sources, edited by The Acoustical Society of Japan”, Corona Publishing Co., Ltd., pp. 86-90, 2011.
SUMMARY OF THE INVENTION Technical Problem
In the conventional design technique for the LCMV beamformer, it is necessary to solve the optimization problem in Expression (5).
However, in the optimization problem in Expression (5), the relationship of the filter coefficient between adjacent frequency bins is not considered, and specifically, the reduction in the phase difference between adjacent frequency bins is not considered, so that it is not possible to design a stable beamformer having a good quality.
Hence, the present invention has an object to provide a filter coefficient optimization technology that makes it possible to design a stable beamformer having a good quality by considering the relationship of the filter coefficient between adjacent frequency bins.
Means for Solving the Problem
An aspect of the present invention is a filter coefficient optimization apparatus including an optimization unit that calculates an optimum value w* of a filter coefficient w={w1, . . . , wF} (wf (f=1, . . . , F, F is an integer equal to or more than 1) is a filter coefficient of a frequency bin f) of a beamformer that emphasizes sound (hereinafter referred to as target sound) from D sound sources (hereinafter referred to as a sound source 1, . . . , a sound source D), D being an integer equal to or more than 1, Rf (f=1, . . . , F) being a spatial correlation matrix for sound other than the target sound relevant to the frequency bin f, LMV_f(wf)=wf HRfwf (f=1, . . . , F) being a cost function relevant to a filter coefficient wf, θd (d=1, . . . , D) being an angular direction in which a sound source d exists, af,d (f=1, . . . , F, d=1, . . . , D) being an array manifold vector in the frequency bin f corresponding to a sound wave that comes from the angular direction θd, the sound wave being a plane wave, L(w) being a cost function relevant to the filter coefficient w and being defined using a sum of a sum Σf=1 FLMV_f(wf) of the cost function LMV_f(wf) and a predetermined regularization term, the optimization unit calculating the optimum value w* based on an optimization problem minwL(w) relevant to the filter coefficient w, under a predetermined constraint condition, the predetermined regularization term being defined using a difference in phase between adjacent frequency bins relevant to a response wf Haf,d (f=1, . . . , F, d=1, . . . , D) of the beamformer in the frequency bin f for the angular direction θd.
An aspect of the present invention is a filter coefficient optimization apparatus including an optimization unit that calculates an optimum value w* of a filter coefficient w={w1, . . . , wF} (wf (f=1, . . . , F, F is an integer equal to or more than 1) is a filter coefficient of a frequency bin f) of a beamformer that emphasizes sound (hereinafter referred to as target sound) from D sound sources (hereinafter referred to as a sound source 1, . . . , a sound source D), D being an integer equal to or more than 1, θd (d=1, . . . , D) being an angular direction in which a sound source d exists, af,d (f=1, . . . , F, d=1, . . . , D) being an array manifold vector in the frequency bin f corresponding to a sound wave that comes from the angular direction θd, the sound wave being a plane wave, the optimization unit calculating the optimum value w* by performing derivation so as to reduce a difference in phase between adjacent frequency bins relevant to a response wf Haf,d (f=1, . . . , F, d=1, . . . , D) of the beamformer in the frequency bin f for the angular direction θd.
Effects of the Invention
According to the present invention, it is possible to design a stable beamformer having a good quality, by optimizing the filter coefficient in consideration of the relationship of the filter coefficient between adjacent frequency bins.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a diagram showing a filter coefficient optimization algorithm.
FIG. 2 is a diagram showing a latent variable optimization algorithm.
FIG. 3 is a diagram showing a filter coefficient optimization algorithm.
FIG. 4 is a diagram showing a filter coefficient optimization algorithm.
FIG. 5 is a block diagram showing the configuration of a filter coefficient optimization apparatus 100.
FIG. 6 is a flowchart showing the behavior of the filter coefficient optimization apparatus 100.
FIG. 7 is a block diagram showing the configuration of an optimization unit 120.
FIG. 8 is a flowchart showing the behavior of the optimization unit 120.
FIG. 9 is a diagram showing an example of the functional configuration of a computer that realizes apparatuses in embodiments of the present invention.
DESCRIPTION OF EMBODIMENTS
Embodiments of the present invention will be described below in detail. Component units having identical functions are denoted by identical numerals, and repetitive descriptions are omitted.
Before the description of the embodiments, the notation method in the specification will be described.
“_” (underscore) indicates an inferior subscript. For example, “x^{y_z}” shows that “y_z” is a superscript for “x”, and “x_{y_z}” shows that “y_z” is an inferior subscript for “x”.
Further, for a certain character “x”, the marks “^” and “˜” in “^x” and “˜x” should originally be placed just above “x”, but they are written as “^x” and “˜x” because of the notational constraints of the specification.
<Technical Background>
A cost term (hereinafter referred to as a regularization term) in which the relationship of the filter coefficient between adjacent frequency bins is considered can be used for designing a stable beamformer having a good quality. In this technique, a new cost function is introduced by adding the regularization term to the cost function ΣfLMV_f(wf) described in Background Art, and the filter coefficient is determined by solving an optimization problem of the new cost function. When a first order difference and a second order difference in a frequency direction of a phase component related to the filter coefficient are used as the relationship of the filter coefficient between adjacent frequency bins, it is expected that it is possible to design a filter having a stable delay characteristic by using a regularization term corresponding to the first order difference and the second order difference, because the first order difference and the second order difference correspond to a phase delay and a group delay respectively.
The reason why the simple difference in the filter coefficient is not used as the relationship of the filter coefficient between adjacent frequency bins is that the calculation amount can become enormous. For avoiding this problem, in the invention in the present patent application, attention is focused on the difference in phase component related to the filter coefficient, in consideration of the correspondence to the above phase delay and group delay.
<<Regularization Term by Frequency-Directional Smoothing>>
Here, a regularization term using the difference in the phase component related to the filter coefficient will be described as a regularization term by frequency-directional smoothing. The regularization term makes it possible to directly control the group delay and phase delay of the filter constituting the beamformer.
The response of the beamformer in the frequency bin f for the angular direction θd is expressed as a complex number wf Haf,d. An absolute value |wf Haf,d| of the response wf Haf,d of the beamformer is referred to as an amplitude, and a deflection angle ∠(wf Haf,d) is referred to as a phase. Two forms will be shown below as examples of the regularization term by the frequency-directional smoothing.
(Continuous Form)
As an example of the regularization term in the form, there is a regularization term that is defined by the first order difference in phase between adjacent frequency bins. This regularization term is given by the following expression.
[Math. 7]
$$\eta \sum_{f=1}^{F-1} \sum_{d=1}^{D} \left| \angle(w_f^{H} a_{f,d}) - \angle(w_{f+1}^{H} a_{f+1,d}) \right|_{2\pi} \tag{7}$$
Further, as another example, there is a regularization term that is defined by the second order difference in phase between adjacent frequency bins. This regularization term is given by the following expression.
[Math. 8]
$$\eta \sum_{f=1}^{F-2} \sum_{d=1}^{D} \left| \angle(w_f^{H} a_{f,d}) - 2\angle(w_{f+1}^{H} a_{f+1,d}) + \angle(w_{f+2}^{H} a_{f+2,d}) \right|_{2\pi} \tag{8}$$
In Expression (7) and Expression (8), η (η is a predetermined positive number) represents a weight parameter. Further, |●|2π in Expression (7) and Expression (8) is a norm that is defined by the following expression.
[Math. 9]
$$|x|_{2\pi} = \min_{n=\dots,-1,0,1,\dots} |x - 2n\pi| \tag{9}$$
That is, |x|2π is a special norm in which the periodicity of a variable x is considered.
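The periodic norm in Expression (9) and the regularization terms in Expressions (7) and (8) can be sketched as follows. The function names, the array layout (`W` of shape (F, M), `A` of shape (F, M, D)), and the use of `numpy.diff` for the first and second order differences are assumptions made for illustration only.

```python
import numpy as np

def wrapped_abs(x):
    """|x|_{2*pi}: distance from x to the nearest integer multiple of 2*pi."""
    return np.abs(x - 2.0 * np.pi * np.round(x / (2.0 * np.pi)))

def phase_smoothness(W, A, eta, order=1):
    """Regularization term of Expression (7) (order=1) or Expression (8) (order=2).

    W: (F, M) filter coefficients; A: (F, M, D) array manifold vectors.
    """
    response = np.einsum("fm,fmd->fd", np.conj(W), A)   # w_f^H a_{f,d}
    phase = np.angle(response)                          # phase of each response
    diff = np.diff(phase, n=order, axis=0)              # difference along the frequency axis
    return eta * wrapped_abs(diff).sum()
```

Because of the wrapping, two phases such as 3.0 rad and -3.0 rad are treated as close to each other (about 0.28 rad apart) rather than 6.0 rad apart.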
(Discrete Form)
For defining the regularization term in the form, a variable cf,d (f=1, . . . , F, d=1, . . . , D) that depends on the phase of the response wf Haf,d of the beamformer and that has discrete values is introduced for a filter coefficient wf and an array manifold vector af,d. Specifically, a complex plane is divided into C sectors that are around the origin and that have an equal central angle, consecutive numbers 1, . . . , C are assigned in a counterclockwise manner, and cf,d is the number of a sector where the complex number wf Haf,d is positioned. Accordingly, the discrete variable cf,d has one value of 1, . . . , C. Further, the following expression is satisfied among the filter coefficient wf, the array manifold vector af,d and the discrete variable cf,d.
[Math. 10]
$$\angle(w_f^{H} a_{f,d}) \in \left[ 2\pi(c_{f,d}-1)/C,\ 2\pi c_{f,d}/C \right] \tag{10}$$
The regularization term in the following expression is defined using the discrete variable cf,d that meets Expression (10).
[Math. 11]
$$\eta \sum_{f=1}^{F-1} \sum_{d=1}^{D} \left| c_{f,d} - c_{f+1,d} \right|_{C} \tag{11}$$
In Expression (11), η (η is a predetermined positive number) represents a weight parameter. Further, |●|c in Expression (11) is a norm that is defined by the following expression.
[Math. 12]
$$|x|_{C} = \min_{n=\dots,-1,0,1,\dots} |x - nC| \tag{12}$$
That is, |x|c is a special norm in which the periodicity of the variable x is considered.
In the case of cf=(cf,1, . . . , cf,D), the regularization term (hereinafter referred to as ^Lη(c1,1, . . . , cF,D)) in Expression (11) is expressed as follows.
[Math. 13]
$$\hat{L}_{\eta}(c_{1,1},\dots,c_{F,D}) = \sum_{f=1}^{F-1} \hat{L}_{\eta_f}(c_f, c_{f+1}) \tag{13}$$
$$\hat{L}_{\eta_f}(c_f, c_{f+1}) = \eta \sum_{d=1}^{D} \left| c_{f,d} - c_{f+1,d} \right|_{C} \tag{14}$$
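The discrete form can be sketched in the same way. The following assumes that the sector number is obtained by mapping the phase into [0, 2π) and that a phase lying exactly on a sector border is assigned to the higher-numbered sector (Expression (10) uses closed intervals, so either choice satisfies it). All function names are hypothetical.

```python
import numpy as np

def sector_index(response, C):
    """Sector number c in 1..C such that angle(response) lies in [2*pi*(c-1)/C, 2*pi*c/C]."""
    phase = np.mod(np.angle(response), 2.0 * np.pi)          # map the phase to [0, 2*pi)
    c = np.floor(phase * C / (2.0 * np.pi)).astype(int) + 1
    return np.minimum(c, C)                                  # guard against rounding up to C + 1

def cyclic_abs(x, C):
    """|x|_C of Expression (12): distance from the integer x to the nearest multiple of C."""
    r = np.mod(x, C)
    return np.minimum(r, C - r)

def discrete_smoothness(c, eta, C):
    """Regularization term of Expression (11) for an integer array c of shape (F, D)."""
    return eta * cyclic_abs(c[:-1] - c[1:], C).sum()
```

With C=4, for example, sectors 1 and 4 are adjacent, so |1−4|C evaluates to 1 rather than 3.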
An example of the introduction of the regularization term ^Lη(c1,1, . . . , cF,D) in the design of the LCMV beamformer will be described below. Assuming that a particularly important target sound exists in the first angular direction (that is, an angular direction θ1) of D angular directions θ1, . . . , θD, a distortionless constraint condition wf Haf,1=1 (f=1, . . . , F) is imposed for the angular direction θ1. In this case, an optimization problem to be solved is shown as follows.
[Math. 14]
$$\min_{\{c_f, w_f\}_{f=1}^{F}} \left( \sum_{f=1}^{F} L_{MV_f}(w_f) + \sum_{f=1}^{F-1} \hat{L}_{\eta_f}(c_f, c_{f+1}) \right) \quad \text{s.t.} \quad \angle(w_f^{H} a_{f,d}) \in \left[ 2\pi(c_{f,d}-1)/C,\ 2\pi c_{f,d}/C \right] \ (f=1,\dots,F,\ d=1,\dots,D), \quad w_f^{H} a_{f,1} = 1 \ (f=1,\dots,F) \tag{15}$$
This optimization problem can be solved by evaluating optimum values of the filter coefficient wf that minimize a cost function Σf=1 F LMV_f(wf) + Σf=1 F-1 ^Lηf(cf, cf+1) for all values that the discrete variable cf (f=1, . . . , F) can have, and thereafter, among them, adopting the optimum value that minimizes the value of the cost function Σf=1 F LMV_f(wf) + Σf=1 F-1 ^Lηf(cf, cf+1), but in fact, there is a more efficient algorithm. The algorithm is shown in FIG. 1 .
When the value of the discrete variable cf is set, the optimum value of the filter coefficient wf is determined depending on only the value of the discrete variable cf, regardless of the values of the other frequency bins. Therefore, by calculating in advance the filter coefficient wf for all C^D values that the discrete variable cf can have for each frequency bin f, the optimization problem reduces to a shortest path problem relevant to the discrete variable cf. Accordingly, the optimization problem can be solved at high speed by applying a Dijkstra method. This is used in the algorithm in FIG. 1 .
In the optimization problem in Expression (15), the distortionless constraint condition for one angular direction is used, but a distortionless constraint condition for a plurality of angular directions may be used. However, when the distortionless constraint condition for a plurality of angular directions is used, the constraint sometimes becomes excessively strict, so that no solution can be obtained. The relaxation of the distortionless constraint condition is possible, but in this case, a non-convex optimization problem is sometimes obtained. Generally, it is difficult to solve the non-convex optimization problem. Hence, a technique for optimizing the filter coefficient by solving a convex optimization problem equivalent to the non-convex optimization problem instead of solving the non-convex optimization problem will be described below.
First, a method for transforming the non-convex optimization problem into the convex optimization problem equivalent to the non-convex optimization problem and a method for solving the convex optimization problem obtained by the transformation will be described. Next, two examples will be described as examples of the use of the method for the non-convex optimization problem obtained by the relaxation of the constraint condition.
<<Transformation into Convex Optimization Problem Equivalent to Non-Convex Optimization Problem and Solution Method>>
Here, a method for transforming the non-convex optimization problem into the convex optimization problem equivalent to the non-convex optimization problem and a method for solving the convex optimization problem obtained by the transformation will be described. An optimization problem relevant to a latent variable ˜w that is defined by the following expression will be discussed below.
[Math. 15]
$$\min_{\tilde{w}} \left( L_{convex}(\tilde{w}) + \sum_{d=1}^{D} L_d(\tilde{w}) \right) \tag{16}$$
Here, Lconvex is a strongly convex function relevant to the latent variable ˜w, and Ld (d=1, . . . , D, D is an integer equal to or more than 1) is a function relevant to the latent variable ˜w. That is, Ld (d=1, . . . , D) does not always need to be a convex function.
Generally, the optimization problem in Expression (16) is an optimization problem in which the cost function is a non-convex function, that is, a non-convex optimization problem. The non-convex optimization problem is a difficult problem as described above, and therefore, is intended to result in a convex optimization problem to be solved more easily, by introducing a certain kind of approximation. Hence, the function Ld(˜w) (d=1, . . . , D) is intended to be approximated by a piecewise convex function constituted by a plurality of convex functions.
The definition of the piecewise convex function will be described below. For the function Ld(˜w) (d=1, . . . , D) to be approximated, the domain is divided into regions Sd,1, . . . , Sd,C that are C closed convex sets. Then, a function Λd,c (c=1, . . . , C) that is defined for each of the regions Sd,1, . . . , Sd,C is introduced. The newly introduced function Λd,c is a convex function on the region Sd,c, and is a function for approximating the function Ld on the region Sd,c. In the case where the function Ld is a convex function on the region Sd,c, Λd,c=Ld may be adopted on the region Sd,c. Thereby, the function Ld(˜w) can be approximately expressed by the piecewise convex function Λd,c (c=1, . . . , C). Generally, as the value (that is, the number into which the domain of the function Ld is divided) of C is larger, the approximation can be performed by a more accurate piecewise convex function.
However, when the approximation is used, a discrete variable representing a region to which the optimum value as the solution of the optimization problem belongs is newly added as an optimized object, in addition to the latent variable that is an optimized object in the optimization problem in Expression (16), so that the number of variables to be optimized increases. However, when the discrete variable is fixed, for the latent variable, the optimization problem results in the convex optimization (instead of the non-convex optimization), and therefore can be solved relatively easily. This will be specifically described below. The optimization problem that is formulated using the approximation is expressed by the following expression, with cd (d=1, . . . , D) as a discrete variable that has a value of 1, . . . , C.
[Math. 16]
$$\min_{\tilde{w}} \left( L_{convex}(\tilde{w}) + \sum_{d=1}^{D} \min_{c_d} \Lambda_{d,c_d}(\tilde{w}) \right) \tag{17}$$
Expression (17) is equivalent to the following expression.
[Math. 17]
$$\min_{c_1,\dots,c_D} \left( \min_{\tilde{w}} \left( L_{convex}(\tilde{w}) + \sum_{d=1}^{D} \Lambda_{d,c_d}(\tilde{w}) \right) \right) \tag{18}$$
In Expression (18), min˜w(Lconvex(˜w)+Σd=1 D Λd,c_d(˜w)) is a convex optimization problem relevant to the latent variable ˜w, and can be solved relatively easily. The procedure will be described below. First, the convex optimization problem min˜w(Lconvex(˜w)+Σd=1 D Λd,c_d(˜w)) is solved for each of the C^D values that the discrete variable (c1, . . . , cD) can have. Then, among the obtained solutions of the convex optimization problem, the solution that minimizes the value of the cost function Lconvex(˜w)+Σd=1 D Λd,c_d(˜w) is adopted as the optimum value. Thereby, the optimization problem in Expression (18) can be solved. The procedure of the solution method is illustrated in FIG. 2 .
The non-convex optimization problem in Expression (16) can be transformed into the convex optimization problem in Expression (18) that is equivalent to the non-convex optimization problem in Expression (16), and the convex optimization problem in Expression (18) can be solved by the latent variable optimization algorithm in FIG. 2 .
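The enumeration described for Expression (18) can be illustrated with a deliberately small toy problem, which is not the beamformer problem itself. The assumptions here are: the latent variable is a real vector with D entries, Lconvex(˜w) is the squared distance to a given point `t`, and each non-convex term is the constraint |˜wd|≥1, split into C=2 convex regions (˜wd≥1 for cd=1, ˜wd≤−1 for cd=2). The function names are hypothetical.

```python
import itertools
import numpy as np

def solve_convex_piece(t, c):
    """Minimize ||w - t||^2 with w_d >= 1 where c_d == 1 and w_d <= -1 where c_d == 2."""
    c = np.asarray(c)
    # With the regions fixed, the problem is convex and is solved by clamping each entry.
    w = np.where(c == 1, np.maximum(t, 1.0), np.minimum(t, -1.0))
    return w, float(np.sum((w - t) ** 2))

def enumerate_pieces(t):
    """Solve the convex problem for all 2**D values of (c_1, ..., c_D) and keep the best."""
    best = None
    for c in itertools.product((1, 2), repeat=len(t)):
        w, cost = solve_convex_piece(t, c)
        if best is None or cost < best[2]:
            best = (c, w, cost)
    return best

c_opt, w_opt, cost_opt = enumerate_pieces(np.array([0.3, -2.0]))
```

The loop visits C^D combinations, which is why the calculation amount grows quickly with C and D.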
Application Example
Here, an example in which the above-described versatile scheme of evaluating the optimum value after transforming the non-convex optimization problem into the convex optimization problem is applied to the non-convex optimization problem obtained by relaxing the constraint condition in Expression (3) will be described.
As described above, in the related art in Non Patent Literature 1, Expression (3) that is an equality constraint is imposed for many objects, and therefore, there is a fear that an appropriate filter coefficient cannot be obtained. Hence, it is intended to use a softer constraint condition that is suitable for a real situation. Specifically, it is intended to use a constraint condition (that is, a constraint condition in which there is no constraint relevant to the phase) in which a constraint is imposed for only the amplitude of the response of the beamformer, instead of the constraint condition in Expression (3). For example, the following expression can be used.
[Math. 18]
$$|w_f^{H} a_{f,d}| = 1 \tag{19}$$
Further, as another example, the following expression can be used.
[Math. 19]
$$|w_f^{H} a_{f,d}| \ge 1 \tag{20}$$
The constraint condition in Expression (19) and the constraint condition in Expression (20) express the constraint that the amplitude of the response of the beamformer is a constant value (specifically, 1) and the constraint that the amplitude of the response of the beamformer only needs to be equal to or more than a constant value (specifically, 1), respectively. Each of the constraint condition in Expression (19) and the constraint condition in Expression (20) is mathematically classified into a non-convex constraint.
An optimization problem in which the constraint condition is Expression (20) will be discussed below. The constraint condition in Expression (20) shows that the absolute value of the complex number wf Haf,d is equal to or more than 1. This means that the complex number wf Haf,d needs to be geometrically positioned on a unit circle or outside the unit circle in the complex plane. Hence, first, the complex plane is equally divided into C sectors that are around the origin. The C sectors correspond to the C regions described above. Then, on the border or inside of each sector, Expression (20) that is the original constraint is approximated by C convex functions.
This will be specifically described below. The discrete variable cf,d is adopted as a variable that has a value of 1, . . . , C, for the frequency bin f (f=1, . . . , F) and the sound source d (d=1, . . . , D). Further, γf,d=wf Haf,d is satisfied. A convex function Λ(f,d),c_f,d(γf,d) (cf,d=1, . . . , C) that is defined for the frequency bin f (f=1, . . . , F) and the sound source d (d=1, . . . , D) is defined such that the values of the complex number γf,d are restricted inside the sectors around the origin at a central angle 2π/C on the complex plane and in a range in which |γf,d|≥1 is met. Then, Expression (20) is approximated by a piecewise convex function using the C convex functions Λ(f,d),c_f,d(γf,d) (cf,d=1, . . . , C).
For example, the function Λ(f,d),c_f,d may be a function expressed by the following expression.
[Math. 20]
$$\Lambda_{(f,d),c_{f,d}}(\gamma_{f,d}) := \begin{cases} 0 & \left( R\left(\gamma_{f,d}\, e^{-2\pi j (c_{f,d}+1)/2C}\right) \ge 1,\ \dfrac{2\pi c_{f,d}}{C} \le \angle\gamma_{f,d} \le \dfrac{2\pi (c_{f,d}+1)}{C} \right) \\ \infty & (\text{otherwise}) \end{cases} \tag{21}$$
Here, R(z) represents the real part of a complex number z.
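As a brief supporting derivation, a condition on the real part of a rotated response is sufficient for the amplitude constraint in Expression (20), whatever the rotation angle. The symbol φ below is introduced only for this note and stands for any real angle.

```latex
% For any real angle \varphi, rotation does not change the modulus,
% and the modulus of a complex number is at least its real part:
|\gamma_{f,d}| = \left|\gamma_{f,d}\, e^{-j\varphi}\right|
  \ge R\!\left(\gamma_{f,d}\, e^{-j\varphi}\right) \ge 1 .
% The set \{\gamma : R(\gamma e^{-j\varphi}) \ge 1\} is a half-plane, and a sector
% with central angle 2\pi/C is convex when C \ge 2, so their intersection is convex.
```

Hence every γf,d for which the function takes the value 0 also satisfies |γf,d|≥1, and the set of such values is convex when C≥2.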
When the value of C is large, the approximation can be performed more accurately, but in the case of solving the optimization problem using the algorithm in FIG. 2 , it is necessary to examine all combinations of the discrete variables, so that the calculation amount increases.
Thus, the filter coefficient optimization problem in which the constraint condition is Expression (20) results in a convex optimization problem in the following expression.
[Math. 21]
$$\min_{\{c_f, w_f\}_{f=1}^{F}} \left( \sum_{f=1}^{F} L_{MV_f}(w_f) + \sum_{f=1}^{F} \sum_{d=1}^{D} \Lambda_{(f,d),c_{f,d}}(w_f^{H} a_{f,d}) \right) \tag{22}$$
Here, cf=(cf,1, . . . , cf,D) is satisfied.
This optimization problem can be solved by applying the latent variable optimization algorithm in FIG. 2 . An algorithm for solving the optimization problem is shown in FIG. 3 . That is, FIG. 3 shows a filter coefficient optimization algorithm that is obtained based on the latent variable optimization algorithm in FIG. 2 .
Application Example 2
The optimization problem of the filter coefficient w that is defined using the cost function Σf=1 FLMV_f(wf)+Σf=1 F-1{circumflex over ( )}Lηf(cf, cf+1) under the constraint condition |wf Haf,d|≥1 (f=1, . . . , F, d=1, . . . , D) will be discussed. This problem is a non-convex optimization problem that is obtained by using the constraint condition |wf Haf,d|≥1 (f=1, . . . , F, d=1, . . . , D) instead of the constraint condition wf Haf,d=1 (f=1, . . . , F, d=1, . . . , D).
Note that the discrete variable cf,d defined in <<Regularization Term by Frequency-Directional Smoothing>> and the discrete variable cf,d defined in <<Application Example>> are the same as each other. Thereby, the above non-convex optimization problem results in the following convex optimization problem.
[Math. 22]
$$\min_{\{c_f, w_f\}_{f=1}^{F}} \left( \sum_{f=1}^{F} L_{MV_f}(w_f) + \sum_{f=1}^{F} \sum_{d=1}^{D} \Lambda_{(f,d),c_{f,d}}(w_f^{H} a_{f,d}) + \sum_{f=1}^{F-1} \hat{L}_{\eta_f}(c_f, c_{f+1}) \right) \tag{23}$$
Similarly to the optimization problem in Expression (15), the optimization problem in Expression (23) can be solved at high speed by applying the Dijkstra method. The algorithm is shown in FIG. 4 .
First Embodiment
From a signal (observation signal) resulting from observing sound (hereinafter referred to as target sound) from D (D is an integer equal to or more than 1) sound sources (hereinafter referred to as a sound source 1, . . . , a sound source D), a filter coefficient optimization apparatus 100 calculates the optimum value w* of the filter coefficient w={w1, . . . , wF} (wf (f=1, . . . , F, F is an integer equal to or more than 1) is the filter coefficient of the frequency bin f) of the beamformer that emphasizes the target sound, using a microphone array constituted by M (M is an integer equal to or more than 1) microphone elements. The observation signal is input data that is used for the optimization of the filter coefficient, and therefore, the observation signal is referred to as optimization data, hereinafter.
The filter coefficient optimization apparatus 100 will be described below with reference to FIG. 5 and FIG. 6 . FIG. 5 is a block diagram showing the configuration of the filter coefficient optimization apparatus 100. FIG. 6 is a flowchart showing the behavior of the filter coefficient optimization apparatus 100. As shown in FIG. 5 , the filter coefficient optimization apparatus 100 includes a setup data calculation unit 110, an optimization unit 120, and a recording unit 190. The recording unit 190 is a component unit that appropriately records the information necessary for the processing in the filter coefficient optimization apparatus 100. For example, the recording unit 190 records the filter coefficient that is an optimized object.
The behavior of the filter coefficient optimization apparatus 100 will be described with FIG. 6 .
In S110, the setup data calculation unit 110 calculates setup data that is used at the time of the optimization of the filter coefficient w, using the optimization data. In the case of using the cost function for optimizing the filter coefficient w, examples of the setup data include a spatial correlation matrix Rf (f=1, . . . , F) for sound other than the target sound relevant to the frequency bin f and the array manifold vector af,d (f=1, . . . , F, d=1, . . . , D) in the frequency bin f corresponding to a sound wave as a plane wave that comes from the angular direction θd (d=1, . . . , D) in which the sound source d exists obtained based on the observation signal.
In S120, the optimization unit 120 calculates the optimum value w* of the filter coefficient w, using the setup data generated in S110. For example, the optimization unit 120 can calculate the optimum value w* based on the optimization problem minwL(w) relevant to the filter coefficient w under a predetermined constraint condition. Here, LMV_f(wf)=wf HRfwf (f=1, . . . , F) is a cost function relevant to the filter coefficient wf, and the function L(w) is a cost function relevant to the filter coefficient w and is defined using the sum of the sum Σf=1 FLMV_f(wf) of the function LMV_f(wf) and a predetermined regularization term. Further, the predetermined regularization term is a regularization term that is defined using the difference in phase between adjacent frequency bins relevant to the response wf Haf,d (f=1, . . . , F, d=1, . . . , D) of the beamformer in the frequency bin f for the angular direction θd.
Some examples of the regularization term will be shown. Here, η is a predetermined positive value, and ∠(wf Haf,d) (f=1, . . . , F, d=1, . . . , D) expresses the phase of the response wf Haf,d of the beamformer in the frequency bin f for the angular direction θd.
The first example is ηΣf=1 F-1Σd=1 D|∠(wf Haf,d)−∠(wf+1 Haf+1,d)|. The second example is ηΣf=1 F-2Σd=1 D|∠(wf Haf,d)−2∠(wf+1 Haf+1,d)+∠(wf+2 Haf+2,d)|.
The third example is ηΣf=1 F-1Σd=1 D|cf,d−cf+1,d|c. Here, C is an integer equal to or more than 1, and cf,d (f=1, . . . , F, d=1, . . . , D) is a discrete variable that has one value of 1, . . . , C that satisfies ∠(wf Haf,d)∈[2π(cf,d−1)/C, 2πcf,d/C] for the phase ∠(wf Haf,d).
A case where the third example is used as the regularization term will be described below. In this case, an example of the constraint condition is expressed by the following expression.
[Math. 23]
$$w_f^{H} a_{f,1} = 1 \quad (f=1,\dots,F) \tag{*}$$
Further, another example of the constraint condition is expressed by the following expression.
[Math. 24]
$$|w_f^{H} a_{f,d}| \ge 1 \quad (f=1,\dots,F,\ d=1,\dots,D) \tag{**}$$
Case 1 and Case 2 will be described below. Case 1 is a case where the optimization unit 120 solves the optimization problem of the cost function that is defined using the sum of the sum Σf=1 FLMV_f(wf) of the cost function LMV_f(wf) and the regularization term ηΣf=1 F-1Σd=1 D|cf,d−cf+1,d|c under a constraint condition (*). Case 2 is a case where the optimization unit 120 solves the optimization problem of the cost function that is defined using the sum of the sum Σf=1 FLMV_f(wf) of the cost function LMV_f(wf) and the regularization term ηΣf=1 F-1Σd=1 D|cf,d−cf+1,d|c under a constraint condition (**).
(Case 1)
The optimization unit 120 will be described below with reference to FIG. 7 and FIG. 8 . FIG. 7 is a block diagram showing the configuration of the optimization unit 120. FIG. 8 is a flowchart showing the behavior of the optimization unit 120. As shown in FIG. 7 , the optimization unit 120 includes an initialization unit 121, a candidate calculation unit 122 and an optimum value determination unit 123.
The behavior of the optimization unit 120 will be described with FIG. 8 . Here, cf=(cf,1, . . . , cf,D) (f=1, . . . , F) is a discrete variable that is defined by the discrete variable cf,1, . . . , cf,D.
In S121, the initialization unit 121 initializes α0[cf] (f=1, . . . , F), by the following expression.
[Math. 25]
$$\alpha_0[c_f] = 0$$
In S122, the candidate calculation unit 122 calculates αf[cf] for all values that the discrete variable cf can have, for each the frequency f, and sets the value of the variable copt to copt=argmincαF[c].
[Math. 26]
$$
\begin{aligned}
w_f^{dp}[c_f] &\leftarrow \operatorname*{arg\,min}_{w_f} L_{MV_f}(w_f) \quad \text{s.t.}\ \angle(w_f^H a_{f,d}) \in [2\pi(c_{f,d}-1)/C,\ 2\pi c_{f,d}/C]\ (d = 1, \ldots, D),\quad w_f^H a_{f,1} = 1 \\
c_f^{prev}[c_f] &\leftarrow \operatorname*{arg\,min}_{c_{f-1}} \left( \alpha_{f-1}[c_{f-1}] + \hat{L}_{\eta}^{f-1}(c_{f-1}, c_f) \right) \\
\alpha_f[c_f] &\leftarrow \alpha_{f-1}[c_f^{prev}[c_f]] + \hat{L}_{\eta}^{f-1}(c_f^{prev}[c_f], c_f) + L_{MV_f}(w_f^{dp}[c_f]) \\
&\left( L_{\eta}^{f}(c_f, c_{f+1}) = \eta \sum_{d=1}^{D} |c_{f,d} - c_{f+1,d}|_c \right)
\end{aligned}
$$
In S123, using the value of the variable copt calculated in S122 as an input, the optimum value determination unit 123 calculates the optimum value wf* of the filter coefficient wf and the value of the variable copt for the frequency bin f, in descending order from F to 1, by the following expression, and obtains the optimum value w* from w*={w1*, . . . , wF*}.
[Math. 27]
$$
\begin{aligned}
w_f^{*} &\leftarrow w_f^{dp}[c_{opt}] \\
c_{opt} &\leftarrow c_f^{prev}[c_{opt}]
\end{aligned}
$$
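The forward recursion of S122 and the backtracking of S123 have the shape of a Viterbi-style dynamic program over the labels. The Python sketch below illustrates that shape under simplifying assumptions: D=1 (so each cf is a single label), labels are 0-based, the transition cost is η times a plain absolute label difference, and a precomputed table `bin_cost[f][c]` stands in for LMV_f(wf dp[cf]) (a label with no feasible filter can be given `math.inf`). The name `dp_over_labels` and the table layout are hypothetical.

```python
def dp_over_labels(bin_cost, eta):
    """Viterbi-style search for the label sequence with minimum total cost.

    bin_cost[f][c]: per-bin cost when bin f takes label c (0-based labels).
    Returns (labels, total_cost) with one label per frequency bin.
    """
    F = len(bin_cost)
    C = len(bin_cost[0])
    alpha = [[0.0] * C for _ in range(F)]  # best accumulated cost per label
    prev = [[0] * C for _ in range(F)]     # best predecessor label

    # First bin: alpha_0 = 0, so only the per-bin cost contributes.
    for c in range(C):
        alpha[0][c] = bin_cost[0][c]

    # Forward pass (S122): extend the best path into every label of bin f.
    for f in range(1, F):
        for c in range(C):
            best = min(range(C),
                       key=lambda p: alpha[f - 1][p] + eta * abs(p - c))
            prev[f][c] = best
            alpha[f][c] = (alpha[f - 1][best] + eta * abs(best - c)
                           + bin_cost[f][c])

    # Backward pass (S123): start from the best final label and backtrack.
    c_opt = min(range(C), key=lambda c: alpha[F - 1][c])
    total = alpha[F - 1][c_opt]
    labels = [0] * F
    for f in range(F - 1, -1, -1):
        labels[f] = c_opt
        c_opt = prev[f][c_opt]
    return labels, total
```

With a small η the search follows the cheapest label in each bin; with a large η it prefers a constant label sequence, trading per-bin cost for smoothness across adjacent bins.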
(Case 2)
In this case, the optimization unit 120 may calculate the optimum value w* by solving an optimization problem $\min_{\{c_f, w_f\}} \left( \sum_{f=1}^{F} L_{MV_f}(w_f) + \sum_{f=1}^{F} \sum_{d=1}^{D} \Lambda_{(f,d),c_{f,d}}(w_f^H a_{f,d}) + \eta \sum_{f=1}^{F-1} \sum_{d=1}^{D} |c_{f,d} - c_{f+1,d}|_c \right)$ relevant to the filter coefficient w and the discrete variables c1, . . . , cF, instead of solving the optimization problem under the constraint condition (**). Here, cf=(cf,1, . . . , cf,D) (f=1, . . . , F) is a discrete variable that is defined by the discrete variables cf,1, . . . , cf,D, and $\Lambda_{(f,d),c_{f,d}}$ (f=1, . . . , F, d=1, . . . , D) is a function relevant to a variable γf,d that is defined by the following expression (γf,d=wf Haf,d).
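[Math. 28]
$$
\Lambda_{(f,d),c_{f,d}}(\gamma_{f,d}) =
\begin{cases}
0 & \left( \Re\!\left(\gamma_{f,d}\, e^{-2\pi j (c_{f,d}+1)/2C}\right) \ge 1,\ \dfrac{2\pi c_{f,d}}{C} \le \angle\gamma_{f,d} \le \dfrac{2\pi (c_{f,d}+1)}{C} \right) \\
\infty & (\text{otherwise})
\end{cases}
\qquad (21)
$$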
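Expression (21) acts as an indicator function: it adds nothing to the cost when the response γf,d meets both conditions for the label cf,d and makes the cost infinite otherwise. The Python sketch below evaluates it for a single response. Several points are assumptions, because the relation symbols are not legible in the source text: the real-part condition is read as "greater than or equal to 1", the phase condition as a closed interval, the "otherwise" value as infinity, and the exponent denominator as 2C. The function name `sector_indicator` is hypothetical.

```python
import cmath
import math


def sector_indicator(gamma: complex, c: int, C: int) -> float:
    """Return 0.0 if gamma meets both conditions for label c, else infinity.

    Conditions (as read from expression (21), see assumptions above):
      Re(gamma * exp(-2*pi*j*(c+1)/(2*C))) >= 1
      2*pi*c/C <= angle(gamma) <= 2*pi*(c+1)/C, with the angle in [0, 2*pi)
    """
    rotated = gamma * cmath.exp(-2j * math.pi * (c + 1) / (2 * C))
    phase = cmath.phase(gamma) % (2 * math.pi)
    in_sector = 2 * math.pi * c / C <= phase <= 2 * math.pi * (c + 1) / C
    return 0.0 if (rotated.real >= 1.0 and in_sector) else math.inf
```

Because the real-part condition is linear in γf,d, fixing the label turns the non-convex requirement of the constraint condition (**) into conditions that are easier to handle for each candidate label.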
The optimization unit 120 will be described below with reference to FIG. 7 and FIG. 8 . FIG. 7 is a block diagram showing the configuration of the optimization unit 120. FIG. 8 is a flowchart showing the behavior of the optimization unit 120. As shown in FIG. 7 , the optimization unit 120 includes an initialization unit 121, a candidate calculation unit 122 and an optimum value determination unit 123.
The behavior of the optimization unit 120 will be described with FIG. 8 .
In S121, the initialization unit 121 initializes α0[cf] (f=1, . . . , F), by the following expression.
[Math. 29]
$$\alpha_0[c_f] = 0$$
In S122, the candidate calculation unit 122 calculates αf[cf] for all values that the discrete variable cf can have, for each frequency bin f, by the following expression, and sets the value of the variable copt to copt=argmincαF[c].
[Math. 30]
$$
\begin{aligned}
w_f^{dp}[c_f] &\leftarrow \operatorname*{arg\,min}_{w_f} \left( L_{MV_f}(w_f) + \sum_{d=1}^{D} \Lambda_{(f,d),c_{f,d}}(w_f^H a_{f,d}) \right) \\
c_f^{prev}[c_f] &\leftarrow \operatorname*{arg\,min}_{c_{f-1}} \left( \alpha_{f-1}[c_{f-1}] + \hat{L}_{\eta}^{f-1}(c_{f-1}, c_f) \right) \\
\alpha_f[c_f] &\leftarrow \alpha_{f-1}[c_f^{prev}[c_f]] + \hat{L}_{\eta}^{f-1}(c_f^{prev}[c_f], c_f) + L_{MV_f}(w_f^{dp}[c_f]) + \sum_{d=1}^{D} \Lambda_{(f,d),c_{f,d}}\!\left( w_f^{dp}[c_f]^H a_{f,d} \right) \\
&\left( L_{\eta}^{f}(c_f, c_{f+1}) = \eta \sum_{d=1}^{D} |c_{f,d} - c_{f+1,d}|_c \right)
\end{aligned}
$$
In S123, using the value of the variable copt calculated in S122 as an input, the optimum value determination unit 123 calculates the optimum value wf* of the filter coefficient wf and the value of the variable copt for the frequency bin f, in descending order from F to 1, by the following expression, and obtains the optimum value w* from w*={w1*, . . . , wF*}.
[Math. 31]
$$
\begin{aligned}
w_f^{*} &\leftarrow w_f^{dp}[c_{opt}] \\
c_{opt} &\leftarrow c_f^{prev}[c_{opt}]
\end{aligned}
$$
As described above, it can be said that the optimization unit 120 calculates the optimum value w* by performing derivation so as to reduce the difference in phase between adjacent frequency bins relevant to the response wf Haf,d (f=1, . . . , F, d=1, . . . , D) of the beamformer in the frequency bin f for the angular direction θd.
According to the embodiments of the present invention, it is possible to design a stable beamformer having a good quality, by optimizing the filter coefficient in consideration of the relationship of the filter coefficient between adjacent frequency bins.
<Supplement>
FIG. 9 is a diagram showing an example of the functional configuration of a computer that realizes the apparatuses described above. The processing in the apparatuses described above can be executed by causing a recording unit 2020 to read a program for causing a computer to function as the apparatuses described above, and by causing a control unit 2010, an input unit 2030, an output unit 2040 and the like to operate accordingly.
For example, as a single hardware entity, the apparatus in the present invention includes an input unit that can be connected with a keyboard and the like, an output unit that can be connected with a liquid crystal display and the like, a communication unit that can be connected with a communication device (for example, a communication cable) capable of communicating with the exterior of the hardware entity, a CPU (Central Processing Unit; a cache memory, registers and the like may be included), a RAM and a ROM that are memories, an external storage device that is a hard disk, and a bus that connects the input unit, the output unit, the communication unit, the CPU, the RAM, the ROM and the external storage device such that data can be exchanged. Further, as necessary, the hardware entity may be provided with a device (drive) that can perform reading and writing for a record medium such as a CD-ROM. A general-purpose computer is one example of a physical entity including such hardware resources.
In the external storage device of the hardware entity, programs necessary for realizing the above functions, data necessary in the processing of the programs, and the like are stored (for example, the program may be stored in a ROM that is a read-only storage without being limited to the external storage device). Further, data and others obtained by the processing of the programs are appropriately stored in the RAM, the external storage device or the like.
In the hardware entity, the programs stored in the external storage device (or the ROM or the like) and the data necessary for the processing of the programs are read in the memory as necessary, and are appropriately interpreted, executed or processed by the CPU. As a result, the CPU realizes predetermined functions (the above component units expressed as the . . . unit, the . . . means and the like).
The present invention is not limited to the above-described embodiments, and modifications can be appropriately made without departing from the spirit of the present invention. Further, the processes described in the above embodiments do not need to be executed in a time-series manner in the order of the descriptions, and may be executed in parallel or individually, depending on the processing capacities of the devices that execute the processes or as necessary.
In the case where the processing functions in the hardware entity (the apparatus in the present invention) described in the above embodiments are realized by a computer as described above, the processing contents of the functions to be included in the hardware entity are described by programs. Then, the programs are executed by the computer, and thereby, the processing functions in the above hardware entity are realized on the computer.
The programs describing the processing contents can be recorded in a computer-readable record medium. As the computer-readable record medium, for example, a magnetic record device, an optical disk, a magneto-optical record medium, a semiconductor memory and others may be used. Specifically, for example, a hard disk device, a flexible disk, a magnetic tape or the like can be used as the magnetic record device, a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), a CD-R (Recordable)/RW (ReWritable) or the like can be used as the optical disk, an MO (Magneto-Optical disc) or the like can be used as the magneto-optical record medium, and an EEP-ROM (Electrically Erasable and Programmable Read Only Memory) or the like can be used as the semiconductor memory.
For example, the distribution of the programs is performed by sale, transfer, lending or the like of a portable record medium such as a DVD or CD-ROM in which the programs are recorded. Furthermore, the programs may be distributed by storing the programs in a storage device of a server computer and transmitting the programs from the server computer to another computer through a network.
For example, the computer that executes the programs first temporarily stores, in its own storage device, the programs recorded in the portable record medium or the programs transmitted from the server computer. Then, at the time of the execution of the processing, the computer reads a program stored in its own storage device, and executes a process in accordance with the read program. Further, as another form of the execution of the programs, the computer may read a program directly from the portable record medium, and may execute a process in accordance with the program.
Furthermore, whenever a program is transmitted from the server computer to the computer, the computer may execute a process in accordance with the received program. Further, the above-described processes may be executed by a so-called ASP (Application Service Provider) service in which the processing functions are realized only by issuing execution instructions and acquiring results, without the transmission of the programs from the server computer to the computer. The program in this form includes information that is supplied for the processing by an electronic computer and that is similar to the program (for example, data that is not a direct command to the computer but has a property of prescribing the processing by the computer).
In this form, the hardware entity is configured by executing predetermined programs on the computer, but at least some of the processing contents may be realized in hardware.
The above description of the embodiment of the present invention has been presented for the purpose of exemplification and description. It is not intended to be exhaustive, and it is not intended to limit the invention to the disclosed strict form. Modifications and variations can be made from the above disclosure. The embodiments are selected and expressed, such that the best exemplification of the principle of the present invention is provided and such that a person skilled in the art can use the present invention as various embodiments suitable for deliberated actual use or can use the present invention while adding various modifications. All modifications and variations fall within the scope of the present invention that is determined by the attached claims interpreted based on a range given justly, lawfully and fairly.

Claims (13)

The invention claimed is:
1. A filter coefficient optimization apparatus including an optimization unit that calculates an optimum value w* of a filter coefficient w={w1, . . . , wF} (wf(f=1, . . . , F, F is an integer equal to or more than 1) is a filter coefficient of a frequency bin f) of a beamformer that emphasizes sound (hereinafter referred to as target sound) from D sound sources (hereinafter referred to as a sound source 1, . . . , a sound source D),
D being an integer equal to or more than 1,
the optimization unit calculating the optimum value w* based on an optimization problem minwL(w) relevant to the filter coefficient w, under a predetermined constraint condition,
L(w) being a cost function relevant to the filter coefficient w and being defined using a sum of a sum Σf=1 FLMV_f(wf) of a cost function LMV_f(wf) and a predetermined regularization term,
LMV_f(wf)=wf HRfwf(f=1, . . . , F) being a cost function relevant to the filter coefficient wf,
Rf(f=1, . . . , F) being a spatial correlation matrix for sound other than the target sound relevant to the frequency bin f,
the predetermined regularization term being defined using a difference in phase between adjacent frequency bins relevant to a response wf Haf,d (f=1, . . . , F, d=1, . . . , D) of the beamformer in the frequency bin f for an angular direction θd,
θd(d=1, . . . , D) being the angular direction in which a sound source d exists, and
af,d(f=1, . . . , F, d=1, . . . , D) being an array manifold vector in the frequency bin f corresponding to a sound wave that comes from the angular direction θd, the sound wave being a plane wave; and
an output unit that sets the optimum value w* to the frequency bin of the beamformer and emphasizes at least a part of the sound wave as the target sound.
2. The filter coefficient optimization apparatus according to claim 1, wherein
the predetermined regularization term is ηΣf=1 F−1Σd=1 D|∠(wf Haf,d)−∠(wf+1 Haf+1,d)| or ηΣf=1 F−2Σd=1 D|∠(wf Haf,d)−2∠(wf+1 Haf+1,d)+∠(wf+2 Haf+2,d)|,
η being a predetermined positive number, and
∠(wf Haf,d) (f=1, . . . , F, d=1, . . . , D) being the phase of the response wf Haf,d of the beamformer in the frequency bin f for the angular direction θd.
3. The filter coefficient optimization apparatus according to claim 1, wherein:
the predetermined regularization term is
ηΣf=1 F−1Σd=1 D|cf,d−cf+1,d|c,
η being a predetermined positive number,
C being an integer equal to or more than 1,
∠(wf Haf,d) (f=1, . . . , F, d=1, . . . , D) being the phase of the response wf Haf,d of the beamformer in the frequency bin f for the angular direction θd, and
cf,d(f=1, . . . , F, d=1, . . . , D) being a discrete variable having one value of 1, . . . , C that satisfies ∠(wf Haf,d)∈[2π(cf,d−1)/C, 2πcf,d/C] for the phase ∠(wf Haf,d).
4. The filter coefficient optimization apparatus according to claim 3, wherein the predetermined constraint condition is expressed by the following expression:

[Math. 32]
$$w_f^H a_{f,d} = 1$$
(f=1, . . . , F).
5. The filter coefficient optimization apparatus according to claim 3, wherein the predetermined constraint condition is expressed by the following expression:

[Math. 33]
$$w_f^H a_{f,d} \ge 1$$
(f=1, . . . , F, d=1, . . . , D).
6. The filter coefficient optimization apparatus according to claim 4, wherein
the optimization unit includes
a candidate calculation unit that calculates αf[cf] for all values that a discrete variable cf can have, for each frequency bin f, by the following expression, and adopts copt=argmincαF[c] as a value of a variable copt,
cf=(cf,1, . . . , cf,D) (f=1, . . . , F) being the discrete variable that is defined by a discrete variable cf,1, . . . , cf,D.
[Math. 34]
$$
\begin{aligned}
w_f^{dp}[c_f] &\leftarrow \operatorname*{arg\,min}_{w_f} L_{MV_f}(w_f) \quad \text{s.t.}\ \angle(w_f^H a_{f,d}) \in [2\pi(c_{f,d}-1)/C,\ 2\pi c_{f,d}/C]\ (d = 1, \ldots, D),\quad w_f^H a_{f,1} = 1 \\
c_f^{prev}[c_f] &\leftarrow \operatorname*{arg\,min}_{c_{f-1}} \left( \alpha_{f-1}[c_{f-1}] + \hat{L}_{\eta}^{f-1}(c_{f-1}, c_f) \right) \\
\alpha_f[c_f] &\leftarrow \alpha_{f-1}[c_f^{prev}[c_f]] + \hat{L}_{\eta}^{f-1}(c_f^{prev}[c_f], c_f) + L_{MV_f}(w_f^{dp}[c_f]) \\
&\left( L_{\eta}^{f}(c_f, c_{f+1}) = \eta \sum_{d=1}^{D} |c_{f,d} - c_{f+1,d}|_c \right)
\end{aligned}
$$
, and
an optimum value determination unit that calculates an optimum value wf* of the filter coefficient wf and the value of the variable copt for the frequency bin f, in descending order from F to 1, by the following expression, and obtains the optimum value w* from w*={w1*, . . . , wF*}:

[Math. 35]
$$
\begin{aligned}
w_f^{*} &\leftarrow w_f^{dp}[c_{opt}] \\
c_{opt} &\leftarrow c_f^{prev}[c_{opt}].
\end{aligned}
$$
7. The filter coefficient optimization apparatus according to claim 5, wherein:
the optimization unit calculates the optimum value w* by solving an optimization problem $\min_{\{c_f, w_f\}} \left( \sum_{f=1}^{F} L_{MV_f}(w_f) + \sum_{f=1}^{F} \sum_{d=1}^{D} \Lambda_{(f,d),c_{f,d}}(w_f^H a_{f,d}) + \eta \sum_{f=1}^{F-1} \sum_{d=1}^{D} |c_{f,d} - c_{f+1,d}|_c \right)$ relevant to the filter coefficient w and a discrete variable c1, . . . , cF, instead of solving the optimization problem minwL(w),
cf=(cf,1, . . . , cf,D) (f=1, . . . , F) being the discrete variable that is defined by the discrete variable cf,1, . . . , cf,D, and
$\Lambda_{(f,d),c_{f,d}}$ (f=1, . . . , F, d=1, . . . , D) is a function relevant to a variable γf,d that is defined by the following expression (γf,d=wf Haf,d):
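[Math. 36]
$$
\Lambda_{(f,d),c_{f,d}}(\gamma_{f,d}) =
\begin{cases}
0 & \left( \Re\!\left(\gamma_{f,d}\, e^{-2\pi j (c_{f,d}+1)/2C}\right) \ge 1,\ \dfrac{2\pi c_{f,d}}{C} \le \angle\gamma_{f,d} \le \dfrac{2\pi (c_{f,d}+1)}{C} \right) \\
\infty & (\text{otherwise}).
\end{cases}
$$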
8. The filter coefficient optimization apparatus according to claim 7, wherein
the optimization unit includes
a candidate calculation unit that calculates αf[cf] for all values that the discrete variable cf can have, for each frequency bin f, by the following expression, and adopts copt=argmincαF[c] as a value of a variable copt:
[Math. 37]
$$
\begin{aligned}
w_f^{dp}[c_f] &\leftarrow \operatorname*{arg\,min}_{w_f} \left( L_{MV_f}(w_f) + \sum_{d=1}^{D} \Lambda_{(f,d),c_{f,d}}(w_f^H a_{f,d}) \right) \\
c_f^{prev}[c_f] &\leftarrow \operatorname*{arg\,min}_{c_{f-1}} \left( \alpha_{f-1}[c_{f-1}] + \hat{L}_{\eta}^{f-1}(c_{f-1}, c_f) \right) \\
\alpha_f[c_f] &\leftarrow \alpha_{f-1}[c_f^{prev}[c_f]] + \hat{L}_{\eta}^{f-1}(c_f^{prev}[c_f], c_f) + L_{MV_f}(w_f^{dp}[c_f]) + \sum_{d=1}^{D} \Lambda_{(f,d),c_{f,d}}\!\left( w_f^{dp}[c_f]^H a_{f,d} \right) \\
&\left( L_{\eta}^{f}(c_f, c_{f+1}) = \eta \sum_{d=1}^{D} |c_{f,d} - c_{f+1,d}|_c \right)
\end{aligned}
$$
and
an optimum value determination unit that calculates an optimum value wf* of the filter coefficient wf and the value of the variable copt for the frequency bin f, in descending order from F to 1, by the following expression, and obtains the optimum value w* from w*={w1*, . . . , wF*}:

[Math. 38]
$$
\begin{aligned}
w_f^{*} &\leftarrow w_f^{dp}[c_{opt}] \\
c_{opt} &\leftarrow c_f^{prev}[c_{opt}].
\end{aligned}
$$
9. A filter coefficient optimization apparatus including an optimization unit that calculates an optimum value w* of a filter coefficient w={w1, . . . , wF} (wf(f=1, . . . , F, F is an integer equal to or more than 1) is a filter coefficient of a frequency bin f) of a beamformer that emphasizes sound (hereinafter referred to as target sound) from D sound sources (hereinafter referred to as a sound source 1, . . . , a sound source D),
D being an integer equal to or more than 1,
the optimization unit calculating the optimum value w* by performing derivation so as to reduce a difference in phase between adjacent frequency bins relevant to a response wf Haf,d(f=1, . . . , F, d=1, . . . , D) of the beamformer in the frequency bin f for an angular direction θd,
θd(d=1, . . . , D) being the angular direction in which a sound source d exists,
af,d(f=1, . . . , F, d=1, . . . , D) being an array manifold vector in the frequency bin f corresponding to a sound wave that comes from the angular direction θd, the sound wave being a plane wave; and
an output unit that sets the optimum value w* to the frequency bin of the beamformer and emphasizes at least a part of the sound wave as the target sound.
10. A filter coefficient optimization method including an optimization step in which a filter coefficient optimization apparatus calculates an optimum value w* of a filter coefficient w={w1, . . . , wF} (wf(f=1, . . . , F, F is an integer equal to or more than 1) is a filter coefficient of a frequency bin f) of a beamformer that emphasizes sound (hereinafter referred to as target sound) from D sound sources (hereinafter referred to as a sound source 1, . . . , a sound source D),
D being an integer equal to or more than 1,
the optimization step being a step of calculating the optimum value w* based on an optimization problem minwL(w) relevant to the filter coefficient w, under a predetermined constraint condition,
L(w) being a cost function relevant to the filter coefficient w and being defined using a sum of a sum Σf=1 FLMV_f(wf) of a cost function LMV_f(wf) and a predetermined regularization term,
LMV_f(wf)=wf HRfwf(f=1, . . . , F) being a cost function relevant to the filter coefficient wf,
Rf(f=1, . . . , F) being a spatial correlation matrix for sound other than the target sound relevant to the frequency bin f,
the predetermined regularization term being defined using a difference in phase between adjacent frequency bins relevant to a response wf Haf,d(f=1, . . . , F, d=1, . . . , D) of the beamformer in the frequency bin f for an angular direction θd,
θd(d=1, . . . , D) being the angular direction in which a sound source d exists, and
af,d(f=1, . . . , F, d=1, . . . , D) being an array manifold vector in the frequency bin f corresponding to a sound wave that comes from the angular direction θd, the sound wave being a plane wave; and
an output step that sets the optimum value w* to the frequency bin of the beamformer and emphasizes at least a part of the sound wave as the target sound.
11. A filter coefficient optimization method including an optimization step in which a filter coefficient optimization apparatus calculates an optimum value w* of a filter coefficient w={w1, . . . , wF} (wf(f=1, . . . , F, F is an integer equal to or more than 1) is a filter coefficient of a frequency bin f) of a beamformer that emphasizes sound (hereinafter referred to as target sound) from D sound sources (hereinafter referred to as a sound source 1, . . . , a sound source D),
D being an integer equal to or more than 1,
the optimization step being a step of calculating the optimum value w* by performing derivation so as to reduce a difference in phase between adjacent frequency bins relevant to a response wf Haf,d(f=1, . . . , F, d=1, . . . , D) of the beamformer in the frequency bin f for an angular direction θd,
θd(d=1, . . . ,D) being the angular direction in which a sound source d exists, and
af,d(f=1, . . . , F, d=1, . . . , D) being an array manifold vector in the frequency bin f corresponding to a sound wave that comes from the angular direction θd, the sound wave being a plane wave; and
an output step that sets the optimum value w* to the frequency bin of the beamformer and emphasizes at least a part of the sound wave as the target sound.
12. A non-transitory computer-readable recording medium storing a program that causes a computer to function as the filter coefficient optimization apparatus according to claim 1.
13. A non-transitory computer-readable recording medium storing a program that causes a computer to function as the filter coefficient optimization apparatus according to claim 9.
US17/801,754 2020-02-28 2020-02-28 Filter coefficient optimization apparatus, filter coefficient optimization method, and program Active 2040-07-17 US12120490B2 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/008233 WO2021171533A1 (en) 2020-02-28 2020-02-28 Filter coefficient optimization device, filter coefficient optimization method, and program

Publications (2)

Publication Number Publication Date
US20230088204A1 US20230088204A1 (en) 2023-03-23
US12120490B2 true US12120490B2 (en) 2024-10-15

Family

ID=77491194

Family Applications (1)

Application Number Title Priority Date Filing Date
US17/801,754 Active 2040-07-17 US12120490B2 (en) 2020-02-28 2020-02-28 Filter coefficient optimization apparatus, filter coefficient optimization method, and program

Country Status (3)

Country Link
US (1) US12120490B2 (en)
JP (1) JP7375905B2 (en)
WO (1) WO2021171533A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9456276B1 (en) * 2014-09-30 2016-09-27 Amazon Technologies, Inc. Parameter selection for audio beamforming
US11881206B2 (en) * 2019-08-06 2024-01-23 Insoundz Ltd. System and method for generating audio featuring spatial representations of sound sources
US11908487B2 (en) * 2020-09-16 2024-02-20 Kabushiki Kaisha Toshiba Signal processing apparatus and non-transitory computer readable medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9668066B1 (en) * 2015-04-03 2017-05-30 Cedar Audio Ltd. Blind source separation systems
JP2018107697A (en) 2016-12-27 2018-07-05 キヤノン株式会社 Signal processing apparatus, signal processing method, and program
EP3698160B1 (en) 2017-10-26 2023-03-15 Huawei Technologies Co., Ltd. Device and method for estimating direction of arrival of sound from a plurality of sound sources

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Futoshi Asano (2011) "Acoustic Technology Series 16, Array signal processing for acoustics: localization, tracking and separation of sound sources, edited by The Acoustical Society of Japan", Corona Publishing Co., Ltd., pp. 86-90.
J. Capon (1969) "High-resolution frequency-wavenumber spectrum analysis", Proceedings of the IEEE, vol. 57, No. 8, pp. 1408-1418.

Also Published As

Publication number Publication date
WO2021171533A1 (en) 2021-09-02
US20230088204A1 (en) 2023-03-23
JP7375905B2 (en) 2023-11-08
JPWO2021171533A1 (en) 2021-09-02

Legal Events

Date Code Title Description
AS Assignment

Owner name: NIPPON TELEGRAPH AND TELEPHONE CORPORATION, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SATO, RYOTARO;NIWA, KENTA;SIGNING DATES FROM 20210215 TO 20220306;REEL/FRAME:060874/0799

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

CC Certificate of correction