CN112799017B - Sound source positioning method, sound source positioning device, storage medium and electronic equipment - Google Patents


Info

Publication number: CN112799017B (application CN202110369681.9A)
Authority: CN (China)
Prior art keywords: matrix, target, angle, preset angle, determining
Legal status: Active
Application number: CN202110369681.9A
Other languages: Chinese (zh)
Other versions: CN112799017A
Inventors: 王克彦, 俞鸣园
Current assignee (also original assignee): Zhejiang Huachuang Video Signal Technology Co Ltd
Application filed by Zhejiang Huachuang Video Signal Technology Co Ltd
Priority to CN202110369681.9A
Publication of CN112799017A
Application granted
Publication of CN112799017B

Classifications

    • G — PHYSICS
    • G01 — MEASURING; TESTING
    • G01S — RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 — Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 — Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/20 — Position of source determined by a plurality of spaced direction-finders

Abstract

The invention discloses a sound source positioning method, a sound source positioning device, a storage medium and electronic equipment. The positioning method comprises the following steps: determining target frequency domain information of a target audio signal received by a microphone array at the k-th time frame; determining, according to the target frequency domain information, an inverse matrix of a target covariance matrix of the target audio signal; for each preset angle in a plurality of preset angles, determining a spatial spectrum corresponding to the preset angle according to the inverse matrix of the target covariance matrix and a time delay matrix corresponding to the preset angle, wherein the inverse matrix of the target covariance matrix is a self-conjugate matrix, the time delay matrix corresponding to the preset angle is obtained by reconstructing a steering vector corresponding to the preset angle, and the time delay matrix corresponding to the preset angle is both a self-conjugate matrix and a Toeplitz matrix; and determining a target angle from the plurality of preset angles according to the spatial spectrum corresponding to each preset angle, and positioning a sound source corresponding to the target audio signal according to the target angle. The complexity of sound source localization is thus reduced.

Description

Sound source positioning method, sound source positioning device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of audio processing technologies, and in particular, to a sound source positioning method and apparatus, a storage medium, and an electronic device.
Background
Direction of Arrival (DOA) refers to the direction of arrival of a spatial signal. At present, microphone-array-based DOA estimation is widely applied in fields such as audio and video conferencing, monitoring and recognition, multimedia systems, and smart speakers, and is an important subject in the field of human-computer interaction. In practical application scenarios, the DOA of a sound source calculated by the microphone array affects the subsequent speech enhancement effect and the human-computer interaction experience, which places certain requirements on the accuracy of the calculated DOA. DOA estimation is affected by various factors, such as the wideband characteristics of speech, microphone performance, environmental noise, and room reverberation, all of which reduce positioning accuracy; the DOA technique of a microphone array is therefore required to have a certain robustness. Meanwhile, considering the computing resources and efficiency of different hardware platforms during actual deployment, the computational complexity of DOA estimation should be reduced as much as possible, along with the power consumption it occupies.
Disclosure of Invention
The present disclosure provides a sound source localization method, a sound source localization apparatus, a storage medium, and an electronic device, which can reduce the complexity of sound source localization and improve the efficiency of sound source localization.
In order to achieve the above object, in a first aspect, the present disclosure provides a sound source localization method, the method comprising:
determining target frequency domain information of a target audio signal received by the microphone array at the k-th time frame, wherein the microphone array is composed of a plurality of microphones arranged according to a preset spatial topology, and k is a positive integer;
determining an inverse matrix of a target covariance matrix of the target audio signal according to the target frequency domain information;
determining a spatial spectrum corresponding to a preset angle according to an inverse matrix of a target covariance matrix and a time delay matrix corresponding to the preset angle for each preset angle in a plurality of preset angles, wherein elements in the time delay matrix corresponding to the preset angle comprise frequency domain information of relative time delay between audio signals received by every two microphones in a microphone array, the inverse matrix of the target covariance matrix is a self-conjugate matrix, the time delay matrix corresponding to the preset angle is obtained by reconstructing a steering vector corresponding to the preset angle, and the time delay matrix corresponding to the preset angle is a self-conjugate matrix and is a Toeplitz matrix;
and determining a target angle from the plurality of preset angles according to the spatial spectrum corresponding to each preset angle, and positioning a sound source corresponding to the target audio signal according to the target angle.
Optionally, the determining target frequency domain information of the target audio signal received by the microphone array at the k-th time frame comprises:
determining, for each microphone in the microphone array, time domain information of the audio signal received by the microphone at the k-th time frame;
determining the target frequency domain information according to the time domain information of the audio signal received by each microphone at the k-th time frame.
Optionally, k is a positive integer greater than or equal to 2; the determining an inverse matrix of a target covariance matrix of the target audio signal according to the target frequency domain information includes:
determining an initial covariance matrix of the target audio signal according to the target frequency domain information;
reconstructing the initial covariance matrix by adopting a diagonal loading technology to determine the target covariance matrix;
determining, according to the Sherman-Morrison-Woodbury algorithm, the target frequency domain information, a forgetting factor corresponding to the k-th time frame, and an inverse matrix of a covariance matrix of the audio signal received by the microphone array at the (k-1)-th time frame, the inverse matrix of the target covariance matrix, wherein the forgetting factor corresponding to the k-th time frame characterizes the variation of the position, relative to the microphone array, of the sound source corresponding to the target audio signal with respect to the position of the sound source corresponding to the audio signal received at the (k-1)-th time frame.
Optionally, the determining the inverse matrix of the target covariance matrix according to the Sherman-Morrison-Woodbury algorithm, the target frequency domain information, the forgetting factor corresponding to the k-th time frame, and the inverse matrix of the covariance matrix of the audio signal received at the (k-1)-th time frame comprises:
determining the inverse matrix of the target covariance matrix by the following formula:
$$\hat{R}^{-1}_{\mathrm{ASMW}}(k,f)=\frac{1}{\beta_k}\left[\hat{R}^{-1}(k-1,f)-\frac{\hat{R}^{-1}(k-1,f)\,X(k,f)\,X^{H}(k,f)\,\hat{R}^{-1}(k-1,f)}{\beta_k+X^{H}(k,f)\,\hat{R}^{-1}(k-1,f)\,X(k,f)}\right]$$

where $\hat{R}(k,f)$ represents the target covariance matrix, $\hat{R}^{-1}(k,f)$ represents the inverse matrix of the target covariance matrix, ASMW denotes the adaptive Sherman-Morrison-Woodbury algorithm based on a forgetting factor, $f$ represents the frequency point, $\beta_k$ represents the forgetting factor corresponding to the k-th time frame, $\hat{R}^{-1}(k-1,f)$ represents the inverse matrix of the covariance matrix of the audio signal received by the microphone array at the (k-1)-th time frame, $X(k,f)$ represents the target frequency domain information, and $H$ represents the conjugate transpose of a matrix.
Optionally, k is a positive integer greater than or equal to 3; the forgetting factor corresponding to the k-th time frame is determined by the following formula:

$$\beta_k=\beta_{k-1}-\mu\left|\alpha_{k-1}-\alpha_{k-2}\right|$$

where $\beta_k$ represents the forgetting factor corresponding to the k-th time frame, $\beta_{k-1}$ represents the forgetting factor corresponding to the (k-1)-th time frame, $\mu$ is the preset iteration step size, and $\left|\alpha_{k-1}-\alpha_{k-2}\right|$ is the absolute value of the difference between the first angle and the second angle, wherein the first angle is the relative angle between the microphone array and the sound source of the audio signal received at the (k-1)-th time frame, and the second angle is the relative angle between the microphone array and the sound source of the audio signal received at the (k-2)-th time frame.
Optionally, the time delay matrix corresponding to the preset angle is obtained according to an inner product of a steering vector corresponding to the preset angle and a conjugate transpose of the steering vector.
Optionally, the determining, according to the inverse matrix of the target covariance matrix and the time delay matrix corresponding to the preset angle, a spatial spectrum corresponding to the preset angle includes:
obtaining a first vector according to the inverse matrix of the target covariance matrix, wherein elements in the first vector comprise the sum of elements on a main diagonal of the inverse matrix of the target covariance matrix and the sum of elements on a diagonal parallel to the main diagonal in an upper triangular matrix of the inverse matrix of the target covariance matrix;
extracting a first row of a time delay matrix corresponding to the preset angle to obtain a second vector;
and determining the space spectrum corresponding to the preset angle according to the inner product of the first vector and the second vector.
Optionally, the determining a target angle from the plurality of preset angles according to the spatial spectrum corresponding to each preset angle includes:
determining a spatial spectrum maximum value in a spatial spectrum matrix corresponding to each preset angle;
and taking the preset angle corresponding to the maximum value of the spatial spectrum as the target angle.
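The structured spatial-spectrum computation and spectrum peak search described in the optional steps above can be sketched as follows. This is an illustrative Python sketch, not the patent's verbatim implementation: the array geometry, frequency, preset-angle grid, and the random stand-in for the inverse covariance matrix are all invented for the demonstration. It exploits the fact that, for a uniform linear array, the delay matrix is Hermitian and Toeplitz, so the quadratic form collapses to one length-M inner product between the diagonal sums of the inverse covariance (the "first vector") and a vector built from the delay matrix's first row (the "second vector").

```python
import numpy as np

M, f, c, d = 6, 1000.0, 343.0, 0.05      # mics, Hz, speed of sound (m/s), spacing (m)
rng = np.random.default_rng(0)
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Q_inv = np.linalg.inv(B @ B.conj().T + M * np.eye(M))  # Hermitian stand-in for R^-1(k,f)

def steering(theta):
    # Far-field steering vector of a uniform linear array, cf. expression (2).
    return np.exp(-2j * np.pi * f * d * np.arange(M) * np.cos(theta) / c)

# "First vector": sums over the main diagonal and each upper diagonal of R^-1.
s = np.array([np.trace(Q_inv, offset=k) for k in range(M)])

def spectrum(theta):
    a = steering(theta)
    A = np.outer(a, a.conj())            # delay matrix: Hermitian and Toeplitz
    t = np.conj(A[0, :])                 # "second vector" built from A's first row
    # a^H R^-1 a = tr(R^-1 A) collapses to a single length-M inner product:
    denom = (s[0] * t[0]).real + 2.0 * np.sum((s[1:] * t[1:]).real)
    return 1.0 / denom

angles = np.deg2rad(np.arange(0.0, 181.0, 5.0))                   # preset angles
target = angles[int(np.argmax([spectrum(th) for th in angles]))]  # peak search
```

The structured result agrees with the direct quadratic form $a^{H}R^{-1}a$ while touching only $2M-1$ complex products per angle instead of $O(M^2)$.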
In a second aspect, the present disclosure provides a sound source localization apparatus, the apparatus comprising:
a frequency domain information determination module, configured to determine target frequency domain information of a target audio signal received by the microphone array at the k-th time frame, wherein the microphone array is composed of a plurality of microphones arranged according to a preset spatial topology, and k is a positive integer;
an inverse matrix determining module, configured to determine an inverse matrix of a target covariance matrix of the target audio signal according to the target frequency domain information;
a spatial spectrum determining module, configured to determine, for each preset angle in a plurality of preset angles, a spatial spectrum corresponding to the preset angle according to an inverse matrix of the target covariance matrix and a time delay matrix corresponding to the preset angle, where elements in the time delay matrix corresponding to the preset angle comprise frequency domain information of the relative time delay between audio signals received by every two microphones in the microphone array, the inverse matrix of the target covariance matrix is a self-conjugate matrix, the time delay matrix corresponding to the preset angle is obtained by reconstructing a steering vector corresponding to the preset angle, and the time delay matrix corresponding to the preset angle is both a self-conjugate matrix and a Toeplitz matrix;
and the positioning module is used for determining a target angle from the plurality of preset angles according to the spatial spectrum corresponding to each preset angle and positioning a sound source corresponding to the target audio signal according to the target angle.
Optionally, the frequency domain information determining module includes:
a first determining submodule, configured to determine, for each microphone in the microphone array, time domain information of the audio signal received by the microphone at the k-th time frame;
a second determining submodule, configured to determine the target frequency domain information according to the time domain information of the audio signal received by each microphone at the k-th time frame.
Optionally, k is a positive integer greater than or equal to 2; the inverse matrix determination module includes:
a third determining submodule, configured to determine an initial covariance matrix of the target audio signal according to the target frequency domain information;
the reconstruction submodule is used for reconstructing the initial covariance matrix by adopting a diagonal loading technology and determining the target covariance matrix;
a fourth determining submodule, configured to determine the inverse matrix of the target covariance matrix according to the Sherman-Morrison-Woodbury algorithm, the target frequency domain information, the forgetting factor corresponding to the k-th time frame, and the inverse matrix of the covariance matrix of the audio signal received by the microphone array at the (k-1)-th time frame, wherein the forgetting factor corresponding to the k-th time frame characterizes the variation of the position, relative to the microphone array, of the sound source corresponding to the target audio signal with respect to the position of the sound source corresponding to the audio signal received at the (k-1)-th time frame.
Optionally, the fourth determining submodule is configured to determine an inverse matrix of the target covariance matrix by:
$$\hat{R}^{-1}_{\mathrm{ASMW}}(k,f)=\frac{1}{\beta_k}\left[\hat{R}^{-1}(k-1,f)-\frac{\hat{R}^{-1}(k-1,f)\,X(k,f)\,X^{H}(k,f)\,\hat{R}^{-1}(k-1,f)}{\beta_k+X^{H}(k,f)\,\hat{R}^{-1}(k-1,f)\,X(k,f)}\right]$$

where $\hat{R}(k,f)$ represents the target covariance matrix, $\hat{R}^{-1}(k,f)$ represents the inverse matrix of the target covariance matrix, ASMW denotes the adaptive Sherman-Morrison-Woodbury algorithm based on a forgetting factor, $f$ represents the frequency point, $\beta_k$ represents the forgetting factor corresponding to the k-th time frame, $\hat{R}^{-1}(k-1,f)$ represents the inverse matrix of the covariance matrix of the audio signal received by the microphone array at the (k-1)-th time frame, $X(k,f)$ represents the target frequency domain information, and $H$ represents the conjugate transpose of a matrix.
Optionally, k is a positive integer greater than or equal to 3; the forgetting factor corresponding to the k-th time frame is determined by the following formula:

$$\beta_k=\beta_{k-1}-\mu\left|\alpha_{k-1}-\alpha_{k-2}\right|$$

where $\beta_k$ represents the forgetting factor corresponding to the k-th time frame, $\beta_{k-1}$ represents the forgetting factor corresponding to the (k-1)-th time frame, $\mu$ is the preset iteration step size, and $\left|\alpha_{k-1}-\alpha_{k-2}\right|$ is the absolute value of the difference between the first angle and the second angle, wherein the first angle is the relative angle between the microphone array and the sound source of the audio signal received at the (k-1)-th time frame, and the second angle is the relative angle between the microphone array and the sound source of the audio signal received at the (k-2)-th time frame.
Optionally, the time delay matrix corresponding to the preset angle is obtained according to an inner product of a steering vector corresponding to the preset angle and a conjugate transpose of the steering vector.
Optionally, the spatial spectrum determination module comprises:
the vector determination submodule is used for obtaining a first vector according to the inverse matrix of the target covariance matrix, wherein elements in the first vector comprise the sum of elements on a main diagonal of the inverse matrix of the target covariance matrix and the sum of elements on a diagonal parallel to the main diagonal in an upper triangular matrix of the inverse matrix of the target covariance matrix;
the extraction submodule is used for extracting a first row of the time delay matrix corresponding to the preset angle to obtain a second vector;
and the spatial spectrum determining submodule is used for determining the spatial spectrum corresponding to the preset angle according to the inner product of the first vector and the second vector.
Optionally, the positioning module includes:
the fifth determining submodule is used for determining the maximum value of the spatial spectrum in the spatial spectrum matrix corresponding to each preset angle;
and the sixth determining submodule is used for taking the preset angle corresponding to the maximum value of the spatial spectrum as the target angle.
In a third aspect, the present disclosure provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method provided by the first aspect of the present disclosure.
In a fourth aspect, the present disclosure provides an electronic device comprising: a memory having a computer program stored thereon; a processor for executing the computer program in the memory to implement the steps of the method provided by the first aspect of the present disclosure.
According to the technical scheme, the spatial spectrum corresponding to each preset angle is determined according to the inverse matrix of the target covariance matrix and the time delay matrix corresponding to that angle, and the target angle is determined from the plurality of preset angles according to the spatial spectra, so that the sound source corresponding to the target audio signal is positioned according to the target angle. The inverse matrix of the target covariance matrix is a self-conjugate matrix, the time delay matrix corresponding to the preset angle is obtained by reconstructing the steering vector corresponding to the preset angle, and that time delay matrix is both a self-conjugate matrix and a Toeplitz matrix. According to the properties of self-conjugate and Toeplitz matrices, the computation is further simplified and the complexity of calculating the spatial spectrum corresponding to each preset angle is reduced, so that the complexity of sound source localization, and the power consumption of the microphone array for sound source localization, can be reduced. In addition, the efficiency of sound source localization is improved, and in interactive scenarios such as voice conferences and video conferences, the speech enhancement effect and the human-computer interaction experience of the microphone array can be improved.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
fig. 1 is a flow chart illustrating a sound source localization method according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating a method of determining an inverse of a target covariance matrix of a target audio signal from target frequency domain information according to an example embodiment.
FIG. 3 is a block diagram illustrating a sound source localization arrangement according to an exemplary embodiment.
FIG. 4 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
In related sound source localization algorithms, the traditional beamforming localization algorithm suffers from a wide main lobe and insufficient positioning precision. The time delay estimation localization algorithm has a low computation load, which favors fast implementation, but its correlation peak is not pronounced at low signal-to-noise ratios, its robustness is poor, it is limited by factors such as the signal sampling rate and the microphone array type, and its positioning precision is low. High-resolution spectrum estimation techniques such as the MUSIC algorithm achieve super-resolution estimation by constructing orthogonal signal and noise subspaces through eigenvalue decomposition, but the MUSIC algorithm requires certain prior knowledge of the signal and noise, and its computational complexity is high. The MVDR (Minimum Variance Distortionless Response) algorithm obtains a DOA estimate of the sound source by constructing the covariance matrix of the array frequency domain signal, computing its inverse, multiplying by the steering vector to obtain a spatial power spectrum, and searching the spatial spectrum for a peak; it maintains positioning accuracy and has good robustness, but it too has high complexity, with the computation concentrated in covariance matrix inversion, power spectrum calculation, and spectrum peak search. Therefore, in the related art, sound source localization incurs high computational complexity and low efficiency.
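The conventional MVDR pipeline summarized above (covariance matrix, matrix inversion, spatial power spectrum, spectrum peak search) can be sketched as follows. This is an illustrative demonstration only: the array geometry, source angle, snapshot count, and noise level are assumptions invented for the demo, not parameters from the disclosure.

```python
import numpy as np

M, d, c, f = 8, 0.04, 343.0, 2000.0      # mics, spacing (m), speed of sound, Hz
true_deg = 60.0                          # simulated source direction
rng = np.random.default_rng(1)

def steering(deg):
    tau = d * np.arange(M) * np.cos(np.deg2rad(deg)) / c
    return np.exp(-2j * np.pi * f * tau)

# Simulate N frequency-domain snapshots of one source plus sensor noise.
N = 200
S = rng.standard_normal(N) + 1j * rng.standard_normal(N)
X = np.outer(steering(true_deg), S)
X += 0.1 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))

R = (X @ X.conj().T) / N                 # covariance of the array signal
R_inv = np.linalg.inv(R)                 # the costly inversion step

grid = np.arange(0.0, 180.5, 1.0)        # candidate angles (degrees)
P = [1.0 / (steering(g).conj() @ R_inv @ steering(g)).real for g in grid]
est = grid[int(np.argmax(P))]            # spectrum peak -> DOA estimate
```

The per-angle quadratic form is exactly the $O(M^2)$ cost that the structured computation of the present disclosure aims to reduce.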
In view of this, the present disclosure provides a sound source positioning method, a sound source positioning device, a storage medium, and an electronic device, which can reduce the complexity of sound source positioning and improve the efficiency of sound source positioning.
The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
Fig. 1 is a flowchart illustrating a sound source localization method according to an exemplary embodiment, which may be applied to an electronic device having a processing capability, and as shown in fig. 1, may include S101 to S104.
In S101, target frequency domain information of a target audio signal received by the microphone array at the k-th time frame is determined.
The microphone array is formed by a plurality of microphones arranged according to a preset spatial topology; for example, the preset spatial topology may be such that the distance between any two adjacent microphones is equal. k is a positive integer. Each microphone of the microphone array can receive an audio signal, and the target audio signal received by the microphone array at the k-th time frame may refer to the combined signal of the audio signals received by all the microphones at the k-th time frame.
Optionally, an exemplary implementation of step S101 may be: determining, for each microphone in the microphone array, time domain information of the audio signal received by the microphone at the k-th time frame; and determining the target frequency domain information according to the time domain information of the audio signal received by each microphone at the k-th time frame.
The audio signal received by each microphone may be preprocessed, for example, by performing VAD (Voice Activity Detection), framing, and windowing, and the time domain information of the audio signal is Fourier transformed to obtain its frequency domain information. The frequency domain information of the audio signal received by the m-th microphone in the microphone array at the k-th time frame may be determined by the following expression (1):
$$X_m(k,f)=h_m(k,f)\,S(k,f)+W_m(k,f)\qquad(1)$$
where $f$ represents the frequency point, $X_m(k,f)$ represents the frequency domain information of the audio signal received by the m-th microphone at the k-th time frame, $S(k,f)$ represents the frequency domain information of the original sound source signal at the k-th time frame, $W_m(k,f)$ represents the noise at the k-th time frame, and $h_m(k,f)$ represents the propagation path function from the sound source to the m-th microphone at the k-th time frame, which can be determined by the following expression (2):
$$h_m(k,f)=e^{-j\,2\pi f\,d_m\cos\alpha/c}\qquad(2)$$

where $j$ represents the imaginary unit, $d_m$ represents the distance of the m-th microphone from the microphone array origin, $c$ represents the speed of sound, and $\alpha$ represents the target angle.
The target frequency domain information of the target audio signal received by the microphone array at the k-th time frame may be determined by the following expression (3):
$$X(k,f)=\left[X_1(k,f),\,X_2(k,f),\,\dots,\,X_M(k,f)\right]^{T}\qquad(3)$$

where $X(k,f)$ represents the target frequency domain information of the target audio signal received by the microphone array at the k-th time frame, $M$ represents the number of microphones in the microphone array, and $T$ represents matrix transposition.
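Expressions (1) through (3) amount to per-microphone framing, windowing, and FFT, followed by stacking the per-microphone spectra into the array vector $X(k,f)$. A minimal sketch of this step, assuming an illustrative frame length, window, and synthetic time-domain frames (none of these values are specified by the patent):

```python
import numpy as np

fs, L, M = 16000, 512, 4                 # sample rate, frame length, mic count (assumed)
rng = np.random.default_rng(2)
frames = rng.standard_normal((M, L))     # k-th time-domain frame for each microphone

window = np.hanning(L)                   # windowing before the Fourier transform
spectra = np.fft.rfft(frames * window, axis=1)   # X_m(k, f) for all frequency bins

def array_vector(f_bin):
    # Expression (3): X(k, f) = [X_1(k,f), ..., X_M(k,f)]^T at one frequency point.
    return spectra[:, f_bin]

X = array_vector(100)
```

Each column of `spectra` is one frequency point's array snapshot, which is exactly the quantity the covariance estimation in S102 consumes.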
In S102, an inverse matrix of a target covariance matrix of the target audio signal is determined according to the target frequency domain information.
Fig. 2 is a flowchart illustrating a method of determining an inverse matrix of a target covariance matrix of a target audio signal according to target frequency domain information according to an exemplary embodiment, and as shown in fig. 2, S102 may include S1021 to S1023.
In S1021, an initial covariance matrix of the target audio signal is determined according to the target frequency domain information.
For example, the initial covariance matrix of the target audio signal may be determined by the following equation (4):

$$R(k,f)=E\!\left[X(k,f)X^{H}(k,f)\right]\approx\frac{1}{N}\sum_{n=0}^{N-1}X(k-n,f)\,X^{H}(k-n,f)\qquad(4)$$

where $R(k,f)$ represents the initial covariance matrix of the target audio signal, $N$ represents the number of preset time frames within a preset duration, $n$ represents the n-th time frame, $X(k-n,f)$ represents the frequency domain information of the audio signal received by the microphone array at the (k-n)-th time frame, $E$ represents expectation, and $H$ represents the conjugate transpose of a matrix.
In S1022, the initial covariance matrix is reconstructed by using a diagonal loading technique, and a target covariance matrix is determined.
Reconstructing the initial covariance matrix $R(k,f)$ using the diagonal loading technique, the resulting target covariance matrix can be expressed by the following expression (5):

$$\hat{R}(k,f)=R(k,f)+\lambda I\qquad(5)$$

where $\hat{R}(k,f)$ represents the target covariance matrix, $\lambda$ represents a preset diagonal loading factor, and $I$ represents an identity matrix.
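Steps S1021 and S1022, i.e. equations (4) and (5), can be sketched as averaged outer products of the most recent array snapshots followed by diagonal loading. The snapshot count N and the loading factor λ below are illustrative choices, not values fixed by the patent:

```python
import numpy as np

M, N, lam = 4, 32, 1e-2                  # mics, averaged frames, loading factor (assumed)
rng = np.random.default_rng(3)
snapshots = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))

# Equation (4): sample average of X(k-n,f) X^H(k-n,f) over the last N frames.
R = sum(np.outer(x, x.conj()) for x in snapshots) / N

# Equation (5): diagonal loading guarantees a well-conditioned, invertible matrix.
R_loaded = R + lam * np.eye(M)
```

Diagonal loading keeps the subsequent inversion stable even when N is small relative to M or the snapshots are nearly coherent.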
In S1023, the inverse matrix of the target covariance matrix is determined according to the Sherman-Morrison-Woodbury algorithm, the target frequency domain information, the forgetting factor corresponding to the k-th time frame, and the inverse matrix of the covariance matrix of the audio signal received by the microphone array at the (k-1)-th time frame. k is a positive integer greater than or equal to 2.
In the present disclosure, considering that in practical situations the position of a sound source relative to the microphone array is not fixed and its relative orientation may change, an adaptive forgetting factor is introduced on the basis of the SMW (Sherman-Morrison-Woodbury) algorithm, and a generalized ASMW (Adaptive Sherman-Morrison-Woodbury) algorithm is proposed for calculating the inverse covariance matrix; the ASMW algorithm is an adaptive, forgetting-factor-based Sherman-Morrison-Woodbury algorithm, and the specific calculation of the SMW algorithm can be found in the related art. The forgetting factor corresponding to the k-th time frame characterizes the variation of the position, relative to the microphone array, of the sound source corresponding to the target audio signal with respect to the position of the sound source corresponding to the audio signal received at the (k-1)-th time frame.
Illustratively, the inverse of the target covariance matrix is determined by the following equation (6):

$$R_{ASMW}^{-1}(k,f) = \frac{1}{\lambda_k}\left[R^{-1}(k-1,f) - \frac{R^{-1}(k-1,f)\,X(k,f)\,X^H(k,f)\,R^{-1}(k-1,f)}{\lambda_k + X^H(k,f)\,R^{-1}(k-1,f)\,X(k,f)}\right] \qquad (6)$$

where R(k,f) represents the target covariance matrix, R_{ASMW}^{-1}(k,f) represents the inverse matrix of the target covariance matrix, ASMW denotes the Adaptive Sherman-Morrison-Woodbury algorithm, which is an adaptive Sherman-Morrison-Woodbury algorithm based on a forgetting factor, the inverse matrix of the target covariance matrix is a self-conjugate matrix, also called a Hermitian matrix, f represents the frequency point, λ_k represents the forgetting factor corresponding to the k-th time frame, R^{-1}(k-1,f) represents the inverse of the covariance matrix of the audio signal received by the microphone array in the (k−1)-th time frame, X(k,f) represents the target frequency domain information, and H represents the conjugate transpose of a matrix.
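Under the assumption that equation (6) corresponds to the recursion R(k,f) = λ_k·R(k−1,f) + X(k,f)X^H(k,f), the rank-one inverse update can be sketched in NumPy as follows (the function and variable names are illustrative, not from the patent):

```python
import numpy as np

def asmw_inverse_update(R_inv_prev, x, lam):
    """Sherman-Morrison rank-one update of an inverse covariance matrix
    with forgetting factor lam, following the shape of equation (6)."""
    Rx = R_inv_prev @ x                     # R^{-1}(k-1, f) X(k, f)
    denom = lam + np.vdot(x, Rx)            # lam + X^H R^{-1} X
    return (R_inv_prev - np.outer(Rx, Rx.conj()) / denom) / lam

# Sanity check against a direct inverse of lam*R + x x^H
rng = np.random.default_rng(0)
M = 4
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_prev = B @ B.conj().T + M * np.eye(M)     # Hermitian positive definite
x = rng.standard_normal(M) + 1j * rng.standard_normal(M)
lam = 0.95

R_inv = asmw_inverse_update(np.linalg.inv(R_prev), x, lam)
R_direct = np.linalg.inv(lam * R_prev + np.outer(x, x.conj()))
assert np.allclose(R_inv, R_direct)
```

The update costs O(M²) per frame instead of the O(M³) of a fresh matrix inversion, which is the practical motivation for applying the SMW identity here.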
Illustratively, the forgetting factor corresponding to the k-th time frame may be determined by the following formula (7):

$$\lambda_k = \lambda_{k-1} - \mu\,\bigl|\theta_{k-1} - \theta_{k-2}\bigr| \qquad (7)$$

where λ_k represents the forgetting factor corresponding to the k-th time frame, λ_{k−1} represents the forgetting factor corresponding to the (k−1)-th time frame, μ is a preset iteration step size, and |θ_{k−1} − θ_{k−2}| is the absolute value of the difference between the first angle and the second angle. k is a positive integer greater than or equal to 3; the first angle θ_{k−1} is the relative angle between the microphone array and the sound source of the audio signal received by the microphone array in the (k−1)-th time frame, and the second angle θ_{k−2} is the relative angle between the microphone array and the sound source of the audio signal received by the microphone array in the (k−2)-th time frame. Moreover, in the case where the azimuth of the sound source with respect to the microphone array does not change, i.e., |θ_{k−1} − θ_{k−2}| = 0, then λ_k = λ_{k−1}.
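A minimal sketch of this adaptation rule. The sign of the angle-difference term (a moving source lowers λ so that stale frames are forgotten faster) and the clipping bounds are assumptions for illustration, not values stated in the text:

```python
def update_forgetting_factor(lam_prev, theta_prev, theta_prev2, mu=0.01,
                             lam_min=0.8, lam_max=1.0):
    """Adapt the forgetting factor from the change in estimated DOA:
    a larger angle change between consecutive frames lowers lam, so the
    recursion in equation (6) discounts old frames more aggressively."""
    lam = lam_prev - mu * abs(theta_prev - theta_prev2)
    return min(max(lam, lam_min), lam_max)  # keep lam in a stable range

# A stationary source leaves the forgetting factor unchanged
assert update_forgetting_factor(0.95, 60.0, 60.0) == 0.95
# A source that moved 10 degrees between frames is forgotten faster
assert update_forgetting_factor(0.95, 70.0, 60.0) < 0.95
```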
In this way, considering that in a practical situation the position of the sound source relative to the microphone array is not fixed and its relative orientation may change, an adaptive forgetting factor is introduced and the inverse matrix of the target covariance matrix is calculated by the ASMW algorithm, which improves the robustness of the inverse covariance matrix calculation when the sound source moves.
In S103, for each preset angle in the plurality of preset angles, a spatial spectrum corresponding to the preset angle is determined according to the inverse matrix of the target covariance matrix and the time delay matrix corresponding to the preset angle.
The elements in the delay matrix corresponding to the preset angle comprise frequency domain information of relative delay between audio signals received by every two microphones in the microphone array, the inverse matrix of the target covariance matrix is a self-conjugate matrix, the delay matrix corresponding to the preset angle is obtained by reconstructing a steering vector corresponding to the preset angle, and the delay matrix corresponding to the preset angle is a self-conjugate matrix and a Toeplitz matrix.
The steering vector corresponding to the preset angle θ may first be determined by the following expression (8):

$$a(f,\theta) = \left[\,1,\ e^{-j2\pi f d_{12}\cos\theta/c},\ \ldots,\ e^{-j2\pi f d_{1M}\cos\theta/c}\,\right]^T \qquad (8)$$

where a(f,θ) represents the steering vector corresponding to the preset angle θ, M represents the number of microphones included in the microphone array, d_{1m} represents the relative distance between the 1st microphone and the m-th microphone, and c is the speed of sound.
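A sketch of this construction for a uniform linear array with inter-microphone spacing d. The array geometry, the cos θ convention, and the speed-of-sound constant are assumptions matching expression (8) as reconstructed above:

```python
import numpy as np

C_SOUND = 343.0  # speed of sound in m/s (assumed value)

def steering_vector(f, theta_deg, M, d):
    """Steering vector of an M-microphone uniform linear array:
    element m is exp(-j*2*pi*f * (m*d) * cos(theta) / c), with the
    first microphone taken as the zero-delay reference."""
    m = np.arange(M)
    tau = m * d * np.cos(np.radians(theta_deg)) / C_SOUND
    return np.exp(-1j * 2 * np.pi * f * tau)

a = steering_vector(f=1000.0, theta_deg=60.0, M=6, d=0.05)
assert a.shape == (6,)
assert np.isclose(a[0], 1.0)        # reference microphone
assert np.allclose(np.abs(a), 1.0)  # every element has unit modulus
```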
The method for reconstructing the steering vector corresponding to the preset angle to obtain the time delay matrix corresponding to the preset angle may be that the time delay matrix corresponding to the preset angle is obtained according to an inner product of the steering vector corresponding to the preset angle and a conjugate transpose of the steering vector.
Illustratively, the spatial spectrum corresponding to the preset angle θ is calculated using the following expression (9):

$$P_{ASMW\text{-}MVDR}(f,\theta) = \frac{1}{dot\left(R_{ASMW}^{-1}(k,f),\ A(f,\theta)\right)} \qquad (9)$$

where P_{ASMW-MVDR}(f,θ) represents the spatial spectrum corresponding to the preset angle θ, dot represents the inner product operation, and A(f,θ) represents the time delay matrix corresponding to the preset angle θ, which can be obtained by reconstructing the steering vector a(f,θ) corresponding to the preset angle θ. A(f,θ) can be determined by the following expression (10):

$$A(f,\theta) = \begin{bmatrix} 1 & e^{j2\pi f d_{12}\cos\theta/c} & \cdots & e^{j2\pi f d_{1M}\cos\theta/c} \\ e^{j2\pi f d_{21}\cos\theta/c} & 1 & \cdots & e^{j2\pi f d_{2M}\cos\theta/c} \\ \vdots & \vdots & \ddots & \vdots \\ e^{j2\pi f d_{M1}\cos\theta/c} & e^{j2\pi f d_{M2}\cos\theta/c} & \cdots & 1 \end{bmatrix} \qquad (10)$$

where A(f,θ) is an M×M self-conjugate matrix, d_{mn} represents the signed relative distance between the m-th microphone and the n-th microphone (d_{mn} = −d_{nm}), and d_{1M} and d_{M1} both represent the relative distance between the 1st microphone and the M-th microphone. For a microphone array arranged according to the preset spatial topology, the distance between any two adjacent microphones is equal, that is, the elements on each diagonal parallel to the main diagonal are equal, so A(f,θ) is also an M×M Toeplitz matrix; the delay matrix is therefore both a self-conjugate matrix and a Toeplitz matrix. Taking the element e^{j2\pi f d_{1m}\cos\theta/c} of A(f,θ) as an example, this element characterizes the frequency domain information of the relative time delay between the audio signals received by the 1st microphone and the m-th microphone in the microphone array if the relative angle between the sound source corresponding to the target audio signal and the microphone array is the preset angle θ.
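A sketch of the reconstruction A(f,θ) = a(f,θ)·a^H(f,θ) for a uniform linear array, verifying the two structural properties the method relies on (function and parameter names are illustrative):

```python
import numpy as np

def delay_matrix(f, theta_deg, M, d, c=343.0):
    """Delay matrix reconstructed as the outer product of the steering
    vector with its own conjugate transpose, per expression (10)."""
    m = np.arange(M)
    a = np.exp(-1j * 2 * np.pi * f * m * d * np.cos(np.radians(theta_deg)) / c)
    return np.outer(a, a.conj())

A = delay_matrix(f=1000.0, theta_deg=40.0, M=5, d=0.05)
# Self-conjugate (Hermitian): A equals its own conjugate transpose
assert np.allclose(A, A.conj().T)
# Toeplitz: every diagonal parallel to the main diagonal is constant,
# because element (m, n) depends only on the index difference m - n
for k in range(-4, 5):
    diag = np.diagonal(A, offset=k)
    assert np.allclose(diag, diag[0])
```

Both properties follow from the equal spacing of the array: the (m, n) element depends only on m − n, which is exactly what the first-row shortcut in expressions (11) and (12) exploits.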
Determining the spatial spectrum corresponding to the preset angle according to the inverse matrix of the target covariance matrix and the time delay matrix corresponding to the preset angle may include: obtaining a first vector according to the inverse matrix of the target covariance matrix, wherein elements in the first vector comprise the sum of elements on a main diagonal of the inverse matrix of the target covariance matrix and the sum of elements on a diagonal parallel to the main diagonal in an upper triangular matrix of the inverse matrix of the target covariance matrix; extracting a first row of a time delay matrix corresponding to the preset angle to obtain a second vector; and determining the space spectrum corresponding to the preset angle according to the inner product of the first vector and the second vector.
In the inner product operation of the matrices R_{ASMW}^{-1}(k,f) and A(f,θ), the lower triangular operation result S1 and the upper triangular operation result S2 satisfy S1 = S2*, so if the inner product is calculated by multiplying and summing the corresponding points of each element, the lower triangular part and the upper triangular part produce repeated operations. The inner product of the matrices R_{ASMW}^{-1}(k,f) and A(f,θ) is therefore equivalent to the inner product of a first vector and a second vector, where the elements on the main diagonal of R_{ASMW}^{-1}(k,f) are summed, and at the same time the elements on each diagonal parallel to the main diagonal in the upper triangular matrix of R_{ASMW}^{-1}(k,f) are summed, to obtain the first vector R_1(k,f). The second vector A_1(f,θ) is obtained by extracting the first row of the matrix A(f,θ), and the spatial spectrum corresponding to the preset angle θ can be obtained according to the inner product of the first vector and the second vector, where A_1(f,θ) is as shown in the following expression (11):

$$A_1(f,\theta) = \left[\,1,\ e^{j2\pi f d_{12}\cos\theta/c},\ \ldots,\ e^{j2\pi f d_{1M}\cos\theta/c}\,\right] \qquad (11)$$
Equation (9) may be equivalent to the following expression (12):

$$P_{ASMW\text{-}MVDR}(f,\theta) = \frac{1}{dot\left(R_1(k,f),\ A_1(f,\theta)\right)} \qquad (12)$$
Thus, in calculating the spatial spectrum corresponding to the preset angle θ, only the inner product of the vector R_1(k,f) and the vector A_1(f,θ) needs to be calculated, which greatly reduces the computational complexity while still obtaining an accurate result, improving the calculation efficiency. Compared with the traditional calculation mode, the number of multiplication operations is equivalent to 1/(2M+1) of the traditional one, and the number of addition operations is equivalent to 25% of the traditional one.
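The equivalence of expressions (9) and (12) can be checked numerically. In the sketch below (illustrative names; the scalar reduction over the collapsed diagonals is written out explicitly), the main-diagonal sum of R^{-1} pairs with the leading 1 of A_1, each superdiagonal sum pairs with one first-row element, and the conjugate lower triangle is recovered by taking twice the real part:

```python
import numpy as np

def spectrum_full(R_inv, a):
    """Conventional MVDR spatial spectrum: 1 / (a^H R^{-1} a)."""
    return 1.0 / np.real(a.conj() @ R_inv @ a)

def spectrum_fast(R_inv, A1):
    """Spectrum from the first vector (diagonal sums of R^{-1}) and the
    second vector (first row A1 of the Toeplitz delay matrix)."""
    M = R_inv.shape[0]
    R1 = np.array([np.trace(R_inv, offset=m) for m in range(M)])  # first vector
    denom = np.real(R1[0]) + 2.0 * np.real(np.sum(R1[1:] * A1[1:].conj()))
    return 1.0 / denom

rng = np.random.default_rng(1)
M, f, d, c = 6, 1000.0, 0.05, 343.0
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_inv = np.linalg.inv(B @ B.conj().T + M * np.eye(M))  # Hermitian inverse covariance
m = np.arange(M)
a = np.exp(-1j * 2 * np.pi * f * m * d * np.cos(np.radians(30.0)) / c)
A1 = np.outer(a, a.conj())[0]                          # first row of A(f, theta)

assert np.isclose(spectrum_full(R_inv, a), spectrum_fast(R_inv, A1))
```

The fast path needs only M complex multiplications per angle instead of a full matrix-vector product, which is where the complexity reduction claimed above comes from.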
In S104, a target angle is determined from the plurality of preset angles according to the spatial spectrum corresponding to each preset angle, and a sound source corresponding to the target audio signal is positioned according to the target angle.
A plurality of preset angles may be preset. The embodiment of determining the target angle from the plurality of preset angles may be: determining a spatial spectrum maximum value in a spatial spectrum matrix corresponding to each preset angle; and taking the preset angle corresponding to the maximum value of the spatial spectrum as the target angle. For example, if the spatial spectrum corresponding to the preset angle of 20 degrees is the maximum spatial spectrum, 20 degrees may be used as the target angle, which is the calculated DOA result of the sound source.
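An end-to-end sketch of this peak-picking step on simulated data. The array geometry, signal model, and all parameter values are illustrative assumptions: snapshots from a source at 60 degrees are used to build a diagonally loaded covariance matrix, the MVDR spatial spectrum is evaluated on a grid of preset angles, and the angle with the maximum spectrum is taken as the target angle:

```python
import numpy as np

rng = np.random.default_rng(7)
M, d, c, f = 6, 0.05, 343.0, 2000.0
theta_true = 60.0

def steering(theta_deg):
    m = np.arange(M)
    return np.exp(-1j * 2 * np.pi * f * m * d * np.cos(np.radians(theta_deg)) / c)

# Simulated snapshots: a source at theta_true plus sensor noise
T = 200
s = rng.standard_normal(T) + 1j * rng.standard_normal(T)
noise = 0.1 * (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T)))
X = np.outer(steering(theta_true), s) + noise

# Sample covariance with diagonal loading, then its inverse
R = X @ X.conj().T / T + 1e-3 * np.eye(M)
R_inv = np.linalg.inv(R)

# Scan the preset angles and take the spectrum maximum as the target angle
angles = np.arange(0.0, 181.0, 1.0)
P = np.array([1.0 / np.real(steering(t).conj() @ R_inv @ steering(t))
              for t in angles])
target_angle = angles[np.argmax(P)]
assert abs(target_angle - theta_true) <= 3.0
```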
According to the technical scheme, the spatial spectrum corresponding to the preset angle is determined according to the inverse matrix of the target covariance matrix and the time delay matrix corresponding to the preset angle, the target angle is determined from the plurality of preset angles according to the spatial spectrum corresponding to each preset angle, and therefore the sound source corresponding to the target audio signal is positioned according to the target angle. The inverse matrix of the target covariance matrix is a self-conjugate matrix, the time delay matrix corresponding to the preset angle is obtained by reconstructing a guide vector corresponding to the preset angle, the time delay matrix corresponding to the preset angle is a self-conjugate matrix and a Toeplitz matrix, and according to the properties of the self-conjugate matrix and the Toeplitz matrix, the calculation complexity is further simplified, the complexity of calculating the spatial spectrum corresponding to each preset angle is reduced, so that the complexity of sound source positioning can be reduced, and the power consumption of the microphone array for sound source positioning is reduced. In addition, the efficiency of sound source positioning is improved, and the voice enhancement effect and the man-machine interaction experience effect of the microphone array can be improved in interactive scenes such as voice conferences and video conferences.
Based on the same inventive concept, the present disclosure also provides a sound source localization apparatus, and fig. 3 is a block diagram of a sound source localization apparatus according to an exemplary embodiment, as shown in fig. 3, the apparatus 300 may include:
a frequency domain information determining module 301, configured to determine target frequency domain information of a target audio signal received by the microphone array in the k-th time frame, where the microphone array is composed of a plurality of microphones arranged according to a preset spatial topology, and k is a positive integer;
an inverse matrix determining module 302, configured to determine an inverse matrix of a target covariance matrix of the target audio signal according to the target frequency domain information;
a spatial spectrum determining module 303, configured to determine, for each preset angle in a plurality of preset angles, a spatial spectrum corresponding to the preset angle according to an inverse matrix of the target covariance matrix and a delay matrix corresponding to the preset angle, where an element in the delay matrix corresponding to the preset angle includes frequency domain information of relative delay between audio signals received by each two microphones in the microphone array, the inverse matrix of the target covariance matrix is a self-conjugate matrix, the delay matrix corresponding to the preset angle is obtained by reconstructing a steering vector corresponding to the preset angle, and the delay matrix corresponding to the preset angle is a self-conjugate matrix and is a toeplitz matrix;
a positioning module 304, configured to determine a target angle from the multiple preset angles according to the spatial spectrums corresponding to the preset angles, and position a sound source corresponding to the target audio signal according to the target angle.
Optionally, the frequency domain information determining module 301 includes:
a first determining sub-module, configured to determine, for each microphone in the microphone array, time domain information of the audio signal received by the microphone in the k-th time frame;
a second determining sub-module, configured to determine the target frequency domain information according to the time domain information of the audio signal received by each microphone in the k-th time frame.
Optionally, k is a positive integer greater than or equal to 2; the inverse matrix determination module 302 includes:
a third determining submodule, configured to determine an initial covariance matrix of the target audio signal according to the target frequency domain information;
the reconstruction submodule is used for reconstructing the initial covariance matrix by adopting a diagonal loading technology and determining the target covariance matrix;
a fourth determining submodule, configured to determine the inverse of the target covariance matrix according to the Sherman-Morrison-Woodbury algorithm, the target frequency domain information, the forgetting factor corresponding to the k-th time frame, and the inverse of the covariance matrix of the audio signal received by the microphone array in the (k−1)-th time frame, where the forgetting factor corresponding to the k-th time frame characterizes the variation of the position, relative to the microphone array, of the sound source corresponding to the target audio signal with respect to the position of the sound source corresponding to the audio signal received in the (k−1)-th time frame.
Optionally, the fourth determining submodule is configured to determine the inverse matrix of the target covariance matrix by:

$$R_{ASMW}^{-1}(k,f) = \frac{1}{\lambda_k}\left[R^{-1}(k-1,f) - \frac{R^{-1}(k-1,f)\,X(k,f)\,X^H(k,f)\,R^{-1}(k-1,f)}{\lambda_k + X^H(k,f)\,R^{-1}(k-1,f)\,X(k,f)}\right]$$

where R(k,f) represents the target covariance matrix, R_{ASMW}^{-1}(k,f) represents the inverse matrix of the target covariance matrix, ASMW denotes the Adaptive Sherman-Morrison-Woodbury algorithm, which is an adaptive Sherman-Morrison-Woodbury algorithm based on a forgetting factor, f represents the frequency point, λ_k represents the forgetting factor corresponding to the k-th time frame, R^{-1}(k-1,f) represents the inverse of the covariance matrix of the audio signal received by the microphone array in the (k−1)-th time frame, X(k,f) represents the target frequency domain information, and H represents the conjugate transpose of a matrix.
Optionally, k is a positive integer greater than or equal to 3, and the forgetting factor corresponding to the k-th time frame is determined by the following formula:

$$\lambda_k = \lambda_{k-1} - \mu\,\bigl|\theta_{k-1} - \theta_{k-2}\bigr|$$

where λ_k represents the forgetting factor corresponding to the k-th time frame, λ_{k−1} represents the forgetting factor corresponding to the (k−1)-th time frame, μ is a preset iteration step size, and |θ_{k−1} − θ_{k−2}| is the absolute value of the difference between the first angle and the second angle, where the first angle is the relative angle between the microphone array and the sound source of the audio signal received by the microphone array in the (k−1)-th time frame, and the second angle is the relative angle between the microphone array and the sound source of the audio signal received by the microphone array in the (k−2)-th time frame.
Optionally, the time delay matrix corresponding to the preset angle is obtained according to an inner product of a steering vector corresponding to the preset angle and a conjugate transpose of the steering vector.
Optionally, the spatial spectrum determination module 303 includes:
the vector determination submodule is used for obtaining a first vector according to the inverse matrix of the target covariance matrix, wherein elements in the first vector comprise the sum of elements on a main diagonal of the inverse matrix of the target covariance matrix and the sum of elements on a diagonal parallel to the main diagonal in an upper triangular matrix of the inverse matrix of the target covariance matrix;
the extraction submodule is used for extracting a first row of the time delay matrix corresponding to the preset angle to obtain a second vector;
and the spatial spectrum determining submodule is used for determining the spatial spectrum corresponding to the preset angle according to the inner product of the first vector and the second vector.
Optionally, the positioning module 304 includes:
the fifth determining submodule is used for determining the maximum value of the spatial spectrum in the spatial spectrum matrix corresponding to each preset angle;
and the sixth determining submodule is used for taking the preset angle corresponding to the maximum value of the spatial spectrum as the target angle.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 4 is a block diagram illustrating an electronic device 700 according to an example embodiment. As shown in fig. 4, the electronic device 700 may include: a processor 701 and a memory 702. The electronic device 700 may also include one or more of a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.
The processor 701 is configured to control the overall operation of the electronic device 700, so as to complete all or part of the steps in the sound source localization method. The memory 702 is used to store various types of data to support operation at the electronic device 700, such as instructions for any application or method operating on the electronic device 700 and application-related data, such as contact data, transmitted and received messages, pictures, audio, video, and the like. The memory 702 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia components 703 may include screen and audio components. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 702 or transmitted through the communication component 705. The audio assembly also includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules, such as a keyboard, mouse, buttons, etc. These buttons may be virtual buttons or physical buttons. The communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. Wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, 5G, NB-IOT, eMTC, or a combination of one or more of them, which is not limited herein.
The corresponding communication component 705 may thus include: Wi-Fi module, Bluetooth module, NFC module, etc.
In an exemplary embodiment, the electronic Device 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the sound source localization method described above.
In another exemplary embodiment, a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the sound source localization method described above is also provided. For example, the computer readable storage medium may be the above-mentioned memory 702 comprising program instructions executable by the processor 701 of the electronic device 700 to perform the above-mentioned sound source localization method.
The preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solution of the present disclosure within the technical idea of the present disclosure, and these simple modifications all belong to the protection scope of the present disclosure.
It should be noted that, in the foregoing embodiments, various features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various combinations that are possible in the present disclosure are not described again.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims (11)

1. A sound source localization method, characterized in that the method comprises:
determining target frequency domain information of a target audio signal received by the microphone array in the k-th time frame, wherein the microphone array is composed of a plurality of microphones arranged according to a preset spatial topology, and k is a positive integer;
determining an inverse matrix of a target covariance matrix of the target audio signal according to the target frequency domain information;
determining a spatial spectrum corresponding to a preset angle according to an inverse matrix of a target covariance matrix and a time delay matrix corresponding to the preset angle for each preset angle in a plurality of preset angles, wherein elements in the time delay matrix corresponding to the preset angle comprise frequency domain information of relative time delay between audio signals received by every two microphones in a microphone array, the inverse matrix of the target covariance matrix is a self-conjugate matrix, the time delay matrix corresponding to the preset angle is obtained by reconstructing a steering vector corresponding to the preset angle, and the time delay matrix corresponding to the preset angle is a self-conjugate matrix and is a Toeplitz matrix;
and determining a target angle from the plurality of preset angles according to the spatial spectrum corresponding to each preset angle, and positioning a sound source corresponding to the target audio signal according to the target angle.
2. The method of claim 1, wherein the determining target frequency domain information of a target audio signal received by the microphone array in the k-th time frame comprises:
determining, for each microphone in the microphone array, time domain information of the audio signal received by the microphone in the k-th time frame;
determining the target frequency domain information according to the time domain information of the audio signal received by each microphone in the k-th time frame.
3. The method of claim 1, wherein k is a positive integer greater than or equal to 2, and the determining an inverse matrix of a target covariance matrix of the target audio signal according to the target frequency domain information includes:
determining an initial covariance matrix of the target audio signal according to the target frequency domain information;
reconstructing the initial covariance matrix by adopting a diagonal loading technology to determine the target covariance matrix;
determining the inverse of the target covariance matrix according to the Sherman-Morrison-Woodbury algorithm, the target frequency domain information, the forgetting factor corresponding to the k-th time frame, and the inverse of the covariance matrix of the audio signal received by the microphone array in the (k−1)-th time frame, wherein the forgetting factor corresponding to the k-th time frame characterizes the variation of the position, relative to the microphone array, of the sound source corresponding to the target audio signal with respect to the position of the sound source corresponding to the audio signal received in the (k−1)-th time frame.
4. The method according to claim 3, wherein the determining the inverse of the target covariance matrix according to the Sherman-Morrison-Woodbury algorithm, the target frequency domain information, the forgetting factor corresponding to the k-th time frame, and the inverse of the covariance matrix of the audio signal received by the microphone array in the (k−1)-th time frame comprises:
determining the inverse of the target covariance matrix by:

$$R_{ASMW}^{-1}(k,f) = \frac{1}{\lambda_k}\left[R^{-1}(k-1,f) - \frac{R^{-1}(k-1,f)\,X(k,f)\,X^H(k,f)\,R^{-1}(k-1,f)}{\lambda_k + X^H(k,f)\,R^{-1}(k-1,f)\,X(k,f)}\right]$$

where R(k,f) represents the target covariance matrix, R_{ASMW}^{-1}(k,f) represents the inverse matrix of the target covariance matrix, ASMW denotes the Adaptive Sherman-Morrison-Woodbury algorithm, which is an adaptive Sherman-Morrison-Woodbury algorithm based on a forgetting factor, f represents the frequency point, λ_k represents the forgetting factor corresponding to the k-th time frame, R^{-1}(k-1,f) represents the inverse of the covariance matrix of the audio signal received by the microphone array in the (k−1)-th time frame, X(k,f) represents the target frequency domain information, and H represents the conjugate transpose of a matrix.
5. The method of claim 3, wherein k is a positive integer greater than or equal to 3, and the forgetting factor corresponding to the k-th time frame is determined by the following formula:

$$\lambda_k = \lambda_{k-1} - \mu\,\bigl|\theta_{k-1} - \theta_{k-2}\bigr|$$

where λ_k represents the forgetting factor corresponding to the k-th time frame, λ_{k−1} represents the forgetting factor corresponding to the (k−1)-th time frame, μ is a preset iteration step size, and |θ_{k−1} − θ_{k−2}| is the absolute value of the difference between the first angle and the second angle, wherein the first angle is the relative angle between the microphone array and the sound source of the audio signal received by the microphone array in the (k−1)-th time frame, and the second angle is the relative angle between the microphone array and the sound source of the audio signal received by the microphone array in the (k−2)-th time frame.
6. The method of claim 1, wherein the delay matrix corresponding to the predetermined angle is obtained according to an inner product of a steering vector corresponding to the predetermined angle and a conjugate transpose of the steering vector.
7. The method according to claim 1, wherein the determining the spatial spectrum corresponding to the preset angle according to the inverse matrix of the target covariance matrix and the time delay matrix corresponding to the preset angle comprises:
obtaining a first vector according to the inverse matrix of the target covariance matrix, wherein elements in the first vector comprise the sum of elements on a main diagonal of the inverse matrix of the target covariance matrix and the sum of elements on a diagonal parallel to the main diagonal in an upper triangular matrix of the inverse matrix of the target covariance matrix;
extracting a first row of a time delay matrix corresponding to the preset angle to obtain a second vector;
and determining the space spectrum corresponding to the preset angle according to the inner product of the first vector and the second vector.
8. The method according to claim 1, wherein the determining the target angle from the plurality of preset angles according to the spatial spectrum corresponding to each preset angle comprises:
determining a spatial spectrum maximum value in a spatial spectrum matrix corresponding to each preset angle;
and taking the preset angle corresponding to the maximum value of the spatial spectrum as the target angle.
9. A sound source localization apparatus, characterized in that the apparatus comprises:
a frequency domain information determination module, configured to determine target frequency domain information of a target audio signal received by the microphone array in the k-th time frame, wherein the microphone array is composed of a plurality of microphones arranged according to a preset spatial topology, and k is a positive integer;
an inverse matrix determining module, configured to determine an inverse matrix of a target covariance matrix of the target audio signal according to the target frequency domain information;
a spatial spectrum determining module, configured to determine, for each preset angle in a plurality of preset angles, a spatial spectrum corresponding to the preset angle according to an inverse matrix of the target covariance matrix and a delay matrix corresponding to the preset angle, where an element in the delay matrix corresponding to the preset angle includes frequency domain information of relative delay between audio signals received by each two microphones in the microphone array, the inverse matrix of the target covariance matrix is a self-conjugate matrix, the delay matrix corresponding to the preset angle is obtained by reconstructing a steering vector corresponding to the preset angle, and the delay matrix corresponding to the preset angle is a self-conjugate matrix and is a toeplitz matrix;
and the positioning module is used for determining a target angle from the plurality of preset angles according to the spatial spectrum corresponding to each preset angle and positioning a sound source corresponding to the target audio signal according to the target angle.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
11. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 8.
CN202110369681.9A 2021-04-07 2021-04-07 Sound source positioning method, sound source positioning device, storage medium and electronic equipment Active CN112799017B (en)

Publications (2)

Publication Number Publication Date
CN112799017A CN112799017A (en) 2021-05-14
CN112799017B true CN112799017B (en) 2021-07-09


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113687304A (en) * 2021-07-07 2021-11-23 浙江大华技术股份有限公司 Direct sound detection method, system and computer readable storage medium
CN113687305A (en) * 2021-07-26 2021-11-23 浙江大华技术股份有限公司 Method, device and equipment for positioning sound source azimuth and computer readable storage medium
CN113689869A (en) * 2021-07-26 2021-11-23 浙江大华技术股份有限公司 Speech enhancement method, electronic device, and computer-readable storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104199029B (en) * 2014-09-03 2017-01-18 西安电子科技大学 Measurement matrix design method for improving target imaging performance of compressed sensing radar
CN107121669B (en) * 2016-02-25 2021-08-20 松下电器(美国)知识产权公司 Sound source detection device, sound source detection method, and non-transitory recording medium
CN110491403B (en) * 2018-11-30 2022-03-04 腾讯科技(深圳)有限公司 Audio signal processing method, device, medium and audio interaction equipment
CN109655783B (en) * 2018-12-26 2023-07-21 西安云脉智能技术有限公司 Method for estimating incoming wave direction of sensor array
CN109633538B (en) * 2019-01-22 2022-12-02 西安电子科技大学 Maximum likelihood time difference estimation method of non-uniform sampling system
CN110554357B (en) * 2019-09-12 2022-01-18 思必驰科技股份有限公司 Sound source positioning method and device
CN110501682B (en) * 2019-09-29 2021-07-27 北京润科通用技术有限公司 Method for measuring target azimuth angle by vehicle-mounted radar and vehicle-mounted radar

Also Published As

Publication number Publication date
CN112799017A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN112799017B (en) Sound source positioning method, sound source positioning device, storage medium and electronic equipment
US10123113B2 (en) Selective audio source enhancement
Erdogan et al. Improved MVDR beamforming using single-channel mask prediction networks
US11064294B1 (en) Multiple-source tracking and voice activity detections for planar microphone arrays
US9042573B2 (en) Processing signals
CN110610718B (en) Method and device for extracting expected sound source voice signal
KR20150115779A (en) Method and apparatus for determining directions of uncorrelated sound sources in a higher order ambisonics representation of a sound field
Ono Fast stereo independent vector analysis and its implementation on mobile phone
Brendel et al. Distributed source localization in acoustic sensor networks using the coherent-to-diffuse power ratio
JP2013536477A (en) Apparatus and method for resolving ambiguity from direction of arrival estimates
WO2016119388A1 (en) Method and device for constructing focus covariance matrix on the basis of voice signal
Ikeshita et al. Blind signal dereverberation based on mixture of weighted prediction error models
Luo et al. Implicit filter-and-sum network for multi-channel speech separation
Pan et al. On the design of target beampatterns for differential microphone arrays
US11902757B2 (en) Techniques for unified acoustic echo suppression using a recurrent neural network
Belloch et al. Real-time sound source localization on an embedded GPU using a spherical microphone array
CN113223552B (en) Speech enhancement method, device, apparatus, storage medium, and program
Loesch et al. On the robustness of the multidimensional state coherence transform for solving the permutation problem of frequency-domain ICA
Čmejla et al. Independent vector analysis exploiting pre-learned banks of relative transfer functions for assumed target’s positions
Chen et al. Sound source DOA estimation and localization in noisy reverberant environments using least-squares support vector machines
Wang et al. Speech separation and extraction by combining superdirective beamforming and blind source separation
Hioka et al. Estimating power spectral density for spatial audio signal separation: An effective approach for practical applications
JP7270869B2 (en) Information processing device, output method, and output program
Wakabayashi et al. Sound field interpolation for rotation-invariant multichannel array signal processing
Wang et al. Low-latency real-time independent vector analysis using convolutive transfer function

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant