CN111880148A - Sound source positioning method, device, equipment and storage medium - Google Patents

Publication number
CN111880148A
Authority
CN
China
Prior art keywords
sound source
directions
microphones
generalized cross
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010790574.9A
Other languages
Chinese (zh)
Inventor
王备
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202010790574.9A priority Critical patent/CN111880148A/en
Publication of CN111880148A publication Critical patent/CN111880148A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/28Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves by co-ordinating position lines of different shape, e.g. hyperbolic, circular, elliptical or radial

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)

Abstract

The application relates to a sound source positioning method, device, equipment and storage medium. The method comprises the following steps: an audio acquisition device acquires an audio signal by using an arranged microphone array, wherein the microphone array comprises a plurality of microphones that are respectively arranged in different directions of the audio acquisition device; the audio acquisition device determines a frequency domain signal of the audio signal; the audio acquisition device calculates target generalized cross-correlation values in multiple directions based on frequency information of the frequency domain signals corresponding to the microphones; and the audio acquisition device determines a sound source direction of the audio signal from the multiple directions based on the delay characteristics characterized by the target generalized cross-correlation values in the multiple directions. In this way, a sound source can be positioned quickly based on the generalized cross-correlation values, even when multiple sound sources are present, which fundamentally solves the problems that the prior art is slow in response and cannot support multi-source scenes.

Description

Sound source positioning method, device, equipment and storage medium
Technical Field
The present application relates to audio processing technologies, and in particular, to a sound source localization method, apparatus, device, and storage medium.
Background
At present, a microphone array used in a teleconference scene usually relies on energy estimation for audio transmission: using microphone array beamforming, the signal of the fixed beam with the maximum acquired energy is selected, from a plurality of preset fixed beams in different directions, as the target signal for transmission. Although this method is simple to implement, it has the following disadvantages: first, sound source localization cannot be realized; second, the response is slow, and words are easily lost at the moment a new sound source appears, because energy estimation needs a certain accumulation time and cannot react instantly; third, when multiple sound sources are present simultaneously, the relatively weak sound source is ignored.
Disclosure of Invention
In order to solve the above problems, the invention provides a sound source positioning method, device, equipment and storage medium, which can quickly position a sound source based on generalized cross-correlation values, even in the case of multiple sound sources, and thereby fundamentally solve the problems that the prior art is slow in response and cannot support multi-source scenes.
In a first aspect, an embodiment of the present application provides a sound source localization method, including:
the audio acquisition equipment acquires audio signals by utilizing an arranged microphone array, wherein the microphone array comprises a plurality of microphones which are respectively arranged in different directions of the audio acquisition equipment and are used for acquiring the audio signals from different directions;
the audio acquisition equipment determines a frequency domain signal of the audio signal;
the audio acquisition equipment calculates target generalized cross-correlation values in multiple directions based on frequency information of the frequency domain signals corresponding to the microphones, wherein the target generalized cross-correlation values in any one direction of the multiple directions are used for representing delay characteristics of frequency information reaching a pair of microphones in the microphone array;
the audio acquisition device determines a sound source direction of the audio signal from the plurality of directions based on the delay characteristics characterized by the target generalized cross-correlation values in the plurality of directions.
In a specific example of the present application, the method further includes:
combining any two microphones in the microphone array to obtain N pairs of microphones, wherein
$$N = \binom{M}{2} = \frac{M(M-1)}{2},$$
and M is the number of microphones in the microphone array.
In a specific example of the present application, the calculating, by the audio acquisition device, target generalized cross-correlation values in multiple directions based on frequency information of the frequency domain signal corresponding to the microphone includes:
calculating generalized cross-correlation values corresponding to each pair of microphones in the microphone array for one direction based on each frequency information of the frequency domain signal corresponding to the microphone;
and obtaining a target generalized cross-correlation value in one direction based on all the generalized cross-correlation values in one direction so as to obtain target generalized cross-correlation values in multiple directions.
In a specific example of the present application, the determining, by the audio acquisition device, a sound source direction of the audio signal from the plurality of directions based on the delay features represented by the target generalized cross-correlation values in the plurality of directions includes:
and taking the direction corresponding to the maximum value in the target generalized cross-correlation values as the sound source direction of the audio signal.
In a specific example of the present application, the determining, by the audio acquisition device, a sound source direction of the audio signal from the plurality of directions based on the delay features represented by the target generalized cross-correlation values in the plurality of directions includes:
selecting a suspected sound source direction of the audio signal from a plurality of directions based on the delay characteristics represented by the target generalized cross-correlation values;
determining a plurality of adjacent directions corresponding to the suspected sound source direction;
and calculating to obtain target generalized cross-correlation values of the multiple adjacent directions, and determining the sound source direction of the audio signal from the multiple adjacent directions.
In a specific example of the present application, the determining a sound source direction of the audio signal from the plurality of adjacent directions includes:
and taking the direction corresponding to the maximum value in the target generalized cross-correlation values of the multiple adjacent directions as the sound source direction of the audio signal.
In a second aspect, an embodiment of the present application provides a sound source localization apparatus, including:
an acquisition unit, configured to acquire audio signals by using an arranged microphone array, wherein the microphone array comprises a plurality of microphones that are respectively arranged in different directions of the audio acquisition device and are used to acquire audio signals from different directions;
a signal conversion unit for determining a frequency domain signal of the audio signal;
a calculating unit, configured to calculate target generalized cross-correlation values in multiple directions based on frequency information of the frequency-domain signal corresponding to the microphone, where the target generalized cross-correlation value in any one of the multiple directions is used to characterize delay characteristics of frequency information reaching a pair of microphones in the microphone array;
a positioning unit, configured to determine a sound source direction of the audio signal from the multiple directions based on the delay characteristics represented by the target generalized cross-correlation values in the multiple directions.
In a specific example of the present application, the calculating unit is further configured to:
combining any two microphones in the microphone array to obtain N pairs of microphones, wherein
$$N = \binom{M}{2} = \frac{M(M-1)}{2},$$
and M is the number of microphones in the microphone array.
In a specific example of the present application, the calculating unit is further configured to:
calculating generalized cross-correlation values corresponding to each pair of microphones in the microphone array for one direction based on each frequency information of the frequency domain signal corresponding to the microphone;
and obtaining a target generalized cross-correlation value in one direction based on all the generalized cross-correlation values in one direction so as to obtain target generalized cross-correlation values in multiple directions.
In a specific example of the present application, the positioning unit is further configured to use a direction corresponding to a maximum value in the target generalized cross-correlation value as a sound source direction of the audio signal.
In a specific example of the present application, the positioning unit is further configured to:
selecting a suspected sound source direction of the audio signal from a plurality of directions based on the delay characteristics represented by the target generalized cross-correlation values;
determining a plurality of adjacent directions corresponding to the suspected sound source direction;
and calculating to obtain target generalized cross-correlation values of the multiple adjacent directions, and determining the sound source direction of the audio signal from the multiple adjacent directions.
In a specific example of the present application, the positioning unit is further configured to:
and taking the direction corresponding to the maximum value in the target generalized cross-correlation values of the multiple adjacent directions as the sound source direction of the audio signal.
In a third aspect, an embodiment of the present application provides a sound source localization device, including:
one or more processors;
a memory communicatively coupled to the one or more processors;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the method described above.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the method described above.
Therefore, according to the scheme of the application, the microphones in the microphone array collect audio signals from different directions. Because the audio signal reaches the two microphones of a microphone pair (namely a pair formed by any two microphones in the microphone array) with different delays, the generalized cross-correlation values in multiple directions (namely the target generalized cross-correlation values) can be calculated quickly from the frequency information corresponding to the audio signals, thereby completing sound source positioning. Moreover, the scheme also applies to multiple sound sources, where it still positions quickly without word loss, thereby fundamentally solving the problems that the prior art is slow in response and cannot support multi-source scenes.
Drawings
FIG. 1 is a schematic flow chart of a sound source localization method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a sound source localization method according to an embodiment of the present application in a specific application scenario;
FIG. 3 is a schematic diagram of a two-step positioning method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a sound source localization apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a sound source localization device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In some of the flows described in the specification and claims of the present application and in the above-described figures, a number of operations are included that occur in a particular order, but it should be clearly understood that the flows may include more or less operations, and that the operations may be performed sequentially or in parallel.
The embodiment of the application provides a sound source positioning method, a sound source positioning device, sound source positioning equipment and a storage medium; specifically, fig. 1 is a schematic flow chart of an implementation of a sound source localization method according to an embodiment of the present invention, and as shown in fig. 1, the method includes:
step 101: the audio acquisition equipment acquires audio signals by utilizing the arranged microphone array, wherein the microphone array comprises a plurality of microphones which are respectively arranged in different directions of the audio acquisition equipment and used for acquiring the audio signals from different directions. For example, the microphone array includes M microphones, and each of the microphones is disposed in a different direction of the audio acquisition device to acquire an audio signal from the different direction.
In this scheme, the microphone array is an omnidirectional microphone array in which the microphones are arranged in different directions of the audio acquisition device, so that audio signals from all directions can be acquired; in addition, because an audio signal reaches microphones at different positions with different delays, this lays a foundation for sound source positioning.
Step 102: the audio acquisition device determines a frequency domain signal of the audio signal; for example, the audio acquisition device performs a short-time Fourier transform on the audio signals acquired by all the microphones to obtain the frequency domain signals of the audio signal.
In practical applications, because the microphones are disposed in different directions, the audio signals that different microphones collect from the same sound source differ. The audio acquisition device therefore performs a short-time Fourier transform on the audio signals acquired by all the microphones to obtain the frequency domain signal corresponding to the audio signal acquired by each microphone.
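As a rough illustration of this step, the sketch below applies a Hann window and a direct DFT to one frame per microphone, using only the Python standard library. The names `hann`, `frame_spectrum` and `stft_all_mics`, the choice of window and the frame length are assumptions made for illustration; the text itself only states that a short-time Fourier transform is applied to the signal of every microphone.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (window choice is an assumption)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def frame_spectrum(frame):
    """Windowed DFT of one real-valued frame; returns bins 0 .. n // 2."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hann(n))]
    return [
        sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def stft_all_mics(frames):
    """One spectrum per microphone for the current frame."""
    return [frame_spectrum(frame) for frame in frames]
```

A direct DFT is used here only to keep the sketch dependency-free; a real implementation would normally use an FFT routine.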
Step 103: the audio acquisition equipment calculates target generalized cross-correlation values in multiple directions based on frequency information of the frequency domain signals corresponding to the microphones, wherein the target generalized cross-correlation values in any one direction of the multiple directions are used for representing delay characteristics of frequency information reaching a pair of microphones in the microphone array.
In a specific example, a pair of microphones (also referred to as a microphone pair) may be obtained as follows: any two microphones in the microphone array are combined to obtain N microphone pairs, where
$$N = \binom{M}{2} = \frac{M(M-1)}{2}$$
and M is the number of microphones in the microphone array. That is, the microphones are combined two by two to obtain the microphone pairs, which lays a foundation for sound source positioning.
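The pairing can be sketched in a few lines. The function name `microphone_pairs` and the zero-based microphone indices are illustrative assumptions; the only fact taken from the text is that every two microphones form one pair.

```python
from itertools import combinations


def microphone_pairs(num_mics):
    """All unordered pairs (a, b) with a < b of microphone indices 0 .. num_mics - 1."""
    return list(combinations(range(num_mics), 2))


pairs = microphone_pairs(6)
print(len(pairs))  # 15, i.e. 6 * 5 / 2
```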
Step 104: the audio acquisition device determines a sound source direction of the audio signal from the plurality of directions based on the delay characteristics characterized by the target generalized cross-correlation values in the plurality of directions.
In practical application, the audio signal can be an audio signal corresponding to one sound source, and can also be an audio signal with a plurality of sound sources mixed, that is, the scheme of the application can be used for positioning one sound source, and meanwhile, multi-sound-source positioning can also be realized. Here, when the audio signal is an audio signal in which a plurality of sound sources are mixed, the determined sound source directions are also a plurality of directions.
Here, the directions are a plurality of directions set in advance, and thus, sound source localization is accomplished by calculating a target generalized cross-correlation value in the directions set in advance.
In a specific example, the target generalized cross-correlation values may be obtained as follows: for one direction, a generalized cross-correlation value is calculated for each pair of microphones in the microphone array and each piece of frequency information of the frequency domain signals corresponding to the microphones; the target generalized cross-correlation value in that direction is then obtained from all the generalized cross-correlation values for that direction, and repeating this for every direction yields the target generalized cross-correlation values in multiple directions. In other words, one piece of frequency information and one microphone pair yield one generalized cross-correlation value; because the frequency domain signal contains multiple pieces of frequency information and there are multiple microphone pairs, multiple generalized cross-correlation values are obtained for each direction. For example, the target generalized cross-correlation value in a direction is obtained by adding all the generalized cross-correlation values for that direction.
In practical application, after the target generalized cross-correlation values in each direction are determined, the direction corresponding to the maximum value in the target generalized cross-correlation values is taken as the sound source direction of the audio signal, so that sound source positioning is realized.
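The summation and the selection of the maximum can be sketched as follows. The nested-list layout `gcc_by_direction[d][p][k]` (direction, microphone pair, frequency bin) and the function names are assumptions for illustration, and the numbers are made-up values rather than measured data.

```python
def target_gcc(gcc_values):
    """Sum the GCC values of one direction over all pairs and all frequency bins."""
    return sum(sum(per_pair) for per_pair in gcc_values)


def pick_direction(gcc_by_direction):
    """Return (index of the direction with the largest target GCC, all target GCCs)."""
    targets = [target_gcc(g) for g in gcc_by_direction]
    best = max(range(len(targets)), key=targets.__getitem__)
    return best, targets


# Three directions, two microphone pairs, two frequency bins (illustrative numbers).
example = [
    [[0.1, 0.2], [0.0, 0.1]],
    [[0.9, 0.8], [0.7, 0.9]],
    [[-0.2, 0.1], [0.3, 0.0]],
]
best, targets = pick_direction(example)
```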
In a specific example, to reduce the amount of computation, a two-step method may be adopted to obtain the sound source direction of the audio signal. First, in coarse positioning, a suspected sound source direction of the audio signal is selected from multiple directions based on the delay characteristics represented by the target generalized cross-correlation values; for simplicity, the coarse directions may be the acquisition directions directly facing the microphones. Then, in fine positioning, a plurality of adjacent directions corresponding to the suspected sound source direction are determined, for example by selecting them within plus or minus a preset number of degrees of the suspected sound source direction; the target generalized cross-correlation values of these adjacent directions are calculated in the same manner, and the sound source direction of the audio signal is determined from the adjacent directions based on those values. In this way, fast positioning is achieved with a reduced amount of computation. In a specific example, the direction corresponding to the maximum of the target generalized cross-correlation values of the adjacent directions may be taken as the sound source direction of the audio signal.
In this scene, after the sound source is positioned, audio is acquired from the positioned sound source direction based on the positioning result, and the acquired signals are subjected to aliasing and then transmitted, which completes audio acquisition and transmission in the conference call scene. Moreover, because the scheme of the application positions quickly, problems such as word loss can be avoided, and when the direction of the sound source changes, the sound source can still be positioned quickly, thereby fundamentally solving the problems that the prior art is slow in response and cannot support a multi-source conference scene. Furthermore, because only the selected audio is transmitted in the scheme of the application, call quality in the call scene can be ensured, which lays a foundation for improving the user experience.
Therefore, according to the scheme of the application, the microphones in the microphone array collect audio signals from different directions. Because the audio signal reaches the two microphones of a microphone pair (namely a pair formed by any two microphones in the microphone array) with different delays, the generalized cross-correlation values in multiple directions (namely the target generalized cross-correlation values) can be calculated quickly from the frequency information corresponding to the audio signals, thereby completing sound source positioning. Moreover, the scheme also applies to multiple sound sources, where it still positions quickly without word loss, thereby fundamentally solving the problems that the prior art is slow in response and cannot support multi-source scenes.
The present application is described in further detail below with reference to a specific example. This example takes a conference call scene and uses a uniform annular microphone array (in which all microphones are placed on a circle at equal intervals) to achieve fast localization of a sound source, thereby fundamentally solving the problems that the prior art is slow in response and cannot support multi-source scenes. Here, Sound Source Localization (SSL) in this example refers to fast sound source localization using the phase information of the signals collected by the microphones.
As shown in fig. 2, the fast sound source localization procedure based on the phase information is as follows:
obtaining a frame of time domain signals for all microphones in the array of microphones;
for all microphones, transforming the acquired Time domain signal into a frequency domain signal by Short Time Fourier Transform (STFT);
for all microphones, calculating the phase of the frequency domain signal to obtain frequency information corresponding to the frequency domain signal;
pairing all the microphones in the microphone array to obtain
$$\binom{M}{2} = \frac{M(M-1)}{2}$$
microphone pairs, where M is the number of microphones. For example, with 6 microphones, 15 microphone pairs are obtained;
presetting D target directions, where D is a positive integer greater than or equal to 2; for each target direction, calculating a generalized cross-correlation (GCC) value from one microphone pair and one piece of the frequency information obtained for the microphone signals, repeating the calculation in the same manner for all frequencies and all microphone pairs to obtain a plurality of generalized cross-correlation values for that target direction, and summing them to obtain the output value (that is, the target generalized cross-correlation value) in the target direction (indicated by a bold arrow in the figure);
for the D directions, D output values are obtained; the maximum of the D output values is found, and the corresponding target direction is the sound source direction (Direction of Arrival, DOA).
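The scan over the D preset directions described in the steps above can be sketched for a uniform circular array as follows. Everything specific here is an assumption for illustration: the function names, the speed of sound, the array radius, the far-field plane-wave delay model, and the use of cos(phase difference + ωτ) as the per-frequency GCC term with τ taken as the arrival time at the first microphone of a pair minus the arrival time at the second. The phases are synthesized for a source at 120° rather than measured.

```python
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def ring_positions(num_mics, radius):
    """(x, y) of microphones spaced equally on a circle; microphone 0 at 0 degrees."""
    return [
        (radius * math.cos(2 * math.pi * m / num_mics),
         radius * math.sin(2 * math.pi * m / num_mics))
        for m in range(num_mics)
    ]


def arrival_times(positions, azimuth):
    """Far-field arrival time at each microphone relative to the array centre."""
    ux, uy = math.cos(azimuth), math.sin(azimuth)  # unit vector toward the source
    return [-(x * ux + y * uy) / SPEED_OF_SOUND for x, y in positions]


def scan_directions(phases, positions, freqs_hz, num_dirs):
    """phases[m][k] is the phase of microphone m at frequency freqs_hz[k].

    Returns (index of the best direction, list of the D summed GCC outputs).
    """
    pairs = list(combinations(range(len(positions)), 2))
    outputs = []
    for d in range(num_dirs):
        times = arrival_times(positions, 2 * math.pi * d / num_dirs)
        total = 0.0
        for a, b in pairs:
            tau = times[a] - times[b]  # TDOA of this pair for direction d
            for k, f in enumerate(freqs_hz):
                total += math.cos(phases[a][k] - phases[b][k] + 2 * math.pi * f * tau)
        outputs.append(total)
    best = max(range(num_dirs), key=outputs.__getitem__)
    return best, outputs


# Synthetic check: 6 microphones, 5 cm radius, source at 120 degrees, D = 60.
positions = ring_positions(6, 0.05)
freqs = [500.0 + 250.0 * k for k in range(11)]  # 500 .. 3000 Hz
true_times = arrival_times(positions, math.radians(120.0))
phases = [[-2 * math.pi * f * t for f in freqs] for t in true_times]
best, outputs = scan_directions(phases, positions, freqs, 60)
estimated_azimuth_deg = best * 360 / 60
```

At the true direction every cosine term equals 1, so the summed output reaches its upper bound of (number of pairs) × (number of frequencies).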
In practical application, a generalized cross-correlation (hereinafter referred to as GCC) value can be calculated as follows:
taking the microphone pair composed of microphones #1 and #2 as an example, the generalized cross-correlation over angular frequency ω is defined as:
$$R_{12}(\tau) = \int_{-\infty}^{+\infty} \Psi_{12}(\omega)\, X_1(\omega)\, X_2^{*}(\omega)\, e^{j\omega\tau}\, d\omega$$
where $\Psi_{12}(\omega)$ is a weighting function related to the angular frequency, $\omega = 2\pi f$ ($f$ denotes the frequency information), $X_1(\omega)$ and $X_2(\omega)$ are the frequency domain coefficients at angular frequency $\omega$ after the STFT of the signals of microphones 1 and 2, respectively, $*$ denotes the complex conjugate, and $\tau$ denotes the time difference of arrival (TDOA) between the positions of microphones 1 and 2 for a far-field sound source in the current direction.
Here, in order to achieve fast sound source localization, a Phase Transform (PHAT) weighting function may be used, that is,
$$\Psi_{12}(\omega) = \frac{1}{\left| X_1(\omega)\, X_2^{*}(\omega) \right|}$$
At this time, the generalized cross-correlation formula can be transformed into:
$$R_{12}(\tau) = \int_{-\infty}^{+\infty} e^{\,j\left[\angle X_1(\omega) - \angle X_2(\omega) + \omega\tau\right]}\, d\omega$$
taking into account the conjugate symmetry of the Fourier transform of a real signal, i.e. $\angle X_m(-\omega) = -\angle X_m(\omega)$, $m = 1, 2, \ldots$, the above formula can be simplified as:
$$R_{12}(\tau) = 2\int_{0}^{+\infty} \cos\left[\angle X_1(\omega) - \angle X_2(\omega) + \omega\tau\right] d\omega$$
in practical application, the upper and lower limits of the integral are limited to a certain frequency band, for example [500, 3000] Hz, in consideration of the voice characteristics, so that the stability of the algorithm is increased.
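A band-limited evaluation for a single microphone pair might look like the sketch below. It assumes that, under PHAT weighting, each frequency contributes cos(∠X1 − ∠X2 + ωτ), replaces the integral by a sum over discrete frequency bins, and drops the constant scale factor since it does not change which τ gives the maximum. The function name `gcc_phat_band`, the bin spacing and the synthetic spectra are illustrative assumptions; the band [500, 3000] Hz is the example from the text.

```python
import cmath
import math


def gcc_phat_band(spec1, spec2, freqs_hz, tau, f_lo=500.0, f_hi=3000.0):
    """Sum cos(angle(X1) - angle(X2) + omega * tau) over bins inside [f_lo, f_hi]."""
    total = 0.0
    for x1, x2, f in zip(spec1, spec2, freqs_hz):
        if f_lo <= f <= f_hi:
            dphi = cmath.phase(x1) - cmath.phase(x2)
            total += math.cos(dphi + 2.0 * math.pi * f * tau)
    return total


# Synthetic spectra: the same source arrives at t1 and t2, with different gains,
# to show that only the phases matter under PHAT weighting.
freqs = [250.0 * k for k in range(17)]  # 0 .. 4000 Hz
t1, t2 = 1.0e-4, 2.5e-4
spec1 = [cmath.exp(-2j * math.pi * f * t1) for f in freqs]
spec2 = [2.0 * cmath.exp(-2j * math.pi * f * t2) for f in freqs]

matched = gcc_phat_band(spec1, spec2, freqs, t1 - t2)  # candidate TDOA equals the true one
mismatched = gcc_phat_band(spec1, spec2, freqs, 0.0)   # wrong candidate TDOA
```

With the matching TDOA, every in-band bin contributes 1, so `matched` equals the number of bins between 500 Hz and 3000 Hz; bins outside the band are ignored.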
Here, to reduce the amount of calculation, this example may also adopt a two-step SSL approach to achieve fast localization. In a conference call scene, sound source localization is usually required in the two-dimensional horizontal plane, that is, over the full 360° space. To improve the positioning accuracy, the number D of preset target directions should be as large as possible. For example, when D = 60, the positioning accuracy may reach 360°/(2D) = 3°. However, the amount of computation grows as D increases. Therefore, to significantly reduce the amount of computation without reducing D (i.e., without reducing the accuracy), this example proposes a two-step positioning scheme based on the circumferential symmetry of the uniform circular array. The specific steps are as follows:
assuming that the uniform circular array contains M microphones, D is chosen, for convenience of implementation, as an integer multiple of M, i.e., $D = D_1 M$, $D_1 \in \mathbb{Z}^{+}$. For example, for a uniform annular array of 6 microphones, i.e., M = 6, one may select $D = D_1 M = 60$ with $D_1 = 10$, where $D_1$ is the precision.
First, rough positioning: scan the M directions directly facing the M microphones and find the direction corresponding to the maximum GCC value.
Second, fine positioning: take $D_1 - 1$ fine directions on each side of the direction returned by the first step, giving $2(D_1 - 1) + 1$ directions in total; after scanning these directions, find the direction corresponding to the maximum GCC value to obtain the final positioning result.
For example, as shown in FIG. 3, with M = 6 and D = 60:
In the first step, coarse positioning is performed in the directions indicated by the 6 microphones to obtain the direction corresponding to the maximum GCC value (the direction corresponding to the gray icon in the figure).
In the second step, fine scanning (shown as an arc) is performed on both sides of the direction corresponding to the gray icon, with $D_1 - 1 = D/M - 1 = 9$ equally spaced directions on each side; together with the direction corresponding to the gray icon itself, a total of $2(D_1 - 1) + 1 = 19$ directions are scanned to obtain the final positioning result.
In the example shown in FIG. 3, the two-step positioning only needs to scan 6 + 19 = 25 directions, which reduces the amount of computation by more than half compared with directly scanning 60 directions.
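The two-step scan can be sketched as a coarse-to-fine search over direction indices. The function name `two_step_search`, the assumption that microphone m faces grid index m · D1, and the cosine-shaped `demo_score` (a stand-in for the real target GCC value of a direction) are all illustrative. The coarse winner is scored again in the fine step so that the count matches the 6 + 19 = 25 figure in the text.

```python
import math


def two_step_search(score, num_mics, refine):
    """Coarse-to-fine scan on a grid of D = num_mics * refine directions.

    score(i) returns the target GCC value of direction index i (0 .. D - 1).
    Returns (best direction index, number of directions scanned).
    """
    total = num_mics * refine
    # Step 1: the directions facing the microphones (assumed at indices m * refine).
    coarse = [m * refine for m in range(num_mics)]
    best_coarse = max(coarse, key=score)
    # Step 2: refine - 1 directions on each side plus the coarse winner itself.
    fine = [(best_coarse + off) % total for off in range(-(refine - 1), refine)]
    best = max(fine, key=score)
    return best, len(coarse) + len(fine)


def demo_score(i, true_index=23, total=60):
    """Stand-in score that peaks at true_index (not a real GCC computation)."""
    return math.cos(2 * math.pi * (i - true_index) / total)


best, scanned = two_step_search(demo_score, num_mics=6, refine=10)
```

With M = 6 and D1 = 10 the grid has 60 directions spaced 6° apart, so index 23 corresponds to 138°. The search assumes that the coarse winner lies within D1 − 1 grid steps of the true peak, which holds for a score that falls off smoothly around a single source.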
Here, regarding Voice Activity Detection (VAD): when no audio is present, the sound source localization algorithm, if it keeps running, may localize a noise source or produce a random result; therefore, the sound source localization algorithm needs to work together with a VAD algorithm. Only when audio is detected does sound source localization output the DOA; when speech is not present, the sound source localization output is NULL.
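The gating of the localization output by VAD can be sketched as below. The text does not say which VAD algorithm is used, so the mean-energy threshold here is purely a placeholder assumption, as are the names `frame_energy` and `gated_doa` and the threshold value; `localize` stands for any function that returns a DOA for the current frames.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def gated_doa(frames, localize, energy_threshold=1e-4):
    """Return localize(frames) when audio seems present, otherwise None (NULL).

    The energy test is a placeholder for a real VAD algorithm.
    """
    active = any(frame_energy(f) > energy_threshold for f in frames)
    return localize(frames) if active else None


quiet = [[0.0] * 8, [0.0] * 8]
loud = [[0.5, -0.5] * 4, [0.4, -0.4] * 4]
```

Calling `gated_doa(quiet, localize)` returns `None` without running the localizer, while `gated_doa(loud, localize)` returns whatever `localize` produces.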
An embodiment of the present application further provides a sound source localization apparatus, as shown in fig. 4, the apparatus includes:
an acquisition unit 41, configured to acquire audio signals by using an arranged microphone array, where the microphone array includes a plurality of microphones that are respectively arranged in different directions of the audio acquisition device and are used to acquire audio signals from different directions;
a signal conversion unit 42 for determining a frequency domain signal of the audio signal;
a calculating unit 43, configured to calculate target generalized cross-correlation values in multiple directions based on frequency information of the frequency-domain signal corresponding to the microphone, where the target generalized cross-correlation value in any one of the multiple directions is used to characterize delay characteristics of frequency information arriving at a pair of microphones in the microphone array;
a localization unit 44 configured to determine a sound source direction of the audio signal from the plurality of directions based on the delay characteristics represented by the target generalized cross-correlation values in the plurality of directions.
In a specific example of the present application, the calculating unit 43 is further configured to:
combining any two microphones in the microphone array to obtain N pairs of microphones, wherein $N = \binom{M}{2} = \frac{M(M-1)}{2}$, and M is the number of microphones in the microphone array.
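Combining any two of M microphones gives M(M - 1)/2 unordered pairs. A minimal Python sketch of this enumeration, assuming the microphones are simply indexed 0 to M - 1 (the function name is hypothetical):

```python
from itertools import combinations
from typing import List, Tuple


def microphone_pairs(num_mics: int) -> List[Tuple[int, int]]:
    """All unordered pairs (i, j) with i < j of microphone indices."""
    return list(combinations(range(num_mics), 2))


pairs = microphone_pairs(6)
print(len(pairs))  # 15
```

For the 6-microphone array used in the examples, this yields 15 pairs.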
In a specific example of the present application, the calculating unit 43 is further configured to:
calculating generalized cross-correlation values corresponding to each pair of microphones in the microphone array for one direction based on each frequency information of the frequency domain signal corresponding to the microphone;
and obtaining a target generalized cross-correlation value in one direction based on all the generalized cross-correlation values in one direction so as to obtain target generalized cross-correlation values in multiple directions.
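The per-direction computation can be illustrated with the following Python sketch. It rests on several assumptions that this passage does not confirm: a far-field source, a uniform circular array in a plane, PHAT weighting of each cross-spectrum, and a plain sum over all pairs and frequency bins as the way of combining the per-pair values into the target value. The speed of sound, array radius, frequency grid, and all function names are hypothetical.

```python
import cmath
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def mic_positions(num_mics, radius):
    """(x, y) positions of a uniform circular array centred at the origin."""
    return [(radius * math.cos(2 * math.pi * m / num_mics),
             radius * math.sin(2 * math.pi * m / num_mics))
            for m in range(num_mics)]


def arrival_delay(pos, azimuth):
    """Far-field arrival time at pos relative to the array centre (seconds)."""
    return -(pos[0] * math.cos(azimuth) + pos[1] * math.sin(azimuth)) / SPEED_OF_SOUND


def target_gcc(spectra, freqs, positions, azimuth):
    """Sum of PHAT-weighted, steered cross-spectra over all pairs and bins."""
    total = 0.0
    for i, j in combinations(range(len(positions)), 2):
        tdoa = arrival_delay(positions[i], azimuth) - arrival_delay(positions[j], azimuth)
        for k, f in enumerate(freqs):
            cross = spectra[i][k] * spectra[j][k].conjugate()
            mag = abs(cross)
            if mag < 1e-12:
                continue  # skip empty bins to avoid division by zero
            # Compensate the delay expected for this direction; the terms
            # add coherently only when the direction matches the source.
            total += (cross / mag * cmath.exp(2j * math.pi * f * tdoa)).real
    return total


def simulate_spectra(positions, freqs, azimuth):
    """Unit-magnitude spectra of an ideal plane wave from the given azimuth."""
    return [[cmath.exp(-2j * math.pi * f * arrival_delay(p, azimuth)) for f in freqs]
            for p in positions]


positions = mic_positions(6, 0.05)
freqs = [200.0 * k for k in range(1, 21)]
true_index = 17
spectra = simulate_spectra(positions, freqs, 2 * math.pi * true_index / 60)
scores = [target_gcc(spectra, freqs, positions, 2 * math.pi * d / 60)
          for d in range(60)]
best_index = max(range(60), key=scores.__getitem__)
print(best_index)  # 17
```

In this noise-free simulation every term equals 1 at the true direction, so the score there is the number of pairs times the number of bins (15 x 20 = 300), and the maximum identifies the source direction.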
In a specific example of the present application, the positioning unit 44 is further configured to use a direction corresponding to a maximum value of the target generalized cross-correlation values as a sound source direction of the audio signal.
In a specific example of the present application, the positioning unit 44 is further configured to:
selecting a suspected sound source direction of the audio signal from a plurality of directions based on the delay characteristics represented by the target generalized cross-correlation values;
determining a plurality of adjacent directions corresponding to the suspected sound source direction;
and calculating to obtain target generalized cross-correlation values of the multiple adjacent directions, and determining the sound source direction of the audio signal from the multiple adjacent directions.
In a specific example of the present application, the positioning unit 44 is further configured to:
and taking the direction corresponding to the maximum value in the target generalized cross-correlation values of the multiple adjacent directions as the sound source direction of the audio signal.
Here, it should be noted that the description of the apparatus embodiments is similar to the description of the method above and has the same beneficial effects as the method embodiments, so it is not repeated. For technical details not disclosed in the apparatus embodiments of the present invention, those skilled in the art should refer to the description of the method embodiments of the present invention.
An embodiment of the present application further provides a sound source localization apparatus, including: one or more processors; a memory communicatively coupled to the one or more processors; and one or more application programs; wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more application programs being configured to perform the method described above.
In a specific example, the sound source localization device according to the embodiment of the present application may be embodied as the structure shown in Fig. 5, and includes at least a processor 51, a storage medium 52, and at least one external communication interface 53; the processor 51, the storage medium 52, and the external communication interface 53 are all connected by a bus 54. The processor 51 may be a microprocessor, a central processing unit, a digital signal processor, a programmable logic array, or another electronic component with processing functions. The storage medium stores computer-executable code capable of performing the method of any of the above embodiments. In practical applications, the acquisition unit 41, the signal conversion unit 42, the calculation unit 43, and the positioning unit 44 can all be implemented by the processor 51.
Here, it should be noted that the above description of the sound source localization device embodiment is similar to the description of the method above and has the same beneficial effects as the method embodiments, so it is not repeated. For technical details not disclosed in the sound source localization device embodiments of the present invention, those skilled in the art should refer to the description of the method embodiments of the present invention.
Embodiments of the present application also provide a computer-readable storage medium, which stores a computer program, and when the program is executed by a processor, the computer program implements the method described above.
A computer-readable storage medium can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Additionally, the computer-readable storage medium may even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, for instance via optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that all or part of the steps of the methods in the above embodiments can be implemented by a program instructing related hardware. The program can be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
The embodiments described above are only a part of the embodiments of the present invention, and not all of them. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Claims (10)

1. A sound source localization method, characterized in that the method comprises:
the audio acquisition equipment acquires audio signals by utilizing an arranged microphone array, wherein the microphone array comprises a plurality of microphones which are respectively arranged in different directions of the audio acquisition equipment and are used for acquiring the audio signals from different directions;
the audio acquisition equipment determines a frequency domain signal of the audio signal;
the audio acquisition equipment calculates target generalized cross-correlation values in multiple directions based on frequency information of the frequency domain signals corresponding to the microphones, wherein the target generalized cross-correlation values in any one direction of the multiple directions are used for representing delay characteristics of frequency information reaching a pair of microphones in the microphone array;
the audio acquisition device determines a sound source direction of the audio signal from the plurality of directions based on the delay characteristics characterized by the target generalized cross-correlation values in the plurality of directions.
2. The method of claim 1, further comprising:
combining any two microphones in the microphone array to obtain N pairs of microphones, wherein $N = \binom{M}{2} = \frac{M(M-1)}{2}$, and M is the number of microphones in the microphone array.
3. The method of claim 1 or 2, wherein the audio acquisition device calculates the target generalized cross-correlation values in a plurality of directions based on the frequency information of the frequency domain signal corresponding to the microphone, comprising:
calculating generalized cross-correlation values corresponding to each pair of microphones in the microphone array for one direction based on each frequency information of the frequency domain signal corresponding to the microphone;
and obtaining a target generalized cross-correlation value in one direction based on all the generalized cross-correlation values in one direction so as to obtain target generalized cross-correlation values in multiple directions.
4. The method of claim 1, wherein the audio acquisition device determines a sound source direction of the audio signal from the plurality of directions based on the delay characteristics characterized by the target generalized cross-correlation values in the plurality of directions, comprising:
and taking the direction corresponding to the maximum value in the target generalized cross-correlation values as the sound source direction of the audio signal.
5. The method of claim 1, wherein the audio acquisition device determines a sound source direction of the audio signal from the plurality of directions based on the delay characteristics characterized by the target generalized cross-correlation values in the plurality of directions, comprising:
selecting a suspected sound source direction of the audio signal from a plurality of directions based on the delay characteristics represented by the target generalized cross-correlation values;
determining a plurality of adjacent directions corresponding to the suspected sound source direction;
and calculating to obtain target generalized cross-correlation values of the multiple adjacent directions, and determining the sound source direction of the audio signal from the multiple adjacent directions.
6. The method according to claim 5, wherein said determining a sound source direction of the audio signal from the plurality of adjacent directions comprises:
and taking the direction corresponding to the maximum value in the target generalized cross-correlation values of the multiple adjacent directions as the sound source direction of the audio signal.
7. A sound source localization apparatus, characterized in that the apparatus comprises:
an acquisition unit, configured to acquire audio signals by using an arranged microphone array, wherein the microphone array comprises a plurality of microphones which are respectively arranged in different directions of the audio acquisition equipment and are used for acquiring the audio signals from different directions;
a signal conversion unit for determining a frequency domain signal of the audio signal;
a calculating unit, configured to calculate target generalized cross-correlation values in multiple directions based on frequency information of the frequency-domain signal corresponding to the microphone, where the target generalized cross-correlation value in any one of the multiple directions is used to characterize delay characteristics of frequency information reaching a pair of microphones in the microphone array;
a positioning unit, configured to determine a sound source direction of the audio signal from the multiple directions based on the delay characteristics represented by the target generalized cross-correlation values in the multiple directions.
8. The apparatus of claim 7, wherein the computing unit is further configured to:
combining any two microphones in the microphone array to obtain N pairs of microphones, wherein $N = \binom{M}{2} = \frac{M(M-1)}{2}$, and M is the number of microphones in the microphone array.
9. A sound source localization apparatus, comprising:
one or more processors;
a memory communicatively coupled to the one or more processors;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the method of any of claims 1-6.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 6.
CN202010790574.9A 2020-08-07 2020-08-07 Sound source positioning method, device, equipment and storage medium Pending CN111880148A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010790574.9A CN111880148A (en) 2020-08-07 2020-08-07 Sound source positioning method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111880148A true CN111880148A (en) 2020-11-03

Family

ID=73211894

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010790574.9A Pending CN111880148A (en) 2020-08-07 2020-08-07 Sound source positioning method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111880148A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113311391A (en) * 2021-04-25 2021-08-27 普联国际有限公司 Sound source positioning method, device and equipment based on microphone array and storage medium
CN114863943A (en) * 2022-07-04 2022-08-05 杭州兆华电子股份有限公司 Self-adaptive positioning method and device for environmental noise source based on beam forming
CN115616082A (en) * 2022-12-14 2023-01-17 杭州兆华电子股份有限公司 Keyboard defect analysis method based on noise detection
CN116609726A (en) * 2023-05-11 2023-08-18 钉钉(中国)信息技术有限公司 Sound source positioning method and device
WO2023246224A1 (en) * 2022-06-20 2023-12-28 青岛海尔科技有限公司 Method and apparatus for determining orientation of sound source, storage medium, and electronic apparatus
CN118362977A (en) * 2024-06-19 2024-07-19 北京远鉴信息技术有限公司 Sound source positioning device and method, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104142492A (en) * 2014-07-29 2014-11-12 佛山科学技术学院 SRP-PHAT multi-source spatial positioning method
CN105388459A (en) * 2015-11-20 2016-03-09 清华大学 Robustness sound source space positioning method of distributed microphone array network
CN107102296A (en) * 2017-04-27 2017-08-29 大连理工大学 A kind of sonic location system based on distributed microphone array
CN108370470A (en) * 2015-12-04 2018-08-03 森海塞尔电子股份有限及两合公司 Voice acquisition methods in conference system and conference system with microphone array system
CN109001682A (en) * 2018-05-30 2018-12-14 大连民族大学 A kind of positioning sound source by robot based on microphone array
CN111025233A (en) * 2019-11-13 2020-04-17 阿里巴巴集团控股有限公司 Sound source direction positioning method and device, voice equipment and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
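姚欢等 (Yao Huan et al.): "基于时延估计的麦克风阵列一致性分析" (Consistency analysis of microphone arrays based on time-delay estimation), 《复旦学报(自然科学版)》 (Journal of Fudan University (Natural Science)) *
谭颖等 (Tan Ying et al.): "改进的SRP-PHAT声源定位方法" (Improved SRP-PHAT sound source localization method), 《电子与信息学报》 (Journal of Electronics & Information Technology) *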

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant before: Tiktok vision (Beijing) Co.,Ltd.
