CN110954866A - Sound source positioning method, electronic device and storage medium

Sound source positioning method, electronic device and storage medium

Info

Publication number
CN110954866A
CN110954866A (application number CN201911158057.3A)
Authority
CN
China
Prior art keywords
angle
interval
sound source
indication
phat
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911158057.3A
Other languages
Chinese (zh)
Other versions
CN110954866B (en)
Inventor
董天旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cloudminds Shanghai Robotics Co Ltd
Original Assignee
Cloudminds Chengdu Technologies Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cloudminds Chengdu Technologies Co ltd filed Critical Cloudminds Chengdu Technologies Co ltd
Priority to CN201911158057.3A priority Critical patent/CN110954866B/en
Publication of CN110954866A publication Critical patent/CN110954866A/en
Application granted granted Critical
Publication of CN110954866B publication Critical patent/CN110954866B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Abstract

Embodiments of the invention relate to the field of data processing and disclose a sound source positioning method, an electronic device and a storage medium. In some embodiments of the present invention, a sound source localization method includes: acquiring an indication parameter for each angle, where the indication parameter of an angle is used to determine whether the direction corresponding to that angle is the sound source direction; for any angle, updating the indication parameter of the angle according to the indication parameters within the reference interval corresponding to the angle, where the reference interval corresponding to the angle includes a predefined reference angle interval of the angle and/or a predefined reference time interval of the angle, and the indication parameters within the reference interval include the indication parameter of the angle itself; and determining the sound source direction according to the updated indication parameters of the angles. The embodiments improve the robustness of sound source localization, reduce deviation and reduce the occurrence of bad values.

Description

Sound source positioning method, electronic device and storage medium
Technical Field
The embodiment of the invention relates to the field of data processing, in particular to a sound source positioning method, electronic equipment and a storage medium.
Background
Because voice interaction is a natural and friendly interaction mode, it has gradually been accepted and is widely applied in many everyday scenarios, such as in-vehicle voice control, smart televisions and speakers, intelligent robots, and the like. Voice interaction is classified into near-field speech and far-field speech. Near-field speech, such as the speech input methods on mobile phones, is already mature; what really changes the way people interact is far-field speech. Far-field speech interaction includes far-field microphone array pickup, speech recognition and speech understanding. Because far-field speech is closely related to product hardware, such as the microphone array shape and the acoustic structure of the product, it is an important concern in the design and manufacture of current intelligent interactive devices.
However, the inventors found that the prior art has at least the following problems: the existing sound source positioning methods used in microphone array pickup have poor robustness and large deviation, and are prone to producing bad values (outliers).
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of embodiments of the present invention is to provide a sound source positioning method, an electronic device, and a storage medium, which improve robustness of sound source positioning, reduce deviation, and reduce occurrence of bad values.
In order to solve the above technical problem, an embodiment of the present invention provides a sound source localization method including the following steps: acquiring an indication parameter for each angle, where the indication parameter of an angle is used to determine whether the direction corresponding to that angle is the sound source direction; for any angle, updating the indication parameter of the angle according to the indication parameters within the reference interval corresponding to the angle, where the reference interval corresponding to the angle includes a predefined reference angle interval of the angle and/or a predefined reference time interval of the angle, and the indication parameters within the reference interval include the indication parameter of the angle itself; and determining the sound source direction according to the updated indication parameters of the angles.
An embodiment of the present invention provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the sound source localization method as mentioned in the above embodiments.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the sound source localization method mentioned in the above embodiments.
Compared with the prior art, embodiments of the invention perform sound source localization according to the indication parameters within the reference time interval and/or the reference angle interval of each angle. Compared with referring only to the indication parameter of a single angle, this improves the robustness of the localization algorithm, reduces the deviation between the actual direction and the computed sound source direction, reduces the occurrence of bad values, and provides solid technical support for modules such as beamforming and far-field speech enhancement. In addition, the sound source positioning method does not need to be combined with other algorithms such as scoring, so it is easy to implement and efficient to run.
In addition, the reference interval includes a reference angle interval, and updating the indication parameter of the angle according to the indication parameters within the reference interval corresponding to the angle specifically includes: accumulating, or weighting and accumulating, the indication parameters within the reference angle interval corresponding to the angle to obtain the updated indication parameter of the angle. In this embodiment, accumulating or weighted-accumulating the indication parameters of the reference angle interval effectively eliminates false peaks and removes abrupt outliers.
In addition, the reference interval further includes a reference time interval; the accumulation or weighted accumulation of the indication parameters within the reference angle interval corresponding to the angle to obtain the updated indication parameter of the angle specifically includes: accumulating, or weighting and accumulating, the indication parameters within the reference angle interval corresponding to the angle to obtain the spatial accumulated value of the angle at the current time; and accumulating, or weighting and accumulating, the spatial accumulated values of the angle at the times within the reference time interval to obtain the updated indication parameter of the angle. In this embodiment, accumulating the indication parameters over the reference time interval [t-T, t] avoids the problem that accurate localization fails because of a large number of low-energy sampling points, further improving the probability of successful localization.
In addition, the reference interval includes a reference time interval; updating the indication parameter of the angle according to the indication parameters within the reference interval corresponding to the angle specifically includes: accumulating, or weighting and accumulating, the indication parameters of the angle at the times within the reference time interval to obtain the updated indication parameter of the angle. In this embodiment, accumulating the indication parameters over the reference time interval [t-T, t] avoids the problem that accurate localization fails because of a large number of low-energy sampling points.
In addition, the reference interval also includes a reference angle interval; accumulating or weighting the indication parameters of the angle at each time in the reference time interval to obtain the updated indication parameters of the angle, which specifically comprises: accumulating or weighting the indication parameters of the angle at each time in the reference time interval to obtain the time accumulated value of the angle; and accumulating or weighting and accumulating the time accumulated value of each angle in the reference angle interval corresponding to the angle to obtain the updated indication parameter of the angle.
In addition, the indication parameter is a GCC-PHAT value calculated based on the generalized cross-correlation with phase transform (GCC-PHAT) algorithm, or an SRP-PHAT value calculated based on the steered response power with phase transform (SRP-PHAT) algorithm.
In addition, when the indication parameter is an SRP-PHAT value, acquiring the indication parameter of each angle specifically includes: determining the correspondence between the time difference of each microphone pair and the GCC-PHAT value; interpolating the GCC-PHAT values corresponding to the time differences of each microphone pair; and calculating the SRP-PHAT value of each angle according to the correspondence between the interpolated time differences and the GCC-PHAT values and the correspondence between the interpolated time differences and the angles. In this embodiment, interpolating the GCC-PHAT correspondence yields a finer time-difference resolution, which improves the angular resolution of the array and thus the accuracy of sound source localization.
In addition, when the indication parameter is a GCC-PHAT value, acquiring the indication parameter of each angle specifically includes: determining the correspondence between the time difference and the GCC-PHAT value; interpolating the GCC-PHAT values corresponding to the time differences; and determining the GCC-PHAT value of each angle according to the correspondence between the interpolated time differences and the GCC-PHAT values and the correspondence between the interpolated time differences and the angles. In this embodiment, interpolating the GCC-PHAT correspondence yields a finer time-difference resolution, which improves the angular resolution of the array and thus the accuracy of sound source localization.
In addition, interpolating the GCC-PHAT values corresponding to the time differences specifically includes: calculating the maximum time delay according to the maximum distance between the microphone pairs; and interpolating the GCC-PHAT values over the time-difference interval [-maximum time delay, +maximum time delay].
In addition, before the obtaining of the indication parameter of each angle, the sound source localization method further includes: it is determined that speech is currently present.
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings, in which like reference numerals refer to similar elements; the figures are not drawn to scale unless otherwise specified.
Fig. 1 is a flowchart of a sound source localization method according to a first embodiment of the present invention;
fig. 2 is a flowchart of a sound source localization method according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a sound source localization apparatus according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in these embodiments so that the reader may better understand the present application; the technical solutions claimed in the present application can nevertheless be implemented without some of these technical details, or with various changes and modifications based on the following embodiments.
A first embodiment of the present invention relates to a sound source localization method applied to an electronic device, for example various terminals (smart speakers, voice robots, etc.) or servers. As shown in fig. 1, the sound source localization method includes:
step 101: and acquiring the indication parameters of all the angles.
Specifically, the indication parameter of an angle is used to determine whether the direction corresponding to the angle is the sound source direction. The indication parameter may be a GCC-PHAT value calculated based on the Generalized Cross-Correlation with Phase Transform (GCC-PHAT) algorithm, or an SRP-PHAT value calculated based on the Steered Response Power with Phase Transform (SRP-PHAT) algorithm; other choices are not enumerated one by one in this embodiment.
It should be noted that, as can be understood by those skilled in the art, in practical applications, the indication parameter may be other parameters, and the embodiment is merely an example.
The following exemplifies a method of acquiring the indication parameter for each angle.
In one embodiment, the indicative parameter is the GCC-PHAT value of the microphone pair. The electronic device can obtain the indication parameters of each angle by the following method:
the method comprises the following steps: the electronic device determines a correspondence of the time difference and the GCC-PHAT value. And the electronic equipment determines the GCC-PHAT value of each angle according to the corresponding relation between each time difference and each angle and the corresponding relation between each time difference and each GCC-PHAT value. The electronic equipment can determine the corresponding relation between the time difference and the GCC-PHAT value through a GCC-PHAT algorithm.
The method 2 comprises the following steps: the electronic equipment determines the corresponding relation between the time difference and the GCC-PHAT value; carrying out interpolation processing on the GCC-PHAT value corresponding to the time difference; and determining the GCC-PHAT value of each angle according to the corresponding relation between the time difference after interpolation and the GCC-PHAT value and the corresponding relation between the time difference after interpolation and the angle. The electronic device may determine the correspondence between each angle and the GCC-PHAT value according to the correspondence between each time difference after interpolation and the GCC-PHAT value after interpolation, and the correspondence between each time difference after interpolation and each angle, so as to determine the GCC-PHAT value corresponding to each angle.
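As a concrete illustration of the correspondence between angles, time differences and GCC-PHAT values, the following is a minimal Python sketch for a single microphone pair. It assumes a far-field source, a pair spacing d along one axis, and a GCC-PHAT curve already indexed by integer sample lag; the function name and parameters are illustrative and not prescribed by the patent.

import numpy as np

def gcc_phat_per_angle(gcc_curve, angles_deg, d=0.12, c=342.0, fs=16000):
    """Look up a GCC-PHAT value for each candidate angle of one microphone pair.

    gcc_curve: cross-correlation values indexed by integer lag, centred so
               that index max_lag corresponds to lag 0.
    angles_deg: candidate source angles (degrees) measured from the pair axis.
    """
    max_lag = (len(gcc_curve) - 1) // 2
    values = []
    for theta in np.deg2rad(np.asarray(angles_deg)):
        tau = d * np.cos(theta) / c          # far-field time difference of arrival (s)
        lag = int(np.round(tau * fs))        # nearest integer sample lag
        values.append(gcc_curve[max_lag + lag])
    return np.array(values)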
In one embodiment, the indicator parameter is an SRP-PHAT value. The electronic device can obtain the indication parameters of each angle by the following method:
the method comprises the following steps: the electronic device calculates a time difference for each microphone pair; calculating a GCC-PHAT value corresponding to each time difference of each microphone pair according to the time difference of each microphone pair; and according to the corresponding relation between the time difference and the angle, taking the sum of the GCC-PHAT values of all the microphone pairs corresponding to the time difference corresponding to the angle as the SRP-PHAT value of the angle. Specifically, the electronic device calculates a time difference in units of sampling periods for each microphone pair in each microphone array according to the GCC-PHAT algorithm. And calculating the SRP-PHAT value according to the SRP-PHAT algorithm and the corresponding relation between the time difference of each microphone pair and the GCC-PHAT.
The method 2 comprises the following steps: the electronic equipment determines the corresponding relation between the time difference of each microphone pair and the GCC-PHAT value; carrying out interpolation processing on the GCC-PHAT value corresponding to the time difference of each microphone pair; and calculating the SRP-PHAT value of each angle according to the corresponding relation between the time difference after interpolation and the GCC-PHAT value and the corresponding relation between the time difference after interpolation and the angle. Specifically, the electronic device determines the corresponding relationship between the time difference and the GCC-PHAT, and the process of calculating the SRP-PHAT value may refer to method 1 when the indication parameter is SRP-PHAT, which is not described herein again.
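For readers who want a concrete picture of Method 1, the following Python sketch computes a GCC-PHAT cross-correlation per microphone pair and sums the values implied by each candidate angle into an SRP-PHAT score. It is a simplified illustration under stated assumptions (single frame, precomputed angle-to-lag tables); the function names and the angle_to_lag structure are not from the patent.

import numpy as np

def gcc_phat(x, y, n_fft=None):
    """GCC-PHAT cross-correlation of one frame from two microphones.

    Returns a correlation curve whose centre index corresponds to lag 0."""
    n = n_fft or 2 * len(x)
    X, Y = np.fft.rfft(x, n), np.fft.rfft(y, n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12               # phase transform (PHAT) weighting
    cc = np.fft.irfft(cross, n)
    return np.concatenate((cc[-(n // 2):], cc[:n // 2 + 1]))

def srp_phat(frames, pairs, angle_to_lag):
    """SRP-PHAT score per candidate angle: for each microphone pair, read its
    GCC-PHAT value at the lag implied by the angle, and sum over all pairs.

    frames:       dict mic_index -> 1-D numpy frame
    pairs:        list of (i, j) microphone index pairs
    angle_to_lag: dict (i, j) -> integer-lag array, one entry per candidate angle
    """
    srp = None
    for i, j in pairs:
        cc = gcc_phat(frames[i], frames[j])
        centre = len(cc) // 2
        contrib = cc[centre + angle_to_lag[(i, j)]]
        srp = contrib if srp is None else srp + contrib
    return srp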
The following illustrates the process of the electronic device interpolating the GCC-PHAT value corresponding to the time difference.
In one embodiment, the electronic device calculates the maximum time delay according to the maximum spacing between the microphone pairs, and interpolates the GCC-PHAT values over the time-difference interval [-maximum time delay, +maximum time delay]. The maximum time delay may be the time, expressed in sampling periods, required for sound to travel the maximum spacing.
It is worth mentioning that the corresponding relation of the GCC-PHAT value is subjected to interpolation processing to obtain higher-precision time difference discrimination capability, so that the angular resolution of the array is improved, and the accuracy of sound source positioning is improved.
It should be noted that, as will be understood by those skilled in the art, in the present embodiment, the determination method of the time difference interval for performing interpolation is exemplified by determining the time difference interval for performing interpolation based on the maximum time delay, and in practical applications, the time difference interval for performing interpolation processing may also be determined in other manners.
The principle of the interpolation process to improve the angular resolution is illustrated below with reference to an example.
Taking one microphone pair as an example, sound sources at different angles arrive at the two microphones with different time differences, and the time differences are measured in units of the sampling period T. The best angular resolution occurs broadside to the microphone pair, where T = (d/c)·cos(90° - θ') = (d/c)·sin(θ'), d being the spacing of the microphone pair, c = 342 m/s the speed of sound, and θ' the angular resolution. For a commonly used microphone array with d = 0.12 m and T = 1/16000 s, this gives θ' ≈ 10.3°. Such a coarse resolution causes serious positioning error, so the angular resolution of the microphone array can be improved by interpolation. Since the spacing and the speed of sound are fixed, the time-difference resolution of a microphone pair can only be improved algorithmically; if the time-difference resolution is improved to 0.1 sampling periods, the best angular resolution becomes θ' ≈ 1°. The inventors therefore found that interpolating the GCC-PHAT values over a certain time-difference interval yields a finer time-difference resolution and hence a finer angular resolution; for example, 10× interpolation yields GCC-PHAT values at a time-delay precision of 0.1 sampling periods.
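As a quick check of the figures above, the relation T = (d/c)·sin(θ') can be evaluated directly; this small Python sketch reproduces the roughly 10.3° and 1° resolutions quoted in the text (variable names are illustrative).

import numpy as np

d, c, fs = 0.12, 342.0, 16000.0          # pair spacing (m), speed of sound (m/s), sample rate (Hz)
# delay resolution of one sampling period -> best-case angular resolution
theta_1_sample = np.degrees(np.arcsin(c / (fs * d)))        # ~10.3 degrees
# after 10x interpolation the delay resolution is 0.1 sampling periods
theta_01_sample = np.degrees(np.arcsin(c / (10 * fs * d)))  # ~1.0 degree
print(theta_1_sample, theta_01_sample)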
In one example, the electronic device interpolates the correspondence between the time difference and the GCC-PHAT value of each microphone pair as follows: calculating the maximum time delay according to the maximum spacing between the microphone pairs, and interpolating the GCC-PHAT values of each microphone pair over the time-difference interval [-maximum time delay, +maximum time delay].
It should be noted that, in the interpolation process, the interpolation algorithm used may be polynomial interpolation, natural cubic spline algorithm, and the like, and the embodiment does not limit the type of the interpolation algorithm specifically used in the interpolation process.
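A minimal sketch of this interpolation step, assuming a natural cubic spline (one of the options mentioned above) and an interpolation factor of 10; the helper name and defaults are illustrative, not prescribed by the patent.

import numpy as np
from scipy.interpolate import CubicSpline

def interpolate_gcc(cc, d_max=0.12, c=342.0, fs=16000, factor=10):
    """Refine the GCC-PHAT curve over [-max_delay, +max_delay] with a natural
    cubic spline, improving the time-difference resolution by `factor`.

    cc is a correlation curve whose centre index corresponds to lag 0."""
    centre = len(cc) // 2
    max_lag = int(np.ceil(d_max / c * fs))            # maximum delay in samples
    lags = np.arange(-max_lag, max_lag + 1)
    spline = CubicSpline(lags, cc[centre + lags], bc_type='natural')
    fine_lags = np.arange(-max_lag, max_lag + 1e-9, 1.0 / factor)
    return fine_lags, spline(fine_lags)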
Step 102: and aiming at any angle, updating the indication parameter of the angle according to the indication parameter in the reference interval corresponding to the angle.
Specifically, the reference interval corresponding to the angle includes a predefined reference angle interval of the angle and/or a predefined reference time interval of the angle, and the indication parameters within the reference interval corresponding to the angle include the indication parameter of the angle itself.
In one example, the electronic device accumulates, or weights and accumulates, the indication parameters within the reference interval corresponding to the angle to obtain the updated indication parameter of the angle.
The following description exemplifies a method of updating the indication parameter for each angle.
In one embodiment, the reference interval includes a reference angle interval, and the electronic device accumulates, or weights and accumulates, the indication parameters within the reference angle interval corresponding to the angle to obtain the updated indication parameter of the angle. Specifically, for an angle θ, the reference angle interval may be [θ - Δ, θ + Δ], where Δ may be set as needed, for example to 3-5 times the angular resolution of the microphone array (e.g., 5 degrees); this embodiment is not limited thereto.
It should be noted that, by accumulating or weighting and accumulating the indication parameters of the reference angle interval, false peaks can be effectively eliminated, and some abrupt abnormal points can be removed.
The method for updating the indication parameters of the angles by the electronic equipment includes but is not limited to:
the method comprises the following steps: and the electronic equipment performs accumulation or weighted accumulation calculation on the indication parameters in the reference angle interval corresponding to the angle to obtain a spatial accumulated value of the angle, and the spatial accumulated value of the angle is used as the updated indication parameters of the angle. Specifically, taking the indication parameter as the SRP-PHAT value as an example, after the SRP-PHAT value corresponding to each angle is calculated by the SRP-PHAT positioning algorithm, the SRP-PHAT values in the reference angle interval [ θ - Δ, θ + Δ ] are accumulated or weighted, and spatial accumulation can effectively eliminate false SRP-PHAT peak values. The weighting and accumulating coefficient of each angle is 1, that is, the weighting and accumulating coefficient is a common one, the weighting and accumulating coefficient can be designed by adopting window functions such as a rectangular window, a triangular window, a sweat window and the like, and the embodiment does not limit the specific weighting and accumulating coefficient of the indication coefficient of each angle in the reference angle interval corresponding to the angle.
Method 2: when the reference interval also includes a reference time interval, the electronic device accumulates, or weights and accumulates, the indication parameters of the angles within the reference angle interval corresponding to the angle to obtain the spatial accumulated value of the angle at the current time, and then accumulates, or weights and accumulates, the spatial accumulated values of the angle at the times within the reference time interval to obtain the updated indication parameter of the angle.
In one example, the electronic device may calculate the updated indication parameter of the angle according to formula a: TSPHAT(k) = a·TSPHAT(k-1) + (1-a)·SPHAT(k), where TSPHAT(k) denotes the updated indication parameter of the angle, a denotes a preset proportion parameter, TSPHAT(k-1) denotes the updated indication parameter of the angle obtained in the previous calculation, and SPHAT(k) denotes the spatial accumulated value of the angle at the current time.
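A minimal Python sketch of the spatial accumulation over the reference angle interval, assuming the SRP-PHAT values are stored in a circular array of one-degree angle bins and that the weighting windows mirror those mentioned above; the function name, bin layout and defaults are illustrative.

import numpy as np

def spatial_accumulate(srp, delta_bins=5, window='rect'):
    """Accumulate (or weight and accumulate) SRP-PHAT values over the reference
    angle interval [theta - delta, theta + delta] for every angle bin theta.

    srp:        SRP-PHAT value per angle bin on a circular 0..359 degree axis
    delta_bins: half-width of the reference angle interval, in bins
    window:     'rect' (plain accumulation), 'triang' or 'hann' weighting
    """
    n = 2 * delta_bins + 1
    if window == 'rect':
        w = np.ones(n)
    elif window == 'triang':
        w = 1.0 - np.abs(np.arange(n) - delta_bins) / (delta_bins + 1)
    else:  # 'hann'
        w = np.hanning(n)
    # pad circularly so the interval wraps around the 0/360 degree boundary
    padded = np.concatenate((srp[-delta_bins:], srp, srp[:delta_bins]))
    return np.convolve(padded, w, mode='valid')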
In another embodiment, the reference interval includes a reference time interval, and the electronic device accumulates, or weights and accumulates, the indication parameters of the angle at the times within the reference time interval to obtain the updated indication parameter of the angle.
It is worth mentioning that accumulating the indication parameters over the reference time interval [t-T, t] avoids the problem that accurate localization fails because of a large number of low-energy sampling points.
The method for updating the indication parameters of the angles by the electronic equipment includes but is not limited to:
the method comprises the following steps: the electronic equipment accumulates or weights the indication parameters of the angle at each time in the reference time interval to obtain the time accumulated value of the angle, and the time accumulated value of the angle is used as the updated indication parameters of the angle.
The method 2 comprises the following steps: under the condition that the reference interval also comprises a reference angle interval, the electronic equipment accumulates or weights and accumulates the indication parameters of the angle at each time in the reference time interval to obtain a time accumulated value of the angle; and accumulating or weighting and accumulating the time accumulated value of each angle in the reference angle interval corresponding to the angle to obtain the updated indication parameter of the angle.
In one example, when performing weighted accumulation of the indication parameters over the reference time interval, the electronic device may, for any angle, calculate the time accumulated value of the angle according to formula b: TPHAT(k) = a·TPHAT(k-1) + (1-a)·PHAT(k), where TPHAT(k) denotes the time accumulated value of the angle, a denotes a preset proportion parameter, TPHAT(k-1) denotes the time accumulated value of the angle obtained in the previous calculation, and PHAT(k) denotes the indication parameter of the angle at the current time. The weighting coefficient a may take a value between 0 and 1; the larger a is, the stronger the effect of the history and the better the robustness. When the sound source is static in the use scenario, a may be set within 0.6-0.8; when the sound source keeps changing, a may be set within 0.2-0.4.
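A minimal sketch of one step of formula b (formula a has the same shape, applied to the spatial accumulated value instead of the raw indication parameter); the function name and the default value of a are illustrative.

def temporal_smooth(prev, current, a=0.7):
    """One step of formula b: TPHAT(k) = a*TPHAT(k-1) + (1-a)*PHAT(k).

    prev / current can be scalars or per-angle numpy arrays. A larger a keeps
    more history (suits a static source, e.g. 0.6-0.8); a smaller a tracks a
    moving source faster (e.g. 0.2-0.4)."""
    return a * prev + (1.0 - a) * current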
It should be noted that, as can be understood by those skilled in the art, in practical application, other manners may also be adopted for calculation, and the description of the embodiment is not repeated.
Step 103: and determining the sound source direction of the sound according to the updated indication parameters of the angles.
Specifically, the electronic device may determine the sound source direction of the sound according to the updated indication parameters for each angle based on a positioning algorithm used in sound source positioning.
In one embodiment, the indicator parameter is an SRP-PHAT value. The electronic device may use the direction indicated by the angle corresponding to the maximum updated SRP-PHAT value as the sound source direction of the sound.
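A minimal sketch of this selection step, assuming the updated SRP-PHAT values and the candidate angles are aligned arrays; the names are illustrative and the final comment shows how it could chain with the earlier sketches.

import numpy as np

def pick_direction(updated_srp, angles_deg):
    """Return the candidate angle whose updated SRP-PHAT value is largest."""
    return angles_deg[int(np.argmax(updated_srp))]

# e.g. direction = pick_direction(temporal_smooth(prev, spatial_accumulate(srp)), angles)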
It should be noted that when the indication parameter is the GCC-PHAT value, the electronic device may decide how to determine the sound source direction according to the algorithm used subsequently; this embodiment is not limited in this respect.
The above description is only for illustrative purposes and does not limit the technical aspects of the present invention.
Compared with the prior art, the sound source localization method provided by this embodiment performs sound source localization according to the indication parameters within the reference time interval and/or the reference angle interval of each angle. Compared with referring only to the indication parameter of a single angle, this improves the robustness of the localization algorithm, reduces the deviation between the actual direction and the computed sound source direction, reduces the occurrence of bad values, and provides solid technical support for modules such as beamforming and far-field speech enhancement. In addition, the method does not need to be combined with other algorithms such as scoring, so it is easy to implement and efficient to run.
A second embodiment of the present invention relates to a sound source localization method. The embodiment is further improved on the basis of the first embodiment, and the specific improvements are as follows: before the indication parameters of each angle are acquired, whether voice exists or not is judged.
Specifically, as shown in fig. 2, the present embodiment includes steps 201 to 204, wherein steps 202 to 204 are substantially the same as steps 101 to 103 in the first embodiment, and are not repeated herein. The following mainly introduces the differences:
step 201: and judging whether voice exists in the data collected by the microphone.
Specifically, if it is determined that there is speech, step 202 to step 204 are performed, otherwise, step 201 may be continued.
It should be noted that fig. 2 illustrates, by way of example, the case in which the electronic device keeps detecting whether voice exists when it determines that no voice is present in the data collected by the microphones; in practical applications, other operations may also be performed when no voice is present. For example, if the reference interval includes a reference time interval, the previously calculated time accumulated value of each angle (TPHAT(k-1)) may be reset to 0 when it is determined that no voice is present in the collected data.
In one embodiment, before determining whether voice exists in the data collected by the microphones, the electronic device initializes the microphone array, pairs the microphones two by two, calculates the maximum time delay from the maximum spacing between the microphones, and preprocesses the data collected by the microphones by framing, windowing, and the like.
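A minimal Python sketch of this initialization and preprocessing, assuming the microphone coordinates are given as a numpy array and Hann-windowed overlapping frames are used; the helper names, frame length and hop are illustrative defaults, not prescribed by the patent.

import itertools
import numpy as np

def init_array(mic_positions, fs=16000, c=342.0):
    """Pair the microphones two by two and derive the maximum delay (in samples)
    from the largest spacing in the array.

    mic_positions: (M, 2) or (M, 3) numpy array of microphone coordinates in metres."""
    pairs = list(itertools.combinations(range(len(mic_positions)), 2))
    d_max = max(np.linalg.norm(mic_positions[i] - mic_positions[j]) for i, j in pairs)
    max_delay = int(np.ceil(d_max / c * fs))
    return pairs, max_delay

def frame_and_window(signal, frame_len=512, hop=256):
    """Split a 1-D signal into overlapping Hann-windowed frames (simple preprocessing)."""
    win = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.stack([signal[s:s + frame_len] * win for s in starts])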
In one embodiment, the electronic device may perform the detection of the presence of speech through a commonly used Voice Activity Detection (VAD) algorithm.
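The patent does not fix a particular VAD algorithm; as a stand-in for illustration only, a crude energy-threshold decision can play the same role of gating the localization steps. The threshold and noise-floor handling below are assumptions, not part of the patent.

import numpy as np

def has_speech(frame, noise_floor, threshold_db=10.0):
    """Very crude energy-based stand-in for a VAD decision: flag speech when the
    frame energy exceeds the estimated noise floor by `threshold_db` dB."""
    energy = np.mean(np.asarray(frame, dtype=np.float64) ** 2) + 1e-12
    return 10.0 * np.log10(energy / (noise_floor + 1e-12)) > threshold_db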
It should be noted that, as will be understood by those skilled in the art, in practical applications, other manners may be used to detect whether voice exists, which are not listed here.
Step 202: and acquiring the indication parameters of all the angles.
Step 203: and aiming at any angle, updating the indicating parameter of the angle according to the indicating parameter in the reference interval corresponding to the angle.
Step 204: and determining the sound source direction of the sound according to the updated indication parameters of the angles.
The above description is only for illustrative purposes and does not limit the technical aspects of the present invention.
Compared with the prior art, the sound source localization method provided by this embodiment performs sound source localization according to the indication parameters within the reference time interval and/or the reference angle interval of each angle, which improves the robustness of the localization algorithm, reduces the deviation between the actual direction and the computed sound source direction, reduces the occurrence of bad values, and provides solid technical support for modules such as beamforming and far-field speech enhancement. In addition, the method does not need to be combined with other algorithms such as scoring, so it is easy to implement and efficient to run. Moreover, because the presence of voice is checked before the indication parameters of the angles are calculated, computing resources are not wasted on subsequent processing of data that contains no voice.
The steps of the above methods are divided only for clarity of description; in implementation they may be combined into a single step, or a step may be split into several steps, and as long as the same logical relationship is preserved these variations fall within the protection scope of this patent. Adding insignificant modifications to an algorithm or flow, or introducing insignificant design changes, without changing the core design of the algorithm or flow also falls within the protection scope of this patent.
A third embodiment of the present invention relates to a sound source localization apparatus, as shown in fig. 3, including: an acquisition module 301, an updating module 302, and a determination module 303. The acquisition module 301 is configured to acquire the indication parameter of each angle, where the indication parameter of an angle is used to determine whether the direction corresponding to the angle is the sound source direction. The updating module 302 is configured to, for any angle, update the indication parameter of the angle according to the indication parameters within the reference interval corresponding to the angle, where the reference interval corresponding to the angle includes a predefined reference angle interval of the angle and/or a predefined reference time interval of the angle, and the indication parameters within the reference interval include the indication parameter of the angle itself. The determination module 303 is configured to determine the sound source direction according to the updated indication parameters of the angles.
It should be understood that this embodiment is a system example corresponding to the first embodiment, and may be implemented in cooperation with the first embodiment. The related technical details mentioned in the first embodiment are still valid in this embodiment, and are not described herein again in order to reduce repetition. Accordingly, the related-art details mentioned in the present embodiment can also be applied to the first embodiment.
It should be noted that each module referred to in this embodiment is a logical module; in practical applications, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, elements that are not closely related to solving the technical problem addressed by the present invention are not introduced in this embodiment, but this does not mean that no other elements exist in this embodiment.
A fourth embodiment of the present invention relates to an electronic apparatus, as shown in fig. 4, including: at least one processor 401; and a memory 402 communicatively coupled to the at least one processor 401; the memory 402 stores instructions executable by the at least one processor 401, and the instructions are executed by the at least one processor 401, so that the at least one processor 401 can execute the sound source localization method according to the above embodiments.
The electronic device includes: one or more processors 401 and a memory 402, one processor 401 being exemplified in fig. 4. The processor 401 and the memory 402 may be connected by a bus or other means, and fig. 4 illustrates the connection by a bus as an example. Memory 402, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The processor 401 executes various functional applications of the device and data processing by executing non-volatile software programs, instructions and modules stored in the memory 402, thereby implementing the sound source localization method described above.
The memory 402 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store a list of options, etc. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 402 may optionally include memory located remotely from processor 401, which may be connected to an external device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 402 and, when executed by the one or more processors 401, perform the sound source localization method of any of the method embodiments described above.
The product can execute the method provided by the embodiment of the application, has corresponding functional modules and beneficial effects of the execution method, and can refer to the method provided by the embodiment of the application without detailed technical details in the embodiment.
A fifth embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program realizes the above-described method embodiments when executed by a processor.
That is, as can be understood by those skilled in the art, all or part of the steps of the methods in the above embodiments may be implemented by a program instructing related hardware; the program is stored in a storage medium and includes several instructions to cause a device (which may be a microcontroller, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims (12)

1. A sound source localization method, comprising:
acquiring indication parameters of each angle, wherein the indication parameters of the angles are used for determining whether the direction corresponding to the angle is a sound source direction;
for any angle, updating the indication parameter of the angle according to the indication parameters in the reference interval corresponding to the angle; the reference interval corresponding to the angle comprises a predefined reference angle interval of the angle and/or a predefined reference time interval of the angle, and the indication parameters in the reference interval corresponding to the angle comprise an indication parameter of the angle;
and determining the sound source direction of the sound according to the updated indication parameters of the angles.
2. The sound source localization method according to claim 1, wherein the reference interval includes a reference angle interval;
the updating the indication parameters of the angles according to the indication parameters of the angles in the reference interval corresponding to the angles specifically includes:
and accumulating or weighting and accumulating the indication parameters in the reference angle interval corresponding to the angle to obtain the indication parameters after the angle is updated.
3. The sound source localization method according to claim 2, wherein the reference interval further includes a reference time interval;
the step of performing accumulation or weighted accumulation calculation on the indication parameters of each angle in the reference angle interval corresponding to the angle to obtain the indication parameters after the angle update specifically includes:
accumulating or weighting and accumulating the indication parameters of each angle in the reference angle interval corresponding to the angle to obtain a space accumulated value of the current time of the angle;
and accumulating or weighting and accumulating the spatial accumulated value of the angle at each time in the reference time interval to obtain the updated indication parameter of the angle.
4. The sound source localization method according to claim 1, wherein the reference interval includes a reference time interval;
the updating the indication parameter of the angle according to the indication parameter in the reference interval corresponding to the angle specifically includes:
and accumulating or weighting and accumulating the indication parameters of the angle at each time in the reference time interval to obtain the updated indication parameters of the angle.
5. The sound source positioning method according to claim 4, wherein the reference interval further includes a reference angle interval;
the accumulating or weighted accumulating the indication parameters of the angle at each time in the reference time interval to obtain the updated indication parameters of the angle specifically includes:
accumulating or weighting and accumulating the indication parameters of the angle at each time in the reference time interval to obtain a time accumulated value of the angle;
and accumulating or weighting and accumulating the time accumulated value of each angle in the reference angle interval corresponding to the angle to obtain the updated indication parameter of the angle.
6. The sound source localization method according to any of claims 1 to 5, wherein the indication parameter is a GCC-PHAT value calculated based on a generalized cross-correlation with phase transform (GCC-PHAT) algorithm, or an SRP-PHAT value calculated based on a steered response power with phase transform (SRP-PHAT) algorithm.
7. The sound source localization method according to claim 6, wherein the indication parameter is an SRP-PHAT value, and the obtaining the indication parameter of each angle specifically includes:
determining the corresponding relation between the time difference of each microphone pair and the GCC-PHAT value;
carrying out interpolation processing on the GCC-PHAT value corresponding to the time difference of each microphone pair;
and calculating the SRP-PHAT value of each angle according to the corresponding relation between the time difference after interpolation and the GCC-PHAT value and the corresponding relation between the time difference after interpolation and the angle.
8. The sound source localization method according to claim 6, wherein the indication parameter is a GCC-PHAT value, and the obtaining the indication parameter for each angle specifically includes:
determining the corresponding relation between the time difference and the GCC-PHAT value;
carrying out interpolation processing on the GCC-PHAT value corresponding to the time difference;
and determining the GCC-PHAT value of each angle according to the corresponding relation between the time difference after interpolation and the GCC-PHAT value and the corresponding relation between the time difference after interpolation and the angle.
9. The sound source localization method according to claim 7 or 8, wherein the interpolating the GCC-PHAT values corresponding to the time difference specifically includes:
calculating the maximum time delay according to the maximum distance between each microphone pair;
and interpolating the GCC-PHAT values over the time-difference interval [-maximum time delay, +maximum time delay].
10. The sound source localization method according to any one of claims 1 to 9, wherein before the obtaining of the indication parameter for each angle, the sound source localization method further comprises:
it is determined that speech is currently present.
11. An electronic device, comprising: at least one processor; and
a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a sound source localization method as claimed in any one of claims 1 to 10.
12. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the sound source localization method according to any one of claims 1 to 10.
CN201911158057.3A 2019-11-22 2019-11-22 Sound source positioning method, electronic device and storage medium Active CN110954866B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911158057.3A CN110954866B (en) 2019-11-22 2019-11-22 Sound source positioning method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911158057.3A CN110954866B (en) 2019-11-22 2019-11-22 Sound source positioning method, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN110954866A true CN110954866A (en) 2020-04-03
CN110954866B CN110954866B (en) 2022-04-22

Family

ID=69978216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911158057.3A Active CN110954866B (en) 2019-11-22 2019-11-22 Sound source positioning method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN110954866B (en)

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020097885A1 (en) * 2000-11-10 2002-07-25 Birchfield Stanley T. Acoustic source localization system and method
US20040037436A1 (en) * 2002-08-26 2004-02-26 Yong Rui System and process for locating a speaker using 360 degree sound source localization
CN102196559A (en) * 2011-05-04 2011-09-21 西安电子科技大学 Method for eliminating channel delay errors based on TDOA (time difference of arrival) positioning
US20130272539A1 (en) * 2012-04-13 2013-10-17 Qualcomm Incorporated Systems, methods, and apparatus for spatially directive filtering
US20140286497A1 (en) * 2013-03-15 2014-09-25 Broadcom Corporation Multi-microphone source tracking and noise suppression
US20160084937A1 (en) * 2014-09-22 2016-03-24 Invensense Inc. Systems and methods for determining position information using acoustic sensing
US9621984B1 (en) * 2015-10-14 2017-04-11 Amazon Technologies, Inc. Methods to process direction data of an audio input device using azimuth values
CN106950542A (en) * 2016-01-06 2017-07-14 中兴通讯股份有限公司 The localization method of sound source, apparatus and system
CN108885877A (en) * 2016-01-22 2018-11-23 弗劳恩霍夫应用研究促进协会 For estimating the device and method of inter-channel time differences
CN106066468A (en) * 2016-05-25 2016-11-02 哈尔滨工程大学 A kind of based on acoustic pressure, the vector array port/starboard discrimination method of vibration velocity Mutual spectrum
CN106226739A (en) * 2016-07-29 2016-12-14 太原理工大学 Merge the double sound source localization method of Substrip analysis
CN107479030A (en) * 2017-07-14 2017-12-15 重庆邮电大学 Based on frequency dividing and improved broad sense cross-correlation ears delay time estimation method
CN109509465A (en) * 2017-09-15 2019-03-22 阿里巴巴集团控股有限公司 Processing method, component, equipment and the medium of voice signal
CN108198568A (en) * 2017-12-26 2018-06-22 太原理工大学 A kind of method and system of more auditory localizations
CN108549052A (en) * 2018-03-20 2018-09-18 南京航空航天大学 A kind of humorous domain puppet sound intensity sound localization method of circle of time-frequency-spatial domain joint weighting

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Li Hongtao et al., "Smoothing of blasting vibration power spectra using window functions", Explosive Materials (爆破器材) *
Wang Buhong et al., "Weighted spatial smoothing algorithm for direction-of-arrival estimation of coherent sources", Journal on Communications (通信学报) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112799019A (en) * 2021-01-26 2021-05-14 安徽淘云科技股份有限公司 Sound source positioning method, sound source positioning device, electronic equipment and storage medium
CN112799019B (en) * 2021-01-26 2023-07-07 安徽淘云科技股份有限公司 Sound source positioning method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110954866B (en) 2022-04-22

Similar Documents

Publication Publication Date Title
US10984816B2 (en) Voice enhancement using depth image and beamforming
CN110716180B (en) Audio positioning method and device based on face detection
CN111599371B (en) Voice adding method, system, device and storage medium
US20200110173A1 (en) Obstacle detection method and device
US11328518B2 (en) Method and apparatus for outputting information
TWI711035B (en) Method, device, audio interaction system, and storage medium for azimuth estimation
US20160322062A1 (en) Speech processing method and speech processing apparatus
CN110495185B (en) Voice signal processing method and device
WO2016200734A1 (en) Optimizing capture of focus stacks
CN110954866B (en) Sound source positioning method, electronic device and storage medium
EP4266308A1 (en) Voice extraction method and apparatus, and electronic device
CN111487582A (en) Method and device for obtaining Bluetooth array antenna parameter calibration model and arrival angle
CN110121132A (en) The electronic device and its application method of microphone array
KR101767925B1 (en) Apparatus and method for estimating location of sound source
CN109387205B (en) Method, device and storage medium for acquiring attitude angle change amplitude
CN112859000B (en) Sound source positioning method and device
CN111103807A (en) Control method and device for household terminal equipment
CN109688512B (en) Pickup method and device
CN117169812A (en) Sound source positioning method based on deep learning and beam forming
CN111857366A (en) Method and device for determining double-click action of earphone and earphone
CN113223552B (en) Speech enhancement method, device, apparatus, storage medium, and program
CN113390425B (en) Map data processing method, device, equipment and storage medium
CN112634487B (en) Method and apparatus for outputting information
CN115900638B (en) Obstacle course angle information generation method and device, electronic equipment and readable medium
CN117289208B (en) Sound source positioning method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210207

Address after: 200245 2nd floor, building 2, no.1508, Kunyang Road, Minhang District, Shanghai

Applicant after: Dalu Robot Co.,Ltd.

Address before: 610094 West Section of Fucheng Avenue, Chengdu High-tech District, Sichuan Province

Applicant before: CLOUDMINDS (CHENGDU) TECHNOLOGIES Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address

Address after: 200245 Building 8, No. 207, Zhongqing Road, Minhang District, Shanghai

Patentee after: Dayu robot Co.,Ltd.

Address before: 200245 2nd floor, building 2, no.1508, Kunyang Road, Minhang District, Shanghai

Patentee before: Dalu Robot Co.,Ltd.

CP03 Change of name, title or address