CN106872945B - Sound source positioning method and device and electronic equipment - Google Patents


Info

Publication number
CN106872945B
CN106872945B (application CN201710259736.4A)
Authority
CN
China
Prior art keywords
input signal
signal
distance
noise
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710259736.4A
Other languages
Chinese (zh)
Other versions
CN106872945A (en)
Inventor
徐荣强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Horizon Information Technology Co Ltd
Original Assignee
Beijing Horizon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Horizon Information Technology Co Ltd filed Critical Beijing Horizon Information Technology Co Ltd
Priority application: CN201710259736.4A
Publication of CN106872945A
Application granted
Publication of CN106872945B
Legal status: Active


Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01S: RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/22Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

A sound source positioning method, a sound source positioning device and electronic equipment are disclosed. The method comprises the following steps: receiving a first input signal collected by a first sound collection device on a first device and a second input signal collected by a second sound collection device on a second device, respectively, each of the first input signal and the second input signal including a signal component from a signal source and a noise component from a noise source; determining a reference noise from noise components in the first and second input signals; extracting signal components in the first input signal and the second input signal respectively according to the reference noise; and determining the location of the signal source from signal components in the first input signal and the second input signal. Thus, accurate localization of the sound source can be achieved.

Description

Sound source positioning method and device and electronic equipment
Technical Field
The present application relates to the field of audio technology, and more particularly, to a sound source localization method, apparatus, electronic device, computer program product, and computer-readable storage medium.
Background
There are many problems in applying voice control to far-field devices; in particular, voice control in interference-laden environments places high demands on the system.
For example, during voice control, if only the microphone in the remote controller is used for voice acquisition, the remote controller generally carries a single microphone, and a single microphone is limited when processing a non-stationary noise source: it cannot separate the noise out. If, instead, only the microphone on the controlled device is used, the controlled device generally carries a microphone array; a microphone array can separate spatially distinct noise, but if the noise source and the signal source (the user) lie in the same direction, the array cannot separate the two either, so the signal source cannot be accurately localized and ranged.
Therefore, the existing sound source localization method has defects.
Disclosure of Invention
The present application is proposed to solve the above-mentioned technical problems. Embodiments of the present application provide a sound source localization method, apparatus, electronic device, computer program product, and computer-readable storage medium, which may achieve accurate localization of a sound source.
According to an aspect of the present application, there is provided a sound source localization method including: receiving a first input signal collected by a first sound collection device on a first device and a second input signal collected by a second sound collection device on a second device, respectively, each of the first input signal and the second input signal including a signal component from a signal source and a noise component from a noise source; determining a reference noise from noise components in the first and second input signals; extracting signal components in the first input signal and the second input signal respectively according to the reference noise; and determining the location of the signal source from signal components in the first input signal and the second input signal.
According to another aspect of the present application, there is provided a sound source localization apparatus including: a signal receiving unit, configured to receive a first input signal collected by a first sound collection device on a first device and a second input signal collected by a second sound collection device on a second device, respectively, where each of the first input signal and the second input signal includes a signal component from a signal source and a noise component from a noise source, and a distance between the first sound collection device and the signal source is smaller than a distance between the second sound collection device and the signal source; a reference determination unit for determining a reference noise from noise components in the first and second input signals; a component extraction unit for extracting signal components in the first input signal and the second input signal, respectively, according to the reference noise; and a position determining unit for determining the position of the signal source from the signal components in the first input signal and the second input signal.
According to another aspect of the present application, there is provided an electronic device including: a processor; a memory; and computer program instructions stored in the memory, which when executed by the processor, cause the processor to perform the sound source localization method described above.
According to another aspect of the present application, a computer program product is provided, comprising computer program instructions which, when executed by a processor, cause the processor to perform the sound source localization method described above.
According to another aspect of the present application, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the sound source localization method described above.
Compared with the prior art, the sound source localization method, apparatus, electronic device, computer program product, and computer-readable storage medium according to the embodiments of the present application receive a first input signal collected by a first sound collection device on a first device and a second input signal collected by a second sound collection device on a second device, each input signal including a signal component from a signal source and a noise component from a noise source; determine a reference noise from the noise components in the two input signals; extract the signal components from the two input signals according to the reference noise; and determine the location of the signal source from those signal components. Therefore, the signal component and the noise component of each input signal can be well separated, and the position of the sound source accurately determined on that basis.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 illustrates a schematic diagram of an application scenario of a sound source localization operation according to an embodiment of the present application.
Fig. 2 illustrates a flow chart of a sound source localization method according to an embodiment of the present application.
Fig. 3 illustrates a flow chart of the reference noise determination step according to an embodiment of the present application.
FIG. 4 illustrates a schematic diagram of voice activity detection according to an embodiment of the present application.
Fig. 5 illustrates a flow chart of signal component extraction steps according to an embodiment of the application.
Fig. 6 illustrates a schematic diagram of an adaptive filter according to an embodiment of the application.
Fig. 7 illustrates a flow chart of signal source localization steps according to an embodiment of the present application.
Fig. 8 illustrates a schematic diagram of microphone array orientation according to an embodiment of the application.
Fig. 9 illustrates a block diagram of a sound source localization apparatus according to an embodiment of the present application.
FIG. 10 illustrates a block diagram of an electronic device in accordance with an embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.
Summary of the application
As described above, existing sound source localization methods have drawbacks. For example, collecting voice commands with only the single microphone in the remote controller, or only the microphone array in the controlled device, cannot eliminate non-stationary noises such as television sound, stereo sound, and other voices well, and therefore cannot accurately localize and range the user.
In view of this technical problem, the present application provides a sound source localization method, apparatus, electronic device, computer program product, and computer-readable storage medium, which effectively integrate the microphone in the remote controller and the microphone in the controlled device into one complete microphone-enhancement system, handle non-stationary noise signals well, and obtain the distances between the speaking user and both the remote controller and the controlled device.
It should be noted that the above basic concept of the present application can be applied not only to remote-control applications involving a remote controller and a controlled device, but also to other systems, as long as two or more devices each have a sound collection device. For example, the present application is equally applicable to scenarios in which the two devices have no master-slave relationship but are functionally independent. In addition, the basic concept can be applied not only to sound source localization of voices, but also to localization of various other sound sources, such as animals, robots, and the like.
Having described the general principles of the present application, various non-limiting embodiments of the present application will now be described with reference to the accompanying drawings.
Exemplary System
Fig. 1 illustrates a schematic diagram of an application scenario of a sound source localization operation according to an embodiment of the present application.
As shown in fig. 1, an application scenario for a sound source localization operation includes a first device 100, a second device 200, and a sound source 300.
The first device 100 may be any type of electronic device comprising first sound collecting means. The second device 200 may be any type of electronic device, which may be of the same or different type as the first device 100, and which comprises second sound collecting means.
For example, a sound collection device, which may be a single microphone or a microphone array, may be used to collect the audio signal of a sound source (a signal source or a noise source). The microphone may be an omnidirectional microphone and/or a directional microphone. An omnidirectional microphone has essentially the same sensitivity to sound arriving from any angle; its capsule is designed on the pressure-sensing principle, so the diaphragm responds only to external pressure. A directional microphone is designed mainly on the pressure-gradient principle: the diaphragm receives pressure on both its front and back (through a small opening at the back of the capsule), so it responds differently to pressure from different directions, which gives the microphone its directivity. A microphone array is a system of several microphones MIC1 to MICn (n being a natural number greater than or equal to 2) with non-identical pickup areas, used to sample and process the spatial characteristics of a sound field. Depending on the relative positions of the microphones, arrays can be classified as linear arrays, whose element centers lie on one straight line; planar arrays, whose element centers are distributed on a plane; and spatial arrays, whose element centers are distributed in three-dimensional space.
The sound source 300 may be any type of sound source, and may include a signal source that emits the signal component of interest and a noise source that emits the noise component to be cancelled. For example, the signal source may be animate or inanimate: animate signal sources include humans, animals, and the like, while inanimate signal sources include robots, televisions, stereos, and the like.
It should be noted that the above application scenario is shown only for convenience in understanding the spirit and principles of the present application, and the embodiments of the present application are not limited to it. Rather, embodiments of the present application may be applied to any applicable scenario. For example, there may be two or more devices, and one or more sound sources.
Exemplary method
In the following, a sound source localization method according to an embodiment of the present application is described with reference to fig. 2 in conjunction with the application scenario of fig. 1.
Fig. 2 illustrates a flow chart of a sound source localization method according to an embodiment of the present application.
As shown in fig. 2, a sound source localization method according to an embodiment of the present application may include:
in step S110, a first input signal collected by a first sound collection device on a first device and a second input signal collected by a second sound collection device on a second device are received, respectively, each of the first input signal and the second input signal including a signal component from a signal source and a noise component from a noise source.
For example, to locate the sound source 300, input signals collected by sound collection devices on both devices 100 and 200 may be received for subsequent processing.
In one example, in order to obtain an optimal separation effect of the signal component and the noise component, a first distance from the signal source to the first sound collection device may be different from a second distance from the signal source to the second sound collection device. For example, the first distance may be less than the second distance.
Because the distances from the signal source to the two sound collection devices are different, the input signal collected by the first sound collection device is also different from the input signal collected by the second sound collection device.
For example, since the signal source is closer to the first sound collection device and farther from the second sound collection device, the signal component in the input signal collected by the first sound collection device is larger than the signal component in the input signal collected by the second sound collection device. In addition, since the noise source is a background noise source having substantially the same distance from the first sound collection device and the second sound collection device, the noise component in the input signal collected by the first sound collection device is substantially the same as the noise component in the input signal collected by the second sound collection device.
With the above characteristics, it is possible to perform signal separation and processing on two input signals to extract a signal component and a noise component therein, and further to use for sound source localization.
In one example, the first sound collection device on the first device and the second sound collection device on the second device may have different sound conversion capabilities, which would mask the signal-component differences caused by the different distances. To prevent this, a parameter calibration may first be performed on the two sound collection devices.
Therefore, as shown in fig. 2, before step S110, the sound source localization method according to the embodiment of the present application may further include:
in step S105, the first sound collection device and the second sound collection device are calibrated so that they have the same sound conversion capability.
The parameters of the two sound collection devices can be adjusted so that they maintain the same conversion capability.
In the following, the sound source localization method will be explained in a specific example, in which it is assumed that the first device is a remote controller equipped with a microphone or a microphone array, the second device is a controlled device (e.g., a television, etc.) corresponding thereto equipped with a microphone or a microphone array, and the sound source is a user for issuing a voice control command.
For example, the microphone system on the remote control and the microphone system on the television may first be parametrically calibrated so that both retain the same conversion capability. The aim is to maintain the same amplification gain and delay compensation across the inputs and outputs of the remote control and of the television.
For example, if the microphone system on the remote control has a higher amplification characteristic and the microphone system on the television a lower one, such that the same input signal is received 3 dB higher by the remote control than by the television, the two may be gain-compensated to ensure that their amplification factors are the same, thereby ensuring the accuracy of the subsequent algorithm.
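As a minimal illustration of such gain compensation (an RMS-based sketch, not the patent's actual calibration procedure; the 440 Hz calibration tone and the channel names are assumptions), the level offset between two recordings of the same signal can be estimated and applied as a corrective gain:

```python
import numpy as np

def gain_offset_db(ref_recording, other_recording):
    """Estimate the gain offset (in dB) between two recordings of the
    same calibration signal, using their RMS levels."""
    rms_ref = np.sqrt(np.mean(ref_recording ** 2))
    rms_other = np.sqrt(np.mean(other_recording ** 2))
    return 20.0 * np.log10(rms_other / rms_ref)

def compensate(signal, offset_db):
    """Scale a signal so that its level matches the reference channel."""
    return signal * (10.0 ** (-offset_db / 20.0))

# Example: the "television" channel is 3 dB quieter than the remote.
t = np.linspace(0, 1, 16000, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)
remote = tone                      # reference channel
tv = tone * (10 ** (-3 / 20))      # same tone, 3 dB down

offset = gain_offset_db(remote, tv)    # about -3 dB
tv_matched = compensate(tv, offset)    # levels now agree
```

After this step the two channels differ only through the acoustic path, which is what the subsequent distance analysis relies on.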
Next, a voice control command issued by the user (e.g., turn on the television, change the channel, etc.) may be received using the two microphone systems. Since the two microphone systems are calibrated, the acquired input signals accurately reflect the difference in distance between each microphone system and the user.
In step S120, a reference noise is determined according to noise components in the first input signal and the second input signal.
Next, the input signals collected by the sound collection devices on both devices may be analyzed to determine a reference noise for signal-to-noise separation.
Fig. 3 illustrates a flow chart of the reference noise determination step according to an embodiment of the present application.
As shown in fig. 3, the step S120 may include:
in sub-step S121, a separation operation is performed on the first input signal and the second input signal to obtain a noisy signal segment and a pure noise segment in the first input signal, and a noisy signal segment and a pure noise segment in the second input signal, respectively.
For example, while the user is issuing voice commands, Voice Activity Detection (VAD) techniques may be used to separate the noisy speech segments from the pure noise segments in each input signal.
Voice activity detection, also known as voice endpoint detection or voice boundary detection, refers to detecting the presence or absence of voice in a noisy environment. It is generally used in voice processing systems such as voice coding and voice enhancement, where it helps reduce the voice coding rate, save communication bandwidth, lower the energy consumption of mobile devices, and improve recognition rates. A representative VAD method is ITU-T G.729 Annex B.
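As a toy illustration of the idea (a frame-energy detector, far simpler than G.729 Annex B; the frame length, threshold ratio, and test signal are assumptions), a frame can be flagged as speech when its energy clearly exceeds the noise floor:

```python
import numpy as np

def energy_vad(signal, frame_len=160, threshold_ratio=2.0):
    """Toy frame-energy VAD: a frame is flagged as 'speech' when its
    energy exceeds threshold_ratio times the median frame energy
    (taken here as the noise floor).  Returns one boolean per frame."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1)
    noise_floor = np.median(energy)
    return energy > threshold_ratio * noise_floor

# Example: quiet noise with a louder tone burst in the middle.
rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(16000)
x[6000:10000] += np.sin(2 * np.pi * 300 * np.arange(4000) / 16000)
flags = energy_vad(x)   # True only for frames inside the burst
```

The True frames correspond to the noisy speech segments and the False frames to the pure noise segments used below.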
FIG. 4 illustrates a schematic diagram of voice activity detection according to an embodiment of the present application.
As shown in fig. 4, using VAD segmentation, the first input signal M1 can be separated into a noisy speech segment M1_S+N and a noise segment M1_N; similarly, the second input signal M2 can be separated into a noisy speech segment M2_S+N and a noise segment M2_N. That is, the speech segments contain both speech and noise, while the noise segments contain only noise.
It can be seen that, because in normal use the user tends to hold the remote control with his mouth closer to it and farther from the television, the amplitude of the noisy speech segment M1_S+N in the first input signal M1 is larger than that of the noisy speech segment M2_S+N in the second input signal M2, while the amplitude of the noise segment M1_N in M1 is equal, or substantially equal, to that of the noise segment M2_N in M2.
In sub-step S122, the reference noise is determined at least from pure noise segments in the second input signal.
For example, since the television is farther from the source user, it is less affected by the user's speech; that is, its noise segment M2_N is usually closer to the true background noise, so M2_N can be used directly as the reference noise.
In addition, since the noise segments of M1 and M2 are broadly the same, a noise reference can be generated from either.
Alternatively, the noisy speech segment M1_S+N in the first input signal M1 may first be aligned in the time domain with the noisy speech segment M2_S+N in the second input signal M2; the noise component is then calculated by subtracting the two, and the result is taken as the reference noise.
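A minimal sketch of the direct approach described above, concatenating the far microphone's noise-only frames (as selected by VAD flags) into a reference noise; the frame length and flag layout are assumptions:

```python
import numpy as np

def reference_noise(m2, vad_flags, frame_len=160):
    """Concatenate the noise-only frames of the far input M2 (frames
    whose VAD flag is False) to form the reference noise."""
    n_frames = len(vad_flags)
    frames = m2[: n_frames * frame_len].reshape(n_frames, frame_len)
    return frames[~np.asarray(vad_flags, dtype=bool)].ravel()

# Three frames; the middle one is flagged as speech and excluded.
m2 = np.arange(480, dtype=float)
ref = reference_noise(m2, [False, True, False])
```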
In step S130, signal components in the first input signal and the second input signal are respectively extracted according to the reference noise.
The resulting reference noise may be used to perform a separation operation on the first input signal and the second input signal, respectively, to determine signal components therein.
For example, the above separation operation may be implemented using an adaptive filter.
An adaptive filter is a digital filter that automatically adjusts its own performance according to the input signal while processing it. In some applications the parameters required for processing, such as the characteristics of a noise signal, are not known in advance, so adaptive coefficients must be used. In that case an adaptive filter is typically employed, using feedback to adjust both the filter coefficients and the frequency response. In general, the adaptation process involves an algorithm that uses a cost function to decide how to alter the filter coefficients so as to reduce the cost of the next iteration. The cost function is a criterion for optimum filter performance, such as the ability to suppress the noise component of the input signal.
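As an illustrative sketch of such a filter (a plain time-domain LMS noise canceller; the filter order and step size are assumptions, and a real system would likely use a normalized or frequency-domain variant), the reference noise drives the filter and the residual is the signal estimate:

```python
import numpy as np

def lms_noise_canceller(primary, reference, order=16, mu=0.01):
    """LMS adaptive noise canceller: the filter learns to map the
    reference noise onto the noise component of the primary input,
    so the residual e[n] = primary[n] - w.x[n] estimates the speech."""
    w = np.zeros(order)
    out = np.zeros(len(primary))
    for n in range(order - 1, len(primary)):
        x = reference[n - order + 1 : n + 1][::-1]  # newest sample first
        e = primary[n] - w @ x                      # residual after cancelling
        w += 2.0 * mu * e * x                       # LMS coefficient update
        out[n] = e
    return out

# Example: the primary contains only an FIR-filtered copy of the
# reference, so after convergence the residual should be close to zero.
rng = np.random.default_rng(1)
ref = rng.standard_normal(8000)
prim = 0.6 * ref + 0.3 * np.concatenate(([0.0], ref[:-1]))
residual = lms_noise_canceller(prim, ref)
```

In the setting of this patent the primary input would be a noisy speech segment and the residual the extracted speech component.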
Fig. 5 illustrates a flow chart of signal component extraction steps according to an embodiment of the application.
As shown in fig. 5, the step S130 may include:
in sub-step S131, the reference noise is input to an adaptive filter.
In sub-step S132, parameters of the adaptive filter are adjusted to extract signal components in the first input signal and the second input signal from noisy signal segments in the first input signal and the second input signal, respectively.
Fig. 6 illustrates a schematic diagram of an adaptive filter according to an embodiment of the application.
As shown in fig. 6, for example, the reference noise obtained in sub-step S122 may be provided as an input to the adaptive filter. Combining the reference noise with the adaptive filter, the noisy speech segment M1_S+N in the first input signal M1 and the noisy speech segment M2_S+N in the second input signal M2 can be filtered to extract the speech components M1_S and M2_S of M1 and M2.
In step S140, the position of the signal source is determined according to the signal components in the first input signal and the second input signal.
For example, the signal source may be located based on the resulting signal components.
Fig. 7 illustrates a flow chart of signal source localization steps according to an embodiment of the present application.
As shown in fig. 7, the step S140 may include:
in sub-step S141, a distance difference between a first distance from the signal source to the first sound collection device and a second distance from the signal source to the second sound collection device is determined according to a phase difference between signal components in the first input signal and the second input signal.
For example, the substep S141 may include: performing a cross-correlation analysis on signal components in the first input signal and the second input signal to determine a phase difference therebetween; determining a delay difference between the two according to the phase difference; and calculating the distance difference from the delay difference.
For example, a generalized cross-correlation analysis may be performed on the speech components M1_S and M2_S of the first input signal M1 and the second input signal M2, and the phase difference between the two analyzed to obtain the delay difference Δt. From this, the relationship between the distance L1 from the sound-source user to the remote controller and the distance L2 from the user to the television is:
Δt * 340 m/s = L1 - L2.
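A minimal sketch of the delay-difference estimation (plain cross-correlation rather than the generalized form named above; the 16 kHz sampling rate, decaying test tone, and 24-sample delay are assumptions):

```python
import numpy as np

def delay_samples(near, far):
    """Delay (in samples) of `far` relative to `near`, taken from the
    peak of their full cross-correlation."""
    corr = np.correlate(far, near, mode="full")
    return int(np.argmax(corr)) - (len(near) - 1)

fs = 16000
t = np.arange(fs) / fs
s = np.sin(2 * np.pi * 200 * t) * np.exp(-3 * t)   # decaying tone as "speech"
d = 24                                             # true delay in samples
far = np.concatenate((np.zeros(d), s[:-d]))        # far mic hears it later

lag = delay_samples(s, far)   # recovers the 24-sample delay
dt = lag / fs                 # delay difference in seconds
dist_diff = dt * 340.0        # L1 - L2 magnitude in metres
```

A production system would typically use a phase-weighted (GCC-PHAT style) correlation for robustness against reverberation.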
in sub-step S142, a multiple relationship between the first distance and the second distance is determined from a difference in amplitude between signal components in the first input signal and the second input signal.
For example, the substep S142 may comprise: calculating a magnitude difference between signal components in the first input signal and the second input signal; and calculating the multiple relationship according to the amplitude difference and the distance amplitude relationship.
For example, a short-time power spectrum may be computed for the speech components M1_S and M2_S of the first input signal M1 and the second input signal M2, and the amplitude attenuation between the two analyzed to obtain the amplitude difference Δp. Then, according to the law of sound-wave attenuation with distance, the multiple relationship between the distances L1 and L2 can be determined. Specifically, under normal conditions the energy of a sound wave is attenuated by about 6 dB each time its distance from the microphone doubles, so the ratio of the distances can be determined from the energy difference. In other words, from the energy difference Δp, the relationship between the distance L1 and the distance L2 can be written as:
L1 = k * L2.
in sub-step S143, the first distance and the second distance are determined according to the distance difference and the multiple relation.
By combining the above two ways, the distance L1 from the sound source user to the remote controller and the distance L2 from the sound source user to the television can be obtained.
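Combining the two relations is a small linear solve; the following sketch uses assumed example numbers (remote 1 m away, television 2 m away):

```python
def solve_distances(dt, k, c=340.0):
    """Solve  dt * c = L1 - L2  together with  L1 = k * L2.
    dt is the delay of the near (remote) signal relative to the far
    (television) signal, negative when the remote is closer; k = L1 / L2."""
    L2 = dt * c / (k - 1.0)
    L1 = k * L2
    return L1, L2

# Remote at 1 m, TV at 2 m: L1 - L2 = -1 m, so dt = -1/340 s and k = 0.5.
L1, L2 = solve_distances(-1.0 / 340.0, 0.5)
```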
However, this determines only the distances from the sound source to the first and second devices, i.e. only a range of possible locations, not the complete position of the sound source. To resolve it, the first sound collection device and/or the second sound collection device may include a microphone array, which can be used to determine the angle of the sound source relative to the array.
Therefore, as shown in fig. 7, the step S140 may further include:
in step S144, in response to the first sound collection device and/or the second sound collection device including a microphone array, determining a relative angle of the signal source and an array element center of the microphone array using the microphone array.
Fig. 8 illustrates a schematic diagram of microphone array orientation according to an embodiment of the application.
For the sake of brevity, a two-microphone array is described as an example.
As shown in fig. 8, the microphone array includes two microphones, which are located at positions a and B, respectively.
For example, the input signals received by the two microphones in the array may be taken separately and their correlation computed, from which the time for the plane wave to travel the extra path from position C to position A, i.e. the delay Δd between the microphones, can be deduced. The extra path length is then:
CA = Δd * 340 m/s.
since the separation AB between the two microphones is known, the relative angle of the signal source and the microphone array can be found as follows:
∠CAB = arccos(CA / AB) = arccos(Δd * 340 m/s / AB), where ∠CAB is the angle between the source direction and the line AB joining the microphones.
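A minimal sketch of this bearing computation under the far-field relation CA = Δd * 340 m/s above (the 10 cm microphone spacing is an assumption; the angle is measured from the line AB joining the microphones):

```python
import math

def arrival_angle_deg(delay_s, spacing_m, c=340.0):
    """Far-field two-microphone bearing: the extra path is CA = delay * c
    and cos(theta) = CA / AB, so theta = arccos(delay * c / spacing)."""
    ratio = delay_s * c / spacing_m
    ratio = max(-1.0, min(1.0, ratio))   # clamp against rounding overshoot
    return math.degrees(math.acos(ratio))

# Zero delay: the source is broadside, 90 degrees from the mic axis;
# a delay of spacing / c puts it end-fire, at 0 degrees.
broadside = arrival_angle_deg(0.0, 0.10)
endfire = arrival_angle_deg(0.10 / 340.0, 0.10)
```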
further, as shown in fig. 7, the step S140 may further include:
in step S145, the position of the sound source is determined by integrating the first distance, the second distance, and the relative angle.
For example, from the results of the orientation of the microphone array, in combination with L1 and L2, the precise angle and position of the sound source user relative to the microphone array can be determined.
Since in this particular example the second device is a remotely controlled device (e.g. a television) that may already be provided with a microphone array for receiving voice instructions, for cost reasons this built-in microphone array may be reused for the orientation operation. In addition, since the position of the second device is often fixed, i.e. its position coordinates are known, the position coordinates of the sound source can be determined directly once the distance and angle of the sound source relative to the second device are known.
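For illustration only, this final combination can be sketched as a polar-to-Cartesian conversion in the second device's coordinate frame (the choice of the array axis as the x-axis is our assumption, not prescribed by the patent):

```python
import math

def source_position(device_xy, distance, angle_deg):
    """Convert (distance, angle) relative to the second device's array center,
    whose coordinates device_xy are known, into absolute coordinates. The
    angle is measured from the x-axis of an assumed reference frame."""
    theta = math.radians(angle_deg)
    return (device_xy[0] + distance * math.cos(theta),
            device_xy[1] + distance * math.sin(theta))
```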
In one example, the sound source localization method according to the embodiment of the present application may be implemented on either or both of the first device 100 and the second device 200. At this time, the first device 100 and the second device 200 have a communication connection with each other, and can receive the input signal collected by the sound collection device on the other device and perform joint processing with the input signal collected by the sound collection device on the present device to locate the sound source.
It should be noted that, although the specific example is described with the first device as a remote controller and the second device as an electronic device, the present application is not limited to this. For example, the second device may be another device located far away from the user that requires voice control, such as a refrigerator, an air conditioner, etc., while the first device may be another device that is typically located near the user under normal usage conditions, such as a portable device (a cell phone, a smart wristband, smart glasses, etc.), or even a stationary device that is temporarily located near the user (e.g., a smart sofa on which the user sits).
In another example, the sound source localization method according to the embodiment of the present application may also be implemented on a separate sound source localization device other than the first device 100 and the second device 200. At this time, the sound source positioning device is in communication connection with the first device 100 and the second device 200, respectively, and can receive input signals collected by sound collecting devices on the two devices, and perform joint processing on the two input signals to position the sound source.
It can be seen that with the sound source localization method according to the embodiment of the present application, a first input signal collected by a first sound collection device on a first device and a second input signal collected by a second sound collection device on a second device may be received, respectively, each of the first input signal and the second input signal including a signal component from a signal source and a noise component from a noise source; determining a reference noise from noise components in the first and second input signals; extracting signal components in the first input signal and the second input signal respectively according to the reference noise; and determining the location of the signal source from signal components in the first input signal and the second input signal. Therefore, it is possible to well separate the signal component and the noise component in the input signal and further accurately determine the position of the sound source based thereon.
Specifically, the sound source localization method according to the embodiment of the present application has the following benefits:
1) unsteady noise can be better suppressed, and separation of sound source signals and noise is realized;
2) in combination with the microphone array of the device, the sound source can be accurately oriented and ranged.
Exemplary devices
Next, a sound source localization apparatus according to an embodiment of the present application is described with reference to fig. 9.
Fig. 9 illustrates a block diagram of a sound source localization apparatus according to an embodiment of the present application.
As shown in fig. 9, the sound source localization apparatus 400 according to an embodiment of the present application may include: a signal receiving unit 410, configured to receive a first input signal collected by a first sound collection device on a first device and a second input signal collected by a second sound collection device on a second device, respectively, where each of the first input signal and the second input signal includes a signal component from a signal source and a noise component from a noise source, and a distance between the first sound collection device and the signal source is smaller than a distance between the second sound collection device and the signal source; a reference determination unit 420 for determining a reference noise from noise components in the first input signal and the second input signal; a component extracting unit 430 for extracting signal components in the first input signal and the second input signal, respectively, according to the reference noise; and a position determining unit 440 for determining a position of the signal source from signal components in the first input signal and the second input signal.
In one example, a first distance of the signal source to the first sound collection device may be less than a second distance of the signal source to the second sound collection device.
In one example, the sound source localization apparatus 400 may further include: the device calibration unit is used for calibrating the first sound acquisition device and the second sound acquisition device before receiving a first input signal acquired by a first sound acquisition device on first equipment and a second input signal acquired by a second sound acquisition device on second equipment respectively so as to enable the first sound acquisition device and the second sound acquisition device to have the same sound conversion capability.
In one example, the reference determination unit 420 may perform a separation operation on the first input signal and the second input signal to obtain a noisy signal segment and a pure noise segment in the first input signal and a noisy signal segment and a pure noise segment in the second input signal, respectively; and determining the reference noise from at least a noise-only segment in the second input signal.
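One simple way to realize such a separation operation is an energy threshold over short frames (a rough sketch; a practical system would use a proper voice activity detector, and the frame length and threshold below are arbitrary illustration values):

```python
import numpy as np

def split_segments(x, frame_len, energy_threshold):
    """Partition a channel into noisy-signal frames (speech present) and
    pure-noise frames by short-time energy; the pure-noise frames of the
    second input signal can then serve as the reference noise."""
    noisy, noise_only = [], []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = np.asarray(x[start:start + frame_len], dtype=float)
        if np.mean(frame ** 2) > energy_threshold:
            noisy.append(frame)
        else:
            noise_only.append(frame)
    return noisy, noise_only
```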
In one example, component extraction unit 430 may input the reference noise to an adaptive filter; and adjusting parameters of the adaptive filter to extract signal components in the first input signal and the second input signal from noisy signal segments in the first input signal and the second input signal, respectively.
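The adaptive filtering step can be sketched with a normalized LMS (NLMS) filter (the filter length and step size below are illustrative choices, not values from the patent):

```python
import numpy as np

def nlms_extract(primary, reference, n_taps=16, mu=0.5, eps=1e-8):
    """Adaptive noise cancellation: an NLMS filter shapes the reference noise
    to match the noise component in the primary (noisy) channel, and the
    residual error e is taken as the extracted signal component."""
    w = np.zeros(n_taps)
    out = np.zeros(len(primary))
    for n in range(n_taps - 1, len(primary)):
        u = reference[n - n_taps + 1:n + 1][::-1]  # most recent reference samples
        y = float(w @ u)                           # estimated noise in primary channel
        e = primary[n] - y                         # residual = signal estimate
        w += mu * e * u / (float(u @ u) + eps)     # normalized LMS weight update
        out[n] = e
    return out
```

When the primary channel contains only noise correlated with the reference, the residual converges toward zero; any speech component uncorrelated with the reference survives in the output.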
In one example, the position determination unit 440 may determine a distance difference between a first distance from the signal source to the first sound collection device and a second distance from the signal source to the second sound collection device according to a phase difference between signal components in the first input signal and the second input signal; determining a multiple relationship between the first distance and the second distance from a difference in amplitude between signal components in the first input signal and the second input signal; and determining the first distance and the second distance according to the distance difference and the multiple relation.
In one example, position determination unit 440 may perform a cross-correlation analysis on signal components in the first input signal and the second input signal to determine a phase difference between the two; determining a delay difference between the two according to the phase difference; and calculating the distance difference from the delay difference.
In one example, the position determination unit 440 may calculate an amplitude difference between signal components in the first input signal and the second input signal; and calculating the multiple relationship from the amplitude difference and distance amplitude relationship.
In one example, the position determining unit 440 may also determine a relative angle of the signal source to an element center of a microphone array using the microphone array in response to the first sound collection device and/or the second sound collection device including the microphone array.
In one example, the position determining unit 440 may further determine the position of the sound source by integrating the first distance, the second distance, and the relative angle.
The specific functions and operations of the respective units and modules in the sound source localization apparatus 400 described above have been described in detail in the sound source localization method described above with reference to fig. 1 to 8, and thus, a repetitive description thereof will be omitted.
As described above, the sound source localization apparatus 400 according to the embodiment of the present application can be implemented in a sound source localization device, which can be either one or both of the first device 100 and the second device 200 as shown in fig. 1, or a stand-alone device independent therefrom.
In one example, the sound source localization apparatus 400 according to the embodiment of the present application may be integrated into the sound source localization device as a software module and/or a hardware module. For example, the sound source localization apparatus 400 may be a software module in the operating system of the sound source localization device, or may be an application developed for the sound source localization device; of course, the sound source localization apparatus 400 may equally well be one of the many hardware modules of the sound source localization device.
Alternatively, in another example, the sound source localization apparatus 400 and the sound source localization device may be separate devices, and the sound source localization apparatus 400 may be connected to the sound source localization device through a wired and/or wireless network and exchange information with it according to an agreed data format.
Exemplary electronic device
Next, an electronic apparatus according to an embodiment of the present application is described with reference to fig. 10. The electronic device may be either or both of the first device 100 and the second device 200, or a stand-alone device separate from them that may communicate with the first device and the second device to receive the collected input signals therefrom.
FIG. 10 illustrates a block diagram of an electronic device in accordance with an embodiment of the present application.
As shown in fig. 10, the electronic device 10 includes one or more processors 11 and memory 12.
The processor 11 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.
Memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory (cache). The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 11 to implement the sound source localization methods of the various embodiments of the present application described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
For example, when the electronic device is the first device 100 or the second device 200, the input device 13 may be a microphone or a microphone array as described above for capturing an input signal of a sound source. When the electronic device is a stand-alone device, the input means 13 may be a communication network connector for receiving the acquired input signals from the first device 100 and the second device 200.
The input device 13 may also include, for example, a keyboard, a mouse, and the like.
The output device 14 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 14 may include, for example, a display, speakers, a printer, and a communication network and its connected remote output devices, among others.
Of course, for simplicity, only some of the components of the electronic device 10 relevant to the present application are shown in fig. 10, and components such as buses, input/output interfaces, and the like are omitted. In addition, the electronic device 10 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the above-described methods and devices, embodiments of the present application may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in the sound source localization method according to various embodiments of the present application described in the above-mentioned "exemplary methods" section of the present description.
The computer program product may be written with program code for performing the operations of embodiments of the present application in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the steps in the sound source localization method according to various embodiments of the present application described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by those skilled in the art, these devices, apparatuses, and systems may be connected, arranged, and configured in any manner. Words such as "including", "comprising", "having", and the like are open-ended words that mean "including, but not limited to", and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or", unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (12)

1. A sound source localization method, comprising:
receiving a first input signal collected by a first sound collection device on a first device and a second input signal collected by a second sound collection device on a second device, respectively, each of the first input signal and the second input signal including a signal component from a signal source and a noise component from a noise source, the first input signal collected by the first sound collection device being different from the second input signal collected by the second sound collection device;
determining a reference noise from noise components in the first and second input signals;
extracting signal components in the first input signal and the second input signal respectively according to the reference noise; and
determining a position of the signal source from signal components in the first input signal and the second input signal,
wherein determining the location of the signal source from the signal components in the first input signal and the second input signal comprises:
determining a distance difference between a first distance from the signal source to the first sound collection device and a second distance from the signal source to the second sound collection device according to a phase difference between signal components in the first input signal and the second input signal;
determining a multiple relationship between the first distance and the second distance from a difference in amplitude between signal components in the first input signal and the second input signal; and
determining the first distance and the second distance according to the distance difference and the multiple relation.
2. The method of claim 1, wherein a first distance from the signal source to the first sound collection device is less than a second distance from the signal source to the second sound collection device.
3. The method of claim 1, wherein prior to receiving the first input signal collected by the first sound collection device on the first device and the second input signal collected by the second sound collection device on the second device, respectively, the method further comprises:
the first sound collection device and the second sound collection device are calibrated so that they have the same sound conversion capability.
4. The method of claim 1, wherein determining reference noise from noise components in the first and second input signals comprises:
performing separation operation on the first input signal and the second input signal to obtain a noise-containing signal segment and a pure noise segment in the first input signal and a noise-containing signal segment and a pure noise segment in the second input signal respectively; and
determining the reference noise from at least a pure noise segment in the second input signal.
5. The method of claim 4, wherein extracting signal components in the first and second input signals, respectively, from the reference noise comprises:
inputting the reference noise into an adaptive filter; and
adjusting parameters of the adaptive filter to extract signal components in the first input signal and the second input signal from noisy signal segments in the first input signal and the second input signal, respectively.
6. The method of claim 1, wherein determining a distance difference between a first distance from the signal source to the first sound collection device and a second distance from the signal source to the second sound collection device based on a phase difference between signal components in the first input signal and the second input signal comprises:
performing a cross-correlation analysis on signal components in the first input signal and the second input signal to determine a phase difference therebetween;
determining a delay difference between the two according to the phase difference; and
the distance difference is calculated from the delay difference.
7. The method of claim 1, wherein determining the multiple relationship between the first distance and the second distance from the difference in amplitude between signal components in the first input signal and the second input signal comprises:
calculating a magnitude difference between signal components in the first input signal and the second input signal; and
and calculating the multiple relation according to the amplitude difference and the distance amplitude relation.
8. The method of claim 1, wherein determining the location of the signal source from the signal components in the first input signal and the second input signal further comprises:
in response to the first sound collection device and/or the second sound collection device comprising a microphone array, determining a relative angle of the signal source to an array element center of the microphone array using the microphone array.
9. The method of claim 8, wherein determining the location of the signal source from the signal components in the first input signal and the second input signal further comprises:
determining a position of the sound source by integrating the first distance, the second distance, and the relative angle.
10. A sound source localization apparatus comprising:
a signal receiving unit, configured to receive a first input signal collected by a first sound collection device on a first device and a second input signal collected by a second sound collection device on a second device, respectively, where each of the first input signal and the second input signal includes a signal component from a signal source and a noise component from a noise source, the first input signal collected by the first sound collection device is different from the second input signal collected by the second sound collection device, and a distance between the first sound collection device and the signal source is smaller than a distance between the second sound collection device and the signal source;
a reference determination unit for determining a reference noise from noise components in the first and second input signals;
a component extraction unit for extracting signal components in the first input signal and the second input signal, respectively, according to the reference noise; and
a position determination unit for determining a position of the signal source from signal components in the first input signal and the second input signal,
wherein determining the location of the signal source from the signal components in the first input signal and the second input signal comprises:
determining a distance difference between a first distance from the signal source to the first sound collection device and a second distance from the signal source to the second sound collection device according to a phase difference between signal components in the first input signal and the second input signal;
determining a multiple relationship between the first distance and the second distance from a difference in amplitude between signal components in the first input signal and the second input signal; and
determining the first distance and the second distance according to the distance difference and the multiple relation.
11. An electronic device, comprising:
a processor;
a memory; and
computer program instructions stored in the memory, which, when executed by the processor, cause the processor to perform the method of any of claims 1-9.
12. A computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1-9.
CN201710259736.4A 2017-04-19 2017-04-19 Sound source positioning method and device and electronic equipment Active CN106872945B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710259736.4A CN106872945B (en) 2017-04-19 2017-04-19 Sound source positioning method and device and electronic equipment


Publications (2)

Publication Number Publication Date
CN106872945A CN106872945A (en) 2017-06-20
CN106872945B true CN106872945B (en) 2020-01-17

Family

ID=59162786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710259736.4A Active CN106872945B (en) 2017-04-19 2017-04-19 Sound source positioning method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN106872945B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107290723B (en) * 2017-06-22 2019-11-05 北京地平线信息技术有限公司 Sound localization method, device and electronic equipment
JP6976131B2 (en) * 2017-10-16 2021-12-08 三菱重工サーマルシステムズ株式会社 Air conditioning system and air conditioning control method
CN107948857B (en) * 2017-12-19 2021-07-16 联想(北京)有限公司 Sound processing method and electronic equipment
CN110234043B (en) * 2019-05-31 2020-08-25 歌尔科技有限公司 Sound signal processing method, device and equipment based on microphone array
CN110931041B (en) * 2019-11-21 2022-08-30 北京地平线机器人技术研发有限公司 Sound source determining method and device
CN111276155B (en) * 2019-12-20 2023-05-30 上海明略人工智能(集团)有限公司 Voice separation method, device and storage medium
CN111949965B (en) * 2020-08-12 2024-06-14 腾讯科技(深圳)有限公司 Identity verification method and device based on artificial intelligence, medium and electronic equipment
CN113409811B (en) * 2021-06-01 2023-01-20 歌尔股份有限公司 Sound signal processing method, apparatus and computer readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11153660A (en) * 1997-11-20 1999-06-08 Taiyo Musen Co Ltd Sound source searching device
EP2710398B1 (en) * 2011-05-18 2017-03-29 Lambda: 4 Entwicklungen GmbH Method to determine the location of a receiver
JP6001248B2 (en) * 2011-09-20 2016-10-05 トヨタ自動車株式会社 Sound source detection device
CN106019232B (en) * 2016-05-11 2018-07-10 北京地平线信息技术有限公司 Sonic location system and method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant