WO2022121800A1

WO2022121800A1 - Sound source positioning method and apparatus, and electronic device

Info

Publication number: WO2022121800A1
Application number: PCT/CN2021/135400
Authority: WO
Inventors: 张志飞; 徐杨飞
Original assignee: 北京有竹居网络技术有限公司
Priority date: 2020-12-10
Filing date: 2021-12-03
Publication date: 2022-06-16
Also published as: CN112946576B; CN112946576A

Abstract

A sound source positioning method and apparatus, and an electronic device (403). The sound source positioning method comprises: respectively collecting, by using array units in a microphone array (401), signal feature parameters of sounds generated from a set of sound sources to be positioned, so as to obtain signal feature parameter vectors (101); according to the signal feature parameter vectors, determining orientation distribution vectors of sound sources in the set of sound sources that are distributed in various orientations (102); and according to the orientation distribution vectors, determining the orientations of the sound sources in the set of sound sources to be positioned (103). More accurate sound source positioning can be realized.

Description

Sound source localization method, device and electronic device

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the priority of the Chinese patent application filed on December 10, 2020 with the application number 202011462430.7 and the invention titled "Sound Source Localization Method, Apparatus and Electronic Device", the full text of which is incorporated herein by reference middle.

technical field

The present disclosure relates to the technical field of information processing, and in particular, to a sound source localization method, apparatus and electronic device.

Background technique

With the development of technology, products with voice capture and processing functions are welcomed by more and more users. When these voice products process the collected voice information, in order to achieve a better voice processing effect, it is usually necessary to locate the sound source first.

However, the current sound source localization algorithm usually has poor localization accuracy when the size of the microphone array used is limited, and the localization accuracy decreases significantly with the increase of reverberation, making it difficult to locate and track multiple sound sources at the same time.

SUMMARY OF THE INVENTION

This disclosure section is provided to introduce concepts in a simplified form that are described in detail in the detailed description section that follows. This disclosure section is not intended to identify key features or essential features of the claimed technical solution, nor is it intended to be used to limit the scope of the claimed technical solution.

The embodiments of the present disclosure provide a sound source localization method, apparatus, and electronic device, which can achieve high accuracy of sound source localization in an automated manner.

In a first aspect, an embodiment of the present disclosure provides a sound source localization method, the method includes: using each array unit in a microphone array to separately collect signal characteristic parameters of sounds emitted by a sound source set to be localized, to obtain a signal characteristic parameter vector; The azimuth distribution vector of the sound sources in the sound source set distributed in each azimuth is determined according to the signal characteristic parameter vector; the azimuth of each sound source in the to-be-located sound source set is determined according to the azimuth distribution vector.

In a second aspect, an embodiment of the present disclosure provides a sound source localization device, including: a collection unit configured to use each array unit in a microphone array to separately collect signal characteristic parameters of sounds emitted by a sound source set to be localized to obtain the signal characteristic A parameter vector; a vector determination unit, used for determining the azimuth distribution vector of the sound source in the sound source set distributed in each azimuth according to the signal characteristic parameter vector; The orientation of each sound source.

In a third aspect, embodiments of the present disclosure provide an electronic device, including: one or more processors; and a storage device for storing one or more programs, when the one or more programs are stored by the one or more programs The one or more processors executes such that the one or more processors implement the sound source localization method as described in the first aspect or the second aspect.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, implements the steps of the sound source localization method according to the first aspect or the second aspect.

Description of drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent when taken in conjunction with the accompanying drawings and with reference to the following detailed description. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that the originals and elements are not necessarily drawn to scale.

1 is a flowchart of one embodiment of sound source localization according to the present disclosure;

FIG. 2 is a flowchart of another embodiment of sound source localization according to the present disclosure;

3 is a schematic structural diagram of an embodiment of a sound source localization device according to the present disclosure;

4 is an exemplary system architecture to which the sound source localization method according to an embodiment of the present disclosure may be applied;

FIG. 5 is a schematic diagram of a basic structure of an electronic device provided according to an embodiment of the present disclosure.

Detailed ways

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for the purpose of A more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are only for exemplary purposes, and are not intended to limit the protection scope of the present disclosure.

It should be understood that the various steps described in the method embodiments of the present disclosure may be performed in different orders and/or in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this regard.

As used herein, the term "including" and variations thereof are open-ended inclusions, ie, "including but not limited to". The term "based on" is "based at least in part on." The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.

It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different devices, modules or units, and are not used to limit the order of functions performed by these devices, modules or units or interdependence.

It should be noted that the modifications of "a" and "a plurality" mentioned in the present disclosure are illustrative rather than restrictive, and those skilled in the art should understand that unless the context clearly indicates otherwise, they should be understood as "one or a plurality of". multiple".

The names of messages or information exchanged between multiple devices in the embodiments of the present disclosure are only for illustrative purposes, and are not intended to limit the scope of these messages or information.

Please refer to FIG. 1 , which shows a flow of an embodiment of a sound source localization method according to the present disclosure. The sound source localization method is applied to electronic devices such as desktop devices, servers or embedded devices. As shown in Figure 1, the sound source localization method includes the following steps:

Step 101 , each array unit in the microphone array is used to collect signal characteristic parameters of the sound emitted by the sound source set to be located, respectively, to obtain a signal characteristic parameter vector.

A microphone array can be a device consisting of a certain number of microphone arrangements that can be used to sample and process the spatial characteristics of the sound field. Generally, each array unit in the microphone array, that is, a single microphone, can individually collect sound field signals. The set of sound sources to be located is one or more sound sources that need to be located. These sound sources may be human voices or sounds emitted by other things. These sound sources may emit sounds individually or together.

In this embodiment, when the sound source set to be located emits sound, each array unit in the microphone array can be used to sample the sound respectively, and a single array unit can obtain the corresponding signal characteristic parameters, and all the microphones in the entire microphone array can collect the sound. The intensity signal of , can form the signal characteristic parameter vector.

Step 102: Determine the azimuth distribution vector of the sound source in the sound source set distributed in each azimuth according to the signal characteristic parameter vector.

Based on the signal characteristic parameter vector obtained in step 102, the signal characteristic parameter vector can be calculated and processed, so that the azimuth distribution vector of the sound source in the sound source set distributed in each azimuth can be obtained.

Step 103: Determine the azimuth of each sound source in the to-be-located sound source set according to the azimuth distribution vector.

In this embodiment, according to the value of the component corresponding to each azimuth in the output azimuth distribution vector, it can be determined which azimuths have sound sources. For example, each component can be a value of 0 or 1, and 1 and 0 can be used to indicate whether there is a sound source in the relevant azimuth, respectively. When the corresponding component of the output azimuth distribution vector in a certain azimuth is 1, it can be determined that there is a sound source in the azimuth, otherwise there is no sound source. In this way, whether there is a sound source in each direction can be sequentially determined, so that the orientation of each sound source in the set of sound sources to be located can be determined.

In some optional implementations of this embodiment, the above step 102 may include: inputting the signal characteristic parameter vector into a sound source localization model, and the sound source localization model outputs the azimuth distribution vector, wherein the sound source localization model The source localization model is used to describe the signal characteristic parameter vector obtained by using each array unit in the microphone array to collect the signal characteristic parameters of the sound emitted when the sound source set emits sound and the distribution of the sound source in the corresponding sound source set in each azimuth. The mapping relationship between the azimuth distribution vectors formed by the distribution parameters of . In this implementation manner, in the above sound source localization model, the input end corresponds to the signal characteristic parameters of the sound emitted by each array unit in the microphone array when the sound source set emits sound. The output end of the sound source localization model may be an azimuth distribution vector formed by the distribution parameters of the sound source distribution in each azimuth of the corresponding sound source set. Usually, the space can be divided with the microphone as the center to obtain multiple orientations. For example, the space can be divided in two dimensions, east-west and north-south, or the space can be divided in three dimensions: east-west, north-south, and up and down. The granularity of division can also be set according to actual needs. Taking the division of space in the east-west and north-south dimensions as an example, the space can be divided with each orientation occupying 5 degrees of the plane. For example, an area from due east to 5 degrees south by east can be used as an azimuth, an area from 5 degrees south by east to 10 degrees south by east can be classified as an azimuth, and so on.

In some optional implementations of this embodiment, the above-mentioned method further includes a sound source localization model training step, and the sound source localization model training step includes: a sample sound source set formed by the distribution parameters of the sound sources distributed in various azimuths. The azimuth distribution vector is used as the output vector of the model, and the signal characteristic parameter vector obtained by each array unit of the microphone array respectively collecting the signal characteristic parameters of the sound emitted by the corresponding sample sound source set is used as the input vector of the model, and the model is trained. , to obtain the sound source localization model.

In some optional implementations of this embodiment, the distribution parameter is a distribution probability, and the above step 103 may include: determining the sound to be localized according to the azimuth distribution vector output from the sound source localization model The azimuth of each sound source in the source set, including: for each component in the output azimuth distribution vector, comparing the component value with the corresponding preset threshold; determining whether there is the to-be-located sound in each azimuth according to the comparison result For the sound sources in the source set, the number of the sound sources in the to-be-located sound source set and the orientation of each sound source are obtained. In this implementation manner, the distribution parameter may be a distribution probability that continuously takes values between 0 and 1, for example, may be 0.1, 0.3, 0.7, and so on. The distribution probability represents the probability that a sound source exists in the corresponding azimuth. When determining the direction, the probability corresponding to each direction can be compared with the corresponding preset threshold, so as to determine whether there is a sound source in the corresponding direction according to the comparison result. Generally, when the component in the azimuth is greater than the preset threshold, it can be considered that there is a sound source in this direction, otherwise, it is considered that there is no sound source. For each azimuth in turn, it is possible to obtain whether there is a sound source in each azimuth, and then obtain the number of sound sources in the set of sound sources to be located and the azimuth where each sound source is located.

In some optional implementations of this embodiment, the sound source localization model is a neural network model. Specifically, the neural network structure in the neural network model may include one or more of the following: CNN, LSTM, Linear. In this implementation, since the neural network model is used for data, the mapping relationship between the sound source azimuth and the signal characteristic parameters collected by the corresponding microphone can be more accurately described, so that the final identified sound source azimuth is more accurate.

In the embodiments of the present disclosure, the azimuth distribution vector of the sound sources in the sound source set distributed in each azimuth can be determined by the signal characteristic parameters of the sound sources on each microphone, and then the azimuth of each sound source in the sound source set to be located can be determined , so that accurate sound source localization can be achieved.

Please continue to refer to FIG. 2 , which shows the flow of still another embodiment of the sound source localization method according to the present disclosure. The sound source localization method is applied to a mobile terminal. As shown in Figure 2, the sound source localization method includes the following steps:

Step 201 , each array unit in the microphone array is used to separately collect the initial signal characteristic parameters of the sound emitted by the sound source set to be located.

In this embodiment, the electronic device may use each array unit in the microphone array to separately collect the initial signal characteristic parameters of the sound emitted by the sound source set to be located.

Step 202: Extract the coherent-to-scattered signal energy ratio characteristic parameter based on the initial signal characteristic parameter, and obtain the signal characteristic parameter vector whose components are the coherent-to-scattered signal energy ratio characteristic parameter.

In this embodiment, based on the initial signal characteristic parameters extracted in step 201, the electronic device can extract coherent-to-diffuse power ratio (CDR, coherent-to-diffuse power ratio) characteristic parameters from the initial signal characteristic parameters, and obtain each component as The signal eigenparameter vector of the coherent to scattered signal energy ratio eigenparameters.

Step 203: Input the signal feature parameter vector into a sound source localization model, and the sound source localization model outputs the azimuth distribution vector, wherein the sound source localization model is used to describe the sound source set used when the sound is emitted. The mapping relationship between the signal characteristic parameter vector obtained by each array unit in the microphone array separately collecting the signal characteristic parameters of the emitted sound and the azimuth distribution vector formed by the distribution parameters of the corresponding sound source concentrated sound source distribution in each azimuth .

In this embodiment, for the processing of step 203, reference may be made to step 102 in the corresponding embodiment of FIG. 1 . The sound source localization model in this embodiment describes the mapping relationship between the extracted CDR feature parameter vector and the azimuth distribution vector formed by the distribution parameters of the sound source distribution in each azimuth in the sound source set. The corresponding feature parameter vector is also used when training this model.

Step 204: Determine, according to the azimuth distribution vector, the azimuth of each sound source in the to-be-located sound source set.

In this embodiment, for the specific processing of step 204, reference may be made to step 103 of the embodiment corresponding to FIG. 1 , and details are not repeated here.

In some optional implementations of this embodiment, before step 202, the method further includes: performing time-frequency transformation on the initial signal characteristic parameter. Through time-frequency-frequency conversion, the collected time-domain signal can be transformed into a frequency-domain signal, which is more convenient for subsequent signal feature processing.

In this embodiment, the extracted signal feature is a CDR feature, and the sound source position information is obtained based on the model corresponding to the CDR feature, which can further improve the robustness of high-reverberation scene positioning and reduce the influence of interference on the positioning result. .

Further referring to FIG. 3 , as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a sound source localization apparatus. The apparatus embodiment shown in FIG. 3 corresponds to the method embodiment shown in FIG. 1 . Specifically, the device can be applied to various electronic devices.

As shown in FIG. 3 , the sound source localization apparatus of this embodiment includes: a collection unit 301 , a vector determination unit 302 , and an orientation determination unit 303 . Among them, the acquisition unit 301 is used to use each array unit in the microphone array to collect the signal characteristic parameters of the sound emitted by the sound source set to be located, and obtain the signal characteristic parameter vector; the vector determination unit 302 is used to determine according to the signal characteristic parameter vector. The azimuth distribution vector of the sound source distribution in each azimuth in the sound source set; the azimuth determination unit 303 is configured to determine the azimuth of each sound source in the to-be-located sound source set according to the azimuth distribution vector.

In this embodiment, the specific processing of the acquisition unit 301 , the vector determination unit 302 , and the orientation determination unit 303 of the sound source localization device and the technical effects brought about by them may refer to

steps

101 , 102 and 102 in the corresponding embodiment of FIG. 1 , respectively. The relevant description of step 103 will not be repeated here.

In some optional implementations of this embodiment, the vector determination unit is further configured to: input the signal characteristic parameter vector into a sound source localization model, and the sound source localization model outputs the azimuth distribution vector, wherein the The sound source localization model is used to describe the signal feature parameter vector obtained by using each array unit in the microphone array to collect the signal feature parameters of the emitted sound when the sound source set emits sound and the corresponding sound source set. The mapping relationship between the azimuth distribution vectors formed by the distribution parameters on the azimuth.

In some optional implementations of this embodiment, the apparatus further includes a sound source localization model training unit, and the sound source localization model training unit is configured to: set the distribution parameters of the sample sound sources to distribute the sound sources in various azimuths The formed azimuth distribution vector is used as the output vector of the model, and the signal characteristic parameter vector obtained by each array unit of the microphone array respectively collecting the signal characteristic parameters of the sound emitted by the corresponding sample sound source set is used as the input vector of the model. The model is trained to obtain the sound source localization model.

In some optional implementations of this embodiment, the collection unit includes: a collection subunit, which uses each array unit in the microphone array to separately collect initial signal characteristic parameters of the sound emitted by the sound source set to be located; the extraction subunit, It is used to extract the coherent and scattered signal energy ratio characteristic parameters based on the initial signal characteristic parameters, and obtain the signal characteristic parameter vector whose components are the coherent and scattered signal energy ratio characteristic parameters.

In some optional implementation manners of this embodiment, the acquisition unit further includes: a transform subunit, configured to perform time-frequency transform on the initial signal characteristic parameter.

In some optional implementations of this embodiment, the distribution parameter is a distribution probability, and the orientation determining unit is further configured to: for each component in the output orientation distribution vector, associate the component value with the corresponding Preset thresholds are compared; according to the comparison result, it is determined whether there is a sound source in the set of sound sources to be located in each orientation, and the number of sound sources in the set of sound sources to be located and the orientation of each sound source are obtained.

In some optional implementations of this embodiment, the sound source localization model is a neural network model.

Please refer to FIG. 4 , which illustrates an exemplary system architecture to which the sound source localization method according to an embodiment of the present disclosure may be applied.

As shown in FIG. 4 , the system architecture may include a microphone array 401 , a transmission medium 402 , and an electronic device 403 . The transmission medium 402 is a medium used for data transmission between the microphone array 401 and the electronic device 403 . The transmission medium 402 may include various connection types, such as wired, wireless communication links, or fiber optic cables, etc., and may also be a USB transmission line.

The microphone array 401 can interact with the terminal device 403 through the transmission medium 402 to receive or send signals and the like.

The electronic device 401 can be various electronic devices with signal and data processing, and can be terminal devices, such as smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III, motion picture expert compression standard Audio Layer 3), MP4 (Moving Picture Experts Group Audio Layer IV, Video Expert Compression Standard Audio Layer 4) Player, etc., it can also be a server device.

It should be noted that the sound source localization method provided in this embodiment may be executed by the electronic device 403 , and correspondingly, the sound source localization apparatus may be provided in the electronic device 403 .

It should be understood that the number of microphone arrays, networks and electronic devices in Figure 4 is merely illustrative. There may be any number of microphone arrays, transmission media, and electronics depending on implementation needs.

Referring next to FIG. 5 , it shows a schematic structural diagram of an electronic device (eg, the electronic device in FIG. 4 ) suitable for implementing an embodiment of the present disclosure. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), vehicle-mounted terminals (eg, Mobile terminals such as in-vehicle navigation terminals) and the like, and stationary terminals such as digital TVs, desktop computers, etc., may also include server equipment. The electronic device shown in FIG. 6 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.

As shown in FIG. 5 , the electronic device may include a processing device (eg, a central processing unit, a graphics processor, etc.) 501 that may be loaded into a random access memory according to a program stored in a read only memory (ROM) 502 or from a storage device 508 The program in the (RAM) 503 executes various appropriate operations and processes. In the RAM 503, various programs and data required for the operation of the electronic device 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504 .

Typically, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; including, for example, a liquid crystal display (LCD), speakers, vibration An output device 507 such as a computer; a storage device 508 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 509 . Communication means 509 may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data. While Figure 5 shows an electronic device having various means, it should be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from the network via the communication device 509, or from the storage device 508, or from the ROM 502. When the computer program is executed by the processing apparatus 501, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.

It should be noted that the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two. The computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples of computer readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read only memory (EPROM or flash memory), fiber optics, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing. In this disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In the present disclosure, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device . Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, the client and server can use any currently known or future developed network protocol such as HTTP (HyperText Transfer Protocol) to communicate, and can communicate with digital data in any form or medium Communication (eg, a communication network) interconnects. Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), the Internet (eg, the Internet), and peer-to-peer networks (eg, ad hoc peer-to-peer networks), as well as any currently known or future development network of.

The above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.

The above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device: uses each array unit in the microphone array to separately collect the sound emitted by the sound source set to be located. The signal characteristic parameters are obtained, and the signal characteristic parameter vector is obtained; the signal characteristic parameter vector is input into the sound source localization model, and the sound source localization model is used to describe the use of each array unit in the microphone array when the sound source set emits sound. The mapping relationship between the signal feature parameter vector obtained by separately collecting the signal feature parameters of the emitted sound and the azimuth distribution vector formed by the distribution parameters of the corresponding sound source concentrated sound source distribution in each azimuth; according to the sound source localization The azimuth distribution vector output in the model determines the azimuth of each sound source in the set of sound sources to be located. .

Computer program code for performing operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages—such as Python, Java, Smalltalk, C++, or a combination thereof , as well as conventional procedural programming languages - such as "C" or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (eg, using an Internet service provider through Internet connection).

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more logical functions for implementing the specified functions executable instructions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It is also noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in dedicated hardware-based systems that perform the specified functions or operations , or can be implemented in a combination of dedicated hardware and computer instructions.

The units involved in the embodiments of the present disclosure may be implemented in a software manner, and may also be implemented in a hardware manner. Among them, the name of the unit does not constitute a limitation of the unit itself under certain circumstances. For example, the acquisition unit can also be described as "using each array unit in the microphone array to separately collect the sound signals from the sound source set to be located. eigenparameters to obtain the unit of the signal eigenparameter vector".

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logical Devices (CPLDs) and more.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, devices, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include one or more wire-based electrical connections, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), fiber optics, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.

According to one or more embodiments of the present disclosure, a method for localizing a sound source is provided, which includes: using each array unit in a microphone array to collect signal characteristic parameters of sounds emitted by a set of sound sources to be localized, respectively, to obtain a signal Characteristic parameter vector; according to the signal characteristic parameter vector, determine the azimuth distribution vector of the sound sources in the sound source set distributed in each azimuth; according to the azimuth distribution vector, determine the azimuth of each sound source in the to-be-located sound source set.

According to one or more embodiments of the present disclosure, the determining, according to the signal characteristic parameter vector, the azimuth distribution vector of the sound source in the sound source set distributed in each azimuth comprises: inputting the signal characteristic parameter vector to the sound source. In the source localization model, the sound source localization model outputs the azimuth distribution vector, wherein the sound source localization model is used to describe that when the sound source set emits sound, each array unit in the microphone array is used to collect the emitted sound respectively. The mapping relationship between the signal characteristic parameter vector obtained from the signal characteristic parameter and the azimuth distribution vector formed by the distribution parameters of the corresponding sound source concentration in each azimuth.

According to one or more embodiments of the present disclosure, the method further includes a sound source localization model training step, and the sound source localization model training step includes: distributing the sample sound source set and the sound sources in various azimuths. The azimuth distribution vector formed by the distribution parameters is used as the output vector of the model, and the signal characteristic parameter vector obtained by each array unit of the microphone array respectively collecting the signal characteristic parameters of the sound emitted by the corresponding sample sound source set is used as the input vector of the model. , and perform model training to obtain the sound source localization model.

According to one or more embodiments of the present disclosure, the step of using each array unit in the microphone array to separately collect signal characteristic parameters of the sound emitted by the sound source set to be located to obtain a signal characteristic parameter vector includes: using a microphone Each array unit in the array separately collects the initial signal characteristic parameters of the sound emitted by the sound source set to be located; based on the initial signal characteristic parameters, the coherent and scattered signal energy ratio characteristic parameters are extracted, and each component is obtained as the coherent and scattered signal energy ratio characteristic The signal feature parameter vector of the parameters.

According to one or more embodiments of the present disclosure, it is characterized in that, in the extraction of the coherent and scattered signal energy ratio characteristic parameters based on the initial signal characteristic parameters, to obtain the coherent and scattered signal energy ratio characteristic parameters for each component Before obtaining the signal characteristic parameter vector, the method further includes: performing time-frequency transformation on the initial signal characteristic parameter.

According to one or more embodiments of the present disclosure, the distribution parameter is a distribution probability, and according to the azimuth distribution vector output in the sound source localization model, the determination of each sound source in the to-be-located sound source set is The azimuth of the source, including: for each component in the output azimuth distribution vector, comparing the component value with the corresponding preset threshold; determining whether there is a sound source in the to-be-located sound source set on each azimuth according to the comparison result source, to obtain the number of sound sources in the set of sound sources to be located and the orientation of each sound source.

According to one or more embodiments of the present disclosure, the sound source localization model is a neural network model.

According to one or more embodiments of the present disclosure, a sound source localization apparatus is provided, which is characterized by comprising: a collection unit configured to use each array unit in the microphone array to collect the signals of the sound emitted by the sound source set to be localized respectively The characteristic parameter is used to obtain the signal characteristic parameter vector; the vector determination unit is used to determine the azimuth distribution vector of the sound source in the sound source set distributed in each azimuth according to the signal characteristic parameter vector; the azimuth determination unit is used to determine according to the azimuth distribution vector. The to-be-located sound sources focus on the azimuths of each sound source.

According to one or more embodiments of the present disclosure, the vector determination unit is further configured to: input the signal feature parameter vector into a sound source localization model, and the sound source localization model outputs the azimuth distribution vector, wherein the sound source localization model is used to describe the signal characteristic parameter vector obtained by using each array unit in the microphone array to collect the signal characteristic parameters of the sound emitted when the sound source set emits sound and the corresponding sound source set. The mapping relationship between the azimuth distribution vectors formed by the distribution parameters of the sound source in each azimuth.

According to one or more embodiments of the present disclosure, the apparatus further includes a sound source localization model training unit, and the sound source localization model training unit is used for: distributing the concentrated sound sources of the sample sound sources in various azimuths The azimuth distribution vector formed by the distribution parameters is used as the output vector of the model, and the signal characteristic parameter vector obtained by each array unit of the microphone array respectively collecting the signal characteristic parameters of the sound emitted by the corresponding sample sound source set is used as the input of the model. vector, and perform model training to obtain the sound source localization model.

According to one or more embodiments of the present disclosure, the collection unit includes: a collection sub-unit, which uses each array unit in the microphone array to collect the initial signal characteristic parameters of the sound emitted by the sound source set to be located; The subunit is configured to extract the coherent-to-scattered signal energy ratio characteristic parameter based on the initial signal characteristic parameter, and obtain the signal characteristic parameter vector whose components are the coherent-to-scattered signal energy ratio characteristic parameter.

According to one or more embodiments of the present disclosure, the acquisition unit further includes: a transform subunit, configured to perform time-frequency transform on the initial signal characteristic parameter.

According to one or more embodiments of the present disclosure, the distribution parameter is a distribution probability, and the orientation determination unit is further configured to: for each component in the output orientation distribution vector, convert the component value Compare with the corresponding preset threshold; determine whether there is a sound source in the set of sound sources to be located in each azimuth according to the comparison result, and obtain the number of sound sources in the set of sound sources to be located and the location of each sound source. position.

The above description is merely a preferred embodiment of the present disclosure and an illustration of the technical principles employed. Those skilled in the art should understand that the scope of the disclosure involved in the present disclosure is not limited to the technical solutions formed by the specific combination of the above-mentioned technical features, and should also cover, without departing from the above-mentioned disclosed concept, the technical solutions formed by the above-mentioned technical features or Other technical solutions formed by any combination of its equivalent features. For example, a technical solution is formed by replacing the above features with the technical features disclosed in the present disclosure (but not limited to) with similar functions.

Additionally, although operations are depicted in a particular order, this should not be construed as requiring that the operations be performed in the particular order shown or in a sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains several implementation-specific details, these should not be construed as limitations on the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or logical acts of method, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims

A sound source localization method, comprising:

Using each array unit in the microphone array to collect the signal characteristic parameters of the sound emitted by the sound source set to be located, to obtain a signal characteristic parameter vector;

Determine the azimuth distribution vector of the sound source in the sound source set in each azimuth according to the signal characteristic parameter vector;

According to the azimuth distribution vector, the azimuth of each sound source in the to-be-located sound source set is determined.
The method according to claim 1, wherein the determining, according to the signal characteristic parameter vector, the azimuth distribution vector of the sound source in the sound source set distributed in each azimuth, comprising:

The signal feature parameter vector is input into the sound source localization model, and the sound source localization model outputs the azimuth distribution vector, wherein the sound source localization model is used to describe the use of the microphone array when the sound source set emits sound The mapping relationship between the signal feature parameter vector obtained by each array unit separately collecting the signal feature parameters of the emitted sound and the azimuth distribution vector formed by the distribution parameters of the corresponding sound source concentrated sound sources in each azimuth.
The method according to claim 2, wherein the method further comprises a sound source localization model training step, and the sound source localization model training step comprises:

The azimuth distribution vector formed by the distribution parameters of the sound source distribution in each azimuth of the sample sound source set is used as the output vector of the model, and each array unit of the microphone array respectively collects the signal of the sound emitted by the corresponding sample sound source set The signal feature parameter vector obtained from the feature parameter is used as the input vector of the model, and the model is trained to obtain the sound source localization model.
The method according to claim 1, wherein the use of each array unit in the microphone array to collect signal characteristic parameters of the sound emitted by the sound source set to be located, to obtain a signal characteristic parameter vector, comprises:

Use each array unit in the microphone array to separately collect the initial signal characteristic parameters of the sound emitted by the sound source set to be located;

The coherent and scattered signal energy ratio characteristic parameters are extracted based on the initial signal characteristic parameters, and the signal characteristic parameter vector whose components are the coherent and scattered signal energy ratio characteristic parameters is obtained.
The method according to claim 3, characterized in that, in the extraction of coherent and scattered signal energy ratio characteristic parameters based on the initial signal characteristic parameters, the signal characteristics whose respective components are coherent and scattered signal energy ratio characteristic parameters are obtained. Before the parameter vector, the method further includes:

Time-frequency transform is performed on the initial signal characteristic parameters.
The method of claim 1, wherein the distribution parameter is a distribution probability, and

Determining the azimuth of each sound source in the set of sound sources to be located according to the azimuth distribution vector output in the sound source localization model, including:

For each component in the output azimuth distribution vector, compare the component value with the corresponding preset threshold;

According to the comparison result, it is determined whether there is a sound source in the set of sound sources to be located in each azimuth, and the number of sound sources in the set of sound sources to be located and the azimuth where each sound source is located is obtained.
The method according to any one of claims 1-5, wherein the sound source localization model is a neural network model.
A sound source localization device, comprising:

a collection unit, used for using each array unit in the microphone array to collect signal characteristic parameters of the sound emitted by the set of sound sources to be located, respectively, to obtain a signal characteristic parameter vector;

A vector determination unit, used for determining the azimuth distribution vector of the sound source in the sound source set in each azimuth according to the signal characteristic parameter vector;

An azimuth determination unit, configured to determine the azimuth of each sound source in the set of sound sources to be located according to the azimuth distribution vector.
An electronic device, comprising:

at least one processor;

storage means for storing at least one program,

When the at least one program is executed by the at least one processor, the at least one processor is caused to implement the method of any one of claims 1-7.
A computer-readable medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method according to any one of claims 1-7 is implemented.