WO2022121800A1 - Sound source positioning method and apparatus, and electronic device - Google Patents
Sound source positioning method and apparatus, and electronic device
- Publication number
- WO2022121800A1 (application PCT/CN2021/135400)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sound source
- azimuth
- sound
- vector
- signal characteristic
- Prior art date
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
- G01S5/20—Position of source determined by a plurality of spaced direction-finders
Definitions
- the present disclosure relates to the technical field of information processing, and in particular, to a sound source localization method, apparatus and electronic device.
- the current sound source localization algorithm usually has poor localization accuracy when the size of the microphone array used is limited, and the localization accuracy decreases significantly with the increase of reverberation, making it difficult to locate and track multiple sound sources at the same time.
- the embodiments of the present disclosure provide a sound source localization method, apparatus, and electronic device, which can achieve high accuracy of sound source localization in an automated manner.
- An embodiment of the present disclosure provides a sound source localization method, the method including: using each array unit in a microphone array to separately collect signal characteristic parameters of the sound emitted by a set of sound sources to be located, to obtain a signal characteristic parameter vector; determining, according to the signal characteristic parameter vector, an azimuth distribution vector describing how the sound sources in the sound source set are distributed over the azimuths; and determining the azimuth of each sound source in the set of sound sources to be located according to the azimuth distribution vector.
- An embodiment of the present disclosure provides a sound source localization device, including: a collection unit configured to use each array unit in a microphone array to separately collect signal characteristic parameters of the sound emitted by a set of sound sources to be located, to obtain a signal characteristic parameter vector; a vector determination unit configured to determine, according to the signal characteristic parameter vector, the azimuth distribution vector of the sound sources in the sound source set over the azimuths; and an azimuth determination unit configured to determine the azimuth of each sound source in the set of sound sources to be located according to the azimuth distribution vector.
- Embodiments of the present disclosure provide an electronic device, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the sound source localization method as described in the first aspect or the second aspect.
- an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, implements the steps of the sound source localization method according to the first aspect or the second aspect.
- FIG. 1 is a flowchart of one embodiment of the sound source localization method according to the present disclosure.
- FIG. 2 is a flowchart of another embodiment of the sound source localization method according to the present disclosure.
- FIG. 3 is a schematic structural diagram of an embodiment of a sound source localization device according to the present disclosure.
- FIG. 4 is an exemplary system architecture to which the sound source localization method according to an embodiment of the present disclosure may be applied.
- FIG. 5 is a schematic diagram of a basic structure of an electronic device provided according to an embodiment of the present disclosure.
- the term “including” and variations thereof are open-ended inclusions, i.e., “including but not limited to”.
- the term “based on” is “based at least in part on.”
- the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
- FIG. 1 shows a flow of an embodiment of a sound source localization method according to the present disclosure.
- the sound source localization method is applied to electronic devices such as desktop devices, servers or embedded devices.
- the sound source localization method includes the following steps:
- Step 101: Each array unit in the microphone array is used to separately collect signal characteristic parameters of the sound emitted by the set of sound sources to be located, to obtain a signal characteristic parameter vector.
- A microphone array can be a device consisting of a certain number of microphones arranged together, which can be used to sample and process the spatial characteristics of the sound field.
- Each array unit in the microphone array, that is, a single microphone, can individually collect sound field signals.
- the set of sound sources to be located is one or more sound sources that need to be located. These sound sources may be human voices or sounds emitted by other things. These sound sources may emit sounds individually or together.
- Each array unit in the microphone array can be used to sample the sound separately, so that each single array unit obtains its own signal characteristic parameter. The sound intensity signals collected by all the microphones in the entire microphone array can then form the signal characteristic parameter vector.
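- As a minimal sketch of Step 101, the following assumes that the per-microphone signal characteristic parameter is a root-mean-square (RMS) intensity computed over one frame of samples. The text only mentions an intensity signal, so the RMS choice, the function names `rms_intensity` and `feature_vector`, and the sample values are illustrative assumptions.

```python
import math


def rms_intensity(samples):
    """Return the RMS level of one microphone's frame (assumed intensity measure)."""
    if not samples:
        raise ValueError("frame must contain at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def feature_vector(frames):
    """Build one component per array unit; `frames` holds one sample list per microphone."""
    return [rms_intensity(frame) for frame in frames]


# Hypothetical frame captured by a 3-microphone array.
frames = [
    [0.0, 1.0, 0.0, -1.0],
    [0.0, 0.5, 0.0, -0.5],
    [0.0, 0.0, 0.0, 0.0],
]
vector = feature_vector(frames)
print(vector)
```

The resulting list has one entry per array unit, which matches the shape of the signal characteristic parameter vector described above.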
- Step 102: Determine, according to the signal characteristic parameter vector, the azimuth distribution vector of the sound sources in the sound source set over the azimuths.
- The signal characteristic parameter vector can be calculated and processed, so that the azimuth distribution vector of the sound sources in the sound source set over the azimuths can be obtained.
- Step 103: Determine the azimuth of each sound source in the set of sound sources to be located according to the azimuth distribution vector.
- Each component of the azimuth distribution vector can take the value 1 or 0, indicating respectively that there is or is not a sound source in the corresponding azimuth.
- If the component of the output azimuth distribution vector for a certain azimuth is 1, it can be determined that there is a sound source in that azimuth; otherwise there is no sound source. In this way, whether there is a sound source in each azimuth can be determined in turn, so that the azimuth of each sound source in the set of sound sources to be located can be determined.
- The above step 102 may include: inputting the signal characteristic parameter vector into a sound source localization model, which outputs the azimuth distribution vector. The sound source localization model describes the mapping relationship between (a) the signal characteristic parameter vector obtained when each array unit in the microphone array collects the signal characteristic parameters of the sound emitted by a sound source set and (b) the azimuth distribution vector formed by the distribution parameters of the sound sources in the corresponding sound source set over the azimuths.
- The input end of the sound source localization model corresponds to the signal characteristic parameters collected by each array unit in the microphone array when the sound source set emits sound.
- The output end of the sound source localization model may be the azimuth distribution vector formed by the distribution parameters of the sound sources in the corresponding sound source set over the azimuths.
- the space can be divided with the microphone as the center to obtain multiple orientations.
- the space can be divided in two dimensions, east-west and north-south, or the space can be divided in three dimensions: east-west, north-south, and up and down.
- the granularity of division can also be set according to actual needs. Taking the division of space in the east-west and north-south dimensions as an example, the space can be divided with each orientation occupying 5 degrees of the plane. For example, an area from due east to 5 degrees south by east can be used as an azimuth, an area from 5 degrees south by east to 10 degrees south by east can be classified as an azimuth, and so on.
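- The 5-degree planar division described above can be sketched as follows. The angle convention (degrees measured from due east and increasing toward the south) follows the example in the text, while the names `SECTOR_WIDTH_DEG`, `sector_index` and `sector_bounds` are hypothetical.

```python
SECTOR_WIDTH_DEG = 5                    # granularity taken from the example in the text
NUM_SECTORS = 360 // SECTOR_WIDTH_DEG   # 72 azimuths in the plane


def sector_index(angle_deg):
    """Map an angle (degrees from due east, increasing toward south) to its azimuth index."""
    return int((angle_deg % 360) // SECTOR_WIDTH_DEG)


def sector_bounds(index):
    """Return the (start, end) angles in degrees covered by one azimuth."""
    start = index * SECTOR_WIDTH_DEG
    return start, start + SECTOR_WIDTH_DEG


print(NUM_SECTORS, sector_index(7.5), sector_bounds(1))
```

A coarser or finer granularity only changes `SECTOR_WIDTH_DEG`, and therefore the length of the azimuth distribution vector.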
- The above-mentioned method further includes a sound source localization model training step. The training step includes: taking the azimuth distribution vector formed by the distribution parameters of the sound sources in a sample sound source set over the azimuths as the output vector of the model, taking the signal characteristic parameter vector obtained when each array unit of the microphone array separately collects the signal characteristic parameters of the sound emitted by the corresponding sample sound source set as the input vector of the model, and training the model to obtain the sound source localization model.
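- To illustrate how one training sample could be assembled, the sketch below pairs an input feature vector with a target azimuth distribution vector that is 1 in every azimuth containing a sample sound source. The 5-degree sectors, the 0/1 encoding of the target, and the names `azimuth_distribution_vector` and `make_training_pair` are assumptions for illustration, and the training procedure itself is not shown.

```python
def azimuth_distribution_vector(source_angles_deg, sector_width_deg=5):
    """Return a 0/1 target vector with a 1 in each azimuth that contains a source."""
    num_sectors = 360 // sector_width_deg
    target = [0.0] * num_sectors
    for angle in source_angles_deg:
        target[int((angle % 360) // sector_width_deg)] = 1.0
    return target


def make_training_pair(feature_vec, source_angles_deg):
    """Pair the model input (features) with the model output (azimuth distribution)."""
    return list(feature_vec), azimuth_distribution_vector(source_angles_deg)


# Hypothetical sample: three microphones, two sources at 2 and 93 degrees.
x, y = make_training_pair([0.7, 0.3, 0.1], [2.0, 93.0])
print(len(y), sum(y))
```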
- In some embodiments, the distribution parameter is a distribution probability, and the above step 103 (determining the azimuth of each sound source in the set of sound sources to be located according to the azimuth distribution vector output by the sound source localization model) may include: for each component of the output azimuth distribution vector, comparing the component value with the corresponding preset threshold; and determining, according to the comparison result, whether a sound source in the set of sound sources to be located exists in each azimuth, thereby obtaining the number of sound sources in the set and the azimuth of each sound source.
- the distribution parameter may be a distribution probability that continuously takes values between 0 and 1, for example, may be 0.1, 0.3, 0.7, and so on.
- the distribution probability represents the probability that a sound source exists in the corresponding azimuth.
- the probability corresponding to each direction can be compared with the corresponding preset threshold, so as to determine whether there is a sound source in the corresponding direction according to the comparison result.
- If the component for an azimuth is greater than the preset threshold, it can be considered that there is a sound source in that azimuth; otherwise, it is considered that there is no sound source.
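- The threshold comparison can be sketched as below. The function name `locate_sources`, the example probabilities and the four 90-degree azimuths are hypothetical and chosen only to keep the example short.

```python
def locate_sources(distribution, thresholds, sector_width_deg):
    """Return the (start, end) angle range of every azimuth whose probability exceeds its threshold."""
    if len(distribution) != len(thresholds):
        raise ValueError("one threshold is required per azimuth component")
    found = []
    for index, (prob, threshold) in enumerate(zip(distribution, thresholds)):
        if prob > threshold:  # strictly greater, as described in the text
            found.append((index * sector_width_deg, (index + 1) * sector_width_deg))
    return found


# Hypothetical model output over four 90-degree azimuths.
probabilities = [0.1, 0.7, 0.3, 0.9]
sources = locate_sources(probabilities, [0.5] * 4, sector_width_deg=90)
print(len(sources), sources)
```

The length of the returned list gives the number of detected sound sources, and each entry gives the azimuth of one source.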
- the sound source localization model is a neural network model.
- the neural network structure in the neural network model may include one or more of the following: CNN, LSTM, Linear.
- The azimuth distribution vector of the sound sources in the sound source set over the azimuths can be determined from the signal characteristic parameters of the sound sources at each microphone, and the azimuth of each sound source in the set of sound sources to be located can then be determined, so that accurate sound source localization can be achieved.
- FIG. 2 shows the flow of still another embodiment of the sound source localization method according to the present disclosure.
- the sound source localization method is applied to a mobile terminal. As shown in Figure 2, the sound source localization method includes the following steps:
- Step 201: Each array unit in the microphone array is used to separately collect the initial signal characteristic parameters of the sound emitted by the set of sound sources to be located.
- the electronic device may use each array unit in the microphone array to separately collect the initial signal characteristic parameters of the sound emitted by the sound source set to be located.
- Step 202: Extract coherent-to-diffuse power ratio (CDR) characteristic parameters based on the initial signal characteristic parameters, and obtain the signal characteristic parameter vector whose components are the CDR characteristic parameters.
- The electronic device can extract the CDR characteristic parameters from the initial signal characteristic parameters, and obtain a signal characteristic parameter vector in which each component is a CDR characteristic parameter.
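- The text does not say how the CDR is estimated. As a hedged illustration only, the sketch below uses a common two-microphone coherence model in which the observed coherence is a CDR-weighted mix of the direct-path coherence and the coherence of an ideal diffuse field, and then inverts that model. The function names, the speed of sound constant, the microphone spacing and the assumption that the direct-path delay is known are all illustrative and not taken from the source.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def diffuse_coherence(freq_hz, mic_distance_m):
    """Coherence of an ideal spherically diffuse field between two omnidirectional microphones."""
    x = 2 * math.pi * freq_hz * mic_distance_m / SPEED_OF_SOUND
    return 1.0 if x == 0 else math.sin(x) / x


def direct_coherence(freq_hz, tdoa_s):
    """Coherence of a single coherent (direct-path) source with a known inter-microphone delay."""
    return cmath.exp(1j * 2 * math.pi * freq_hz * tdoa_s)


def mixed_coherence(cdr, gamma_s, gamma_n):
    """Forward model: observed coherence for a given coherent-to-diffuse power ratio."""
    return (cdr * gamma_s + gamma_n) / (cdr + 1)


def cdr_from_coherence(gamma_x, gamma_s, gamma_n):
    """Invert the forward model; undefined when gamma_x equals gamma_s (purely coherent field)."""
    return ((gamma_n - gamma_x) / (gamma_x - gamma_s)).real


gamma_n = diffuse_coherence(1000.0, 0.08)
gamma_s = direct_coherence(1000.0, 1e-4)
gamma_x = mixed_coherence(4.0, gamma_s, gamma_n)
print(cdr_from_coherence(gamma_x, gamma_s, gamma_n))
```

In practice the observed coherence would be estimated from the time-frequency representation of the microphone signals rather than synthesized as it is here.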
- Step 203: Input the signal characteristic parameter vector into a sound source localization model, which outputs the azimuth distribution vector, wherein the sound source localization model describes the mapping between the signal characteristic parameter vector obtained when a sound source set emits sound and the azimuth distribution vector of the corresponding sound source set.
- For the processing of step 203, reference may be made to step 102 in the embodiment corresponding to FIG. 1.
- the sound source localization model in this embodiment describes the mapping relationship between the extracted CDR feature parameter vector and the azimuth distribution vector formed by the distribution parameters of the sound source distribution in each azimuth in the sound source set.
- the corresponding feature parameter vector is also used when training this model.
- Step 204: Determine, according to the azimuth distribution vector, the azimuth of each sound source in the set of sound sources to be located.
- For the specific processing of step 204, reference may be made to step 103 of the embodiment corresponding to FIG. 1, and details are not repeated here.
- Before step 202, the method further includes: performing time-frequency transformation on the initial signal characteristic parameters.
- the collected time-domain signal can be transformed into a frequency-domain signal, which is more convenient for subsequent signal feature processing.
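- A time-frequency transformation can be illustrated with a plain discrete Fourier transform of one frame. The text does not name a specific transform, so the direct DFT below (rather than a windowed short-time transform or an FFT library) and the test tone are assumptions made to keep the sketch self-contained.

```python
import cmath
import math


def dft(frame):
    """Direct O(n^2) discrete Fourier transform of one time-domain frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Hypothetical 16-sample frame holding a cosine with exactly 2 cycles.
frame = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
spectrum = dft(frame)
magnitudes = [abs(value) for value in spectrum[:9]]  # non-negative frequency bins
peak_bin = magnitudes.index(max(magnitudes))
print(peak_bin)
```

Applying the same transform to the frame from each array unit yields per-microphone spectra from which frequency-domain features can be computed.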
- In this embodiment, the extracted signal feature is a CDR feature, and the sound source position information is obtained from the model corresponding to the CDR feature, which can further improve the robustness of localization in highly reverberant scenes and reduce the influence of interference on the localization result.
- the present disclosure provides an embodiment of a sound source localization apparatus.
- the apparatus embodiment shown in FIG. 3 corresponds to the method embodiment shown in FIG. 1 .
- the device can be applied to various electronic devices.
- The sound source localization apparatus of this embodiment includes: a collection unit 301, a vector determination unit 302, and an azimuth determination unit 303.
- The collection unit 301 is configured to use each array unit in the microphone array to separately collect the signal characteristic parameters of the sound emitted by the set of sound sources to be located, and obtain the signal characteristic parameter vector; the vector determination unit 302 is configured to determine, according to the signal characteristic parameter vector, the azimuth distribution vector of the sound sources in the sound source set over the azimuths; the azimuth determination unit 303 is configured to determine the azimuth of each sound source in the set of sound sources to be located according to the azimuth distribution vector.
- For the specific processing of the collection unit 301, the vector determination unit 302, and the azimuth determination unit 303 of the sound source localization apparatus, and the technical effects they bring about, reference may be made to the descriptions of steps 101, 102 and 103 in the embodiment corresponding to FIG. 1, respectively, which are not repeated here.
- The vector determination unit is further configured to: input the signal characteristic parameter vector into a sound source localization model, which outputs the azimuth distribution vector, wherein the sound source localization model describes the mapping relationship between the signal characteristic parameter vector obtained when each array unit in the microphone array collects the signal characteristic parameters of the sound emitted by a sound source set and the azimuth distribution vector of the corresponding sound source set.
- The apparatus further includes a sound source localization model training unit, which is configured to: take the azimuth distribution vector formed by the distribution parameters of the sound sources in a sample sound source set over the azimuths as the output vector of the model, take the signal characteristic parameter vector obtained when each array unit of the microphone array separately collects the signal characteristic parameters of the sound emitted by the corresponding sample sound source set as the input vector of the model, and train the model to obtain the sound source localization model.
- The collection unit includes: a collection subunit, configured to use each array unit in the microphone array to separately collect the initial signal characteristic parameters of the sound emitted by the set of sound sources to be located; and an extraction subunit, configured to extract coherent-to-diffuse power ratio (CDR) characteristic parameters based on the initial signal characteristic parameters, and obtain the signal characteristic parameter vector whose components are the CDR characteristic parameters.
- The collection unit further includes: a transform subunit, configured to perform time-frequency transformation on the initial signal characteristic parameters.
- The distribution parameter is a distribution probability, and the azimuth determination unit is further configured to: for each component of the output azimuth distribution vector, compare the component value with the corresponding preset threshold; and determine, according to the comparison result, whether a sound source in the set of sound sources to be located exists in each azimuth, thereby obtaining the number of sound sources in the set and the azimuth of each sound source.
- the sound source localization model is a neural network model.
- FIG. 4 illustrates an exemplary system architecture to which the sound source localization method according to an embodiment of the present disclosure may be applied.
- The system architecture may include a microphone array 401, a transmission medium 402, and an electronic device 403.
- The transmission medium 402 is a medium used for data transmission between the microphone array 401 and the electronic device 403.
- The transmission medium 402 may include various connection types, such as wired or wireless communication links or fiber optic cables, and may also be a USB transmission line.
- The microphone array 401 can interact with the electronic device 403 through the transmission medium 402 to receive or send signals and the like.
- The electronic device 403 can be any of various electronic devices with signal and data processing capability. It can be a terminal device, such as a smart phone, tablet computer, e-book reader, MP3 (Moving Picture Experts Group Audio Layer III) player, or MP4 player, and it can also be a server device.
- the sound source localization method provided in this embodiment may be executed by the electronic device 403 , and correspondingly, the sound source localization apparatus may be provided in the electronic device 403 .
- The number of microphone arrays, transmission media and electronic devices in FIG. 4 is merely illustrative. There may be any number of microphone arrays, transmission media, and electronic devices depending on implementation needs.
- Referring to FIG. 5, it shows a schematic structural diagram of an electronic device (e.g., the electronic device in FIG. 4) suitable for implementing an embodiment of the present disclosure.
- The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., in-vehicle navigation terminals), and stationary terminals such as digital TVs and desktop computers; they may also include server equipment.
- The electronic device shown in FIG. 5 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
- The electronic device may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 501 that may perform various appropriate operations and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503.
- The RAM 503 also stores various programs and data required for the operation of the electronic device 500.
- the processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504.
- An input/output (I/O) interface 505 is also connected to bus 504 .
- The following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a liquid crystal display (LCD), speakers, a vibrator, etc.; storage devices 508 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 509.
- The communication device 509 may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data. While FIG. 5 shows an electronic device having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
- embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
- the computer program may be downloaded and installed from the network via the communication device 509, or from the storage device 508, or from the ROM 502.
- When the computer program is executed by the processing device 501, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
- the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
- the computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples of computer readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read only memory (EPROM or flash memory), fiber optics, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
- a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
- The client and server can communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (e.g., a communication network).
- Examples of communication networks include local area networks (“LAN”), wide area networks (“WAN”), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
- the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.
- The above-mentioned computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: use each array unit in the microphone array to separately collect the signal characteristic parameters of the sound emitted by the set of sound sources to be located, and obtain the signal characteristic parameter vector; input the signal characteristic parameter vector into the sound source localization model, where the model describes the mapping between the signal characteristic parameter vector collected by the array units in the microphone array when a sound source set emits sound and the corresponding azimuth distribution vector; and determine, according to the azimuth distribution vector output by the model, the azimuth of each sound source in the set of sound sources to be located.
- Computer program code for performing operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Python, Java, Smalltalk, C++, or a combination thereof, as well as conventional procedural programming languages such as "C" or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
- The remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
- Each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
- the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in dedicated hardware-based systems that perform the specified functions or operations, or can be implemented in a combination of dedicated hardware and computer instructions.
- the units involved in the embodiments of the present disclosure may be implemented in a software manner, and may also be implemented in a hardware manner.
- the name of the unit does not constitute a limitation of the unit itself under certain circumstances.
- For example, the collection unit can also be described as "a unit that uses each array unit in the microphone array to separately collect the signal characteristic parameters of the sound emitted by the set of sound sources to be located, to obtain the signal characteristic parameter vector".
- exemplary types of hardware logic components include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logical Devices (CPLDs) and more.
- a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device.
- the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
- more specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
- a method for localizing a sound source includes: using each array unit in a microphone array to separately collect signal characteristic parameters of the sound emitted by a sound source set to be located, to obtain a signal characteristic parameter vector; determining, according to the signal characteristic parameter vector, an azimuth distribution vector describing how the sound sources in the sound source set are distributed over the azimuths; and determining, according to the azimuth distribution vector, the azimuth of each sound source in the sound source set to be located.
- the determining, according to the signal characteristic parameter vector, of the azimuth distribution vector comprises: inputting the signal characteristic parameter vector into a sound source localization model, the sound source localization model outputting the azimuth distribution vector, wherein the sound source localization model describes the mapping relationship between (a) the signal characteristic parameter vector obtained when each array unit in the microphone array separately collects signal characteristic parameters of the sound emitted by a sound source set and (b) the azimuth distribution vector formed by the distribution parameters of the sound sources of the corresponding sound source set in each azimuth.
- the method further includes a sound source localization model training step, which includes: using the azimuth distribution vector formed by the distribution parameters of the sound sources of a sample sound source set in each azimuth as the output vector of the model, using the signal characteristic parameter vector obtained when each array unit of the microphone array separately collects signal characteristic parameters of the sound emitted by the corresponding sample sound source set as the input vector of the model, and performing model training to obtain the sound source localization model.
- the step of using each array unit in the microphone array to separately collect signal characteristic parameters of the sound emitted by the sound source set to be located, to obtain a signal characteristic parameter vector, includes: using each array unit in the microphone array to separately collect initial signal characteristic parameters of the sound emitted by the sound source set to be located; and extracting coherent-to-diffuse power ratio (CDR) characteristic parameters based on the initial signal characteristic parameters, to obtain the signal characteristic parameter vector whose components are the CDR characteristic parameters.
- the method further includes: performing time-frequency transformation on the initial signal characteristic parameter.
- the distribution parameter is a distribution probability, and determining the azimuth of each sound source in the sound source set to be located includes: for each component in the output azimuth distribution vector, comparing the component value with the corresponding preset threshold; and determining, according to the comparison result, whether a sound source of the sound source set to be located exists in each azimuth, to obtain the number of sound sources in the set and the azimuth of each sound source.
- the sound source localization model is a neural network model.
- a sound source localization apparatus, comprising: a collection unit configured to use each array unit in a microphone array to separately collect signal characteristic parameters of the sound emitted by a sound source set to be located, to obtain a signal characteristic parameter vector; a vector determination unit configured to determine, according to the signal characteristic parameter vector, an azimuth distribution vector describing how the sound sources in the sound source set are distributed over the azimuths; and an azimuth determination unit configured to determine, according to the azimuth distribution vector, the azimuth of each sound source in the sound source set to be located.
- the vector determination unit is further configured to: input the signal characteristic parameter vector into a sound source localization model, the sound source localization model outputting the azimuth distribution vector, wherein the sound source localization model describes the mapping relationship between the signal characteristic parameter vector obtained when each array unit in the microphone array separately collects signal characteristic parameters of the sound emitted by a sound source set and the azimuth distribution vector formed by the distribution parameters of the sound sources of the corresponding sound source set in each azimuth.
- the apparatus further includes a sound source localization model training unit configured to: use the azimuth distribution vector formed by the distribution parameters of the sound sources of a sample sound source set in each azimuth as the output vector of the model, use the signal characteristic parameter vector obtained when each array unit of the microphone array separately collects signal characteristic parameters of the sound emitted by the corresponding sample sound source set as the input vector of the model, and perform model training to obtain the sound source localization model.
- the collection unit includes: a collection subunit configured to use each array unit in the microphone array to separately collect initial signal characteristic parameters of the sound emitted by the sound source set to be located; and an extraction subunit configured to extract coherent-to-diffuse power ratio (CDR) characteristic parameters based on the initial signal characteristic parameters, to obtain the signal characteristic parameter vector whose components are the CDR characteristic parameters.
- the acquisition unit further includes: a transform subunit, configured to perform time-frequency transform on the initial signal characteristic parameter.
- the distribution parameter is a distribution probability, and the azimuth determination unit is further configured to: for each component in the output azimuth distribution vector, compare the component value with the corresponding preset threshold; and determine, according to the comparison result, whether a sound source of the sound source set to be located exists in each azimuth, to obtain the number of sound sources in the set and the azimuth of each sound source.
- the sound source localization model is a neural network model.
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Circuit For Audible Band Transducer (AREA)
- Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
Abstract
A sound source positioning method and apparatus, and an electronic device (403). The sound source positioning method comprises: respectively collecting, by using array units in a microphone array (401), signal feature parameters of sounds generated from a set of sound sources to be positioned, so as to obtain signal feature parameter vectors (101); according to the signal feature parameter vectors, determining orientation distribution vectors of sound sources in the set of sound sources that are distributed in various orientations (102); and according to the orientation distribution vectors, determining the orientations of the sound sources in the set of sound sources to be positioned (103). More accurate sound source positioning can be realized.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese Patent Application No. 202011462430.7, filed on December 10, 2020 and entitled "Sound Source Localization Method, Apparatus and Electronic Device", the entire contents of which are incorporated herein by reference.
The present disclosure relates to the technical field of information processing, and in particular to a sound source localization method, apparatus, and electronic device.
With the development of technology, products with voice capture and processing functions are welcomed by more and more users. To achieve better voice processing results, these voice products usually need to locate the sound source first when processing the captured voice information.
However, current sound source localization algorithms usually have poor localization accuracy when the size of the microphone array is limited, their accuracy drops markedly as reverberation increases, and they have difficulty locating and tracking multiple sound sources at the same time.
SUMMARY OF THE INVENTION
This Summary is provided to introduce, in a simplified form, concepts that are described in detail in the Detailed Description below. This Summary is not intended to identify key features or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
Embodiments of the present disclosure provide a sound source localization method, apparatus, and electronic device that can achieve high-accuracy sound source localization in an automated manner.
In a first aspect, an embodiment of the present disclosure provides a sound source localization method, including: using each array unit in a microphone array to separately collect signal characteristic parameters of the sound emitted by a sound source set to be located, to obtain a signal characteristic parameter vector; determining, according to the signal characteristic parameter vector, an azimuth distribution vector describing how the sound sources in the sound source set are distributed over the azimuths; and determining, according to the azimuth distribution vector, the azimuth of each sound source in the sound source set to be located.
In a second aspect, an embodiment of the present disclosure provides a sound source localization apparatus, including: a collection unit configured to use each array unit in a microphone array to separately collect signal characteristic parameters of the sound emitted by a sound source set to be located, to obtain a signal characteristic parameter vector; a vector determination unit configured to determine, according to the signal characteristic parameter vector, an azimuth distribution vector describing how the sound sources in the sound source set are distributed over the azimuths; and a determination unit configured to determine, according to the azimuth distribution vector, the azimuth of each sound source in the sound source set to be located.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the sound source localization method described in the first aspect or the second aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium storing a computer program which, when executed by a processor, implements the steps of the sound source localization method described in the first aspect or the second aspect.
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
FIG. 1 is a flowchart of one embodiment of a sound source localization method according to the present disclosure;
FIG. 2 is a flowchart of another embodiment of a sound source localization method according to the present disclosure;
FIG. 3 is a schematic structural diagram of one embodiment of a sound source localization apparatus according to the present disclosure;
FIG. 4 is an exemplary system architecture to which the sound source localization method of an embodiment of the present disclosure may be applied;
FIG. 5 is a schematic diagram of the basic structure of an electronic device provided according to an embodiment of the present disclosure.
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. In addition, method embodiments may include additional steps and/or omit some of the illustrated steps. The scope of the present disclosure is not limited in this regard.
As used herein, the term "including" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are used only to distinguish different apparatuses, modules, or units, and are not used to limit the order of, or the interdependence between, the functions performed by these apparatuses, modules, or units.
It should be noted that the modifiers "a" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be read as "one or more".
The names of messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
Referring to FIG. 1, a flow of one embodiment of the sound source localization method according to the present disclosure is shown. The method is applied to electronic devices such as desktop devices, servers, or embedded devices. As shown in FIG. 1, the sound source localization method includes the following steps:
Step 101: use each array unit in the microphone array to separately collect signal characteristic parameters of the sound emitted by the sound source set to be located, to obtain a signal characteristic parameter vector.
A microphone array may be a device composed of a certain number of microphones arranged together, and can be used to sample and process the spatial characteristics of a sound field. Typically, each array unit in the microphone array, that is, each individual microphone, can collect sound field signals on its own. The sound source set to be located consists of one or more sound sources that need to be located. These sound sources may be human voices or sounds made by other things, and they may emit sound individually or together.
In this embodiment, when the sound source set to be located emits sound, each array unit in the microphone array can sample the sound separately, and each individual array unit obtains a corresponding signal characteristic parameter. The intensity signals collected by all microphones in the microphone array together form the signal characteristic parameter vector.
Step 102: determine, according to the signal characteristic parameter vector, an azimuth distribution vector describing how the sound sources in the sound source set are distributed over the azimuths.
Based on the signal characteristic parameter vector obtained in step 101, computation can be performed on that vector to obtain the azimuth distribution vector describing how the sound sources in the sound source set are distributed over the azimuths.
Step 103: determine, according to the azimuth distribution vector, the azimuth of each sound source in the sound source set to be located.
In this embodiment, the azimuths that contain a sound source can be determined from the value of the component corresponding to each azimuth in the output azimuth distribution vector. For example, each component may take the value 0 or 1, where 1 and 0 indicate that a sound source is present or absent in the corresponding azimuth. When the component of the output azimuth distribution vector for a given azimuth is 1, a sound source is determined to exist in that azimuth; otherwise, there is none. In this way, each azimuth can be checked in turn for a sound source, and the azimuth of each sound source in the sound source set to be located can be determined.
In some optional implementations of this embodiment, step 102 may include: inputting the signal characteristic parameter vector into a sound source localization model, the sound source localization model outputting the azimuth distribution vector. The sound source localization model describes the mapping relationship between the signal characteristic parameter vector, obtained when each array unit in the microphone array separately collects signal characteristic parameters of the sound emitted by a sound source set, and the azimuth distribution vector formed by the distribution parameters of the sound sources of the corresponding sound source set in each azimuth. In this implementation, the input of the sound source localization model corresponds to the signal characteristic parameters collected separately by each array unit in the microphone array when the sound source set emits sound, and the output of the model is the azimuth distribution vector formed by the distribution parameters of the sound sources of the corresponding sound source set in each azimuth.
Typically, space is divided around the microphone as the center to obtain multiple azimuths. For example, space may be divided along two dimensions (east-west and north-south) or along three dimensions (east-west, north-south, and up-down). The granularity of the division can be set according to actual needs. Taking the two-dimensional division as an example, space can be divided so that each azimuth occupies 5 degrees of the plane: the region from due east to 5 degrees south of east is one azimuth, the region from 5 degrees south of east to 10 degrees south of east is another azimuth, and so on.
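The 5-degree planar partition described above can be sketched as a mapping from an angle to a sector index. This is only an illustration: the function names, the convention of measuring the angle from due east toward south, and the resulting 72-sector count are assumptions, not part of the disclosure.

```python
SECTOR_WIDTH_DEG = 5  # each azimuth sector spans 5 degrees of the plane
NUM_SECTORS = 360 // SECTOR_WIDTH_DEG  # 72 sectors cover the full circle


def azimuth_to_sector(angle_deg):
    """Map an angle in degrees (assumed: measured from due east toward south)
    to a sector index. Sector 0 covers [0, 5), sector 1 covers [5, 10), etc."""
    return int((angle_deg % 360) // SECTOR_WIDTH_DEG)


def sector_to_range(index):
    """Return the (start, end) angle range in degrees covered by a sector."""
    start = index * SECTOR_WIDTH_DEG
    return (start, start + SECTOR_WIDTH_DEG)
```

With this convention, a source at 7 degrees south of east falls into sector 1, which covers the range from 5 to 10 degrees.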
In some optional implementations of this embodiment, the method further includes a sound source localization model training step, which includes: using the azimuth distribution vector formed by the distribution parameters of the sound sources of a sample sound source set in each azimuth as the output vector of the model, using the signal characteristic parameter vector obtained when each array unit of the microphone array separately collects signal characteristic parameters of the sound emitted by the corresponding sample sound source set as the input vector of the model, and performing model training to obtain the sound source localization model.
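As a rough sketch of how one training sample might be assembled, the snippet below pairs an input feature vector with a label vector that marks the azimuths occupied by the sample sound sources. The helper names, the 72-sector layout, and the 0/1 encoding of the labels are assumptions made for illustration; the disclosure does not prescribe them.

```python
NUM_SECTORS = 72  # assumed: 5-degree sectors over a full circle


def make_target_vector(source_angles_deg, num_sectors=NUM_SECTORS):
    """Build the model's output (label) vector for one training sample.

    Component i is 1.0 if a sample sound source lies in sector i, else 0.0.
    """
    sector_width = 360 / num_sectors
    target = [0.0] * num_sectors
    for angle in source_angles_deg:
        target[int((angle % 360) // sector_width)] = 1.0
    return target


def make_training_pair(feature_vector, source_angles_deg):
    """Pair the input vector (features collected by the microphone array)
    with the label vector for the known sample source positions."""
    return list(feature_vector), make_target_vector(source_angles_deg)
```

A collection of such pairs, gathered for sample sound source sets at known positions, would then be fed to whatever training procedure the chosen model uses.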
In some optional implementations of this embodiment, the distribution parameter is a distribution probability, and step 103 may include: for each component in the azimuth distribution vector output by the sound source localization model, comparing the component value with the corresponding preset threshold; and determining, according to the comparison result, whether a sound source of the sound source set to be located exists in each azimuth, to obtain the number of sound sources in the set and the azimuth of each sound source. In this implementation, the distribution parameter may be a distribution probability taking continuous values between 0 and 1, for example 0.1, 0.3, or 0.7. The distribution probability represents the probability that a sound source exists in the corresponding azimuth. When determining the azimuths, the probability for each azimuth can be compared with the corresponding preset threshold, and the comparison result determines whether a sound source exists in that azimuth. Typically, when the component for an azimuth is greater than the preset threshold, a sound source is considered to exist in that azimuth; otherwise, none is considered to exist. Processing each azimuth in turn yields whether a sound source exists in each azimuth, and hence the number of sound sources in the set to be located and the azimuth of each sound source.
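The threshold comparison described above can be sketched as follows. The function name and the per-azimuth list of thresholds are illustrative assumptions; the disclosure only requires that each component be compared with its corresponding preset threshold.

```python
def locate_sources(azimuth_probs, thresholds):
    """Compare each component of the azimuth distribution vector with its
    preset threshold.

    Returns (number_of_sources, indices_of_azimuths_that_contain_a_source).
    """
    if len(azimuth_probs) != len(thresholds):
        raise ValueError("one threshold is required per azimuth component")
    active = [
        i for i, (p, t) in enumerate(zip(azimuth_probs, thresholds)) if p > t
    ]
    return len(active), active


# Example: four azimuths with a common threshold of 0.5.
count, azimuths = locate_sources([0.1, 0.7, 0.3, 0.9], [0.5] * 4)
print(count, azimuths)  # prints: 2 [1, 3]
```

Because every azimuth is tested independently, the number of sources does not have to be known in advance; it falls out of the comparison.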
In some optional implementations of this embodiment, the sound source localization model is a neural network model. Specifically, the neural network structure in the model may include one or more of the following: CNN, LSTM, Linear. In this implementation, because a neural network model is used to model the data, the mapping relationship between the sound source azimuths and the signal characteristic parameters collected by the corresponding microphones can be described more precisely, so that the sound source azimuths finally identified are more accurate.
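To make the shape of such a model concrete, here is a deliberately tiny sketch of a forward pass using a single Linear layer. The sigmoid output, the weights, and the function names are assumptions for illustration only; the disclosure does not specify the architecture beyond naming CNN, LSTM, and Linear structures, and a practical model would be trained and considerably larger.

```python
import math


def sigmoid(x):
    """Squash a real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_sigmoid_forward(features, weights, biases):
    """One Linear layer followed by a per-azimuth sigmoid.

    weights[i] holds the (hypothetical) weights for azimuth i. Each output
    is an independent probability, so several azimuths can be active at once.
    """
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * f for w, f in zip(row, features)) + bias
        outputs.append(sigmoid(z))
    return outputs
```

An independent sigmoid per azimuth, rather than a softmax over all azimuths, is chosen here because the distribution vector has to be able to flag more than one source at a time.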
In the embodiments of the present disclosure, the azimuth distribution vector describing how the sound sources in the sound source set are distributed over the azimuths can be determined from the signal characteristic parameters of the sound sources at each microphone, and the azimuth of each sound source in the sound source set to be located can then be determined, thereby achieving accurate sound source localization.
Referring next to FIG. 2, a flow of another embodiment of the sound source localization method according to the present disclosure is shown. The method is applied to a mobile terminal. As shown in FIG. 2, the sound source localization method includes the following steps:
Step 201: use each array unit in the microphone array to separately collect initial signal characteristic parameters of the sound emitted by the sound source set to be located.
In this embodiment, the electronic device may use each array unit in the microphone array to separately collect the initial signal characteristic parameters of the sound emitted by the sound source set to be located.
Step 202: extract coherent-to-diffuse power ratio characteristic parameters based on the initial signal characteristic parameters, to obtain the signal characteristic parameter vector whose components are the coherent-to-diffuse power ratio characteristic parameters.
In this embodiment, based on the initial signal characteristic parameters collected in step 201, the electronic device may extract coherent-to-diffuse power ratio (CDR) characteristic parameters from the initial signal characteristic parameters, to obtain the signal characteristic parameter vector whose components are the CDR characteristic parameters.
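The disclosure does not say how the CDR is estimated. One formulation commonly used in the literature, shown here purely as an assumption, models the measured coherence between two microphones as a CDR-weighted mixture of the direct-path coherence and the diffuse-field coherence, and solves that mixture for the CDR. The two-microphone setup, the known direct-path time difference, the ideal spherically diffuse noise field, and all function names below are hypothetical.

```python
import cmath
import math


def diffuse_coherence(freq_hz, mic_distance_m, speed_of_sound=343.0):
    """Coherence of an ideal spherically diffuse field between two microphones."""
    x = 2.0 * math.pi * freq_hz * mic_distance_m / speed_of_sound
    return 1.0 if x == 0 else math.sin(x) / x


def direct_coherence(freq_hz, tdoa_s):
    """Coherence of a single plane wave with the given time difference of arrival."""
    return cmath.exp(1j * 2.0 * math.pi * freq_hz * tdoa_s)


def estimate_cdr(gamma_x, gamma_s, gamma_n):
    """Solve gamma_x = (CDR * gamma_s + gamma_n) / (CDR + 1) for CDR.

    The real part is taken and clipped at zero because a power ratio cannot
    be negative. Raises ZeroDivisionError when gamma_x equals gamma_s, which
    corresponds to a purely coherent (infinite-CDR) signal.
    """
    cdr = (gamma_n - gamma_x) / (gamma_x - gamma_s)
    return max(0.0, cdr.real)
```

In practice the measured coherence would be estimated per time-frequency bin from smoothed auto- and cross-power spectra, and the resulting CDR values would form the components of the feature vector.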
Step 203: input the signal characteristic parameter vector into a sound source localization model, the sound source localization model outputting the azimuth distribution vector, wherein the sound source localization model describes the mapping relationship between the signal characteristic parameter vector, obtained when each array unit in the microphone array separately collects signal characteristic parameters of the sound emitted by a sound source set, and the azimuth distribution vector formed by the distribution parameters of the sound sources of the corresponding sound source set in each azimuth.
In this embodiment, for the processing of step 203, refer to step 102 in the embodiment corresponding to FIG. 1. The sound source localization model in this embodiment describes the mapping relationship between the extracted CDR characteristic parameter vector and the azimuth distribution vector formed by the distribution parameters of the sound sources of the sound source set in each azimuth. The corresponding characteristic parameter vectors are also used when training this model.
Step 204: determine, according to the azimuth distribution vector, the azimuth of each sound source in the sound source set to be located.
In this embodiment, for the specific processing of step 204, refer to step 103 of the embodiment corresponding to FIG. 1; details are not repeated here.
In some optional implementations of this embodiment, before step 202, the method further includes: performing a time-frequency transform on the initial signal characteristic parameters. The time-frequency transform converts the collected time-domain signal into a frequency-domain signal, which makes subsequent signal feature processing easier.
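The disclosure does not name a specific time-frequency transform. A short-time Fourier transform (STFT) is one common choice and is sketched below under that assumption; the frame length, hop size, Hann window, and function names are illustrative, and a real implementation would use an FFT routine instead of the naive DFT shown here.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft(frame):
    """Naive discrete Fourier transform (O(n^2)), non-negative frequencies only."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def stft(signal, frame_len=64, hop=32):
    """Split a time-domain signal into overlapping windowed frames and
    transform each frame into the frequency domain."""
    window = hann_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(dft(frame))
    return frames
```

Each microphone channel would be transformed this way, giving one complex spectrum per frame from which per-bin features can then be computed.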
In this embodiment, the extracted signal feature is the CDR feature, and the sound source position information is obtained from the model corresponding to the CDR feature. This can further improve the robustness of localization in highly reverberant scenes and reduce the influence of interference on the localization result.
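The disclosure does not state how the CDR feature is computed. Purely as an illustrative sketch, and not as the method of the disclosure, the following Python code uses a common two-microphone coherence mixing model: the measured coherence is assumed to be a CDR-weighted mixture of a direct-path coherence and a diffuse-field coherence, and the CDR is obtained by solving that mixture for the weight. The function names, the speed of sound, the microphone spacing, and the assumption that the direct-path time difference of arrival is known are all hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def diffuse_coherence(freq_hz, mic_distance_m):
    """Coherence of an ideal spherically diffuse field: sin(x) / x."""
    x = 2 * math.pi * freq_hz * mic_distance_m / SPEED_OF_SOUND
    return 1.0 if x == 0 else math.sin(x) / x


def direct_coherence(freq_hz, tdoa_s):
    """Coherence of a single plane wave with the given time difference of arrival."""
    return cmath.exp(2j * math.pi * freq_hz * tdoa_s)


def estimate_cdr(measured, direct, diffuse):
    """Solve measured = (cdr * direct + diffuse) / (cdr + 1) for cdr."""
    denom = measured - direct
    if abs(denom) < 1e-12:
        return float("inf")  # purely coherent observation
    return max(0.0, ((diffuse - measured) / denom).real)


# Synthetic check with hypothetical values: build a mixture with a known CDR.
true_cdr = 4.0
gamma_s = direct_coherence(1000.0, 1e-4)
gamma_n = diffuse_coherence(1000.0, 0.05)
gamma_x = (true_cdr * gamma_s + gamma_n) / (true_cdr + 1.0)
recovered = estimate_cdr(gamma_x, gamma_s, gamma_n)
```

One such value per microphone pair and frequency bin could then form the components of the signal characteristic parameter vector.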
With further reference to FIG. 3, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a sound source localization apparatus. The apparatus embodiment shown in FIG. 3 corresponds to the method embodiment shown in FIG. 1, and the apparatus can be applied in various electronic devices.
As shown in FIG. 3, the sound source localization apparatus of this embodiment includes an acquisition unit 301, a vector determination unit 302, and an azimuth determination unit 303. The acquisition unit 301 is configured to use each array unit in the microphone array to separately collect signal characteristic parameters of the sound emitted by the set of sound sources to be located, obtaining a signal characteristic parameter vector; the vector determination unit 302 is configured to determine, according to the signal characteristic parameter vector, an azimuth distribution vector describing how the sound sources in the sound source set are distributed over the respective azimuths; and the azimuth determination unit 303 is configured to determine, according to the azimuth distribution vector, the azimuth of each sound source in the set of sound sources to be located.
In this embodiment, for the specific processing of the acquisition unit 301, the vector determination unit 302, and the azimuth determination unit 303 of the sound source localization apparatus, and for the technical effects they bring, reference may be made to the descriptions of steps 101, 102, and 103, respectively, in the embodiment corresponding to FIG. 1; details are not repeated here.
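As a non-limiting illustration of how the three units may cooperate, the following Python sketch chains three callables in the order acquisition, vector determination, azimuth determination. The class name `SoundSourceLocalizer`, its field names, and the stub functions are hypothetical and do not appear in the disclosure; the stubs merely stand in for real acquisition, model inference, and decision logic.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class SoundSourceLocalizer:
    acquire: Callable[[], List[float]]                         # role of acquisition unit 301
    to_distribution: Callable[[Sequence[float]], List[float]]  # role of vector determination unit 302
    to_azimuths: Callable[[Sequence[float]], List[float]]      # role of azimuth determination unit 303

    def run(self) -> List[float]:
        features = self.acquire()
        distribution = self.to_distribution(features)
        return self.to_azimuths(distribution)


def stub_acquire() -> List[float]:
    # Placeholder for a signal characteristic parameter vector.
    return [0.2, 0.9, 0.1, 0.8]


def stub_model(features: Sequence[float]) -> List[float]:
    # Placeholder: a real system would run the localization model here.
    return list(features)


def stub_decide(distribution: Sequence[float]) -> List[float]:
    # Placeholder decision: four 90-degree bins, fixed threshold of 0.5.
    return [i * 90.0 for i, p in enumerate(distribution) if p >= 0.5]


localizer = SoundSourceLocalizer(stub_acquire, stub_model, stub_decide)
azimuths = localizer.run()
```

Keeping the three roles behind separate callables mirrors the unit division of FIG. 3 and lets each stage be replaced independently.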
In some optional implementations of this embodiment, the vector determination unit is further configured to: input the signal characteristic parameter vector into a sound source localization model, which outputs the azimuth distribution vector. The sound source localization model describes the mapping between (a) the signal characteristic parameter vector obtained when each array unit of the microphone array separately collects signal characteristic parameters of the sound emitted by a sound source set and (b) the azimuth distribution vector formed by the distribution parameters of the sound sources of that set over the respective azimuths.
In some optional implementations of this embodiment, the apparatus further includes a sound source localization model training unit configured to: take the azimuth distribution vector formed by the distribution parameters of the sound sources of a sample sound source set over the respective azimuths as the output vector of the model, take the signal characteristic parameter vector obtained when each array unit of the microphone array separately collects signal characteristic parameters of the sound emitted by the corresponding sample sound source set as the input vector of the model, and perform model training to obtain the sound source localization model.
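As a non-limiting illustration of how a training pair of input vector and output vector might be assembled, the following Python sketch encodes the known azimuths of a sample sound source set as a vector with one component per azimuth bin. The uniform division of 360 degrees into bins, the 0/1 encoding, and the names `azimuth_target` and `make_training_pair` are assumptions for illustration; the disclosure does not prescribe them.

```python
def azimuth_target(source_azimuths_deg, num_bins):
    """Return a vector with 1.0 in every azimuth bin that contains a source."""
    bin_width = 360.0 / num_bins
    target = [0.0] * num_bins
    for azimuth in source_azimuths_deg:
        # Wrap the angle into [0, 360) before mapping it to a bin index.
        target[int((azimuth % 360.0) // bin_width)] = 1.0
    return target


def make_training_pair(feature_vector, source_azimuths_deg, num_bins):
    """Pair a model input vector with its azimuth distribution target."""
    return list(feature_vector), azimuth_target(source_azimuths_deg, num_bins)


# Hypothetical sample: three feature components, sources at 10 and 200 degrees.
features, target = make_training_pair([0.7, 0.2, 0.9], [10.0, 200.0], num_bins=8)
```

With eight bins of 45 degrees each, the two sources fall into bins 0 and 4, so the target has exactly two active components.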
In some optional implementations of this embodiment, the acquisition unit includes: an acquisition subunit, configured to use each array unit in the microphone array to separately collect initial signal characteristic parameters of the sound emitted by the set of sound sources to be located; and an extraction subunit, configured to extract coherent-to-diffuse signal energy ratio (CDR) characteristic parameters based on the initial signal characteristic parameters, obtaining the signal characteristic parameter vector whose components are the CDR characteristic parameters.
In some optional implementations of this embodiment, the acquisition unit further includes: a transform subunit, configured to perform a time-frequency transform on the initial signal characteristic parameters.
In some optional implementations of this embodiment, the distribution parameter is a distribution probability, and the azimuth determination unit is further configured to: for each component of the output azimuth distribution vector, compare the component value with the corresponding preset threshold; and determine, according to the comparison results, whether a sound source of the set of sound sources to be located is present at each azimuth, thereby obtaining the number of sound sources in the set and the azimuth of each sound source.
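As a non-limiting illustration of the threshold comparison described above, the following Python sketch compares each component of an azimuth distribution vector with its own preset threshold and reports the number of sources and their azimuths. The function name `locate_sources`, the use of "greater than or equal to" as the comparison, the reporting of each bin by its starting angle, and the example values are assumptions for illustration.

```python
def locate_sources(distribution, thresholds, bin_width_deg):
    """Return (source_count, azimuths) for bins whose probability meets the threshold."""
    azimuths = [
        index * bin_width_deg
        for index, (probability, threshold) in enumerate(zip(distribution, thresholds))
        if probability >= threshold
    ]
    return len(azimuths), azimuths


# Hypothetical model output over six 60-degree bins, with one threshold per bin.
distribution = [0.05, 0.91, 0.10, 0.08, 0.77, 0.02]
thresholds = [0.5] * 6
count, azimuths = locate_sources(distribution, thresholds, 60.0)
# count == 2, azimuths == [60.0, 240.0]
```

Because every component is tested independently, the number of sources does not need to be known in advance; it follows from how many components pass their thresholds.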
In some optional implementations of this embodiment, the sound source localization model is a neural network model.
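The disclosure does not specify the network architecture. Purely as an illustrative sketch, the following Python code shows a forward pass through a small fully connected network that maps a feature vector to one probability per azimuth bin. The single hidden layer, the ReLU and sigmoid activations, the function names, and the untrained weight values are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def predict_distribution(features, hidden_w, hidden_b, out_w, out_b):
    """Map a feature vector to per-azimuth probabilities."""
    hidden = [max(0.0, h) for h in dense(features, hidden_w, hidden_b)]  # ReLU
    # Independent sigmoid per bin, so several azimuths can be active at once.
    return [sigmoid(o) for o in dense(hidden, out_w, out_b)]


# Hypothetical, untrained weights: 3 input features, 2 hidden units, 4 azimuth bins.
HIDDEN_W = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
HIDDEN_B = [0.0, 0.1]
OUT_W = [[1.0, -1.0], [-0.5, 0.5], [0.3, 0.3], [-1.0, 1.0]]
OUT_B = [0.0, 0.0, -0.2, 0.1]

distribution = predict_distribution([0.9, 0.1, 0.4], HIDDEN_W, HIDDEN_B, OUT_W, OUT_B)
```

A per-bin sigmoid rather than a softmax is used in this sketch because a sound source set may contain several sources, so the components need not sum to one.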
Please refer to FIG. 4, which shows an exemplary system architecture to which the sound source localization method of an embodiment of the present disclosure may be applied.
As shown in FIG. 4, the system architecture may include a microphone array 401, a transmission medium 402, and an electronic device 403. The transmission medium 402 is the medium used for data transmission between the microphone array 401 and the electronic device 403. The transmission medium 402 may include various connection types, such as wired or wireless communication links or fiber-optic cables, and may also be a USB transmission line.
The microphone array 401 can interact with the electronic device 403 through the transmission medium 402 to receive or send signals and the like.
The electronic device 403 may be any of various electronic devices with signal and data processing capability. It may be a terminal device, such as a smartphone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, or an MP4 (Moving Picture Experts Group Audio Layer IV) player, or it may be a server device.
It should be noted that the sound source localization method provided in this embodiment may be executed by the electronic device 403; correspondingly, the sound source localization apparatus may be provided in the electronic device 403.
It should be understood that the numbers of microphone arrays, transmission media, and electronic devices in FIG. 4 are merely illustrative. There may be any number of microphone arrays, transmission media, and electronic devices, depending on implementation needs.
Referring next to FIG. 5, it shows a schematic structural diagram of an electronic device (for example, the electronic device in FIG. 4) suitable for implementing an embodiment of the present disclosure. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers, and may also include server devices. The electronic device shown in FIG. 5 is only an example and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
As shown in FIG. 5, the electronic device may include a processing device (for example, a central processing unit or a graphics processor) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500. The processing device 501, the ROM 502, and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Typically, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; output devices 507 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage devices 508 including, for example, a magnetic tape and a hard disk; and a communication device 509. The communication device 509 may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 5 shows an electronic device having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 509, or installed from the storage device 508, or installed from the ROM 502. When the computer program is executed by the processing device 501, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to electrical wire, optical cable, RF (radio frequency), or any suitable combination of the foregoing.
In some embodiments, the client and the server can communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (for example, the Internet), and peer-to-peer networks (for example, ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or it may exist separately without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: use each array unit in the microphone array to separately collect signal characteristic parameters of the sound emitted by the set of sound sources to be located, obtaining a signal characteristic parameter vector; input the signal characteristic parameter vector into a sound source localization model, where the sound source localization model describes the mapping between the signal characteristic parameter vector obtained when each array unit of the microphone array separately collects signal characteristic parameters of the sound emitted by a sound source set and the azimuth distribution vector formed by the distribution parameters of the sound sources of that set over the respective azimuths; and determine, according to the azimuth distribution vector output by the sound source localization model, the azimuth of each sound source in the set of sound sources to be located.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Python, Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself; for example, the acquisition unit may also be described as "a unit that uses each array unit in the microphone array to separately collect signal characteristic parameters of the sound emitted by the set of sound sources to be located, obtaining a signal characteristic parameter vector".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, a sound source localization method is provided, including: using each array unit in a microphone array to separately collect signal characteristic parameters of the sound emitted by a set of sound sources to be located, obtaining a signal characteristic parameter vector; determining, according to the signal characteristic parameter vector, an azimuth distribution vector describing how the sound sources in the sound source set are distributed over the respective azimuths; and determining, according to the azimuth distribution vector, the azimuth of each sound source in the set of sound sources to be located.
According to one or more embodiments of the present disclosure, determining the azimuth distribution vector according to the signal characteristic parameter vector includes: inputting the signal characteristic parameter vector into a sound source localization model, which outputs the azimuth distribution vector, where the sound source localization model describes the mapping between the signal characteristic parameter vector obtained when each array unit of the microphone array separately collects signal characteristic parameters of the sound emitted by a sound source set and the azimuth distribution vector formed by the distribution parameters of the sound sources of that set over the respective azimuths.
According to one or more embodiments of the present disclosure, the method further includes a sound source localization model training step, which includes: taking the azimuth distribution vector formed by the distribution parameters of the sound sources of a sample sound source set over the respective azimuths as the output vector of the model, taking the signal characteristic parameter vector obtained when each array unit of the microphone array separately collects signal characteristic parameters of the sound emitted by the corresponding sample sound source set as the input vector of the model, and performing model training to obtain the sound source localization model.
According to one or more embodiments of the present disclosure, using each array unit in the microphone array to separately collect signal characteristic parameters of the sound emitted by the set of sound sources to be located, obtaining a signal characteristic parameter vector, includes: using each array unit in the microphone array to separately collect initial signal characteristic parameters of the sound emitted by the set of sound sources to be located; and extracting coherent-to-diffuse signal energy ratio (CDR) characteristic parameters based on the initial signal characteristic parameters, obtaining the signal characteristic parameter vector whose components are the CDR characteristic parameters.
According to one or more embodiments of the present disclosure, before extracting the coherent-to-diffuse signal energy ratio (CDR) characteristic parameters based on the initial signal characteristic parameters to obtain the signal characteristic parameter vector whose components are the CDR characteristic parameters, the method further includes: performing a time-frequency transform on the initial signal characteristic parameters.
According to one or more embodiments of the present disclosure, the distribution parameter is a distribution probability, and determining the azimuth of each sound source in the set of sound sources to be located according to the azimuth distribution vector output by the sound source localization model includes: for each component of the output azimuth distribution vector, comparing the component value with the corresponding preset threshold; and determining, according to the comparison results, whether a sound source of the set of sound sources to be located is present at each azimuth, thereby obtaining the number of sound sources in the set and the azimuth of each sound source.
According to one or more embodiments of the present disclosure, the sound source localization model is a neural network model.
According to one or more embodiments of the present disclosure, a sound source localization apparatus is provided, including: an acquisition unit, configured to use each array unit in a microphone array to separately collect signal characteristic parameters of the sound emitted by a set of sound sources to be located, obtaining a signal characteristic parameter vector; a vector determination unit, configured to determine, according to the signal characteristic parameter vector, an azimuth distribution vector describing how the sound sources in the sound source set are distributed over the respective azimuths; and an azimuth determination unit, configured to determine, according to the azimuth distribution vector, the azimuth of each sound source in the set of sound sources to be located.
According to one or more embodiments of the present disclosure, the vector determination unit is further configured to: input the signal characteristic parameter vector into a sound source localization model, which outputs the azimuth distribution vector, where the sound source localization model describes the mapping between the signal characteristic parameter vector obtained when each array unit of the microphone array separately collects signal characteristic parameters of the sound emitted by a sound source set and the azimuth distribution vector formed by the distribution parameters of the sound sources of that set over the respective azimuths.
According to one or more embodiments of the present disclosure, the apparatus further includes a sound source localization model training unit configured to: take the azimuth distribution vector formed by the distribution parameters of the sound sources of a sample sound source set over the respective azimuths as the output vector of the model, take the signal characteristic parameter vector obtained when each array unit of the microphone array separately collects signal characteristic parameters of the sound emitted by the corresponding sample sound source set as the input vector of the model, and perform model training to obtain the sound source localization model.
According to one or more embodiments of the present disclosure, the acquisition unit includes: an acquisition subunit, configured to use each array unit in the microphone array to separately collect initial signal characteristic parameters of the sound emitted by the set of sound sources to be located; and an extraction subunit, configured to extract coherent-to-diffuse signal energy ratio (CDR) characteristic parameters based on the initial signal characteristic parameters, obtaining the signal characteristic parameter vector whose components are the CDR characteristic parameters.
According to one or more embodiments of the present disclosure, the acquisition unit further includes: a transform subunit, configured to perform a time-frequency transform on the initial signal characteristic parameters.
According to one or more embodiments of the present disclosure, the distribution parameter is a distribution probability, and the azimuth determination unit is further configured to: for each component of the output azimuth distribution vector, compare the component value with the corresponding preset threshold; and determine, according to the comparison results, whether a sound source of the set of sound sources to be located is present at each azimuth, thereby obtaining the number of sound sources in the set and the azimuth of each sound source.
According to one or more embodiments of the present disclosure, the sound source localization model is a neural network model.
The above description covers only preferred embodiments of the present disclosure and explains the technical principles employed. Those skilled in the art should understand that the scope of disclosure involved herein is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Additionally, although operations are depicted in a particular order, this should not be construed as requiring that the operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains several specific implementation details, these should not be construed as limitations on the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.
Claims (10)
- A sound source localization method, comprising: using each array unit in a microphone array to collect signal characteristic parameters of the sound emitted by a set of sound sources to be located, to obtain a signal characteristic parameter vector; determining, according to the signal characteristic parameter vector, an azimuth distribution vector describing the distribution of the sound sources in the sound source set over the respective azimuths; and determining, according to the azimuth distribution vector, the azimuth of each sound source in the set of sound sources to be located.
- The method according to claim 1, wherein determining, according to the signal characteristic parameter vector, the azimuth distribution vector describing the distribution of the sound sources in the sound source set over the respective azimuths comprises: inputting the signal characteristic parameter vector into a sound source localization model, the sound source localization model outputting the azimuth distribution vector, wherein the sound source localization model describes the mapping between the signal characteristic parameter vector, obtained when the array units of the microphone array each collect signal characteristic parameters of the sound emitted by a sound source set, and the azimuth distribution vector formed by the distribution parameters that describe how the sound sources in the corresponding sound source set are distributed over the respective azimuths.
- The method according to claim 2, further comprising a sound source localization model training step, the sound source localization model training step comprising: taking, as the output vector of the model, the azimuth distribution vector formed by the distribution parameters that describe how the sound sources in a sample sound source set are distributed over the respective azimuths; taking, as the input vector of the model, the signal characteristic parameter vector obtained when the array units of the microphone array each collect signal characteristic parameters of the sound emitted by the corresponding sample sound source set; and performing model training to obtain the sound source localization model.
- The method according to claim 1, wherein using each array unit in the microphone array to collect signal characteristic parameters of the sound emitted by the set of sound sources to be located, to obtain the signal characteristic parameter vector, comprises: using each array unit in the microphone array to collect initial signal characteristic parameters of the sound emitted by the set of sound sources to be located; and extracting coherent-to-scattered signal energy ratio characteristic parameters based on the initial signal characteristic parameters, to obtain the signal characteristic parameter vector whose components are the coherent-to-scattered signal energy ratio characteristic parameters.
- The method according to claim 3, wherein, before extracting the coherent-to-scattered signal energy ratio characteristic parameters based on the initial signal characteristic parameters to obtain the signal characteristic parameter vector whose components are the coherent-to-scattered signal energy ratio characteristic parameters, the method further comprises: performing a time-frequency transform on the initial signal characteristic parameters.
- The method according to claim 1, wherein the distribution parameter is a distribution probability, and determining, according to the azimuth distribution vector output by the sound source localization model, the azimuth of each sound source in the set of sound sources to be located comprises: comparing the value of each component of the output azimuth distribution vector with a corresponding preset threshold; and determining, according to the comparison results, whether a sound source of the set of sound sources to be located is present at each azimuth, to obtain the number of sound sources in the set of sound sources to be located and the azimuth of each sound source.
- The method according to any one of claims 1-5, wherein the sound source localization model is a neural network model.
- A sound source localization apparatus, comprising: a collection unit configured to use each array unit in a microphone array to collect signal characteristic parameters of the sound emitted by a set of sound sources to be located, to obtain a signal characteristic parameter vector; a vector determination unit configured to determine, according to the signal characteristic parameter vector, an azimuth distribution vector describing the distribution of the sound sources in the sound source set over the respective azimuths; and an azimuth determination unit configured to determine, according to the azimuth distribution vector, the azimuth of each sound source in the set of sound sources to be located.
- An electronic device, comprising: at least one processor; and a storage device configured to store at least one program which, when executed by the at least one processor, causes the at least one processor to implement the method according to any one of claims 1-7.
- A computer-readable medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-7.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011462430.7A CN112946576B (en) | 2020-12-10 | 2020-12-10 | Sound source positioning method and device and electronic equipment |
CN202011462430.7 | 2020-12-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022121800A1 (en) | 2022-06-16 |
Family
ID=76234798
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/135400 WO2022121800A1 (en) | 2020-12-10 | 2021-12-03 | Sound source positioning method and apparatus, and electronic device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112946576B (en) |
WO (1) | WO2022121800A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112946576B (en) * | 2020-12-10 | 2023-04-14 | 北京有竹居网络技术有限公司 | Sound source positioning method and device and electronic equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108647556A (en) * | 2018-03-02 | 2018-10-12 | 重庆邮电大学 | Sound localization method based on frequency dividing and deep neural network |
US20190025400A1 (en) * | 2017-07-24 | 2019-01-24 | Microsoft Technology Licensing, Llc | Sound source localization confidence estimation using machine learning |
CN110544490A (en) * | 2019-07-30 | 2019-12-06 | 南京林业大学 | sound source positioning method based on Gaussian mixture model and spatial power spectrum characteristics |
CN110800031A (en) * | 2017-06-27 | 2020-02-14 | 伟摩有限责任公司 | Detecting and responding to alerts |
CN111696570A (en) * | 2020-08-17 | 2020-09-22 | 北京声智科技有限公司 | Voice signal processing method, device, equipment and storage medium |
CN112946576A (en) * | 2020-12-10 | 2021-06-11 | 北京有竹居网络技术有限公司 | Sound source positioning method and device and electronic equipment |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108962272A (en) * | 2018-06-21 | 2018-12-07 | 湖南优浪语音科技有限公司 | Sound pick-up method and system |
CN110082724B (en) * | 2019-05-31 | 2021-09-21 | 浙江大华技术股份有限公司 | Sound source positioning method, device and storage medium |
CN111048106B (en) * | 2020-03-12 | 2020-06-16 | 深圳市友杰智新科技有限公司 | Pickup method and apparatus based on double microphones and computer device |
- 2020-12-10: CN application CN202011462430.7A (CN112946576B), status: Active
- 2021-12-03: WO application PCT/CN2021/135400 (WO2022121800A1), status: Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN112946576B (en) | 2023-04-14 |
CN112946576A (en) | 2021-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022121801A1 (en) | Information processing method and apparatus, and electronic device | |
US11270690B2 (en) | Method and apparatus for waking up device | |
CN112364860B (en) | Training method and device of character recognition model and electronic equipment | |
WO2020207174A1 (en) | Method and apparatus for generating quantized neural network | |
WO2022121799A1 (en) | Sound signal processing method and apparatus, and electronic device | |
WO2023273579A1 (en) | Model training method and apparatus, speech recognition method and apparatus, and medium and device | |
CN110413812A (en) | Training method, device, electronic equipment and the storage medium of neural network model | |
WO2022037419A1 (en) | Audio content recognition method and apparatus, and device and computer-readable medium | |
CN113033580B (en) | Image processing method, device, storage medium and electronic equipment | |
CN111597825B (en) | Voice translation method and device, readable medium and electronic equipment | |
WO2022105622A1 (en) | Image segmentation method and apparatus, readable medium, and electronic device | |
US20240205634A1 (en) | Audio signal playing method and apparatus, and electronic device | |
CN112995712A (en) | Method, device and equipment for determining stuck factors and storage medium | |
WO2022121800A1 (en) | Sound source positioning method and apparatus, and electronic device | |
WO2022135131A1 (en) | Sound source positioning method and apparatus, and electronic device | |
CN116072108A (en) | Model generation method, voice recognition method, device, medium and equipment | |
EP4044044A1 (en) | Method and apparatus for processing information | |
WO2022116819A1 (en) | Model training method and apparatus, machine translation method and apparatus, and device and storage medium | |
WO2022012178A1 (en) | Method for generating objective function, apparatus, electronic device and computer readable medium | |
CN113778078A (en) | Positioning information generation method and device, electronic equipment and computer readable medium | |
WO2023155713A1 (en) | Method and apparatus for marking speaker, and electronic device | |
CN111797665A (en) | Method and apparatus for converting video | |
WO2022052889A1 (en) | Image recognition method and apparatus, electronic device, and computer-readable medium | |
CN113986958B (en) | Text information conversion method and device, readable medium and electronic equipment | |
CN111444384B (en) | Audio key point determining method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21902507; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21902507; Country of ref document: EP; Kind code of ref document: A1 |