CN114926378B - Method, system, device and computer storage medium for sound source tracking - Google Patents


Publication number: CN114926378B (grant of application CN114926378A)
Application number: CN202210348317.9A
Authority: CN (China)
Legal status: Active
Original language: Chinese (zh)
Inventors: 丁华, 邵佳璐, 姚军, 吕国庆, 冯媛雨
Applicant and current assignee: Zhejiang Xitumeng Digital Technology Co., Ltd.
Prior art keywords: sound source, image, target, scene, debugging

Classifications

    • G06T 5/50 (Physics; Computing; Image data processing): Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G01H 17/00 (Physics; Measuring; Measurement of mechanical vibrations or ultrasonic, sonic or infrasonic waves): Measuring mechanical vibrations or ultrasonic, sonic or infrasonic waves, not provided for in the preceding groups
    • G06T 2207/20221 (Indexing scheme for image analysis or image enhancement; Special algorithmic details; Image combination): Image fusion; image merging

Abstract

The present application relates to the field of sound recognition, and in particular to a method, system, apparatus, and computer storage medium for sound source tracking. The method comprises: acquiring a target audio signal and an original scene image in a target scene; generating a target sound source distribution map of the target audio signal; performing image fusion processing on the original scene image and the target sound source distribution map to obtain an original fusion image; determining sound source angle information corresponding to the target audio signal; and determining, based on the sound source angle information, an image acquisition angle of the image acquisition device on the target scene, so that the sound source position information corresponding to the target audio signal lies at a preset position of the target fusion image. By analyzing the information in the audio signal directly and generating a fusion image in which the sound source point can be observed in real time, the method lowers the hardware requirements of a sound source tracking system, reduces its manufacturing cost, and widens its range of application.

Description

Method, system, device and computer storage medium for sound source tracking
Technical Field
The present application relates to the field of sound recognition, and in particular to a method, system, apparatus, and computer storage medium for sound source tracking.
Background
Acoustic positioning systems are widely used in heavy industry, aerospace, and similar fields, commonly for detecting abnormal equipment sounds, environmental noise, and the like. For example, an acoustic positioning system can measure the noise distribution of a running train or high-speed rail line so that a noise shielding mechanism can be designed for it; in the automotive industry, noise localization can guide improvements to a vehicle's structural design and increase user comfort.
Current mainstream sound source imaging technology is a low-cost, non-real-time measurement and display system; because it cannot detect sound sources in real time, its applications are limited. Existing real-time measurement and display systems require a large number of audio acquisition devices and must perform calculations on large volumes of audio data, which imposes a huge computational load, raises the requirements on the processor, increases overall equipment cost, and restricts the range of application. A method of sound source tracking is therefore proposed to solve the above problems.
Disclosure of Invention
In view of the problems in the prior art, the aim of the present application is to provide a sound source tracking method that directly analyzes the information in an audio signal and observes the position of the sound source point in real time through a generated fusion image. This realizes real-time tracking and positioning of a sound source while avoiding the huge data calculations of conventional sound source positioning, thereby reducing the hardware requirements of the sound source tracking system, lowering its manufacturing cost, and widening its range of application.
In order to solve the above-mentioned problems, the present application provides a method of sound source tracking, the method comprising:
acquiring a target audio signal in a target scene, and acquiring an original scene image obtained by image acquisition of the target scene by an image acquisition device;
generating a target sound source distribution map of the target audio signal;
performing image fusion processing on the original scene image and the target sound source distribution map to obtain an original fusion image;
when the sound source position information corresponding to the target audio signal is not at the preset position of the original fusion image, determining sound source angle information corresponding to the target audio signal;
and determining an image acquisition angle of the image acquisition device on the target scene based on the sound source angle information, so that the sound source position information corresponding to the target audio signal is at the preset position of a target fusion image, wherein the target fusion image is obtained by carrying out image fusion on the target sound source distribution map and the target scene image acquired based on the image acquisition angle.
In another aspect, the present application further provides a sound source tracking apparatus, including:
the acquisition module is used for acquiring a target audio signal in a target scene and an original scene image obtained by image acquisition of the target scene by the image acquisition device;
a target sound source distribution diagram generating module, configured to generate a target sound source distribution diagram of the target audio signal;
the fusion processing module is used for carrying out image fusion processing on the original scene image and the target sound source distribution diagram to obtain an original fusion image;
the sound source angle information determining module is used for determining sound source angle information corresponding to the target audio signal when the sound source position information corresponding to the target audio signal is not at the preset position of the original fusion image;
the image acquisition angle determining module is used for determining an image acquisition angle of the image acquisition device on the target scene based on the sound source angle information, so that sound source position information corresponding to the target audio signal is at the preset position of a target fusion image, and the target fusion image is obtained by carrying out image fusion on the target sound source distribution map and a target scene image acquired based on the image acquisition angle.
On the other hand, the application also provides a sound source tracking system, which comprises an audio acquisition device, a sound source tracking device, a display device and an image acquisition device;
the audio acquisition device is in communication connection with the sound source tracking device and is used for acquiring a target audio signal in a target scene and sending the target audio signal to the sound source tracking device;
the sound source tracking device is respectively in communication connection with the display device and the image acquisition device, and is used for acquiring a target audio signal in a target scene and an original scene image obtained by image acquisition of the target scene by the image acquisition device; generating a target sound source distribution map of the target audio signal; performing image fusion processing on the original scene image and the target sound source distribution map to obtain an original fusion image; when the sound source position information corresponding to the target audio signal is not at the preset position of the original fusion image, determining sound source angle information corresponding to the target audio signal; and determining an image acquisition angle of the image acquisition device on the target scene based on the sound source angle information, so that the sound source position information corresponding to the target audio signal is at the preset position of a target fusion image, wherein the target fusion image is obtained by carrying out image fusion on the target sound source distribution map and the target scene image acquired based on the image acquisition angle.
The image acquisition device is used for acquiring scene images and sending the scene images to the sound source tracking device;
the display device is used for displaying various fusion images.
In another aspect, the present application further provides an intelligent recognition device comprising a processor and a memory, the memory storing at least one instruction or at least one program that is loaded and executed by the processor to implement a method of sound source tracking as described above.
In another aspect, the present application further provides a computer storage medium having at least one instruction or at least one program stored therein, where the at least one instruction or the at least one program is loaded and executed by a processor to implement a method for sound source tracking as described above.
Owing to the above technical scheme, the sound source tracking method of the present application has the following beneficial effects:
The method directly analyzes the information in the audio signal and observes the position of the sound source point in real time through a generated fusion image, realizing real-time tracking and positioning of the sound source while avoiding huge data calculations in the sound source positioning process. This reduces the hardware requirements of the sound source tracking system, lowers its manufacturing cost, and widens its range of application.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the following description will make a brief introduction to the drawings used in the description of the embodiments or the prior art. It is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
Fig. 1 is a schematic structural diagram of a sound source tracking system according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a method for tracking sound source according to an embodiment of the present application;
fig. 3 is a schematic flow chart of acquiring a target audio signal in a target scene in a method for tracking a sound source according to an embodiment of the present application;
fig. 4 is a schematic flow chart of determining sound source angle information corresponding to a target audio signal in a method for tracking a sound source according to an embodiment of the present application;
FIG. 5 is a flow chart of a method for sound source tracking according to an embodiment of the present application;
fig. 6 is a schematic flow chart before acquiring a target audio signal in a target scene and an original scene image acquired by an image acquisition device in a method for tracking a sound source according to an embodiment of the present application;
fig. 7 is a schematic flow chart of an original fusion image obtained in a method for tracking a sound source according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a sound source tracking device according to an embodiment of the present application;
fig. 9 is a hardware block diagram of a method for tracking a sound source according to an embodiment of the present application.
1 - audio acquisition device; 2 - sound source tracking device; 3 - display device; 4 - image acquisition device; 5 - power supply device; 6 - driving device.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present application based on the embodiments herein.
Reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic may be included in at least one implementation of the present application. In the description of the present application, it should be understood that the terms "upper," "lower," "left," "right," "top," "bottom," and the like indicate an orientation or a positional relationship based on that shown in the drawings, and are merely for convenience of description and simplification of the description, and do not indicate or imply that the apparatus or element in question must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present application. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may include one or more of the feature, either explicitly or implicitly. Moreover, the terms "first," "second," and the like, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein.
Referring to fig. 1, a sound source tracking and positioning system provided in an embodiment of the present application is described, where the system includes:
the audio acquisition device 1 is in communication connection with the sound source tracking device 2, and the audio acquisition device 1 is used for acquiring a target audio signal in a target scene and transmitting the target audio signal to the sound source tracking device 2;
the sound source tracking device 2 is in communication connection with the display device 3 and the image acquisition device 4 respectively, and is used for acquiring a target audio signal in a target scene and an original scene image obtained by image acquisition of the target scene by the image acquisition device 4; generating a target sound source distribution map of the target audio signal; performing image fusion processing on the original scene image and the target sound source distribution map to obtain an original fusion image; when the sound source position information corresponding to the target audio signal is not at the preset position of the original fusion image, determining sound source angle information corresponding to the target audio signal; and determining, based on the sound source angle information, an image acquisition angle of the image acquisition device 4 on the target scene, so that the sound source position information corresponding to the target audio signal lies at the preset position of a target fusion image, the target fusion image being obtained by image fusion of the target sound source distribution map with the target scene image acquired at that image acquisition angle.
The image acquisition device 4 is used for acquiring scene images and transmitting the scene images to the sound source tracking device 2;
the display device 3 is used for displaying various fused images.
In the embodiment of the present application, the audio capturing device 1 may be a microphone array, the image capturing device 4 may be a camera, and the sound source tracking device 2 may be a processor.
In the embodiment of the application, the audio acquisition device 1 is an annular array of 8 microphones. The 8 microphones may be digital micro-electromechanical system (MEMS) microphones, each integrating an acoustic sensor and an analog-to-digital converter so that it outputs a digital audio signal directly, facilitating subsequent signal processing. Each microphone is connected to the sound source tracking device 2 through an I2S interface, where i2s_right is the right channel of the interface, i2s_left is the left channel, and mic1-8 are the numbers of the 8 microphones. Real-time sound source positioning with this small 8-microphone array reduces the number of microphone channels and the amount of data to process, which in turn lowers the computing-power requirement of the sound source tracking device 2 and keeps the sound source tracking system lightweight.
In the embodiment of the present application, the image acquisition device 4 may be connected to the sound source tracking device 2 through a DVP interface.
In the embodiment of the application, the sound source tracking device 2 may be a K210 microcontroller, which carries an audio processor (APU) and a beam-forming calculation module used to detect the audio direction and preprocess the audio data of the effective direction. Adopting the K210 microprocessor with its APU further increases the operation speed of sound source tracking, improves its real-time performance, and enables a low-cost embedded sound source tracking system.
In the embodiment of the present application, the display device 3 may be a display screen, or may be an electromechanical device carrying the display screen, for example, a display screen of a computer, a display screen of a mobile phone, or an LED display screen dedicated to displaying images, etc. The display device 3 is connected to the sound source tracking device 2 through the SPI.
In another embodiment, the display device 3 may be connected to the sound source tracking device 2 via a wireless local area network.
In an embodiment of the present application, the sound source tracking system further comprises power supply means 5 for providing electrical energy for the operation of the sound source tracking system.
In the embodiment of the application, the sound source tracking system further comprises a driving device 6, the driving device 6 can be a stepping motor, and the driving device 6 is connected with the image acquisition device 4 to control the image acquisition device 4 to run so as to track the position of a sound source point; the operation of the image acquisition device 4 is realized through the driving device 6, and the real-time movement tracking function is realized.
Referring to fig. 2-7, a method for tracking a sound source according to an embodiment of the present application is described, where the method includes:
s1, acquiring a target audio signal in a target scene, and acquiring an original scene image obtained by image acquisition of the target scene by an image acquisition device; the target scene can be a 180-degree space facing the audio acquisition device or a 360-degree space facing the audio acquisition device; the target audio signal refers to a signal of a sound source to be detected, wherein the signal can comprise other audio signals with surrounding audio intensity of the sound source smaller than that of the sound source point; the original scene image can be an image obtained by the image acquisition device for acquiring the image of the target scene at a first angle; wherein the first angle is any angle.
In another embodiment of the present application, the audio acquisition device may be electrically connected to a driving device, which may be a driving motor; the driving device drives the audio acquisition device to operate so as to realize audio acquisition facing to a 360-degree target scene.
In the embodiment of the present application, S1 includes:
s101, acquiring a plurality of target audio sub-signals acquired in a target scene by an audio acquisition device;
s103, performing signal integration processing on the plurality of target audio sub-signals to obtain target audio signals.
In the embodiment of the application, the audio acquisition device comprises acquisition channels in multiple directions. The acquisition channels may be virtual channels through which the audio acquisition device outputs to the sound source tracking device, or physical acquisition channels in one-to-one correspondence, in number and direction, with the target audio sub-signals output to the sound source tracking device. Thus, S101 includes:
acquiring a plurality of target audio sub-signals acquired in a target scene by an audio acquisition device based on acquisition channels in a plurality of directions; the plurality of target audio sub-signals carry respective corresponding channel information.
In some embodiments, the target audio sub-signal includes an audio acquisition direction and an audio intensity. For example, the target scene may be a 180° space facing the audio acquisition device, uniformly divided into directions 1-12; the audio acquisition device acquires audio in the target space and outputs 12 target audio sub-signals, one per direction, to the sound source tracking device. Specifically, a rectangular coordinate system is established over the target scene with the y-axis defined as the 0° axis; adjacent acquisition directions are 30° apart and are numbered clockwise, with the positive x-axis direction defining clockwise, so the i-th direction makes an angle of i × 30° with the y-axis. Each target audio sub-signal thus carries angle information i × 30° and the audio intensity in that direction. Signal integration vector-combines the angle information and per-direction intensities of the several target audio sub-signals into a single audio direction and audio intensity, i.e., the target audio signal (the sound source point).
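The vector integration described above can be sketched as follows. This is a minimal illustration assuming the 12-direction, 30°-spacing layout of the embodiment; the function name and the list-of-intensities interface are not from the patent.

```python
import math

def integrate_sub_signals(intensities):
    """Vector-integrate 12 directional audio intensities into one
    source direction and intensity. Angles are measured clockwise
    from the y-axis (the 0-degree axis), direction i at i * 30 degrees.
    `intensities` is assumed to be 12 non-negative values."""
    x = y = 0.0
    for i, a in enumerate(intensities, start=1):
        theta = math.radians(i * 30)   # angle of direction i from +y
        x += a * math.sin(theta)       # clockwise-from-y decomposition
        y += a * math.cos(theta)
    intensity = math.hypot(x, y)
    angle = math.degrees(math.atan2(x, y)) % 360
    return angle, intensity
```

With energy in direction 3 only (90° from the y-axis), the resultant points at 90° with the same intensity.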
S2, generating a target sound source distribution map of the target audio signal.
In the embodiment of the application, the obtained target audio signal includes the audio direction and the audio intensity, so that the target audio signal can be subjected to signal mapping based on the audio direction and the audio intensity of the target audio to obtain the target sound source distribution diagram corresponding to the target audio signal.
In a specific embodiment, an image is constructed for the target audio signal based on its audio direction and audio intensity, yielding a corresponding black-and-white sound source distribution image; this is a grayscale image, and specifically the target audio image may be constructed on a 160 × 160 array image.
In another specific embodiment, pseudo-color mapping may be performed on the black-and-white sound source distribution image to obtain a corresponding color sound source distribution map, a color rainbow map. Generating a color target sound source distribution map improves the ability to discriminate details of the distribution, and thus the accuracy of noise detection, tracking, and positioning.
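Steps S2 and the pseudo-color mapping can be sketched roughly as below. The polar-to-pixel mapping, the Gaussian hot-spot, the display radius, and the toy rainbow mapping are all illustrative assumptions; the patent only specifies a 160 × 160 grayscale array followed by pseudo-color mapping.

```python
import numpy as np

def sound_source_map(angle_deg, intensity, size=160, sigma=12.0):
    """Render a grayscale size x size distribution image with a
    Gaussian hot-spot along the source direction (clockwise from +y),
    then pseudo-colour it into an RGB rainbow-style map."""
    cy = cx = size // 2
    r = size // 3                         # assumed display radius
    theta = np.radians(angle_deg)
    px = cx + r * np.sin(theta)           # hot-spot pixel position
    py = cy - r * np.cos(theta)
    yy, xx = np.mgrid[0:size, 0:size]
    gray = intensity * np.exp(-((xx - px) ** 2 + (yy - py) ** 2)
                              / (2 * sigma ** 2))
    gray = (255 * gray / max(float(gray.max()), 1e-12)).astype(np.uint8)
    # Toy rainbow: high intensity -> red, mid -> green, low -> blue.
    mid = 255 - np.abs(gray.astype(int) * 2 - 255).astype(np.uint8)
    rgb = np.stack([gray, mid, 255 - gray], axis=-1)
    return gray, rgb
```

A production system would likely use a standard colormap (e.g. a jet/rainbow LUT) rather than this hand-rolled mapping.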
S3, performing image fusion processing on the original scene image and the target sound source distribution map to obtain an original fusion image. Image fusion processing means fusing two or more images into one image, here fusing the original scene image and the target sound source distribution map into the original fusion image. Fusing the two displays the sound source point within the actual scene, making sound source tracking, positioning, and identification more convenient.
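A minimal sketch of step S3, assuming a simple weighted-average (alpha-blend) fusion of two same-sized RGB arrays; the patent does not specify the fusion rule, and the blending weight here is an assumption.

```python
import numpy as np

def fuse(scene_rgb, source_map_rgb, alpha=0.5):
    """Weighted-average image fusion: overlay the pseudo-coloured
    sound source distribution map on the scene image. Both inputs
    are assumed to be uint8 arrays of identical shape."""
    fused = (alpha * scene_rgb.astype(np.float32)
             + (1.0 - alpha) * source_map_rgb.astype(np.float32))
    return fused.clip(0, 255).astype(np.uint8)
```

With alpha = 0.5 a constant-100 scene blended with a constant-200 map gives a constant-150 result.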
S4, when the sound source position information corresponding to the target audio signal is not at the preset position of the original fusion image, determining sound source angle information corresponding to the target audio signal. The point with the highest audio intensity in the target sound source distribution map is the sound source position information corresponding to the target audio signal; specifically, it corresponds to the most deeply colored position in the map. The preset position may be the center of the original fusion image, or any position within a circle centered there with a preset radius R. The sound source angle information is the relative position information of the target audio signal in the target space.
In the embodiment of the present application, S4 includes:
s401, performing interference screening processing on a plurality of target audio sub-signals based on the target audio signals to obtain a plurality of effective audio sub-signals; each of the plurality of valid audio sub-signals carries valid angle information;
s403, carrying out angle information integration processing on the effective angle information carried by each of the plurality of effective audio sub-signals to obtain sound source angle information.
In some embodiments, the target audio signal comprises an audio intensity, and the location of greatest audio intensity may be determined as the sound source point location. Interference screening filters out the target audio sub-signals that contribute least to the sound source point position, for example the target audio sub-signal whose direction is farthest from the sound source point in the target scene. Integrating after interference screening brings the sound source angle information closer to the actual sound source position and improves tracking and positioning accuracy.
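Steps S401/S403 under this screening rule can be sketched as follows. This assumes the 12-direction, 30°-spacing layout of the embodiment and discards only the single sub-signal whose direction is farthest from the peak; the exact screening criterion in the patent may differ.

```python
import math

def screen_and_integrate(intensities):
    """Interference screening + angle integration: drop the sub-signal
    whose direction is farthest from the peak-intensity direction,
    then intensity-weight the remaining directions into one angle."""
    angles = [i * 30 for i in range(1, 13)]           # i-th direction at i*30 deg
    peak = angles[max(range(12), key=lambda i: intensities[i])]

    def sep(a):                                       # angular separation from peak
        d = abs(a - peak) % 360
        return min(d, 360 - d)

    worst = max(range(12), key=lambda i: sep(angles[i]))
    kept = [(angles[i], intensities[i]) for i in range(12) if i != worst]
    x = sum(w * math.sin(math.radians(a)) for a, w in kept)
    y = sum(w * math.cos(math.radians(a)) for a, w in kept)
    return math.degrees(math.atan2(x, y)) % 360
```

With energy only in direction 3 (90°), the farthest direction (270°) is screened out and the integrated angle remains 90°.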
In another embodiment, the sound source point positions can be determined from one or more position points with larger audio intensity, and interference screening performs segmentation between two adjacent sound source points. The image acquisition device may comprise several image acquisition sub-devices; after sound source point segmentation, the sound source angle information of each source can be calculated separately and the sub-devices controlled individually, realizing synchronous real-time tracking of multiple sound sources.
S5, determining an image acquisition angle of the image acquisition device on the target scene based on the sound source angle information, so that the sound source position information corresponding to the target audio signal lies at a preset position of a target fusion image, the target fusion image being obtained by image fusion of the target sound source distribution map with a target scene image acquired at that image acquisition angle. The image acquisition angle may be the deflection angle of the image acquisition device relative to the preset position after moving, or its deflection coordinates relative to the preset position after moving; the target scene image may be a new scene image acquired after the image acquisition device moves. Keeping the sound source position information at the preset position of the target fusion image makes the sound source point easier to observe and realizes tracking of the sound source.
In some embodiments, the target audio sub-signal includes an audio acquisition direction and an audio intensity. For example, the target scene may be a 180° space facing the audio acquisition device, uniformly divided into directions 1-12 from which the audio acquisition device acquires audio signals. A rectangular coordinate system is established over the target scene with the y-axis as the 0° axis; acquisition directions are 30° apart and numbered clockwise, with the positive x-axis defining clockwise, so the i-th direction makes an angle of i × 30° with the y-axis. The preset position may lie on the y-axis, in which case the angle value of the sound source angle information equals the image acquisition angle. Setting the image acquisition angle directly equal to the sound source angle further reduces and simplifies the data calculation flow, lowering the hardware computing-power requirement and the manufacturing cost of the sound source tracking system.
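In this embodiment the image acquisition angle is simply the sound source angle, so steering reduces to rotating the driving device (e.g. a stepper motor, device 6) by the angular difference. The sketch below makes that concrete; the 200-steps-per-revolution resolution and shortest-turn convention are illustrative assumptions, not values from the patent.

```python
def acquisition_angle_to_steps(source_angle_deg, current_angle_deg,
                               steps_per_rev=200):
    """Convert the required change in image acquisition angle into
    stepper-motor steps, taking the shortest rotation direction.
    Positive return value: clockwise; negative: counter-clockwise."""
    # Normalise the difference into (-180, 180] for the shortest turn.
    delta = (source_angle_deg - current_angle_deg + 180) % 360 - 180
    return round(delta * steps_per_rev / 360)
```

For example, turning from 0° to a 90° source with a 200-step motor needs 50 steps, and a source at 350° seen from 10° is reached by a short counter-clockwise turn.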
In an embodiment of the present application, the method for tracking a sound source further includes:
S6, determining updated sound source angle information corresponding to an updated audio signal; that is, in the present application the audio signals are acquired in real time so as to track the sound source point position, where the time interval between two adjacently acquired audio signals is a preset interval, for example 0.01 s.
In the embodiment of the application, the audio signals in the target scene are acquired in real time to obtain a plurality of updated audio sub-signals, which are integrated to obtain an updated audio signal; interference screening is performed on the plurality of updated audio sub-signals based on the updated audio signal to obtain a plurality of updated valid audio sub-signals, each of which carries updated valid angle information; the updated valid angle information carried by each of these sub-signals is then integrated to obtain the updated sound source angle information; by updating the audio signals in the target scene in real time, real-time tracking and positioning of the sound source point is achieved, expanding the application scenarios of the sound source tracking system.
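The interference screening and angle integration described above could be sketched as follows, under our own assumptions — the patent does not fix a concrete screening rule or integration formula; here we keep sub-signals whose intensity reaches half of the peak intensity and take an intensity-weighted circular mean of the valid angles:

```python
import math

def screen_and_integrate(sub_signals, rel_threshold=0.5):
    """Hedged sketch of interference screening + angle integration.
    Each element of sub_signals is an (angle_deg, intensity) pair.
    - screening: keep sub-signals whose intensity reaches rel_threshold
      times the strongest sub-signal's intensity (our assumption);
    - integration: intensity-weighted circular mean of the valid angles,
      returned in [0, 360)."""
    peak = max(intensity for _, intensity in sub_signals)
    valid = [s for s in sub_signals if s[1] >= rel_threshold * peak]
    sx = sum(i * math.cos(math.radians(a)) for a, i in valid)
    sy = sum(i * math.sin(math.radians(a)) for a, i in valid)
    return math.degrees(math.atan2(sy, sx)) % 360.0

# The weak 60-degree sub-signal is screened out as interference; the two
# equally strong sub-signals at 90 and 120 degrees average to 105 degrees.
angle = screen_and_integrate([(90, 10), (60, 1), (120, 10)])
```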
S7, determining an updated image acquisition angle of the image acquisition device based on the updated sound source angle information, so that the sound source position information corresponding to the updated audio signal lies at the preset position in the updated fusion image; by always keeping the updated sound source position information at the preset position, real-time tracking is realized and the accuracy with which the user locates the sound source point is improved.
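The S6/S7 real-time loop can be sketched as below; the `acquire_angle` and `steer_camera` callables are hypothetical stand-ins for the audio and image acquisition devices, and the 0.01 s default matches the example interval above:

```python
import time

def track_sound_source(acquire_angle, steer_camera, interval_s=0.01, steps=None):
    """Sketch of the S6/S7 real-time tracking loop (our own framing, not the
    patent's implementation). Every interval_s seconds the updated sound source
    angle is re-estimated and the camera is steered so the source stays at the
    preset position of the fused image; per the embodiment the image acquisition
    angle is simply the sound source angle value. steps=None runs indefinitely."""
    n = 0
    while steps is None or n < steps:
        angle = acquire_angle()   # S6: updated sound source angle information
        steer_camera(angle)       # S7: image acquisition angle := angle value
        time.sleep(interval_s)
        n += 1
```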
In the embodiment of the present application, S1 further includes:
001) Initializing and clearing the internal storage of the sound source tracking system, and returning the acquisition equipment of the sound source tracking system to its home position for debugging;
003) Acquiring a debugging audio signal in a debugging scene together with a debugging scene image acquired by the image acquisition device; the debugging scene image includes calibrated sound source point position information, which refers to the position of a known sound source in the debugging scene image;
005) Generating a debugging sound source distribution map of the debugging audio signal; the debugging sound source distribution map includes detected sound source point position information, which refers to the position with the largest audio intensity in the debugging sound source distribution map;
007) Performing image fusion on the debugging scene image and the debugging sound source distribution map to obtain a debugging fusion image, in which the position of the detected sound source point position information matches the position of the calibrated sound source point position information in the debugging scene image; through this debugging step before the positioning test, matching the detected sound source point position with the calibrated sound source point position allows the sound source point to be displayed in the actual scene, so that the corresponding sound source can be found accurately in subsequent detection.
009) Recording the image transformation relation between the debugging scene image and the debugging sound source distribution map during the fusion process; the transformation relation refers to the offset, rotation, scaling and similar operations applied to the debugging scene image and the debugging sound source distribution map in order to match the detected sound source point position with the calibrated sound source point position.
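One way to obtain such an offset/rotation/scale relation from the calibrated and detected sound source points is to fit a 2D similarity transform; the sketch below (our own construction, not the patent's procedure) solves it from two point correspondences using complex arithmetic, where `a` encodes scale and rotation and `b` the offset:

```python
def fit_similarity(src_pts, dst_pts):
    """Fit dst = a * src + b over the complex plane from two point pairs,
    e.g. detected sound source points in the distribution map (src) vs the
    calibrated points in the debugging scene image (dst). a = s * e^{i*theta}
    encodes scale s and rotation theta; b encodes the translation offset."""
    s0, s1 = complex(*src_pts[0]), complex(*src_pts[1])
    d0, d1 = complex(*dst_pts[0]), complex(*dst_pts[1])
    a = (d1 - d0) / (s1 - s0)
    b = d0 - a * s0
    return a, b

def apply_similarity(a, b, pt):
    """Map an (x, y) point through the fitted transform."""
    z = a * complex(*pt) + b
    return (z.real, z.imag)

# Fit from two correspondences: here a = 2j (scale 2, rotation 90 deg),
# b = 2 + 3j (offset). Any further point can then be mapped consistently.
a, b = fit_similarity([(0, 0), (1, 0)], [(2, 3), (2, 5)])
```

In practice more than two calibrated points would typically be used with a least-squares fit (e.g. OpenCV's `estimateAffinePartial2D`); the two-point version above only illustrates the recorded relation.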
Correspondingly, S3 includes:
S301, performing image transformation on the original scene image and the target sound source distribution map based on the image transformation relation to obtain a transformed scene image and a transformed sound source distribution map; the image transformation relation combines the detected sound source position with the actual scene and improves the recognition accuracy of sound source detection.
S303, superposing the transformed scene image on the transformed sound source distribution map to obtain a superposed image;
s305, cutting the superimposed image to obtain an original fusion image.
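A minimal sketch of the superposition (S303) and cutting (S305) steps, assuming grayscale images stored as 2D lists and a simple alpha blend — the blend weight is our own assumption, since the patent only specifies superposing and cutting:

```python
def fuse(scene, heatmap, alpha=0.5):
    """Overlay the transformed scene image on the transformed sound source
    distribution map (S303) by alpha blending, then cut to the common region
    (S305). scene and heatmap are 2D lists of gray values; cutting here simply
    keeps the overlapping area of the two images."""
    h = min(len(scene), len(heatmap))
    w = min(len(scene[0]), len(heatmap[0]))
    # S303: superposition over the overlapping area
    fused = [[alpha * scene[r][c] + (1 - alpha) * heatmap[r][c]
              for c in range(w)] for r in range(h)]
    # S305: restricting to (h, w) already discards the non-overlapping margin
    return fused
```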
In some embodiments, the size of the scene image acquired by the image acquisition device is larger than that of the sound source distribution map acquired by the audio acquisition device, and the scene image needs to be adapted to the sound source distribution map; in other embodiments, the scene image is smaller than the sound source distribution map, and the sound source distribution map needs to be adapted to the scene image.
In the embodiment of the present application, the image transformation relation is likewise applied to the image fusion performed during real-time tracking, which is not repeated here.
Referring to fig. 8, a sound source tracking device according to an embodiment of the present application is described below; the device includes:
the acquisition module 101 is configured to acquire a target audio signal in a target scene, and an original scene image obtained by image acquisition of the target scene by the image acquisition device.
A target sound source distribution map generation module 201 for generating a target sound source distribution map of a target audio signal;
the fusion processing module 301 is configured to perform image fusion processing on the original scene image and the target sound source distribution map, so as to obtain an original fusion image.
The sound source angle information determining module 401 is configured to determine sound source angle information corresponding to the target audio signal when the sound source position information corresponding to the target audio signal is not at the preset position of the original fusion image.
The image acquisition angle determining module 501 is configured to determine an image acquisition angle of the image acquisition device for the target scene based on the sound source angle information, so that the sound source position information corresponding to the target audio signal is at a preset position of a target fusion image, where the target fusion image is obtained by performing image fusion on a target sound source distribution map and a target scene image acquired based on the image acquisition angle.
In some embodiments, the acquisition module comprises:
the sub-signal acquisition unit is used for acquiring a plurality of target audio sub-signals acquired in a target scene by the audio acquisition device.
And the signal integration unit is used for carrying out signal integration processing on the plurality of target audio sub-signals to obtain target audio signals.
In some embodiments, the sound source angle information determination module includes:
the interference screening unit is used for carrying out interference screening processing on the plurality of target audio sub-signals based on the target audio signals to obtain a plurality of effective audio sub-signals; each of the plurality of valid audio sub-signals carries valid angle information;
and the angle information integration unit is used for performing angle information integration processing on the effective angle information carried by each of the plurality of effective audio sub-signals to obtain sound source angle information.
In some embodiments, the sub-signal acquisition unit comprises:
a sub-signal acquisition sub-unit for acquiring a plurality of target audio sub-signals acquired in a target scene by the audio acquisition device based on acquisition channels in a plurality of directions; the plurality of target audio sub-signals carry respective corresponding channel information.
In some embodiments, the sound source tracking device further comprises:
the updating determining module is used for determining updated sound source angle information corresponding to the updated audio signal;
and the updated image acquisition angle module is used for determining an updated image acquisition angle of the image acquisition device based on the updated sound source angle information so as to enable the sound source position information corresponding to the updated audio signal to be in a preset position in the updated fusion image.
In some embodiments, the sound source tracking device further comprises:
the debugging audio signal acquisition module is used for acquiring the debugging audio signal in the debugging scene and the debugging scene image acquired by the image acquisition device; the debugging scene image comprises calibration sound source point position information;
the debugging sound source distribution diagram generation module is used for generating a debugging sound source distribution diagram of the debugging audio signal; the debugging sound source distribution diagram comprises position information of a detection sound source point;
the debugging image fusion processing module is used for carrying out image fusion processing on the debugging scene image and the debugging sound source distribution map to obtain a debugging fusion image; the position of the detected sound source point position information in the debugging fusion image is matched with the position of the calibrated sound source point position information in the debugging scene image;
the transformation relation recording module is used for recording the image transformation relation between the debugging scene image and the debugging sound source distribution diagram in the fusion processing process.
In some embodiments, the fusion processing module includes:
the transformation unit is used for carrying out image transformation on the original scene image and the target sound source distribution diagram based on the image transformation relation to obtain a transformed scene image and a transformed sound source distribution diagram;
the superposition unit is used for superposing the transformed scene image on the transformed sound source distribution map to obtain a superposed image;
and the cutting unit is used for cutting the superimposed image to obtain an original fusion image.
The embodiment of the application also provides an intelligent identification device comprising a processor and a memory, wherein at least one instruction or at least one program is stored in the memory and is loaded and executed by the processor to implement the above method of sound source tracking.
The memory may be used to store software programs and modules; the processor performs various functional applications and data processing by executing the software programs and modules stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, application programs required for functions, and the like, and the data storage area may store data created according to the use of the device. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one hard disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory may also include a memory controller to provide the processor with access to the memory.
The method embodiments provided in the embodiments of the present application may be performed in an electronic device such as a mobile terminal, a computer terminal, a server, or a similar computing device. Fig. 9 is a block diagram of the hardware structure of an electronic device for the method according to an embodiment of the present application. As shown in fig. 9, the electronic device 900 may vary considerably in configuration or performance, and may include one or more central processing units (Central Processing Unit, CPU) 910 (the processor 910 may include, but is not limited to, a microprocessor MCU, a programmable logic device FPGA, or a similar processing device), a memory 930 for storing data, and one or more storage media 920 (e.g., one or more mass storage devices) for storing applications 923 or data 922. The memory 930 and the storage medium 920 may be transitory or persistent storage. The program stored on the storage medium 920 may include one or more modules, each of which may include a series of instruction operations for the electronic device. Further, the central processor 910 may be configured to communicate with the storage medium 920 and execute on the electronic device 900 the series of instruction operations in the storage medium 920. The electronic device 900 may also include one or more power supplies 960, one or more wired or wireless network interfaces 950, one or more input/output interfaces 940, and/or one or more operating systems 921, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The input/output interface 940 may be used to receive or transmit data via a network. A specific example of such a network is a wireless network provided by the communications provider of the electronic device 900. In one example, the input/output interface 940 includes a network adapter (Network Interface Controller, NIC) that may be connected to other network devices through a base station so as to communicate with the internet. In another example, the input/output interface 940 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
It will be appreciated by those skilled in the art that the configuration shown in fig. 9 is merely illustrative and is not intended to limit the configuration of the electronic device. For example, electronic device 900 may also include more or fewer components than shown in FIG. 9, or have a different configuration than shown in FIG. 9.
Embodiments of the present application also provide a storage medium having at least one instruction or at least one program stored therein, the at least one instruction or the at least one program loaded and executed by a processor to implement a method of sound source tracking as described above.
The foregoing description has fully disclosed the embodiments of this application. It should be noted that any modifications to the specific embodiments of the present application may be made by those skilled in the art without departing from the scope of the claims of the present application. Accordingly, the scope of the claims of the present application is not limited to the foregoing detailed description.

Claims (8)

1. A method of sound source tracking, comprising:
acquiring a target audio signal in a target scene and acquiring an original scene image obtained by image acquisition of the target scene by an image acquisition device;
generating a target sound source distribution map of the target audio signal;
performing image fusion processing on the original scene image and the target sound source distribution map to obtain an original fusion image;
when the sound source position information corresponding to the target audio signal is not at the preset position of the original fusion image, determining sound source angle information corresponding to the target audio signal;
determining an image acquisition angle of the image acquisition device on the target scene based on the sound source angle information, so that sound source position information corresponding to the target audio signal is at the preset position of a target fusion image, wherein the target fusion image is obtained by carrying out image fusion on the target sound source distribution map and a target scene image acquired based on the image acquisition angle; the angle value corresponding to the sound source angle information is equal to the image acquisition angle;
before the target audio signal in the target scene and the original scene image acquired by the image acquisition device are acquired, the method further comprises:
acquiring a debugging audio signal in a debugging scene and a debugging scene image acquired by an image acquisition device; the debugging scene image comprises calibration sound source point position information;
generating a debugging sound source distribution diagram of the debugging audio signal; the debugging sound source distribution diagram comprises detection sound source point position information;
performing image fusion processing on the debugging scene image and the debugging sound source distribution map to obtain a debugging fusion image; the position of the detected sound source point position information in the debugging fusion image is matched with the position of the calibrated sound source point position information in the debugging scene image;
recording an image transformation relation between the debugging scene image and the debugging sound source distribution diagram in the fusion processing process;
the image fusion processing is carried out on the original scene image and the target sound source distribution diagram, and the obtaining of the original fusion image comprises the following steps:
performing image transformation on the original scene image and the target sound source distribution diagram based on the image transformation relation to obtain a transformed scene image and a transformed sound source distribution diagram;
superposing the transformed scene image on the transformed sound source distribution map to obtain a superposed image;
and cutting the superimposed image to obtain the original fusion image.
2. The method of claim 1, wherein said acquiring a target audio signal in a target scene comprises:
acquiring a plurality of target audio sub-signals acquired in a target scene by an audio acquisition device;
and carrying out signal integration processing on the plurality of target audio sub-signals to obtain the target audio signals.
3. The method of claim 2, wherein determining sound source angle information corresponding to the target audio signal comprises:
performing interference screening processing on the plurality of target audio sub-signals based on the target audio signals to obtain a plurality of effective audio sub-signals; each of the plurality of valid audio sub-signals carries valid angle information;
and carrying out angle information integration processing on the effective angle information carried by each of the plurality of effective audio sub-signals to obtain the sound source angle information.
4. A method of sound source tracking according to claim 2, wherein the audio acquisition means comprises a plurality of directional acquisition channels;
the acquiring a plurality of target audio sub-signals acquired by the audio acquisition device in a target scene comprises:
acquiring a plurality of target audio sub-signals acquired in a target scene by the audio acquisition device based on the acquisition channels in a plurality of directions; the plurality of target audio sub-signals carry respective corresponding channel information.
5. A method of sound source tracking according to claim 1, wherein the method further comprises:
determining updated sound source angle information corresponding to the updated audio signal;
and determining an updated image acquisition angle of the image acquisition device based on the updated sound source angle information, so that the sound source position information corresponding to the updated audio signal is at a preset position in the updated fusion image.
6. A sound source tracking device, characterized in that the sound source tracking device comprises:
the acquisition module is used for acquiring a target audio signal in a target scene and an original scene image obtained by image acquisition of the target scene by the image acquisition device;
a target sound source distribution diagram generating module, configured to generate a target sound source distribution diagram of the target audio signal;
the fusion processing module is used for carrying out image fusion processing on the original scene image and the target sound source distribution diagram to obtain an original fusion image;
the sound source angle information determining module is used for determining sound source angle information corresponding to the target audio signal when the sound source position information corresponding to the target audio signal is not at the preset position of the original fusion image;
the image acquisition angle determining module is used for determining an image acquisition angle of the image acquisition device on the target scene based on the sound source angle information, so that sound source position information corresponding to the target audio signal is at the preset position of a target fusion image, and the target fusion image is obtained by carrying out image fusion on the target sound source distribution map and a target scene image acquired based on the image acquisition angle; the angle value corresponding to the sound source angle information is equal to the image acquisition angle;
the debugging audio signal acquisition module is used for acquiring the debugging audio signal in the debugging scene and the debugging scene image acquired by the image acquisition device; the debugging scene image comprises calibration sound source point position information;
the debugging sound source distribution diagram generation module is used for generating a debugging sound source distribution diagram of the debugging audio signal; the debugging sound source distribution diagram comprises position information of a detection sound source point;
the debugging image fusion processing module is used for carrying out image fusion processing on the debugging scene image and the debugging sound source distribution map to obtain a debugging fusion image; the position of the detected sound source point position information in the debugging fusion image is matched with the position of the calibrated sound source point position information in the debugging scene image;
the transformation relation recording module is used for recording the image transformation relation between the debugging scene image and the debugging sound source distribution diagram in the fusion processing process;
the fusion processing module comprises:
the transformation unit is used for carrying out image transformation on the original scene image and the target sound source distribution diagram based on the image transformation relation to obtain a transformed scene image and a transformed sound source distribution diagram;
the superposition unit is used for superposing the transformed scene image on the transformed sound source distribution map to obtain a superposed image;
and the cutting unit is used for cutting the superimposed image to obtain an original fusion image.
7. The sound source tracking system is characterized by comprising an audio acquisition device, a sound source tracking device, a display device and an image acquisition device;
the audio acquisition device is in communication connection with the sound source tracking device and is used for acquiring target audio signals in a target scene and sending the target audio signals to the sound source tracking device;
the sound source tracking device is respectively connected with the display device and the image acquisition device in a communication way, and is used for executing the sound source tracking method according to any one of claims 1-5;
the image acquisition device is used for acquiring a scene image and transmitting the scene image to the sound source tracking device;
the display device is used for displaying various fusion images.
8. A computer storage medium having stored therein at least one instruction, at least one program, code set, or instruction set, loaded and executed by a processor to implement the method of sound source tracking of any of claims 1-5.
CN202210348317.9A 2022-04-01 2022-04-01 Method, system, device and computer storage medium for sound source tracking Active CN114926378B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210348317.9A CN114926378B (en) 2022-04-01 2022-04-01 Method, system, device and computer storage medium for sound source tracking


Publications (2)

Publication Number Publication Date
CN114926378A CN114926378A (en) 2022-08-19
CN114926378B true CN114926378B (en) 2023-04-25

Family

ID=82805170


Country Status (1)

Country Link
CN (1) CN114926378B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115225884A (en) * 2022-08-30 2022-10-21 四川中绳矩阵技术发展有限公司 Interactive reproduction method, system, device and medium for image and sound
CN115331082B (en) * 2022-10-13 2023-02-03 天津大学 Path generation method of tracking sound source, training method of model and electronic equipment
CN115875008B (en) * 2023-01-06 2023-06-13 四川省川建勘察设计院有限公司 Intelligent drilling data acquisition method, system and storage medium of geological drilling machine

Citations (1)

Publication number Priority date Publication date Assignee Title
CN113176538A (en) * 2021-04-16 2021-07-27 杭州爱华仪器有限公司 Sound source imaging method based on microphone array

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
JP2012119872A (en) * 2010-11-30 2012-06-21 Sanyo Electric Co Ltd Imaging apparatus
CN102980647B (en) * 2012-11-26 2014-12-17 北京神州普惠科技股份有限公司 Recognition and location test method for noise sources
CN110661988B (en) * 2019-08-14 2022-01-11 四川天源宏创科技有限公司 Sound and image mixed array processing system
CN113631942B (en) * 2020-02-24 2024-04-16 京东方科技集团股份有限公司 Sound source tracking control method, control device and sound source tracking system
CN112543295A (en) * 2020-11-23 2021-03-23 安徽江淮汽车集团股份有限公司 Vehicle-mounted video call method, system and equipment based on sound source positioning



Similar Documents

Publication Publication Date Title
CN114926378B (en) Method, system, device and computer storage medium for sound source tracking
CN110457414B (en) Offline map processing and virtual object display method, device, medium and equipment
CN109211298A (en) A kind of transducer calibration method and device
JP2020046427A (en) Calibration method and device for multi-sensor, computer equipment, medium, and vehicle
US10841570B2 (en) Calibration device and method of operating the same
CN110473262A (en) Outer ginseng scaling method, device, storage medium and the electronic equipment of more mesh cameras
CN112261669A (en) Network beam orientation control method and device, readable medium and electronic equipment
CN110501036A (en) The calibration inspection method and device of sensor parameters
EP3343242B1 (en) Tracking system, tracking device and tracking method
CN109712188A (en) A kind of method for tracking target and device
CN114469057A (en) Wireless capsule positioning device, magnetic field sensor positioning method and device
CN110096152B (en) Method, device, equipment and storage medium for positioning body part in space
CN112447058B (en) Parking method, parking device, computer equipment and storage medium
CN112113665A (en) Temperature measuring method, device, storage medium and terminal
CN112985867B (en) Steering engine testing method, device, equipment and storage medium
CN115272560B (en) Substation equipment hidden danger positioning method and system based on three-dimensional sound field cloud picture
CN113343457B (en) Automatic driving simulation test method, device, equipment and storage medium
CN112154480B (en) Positioning method and device for movable platform, movable platform and storage medium
CN108564626B (en) Method and apparatus for determining relative pose angle between cameras mounted to an acquisition entity
EP4332632A1 (en) Three-dimensional ultrasonic imaging method and system based on laser radar
CN113362370B (en) Method, device, medium and terminal for determining motion information of target object
CN216116064U (en) Pose calibration system of heading machine
CN112880675B (en) Pose smoothing method and device for visual positioning, terminal and mobile robot
CN111457884B (en) Method and system for testing horizontal field angle of vehicle-mounted stereoscopic vision sensor
CN113873637A (en) Positioning method, positioning device, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant