CN114926378A - Method, system, device and computer storage medium for sound source tracking - Google Patents

Info

Publication number
CN114926378A
CN114926378A (application CN202210348317.9A; granted as CN114926378B)
Authority
CN
China
Prior art keywords
sound source
image
target
scene
audio signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210348317.9A
Other languages
Chinese (zh)
Other versions
CN114926378B (en)
Inventor
丁华
邵佳璐
姚军
吕国庆
冯媛雨
Current Assignee
Zhejiang Xitumeng Digital Technology Co ltd
Original Assignee
Zhejiang Xitumeng Digital Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Xitumeng Digital Technology Co ltd filed Critical Zhejiang Xitumeng Digital Technology Co ltd
Priority to CN202210348317.9A
Publication of CN114926378A
Application granted
Publication of CN114926378B
Active legal status
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01H: MEASUREMENT OF MECHANICAL VIBRATIONS OR ULTRASONIC, SONIC OR INFRASONIC WAVES
    • G01H17/00: Measuring mechanical vibrations or ultrasonic, sonic or infrasonic waves, not provided for in the preceding groups
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20212: Image combination
    • G06T2207/20221: Image fusion; Image merging

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present application relates to the field of sound recognition, and more particularly to a method, system, apparatus, and computer storage medium for sound source tracking. The method comprises the following steps: acquiring a target audio signal and an original scene image in a target scene; generating a target sound source distribution map of the target audio signal; performing image fusion processing on the original scene image and the target sound source distribution map to obtain an original fusion image; determining sound source angle information corresponding to the target audio signal; and determining, based on the sound source angle information, the angle at which the image acquisition device captures the target scene, so that the sound source position information corresponding to the target audio signal lies at a preset position of the target fusion image. Because the information in the audio signal is analyzed directly and the position of the sound source point is observed in real time through the generated fusion image, the hardware requirements of the sound source tracking system are reduced, its manufacturing cost is lowered, and its range of application is widened.

Description

Method, system, device and computer storage medium for sound source tracking
Technical Field
The present application relates to the field of sound recognition, and more particularly, to a method, system, apparatus, and computer storage medium for sound source tracking.
Background
Acoustic positioning systems are widely used in fields such as heavy industry and aerospace, commonly for detecting abnormal equipment sounds or environmental noise. For example, an acoustic positioning system can measure the noise distribution of a train or high-speed rail in operation so that a noise shielding mechanism can be provided for it; in the automotive industry, noise localization can guide improvements to vehicle structural design and thereby improve user comfort.
The current mainstream sound source imaging technology is a low-cost system that measures and displays the sound source non-real-time; because such a system cannot detect the sound source in real time, its applications are limited. Existing systems that measure and display the sound source in real time require a large number of audio acquisition devices and must perform calculations on large volumes of audio data, resulting in a huge computational load; this in turn raises the processor requirements of the system and increases its overall equipment cost, limiting the range of implementation and application. A method of sound source tracking is therefore proposed to solve the above problems.
Disclosure of Invention
In view of the above problems in the prior art, an object of the present application is to provide a method of sound source tracking in which the information in the audio signal is analyzed directly and a fusion image is generated so that the position of the sound source can be observed in real time. This not only realizes real-time tracking and positioning of the sound source but also avoids the huge data calculation load of conventional sound source positioning, thereby reducing the hardware requirements of the sound source tracking system, lowering its manufacturing cost, and widening its range of application.
In order to solve the above problem, the present application provides a method of sound source tracking, the method comprising:
acquiring a target audio signal in a target scene and an original scene image obtained by image acquisition of the target scene by an image acquisition device;
generating a target sound source distribution map of the target audio signal;
performing image fusion processing on the original scene image and the target sound source distribution map to obtain an original fusion image;
when the sound source position information corresponding to the target audio signal is not at the preset position of the original fusion image, determining sound source angle information corresponding to the target audio signal;
and determining, based on the sound source angle information, the image acquisition angle at which the image acquisition device captures the target scene, so that the sound source position information corresponding to the target audio signal is at the preset position of a target fusion image, the target fusion image being obtained by image fusion of the target sound source distribution map with a target scene image captured at the image acquisition angle.
In another aspect, the present application also provides an apparatus for tracking a sound source, the apparatus including:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a target audio signal in a target scene and an original scene image obtained by image acquisition of the target scene by an image acquisition device;
a target sound source distribution map generation module for generating a target sound source distribution map of the target audio signal;
the fusion processing module is used for carrying out image fusion processing on the original scene image and the target sound source distribution map to obtain an original fusion image;
the sound source angle information determining module is used for determining sound source angle information corresponding to the target audio signal when the sound source position information corresponding to the target audio signal is not at the preset position of the original fusion image;
and an image acquisition angle determining module for determining, based on the sound source angle information, the image acquisition angle at which the image acquisition device captures the target scene, so that the sound source position information corresponding to the target audio signal is at the preset position of a target fusion image, the target fusion image being obtained by image fusion of the target sound source distribution map with a target scene image captured at the image acquisition angle.
On the other hand, the application also provides a sound source tracking system, wherein the sound source tracking system comprises an audio acquisition device, a sound source tracking device, a display device and an image acquisition device;
the audio acquisition device is in communication connection with the sound source tracking device and is used for collecting a target audio signal in a target scene and sending the target audio signal to the sound source tracking device;
the sound source tracking device is in communication connection with the display device and the image acquisition device respectively, and is used for acquiring a target audio signal in a target scene and an original scene image obtained by image acquisition of the target scene by the image acquisition device; generating a target sound source distribution map of the target audio signal; performing image fusion processing on the original scene image and the target sound source distribution map to obtain an original fusion image; determining, when the sound source position information corresponding to the target audio signal is not at the preset position of the original fusion image, sound source angle information corresponding to the target audio signal; and determining, based on the sound source angle information, the image acquisition angle at which the image acquisition device captures the target scene, so that the sound source position information corresponding to the target audio signal is at the preset position of a target fusion image, the target fusion image being obtained by image fusion of the target sound source distribution map with a target scene image captured at the image acquisition angle.
The image acquisition device is used for acquiring a scene image and sending the scene image to the sound source tracking device;
the display device is used for displaying various fused images.
In another aspect, the present application further provides an intelligent identification device, which includes a processor and a memory, where at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the method for tracking a sound source as described above.
In another aspect, the present application further provides a computer storage medium, in which at least one instruction or at least one program is stored, and the at least one instruction or the at least one program is loaded and executed by a processor to implement the sound source tracking method as described above.
Due to the technical scheme, the sound source tracking method has the following beneficial effects:
according to the method for tracking the sound source, the information in the audio signal is directly analyzed, and the position of the sound source is observed in real time by generating the fusion image, so that the real-time tracking and positioning of the sound source are realized, huge data calculation amount in the sound source positioning process is avoided, the hardware requirement of the sound source tracking system is further reduced, the manufacturing cost of the sound source tracking system is reduced, and the implementation application range is enlarged.
Drawings
In order to more clearly illustrate the technical solution of the present application, the drawings used in the description of the embodiment or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a schematic structural diagram of a sound source tracking system provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart of a method for tracking a sound source according to an embodiment of the present application;
fig. 3 is a schematic flowchart of acquiring a target audio signal in a target scene in a sound source tracking method according to an embodiment of the present application;
fig. 4 is a schematic flowchart illustrating a process of determining sound source angle information corresponding to a target audio signal in a sound source tracking method according to an embodiment of the present application;
FIG. 5 is a schematic flow chart diagram illustrating a method for sound source tracking according to an embodiment of the present application;
fig. 6 is a schematic flowchart of a method for sound source tracking according to an embodiment of the present application before a target audio signal in a target scene and an original scene image acquired by an image acquisition device are acquired;
FIG. 7 is a schematic flow chart of obtaining an original fusion image in a sound source tracking method according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a sound source tracking apparatus according to an embodiment of the present application;
fig. 9 is a block diagram of a hardware structure of a device for performing a sound source tracking method according to an embodiment of the present application.
1-audio acquisition device, 2-sound source tracking device, 3-display device, 4-image acquisition device, 5-power supply device and 6-driving device.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making creative efforts shall fall within the protection scope of the present application.
Reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic may be included in at least one implementation of the present application. In the description of the present application, terms such as "upper", "lower", "left", "right", "top", and "bottom" indicate orientations or positional relationships based on those shown in the drawings; they are used only for convenience in describing the present application and simplifying the description, and do not indicate or imply that the referred devices or elements must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present application. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated; a feature defined as "first" or "second" may therefore explicitly or implicitly include one or more of that feature. Moreover, such terms are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order; the data so used are interchangeable under appropriate circumstances, so that the embodiments of the application described herein can operate in sequences other than those illustrated or described herein.
Referring to fig. 1, a sound source tracking and positioning system provided by an embodiment of the present application is described, the system including:
the audio acquisition device 1 is in communication connection with the sound source tracking device 2, and the audio acquisition device 1 is used for acquiring a target audio signal in a target scene and sending the target audio signal to the sound source tracking device 2;
the sound source tracking device 2 is respectively in communication connection with the display device 3 and the image acquisition device 4, and the sound source tracking device 2 is used for acquiring a target audio signal in a target scene and acquiring an original scene image obtained by image acquisition of the target scene by the image acquisition device 4; generating a target sound source distribution map of the target audio signal; carrying out image fusion processing on the original scene image and the target sound source distribution diagram to obtain an original fusion image; when the sound source position information corresponding to the target audio signal is not at the preset position of the original fusion image, determining sound source angle information corresponding to the target audio signal; and determining the image acquisition angle of the image acquisition device 4 to the target scene based on the sound source angle information, so that the sound source position information corresponding to the target audio signal is at the preset position of the target fusion image, and the target fusion image is obtained by carrying out image fusion on the target sound source distribution diagram and the target scene image acquired based on the image acquisition angle.
The image acquisition device 4 is used for acquiring a scene image and sending the scene image to the sound source tracking device 2;
the display device 3 is used to display various fusion images.
In the embodiment of the present application, the audio collecting device 1 may be a microphone array, the image collecting device 4 may be a camera, and the sound source tracking device 2 may be a processor.
In the embodiment of the application, the audio acquisition device 1 is an annular array formed by 8 microphones. The 8 microphones may all be digital Micro Electro Mechanical System (MEMS) microphones, each integrating a sound sensor and an analog-to-digital converter and directly outputting a digital audio signal, which facilitates subsequent signal processing. The microphones are connected to the sound source tracking device 2 through an I2S interface, where I2S_right is the right channel of the interface, I2S_left is the left channel, and mic1-8 denote the numbers of the 8 microphones. With this small array of 8 microphones the sound source can be positioned in real time; the number of microphone channels and the amount of data to be processed are reduced, which lowers the computing power required of the sound source tracking device 2 and makes the sound source tracking system lightweight.
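As an illustrative sketch (not part of the patent), the geometry of such an annular 8-microphone array and the far-field arrival delays it would observe can be modeled as follows; the 0.045 m ring radius and the speed of sound are assumed values:

```python
import math

def ring_mic_positions(n_mics=8, radius=0.045):
    """Positions (metres) of n_mics microphones evenly spaced on a ring.

    The 0.045 m radius is an illustrative value, not taken from the patent.
    """
    return [(radius * math.cos(2 * math.pi * k / n_mics),
             radius * math.sin(2 * math.pi * k / n_mics))
            for k in range(n_mics)]

def far_field_delays(positions, source_angle_deg, c=343.0):
    """Expected arrival-time delay (s) at each mic for a far-field source.

    A plane wave from direction theta reaches mics nearer the source first:
    delay_i = -(p_i . u) / c, where u is the unit vector toward the source.
    """
    theta = math.radians(source_angle_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    return [-(px * ux + py * uy) / c for px, py in positions]
```

Delay differences of this kind are what a beamforming module (such as the one described for the sound source tracking device below) exploits to estimate the audio direction.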
In the embodiment of the present application, the image capturing apparatus 4 may be connected to the sound source tracking apparatus 2 through a DVP interface.
In the embodiment of the present application, the sound source tracking device 2 may be a K210 microcontroller with a built-in Audio Processor (APU) and a beamforming computation module, which implement audio direction detection and preprocessing of audio data from effective directions. Adopting the K210 microprocessor with its APU speeds up computation during sound source tracking, improves the real-time performance of tracking, and enables the development of a low-cost embedded sound source tracking system.
In the embodiment of the present application, the display device 3 may be a display screen, or may be an electromechanical device carrying the display screen, for example, a display screen of a computer, a display screen of a mobile phone, or an LED display screen dedicated to displaying images. The display device 3 is connected to the sound source tracking device 2 through the SPI.
In another embodiment, the display device 3 may be connected to the sound source tracking device 2 through a wireless local area network.
In the embodiment of the present application, the sound source tracking system further comprises a power supply device 5 for supplying electrical energy for the operation of the sound source tracking system.
In the embodiment of the present application, the sound source tracking system further includes a driving device 6, where the driving device 6 may be a stepping motor, and the driving device 6 is connected to the image acquisition device 4 to control the image acquisition device 4 to operate, so as to track the position of the sound source point; the operation of the image acquisition device 4 is realized through the driving device 6, and the real-time movement tracking function is realized.
With reference to fig. 2 to 7, a method for tracking a sound source provided in an embodiment of the present application is described, where the method includes:
s1, acquiring a target audio signal in a target scene, and acquiring an original scene image obtained by image acquisition of the target scene by an image acquisition device; the target scene can be a 180-degree space facing the audio acquisition device, and can also be a 360-degree space facing the audio acquisition device; the target audio signal refers to a signal of a sound source to be detected, wherein the target audio signal may include other audio signals of which the audio intensity around the sound source is smaller than that of the sound source point; the original scene image can be an image obtained by image acquisition of a target scene by an image acquisition device at a first angle; wherein the first angle is any angle.
In another embodiment of the present application, the audio acquisition device may be electrically connected to a driving device, and the driving device may be a driving motor; the driving device drives the audio acquisition device to operate so as to realize the audio acquisition facing a 360-degree target scene.
In the embodiment of the present application, S1 includes:
s101, acquiring a plurality of target audio sub-signals acquired by an audio acquisition device in a target scene;
s103, performing signal integration processing on the plurality of target audio sub-signals to obtain target audio signals.
In the embodiment of the application, the audio acquisition device comprises acquisition channels in multiple directions; the plurality of acquisition channels may be virtual channels output by the audio acquisition device to the sound source tracking device; or entity acquisition channels corresponding to the number and direction of target audio sub-signals output to the sound source tracking device one by one; therefore, S101 includes:
acquiring a plurality of target audio sub-signals acquired by an audio acquisition device in a target scene based on acquisition channels in a plurality of directions; the plurality of target audio sub-signals carry respective corresponding channel information.
In some embodiments, the target audio sub-signal comprises an audio acquisition direction and an audio intensity. For example, the target scene may be a 180° space facing the audio acquisition device, uniformly divided into directions 1-12; the audio acquisition device acquires audio information in the target space and outputs 12 target audio sub-signals, one per direction, to the sound source tracking device. Specifically, a rectangular coordinate system is established in the target scene: the y axis is defined as 0°, adjacent audio acquisition directions are spaced 30° apart, and direction numbers increase clockwise toward the positive x axis, so that the i-th direction makes an angle of i × 30° with the y axis. Each target audio sub-signal thus carries the angle information i × 30° and the audio intensity in that direction. The signal integration processing performs a vector integration calculation on the angle information and audio intensities of the target audio sub-signals to obtain a single audio direction and audio intensity, namely the target audio signal (the sound source point).
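The vector integration described above can be sketched as follows; this is an illustrative reading of the patent's description, with direction i placed at i × 30° clockwise from the y axis:

```python
import math

def integrate_subsignals(subsignals):
    """Fuse per-direction (index, intensity) readings into one source estimate.

    Each reading is treated as a vector whose length is its audio intensity
    and whose angle is i * 30 degrees clockwise from the y axis. The vector
    sum yields a single audio direction and intensity (the sound source point).
    """
    vx = vy = 0.0
    for i, intensity in subsignals:
        theta = math.radians(i * 30)       # angle from the y axis
        vx += intensity * math.sin(theta)  # clockwise from y: x component
        vy += intensity * math.cos(theta)  # y component
    angle_deg = math.degrees(math.atan2(vx, vy)) % 360
    return angle_deg, math.hypot(vx, vy)
```

For example, equal intensities in directions 2 (60°) and 4 (120°) integrate to a single source at 90°, matching the intuition that the source lies between them.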
And S2, generating a target sound source distribution diagram of the target audio signal.
In the embodiment of the application, the obtained target audio signal includes an audio direction and an audio intensity, so that the target audio signal can be subjected to signal mapping based on the audio direction and the audio intensity of the target audio signal to obtain a target sound source distribution map corresponding to the target audio signal.
In a specific embodiment, image construction is carried out on the target audio signal based on its audio direction and audio intensity to obtain a corresponding black-and-white sound source distribution image, which is a black-and-white grayscale image; specifically, the target sound source distribution map may be constructed on a 160 × 160 array image.
In another specific embodiment, pseudo-color mapping can be performed based on the black-and-white sound source distribution image to obtain a corresponding color sound source distribution map; the color sound source distribution diagram is a color rainbow diagram, and the discrimination capability of the details of the sound source distribution diagram is improved by generating the color target sound source distribution diagram, so that the accuracy of noise detection, tracking and positioning is improved.
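A minimal sketch of the pseudo-color mapping step; the patent does not specify the exact rainbow color map, so the blue-to-red ramp below is an assumed illustrative choice:

```python
def pseudocolor(v):
    """Map a normalised intensity v in [0, 1] to an (R, G, B) rainbow colour.

    A simple blue -> cyan -> green -> yellow -> red ramp; an illustrative
    choice, since the patent only calls for a colour rainbow map.
    """
    v = min(max(v, 0.0), 1.0)
    if v < 0.25:   # blue -> cyan
        return (0, int(255 * v / 0.25), 255)
    if v < 0.5:    # cyan -> green
        return (0, 255, int(255 * (1 - (v - 0.25) / 0.25)))
    if v < 0.75:   # green -> yellow
        return (int(255 * (v - 0.5) / 0.25), 255, 0)
    return (255, int(255 * (1 - (v - 0.75) / 0.25)), 0)  # yellow -> red

def colorize(gray_map):
    """Convert a 2-D grayscale sound source map (values 0-255) to RGB."""
    return [[pseudocolor(px / 255.0) for px in row] for row in gray_map]
```

Low intensities map to blue and high intensities to red, so the hottest point of the distribution map stands out as the sound source point.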
S3, carrying out image fusion processing on the original scene image and the target sound source distribution map to obtain an original fusion image; the image fusion processing means that two or more images are fused to form one image, namely, an original scene image and a target sound source distribution diagram are fused to form an original fusion image; the original scene image and the target sound source distribution diagram are subjected to image fusion, so that the sound source points can be displayed in an actual scene, and the convenience of sound source tracking, positioning and identifying is improved.
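One simple way to realize such image fusion is alpha blending, shown below as a hedged sketch; the weight alpha = 0.4 is an illustrative value, not prescribed by the patent:

```python
def fuse_images(scene, heat, alpha=0.4):
    """Alpha-blend a scene image with a sound source distribution map.

    Both images are nested lists of (R, G, B) tuples with identical shape.
    alpha weights the heat map; 0.4 is an assumed illustrative value.
    """
    return [[tuple(round((1 - alpha) * s + alpha * h)
                   for s, h in zip(sp, hp))
             for sp, hp in zip(srow, hrow)]
            for srow, hrow in zip(scene, heat)]
```

Blending the two images in this way overlays the sound source hot spot on the actual scene, which is exactly what makes the fusion image convenient for locating the source.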
S4, when the sound source position information corresponding to the target audio signal is not at the preset position of the original fusion image, determining the sound source angle information corresponding to the target audio signal; the point of highest audio intensity in the target sound source distribution map is the sound source position information corresponding to the target audio signal, and may specifically correspond to the most deeply colored position in the target sound source distribution map; the preset position may be the central position of the original fusion image, or any position within a circle in the original fusion image whose center is that central position and whose radius is a preset value R; the sound source angle information is the relative position information of the target audio signal in the target space.
In the embodiment of the present application, S4 includes:
s401, carrying out interference screening processing on a plurality of target audio sub-signals based on the target audio signals to obtain a plurality of effective audio sub-signals; the plurality of effective audio sub-signals carry effective angle information respectively;
and S403, carrying out angle information integration processing on the effective angle information carried by each effective audio sub-signal to obtain sound source angle information.
In some embodiments, the target audio signal comprises an audio intensity, wherein the location where the audio intensity is greatest may be determined as the sound source point location; the interference screening processing is to screen out target audio sub-signals with small influence on the position of the sound source point; for example, the target audio sub-signal that is farthest away from the sound source point location in the target scene; the method integrates the information after interference screening processing, so that the angle information of the sound source is closer to the actual sound source position, and the tracking and positioning accuracy of the sound source is improved.
In another embodiment, the position of the sound source point can be determined from one or more position points of higher audio intensity, and the interference screening processing serves to separate two adjacent sound source points; the image acquisition device can comprise a plurality of image acquisition sub-devices, the sound source angle information can be calculated separately after the sound source point segmentation processing, and the plurality of image acquisition sub-devices are controlled to operate accordingly, so that synchronous real-time tracking of a plurality of sound sources is realized.
S5, determining an image acquisition angle of the image acquisition device to a target scene based on the sound source angle information, so that the sound source position information corresponding to the target audio signal is at a preset position of a target fusion image, and the target fusion image is obtained by carrying out image fusion on a target sound source distribution graph and a target scene image acquired based on the image acquisition angle; the image acquisition angle can be a deflection angle of the image acquisition device relative to a preset position after moving, and can also be a deflection coordinate of the image acquisition device relative to the preset position after moving; the target scene image can be a new scene image acquired by the image acquisition device after being moved; through the sound source position information corresponding to the target audio signal at the preset position of the target fusion image, the convenience of observing the sound source point position can be improved, and the sound source is tracked.
In some embodiments, the target audio sub-signal comprises an audio acquisition direction and an audio intensity. For example, the target scene may be a 180° space facing the audio acquisition device, uniformly divided into directions 1-12, with the audio acquisition device acquiring audio signals from the 12 directions respectively; a rectangular coordinate system is established in the target scene with the y axis defined as the 0° axis, adjacent audio acquisition directions spaced 30° apart, and direction numbers increasing clockwise toward the positive x axis, so that the i-th direction makes an angle of i × 30° with the y axis. The preset position may lie on the y axis, in which case the angle value of the sound source angle information equals the image acquisition angle. Directly equating the angle value of the sound source angle information to the image acquisition angle further reduces data calculation and simplifies the calculation flow, thereby lowering the hardware computing power requirement and the manufacturing cost of the sound source tracking system.
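If the driving device 6 is a stepping motor, the image acquisition angle must ultimately be converted into motor steps; the helper below is an illustrative sketch assuming a 200-step-per-revolution motor, a figure not given in the patent:

```python
def angle_to_steps(target_deg, current_deg, steps_per_rev=200):
    """Convert a desired camera yaw change into stepper motor steps.

    Takes the shortest signed rotation from current to target; 200 full
    steps per revolution (1.8 deg/step) is a typical stepper value and is
    an assumption, not a figure from the patent.
    """
    delta = ((target_deg - current_deg + 180) % 360) - 180
    return round(delta * steps_per_rev / 360)
```

For instance, turning from 0° to 90° needs 50 steps, while turning from 0° to 350° resolves to a short negative rotation rather than a nearly full positive one.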
In an embodiment of the present application, the method for sound source tracking further includes:
S6, determining updated sound source angle information corresponding to an updated audio signal. That is, in the present application audio signals are acquired in real time so that the position of the sound source point can be tracked; the time interval between two adjacent audio signal acquisitions is a preset interval, for example 0.01 s.
In the embodiment of the application, audio signals in the target scene are collected in real time to obtain a plurality of updated audio sub-signals, which are integrated to obtain the updated audio signal. Interference screening is then performed on the updated audio sub-signals based on the updated audio signal to obtain a plurality of updated effective audio sub-signals, each carrying updated effective angle information. Integrating the updated effective angle information carried by these sub-signals yields the updated sound source angle information. By updating the audio signal in the target scene in real time, the sound source point position is tracked and located in real time, broadening the application scenarios of the sound source tracking system.
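One possible realization of the interference screening and angle integration described above is an intensity threshold relative to the strongest sub-signal followed by an intensity-weighted average; the threshold value and the weighting scheme below are assumptions, since the patent does not fix either:

```python
from dataclasses import dataclass


@dataclass
class AudioSubSignal:
    angle_deg: float   # effective angle information carried by the sub-signal
    intensity: float   # audio intensity of the sub-signal


def screen_interference(subs, rel_threshold=0.5):
    """Drop sub-signals weaker than rel_threshold times the peak intensity,
    treating them as interference."""
    peak = max(s.intensity for s in subs)
    return [s for s in subs if s.intensity >= rel_threshold * peak]


def integrate_angles(valid_subs):
    """Intensity-weighted average of the effective angles of the valid
    sub-signals, giving a single sound source angle."""
    total = sum(s.intensity for s in valid_subs)
    return sum(s.angle_deg * s.intensity for s in valid_subs) / total
```

For example, sub-signals at 60° (intensity 1.0), 90° (intensity 3.0) and 330° (intensity 0.2) screen down to the 90° sub-signal alone, so the integrated sound source angle is 90°.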
S7, determining an updated image acquisition angle of the image acquisition device based on the updated sound source angle information, so that the sound source position information corresponding to the updated audio signal remains at the preset position in the updated fusion image. By always keeping the updated sound source position information at the preset position, real-time tracking is achieved and the user can locate the sound source point more accurately.
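The S6-S7 update cycle amounts to a periodic loop at the preset acquisition interval. A minimal sketch with hypothetical driver callbacks follows; `sample_audio`, `compute_angle` and `set_camera_angle` are placeholders, not APIs from the patent:

```python
import time


def track(sample_audio, compute_angle, set_camera_angle,
          n_updates, interval_s=0.01):
    """Run n_updates tracking cycles at the preset acquisition interval."""
    for _ in range(n_updates):
        subs = sample_audio()            # S6: acquire updated audio sub-signals
        angle = compute_angle(subs)      # updated sound source angle information
        set_camera_angle(angle)          # S7: keep the source at the preset position
        time.sleep(interval_s)
```

In a real deployment the loop would run indefinitely; `n_updates` is included here only so the sketch terminates.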
In the embodiment of the present application, S1 further includes:
001) Initializing and clearing the internal storage of the sound source tracking system, and debugging and homing the acquisition equipment of the sound source tracking system;
003) Acquiring a debugging audio signal in a debugging scene and a debugging scene image collected by the image acquisition device; the debugging scene image contains calibrated sound source point position information, i.e. the position of a known sound source in the debugging scene image;
005) Generating a debugging sound source distribution map of the debugging audio signal; the map contains detected sound source point position information, i.e. the position of maximum audio intensity in the debugging sound source distribution map;
007) Performing image fusion on the debugging scene image and the debugging sound source distribution map to obtain a debugging fusion image in which the position of the detected sound source point matches the position of the calibrated sound source point in the debugging scene image. Through this debugging step before the positioning test, the detected sound source point is matched to the calibrated sound source point, so that the sound source point position is displayed within the actual scene and the corresponding sound source can be found accurately in subsequent detection;
009) Recording the image transformation relation between the debugging scene image and the debugging sound source distribution map during the fusion processing; the transformation relation refers to the offset, rotation, scaling and similar steps applied to the debugging scene image and the debugging sound source distribution map in order to match the detected sound source point position with the calibrated sound source point position.
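One conventional way to store the recorded offset, rotation and scaling is a single 2×3 affine matrix; the parameterization below is an assumption for illustration, since the patent does not specify a representation:

```python
import numpy as np


def affine_matrix(dx, dy, theta_deg, scale):
    """2x3 matrix combining the scaling, rotation and offset recorded in
    step 009 into one image transformation relation."""
    t = np.deg2rad(theta_deg)
    c, s = scale * np.cos(t), scale * np.sin(t)
    return np.array([[c, -s, dx],
                     [s,  c, dy]])


def apply_affine(M, point):
    """Map an (x, y) point through the recorded transformation."""
    x, y = point
    return (M[0, 0] * x + M[0, 1] * y + M[0, 2],
            M[1, 0] * x + M[1, 1] * y + M[1, 2])
```

Storing the relation as one matrix lets the same transformation be reapplied to every later frame without recomputing the calibration.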
Correspondingly, S3 includes:
S301, performing image transformation on the original scene image and the target sound source distribution map based on the image transformation relation to obtain a transformed scene image and a transformed sound source distribution map; the image transformation relation ties the detected sound source position to the actual scene, improving the recognition accuracy of sound source detection.
S303, superposing the transformed scene image on the transformed sound source distribution map to obtain a superposed image;
S305, cropping the superposed image to obtain the original fusion image.
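Steps S303-S305 can be sketched as a weighted superposition followed by a crop to the overlapping region. The blending weight and the crop rule below are illustrative assumptions, not details from the patent:

```python
import numpy as np


def fuse(scene, heatmap, alpha=0.6):
    """Superpose the transformed scene image on the transformed sound source
    distribution map (S303) and crop to their common region (S305)."""
    h = min(scene.shape[0], heatmap.shape[0])
    w = min(scene.shape[1], heatmap.shape[1])
    # Weighted superposition over the overlapping h x w area.
    return alpha * scene[:h, :w] + (1.0 - alpha) * heatmap[:h, :w]
```

Cropping to the common region guarantees the fused image contains only pixels covered by both inputs.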
In some embodiments, the scene image collected by the image acquisition device is larger than the sound source distribution map produced from the audio acquisition device, in which case the scene image must be adapted to the sound source distribution map; in other embodiments the scene image is smaller than the sound source distribution map, and the sound source distribution map must be adapted to the scene image.
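The size adaptation mentioned here could be done with a simple nearest-neighbour resize; the patent does not specify an interpolation method, so the following is only one possibility:

```python
import numpy as np


def adapt_size(img, target_shape):
    """Nearest-neighbour resize of img to target_shape (rows, cols), so the
    smaller of the two maps can be matched to the larger one."""
    th, tw = target_shape
    h, w = img.shape[:2]
    rows = np.arange(th) * h // th   # source row index for each target row
    cols = np.arange(tw) * w // tw   # source column index for each target column
    return img[rows][:, cols]
```

A production system would more likely use bilinear interpolation, but nearest-neighbour keeps the sketch dependency-free.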
In the embodiment of the present application, the same image transformation relation is also applied to image fusion during real-time tracking; details are not repeated here.
Referring to Fig. 8, a sound source tracking apparatus in an embodiment of the present application is described below. The apparatus includes:
the acquiring module 101 is configured to acquire a target audio signal in a target scene and an original scene image obtained by image acquisition of the target scene by an image acquisition device.
A target sound source distribution map generating module 201, configured to generate a target sound source distribution map of a target audio signal;
and the fusion processing module 301 is configured to perform image fusion processing on the original scene image and the target sound source distribution map to obtain an original fusion image.
The sound source angle information determining module 401 is configured to determine sound source angle information corresponding to the target audio signal when the sound source position information corresponding to the target audio signal is not at the preset position of the original fusion image.
The image acquisition angle determining module 501 is configured to determine an image acquisition angle of the image acquisition device for the target scene based on the sound source angle information, so that the sound source position information corresponding to the target audio signal is at a preset position of the target fusion image, and the target fusion image is obtained by performing image fusion on the target sound source distribution map and the target scene image acquired based on the image acquisition angle.
In some embodiments, the obtaining module comprises:
and the sub-signal acquisition unit is used for acquiring a plurality of target audio sub-signals acquired by the audio acquisition device in a target scene.
And the signal integration unit is used for performing signal integration processing on the plurality of target audio sub-signals to obtain target audio signals.
In some embodiments, the sound source angle information determination module comprises:
the interference screening unit is used for carrying out interference screening processing on the plurality of target audio sub-signals based on the target audio signals to obtain a plurality of effective audio sub-signals; the plurality of effective audio sub-signals carry effective angle information respectively;
and the angle information integration unit is used for carrying out angle information integration processing on the effective angle information carried by the effective audio sub-signals respectively to obtain the sound source angle information.
In some embodiments, the sub-signal acquisition unit includes:
the sub-signal acquisition sub-unit is used for acquiring a plurality of target audio sub-signals acquired by the audio acquisition device in a target scene based on acquisition channels in a plurality of directions; the plurality of target audio sub-signals carry respective corresponding channel information.
In some embodiments, the sound source tracking device further comprises:
the updating determining module is used for determining the updated sound source angle information corresponding to the updated audio signal;
and the image updating acquisition angle module is used for determining an image updating acquisition angle of the image acquisition device based on the updated sound source angle information so as to update the preset position of the sound source position information corresponding to the audio signal in the updated fusion image.
In some embodiments, the sound source tracking device further comprises:
the debugging audio signal acquisition module is used for acquiring debugging audio signals in a debugging scene and debugging scene images acquired by the image acquisition device; the debugging scene image comprises position information of a calibration sound source point;
the debugging sound source distribution diagram generating module is used for generating a debugging sound source distribution diagram of the debugging audio signal; the debugging sound source distribution diagram comprises position information of a detection sound source point;
the debugging image fusion processing module is used for carrying out image fusion processing on the debugging scene image and the debugging sound source distribution map to obtain a debugging fusion image; the position of the detection sound source point position information in the debugging fusion image is matched with the position of the calibration sound source point position information in the debugging scene image;
and the transformation relation recording module is used for recording the image transformation relation between the debugging scene image and the debugging sound source distribution map in the fusion processing process.
In some embodiments, the fusion processing module comprises:
the transformation unit is used for carrying out image transformation on the original scene image and the target sound source distribution diagram based on the image transformation relation to obtain a transformed scene image and a transformed sound source distribution diagram;
the superposition unit is used for superposing the transformed scene image on the transformed sound source distribution map to obtain a superposed image;
and the cutting unit is used for cutting the superposed image to obtain an original fusion image.
The embodiment of the present application further provides an intelligent recognition device, which includes a processor and a memory, where at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the sound source tracking method as described above.
The memory may be used to store software programs and modules, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory. The memory mainly comprises a program storage area and a data storage area: the program storage area may store the operating system and the application programs required by the various functions, while the data storage area may store data created through use of the device. Further, the memory may include high-speed random access memory as well as non-volatile memory, such as at least one hard disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory may also include a memory controller to provide the processor access to the memory.
The method provided by the embodiment of the application can be executed on a mobile terminal, a computer terminal, a server, or a similar computing device. Fig. 9 is a hardware block diagram of an electronic device for the method according to an embodiment of the present application. As shown in Fig. 9, the electronic device 900 may vary considerably with configuration and performance, and may include one or more central processing units (CPUs) 910 (the processor 910 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 930 for storing data, and one or more storage media 920 (e.g., one or more mass storage devices) storing applications 923 or data 922. The memory 930 and the storage medium 920 may provide transient or persistent storage. The program stored in the storage medium 920 may include one or more modules, each of which may comprise a series of instruction operations for the electronic device. Further, the central processor 910 may be configured to communicate with the storage medium 920 and execute the series of instruction operations in the storage medium 920 on the electronic device 900. The electronic device 900 may also include one or more power supplies 960, one or more wired or wireless network interfaces 950, one or more input/output interfaces 940, and/or one or more operating systems 921, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The input/output interface 940 may be used to receive or transmit data via a network. Specific examples of such a network include a wireless network provided by the communication provider of the electronic device 900. In one example, the input/output interface 940 includes a network interface controller (NIC) that can connect to other network devices through a base station so as to communicate with the internet. In another example, the input/output interface 940 may be a radio frequency (RF) module used to communicate with the internet wirelessly.
It will be understood by those skilled in the art that the structure shown in fig. 9 is only an illustration and is not intended to limit the structure of the electronic device. For example, electronic device 900 may also include more or fewer components than shown in FIG. 9, or have a different configuration than shown in FIG. 9.
Embodiments of the present application also provide a storage medium having at least one instruction or at least one program stored therein, the at least one instruction or the at least one program being loaded and executed by a processor to implement the method for sound source tracking as described above.
The foregoing description fully discloses embodiments of the present application. It should be noted that those skilled in the art can make modifications to the embodiments without departing from the scope of the claims of the present application; accordingly, the scope of the claims is not limited to the specific embodiments described above.

Claims (10)

1. A method of sound source tracking, comprising:
acquiring a target audio signal in a target scene and an original scene image obtained by image acquisition of the target scene by an image acquisition device;
generating a target sound source distribution map of the target audio signal;
performing image fusion processing on the original scene image and the target sound source distribution map to obtain an original fusion image;
when the sound source position information corresponding to the target audio signal is not at the preset position of the original fusion image, determining sound source angle information corresponding to the target audio signal;
and determining, based on the sound source angle information, an image acquisition angle of the image acquisition device for the target scene, so that the sound source position information corresponding to the target audio signal is at the preset position of a target fusion image, the target fusion image being obtained by image fusion of the target sound source distribution map with a target scene image acquired at the image acquisition angle.
2. The method of claim 1, wherein the acquiring a target audio signal in a target scene comprises:
acquiring a plurality of target audio sub-signals acquired by an audio acquisition device in a target scene;
and performing signal integration processing on the plurality of target audio sub-signals to obtain the target audio signals.
3. The method of claim 2, wherein the determining the sound source angle information corresponding to the target audio signal comprises:
performing interference screening processing on the plurality of target audio sub-signals based on the target audio signal to obtain a plurality of effective audio sub-signals; the plurality of effective audio sub-signals respectively carry effective angle information;
and carrying out angle information integration processing on the effective angle information carried by the plurality of effective audio sub-signals respectively to obtain the sound source angle information.
4. The method of claim 2, wherein the audio acquisition device comprises a plurality of directional acquisition channels;
the acquiring of the plurality of target audio sub-signals acquired by the audio acquisition device in the target scene includes:
acquiring a plurality of target audio sub-signals acquired by the audio acquisition device in a target scene based on the acquisition channels in the plurality of directions; the plurality of target audio sub-signals carry respective corresponding channel information.
5. The method of sound source tracking according to claim 1, further comprising:
determining updated sound source angle information corresponding to the updated audio signal;
and determining an updated image acquisition angle of the image acquisition device based on the updated sound source angle information so as to enable the sound source position information corresponding to the updated audio signal to be at a preset position in the updated fusion image.
6. The method of claim 1, wherein before the acquiring the target audio signal in the target scene and the original scene image acquired by the image acquisition device, the method further comprises:
acquiring a debugging audio signal in a debugging scene and a debugging scene image acquired by an image acquisition device; the debugging scene image comprises position information of a calibration sound source point;
generating a debugging sound source distribution map of the debugging audio signal; the debugging sound source distribution diagram comprises position information of a detection sound source point;
performing image fusion processing on the debugging scene image and the debugging sound source distribution map to obtain a debugging fusion image; the position of the position information of the detection sound source point in the debugging fusion image is matched with the position of the position information of the calibration sound source point in the debugging scene image;
and recording the image transformation relation between the debugging scene image and the debugging sound source distribution diagram in the fusion processing process.
7. The method for tracking a sound source according to claim 6, wherein the performing image fusion processing on the original scene image and the target sound source distribution map to obtain an original fusion image comprises:
performing image transformation on the original scene image and the target sound source distribution diagram based on the image transformation relation to obtain a transformed scene image and a transformed sound source distribution diagram;
superposing the transformed scene image on the transformed sound source distribution diagram to obtain a superposed image;
and cutting the superposed image to obtain the original fusion image.
8. A sound source tracking apparatus, characterized in that the sound source tracking apparatus comprises:
the device comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a target audio signal in a target scene and an original scene image obtained by image acquisition of the target scene by an image acquisition device;
a target sound source distribution map generation module, configured to generate a target sound source distribution map of the target audio signal;
the fusion processing module is used for carrying out image fusion processing on the original scene image and the target sound source distribution map to obtain an original fusion image;
the sound source angle information determining module is used for determining sound source angle information corresponding to the target audio signal when the sound source position information corresponding to the target audio signal is not at the preset position of the original fusion image;
and the image acquisition angle determining module is used for determining the image acquisition angle of the target scene of the image acquisition device based on the sound source angle information so as to enable the sound source position information corresponding to the target audio signal to be in the preset position of the target fusion image, and the target fusion image is obtained by carrying out image fusion on the target sound source distribution diagram and the target scene image acquired based on the image acquisition angle.
9. A sound source tracking system, characterized in that the sound source tracking system comprises an audio acquisition device, a sound source tracking device, a display device and an image acquisition device;
the audio acquisition device is in communication connection with the sound source tracking device and is used for collecting a target audio signal in a target scene and sending the target audio signal to the sound source tracking device;
the sound source tracking device is in communication connection with the display device and the image acquisition device respectively, and is used for acquiring the target audio signal in the target scene and an original scene image obtained by image acquisition of the target scene by the image acquisition device; generating a target sound source distribution map of the target audio signal; performing image fusion processing on the original scene image and the target sound source distribution map to obtain an original fusion image; determining, when the sound source position information corresponding to the target audio signal is not at the preset position of the original fusion image, sound source angle information corresponding to the target audio signal; and determining, based on the sound source angle information, the image acquisition angle of the image acquisition device for the target scene, so that the sound source position information corresponding to the target audio signal is at the preset position of the target fusion image, the target fusion image being obtained by image fusion of the target sound source distribution map with the target scene image acquired at the image acquisition angle;
The image acquisition device is used for acquiring a scene image and sending the scene image to the sound source tracking device;
the display device is used for displaying various fusion images.
10. A computer storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement a method of sound source tracking according to any one of claims 1 to 7.
CN202210348317.9A 2022-04-01 2022-04-01 Method, system, device and computer storage medium for sound source tracking Active CN114926378B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210348317.9A CN114926378B (en) 2022-04-01 2022-04-01 Method, system, device and computer storage medium for sound source tracking


Publications (2)

Publication Number Publication Date
CN114926378A true CN114926378A (en) 2022-08-19
CN114926378B CN114926378B (en) 2023-04-25

Family

ID=82805170

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210348317.9A Active CN114926378B (en) 2022-04-01 2022-04-01 Method, system, device and computer storage medium for sound source tracking

Country Status (1)

Country Link
CN (1) CN114926378B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115225884A (en) * 2022-08-30 2022-10-21 四川中绳矩阵技术发展有限公司 Interactive reproduction method, system, device and medium for image and sound
CN115331082A (en) * 2022-10-13 2022-11-11 天津大学 Path generation method of tracking sound source, training method of model and electronic equipment
CN115875008A (en) * 2023-01-06 2023-03-31 四川省川建勘察设计院有限公司 Intelligent drilling data acquisition method and system for geological drilling machine and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012119872A (en) * 2010-11-30 2012-06-21 Sanyo Electric Co Ltd Imaging apparatus
CN102980647A (en) * 2012-11-26 2013-03-20 北京神州普惠科技股份有限公司 Recognition and location test method for noise sources
CN110661988A (en) * 2019-08-14 2020-01-07 天津师范大学 Sound and image mixed array processing system
CN112543295A (en) * 2020-11-23 2021-03-23 安徽江淮汽车集团股份有限公司 Vehicle-mounted video call method, system and equipment based on sound source positioning
CN113176538A (en) * 2021-04-16 2021-07-27 杭州爱华仪器有限公司 Sound source imaging method based on microphone array
WO2021168620A1 (en) * 2020-02-24 2021-09-02 京东方科技集团股份有限公司 Sound source tracking control method and control apparatus, and sound source tracking system



Also Published As

Publication number Publication date
CN114926378B (en) 2023-04-25

Similar Documents

Publication Publication Date Title
CN114926378A (en) Method, system, device and computer storage medium for sound source tracking
CN110322500B (en) Optimization method and device for instant positioning and map construction, medium and electronic equipment
CN110457414B (en) Offline map processing and virtual object display method, device, medium and equipment
CN109345596A (en) Multisensor scaling method, device, computer equipment, medium and vehicle
CN110967024A (en) Method, device, equipment and storage medium for detecting travelable area
CN112051590A (en) Detection method and related device for laser radar and inertial measurement unit
JP2022548441A (en) POSITION AND ATTITUDE DETERMINATION METHOD, APPARATUS, ELECTRONIC DEVICE, STORAGE MEDIUM AND COMPUTER PROGRAM
EP3680690A1 (en) Sensor data processing method and apparatus
CN112051591A (en) Detection method and related device for laser radar and inertial measurement unit
EP4060980A1 (en) Method and device for generating vehicle panoramic surround view image
CN109712188A (en) A kind of method for tracking target and device
CN112013877B (en) Detection method and related device for millimeter wave radar and inertial measurement unit
EP3343242A1 (en) Tracking system, tracking device and tracking method
CN113343457B (en) Automatic driving simulation test method, device, equipment and storage medium
CN111595342B (en) Indoor positioning method and system capable of being deployed in large scale
CN110296686A (en) Localization method, device and the equipment of view-based access control model
CN110398258A (en) A kind of performance testing device and method of inertial navigation system
CN109089087A (en) The audio-visual linkage of multichannel
CN112447058B (en) Parking method, parking device, computer equipment and storage medium
CN115272560B (en) Substation equipment hidden danger positioning method and system based on three-dimensional sound field cloud picture
CN112113665A (en) Temperature measuring method, device, storage medium and terminal
CN109345567B (en) Object motion track identification method, device, equipment and storage medium
CN112985867B (en) Steering engine testing method, device, equipment and storage medium
CN112488022B (en) Method, device and system for monitoring panoramic view
CN112154480B (en) Positioning method and device for movable platform, movable platform and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant