CN114863943A - Self-adaptive positioning method and device for environmental noise source based on beam forming - Google Patents

Self-adaptive positioning method and device for environmental noise source based on beam forming

Info

Publication number
CN114863943A
CN114863943A (application number CN202210778085.0A)
Authority
CN
China
Prior art keywords
noise
voiceprint
target
recording
environment
Prior art date
Legal status
Granted
Application number
CN202210778085.0A
Other languages
Chinese (zh)
Other versions
CN114863943B (en)
Inventor
曹祖杨
周航
侯佩佩
张鑫
李佳罗
闫昱甫
洪全付
陶慧芳
方吉
Current Assignee
Hangzhou Crysound Electronics Co Ltd
Original Assignee
Hangzhou Crysound Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Crysound Electronics Co Ltd filed Critical Hangzhou Crysound Electronics Co Ltd
Priority to CN202210778085.0A priority Critical patent/CN114863943B/en
Publication of CN114863943A publication Critical patent/CN114863943A/en
Application granted granted Critical
Publication of CN114863943B publication Critical patent/CN114863943B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S1/00 Beacons or beacon systems transmitting signals having a characteristic or characteristics capable of being detected by non-directional receivers and defining directions, positions, or position lines fixed relatively to the beacon transmitters; Receivers co-operating therewith
    • G01S1/72 Beacons or beacon systems transmitting signals having a characteristic or characteristics capable of being detected by non-directional receivers and defining directions, positions, or position lines fixed relatively to the beacon transmitters; Receivers co-operating therewith using ultrasonic, sonic or infrasonic waves
    • G01S1/76 Systems for determining direction or position line
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Abstract

The invention discloses a beamforming-based adaptive positioning method and device for environmental noise sources. The method comprises: acquiring an environmental recording collected by a microphone array in a target environment, and extracting voiceprint Mel-spectrum features of the environmental recording based on the Mel cepstrum; determining the noise source type corresponding to each voiceprint Mel-spectrum feature with a KNN algorithm, and selecting all noise features corresponding to that noise source type from a preset noise database; and performing beamforming positioning on the environmental recording based on each noise feature to obtain the target noise position. By extracting the voiceprint Mel-spectrum features of the environmental recording, determining the noise type with the KNN algorithm, reversely screening the environmental recording with the noise features of that type, and then locating the noise source by beamforming, the application enables the sound source positioning system to locate any kind of noise source automatically and accurately as it occurs, with high positioning precision.

Description

Self-adaptive positioning method and device for environmental noise source based on beam forming
Technical Field
The present invention relates to the field of sound source localization technologies, and in particular, to a method and an apparatus for adaptively locating an ambient noise source based on beamforming.
Background
With the development of urban construction, urban areas contain more and more facilities and generate more and more noise, and the noise pollution caused by environmental noise needs to be monitored. At present, environmental noise monitoring faces difficulties in law enforcement and evidence collection: when the noise in a monitored area exceeds the limit, several noise-producing units may be located near the monitoring point, and the noise source cannot be located from the recording alone, so effective supervision is impossible. For a noise source with a fixed frequency, the traditional beamforming method can achieve a certain degree of localization. In environmental monitoring, however, the noise sources are diverse and their frequency ranges are wide, so the traditional beamforming method cannot achieve automatic localization. In summary, there is currently no method that can accurately locate an over-limit noise source in environmental monitoring.
Disclosure of Invention
In order to solve the above problem, embodiments of the present application provide a method and an apparatus for adaptive positioning of environmental noise sources based on beamforming.
In a first aspect, an embodiment of the present application provides a beamforming-based adaptive positioning method for environmental noise sources, where the method includes:
acquiring an environmental recording collected by a microphone array in a target environment, and extracting voiceprint Mel-spectrum features of the environmental recording based on the Mel cepstrum;
determining a noise source type corresponding to each voiceprint Mel-spectrum feature based on a KNN algorithm, and selecting all noise features corresponding to the noise source type from a preset noise database;
and performing beamforming positioning on the environmental recording based on each of the noise features to obtain a target noise position.
Preferably, the extracting of the voiceprint Mel-spectrum features of the environmental recording based on the Mel cepstrum includes:
calculating a fast Fourier transform spectrum corresponding to the environmental recording;
and passing the fast Fourier transform spectrum through a Mel-frequency filter to obtain the voiceprint Mel-spectrum features.
Preferably, the determining of the noise source type corresponding to each voiceprint Mel-spectrum feature based on the KNN algorithm includes:
obtaining a classified-sample voiceprint feature plane corresponding to the preset noise database, and mapping each voiceprint Mel-spectrum feature onto the classified-sample voiceprint feature plane;
calculating the distances between the mapped voiceprint Mel-spectrum feature and the sample voiceprint features in the classified-sample voiceprint feature plane, and selecting a first preset number of first sample voiceprint features in order of increasing distance;
and counting the number of samples of each noise source type among the first sample voiceprint features, and determining the noise source type with the largest count as the noise source type corresponding to the voiceprint Mel-spectrum feature.
Preferably, the performing beamforming positioning on the environmental recording based on each of the noise features to obtain a target noise position includes:
filtering the environmental recording based on each of the noise features, and removing sound features in the environmental recording that do not match any of the noise features, to obtain a noise recording;
and performing beamforming positioning on the noise recording to obtain the target noise position.
Preferably, the performing beamforming positioning on the noise recording to obtain a target noise position includes:
calculating, based on a cross-correlation method, a first relative reception delay of the noise recording between the microphones of the microphone array;
determining a target plane corresponding to the noise recording, and dividing the target plane into a second preset number of region positions;
simulating, for each region position, a second relative reception delay between the microphones;
and determining the target second relative reception delay whose difference from the first relative reception delay is smallest, the target region position corresponding to the target second relative reception delay being the target noise position.
Preferably, the method further comprises:
and acquiring an environmental image collected by a monitoring dome camera in the target environment, and generating noise-source acoustic image evidence information by combining the environmental image with the target noise position.
In a second aspect, an embodiment of the present application provides a beamforming-based adaptive positioning apparatus for environmental noise sources, where the apparatus includes:
an acquisition module, configured to acquire an environmental recording collected by a microphone array in a target environment, and to extract voiceprint Mel-spectrum features of the environmental recording based on the Mel cepstrum;
a selection module, configured to determine a noise source type corresponding to each voiceprint Mel-spectrum feature based on a KNN algorithm, and to select all noise features corresponding to the noise source type from a preset noise database;
and a positioning module, configured to perform beamforming positioning on the environmental recording based on each of the noise features to obtain a target noise position.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method as provided in the first aspect or any one of the possible implementation manners of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method as provided in the first aspect or any one of the possible implementations of the first aspect.
The beneficial effects of the invention are as follows: by extracting the voiceprint Mel-spectrum features of the environmental recording, determining the noise type with the KNN algorithm, reversely screening the environmental recording based on the noise features of that noise type, and then locating the noise source position by beamforming, the sound source positioning system can automatically and accurately locate various noise sources as they occur, with high positioning accuracy.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a beamforming-based adaptive positioning method for environmental noise sources according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of an adaptive positioning apparatus for environmental noise source based on beamforming according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
In the following description, the terms "first" and "second" are used for descriptive purposes only and are not intended to indicate or imply relative importance. The following description provides embodiments of the present application, which may be combined or interchanged with one another, and therefore the present application should also be construed as encompassing all possible combinations of the same and/or different embodiments described. Thus, if one embodiment includes features A, B, C and another embodiment includes features B, D, then the present application should also be construed to include embodiments that contain one or more of all other possible combinations of A, B, C, and D, even though such embodiments may not be explicitly recited in the text below.
The following description provides examples, and does not limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements described without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as appropriate. For example, the described methods may be performed in an order different than the order described, and various steps may be added, omitted, or combined. Furthermore, features described with respect to some examples may be combined into other examples.
Referring to fig. 1, fig. 1 is a schematic flowchart of a beamforming-based adaptive positioning method for environmental noise sources according to an embodiment of the present application. In an embodiment of the present application, the method includes:
s101, acquiring an environment recording acquired by a microphone array in a target environment, and extracting a voiceprint mei-spectrum feature of the environment recording based on a mel-frequency cepstrum.
The execution subject may be a cloud server of the sound source positioning system.
In the embodiment of the application, the cloud server first acquires the environmental recording collected by the microphone array in the target environment that requires noise monitoring; the microphone array may be a spherical microphone array. After the environmental recording is collected, the cloud server extracts the voiceprint Mel-spectrum features from the environmental recording by means of the Mel cepstrum, so that the noise can subsequently be identified and determined.
In one possible implementation, the extracting of the voiceprint Mel-spectrum features of the environmental recording based on the Mel cepstrum includes:
calculating a fast Fourier transform spectrum corresponding to the environmental recording;
and passing the fast Fourier transform spectrum through a Mel-frequency filter to obtain the voiceprint Mel-spectrum features.
In the embodiment of the present application, in order to obtain the voiceprint Mel-spectrum features from the environmental recording, the fast Fourier transform spectrum of the environmental recording, i.e. the FFT spectrum, must first be calculated. Its calculation formula is:
X(k) = Σ_{n=0}^{N-1} x(n) · W_N^{kn}
where x(n) is the finite-length discrete signal, i.e. the environmental recording, with n = 0, 1, ..., N-1; W_N^{kn} = cos(2πkn/N) + i·sin(2πkn/N) is the kernel written with Euler's formula; and X(k) is the complex spectrum data composed of the amplitude and phase of the periodic component at frequency k/N.
After the fast Fourier transform spectrum is calculated, it is passed through the Mel-frequency filter to obtain the voiceprint Mel-spectrum features. Since the Mel filter coefficients of the Mel-frequency filter are fixed, the voiceprint Mel-spectrum features obtained in this step can be expressed by the following formula:
S(m) = Σ_k X(k) · H_m(k)
where X(k) is the FFT spectrum data, H_m(k) are the Mel filter coefficients, and S is the voiceprint Mel-spectrum feature.
S102, determining the noise source type corresponding to each voiceprint Mel-spectrum feature based on a KNN algorithm, and selecting all noise features corresponding to the noise source type from a preset noise database.
In the embodiment of the application, the obtained voiceprint Mel-spectrum features are classified by a KNN algorithm, and the noise source type corresponding to each voiceprint Mel-spectrum feature is determined. After the noise source type is determined, the noise position in the environmental recording is not determined directly; instead, all noise features corresponding to the obtained noise source type are first retrieved from a preset noise database, so that the environmental recording can be reversely screened according to these noise features. This avoids missing noise features and ensures the accuracy of the final positioning.
In an embodiment, the determining of the noise source type corresponding to each voiceprint Mel-spectrum feature based on the KNN algorithm includes:
obtaining a classified-sample voiceprint feature plane corresponding to the preset noise database, and mapping each voiceprint Mel-spectrum feature onto the classified-sample voiceprint feature plane;
calculating the distances between the mapped voiceprint Mel-spectrum feature and the sample voiceprint features in the classified-sample voiceprint feature plane, and selecting a first preset number of first sample voiceprint features in order of increasing distance;
and counting the number of samples of each noise source type among the first sample voiceprint features, and determining the noise source type with the largest count as the noise source type corresponding to the voiceprint Mel-spectrum feature.
In the embodiment of the present application, the specific calculation process of the KNN algorithm is as follows: the calculated voiceprint feature is mapped onto the voiceprint feature plane of the classified samples in the noise library; the distances between the current voiceprint feature and the sample voiceprint features of each class are calculated on the mapped feature plane; the sample types of the K samples with the smallest distances are counted; and the noise type of the signal is the type with the largest count. For example, when K = 5, if the five noise-library samples closest to the signal on the feature plane are [1 industrial noise, 2 industrial noise, 3 human voice, 4 vehicle noise, 5 industrial noise], the signal is classified as industrial noise.
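The snippet below is a compact sketch of this KNN step under assumed data (the real noise library and feature dimensionality are not specified here): it measures Euclidean distances from the extracted Mel-spectrum feature to the pre-classified library samples, takes the K = 5 nearest, and votes on the noise-source class.

```python
# Sketch: KNN classification of one voiceprint Mel-spectrum feature against a noise library.
import numpy as np
from collections import Counter

def knn_classify(feature, sample_features, sample_labels, k=5):
    """Return the majority class among the k noise-library samples closest to `feature`."""
    distances = np.linalg.norm(sample_features - feature, axis=1)   # distance to every sample
    nearest = np.argsort(distances)[:k]                             # indices of the k smallest distances
    votes = Counter(sample_labels[i] for i in nearest)              # count each noise-source class
    return votes.most_common(1)[0][0]

# Usage mirroring the example in the text: if the 5 nearest samples are
# [industrial, industrial, human voice, vehicle, industrial] the result is "industrial noise".
sample_features = np.random.randn(100, 26)   # pre-classified noise-library features (synthetic)
sample_labels = np.random.choice(
    ["industrial noise", "human voice", "vehicle noise"], size=100)
feature = np.random.randn(26)                # feature extracted from the environmental recording
print(knn_classify(feature, sample_features, sample_labels, k=5))
```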
S103, performing beamforming positioning on the environmental recording based on each noise feature to obtain a target noise position.
In the embodiment of the application, after the noise features of the noise source type present in the environmental recording are determined, the environmental recording is reversely screened using these noise features; beamforming positioning is then performed on the screened recording to determine the target noise position corresponding to the noise source that produces the over-limit noise in the environmental recording.
In one possible embodiment, step S103 includes:
filtering the environmental recording based on each of the noise features, and removing sound features in the environmental recording that do not match any of the noise features, to obtain a noise recording;
and performing beamforming positioning on the noise recording to obtain the target noise position.
In the embodiment of the application, the environmental recording is filtered according to the obtained noise features so as to retain the sound features in the environmental recording that match the noise features; the remaining unmatched sound features are removed, yielding the noise recording contained in the environmental recording. The target noise position is then obtained by beamforming the noise recording. Compared with the traditional approach, the features corresponding to the noise are not determined directly from the environmental recording; instead, after the noise source type is determined from the environmental recording, the environmental recording is screened based on the noise features corresponding to that type of noise source, so that all noise features in the environmental recording can be captured. This avoids the inaccurate positioning, or failure to position at all, that occurs in the traditional direct feature-extraction approach when some features are not recognized or are missed.
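The text does not spell out how the match between the recording and the stored noise features is computed, so the sketch below shows only one plausible realization of the reverse screening, purely for illustration: frequency bins that an aggregate spectral template of the selected noise class expresses strongly are kept, and everything else is zeroed before beamforming. The template format, the keep_ratio threshold and all names are assumptions, not the patented method.

```python
# Sketch: spectral-mask "reverse screening" of one microphone channel (one possible realization).
import numpy as np

def reverse_screen(channel, noise_templates, keep_ratio=0.25):
    """Suppress spectral content of one channel that does not match the selected noise class.

    noise_templates: array (n_features, n_bins) of magnitude spectra from the noise database.
    keep_ratio: fraction of bins retained (illustrative threshold, not from the patent).
    """
    spectrum = np.fft.rfft(channel)
    template = noise_templates.mean(axis=0)                 # aggregate noise-class template
    threshold = np.quantile(template, 1.0 - keep_ratio)     # keep the most strongly expressed bins
    mask = (template >= threshold).astype(float)
    return np.fft.irfft(spectrum * mask, n=len(channel))    # "noise recording" for this channel

# Usage: screen one 1-second channel with templates matching its FFT length (synthetic data).
channel = np.random.randn(16000)
noise_templates = np.abs(np.random.randn(10, len(np.fft.rfft(channel))))
noise_channel = reverse_screen(channel, noise_templates)
```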
In one embodiment, the performing beamforming positioning on the noise recording to obtain a target noise position includes:
calculating, based on a cross-correlation method, a first relative reception delay of the noise recording between the microphones of the microphone array;
determining a target plane corresponding to the noise recording, and dividing the target plane into a second preset number of region positions;
simulating, for each region position, a second relative reception delay between the microphones;
and determining the target second relative reception delay whose difference from the first relative reception delay is smallest, the target region position corresponding to the target second relative reception delay being the target noise position.
In the embodiment of the present application, the specific beamforming process is as follows. Since the microphone array is composed of a plurality of microphones, a sound emitted from a given position is received by the different microphones with a relative delay. Therefore, the first relative reception delay between the microphones with respect to the noise recording is first calculated by a cross-correlation method. Specifically, the calculation formula is:
R_{12}(τ) = Σ_n x_1(n) · x_2(n + τ)
where x_1(n) and x_2(n) are the time series of the signals received by two microphones, τ is the displacement time, and R_{12}(τ) is the cross-correlation sequence. At the maximum of the cross-correlation sequence the two time series are aligned, and the index of the maximum multiplied by the time step of τ is the delay found by the cross-correlation.
In addition, a target plane can be selected and determined based on the orientation of the noise recording, and the target plane can be divided into regions (for example, a visible plane of 10 m by 10 m ahead, divided into regions of 1 m by 1 m), so that the second relative reception delays between the microphones for a sound emitted from each region can be calculated in turn according to the cross-correlation method. Finally, the first relative reception delay is compared with each second relative reception delay; the smaller the difference between them, the closer the corresponding positions, so the target region position corresponding to the target second relative reception delay with the smallest difference is determined as the target noise position.
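A minimal sketch of this delay-matching localization is shown below for a single microphone pair: the first relative reception delay is measured from the cross-correlation peak, a second relative reception delay is simulated for the centre of each 1 m by 1 m region of an assumed target plane, and the region whose simulated delay differs least is returned. The array geometry, sampling rate and grid layout are illustrative assumptions; a real array would combine the delays of many microphone pairs.

```python
# Sketch: cross-correlation delay estimation plus grid search over candidate region positions.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
FS = 16000              # sampling rate (Hz), assumed

def measured_delay(x1, x2):
    """First relative reception delay (seconds) from the peak of the cross-correlation."""
    corr = np.correlate(x1, x2, mode="full")      # R_12(tau) for tau = -(N-1)..(N-1)
    lag = np.argmax(corr) - (len(x2) - 1)         # index of the maximum -> lag in samples
    return lag / FS

def simulated_delay(region_center, mic1, mic2):
    """Second relative reception delay for a source assumed at the region centre."""
    return (np.linalg.norm(region_center - mic1)
            - np.linalg.norm(region_center - mic2)) / SPEED_OF_SOUND

def locate(noise_ch1, noise_ch2, mic1, mic2, grid_points):
    """Target region = the grid point whose simulated delay is closest to the measured one."""
    tau_measured = measured_delay(noise_ch1, noise_ch2)
    diffs = [abs(simulated_delay(p, mic1, mic2) - tau_measured) for p in grid_points]
    return grid_points[int(np.argmin(diffs))]

# Usage: a 10 m x 10 m visible plane 5 m ahead of the array, split into 1 m x 1 m regions.
mic1, mic2 = np.array([-0.1, 0.0, 0.0]), np.array([0.1, 0.0, 0.0])
grid_points = np.array([[x + 0.5 - 5.0, y + 0.5, 5.0] for x in range(10) for y in range(10)])
ch1, ch2 = np.random.randn(FS), np.random.randn(FS)   # stand-ins for two noise-recording channels
print(locate(ch1, ch2, mic1, mic2, grid_points))
```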
In one embodiment, the method further comprises:
and acquiring an environmental image collected by a monitoring dome camera in the target environment, and generating noise-source acoustic image evidence information by combining the environmental image with the target noise position.
In the embodiment of the application, in addition to the microphone array, a monitoring dome camera is arranged in the target environment to collect environmental image information. The determined target noise position is combined with the environmental image and plotted, so that the specific building from which the noise source emanates can be determined, and noise-source acoustic image evidence information is generated and obtained for subsequent follow-up and control.
The beamforming-based adaptive positioning apparatus for environmental noise sources provided by the embodiment of the present application will be described in detail below with reference to fig. 2. It should be noted that the apparatus shown in fig. 2 is used to execute the method of the embodiment shown in fig. 1 of the present application; for ease of description, only the parts related to the embodiment of the present application are shown, and for specific technical details that are not disclosed here, please refer to the embodiment shown in fig. 1 of the present application.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an adaptive positioning apparatus for environmental noise source based on beamforming according to an embodiment of the present application. As shown in fig. 2, the apparatus includes:
an acquisition module 201, configured to acquire an environmental recording collected by a microphone array in a target environment, and to extract voiceprint Mel-spectrum features of the environmental recording based on the Mel cepstrum;
a selection module 202, configured to determine a noise source type corresponding to each voiceprint Mel-spectrum feature based on a KNN algorithm, and to select all noise features corresponding to the noise source type from a preset noise database;
and a positioning module 203, configured to perform beamforming positioning on the environmental recording based on each of the noise features to obtain a target noise position.
In one implementation, the acquisition module 201 includes:
a first calculation unit, configured to calculate a fast Fourier transform spectrum corresponding to the environmental recording;
and a second calculation unit, configured to pass the fast Fourier transform spectrum through a Mel-frequency filter to obtain the voiceprint Mel-spectrum features.
In one possible implementation, the selection module 202 includes:
a first acquisition unit, configured to obtain a classified-sample voiceprint feature plane corresponding to the preset noise database, and to map each voiceprint Mel-spectrum feature onto the classified-sample voiceprint feature plane;
a third calculation unit, configured to calculate the distances between the mapped voiceprint Mel-spectrum feature and the sample voiceprint features in the classified-sample voiceprint feature plane, and to select a first preset number of first sample voiceprint features in order of increasing distance;
and a counting unit, configured to count the number of samples of each noise source type among the first sample voiceprint features, and to determine the noise source type with the largest count as the noise source type corresponding to the voiceprint Mel-spectrum feature.
In one possible implementation, the positioning module 203 includes:
a screening unit, configured to filter the environmental recording based on each of the noise features and to remove the sound features in the environmental recording that do not match any of the noise features, to obtain a noise recording;
and a positioning unit, configured to perform beamforming positioning on the noise recording to obtain a target noise position.
In one embodiment, the positioning unit comprises:
a calculation element, configured to calculate, based on a cross-correlation method, a first relative reception delay of the noise recording between the microphones of the microphone array;
a first determining element, configured to determine a target plane corresponding to the noise recording and to divide the target plane into a second preset number of region positions;
a simulation element, configured to simulate, for each region position, a second relative reception delay between the microphones;
and a second determining element, configured to determine the target second relative reception delay with the smallest difference from the first relative reception delay, where the target region position corresponding to the target second relative reception delay is the target noise position.
In one embodiment, the apparatus further comprises:
and a combination module, configured to acquire an environmental image collected by a monitoring dome camera in the target environment, and to generate noise-source acoustic image evidence information by combining the environmental image with the target noise position.
It is clear to a person skilled in the art that the solution according to the embodiments of the present application can be implemented by means of software and/or hardware. The "unit" and "module" in this specification refer to software and/or hardware that can perform a specific function independently or in cooperation with other components, where the hardware may be, for example, a Field-Programmable Gate Array (FPGA), an Integrated Circuit (IC), or the like.
Each processing unit and/or module in the embodiments of the present application may be implemented by an analog circuit that implements the functions described in the embodiments of the present application, or may be implemented by software that executes the functions described in the embodiments of the present application.
Referring to fig. 3, a schematic structural diagram of an electronic device according to an embodiment of the present application is shown, where the electronic device may be used to implement the method in the embodiment shown in fig. 1. As shown in fig. 3, the electronic device 300 may include: at least one central processor 301, at least one network interface 304, a user interface 303, a memory 305, at least one communication bus 302.
Wherein a communication bus 302 is used to enable the connection communication between these components.
The user interface 303 may include a Display screen (Display) and a Camera (Camera), and the optional user interface 303 may further include a standard wired interface and a wireless interface.
The network interface 304 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
The central processor 301 may include one or more processing cores. Using various interfaces and lines, the central processor 301 connects the various parts of the entire electronic device 300, and performs the various functions of the electronic device 300 and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory 305 and by calling data stored in the memory 305. Optionally, the central processor 301 may be implemented in at least one hardware form among Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA) and Programmable Logic Array (PLA). The central processor 301 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU renders and draws the content to be displayed on the display screen; and the modem handles wireless communication. It is understood that the modem may also not be integrated into the central processor 301 and may instead be implemented by a separate chip.
The memory 305 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 305 includes a non-transitory computer-readable medium. The memory 305 may be used to store instructions, programs, code, code sets or instruction sets. The memory 305 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function or an image playing function), instructions for implementing the above method embodiments, and the like; the data storage area may store the data involved in the above method embodiments. Optionally, the memory 305 may also be at least one storage device located remotely from the central processor 301. As shown in fig. 3, the memory 305, as a computer storage medium, may include an operating system, a network communication module, a user interface module and program instructions.
In the electronic device 300 shown in fig. 3, the user interface 303 is mainly used to provide an input interface for the user and to obtain data input by the user, and the central processor 301 may be configured to invoke the beamforming-based environmental noise source adaptive positioning application stored in the memory 305 and specifically perform the following operations:
acquiring an environmental recording collected by a microphone array in a target environment, and extracting voiceprint Mel-spectrum features of the environmental recording based on the Mel cepstrum;
determining a noise source type corresponding to each voiceprint Mel-spectrum feature based on a KNN algorithm, and selecting all noise features corresponding to the noise source type from a preset noise database;
and performing beamforming positioning on the environmental recording based on each of the noise features to obtain a target noise position.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described method. The computer-readable storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some service interfaces, devices or units, and may be an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present application may be substantially implemented or a part of or all or part of the technical solution contributing to the prior art may be embodied in the form of a software product stored in a memory, and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned memory comprises: various media capable of storing program codes, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program, which is stored in a computer-readable memory, and the memory may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above description is only an exemplary embodiment of the present disclosure, and the scope of the present disclosure should not be limited thereby. That is, all equivalent changes and modifications made in accordance with the teachings of the present disclosure are intended to be included within the scope of the present disclosure. Embodiments of the present disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (9)

1. A beamforming-based adaptive positioning method for environmental noise sources, the method comprising:
acquiring an environmental recording collected by a microphone array in a target environment, and extracting voiceprint Mel-spectrum features of the environmental recording based on the Mel cepstrum;
determining a noise source type corresponding to each voiceprint Mel-spectrum feature based on a KNN algorithm, and selecting all noise features corresponding to the noise source type from a preset noise database;
and performing beamforming positioning on the environmental recording based on each of the noise features to obtain a target noise position.
2. The method of claim 1, wherein the extracting of the voiceprint Mel-spectrum features of the environmental recording based on the Mel cepstrum comprises:
calculating a fast Fourier transform spectrum corresponding to the environmental recording;
and passing the fast Fourier transform spectrum through a Mel-frequency filter to obtain the voiceprint Mel-spectrum features.
3. The method according to claim 1, wherein the determining of the noise source type corresponding to each voiceprint Mel-spectrum feature based on the KNN algorithm comprises:
obtaining a classified-sample voiceprint feature plane corresponding to the preset noise database, and mapping each voiceprint Mel-spectrum feature onto the classified-sample voiceprint feature plane;
calculating the distances between the mapped voiceprint Mel-spectrum feature and the sample voiceprint features in the classified-sample voiceprint feature plane, and selecting a first preset number of first sample voiceprint features in order of increasing distance;
and counting the number of samples of each noise source type among the first sample voiceprint features, and determining the noise source type with the largest count as the noise source type corresponding to the voiceprint Mel-spectrum feature.
4. The method of claim 1, wherein the performing beamforming positioning on the environmental recording based on each of the noise features to obtain a target noise position comprises:
filtering the environmental recording based on each of the noise features, and removing sound features in the environmental recording that do not match any of the noise features, to obtain a noise recording;
and performing beamforming positioning on the noise recording to obtain the target noise position.
5. The method of claim 4, wherein the performing beamforming positioning on the noise recording to obtain a target noise position comprises:
calculating, based on a cross-correlation method, a first relative reception delay of the noise recording between the microphones of the microphone array;
determining a target plane corresponding to the noise recording, and dividing the target plane into a second preset number of region positions;
simulating, for each region position, a second relative reception delay between the microphones;
and determining the target second relative reception delay whose difference from the first relative reception delay is smallest, the target region position corresponding to the target second relative reception delay being the target noise position.
6. The method of claim 1, further comprising:
acquiring an environmental image collected by a monitoring dome camera in the target environment, and generating noise-source acoustic image evidence information by combining the environmental image with the target noise position.
7. A beamforming-based adaptive positioning apparatus for environmental noise sources, the apparatus comprising:
an acquisition module, configured to acquire an environmental recording collected by a microphone array in a target environment, and to extract voiceprint Mel-spectrum features of the environmental recording based on the Mel cepstrum;
a selection module, configured to determine a noise source type corresponding to each voiceprint Mel-spectrum feature based on a KNN algorithm, and to select all noise features corresponding to the noise source type from a preset noise database;
and a positioning module, configured to perform beamforming positioning on the environmental recording based on each of the noise features to obtain a target noise position.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1-6 are implemented when the computer program is executed by the processor.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202210778085.0A 2022-07-04 2022-07-04 Self-adaptive positioning method and device for environmental noise source based on beam forming Active CN114863943B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210778085.0A CN114863943B (en) 2022-07-04 2022-07-04 Self-adaptive positioning method and device for environmental noise source based on beam forming

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210778085.0A CN114863943B (en) 2022-07-04 2022-07-04 Self-adaptive positioning method and device for environmental noise source based on beam forming

Publications (2)

Publication Number Publication Date
CN114863943A (en) 2022-08-05
CN114863943B (en) 2022-11-04

Family

ID=82625942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210778085.0A Active CN114863943B (en) 2022-07-04 2022-07-04 Self-adaptive positioning method and device for environmental noise source based on beam forming

Country Status (1)

Country Link
CN (1) CN114863943B (en)


Patent Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5581620A (en) * 1994-04-21 1996-12-03 Brown University Research Foundation Methods and apparatus for adaptive beamforming
US20100150364A1 (en) * 2008-12-12 2010-06-17 Nuance Communications, Inc. Method for Determining a Time Delay for Time Delay Compensation
US20120076316A1 (en) * 2010-09-24 2012-03-29 Manli Zhu Microphone Array System
US20120140947A1 (en) * 2010-12-01 2012-06-07 Samsung Electronics Co., Ltd Apparatus and method to localize multiple sound sources
US20130039503A1 (en) * 2011-08-11 2013-02-14 Broadcom Corporation Beamforming apparatus and method based on long-term properties of sources of undesired noise affecting voice quality
US20130147835A1 (en) * 2011-12-09 2013-06-13 Hyundai Motor Company Technique for localizing sound source
US20170004818A1 (en) * 2015-07-01 2017-01-05 zPillow, Inc. Noise cancelation system and techniques
US20180033447A1 (en) * 2016-08-01 2018-02-01 Apple Inc. Coordination of beamformers for noise estimation and noise suppression
CN206573209U (en) * 2017-01-25 2017-10-20 大连理工大学 A kind of Noise Sources Identification system based on Phase conjugation theory
CN107547981A (en) * 2017-05-17 2018-01-05 宁波桑德纳电子科技有限公司 A kind of audio collecting device, supervising device and collection sound method
CN108538320A (en) * 2018-03-30 2018-09-14 广东欧珀移动通信有限公司 Recording control method and device, readable storage medium storing program for executing, terminal
CN108760034A (en) * 2018-05-21 2018-11-06 广西电网有限责任公司电力科学研究院 A kind of transformer vibration noise source positioning system and method
CN110691299A (en) * 2019-08-29 2020-01-14 科大讯飞(苏州)科技有限公司 Audio processing system, method, apparatus, device and storage medium
CN112463103A (en) * 2019-09-06 2021-03-09 北京声智科技有限公司 Sound pickup method, sound pickup device, electronic device and storage medium
CN110767226A (en) * 2019-10-30 2020-02-07 山西见声科技有限公司 Sound source positioning method and device with high accuracy, voice recognition method and system, storage equipment and terminal
CN111175698A (en) * 2020-01-18 2020-05-19 国网山东省电力公司菏泽供电公司 Transformer noise source positioning method, system and device based on sound and vibration combination
US20210400399A1 (en) * 2020-06-18 2021-12-23 Sivantos Pte. Ltd. Hearing aid system including at least one hearing aid instrument worn on a user's head and method for operating such a hearing aid system
CN111880148A (en) * 2020-08-07 2020-11-03 北京字节跳动网络技术有限公司 Sound source positioning method, device, equipment and storage medium
CN112098939A (en) * 2020-09-18 2020-12-18 广东电网有限责任公司电力科学研究院 Method and device for identifying and evaluating noise pollution source
CN112687294A (en) * 2020-12-21 2021-04-20 重庆科技学院 Vehicle-mounted noise identification method
CN112966560A (en) * 2021-02-03 2021-06-15 郑州大学 Electric spindle fault diagnosis method and device based on deconvolution imaging
CN113689873A (en) * 2021-09-07 2021-11-23 联想(北京)有限公司 Noise suppression method, device, electronic equipment and storage medium
CN114355290A (en) * 2022-03-22 2022-04-15 杭州兆华电子股份有限公司 Sound source three-dimensional imaging method and system based on stereo array
CN114509162A (en) * 2022-04-18 2022-05-17 四川三元环境治理股份有限公司 Sound environment data monitoring method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yang Yang et al.: "Noise source identification technology using cross-spectral matrix beamforming with auto-spectrum removal", Noise and Vibration Control *
Jiang Weikang et al.: "Research progress on the sound source characteristics of urban rail transit noise", Environmental Pollution & Control *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115547356A (en) * 2022-11-25 2022-12-30 杭州兆华电子股份有限公司 Wind noise processing method and system based on abnormal sound detection of unmanned aerial vehicle
CN115547356B (en) * 2022-11-25 2023-03-10 杭州兆华电子股份有限公司 Wind noise processing method and system based on abnormal sound detection of unmanned aerial vehicle

Also Published As

Publication number Publication date
CN114863943B (en) 2022-11-04

Similar Documents

Publication Publication Date Title
US11657798B2 (en) Methods and apparatus to segment audio and determine audio segment similarities
CN110880329B (en) Audio identification method and equipment and storage medium
CN110875060A (en) Voice signal processing method, device, system, equipment and storage medium
US20160187453A1 (en) Method and device for a mobile terminal to locate a sound source
CN111077496B (en) Voice processing method and device based on microphone array and terminal equipment
CN109637525B (en) Method and apparatus for generating an on-board acoustic model
WO2021000498A1 (en) Composite speech recognition method, device, equipment, and computer-readable storage medium
CN114863943B (en) Self-adaptive positioning method and device for environmental noise source based on beam forming
US20140278415A1 (en) Voice Recognition Configuration Selector and Method of Operation Therefor
CN109102819A (en) One kind is uttered long and high-pitched sounds detection method and device
AU2022275486A1 (en) Methods and apparatus to fingerprint an audio signal via normalization
CN111868823A (en) Sound source separation method, device and equipment
CN114333881B (en) Audio transmission noise reduction method, device and medium based on environment self-adaptation
CN111385688A (en) Active noise reduction method, device and system based on deep learning
CN113327628A (en) Audio processing method and device, readable medium and electronic equipment
US20220212108A1 (en) Audio frequency signal processing method and apparatus, terminal and storage medium
CN114882912B (en) Method and device for testing transient defects of time domain of acoustic signal
CN114420100B (en) Voice detection method and device, electronic equipment and storage medium
CN114944152A (en) Vehicle whistling sound identification method
CN114415115B (en) Target signal frequency automatic optimization method for assisting direction of arrival positioning
CN115910107A (en) Audio data detection method, computer and readable storage medium
CN117789755A (en) Audio data detection method and device and electronic equipment
CN116959470A (en) Audio extraction method, device, equipment and storage medium
CN116129915A (en) Identity recognition method, voice quality inspection method and related equipment
CN116953604A (en) Sound source direction estimation method, head-mounted device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant