CN111323751B

CN111323751B - Sound source positioning method, device and storage medium

Info

Publication number: CN111323751B
Application number: CN202010217565.0A
Authority: CN
Inventors: 赵玉垒; 浦宏杰; 修平平; 朱赛男; 鄢仁祥
Original assignee: Suzhou Keda Technology Co Ltd
Current assignee: Suzhou Keda Technology Co Ltd
Priority date: 2020-03-25
Filing date: 2020-03-25
Publication date: 2022-08-02
Anticipated expiration: 2040-03-25
Also published as: CN111323751A

Abstract

The application relates to a sound source positioning method, a sound source positioning device and a storage medium, and belongs to the technical field of computers. The method comprises: when a target sound signal emitted by a sound source is acquired, determining a first coordinate value of the position of the sound source in a first coordinate system; acquiring a coordinate conversion relation and a correction matrix; and converting the first coordinate value to a second coordinate system by using the coordinate conversion relation and the correction matrix to obtain a second coordinate value, so as to trigger an image acquisition assembly to acquire an image of the sound source corresponding to the second coordinate value. This addresses the problem that, when a whistle position is converted through a coordinate conversion matrix, the converted whistle position monitoring result is not accurate enough because the coordinate conversion matrix is inaccurate. Because the correction matrix corrects the errors that arise when the coordinate conversion relation is used for coordinate conversion, the accuracy of the determined sound source position can be improved.

Description

Sound source positioning method, device and storage medium

Technical Field

The application relates to a sound source positioning method, a sound source positioning device and a storage medium, and belongs to the technical field of computers.

Background

Vehicle whistling (horn honking) in urban areas can seriously disturb residents, so most regions prohibit vehicles from whistling in urban areas. To monitor illegal whistling, the whistle position needs to be located and transmitted to a license plate snapshot device, which then captures and stores the license plate number of the offending vehicle.

In a typical sound source positioning method, when a whistle monitoring device and a license plate snapshot device are installed, the distance and the deflection angle of the whistle monitoring device relative to the license plate snapshot device are obtained, and a coordinate conversion matrix is constructed from the distance and the deflection angle. When the whistle monitoring device collects a whistle signal, the coordinate conversion matrix is used to convert the position of the whistle signal from the whistle monitoring device's own coordinate system into a common coordinate system shared by the whistle monitoring device and the license plate snapshot device, so that a whistle position monitoring result is obtained.

However, the accuracy of the coordinate transformation matrix affects the transformation result, so that an error exists between the result after the coordinate system transformation and the actual whistle position.

Disclosure of Invention

The application provides a sound source positioning method, a sound source positioning device and a storage medium, which can solve the problem that when the whistle coordinate position is converted through a coordinate conversion matrix, the converted whistle position monitoring result is not accurate enough due to the inaccuracy of the coordinate conversion matrix. The application provides the following technical scheme:

In a first aspect, a sound source localization method is provided, the method including:

when a target sound signal emitted by the sound source is collected, determining a first coordinate value of the position of the sound source in a first coordinate system, wherein the first coordinate system is established based on the position of an audio collection assembly, and the audio collection assembly is used for collecting the target sound signal;

acquiring a coordinate conversion relation, wherein the coordinate conversion relation is used for converting the position of the sound source from the first coordinate system to a second coordinate system, the second coordinate system is established based on the position of the audio acquisition assembly and the position of an image acquisition assembly, and the image acquisition assembly is used for acquiring the image of the sound source;

acquiring a correction matrix, wherein the correction matrix is used for correcting errors in coordinate conversion of the coordinate conversion relation;

and converting the first coordinate value to the second coordinate system by using the coordinate conversion relation and the correction matrix to obtain a second coordinate value so as to trigger an image acquisition assembly to acquire an image of the sound source corresponding to the second coordinate value.

Optionally, the obtaining a correction matrix includes:

when n calibration sound signals emitted from each of m calibration points are acquired, determining a first calibration coordinate value of the calibration sound signal at each calibration point in the first coordinate system, wherein both m and n are positive integers;

converting each first calibration coordinate value to the second coordinate system by using the coordinate conversion relation to obtain a second calibration coordinate value;

acquiring real coordinate values of the m calibration points in the second coordinate system;

determining the correction matrix based on the second calibration coordinate values and the real coordinate values.

Optionally, said determining said correction matrix based on said second calibration coordinate values and said real coordinate values comprises:

acquiring a matrix model of the correction matrix;

and multiplying the estimated position matrix formed by each second calibration coordinate value by the matrix model, taking the actual position matrix formed by each corresponding real coordinate value as a multiplication result, and determining the solution of the matrix model based on a least square method to obtain the correction matrix.

Optionally, when acquiring the n times of calibration sound signals emitted from each of the m calibration points, determining a first calibration coordinate value of the calibration sound signal at each calibration point in the first coordinate system includes:

and for each calibration point, calculating the average value of the first calibration coordinate values of the calibration sound signals of n times sent out from the calibration point to obtain the first calibration coordinate values of the calibration sound signals on the calibration point in a first coordinate system, wherein n is an integer greater than 1.

Optionally, the obtaining a correction matrix includes:

reading a stored correction matrix, wherein the stored correction matrix is determined after the audio acquisition assembly is installed: calibration sound signals are emitted at m calibration points, and the correction matrix is determined based on the difference between the real coordinate values and the second calibration coordinate values of the calibration sound signals in the second coordinate system.

Optionally, the obtaining the coordinate transformation relationship includes:

determining a deflection angle of the first coordinate system relative to the second coordinate system;

acquiring the origin position of the coordinate origin of the first coordinate system in the second coordinate system;

and taking the origin position and the deflection angle as deflection parameters of a preset coordinate conversion formula to obtain the coordinate conversion relation.

Optionally, the coordinate conversion formula expresses the second coordinate value in terms of the following quantities (the formula itself is not reproduced in this text):

a matrix formed by the position of the coordinate origin of the first coordinate system in the second coordinate system;

the first coordinate value (r, φ, θ), wherein r is the length of the line connecting the position of the sound source and the coordinate origin of the first coordinate system, φ is the included angle between the projection of that line on the x'o'y' plane of the first coordinate system and the positive direction of the x' axis, and θ is the included angle between that line and the z' axis of the first coordinate system; and

the deflection angles α, β, γ.

In a second aspect, there is provided a sound source localization apparatus, the apparatus comprising:

the first coordinate determination module is used for determining a first coordinate value of the position of the sound source in a first coordinate system when a target sound signal emitted by the sound source is acquired, wherein the first coordinate system is established based on the position of an audio acquisition assembly, and the audio acquisition assembly is used for acquiring the target sound signal;

a conversion relation obtaining module, configured to obtain a coordinate conversion relation, where the coordinate conversion relation is used to convert the position of the sound source from the first coordinate system to a second coordinate system, the second coordinate system is established based on the position of the audio acquisition component and the position of an image acquisition component, and the image acquisition component is used to acquire an image of the sound source;

a correction matrix obtaining module, configured to obtain a correction matrix, where the correction matrix is used to correct an error in coordinate conversion performed by the coordinate conversion relationship;

and the second coordinate determination module is used for converting the first coordinate value to the second coordinate system by using the coordinate conversion relation and the correction matrix to obtain a second coordinate value so as to trigger an image acquisition assembly to acquire an image of the sound source corresponding to the second coordinate value.

In a third aspect, there is provided a sound source localization apparatus, the apparatus comprising a processor and a memory; the memory stores therein a program that is loaded and executed by the processor to implement the sound source localization method according to the first aspect.

In a fourth aspect, there is provided a computer-readable storage medium having a program stored therein, the program being loaded and executed by the processor to implement the sound source localization method of the first aspect.

The beneficial effects of this application are as follows. When a target sound signal emitted by a sound source is collected, a first coordinate value of the position of the sound source in a first coordinate system is determined; a coordinate conversion relation, used for converting the position of the sound source from the first coordinate system to a second coordinate system, is acquired; a correction matrix is acquired; and the first coordinate value is converted to the second coordinate system by using the coordinate conversion relation and the correction matrix to obtain a second coordinate value, so as to trigger the image acquisition assembly to acquire the image of the sound source corresponding to the second coordinate value. This addresses the problem that, when a whistle position is converted through a coordinate conversion matrix, the converted whistle position monitoring result is not accurate enough because the coordinate conversion matrix is inaccurate. Because the correction matrix corrects the errors that arise when the coordinate conversion relation is used for coordinate conversion, the accuracy of the determined sound source position can be improved.

In addition, the coordinate conversion relation can be obtained from the deflection of the audio acquisition assembly relative to the image acquisition assembly, so that the influence of the installation offset of the audio acquisition assembly on the conversion of the positioning result can be reduced by correcting the coordinate conversion relation.

In addition, because the correction matrix is determined and stored after the audio acquisition assembly is installed, it can be reused in subsequent sound source positioning, so that the positioning result can be output quickly and the sound source positioning efficiency is improved.

The foregoing is only an overview of the technical solutions of the present application. To make these technical solutions clearer, and to enable them to be implemented according to the content of the description, a detailed description is given below with reference to the preferred embodiments of the present application and the accompanying drawings.

Drawings

FIG. 1 is a schematic diagram of a sound source localization system according to an embodiment of the present application;

FIG. 2 is a flow chart of a sound source localization method provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a first coordinate system provided by one embodiment of the present application;

FIG. 4 is a schematic illustration of a first coordinate system and a second coordinate system provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of calibration points provided by one embodiment of the present application;

FIG. 6 is a block diagram of a sound source localization apparatus provided in one embodiment of the present application;

FIG. 7 is a block diagram of a sound source localization apparatus according to an embodiment of the present application.

Detailed Description

The following detailed description of embodiments of the present application will be described in conjunction with the accompanying drawings and examples. The following examples are intended to illustrate the present application but are not intended to limit the scope of the present application.

FIG. 1 is a schematic structural diagram of a sound source localization system according to an embodiment of the present application. As shown in FIG. 1, the system includes at least an audio capture component 110 and an image capture component 120.

The audio capturing component 110 may be a microphone or a microphone array or other devices having a function of capturing sound signals, and the embodiment does not limit the type of the audio capturing component 110. The structure of the microphone array may be circular, rectangular, multi-arm spiral, spherical, etc., and the structure of the microphone array is not limited in this embodiment.

Optionally, the audio capture component 110 is configured to: when the target sound signal is collected, determining a first coordinate value of the position of a sound source of the target sound signal in a first coordinate system; acquiring a coordinate conversion relation; acquiring a correction matrix; and converting the first coordinate value to a second coordinate system by using the coordinate conversion relation and the correction matrix to obtain a second coordinate value so as to trigger the image acquisition assembly to acquire the image of the sound source corresponding to the second coordinate value.

Wherein the first coordinate system is established based on the position of the audio acquisition component. The first coordinate value may be a coordinate value of a spherical coordinate system; of course, the coordinate values may be in a cartesian coordinate system, and the expression form of the first coordinate value is not limited in this embodiment. Optionally, the origin of coordinates of the first coordinate system is a position where the audio capturing component is located.

The second coordinate system is established based on the locations of the audio capture component 110 and the image capture component 120. In other words, the second coordinate system is a coordinate system common to the audio capture component 110 and the image capture component 120. Optionally, one coordinate plane of the second coordinate system is parallel to the ground. The second coordinate value may be a coordinate value of a spherical coordinate system; of course, the coordinate values may be in a cartesian coordinate system, and the expression form of the second coordinate values is not limited in this embodiment.

Wherein the coordinate transformation relationship is used to transform the position of the sound source from a first coordinate system to a second coordinate system. The correction matrix is used for correcting errors in coordinate conversion in the coordinate conversion relation.

The audio capture component 110 is communicatively coupled to the image capture component 120 via a wired or wireless connection.

The image capturing assembly 120 may be a video camera, a still camera, or other devices with image capturing function, and the present embodiment does not limit the type of the image capturing assembly 120.

The image collecting component 120 is configured to collect an image of the sound source corresponding to the second coordinate value.

It should be added that, after the audio acquisition component 110 determines the first coordinate value, it may instead send the first coordinate value to another device, which then acquires the coordinate transformation relationship and the correction matrix and converts the first coordinate value into the second coordinate system by using the coordinate conversion relation and the correction matrix to obtain the second coordinate value. The other device may be a computer, a mobile phone, the image capturing component 120, a tablet computer, a server, etc., and the present embodiment does not limit the type of the other device.

In addition, the application scenarios of the sound source localization system and the corresponding sound source localization method include, but are not limited to, at least one of the following:

1. A whistle monitoring scene. That is, the audio acquisition component 110 locates the whistle position and triggers the image acquisition component 120 to capture an image of the vehicle at the whistle position.

2. A classroom monitoring scene. That is, the audio acquisition component 110 locates the position of a speaking student and triggers the image acquisition component 120 to capture an image of the student at that position.

3. A video conference scene. That is, the audio capture component 110 locates the position of the participant who is currently speaking and triggers the image capture component 120 to capture an image of the participant at that position.

Of course, the sound source localization system and the sound source localization method can also be applied to other similar scenes, and the application is not listed here.

Fig. 2 is a flowchart of a sound source localization method according to an embodiment of the present application, where the method is applied to the sound source localization system shown in fig. 1, and a main execution subject of each step is an audio acquisition component 110 in the system. The method at least comprises the following steps:

Step 201, when a target sound signal emitted by a sound source is collected, determining a first coordinate value of the position of the sound source in a first coordinate system.

The first coordinate system is established based on a position of an audio acquisition component used to acquire a target sound signal.

Optionally, the first coordinate system is a three-dimensional coordinate system, and the origin of coordinates is a position of the audio capturing component.

Illustratively, the audio acquisition component may determine the first coordinate value of the position of the sound source of the target sound signal in the first coordinate system by using a microphone-array sound source localization method. Such methods include beamforming-based methods, methods based on high-resolution spectral estimation, methods based on time delay differences (time difference of arrival), and the like; this embodiment does not limit the type of sound source localization method.
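As an illustration of the delay-based family of methods only, the following minimal sketch estimates the delay between two microphone signals by brute-force cross-correlation and converts it into a far-field arrival angle. It is not the method of this application: the two-microphone far-field geometry, the speed-of-sound constant, and the names `estimate_delay_samples` and `arrival_angle` are all assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed constant)


def estimate_delay_samples(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref`.

    A positive lag means `other` is a delayed copy of `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, value in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                score += value * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle(delay_samples, sample_rate_hz, mic_spacing_m):
    """Far-field arrival angle (radians) measured from the array broadside."""
    path_difference = SPEED_OF_SOUND * delay_samples / sample_rate_hz
    # Clamp to the valid asin domain to tolerate measurement noise.
    ratio = max(-1.0, min(1.0, path_difference / mic_spacing_m))
    return math.asin(ratio)
```

A real microphone array would combine many such pairwise delays (or use beamforming) to obtain the full three-dimensional first coordinate value; this sketch only shows the core idea for a single pair.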

Referring to the schematic diagram in FIG. 3 of the first coordinate system (including a coordinate origin o', an x' axis, a y' axis, and a z' axis) and the position of a sound source, the first coordinate value of the sound source s in the first coordinate system is (r, φ, θ).

Here the first coordinate value is expressed in a spherical coordinate system. In practical implementation, the first coordinate value may also be expressed in a Cartesian coordinate system, that is:

(r sin θ cos φ, r sin θ sin φ, r cos θ)

The present application does not limit the manner in which the first coordinate value is expressed.

Wherein φ is the included angle (also called the horizontal angle) between the projection, on the x'o'y' plane, of the line connecting the position of the sound source and the coordinate origin of the first coordinate system and the positive direction of the x' axis; θ is the included angle (also called the pitch angle) between that connecting line and the z' axis; and r is the length of that connecting line.
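The relationship between the spherical and Cartesian forms of the first coordinate value follows directly from the angle definitions above (horizontal angle measured from the positive x' axis, pitch angle measured from the z' axis). A minimal sketch, in which the function name and the symbol name `phi` for the horizontal angle are illustrative choices:

```python
import math


def spherical_to_cartesian(r, phi, theta):
    """Convert (r, phi, theta) to (x', y', z') in the first coordinate system.

    r     -- distance from the origin o' to the sound source
    phi   -- horizontal angle from the positive x' axis, in radians
    theta -- angle from the z' axis, in radians
    """
    x = r * math.sin(theta) * math.cos(phi)
    y = r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    return (x, y, z)
```

For example, a source at distance r directly along the z' axis has theta equal to zero and maps to (0, 0, r), whatever the value of phi.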

Step 202, obtaining a coordinate transformation relation.

The coordinate transformation relationship is used to transform the position of the sound source from a first coordinate system to a second coordinate system.

The second coordinate system is established based on the position of the audio capture component and the position of the image capture component, i.e., the second coordinate system is a coordinate system common to both the audio capture component and the image capture component. At this time, the coordinate values determined by the audio acquisition component are also applicable to the image acquisition component. The image acquisition assembly is used for acquiring images of the sound source.

The first coordinate system has included angles (α', β', γ') in three directions relative to the second coordinate system, wherein α' is the included angle of the x' axis of the first coordinate system relative to the x axis of the second coordinate system; β' is the included angle of the y' axis of the first coordinate system relative to the y axis of the second coordinate system; and γ' is the included angle of the z' axis of the first coordinate system relative to the z axis of the second coordinate system. Refer to FIG. 4, which shows the relative positional relationship between the first coordinate system and the second coordinate system (including a coordinate origin o, an x axis, a y axis, and a z axis).

In one example, the coordinate transformation relationship is stored in a storage medium, such as: the coordinate transformation relation is pre-written in a Read-Only Memory (ROM) through RS232/485 or Ethernet, and the audio acquisition component 110 reads the coordinate transformation relation from the ROM.

In another example, the coordinate transformation relationship is obtained through a plurality of calibration processes. In this case, acquiring the coordinate conversion relationship includes: determining a deflection angle of the first coordinate system relative to the second coordinate system; acquiring the origin position of the coordinate origin of the first coordinate system in the second coordinate system; and taking the origin position and the deflection angle as deflection parameters of a preset coordinate conversion formula to obtain the coordinate conversion relationship.

Optionally, the coordinate conversion formula expresses the second coordinate value in terms of the following quantities (the formula itself is not reproduced in this text):

a matrix formed by the position of the coordinate origin of the first coordinate system in the second coordinate system;

the first coordinate value (r, φ, θ), where r is the length of the line between the position of the sound source and the coordinate origin of the first coordinate system, φ is the included angle between the projection of that line on the x'o'y' plane of the first coordinate system and the positive direction of the x' axis, and θ is the included angle between that line and the z' axis of the first coordinate system; and

the deflection angles α, β and γ.
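Because the formula itself is not preserved in this text, the following is only a sketch of one common way to combine the quantities listed above: rotate the Cartesian first coordinate value by the deflection angles and then add the origin position. The rotation order (about x by alpha, then y by beta, then z by gamma) and the function names are assumptions for illustration and may differ from the formula used in the application.

```python
import math


def rotation_matrix(alpha, beta, gamma):
    """Rotation about x by alpha, then y by beta, then z by gamma (radians).

    ASSUMPTION: this Rz * Ry * Rx order is illustrative only; the order
    used by the application's formula is not known from the text.
    """
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
        [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
        [-sb, cb * sa, cb * ca],
    ]


def to_second_system(point, angles, origin):
    """Map a Cartesian point from the first to the second coordinate system.

    point  -- (x', y', z') in the first coordinate system
    angles -- (alpha, beta, gamma) deflection angles in radians
    origin -- position of o' expressed in the second coordinate system
    """
    rot = rotation_matrix(*angles)
    return tuple(
        sum(rot[i][k] * point[k] for k in range(3)) + origin[i]
        for i in range(3)
    )
```

With all deflection angles equal to zero the conversion reduces to a pure translation by the origin position, which is a quick sanity check for any installed configuration.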

Step 203, acquiring a correction matrix.

The correction matrix is used for correcting errors in coordinate conversion in the coordinate conversion relation.

In one example, acquiring the correction matrix comprises: when n calibration sound signals emitted from each of m calibration points are acquired, determining a first calibration coordinate value of the calibration sound signal at each calibration point in the first coordinate system; converting each first calibration coordinate value to the second coordinate system by using the coordinate conversion relation to obtain a second calibration coordinate value; acquiring real coordinate values of the m calibration points in the second coordinate system; and determining the correction matrix based on the second calibration coordinate values and the real coordinate values. Both m and n are positive integers.

Wherein determining a correction matrix based on the second calibration coordinate value and the real coordinate value comprises: acquiring a matrix model of a correction matrix; and multiplying the estimated position matrix formed by each second calibration coordinate value by the matrix model, taking the actual position matrix formed by each corresponding real coordinate value as a product result, and determining the solution of the matrix model based on a least square method to obtain a correction matrix.
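The least-squares step described above can be written out as follows. This is a sketch in illustrative notation, not the application's own symbols: the estimated position matrix is called \hat{P}, the actual position matrix P, and the matrix model C, and C is taken to be 3 × 3 as in the example given in this embodiment.

```latex
\hat{P} =
\begin{bmatrix}
\hat{x}_1 & \hat{y}_1 & \hat{z}_1 \\
\vdots    & \vdots    & \vdots    \\
\hat{x}_m & \hat{y}_m & \hat{z}_m
\end{bmatrix},
\qquad
P =
\begin{bmatrix}
x_1    & y_1    & z_1    \\
\vdots & \vdots & \vdots \\
x_m    & y_m    & z_m
\end{bmatrix},
\qquad
C \in \mathbb{R}^{3 \times 3}

\hat{P}\, C = P
\quad\Longrightarrow\quad
C^{*} = \arg\min_{C} \bigl\lVert \hat{P}\, C - P \bigr\rVert_F^{2}
      = \bigl(\hat{P}^{\mathsf T} \hat{P}\bigr)^{-1} \hat{P}^{\mathsf T} P
```

The closed-form expression on the right applies only when \hat{P}^{\mathsf T}\hat{P} is invertible, which requires at least three calibration points whose estimated positions are linearly independent.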

When n calibration sound signals emitted from each of the m calibration points are acquired, determining the first calibration coordinate value of the calibration sound signal at each calibration point in the first coordinate system includes:

for each calibration point, calculating the average of the first calibration coordinate values of the n calibration sound signals emitted from that calibration point to obtain the first calibration coordinate value of the calibration sound signal at that calibration point in the first coordinate system, wherein n is an integer greater than 1.
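The per-point averaging can be sketched as below. The function name is hypothetical, and the sketch assumes the n measurements are Cartesian triples; averaging spherical angles directly would need care near angle wrap-around, which the text does not address.

```python
def average_coordinates(measurements):
    """Average n coordinate triples measured at one calibration point.

    measurements -- sequence of (x, y, z) triples, one per calibration sound
    """
    n = len(measurements)
    if n == 0:
        raise ValueError("at least one measurement is required")
    return tuple(sum(m[axis] for m in measurements) / n for axis in range(3))
```

Averaging several repeated measurements reduces the effect of random localization noise on the calibration data before the correction matrix is solved.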

Optionally, m and n are both integers greater than 1. Referring to the calibration point location diagram of FIG. 5, FIG. 5 includes 18 calibration points, all of which lie within the effective range in which the audio capture assembly can capture audio and the image capture assembly can capture images. The sound source is controlled to emit the calibration sound signal n times (e.g., 10 times) at each calibration point, and the audio acquisition assembly acquires the calibration sound signals emitted from each calibration point. This yields n first calibration coordinate values for each calibration point.

The n first calibration coordinate values of each of the m calibration points are averaged to obtain one averaged first calibration coordinate value per calibration point.

The true coordinate values of the respective calibration points in the second coordinate system are also obtained.

A matrix model of the correction matrix, whose elements are the unknowns to be solved, is then assumed.

The estimated position matrix formed by the second calibration coordinate values is multiplied by the matrix model, the actual position matrix formed by the corresponding real coordinate values is taken as the product, and the solution of the resulting equation is calculated based on the least squares method to obtain the correction matrix.

In this embodiment, the correction matrix is taken to be a 3 × 3 matrix as an example. In actual implementation, the correction matrix may have other dimensions, and the dimension of the correction matrix is not limited in this embodiment.
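For the 3 × 3 case, the least-squares solution can be sketched in plain Python through the normal equations. The function names are hypothetical, rows of each input are taken to be calibration points, and the small Gauss-Jordan solver is one of several possible ways to solve the system; a numerical library routine would normally be preferred in practice.

```python
def solve_linear(a, b):
    """Solve a @ x = b for a square matrix `a` and a matrix `b`
    using Gauss-Jordan elimination with partial pivoting."""
    size = len(a)
    # Augmented matrix [a | b]; copying keeps the inputs untouched.
    aug = [list(a[i]) + list(b[i]) for i in range(size)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("calibration points do not determine the matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for row in range(size):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [row[size:] for row in aug]


def fit_correction_matrix(estimated, actual):
    """Least-squares 3 x 3 matrix C such that estimated @ C ~= actual.

    estimated -- m rows of second calibration coordinate values
    actual    -- m rows of real coordinate values, in the same order
    """
    dim = 3
    # Normal equations: (E^T E) C = E^T A
    ete = [[sum(e[i] * e[j] for e in estimated) for j in range(dim)]
           for i in range(dim)]
    eta = [[sum(e[i] * a[j] for e, a in zip(estimated, actual))
            for j in range(dim)] for i in range(dim)]
    return solve_linear(ete, eta)
```

The solver raises an error when the estimated positions are degenerate (for example, all calibration points on one line through the origin), since the correction matrix is then not uniquely determined.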

In another example, acquiring the correction matrix comprises: reading a stored correction matrix. The stored correction matrix is determined after the audio acquisition assembly is installed: calibration sound signals are emitted at m calibration points, and the correction matrix is determined based on the difference between the real coordinate values and the second calibration coordinate values of the calibration sound signals in the second coordinate system. That is, after the audio capture component is installed, the correction matrix is calculated as in the first example and then stored in a storage medium, for example in the ROM.

Step 204, converting the first coordinate value to the second coordinate system by using the coordinate conversion relation and the correction matrix to obtain a second coordinate value, so as to trigger the image acquisition assembly to acquire the image of the sound source corresponding to the second coordinate value.

Since the coordinate conversion relationship takes the first coordinate value as a variable, the second coordinate value can be obtained by inputting the current first coordinate value to the coordinate conversion relationship and multiplying the obtained value by the correction matrix.

Illustratively, once α, β, γ in the above coordinate conversion formula have been determined, the second coordinate value can be obtained by inputting the first coordinate value into the formula and multiplying the result by the correction matrix.
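The final multiplication by the correction matrix can be sketched as follows. The function name is hypothetical, and treating the converted coordinate as a row vector multiplied on the right by the matrix is an assumption, chosen to match the description of the estimated position matrix being multiplied by the matrix model during calibration.

```python
def apply_correction(converted, correction):
    """Multiply a converted coordinate (row vector) by a 3 x 3 correction matrix.

    converted  -- (x, y, z) produced by the coordinate conversion relation
    correction -- 3 x 3 correction matrix as a list of rows
    """
    return tuple(
        sum(converted[k] * correction[k][j] for k in range(3))
        for j in range(3)
    )
```

An identity correction matrix leaves the converted coordinate unchanged, which corresponds to a coordinate conversion relation that needs no correction.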

In summary, in the sound source positioning method provided in this embodiment, when a target sound signal emitted by a sound source is collected, a first coordinate value of the position of the sound source in a first coordinate system is determined; a coordinate conversion relation, used for converting the position of the sound source from the first coordinate system to a second coordinate system, is acquired; a correction matrix is acquired; and the first coordinate value is converted to the second coordinate system by using the coordinate conversion relation and the correction matrix to obtain a second coordinate value, so as to trigger the image acquisition assembly to acquire the image of the sound source corresponding to the second coordinate value. This addresses the problem that, when a whistle position is converted through a coordinate conversion matrix, the converted whistle position monitoring result is not accurate enough because the coordinate conversion matrix is inaccurate. Because the correction matrix corrects the errors that arise when the coordinate conversion relation is used for coordinate conversion, the accuracy of the determined sound source position can be improved.

In addition, the coordinate conversion relation can be obtained through the deflection condition of the audio acquisition assembly relative to the image acquisition assembly, so that the influence of the installation offset of the audio acquisition assembly on the conversion of the positioning result can be reduced by correcting the coordinate conversion relation.

Fig. 6 is a block diagram of a sound source localization apparatus according to an embodiment of the present application, and this embodiment is described by taking an example of the application of the apparatus to the audio acquisition component 110 in the sound source localization system shown in fig. 1. The device at least comprises the following modules: a first coordinate determination module 610, a transformation relation acquisition module 620, a correction matrix acquisition module 630, and a second coordinate determination module 640.

A first coordinate determination module 610, configured to determine, when a target sound signal emitted by the sound source is acquired, a first coordinate value of a position of the sound source in a first coordinate system, where the first coordinate system is established based on a position of an audio acquisition component, and the audio acquisition component is configured to acquire the target sound signal;

a transformation relation acquisition module 620, configured to obtain a coordinate conversion relation, where the coordinate conversion relation is used to convert the position of the sound source from the first coordinate system to a second coordinate system, where the second coordinate system is established based on the position of the audio acquisition component and the position of an image acquisition component, and the image acquisition component is used to acquire an image of the sound source;

a correction matrix acquisition module 630, configured to obtain a correction matrix, where the correction matrix is used to correct the error introduced when coordinate conversion is performed using the coordinate conversion relation;

and a second coordinate determination module 640, configured to convert the first coordinate value to the second coordinate system by using the coordinate conversion relation and the correction matrix to obtain a second coordinate value, so as to trigger the image acquisition component to acquire an image of the sound source corresponding to the second coordinate value.

For relevant details, refer to the method embodiments described above.
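The cooperation of the four modules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the structure of the apparatus itself: the class name, field names, callable signatures, and the row-vector multiplication by a 3x3 correction matrix are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class SoundSourceLocator:
    # Stands in for module 610: maps a sound signal to a first coordinate value.
    determine_first: Callable[[object], Point]
    # Result of module 620: converts a point to the second coordinate system.
    conversion: Callable[[Point], Point]
    # Result of module 630: assumed 3x3 correction matrix (nested sequences).
    correction: Sequence[Sequence[float]]
    # Hypothetical hook that triggers the image acquisition component.
    capture: Callable[[Point], None]

    def locate(self, signal: object) -> Point:
        """Stands in for module 640: convert, correct, then trigger capture."""
        first = self.determine_first(signal)
        converted = self.conversion(first)
        # Row vector times correction matrix (assumed convention).
        second = tuple(
            sum(converted[i] * self.correction[i][j] for i in range(3))
            for j in range(3)
        )
        self.capture(second)
        return second


# Hypothetical usage with stand-in callables.
captured = []
locator = SoundSourceLocator(
    determine_first=lambda signal: (1.0, 2.0, 3.0),
    conversion=lambda p: (p[0] + 1.0, p[1], p[2]),
    correction=[[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]],
    capture=captured.append,
)
result = locator.locate(None)
```

Keeping each step behind its own callable mirrors the point that the functions may be redistributed among different functional modules as needed.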

It should be noted that, when the sound source positioning device provided in the above embodiment performs sound source positioning, the division into the above functional modules is merely an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the sound source positioning device may be divided into different functional modules to complete all or part of the functions described above. In addition, the sound source positioning device and the sound source positioning method provided by the above embodiments belong to the same concept; the specific implementation process is detailed in the method embodiments and is not repeated here.

Fig. 7 is a block diagram of a sound source localization apparatus provided in an embodiment of the present application, which may be an apparatus including the audio acquisition component 110 in the sound source localization system shown in fig. 1, such as: a whistling monitoring device, a smartphone, a tablet computer, a laptop computer, a desktop computer, or a server. The sound source localization apparatus may also be referred to as a user equipment, a portable terminal, a laptop terminal, a desktop terminal, a control terminal, etc., which is not limited in this embodiment. The apparatus includes at least a processor 701 and a memory 702.

Processor 701 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 701 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 701 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

Memory 702 may include one or more computer-readable storage media, which may be non-transitory. Memory 702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory 702 is used to store at least one instruction, which is executed by processor 701 to implement the sound source localization method provided by the method embodiments herein.

In some embodiments, the sound source positioning device may further include: a peripheral interface and at least one peripheral. The processor 701, memory 702, and peripheral interface may be connected by bus or signal lines. Each peripheral may be connected to the peripheral interface via a bus, signal line, or circuit board. Illustratively, peripheral devices include, but are not limited to: radio frequency circuit, touch display screen, audio circuit, power supply, etc.

Of course, the sound source positioning device may also include fewer or more components, and the embodiment is not limited thereto.

Optionally, the present application further provides a computer-readable storage medium, in which a program is stored, and the program is loaded and executed by a processor to implement the sound source localization method of the above-mentioned method embodiment.

Optionally, the present application further provides a computer product, which includes a computer-readable storage medium, in which a program is stored, and the program is loaded and executed by a processor to implement the sound source localization method of the above-mentioned method embodiment.

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A sound source localization method, characterized in that the method comprises:

converting the first coordinate value to the second coordinate system by using the coordinate conversion relation and the correction matrix to obtain a second coordinate value so as to trigger an image acquisition assembly to acquire an image of a sound source corresponding to the second coordinate value;

the acquiring a correction matrix includes:

determining the correction matrix based on the second calibration coordinate values and the real coordinate values;

the step of determining the correction matrix based on the second calibration coordinate values and the real coordinate values includes:

acquiring a matrix model of the correction matrix;

2. The method of claim 1, wherein determining a first calibration coordinate value of the calibration sound signal at each of the m calibration points in the first coordinate system when acquiring the n calibration sound signals emitted at each of the m calibration points comprises:

3. The method of claim 1, wherein obtaining the correction matrix comprises:

4. The method according to any one of claims 1 to 3, wherein the obtaining the coordinate transformation relationship comprises:

5. The method of claim 4, wherein the coordinate conversion formula comprises:

wherein [symbol missing in source] is the included angle between the projection of the connecting line on the x'o'y' plane of the first coordinate system and the positive direction of the x' axis; θ is the included angle between the connecting line and the z' axis of the first coordinate system; and α, β, and γ are the deflection angles.

6. A sound source localization apparatus, comprising:

a correction matrix obtaining module, configured to obtain a correction matrix, where the correction matrix is used to correct an error in the coordinate conversion performed by the coordinate conversion relationship; the acquiring a correction matrix includes: when acquiring n calibration sound signals emitted at each of m calibration points, determining a first calibration coordinate value of the calibration sound signal at each calibration point in a first coordinate system; both m and n are positive integers; converting each first calibration coordinate value to the second coordinate system by using the coordinate conversion relation to obtain a second calibration coordinate value; acquiring real coordinate values of the m calibration points in the second coordinate system; determining the correction matrix based on the second calibration coordinate values and the real coordinate values; the step of determining the correction matrix based on the second calibration coordinate values and the real coordinate values includes: acquiring a matrix model of the correction matrix; multiplying the estimated position matrix formed by each second calibration coordinate value by the matrix model, taking the actual position matrix formed by each corresponding real coordinate value as the multiplication result, and determining the solution of the matrix model based on a least square method to obtain the correction matrix;

and the second coordinate determination module is used for converting the first coordinate value to the second coordinate system by using the coordinate conversion relation and the correction matrix to obtain a second coordinate value so as to trigger an image acquisition assembly to acquire the image of the sound source corresponding to the second coordinate value.

7. A sound source localization apparatus, characterized in that the apparatus comprises a processor and a memory; the memory stores therein a program that is loaded and executed by the processor to implement the sound source localization method according to any one of claims 1 to 5.

8. A computer-readable storage medium, characterized in that the storage medium has stored therein a program which, when being executed by a processor, is adapted to implement the sound source localization method according to any one of claims 1 to 5.