CN111276143B - Sound source positioning method, sound source positioning device, voice recognition control method and terminal equipment - Google Patents

Sound source positioning method, sound source positioning device, voice recognition control method and terminal equipment

Info

Publication number
CN111276143B
CN111276143B (application number CN202010072723.8A)
Authority
CN
China
Prior art keywords
wake
sound source
voice
energy
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010072723.8A
Other languages
Chinese (zh)
Other versions
CN111276143A (en)
Inventor
李千伟
徐林浩
何天翼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing China Tsp Technology Co ltd
Original Assignee
Beijing China Tsp Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing China Tsp Technology Co ltd filed Critical Beijing China Tsp Technology Co ltd
Priority to CN202010072723.8A priority Critical patent/CN111276143B/en
Publication of CN111276143A publication Critical patent/CN111276143A/en
Application granted granted Critical
Publication of CN111276143B publication Critical patent/CN111276143B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

The invention provides a sound source positioning method, a sound source positioning device, a voice recognition control method and terminal equipment, wherein the sound source positioning method comprises the following steps: when the voice wake-up instance is detected to be awakened, determining the number of the voice wake-up instances; when at least two voice awakening examples are awakened, determining voice awakening example results corresponding to each voice awakening example, wherein the voice awakening example results comprise awakening scores and awakening energy; and calculating according to the wake-up score and the wake-up energy corresponding to each voice wake-up instance to obtain the sound source azimuth positioning. The sound source positioning method can distinguish the multipath sound source information, so that the target sound source information is positioned, and a foundation is laid for selecting proper target sound source information to perform voice recognition.

Description

Sound source positioning method, sound source positioning device, voice recognition control method and terminal equipment
Technical Field
The present invention relates to the field of speech recognition, and in particular, to a sound source positioning method, a sound source positioning device, a speech recognition control method, and a terminal device.
Background
With the rapid development of the internet, daily life has become increasingly intelligent, and features such as smart phones and voice control make it ever more convenient. The automobile, as an important tool for daily travel, is no exception: the vehicle-mounted control terminal further embodies the intelligence of the automobile, and voice control, which uses spoken language instead of the hands to control the vehicle body, represents the highest degree of that intelligence.
Conventional voice control cannot adapt well to multi-person conversation scenes. For example, after the driver in a vehicle wakes up the voice system, an instruction spoken by another occupant can easily cause confusion in vehicle control.
Disclosure of Invention
In view of the above, the invention provides a sound source positioning method, a sound source positioning device, a voice recognition control method and terminal equipment, which can distinguish multi-path sound source information so as to position target sound source information, thereby laying a foundation for selecting proper target sound source information to perform voice recognition.
The sound source positioning method comprises the following steps:
when the voice wake-up instance is detected to be awakened, determining the number of the voice wake-up instances;
when at least two voice awakening examples are awakened, determining voice awakening example results corresponding to each voice awakening example, wherein the voice awakening example results comprise awakening scores and awakening energy;
and calculating according to the wake-up score and the wake-up energy corresponding to each voice wake-up instance to obtain the sound source azimuth positioning.
In one embodiment, the sound source localization method further comprises:
when only one voice wake-up example is woken up, calculating the wake-up time interval between the current wake-up time and the last wake-up time;
judging whether the wake-up time interval is larger than a preset time threshold value or not;
if yes, the direction corresponding to the voice wake-up instance is used as the sound source azimuth positioning.
In one embodiment, the sound source localization method further comprises:
and returning to the step of determining the number of voice wakeup instances when the voice wakeup instances are detected to be awakened when the wakeup time interval is smaller than or equal to the preset time threshold value.
In one embodiment, the step of obtaining the sound source azimuth positioning according to the wake-up score and the wake-up energy corresponding to each voice wake-up instance comprises the following steps:
determining a wake-up score maximum value in wake-up scores corresponding to all voice wake-up examples and wake-up energy corresponding to the wake-up score maximum value;
determining a wake-up energy maximum value in wake-up energy corresponding to each voice wake-up instance and a wake-up score corresponding to the wake-up energy maximum value;
calculating a wake-up score proportion difference value according to the wake-up score corresponding to the wake-up score maximum value and the wake-up energy maximum value;
calculating a wake-up energy proportion difference value according to the wake-up energy corresponding to the wake-up energy maximum value and the wake-up score maximum value;
and determining the sound source azimuth positioning according to the wake-up score proportion difference value and the wake-up energy proportion difference value.
In one embodiment, the step of determining the sound source position location from the wake-up score proportion difference and the wake-up energy proportion difference comprises:
judging whether the wake-up score proportion difference value is larger than a first preset proportion threshold value and whether the wake-up energy proportion difference value is smaller than a second preset proportion threshold value, wherein the first preset proportion threshold value is larger than the second preset proportion threshold value;
if yes, the beam direction of the voice wake-up instance corresponding to the maximum value of the wake-up score is used as the sound source azimuth positioning;
if not, the beam direction of the voice wake-up example corresponding to the maximum wake-up energy value is used as the sound source azimuth positioning.
In one embodiment, the above sound source localization method is applied to a vehicle-mounted control terminal, and the above sound source localization method further includes:
creating a narrow beam noise reduction instance;
acquiring initial audio information and sending the initial audio information to a narrow beam noise reduction instance;
and waking up the corresponding voice wake-up instance according to the first audio information output by the narrow-beam noise reduction instance.
A sound source localization apparatus, the sound source localization apparatus comprising:
the number acquisition unit is used for determining the number of the voice wakeup instances when the voice wakeup instances are detected to be awakened;
the score and energy acquisition unit is used for determining voice awakening example results corresponding to each voice awakening example, wherein the voice awakening example results comprise awakening scores and awakening energy;
and the sound source position determining unit is used for calculating and obtaining sound source position positioning according to the wake-up score and the wake-up energy corresponding to each voice wake-up instance.
A voice recognition control method which adopts the above sound source positioning method, the voice recognition control method further comprising the following steps:
setting the beam direction corresponding to the created narrow beam noise reduction example according to the sound source azimuth positioning;
creating a speech recognition instance;
and sending the second audio information output by the narrow-beam noise reduction example to the voice recognition example to obtain a corresponding voice recognition result.
A terminal device comprising a memory for storing a computer program and a processor running the computer program to cause the terminal device to perform a sound source localization method.
A readable storage medium storing a computer program which, when executed by a processor, performs a sound source localization method.
According to the sound source positioning method, when the voice awakening examples are detected to be awakened, the number of the voice awakening examples is determined, when at least two paths of voice awakening examples are awakened, voice awakening example results corresponding to each path of voice awakening examples are determined, the voice awakening example results comprise awakening scores and awakening energies, sound source azimuth positioning is obtained according to the awakening scores and the awakening energies corresponding to the paths of voice awakening examples, multipath sound source information can be distinguished, and therefore target sound source information can be positioned, and a foundation is laid for selecting proper target sound source information to conduct voice recognition.
Drawings
In order to more clearly illustrate the technical solutions of the present invention, the drawings that are required for the embodiments will be briefly described, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope of the present invention. Like elements are numbered alike in the various figures.
FIG. 1 is a flow chart of a method of sound source localization provided in one embodiment;
FIG. 2 is a flow chart of a method for locating a sound source according to another embodiment;
FIG. 3 is a flow chart of a method for determining the location of a sound source provided in another embodiment;
FIG. 4 is a flow chart of a method for determining the location of a sound source provided in yet another embodiment;
FIG. 5 is a flow chart of a sound source localization method provided in yet another embodiment;
FIG. 6 is a block diagram of a sound source localization apparatus provided in one embodiment;
fig. 7 is a flowchart of a voice recognition control method according to an embodiment.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments.
The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.
Hereinafter, various embodiments of the present disclosure will be more fully described. The present disclosure is capable of various embodiments and of modifications and variations therein. However, it should be understood that: there is no intention to limit the various embodiments of the disclosure to the specific embodiments disclosed herein, but rather the disclosure is to be interpreted to cover all modifications, equivalents, and/or alternatives falling within the spirit and scope of the various embodiments of the disclosure.
The terms "comprises," "comprising," "including," and any other variations thereof are intended to cover a specific feature, number, step, operation, element, component, or combination of the foregoing used in the various embodiments of the present invention, and are not intended to exclude the presence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the invention belong. The terms (such as those defined in commonly used dictionaries) will be interpreted as having a meaning that is the same as the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in connection with the various embodiments of the invention.
As shown in fig. 1, a flow chart of a sound source localization method is provided, and the sound source localization method includes the following steps:
step S110, when the voice wake-up instance is detected to be awakened, the number of the voice wake-up instances is determined.
The sound source localization method is typically suited to scenes with multiple paths of sound source information, such as the vehicle-mounted application terminal of a vehicle: a vehicle often carries multiple paths of sound source information, of which only one path is the target sound source information. Therefore, when it is detected that a voice wake-up instance has been woken up, the number of woken voice wake-up instances is determined for further processing.
Wherein the voice wake instance, after being created by the system, typically further processes the incoming source audio information.
Step S120, when at least two voice wake-up instances are woken up, determining a voice wake-up instance result corresponding to each voice wake-up instance, wherein the voice wake-up instance result comprises a wake-up score and wake-up energy.
When at least two voice wake-up instances are woken up, each path of audio usually corresponds to one voice wake-up instance. Each voice wake-up instance compares its path of audio against the preset voice information in the database to determine a matching degree, and the corresponding wake-up score is then determined from that matching degree. In addition, the audio energy value corresponding to each voice wake-up instance is determined to obtain the corresponding wake-up energy, and the wake-up score and the wake-up energy together form the corresponding voice wake-up instance result.
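As a minimal sketch of the per-instance result described above (the struct itself is an assumption of this sketch; the field names merely follow the later embodiment):

```cpp
// Minimal sketch of a per-instance wake-up result: a matching score against
// the preset wake word and the audio energy of the triggering audio.
struct WakeupResult {
    int   nScore;  // wake-up score derived from the matching degree
    float fPower;  // wake-up energy (audio energy value)
};
```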
And step S130, obtaining the sound source azimuth positioning according to the wake-up score and the wake-up energy corresponding to each voice wake-up instance.
And the corresponding sound source azimuth positioning can be obtained by further calculating according to the wake-up score and the wake-up energy corresponding to each voice wake-up example.
According to the sound source positioning method, when the voice awakening examples are detected to be awakened, the number of the voice awakening examples is determined, when at least two paths of voice awakening examples are awakened, voice awakening example results corresponding to each path of voice awakening examples are determined, the voice awakening example results comprise awakening scores and awakening energies, sound source azimuth positioning is obtained according to the awakening scores and the awakening energies corresponding to the paths of voice awakening examples, multipath sound source information can be distinguished, and therefore target sound source information can be positioned, and a foundation is laid for selecting proper target sound source information to conduct voice recognition.
In one embodiment, as shown in fig. 2, the above sound source localization method further includes:
in step S140, when only one voice wake-up instance is woken up, a wake-up time interval between the current wake-up time and the last wake-up time is calculated.
When only one voice wake-up instance is woken up, only the wake-up time interval between the current wake-up time and the last wake-up time is calculated: since there is only one path of sound source information in this case, it is only necessary to further judge whether that interval is reasonable, for example whether it is too short.
Step S150, judging whether the wake-up time interval is larger than a preset time threshold, if yes, proceeding to step S160.
Step S160, the corresponding direction of the voice wake-up instance is used as the sound source azimuth positioning.
When the wake-up time interval is greater than a preset time threshold, it indicates that the wake-up time interval is normal, and the direction corresponding to the voice wake-up instance needs to be used as the sound source azimuth positioning.
In one embodiment, the preset time threshold is 300 ms, which may be set according to actual needs.
In one embodiment, the above sound source localization method further includes:
when the wake-up time interval is less than or equal to the preset time threshold, the step S110 is returned.
When the wake-up time interval is less than or equal to the preset time threshold, the interval is abnormal (an excessively short interval is usually an abnormal condition), and the process needs to return to step S110.
In one embodiment, the vehicle carries a child passenger. Children often have a habit of imitation: after the driver issues a control command, the child may quickly repeat it, which can easily disturb vehicle control. Therefore, when the wake-up time interval is less than or equal to the preset time threshold, no processing is performed and the flow returns to step S110, so that the system continues to wait for the sound source information of the next valid control command.
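A minimal sketch of this single-instance time-interval check is given below; the function name and the bookkeeping of the last wake-up time are assumptions, while the 300 ms threshold comes from the embodiment above.

```cpp
#include <cstdint>

// Accept a single wake-up only if enough time has passed since the previous
// wake-up; otherwise treat it as spurious (e.g. a child echoing the driver)
// and return to waiting for the next wake-up (step S110).
constexpr int64_t kWakeIntervalThresholdMs = 300;  // preset time threshold

bool AcceptSingleWakeup(int64_t nowMs, int64_t& lastWakeMs) {
    const int64_t interval = nowMs - lastWakeMs;
    if (interval > kWakeIntervalThresholdMs) {
        lastWakeMs = nowMs;   // normal interval: use this instance's beam direction
        return true;
    }
    return false;             // abnormal (too short): ignore and keep listening
}
```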
In one embodiment, as shown in fig. 3, step S130 includes:
s131, determining a wake-up score maximum value and wake-up energy corresponding to the wake-up score maximum value in wake-up scores corresponding to each voice wake-up instance.
The wake-up scores corresponding to the voice wake-up examples are compared one by one to determine the corresponding wake-up score maximum value, so that the wake-up energy corresponding to the wake-up score maximum value can be determined.
S132, determining the maximum value of the wake-up energy in the wake-up energy corresponding to each voice wake-up instance and the wake-up score corresponding to the maximum value of the wake-up energy.
The wake-up energy corresponding to each voice wake-up example is compared one by one to determine the corresponding wake-up energy maximum value, and further the wake-up score corresponding to the wake-up energy maximum value can be determined.
S133, calculating a wake-up score proportion difference value according to the wake-up score corresponding to the wake-up score maximum value and the wake-up energy maximum value.
Usually, the absolute value of the difference between the maximum wake-up score and the wake-up score corresponding to the maximum wake-up energy is obtained, and this absolute value is then divided by the maximum wake-up score to obtain the corresponding wake-up score proportion difference.
S134, calculating a wake-up energy proportion difference value according to the wake-up energy corresponding to the wake-up energy maximum value and the wake-up score maximum value.
Usually, the absolute value of the difference between the maximum wake-up energy and the wake-up energy corresponding to the maximum wake-up score is obtained, and this absolute value is then divided by the maximum wake-up energy to obtain the corresponding wake-up energy proportion difference.
S135, determining the sound source azimuth positioning according to the wake-up score proportion difference value and the wake-up energy proportion difference value.
The wake-up score proportion difference value and the wake-up energy proportion difference value are comprehensively considered, so that the sound source azimuth positioning can be further determined.
In one embodiment, as shown in fig. 4, step S135 includes:
s135a, judging whether the wake-up score proportion difference value is larger than a first preset proportion threshold value and the wake-up energy proportion difference value is smaller than a second preset proportion threshold value, if so, entering step S135b, and if not, entering step S135c.
When evaluating the result of each voice wake-up instance, it must first be ensured that the voice matching degree of each path meets the condition. In other words, when the wake-up score proportion difference is greater than the first preset proportion threshold, the wake-up score corresponding to the maximum wake-up energy is too low compared with the maximum wake-up score, the gap is too large, and the corresponding voice matching degree is relatively low. In that case it must further be judged whether the gap between the wake-up energy corresponding to the maximum wake-up score and the maximum wake-up energy is too large: if the wake-up energy proportion difference is smaller than the second preset proportion threshold, the corresponding wake-up energy meets the requirement, and the voice wake-up instance result corresponding to the maximum wake-up score also meets the requirement.
Step S135b, the beam direction of the voice wake-up instance corresponding to the maximum wake-up score is used as the sound source azimuth positioning.
After the preset condition in step S135a is satisfied, the beam direction of the voice wake-up instance corresponding to the maximum wake-up score may be preferably used as the sound source azimuth positioning.
Step S135c, the beam direction of the voice wake-up instance corresponding to the maximum wake-up energy is used as the sound source azimuth positioning.
When the preset condition in step S135a is not satisfied, the beam direction of the voice wake-up instance corresponding to the maximum wake-up energy may be used directly as the sound source azimuth positioning, because in that case the beam direction of the voice wake-up instance corresponding to the maximum wake-up energy is most likely the direction of the target sound source.
In one embodiment, the above sound source localization method is applied to a vehicle-mounted control terminal. A vehicle has a driver seat and a front passenger seat, and within the acquisition range there are generally two voice wake-up instances. Assume the left voice wake-up instance result is leftResult, with the left wake-up score denoted leftResult.score and the left wake-up energy denoted leftResult.power; assume the right voice wake-up instance result is rightResult, with the right wake-up score denoted rightResult.score and the right wake-up energy denoted rightResult.power.
First, the score values of the wake-up results are compared, i.e. leftResult.score is compared with rightResult.score; the wake-up score of the higher-scoring result (leftResult or rightResult) is saved to MaxScore.nScore and its corresponding audio energy is saved to MaxScore.fPower.
Then, the energy values of the wake-up results are compared, i.e. leftResult.power is compared with rightResult.power; the wake-up score of the higher-energy result (leftResult or rightResult) is saved to MaxPower.nScore and its corresponding audio energy is saved to MaxPower.fPower.
The wake-up score proportion difference is calculated as ScoreDiffNormal = fabs(MaxScore.nScore - MaxPower.nScore) / MaxScore.nScore.
The wake-up energy proportion difference is calculated as PowerDiffNormal = fabs(MaxPower.fPower - MaxScore.fPower) / MaxPower.fPower.
If ScoreDiffNormal > 0.25 and PowerDiffNormal < 0.15 are both satisfied, the beam direction of the voice wake-up instance corresponding to the maximum wake-up score is used as the sound source azimuth positioning; otherwise, the beam direction of the voice wake-up instance corresponding to the maximum wake-up energy is used as the sound source azimuth positioning.
Here the first preset proportion threshold is 0.25 and the second preset proportion threshold is 0.15.
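The embodiment above can be written out as compilable code. The sketch below is only illustrative: the variable names and the 0.25 / 0.15 thresholds follow the description, while the struct layout, the Beam enum, and the function signature are assumptions of this sketch.

```cpp
#include <cmath>

struct WakeupResult {
    int   nScore;  // wake-up score
    float fPower;  // wake-up energy
};

enum Beam { LEFT_BEAM, RIGHT_BEAM };

// Decide between the left and right beams from the two wake-up results.
Beam LocateSource(const WakeupResult& leftResult, const WakeupResult& rightResult) {
    // Keep the result with the higher wake-up score as MaxScore.
    const WakeupResult& maxScore =
        (leftResult.nScore >= rightResult.nScore) ? leftResult : rightResult;
    // Keep the result with the higher wake-up energy as MaxPower.
    const WakeupResult& maxPower =
        (leftResult.fPower >= rightResult.fPower) ? leftResult : rightResult;

    const float scoreDiffNormal =
        std::fabs(float(maxScore.nScore - maxPower.nScore)) / maxScore.nScore;
    const float powerDiffNormal =
        std::fabs(maxPower.fPower - maxScore.fPower) / maxPower.fPower;

    // Score gap large and energy gap small: trust the highest-scoring beam.
    if (scoreDiffNormal > 0.25f && powerDiffNormal < 0.15f) {
        return (&maxScore == &leftResult) ? LEFT_BEAM : RIGHT_BEAM;
    }
    // Otherwise trust the highest-energy beam.
    return (&maxPower == &leftResult) ? LEFT_BEAM : RIGHT_BEAM;
}
```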
By adopting this sound source localization method, the sound source direction (driver or front passenger) can be accurately located, sound from other directions is not recorded, interference from noisy surroundings is avoided, and a foundation is laid for the subsequent speech recognition process.
In one embodiment, as shown in fig. 5, the above sound source localization method is applied to a vehicle-mounted control terminal, and the above sound source localization method further includes:
step S170, a narrow beam noise reduction instance is created.
Before a voice wake-up instance can be detected as woken up, the instance must be created and triggered; to this end, a narrow beam noise reduction instance generally needs to be created first.
Step S180, the initial audio information is acquired and sent to a narrow beam noise reduction instance.
After the narrow beam noise reduction instance is created and started, the system sends the initial audio information to it for further processing, so as to preliminarily filter out noise.
Step S190, waking up the corresponding voice wake-up instance according to the first audio information output by the narrow beam noise reduction instance.
In one embodiment, the narrow beam noise reduction instance outputs four paths of audio information, of which the first and second paths are output as the first audio information to the corresponding voice wake-up instances to wake up and start them, each voice wake-up instance corresponding to one path of audio information.
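As a purely illustrative sketch of this routing, the code below assumes hypothetical NarrowBeamDenoiser and WakeupEngine types (neither is an API named by the patent); the only facts taken from the description are that the noise reduction instance outputs four channels and that the first two drive one voice wake-up instance each.

```cpp
#include <vector>

struct AudioFrame { std::vector<float> samples; };

// Hypothetical stand-in for the narrow beam noise reduction instance.
struct NarrowBeamDenoiser {
    std::vector<AudioFrame> Process(const AudioFrame& in) {
        // A real implementation would run narrow-beam noise reduction here;
        // this stub just duplicates the input into four output channels.
        return std::vector<AudioFrame>(4, in);
    }
};

// Hypothetical stand-in for a voice wake-up instance.
struct WakeupEngine {
    void Feed(const AudioFrame& /*frame*/) { /* wake-word detection would run here */ }
};

void RouteWakeupAudio(NarrowBeamDenoiser& denoiser,
                      WakeupEngine& leftWake, WakeupEngine& rightWake,
                      const AudioFrame& mic) {
    std::vector<AudioFrame> channels = denoiser.Process(mic);  // four channels
    leftWake.Feed(channels[0]);   // first channel  -> first wake-up instance
    rightWake.Feed(channels[1]);  // second channel -> second wake-up instance
    // channels[2] and channels[3] are later used as the second audio
    // information for speech recognition.
}
```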
Further, as shown in fig. 6, there is also provided a sound source localization apparatus 200, the sound source localization apparatus 200 including:
the number acquisition unit 210 is configured to determine the number of voice wakeup instances when it is detected that the voice wakeup instances are awakened.
The score and energy obtaining unit 220 is configured to determine a voice wake instance result corresponding to each voice wake instance, where the voice wake instance result includes a wake score and wake energy.
The sound source position determining unit 230 is configured to calculate a sound source position according to the wake-up score and the wake-up energy corresponding to each voice wake-up instance.
In addition, as shown in fig. 7, there is also provided a voice recognition control method which adopts the above sound source localization method, the voice recognition control method further comprising:
step S310, setting the beam direction corresponding to the created narrow beam noise reduction example according to the sound source azimuth positioning.
After the sound source positioning direction is obtained, the beam direction corresponding to the created narrow beam noise reduction example can be set according to the sound source azimuth positioning to lay a foundation for the subsequent voice recognition process.
Step S320, a speech recognition instance is created.
Step S330, the second audio information output by the narrow beam noise reduction instance is sent to the speech recognition instance to obtain a corresponding speech recognition result.
In one embodiment, the narrow beam noise reduction instance outputs four paths of audio information, and the third and fourth paths are sent to the voice recognition instance as the second audio information for the speech recognition process, so as to obtain the corresponding speech recognition result.
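A similarly illustrative sketch of the control flow in steps S310 to S330 follows; every type and method here (Denoiser, Recognizer, SetBeamDirection, Recognize) is a hypothetical stand-in, and only the ordering of the steps and the use of the third and fourth channels come from the description.

```cpp
#include <string>
#include <vector>

struct AudioFrame { std::vector<float> samples; };

// Hypothetical stand-ins for the narrow beam noise reduction instance and
// the speech recognition instance.
struct Denoiser {
    void SetBeamDirection(int beam) { beam_ = beam; }  // steer toward the located source
    std::vector<AudioFrame> Process(const AudioFrame& in) {
        return std::vector<AudioFrame>(4, in);          // stub: four output channels
    }
    int beam_ = 0;
};

struct Recognizer {
    std::string Recognize(const AudioFrame& /*frame*/) { return "<recognition result>"; }
};

std::string RecognizeFromLocatedSource(Denoiser& denoiser, int locatedBeam,
                                       const AudioFrame& mic) {
    denoiser.SetBeamDirection(locatedBeam);              // S310: set the beam direction
    Recognizer recognizer;                                // S320: create the recognition instance
    std::vector<AudioFrame> channels = denoiser.Process(mic);
    // S330: the third (and fourth) channels are the second audio information.
    return recognizer.Recognize(channels[2]);
}
```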
According to the voice recognition control method, on the basis of the sound source localization method, the sound source azimuth positioning is obtained from the wake-up score and the wake-up energy corresponding to each voice wake-up instance, and the multipath sound source information can be distinguished, so that the target sound source information is located and appropriate target sound source information is then selected for voice recognition to obtain the corresponding result. In this way, the sound source direction (driver or front passenger) can be precisely located, sound from other directions is not recorded, interference from noisy surroundings is avoided, the speech recognition rate is improved, and the user experience is enhanced.
Furthermore, a terminal device is provided, comprising a memory for storing a computer program and a processor for running the computer program to cause the terminal device to execute the sound source localization method.
Further, a readable storage medium storing a computer program is provided, which when executed by a processor performs a sound source localization method.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners as well. The apparatus embodiments described above are merely illustrative, for example, of the flow diagrams and block diagrams in the figures, which illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules or units in various embodiments of the invention may be integrated together to form a single part, or the modules may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a smart phone, a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention.

Claims (9)

1. A sound source localization method, characterized in that the sound source localization method comprises:
when the voice wake-up instance is detected to be awakened, determining the number of the voice wake-up instances;
when at least two voice wake-up examples are woken up, determining voice wake-up example results corresponding to each voice wake-up example, wherein the voice wake-up example results comprise wake-up scores and wake-up energy;
determining a wake-up score maximum value in wake-up scores corresponding to all voice wake-up examples and wake-up energy corresponding to the wake-up score maximum value;
determining a wake-up energy maximum value in wake-up energy corresponding to each voice wake-up instance and a wake-up score corresponding to the wake-up energy maximum value;
calculating a wake-up score proportion difference value according to the wake-up score corresponding to the wake-up score maximum value and the wake-up energy maximum value;
calculating a wake-up energy proportion difference value according to the wake-up energy maximum value and the wake-up energy corresponding to the wake-up score maximum value;
and determining the sound source azimuth positioning according to the wake-up score proportion difference value and the wake-up energy proportion difference value.
2. The sound source localization method of claim 1, further comprising:
when only one voice wake-up example is woken up, calculating the wake-up time interval between the current wake-up time and the last wake-up time;
judging whether the wake-up time interval is larger than a preset time threshold value or not;
if yes, the corresponding direction of the voice wake-up instance is used as the sound source azimuth positioning.
3. The sound source localization method according to claim 2, further comprising:
and returning to the step of determining the number of the voice wakeup instances when the voice wakeup instances are detected to be awakened when the wakeup time interval is smaller than or equal to the preset time threshold value.
4. The sound source localization method of claim 1, wherein the determining the sound source location from the wake-up score ratio difference and the wake-up energy ratio difference comprises:
judging whether the wake-up score proportion difference value is larger than a first preset proportion threshold value and whether the wake-up energy proportion difference value is smaller than a second preset proportion threshold value, wherein the first preset proportion threshold value is larger than the second preset proportion threshold value;
if yes, the beam direction of the voice wake-up instance corresponding to the maximum wake-up score is used as the sound source azimuth positioning;
if not, the beam direction of the voice wake-up example corresponding to the wake-up energy maximum value is used as the sound source azimuth positioning.
5. The sound source localization method according to claim 1, being applied to an in-vehicle control terminal, the sound source localization method further comprising:
creating a narrow beam noise reduction instance;
acquiring initial audio information and sending the initial audio information to the narrow beam noise reduction instance;
and waking up a corresponding voice wake-up instance according to the first audio information output by the narrow-beam noise reduction instance.
6. A sound source localization device, the sound source localization device comprising:
the number acquisition unit is used for determining the number of the voice wakeup instances when the voice wakeup instances are detected to be awakened;
the score and energy acquisition unit is used for determining voice awakening example results corresponding to each path of voice awakening examples, wherein the voice awakening example results comprise awakening scores and awakening energy;
the sound source position determining unit is used for determining a wake-up score maximum value in wake-up scores corresponding to all voice wake-up examples and wake-up energy corresponding to the wake-up score maximum value;
determining a wake-up energy maximum value in wake-up energy corresponding to each voice wake-up instance and a wake-up score corresponding to the wake-up energy maximum value;
calculating a wake-up score proportion difference value according to the wake-up score corresponding to the wake-up score maximum value and the wake-up energy maximum value;
calculating a wake-up energy proportion difference value according to the wake-up energy maximum value and the wake-up energy corresponding to the wake-up score maximum value;
and determining the sound source azimuth positioning according to the wake-up score proportion difference value and the wake-up energy proportion difference value.
7. A speech recognition control method, characterized in that the speech recognition control method adopts the sound source localization method according to any one of the above claims 1 to 5, the speech recognition control method further comprising:
setting a beam direction corresponding to the created narrow beam noise reduction example according to the sound source azimuth positioning;
creating a speech recognition instance;
and sending the second audio information output by the narrow-beam noise reduction example to the voice recognition example to obtain a corresponding voice recognition result.
8. A terminal device comprising a memory for storing a computer program and a processor that runs the computer program to cause the terminal device to perform the sound source localization method of any one of claims 1 to 5.
9. A readable storage medium, characterized in that the readable storage medium stores a computer program which, when executed by a processor, performs the sound source localization method of any one of claims 1 to 5.
CN202010072723.8A 2020-01-21 2020-01-21 Sound source positioning method, sound source positioning device, voice recognition control method and terminal equipment Active CN111276143B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010072723.8A CN111276143B (en) 2020-01-21 2020-01-21 Sound source positioning method, sound source positioning device, voice recognition control method and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010072723.8A CN111276143B (en) 2020-01-21 2020-01-21 Sound source positioning method, sound source positioning device, voice recognition control method and terminal equipment

Publications (2)

Publication Number Publication Date
CN111276143A CN111276143A (en) 2020-06-12
CN111276143B (en) 2023-04-25

Family

ID=71001205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010072723.8A Active CN111276143B (en) 2020-01-21 2020-01-21 Sound source positioning method, sound source positioning device, voice recognition control method and terminal equipment

Country Status (1)

Country Link
CN (1) CN111276143B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115346527A (en) * 2022-08-08 2022-11-15 科大讯飞股份有限公司 Voice control method, device, system, vehicle and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106782529B (en) * 2016-12-23 2020-03-10 北京云知声信息技术有限公司 Awakening word selection method and device for voice recognition
CN108122563B (en) * 2017-12-19 2021-03-30 北京声智科技有限公司 Method for improving voice awakening rate and correcting DOA
US10679629B2 (en) * 2018-04-09 2020-06-09 Amazon Technologies, Inc. Device arbitration by multiple speech processing systems
CN110364166B (en) * 2018-06-28 2022-10-28 腾讯科技(深圳)有限公司 Electronic equipment for realizing speech signal recognition
CN108899044B (en) * 2018-07-27 2020-06-26 苏州思必驰信息科技有限公司 Voice signal processing method and device
CN110164423B (en) * 2018-08-06 2023-01-20 腾讯科技(深圳)有限公司 Azimuth angle estimation method, azimuth angle estimation equipment and storage medium
CN109272989B (en) * 2018-08-29 2021-08-10 北京京东尚科信息技术有限公司 Voice wake-up method, apparatus and computer readable storage medium
CN109391528A (en) * 2018-08-31 2019-02-26 百度在线网络技术(北京)有限公司 Awakening method, device, equipment and the storage medium of speech-sound intelligent equipment
CN110211580B (en) * 2019-05-15 2021-07-16 海尔优家智能科技(北京)有限公司 Multi-intelligent-device response method, device, system and storage medium
CN110288997B (en) * 2019-07-22 2021-04-16 苏州思必驰信息科技有限公司 Device wake-up method and system for acoustic networking

Also Published As

Publication number Publication date
CN111276143A (en) 2020-06-12

Similar Documents

Publication Publication Date Title
CN110047487B (en) Wake-up method and device for vehicle-mounted voice equipment, vehicle and machine-readable medium
CN105654949B (en) A kind of voice awakening method and device
CN106782536B (en) Voice awakening method and device
US9583102B2 (en) Method of controlling interactive system, method of controlling server, server, and interactive device
KR101955958B1 (en) In-vehicle voice command recognition method and apparatus, and storage medium
CN1805008B (en) Voice detection device, automatic image pickup device and voice detection method
US9601111B2 (en) Methods and systems for adapting speech systems
CN110070857B (en) Model parameter adjusting method and device of voice awakening model and voice equipment
CN109346071A (en) Wake up processing method, device and electronic equipment
JP4941752B2 (en) Driving support apparatus and method, and program
US20140350923A1 (en) Method and device for detecting noise bursts in speech signals
CN111128155B (en) Awakening method, device, equipment and medium for intelligent equipment
CN105609105B (en) Speech recognition system and speech recognition method
US20190279629A1 (en) Speech system
CN111276143B (en) Sound source positioning method, sound source positioning device, voice recognition control method and terminal equipment
KR20070034881A (en) Voice section detection device and method
CN110082726B (en) Sound source positioning method and device, positioning equipment and storage medium
US9330676B2 (en) Determining whether speech interference occurs based on time interval between speech instructions and status of the speech instructions
CN114360527B (en) Vehicle-mounted voice interaction method, device, equipment and storage medium
US10475342B2 (en) Parking lot evaluation apparatus, parking lot information supply method, and data structure of parking lot information
CN110570867A (en) Voice processing method and system for locally added corpus
CN111081251A (en) Voice wake-up method and device
JP2020140309A (en) Evaluation device
JP5936378B2 (en) Voice segment detection device
CN111124512B (en) Awakening method, device, equipment and medium for intelligent equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant