CN112669865A

CN112669865A - Switching method, device and equipment of main microphone and readable storage medium

Info

Publication number: CN112669865A
Application number: CN202110278261.XA
Authority: CN
Inventors: 廖焕柱; 杨国全; 王克彦; 曹亚曦
Original assignee: Zhejiang Huachuang Video Signal Technology Co Ltd
Current assignee: Zhejiang Huachuang Video Signal Technology Co Ltd
Priority date: 2021-03-16
Filing date: 2021-03-16
Publication date: 2021-04-16
Anticipated expiration: 2041-03-16
Also published as: CN112669865B

Abstract

The invention discloses a switching method, a device, equipment and a readable storage medium of a main microphone, wherein the method comprises the following steps: acquiring audio data for playing by a loudspeaker to obtain first audio data, acquiring the audio data acquired by each microphone to obtain second audio data corresponding to each microphone, and acquiring noise frequency domain energy corresponding to each microphone; preprocessing the first audio data and each second audio data to obtain corresponding first frequency domain data and each second frequency domain data; calculating first frequency domain energy of the first frequency domain data and second frequency domain energy of each second frequency domain data; when the first frequency domain energy is smaller than a first preset threshold value, selecting a candidate main microphone from the microphones according to the noise frequency domain energy and the second frequency domain energy corresponding to the microphones; and when the current main microphone is determined to be different from the candidate main microphone, switching the candidate main microphone into the main microphone.

Description

Switching method, device and equipment of main microphone and readable storage medium

Technical Field

The present application relates to the field of signal processing technologies, and in particular, to a method, an apparatus, a device, and a readable storage medium for switching a main microphone.

Background

With the popularization of 5G networks, the demand for network video conferences keeps growing. In network video conferencing, video conferencing terminals and telephone conferencing terminals are typically used in conjunction with a microphone-speaker integrated device. As shown in Fig. 1, a microphone-speaker integrated device has multiple microphones and a speaker. The device receives sound from the remote end and plays it through the speaker, and collects local speech through the microphones. The conventional switching scheme for the main microphone is as follows: the root mean square value of the signal intensity picked up by each microphone is calculated, the root mean square values are compared with each other, a candidate main microphone is selected, and the candidate main microphone is switched to be the main microphone. However, this method switches the main microphone without considering whether the speaker is playing audio data, which easily causes the microphone with the strongest echo signal to be selected as the main microphone; it also does not consider the noise of each microphone, which easily causes the selected main microphone to be the microphone with the largest noise.

Disclosure of Invention

Embodiments of the present invention provide a method, an apparatus, a device, and a readable storage medium for switching a primary microphone, so as to solve the technical problem that an existing primary microphone switching method easily causes a microphone with a strongest echo signal to be selected as a primary microphone and/or easily causes the selected primary microphone to be a microphone with the largest noise.

In order to solve the above problem, in a first aspect, an embodiment of the present invention provides a method for switching a main microphone, including: acquiring audio data for playing by a loudspeaker to obtain first audio data, acquiring the audio data acquired by each microphone to obtain second audio data corresponding to each microphone, and acquiring noise frequency domain energy corresponding to each microphone; preprocessing the first audio data and each second audio data to obtain corresponding first frequency domain data and each second frequency domain data; calculating first frequency domain energy of the first frequency domain data and second frequency domain energy of each second frequency domain data; when the first frequency domain energy is smaller than a first preset threshold value, selecting a candidate main microphone from the microphones according to the noise frequency domain energy and the second frequency domain energy corresponding to the microphones; and when the current main microphone is determined to be different from the candidate main microphone, switching the candidate main microphone into the main microphone.

Optionally, when audio data for speaker playing is collected, the number of frames of the collected audio data is 1 frame; when the audio data collected by each microphone is collected, the number of frames of the collected audio data is 1 frame.

Optionally, the obtaining of noise frequency domain energy corresponding to each microphone includes, for each microphone: if the number of frames of audio data acquired by the microphone is greater than or equal to a preset number, determining the latest preset number of frames among the acquired frames, preprocessing each of those frames of audio data to obtain corresponding third frequency domain data, calculating third frequency domain energy of each third frequency domain data, and taking the minimum third frequency domain energy as the noise frequency domain energy corresponding to the microphone.

Optionally, when the first frequency domain energy is smaller than a first preset threshold, selecting one candidate main microphone from the microphones according to the noise frequency domain energy and the second frequency domain energy corresponding to the microphones, including: when the first frequency domain energy is smaller than a first preset threshold value and at least one second frequency domain energy is larger than a second preset threshold value, calculating the signal-to-noise ratio corresponding to each microphone according to the noise frequency domain energy corresponding to each microphone and the second frequency domain energy; selecting a microphone with the largest signal-to-noise ratio as a candidate main microphone; or when the first frequency domain energy is smaller than a first preset threshold value and each second frequency domain energy is smaller than a second preset threshold value, selecting the microphone with the minimum noise frequency domain energy as a candidate main microphone.

Optionally, after determining that the current primary microphone is not the same as the candidate primary microphone and before switching the candidate primary microphone to the primary microphone, the switching method of the primary microphone further includes: updating the count corresponding to the candidate main microphone to be the current count plus one; if the updated count reaches a threshold, switching the candidate main microphone to the main microphone; and if the updated count is smaller than the threshold value, returning to execute the steps of collecting audio data used for playing by the loudspeaker to obtain first audio data, collecting the audio data collected by each microphone to obtain second audio data corresponding to each microphone, and obtaining noise frequency domain energy corresponding to each microphone.

Optionally, after switching the candidate primary microphone to the primary microphone, the switching method of the primary microphone further includes: and resetting the count corresponding to each candidate main microphone except the main microphone, returning to execute the steps of collecting audio data for playing by the loudspeaker to obtain first audio data, collecting the audio data collected by each microphone to obtain second audio data corresponding to each microphone, and obtaining noise frequency domain energy corresponding to each microphone.

Optionally, the switching method of the main microphone further includes: and when the first frequency domain energy is larger than a first preset threshold value, resetting the count corresponding to each selected candidate main microphone, returning to execute the steps of collecting audio data for playing by a loudspeaker to obtain first audio data, collecting the audio data collected by each microphone to obtain second audio data corresponding to each microphone, and obtaining the noise frequency domain energy corresponding to each microphone.

In a second aspect, an embodiment of the present invention further provides a switching device for a main microphone, including: the acquisition unit is used for acquiring audio data used for playing by a loudspeaker to obtain first audio data, acquiring the audio data acquired by each microphone to obtain second audio data corresponding to each microphone, and acquiring noise frequency domain energy corresponding to each microphone; the preprocessing unit is used for preprocessing the first audio data and each second audio data to obtain corresponding first frequency domain data and each second frequency domain data; a first calculating unit, configured to calculate first frequency domain energy corresponding to the first frequency domain data and second frequency domain energy corresponding to each second frequency domain data; the second calculation unit is used for selecting a candidate main microphone from the microphones according to the noise frequency domain energy and the second frequency domain energy corresponding to the microphones when the first frequency domain energy is smaller than a first preset threshold; and the switching unit is used for switching the candidate main microphone into the main microphone when the current main microphone is determined to be different from the candidate main microphone.

In a third aspect, an embodiment of the present invention provides a microphone-speaker integrated apparatus, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform a method of switching a primary microphone as in the first aspect or any implementation manner of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer instructions are stored, and the computer instructions are configured to cause a computer to execute the method for switching the main microphone according to the first aspect or any implementation manner of the first aspect.

According to the switching method, the switching device, the switching equipment and the readable storage medium of the main microphone provided by the embodiment of the invention, as the playing sound of the loudspeaker playing the audio data can be collected by each microphone, the selection of the candidate main microphone can be carried out by calculating the first frequency domain energy corresponding to the loudspeaker and under the condition that the first frequency domain energy is smaller than the first preset threshold value, namely the loudspeaker does not play the audio data, so that the selected candidate main microphone can be prevented from being the microphone with the strongest echo signal, and further the switched main microphone can be prevented from being the microphone with the strongest echo signal; and the noise frequency domain energy corresponding to each microphone is acquired, the second frequency domain energy corresponding to each microphone is calculated, and a candidate main microphone is selected from the microphones according to the noise frequency domain energy corresponding to each microphone and the second frequency domain energy, so that the candidate main microphone can be selected when the microphones are in different states, the sound quality of the selected candidate main microphone is the highest, the selected candidate main microphone can be prevented from being the microphone with the largest noise, and the main microphone after switching is prevented from being the microphone with the largest noise.

The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented according to the content of the description, and so that the above and other objects, features, and advantages of the present application are more readily apparent, a detailed description of the present application is given below.

Drawings

Fig. 1 is a schematic structural diagram of a microphone-speaker integrated device according to an embodiment of the present invention;

Fig. 2 is a schematic flow chart illustrating a switching method of a main microphone according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a switching device of a main microphone according to an embodiment of the present invention;

Fig. 4 is a schematic hardware configuration diagram of another microphone-speaker integrated device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention provides a switching method of a main microphone, which is applied to microphone and loudspeaker integrated equipment shown in figure 1. The microphone-speaker integrated apparatus includes a plurality of microphones 11 and a speaker 12. Fig. 1 only shows 4 microphones by way of example, but not limited thereto, and in the embodiment of the present invention, the number of microphones of the microphone-speaker integrated device is not limited. As shown in fig. 2, the switching method of the main microphone includes:

S101, collecting audio data used for playing by a loudspeaker to obtain first audio data, collecting the audio data collected by each microphone to obtain second audio data corresponding to each microphone, and obtaining noise frequency domain energy corresponding to each microphone.

Specifically, the speaker 12 is used to play audio data transmitted from a remote end. The microphone 11 is used to collect local audio data. The audio data collected by each microphone 11 includes local speech and/or the playback sound (echo) produced when the speaker plays audio data. The microphone 11 may be a directional microphone. In the embodiment of the invention, for the microphone-speaker integrated device shown in Fig. 1, four directional microphones, each with a pickup angle of 90 to 100 degrees, can be used for sound pickup in four directions; the four directional microphones are spaced 90 degrees apart to form a square array and together cover 360-degree omnidirectional sound pickup. The directional microphones have a figure-8 directivity pattern. With the figure-8 pattern, sound from the direction of the speaker 12 (180 to 270 degrees) can be suppressed by more than 10 dB, so that the signal-to-echo ratio of the microphone (the ratio, in decibels, of the local speech collected by the microphone 11 to the playback sound of the speaker 12 collected by the microphone 11) can be improved. The audio data for playing by the speaker 12 is collected at or before the moment the speaker 12 plays it. The audio data collected by each microphone 11 is obtained at or after the moment each microphone 11 collects it. The noise frequency domain energy corresponding to each microphone represents the noise level of that microphone. It can be obtained by analyzing the audio data collected by the microphone over the most recent period of time before the current moment.

S102, preprocessing the first audio data and each second audio data to obtain corresponding first frequency domain data and each second frequency domain data; specifically, the preprocessing of the first audio data and each second audio data may include: first applying a window to the first audio data and each second audio data, and then performing a fast Fourier transform and adaptive filtering, so that the first audio data and each second audio data are converted into the corresponding first frequency domain data and second frequency domain data.

S103, calculating first frequency domain energy of the first frequency domain data and second frequency domain energy of each second frequency domain data; specifically, the squares of the amplitudes corresponding to the frequencies in the first frequency domain data may be accumulated to obtain the first frequency domain energy of the first frequency domain data. Likewise, the squares of the amplitudes corresponding to the frequencies in each second frequency domain data may be accumulated to obtain the second frequency domain energy of that second frequency domain data. The first frequency domain energy may be used to distinguish whether the speaker 12 is playing audio data or is in a mute state. The second frequency domain energy may be used to distinguish whether the microphone 11 is collecting audio data or is in a mute state.
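As an illustration only, the windowing, Fourier transform and energy accumulation of steps S102 and S103 might be sketched as follows. The Hann window, the naive DFT (standing in for a fast Fourier transform), the 16 kHz sample rate and all function names are assumptions made for this sketch and are not specified by the embodiment; the adaptive filtering step is omitted.

```python
import cmath
import math


def hann_window(n):
    """Return an n-point Hann window (an assumed choice; the text names no window)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def to_frequency_domain(frame):
    """Window one frame and return its one-sided DFT bins (naive O(n^2) DFT)."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hann_window(n))]
    bins = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(windowed):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        bins.append(acc)
    return bins


def frequency_domain_energy(bins):
    """Accumulate the squared amplitude of every frequency bin."""
    return sum(abs(b) ** 2 for b in bins)


# One 10 ms frame at an assumed 16 kHz sample rate: a 1 kHz tone versus silence.
SAMPLE_RATE = 16000
FRAME_LEN = 160
tone = [0.5 * math.sin(2.0 * math.pi * 1000.0 * t / SAMPLE_RATE)
        for t in range(FRAME_LEN)]
silence = [0.0] * FRAME_LEN

tone_energy = frequency_domain_energy(to_frequency_domain(tone))
silence_energy = frequency_domain_energy(to_frequency_domain(silence))
```

Comparing `tone_energy` and `silence_energy` against a threshold is the kind of test the method uses to tell a playing speaker (or an active microphone) from a muted one.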

S104, when the first frequency domain energy is smaller than a first preset threshold value, selecting a candidate main microphone from the microphones according to the noise frequency domain energy and the second frequency domain energy corresponding to the microphones; specifically, the first preset threshold may be obtained empirically. When the first frequency domain energy is less than the first predetermined threshold, it indicates that the speaker 12 is in a mute state. When the loudspeaker 12 is in a mute state, at this time, whether each microphone 11 is in an audio data collecting state or in a mute state can be judged through the second frequency domain energy corresponding to each microphone 11, the noise magnitude of each microphone 11 can be determined through the noise frequency domain energy corresponding to each microphone 11, and the signal-to-noise ratio corresponding to each microphone 11 can be determined through the ratio of the second frequency domain energy corresponding to each microphone 11 to the noise frequency domain energy. A candidate primary microphone can thus be determined from the microphones 11 and selected according to the state of the microphones, the noise level of the microphones and the signal-to-noise ratio of the microphones.

In the embodiment of the present invention, a candidate primary microphone is selected when the first frequency domain energy is smaller than the first preset threshold, which can ensure that the selected candidate primary microphone is selected when the speaker 12 is in a mute state, and can avoid selecting one microphone capable of receiving the strongest echo (i.e., the microphone with the strongest echo signal) as the candidate primary microphone, thereby avoiding selecting the microphone with the strongest echo signal as the primary microphone.

S105, when the current main microphone is determined to be different from the candidate main microphone, switching the candidate main microphone into the main microphone.

According to the switching method of the main microphones provided by the embodiment of the invention, as the playing sound of the loudspeaker when playing the audio data can be collected by each microphone, the selection of the candidate main microphone can be carried out by calculating the first frequency domain energy corresponding to the loudspeaker and under the condition that the first frequency domain energy is smaller than the first preset threshold value, namely the loudspeaker does not play the audio data, so that the selected candidate main microphone can be prevented from being the microphone with the strongest echo signal, and further the main microphone after switching is prevented from being the microphone with the strongest echo signal; and the noise frequency domain energy corresponding to each microphone is acquired, the second frequency domain energy corresponding to each microphone is calculated, and a candidate main microphone is selected from the microphones according to the noise frequency domain energy corresponding to each microphone and the second frequency domain energy, so that the candidate main microphone can be selected when the microphones are in different states, the sound quality of the selected candidate main microphone is the highest, the selected candidate main microphone can be prevented from being the microphone with the largest noise, and the main microphone after switching is prevented from being the microphone with the largest noise.

In an alternative embodiment, in order to improve the switching sensitivity of the main microphone, the audio data for speaker playback is collected 1 frame at a time. The audio data collected by each microphone is likewise collected 1 frame at a time. That is, the first audio data and each second audio data are each 1 frame. Preferably, the duration of one frame is 10 ms.

In an alternative embodiment, in step S101, the step of obtaining noise frequency domain energy corresponding to each microphone may include, for each microphone: if the number of frames of audio data acquired by the microphone is greater than or equal to a preset number, determining the latest preset number of frames among the acquired frames, preprocessing each of those frames of audio data to obtain corresponding third frequency domain data, calculating third frequency domain energy of each third frequency domain data, and taking the minimum third frequency domain energy as the noise frequency domain energy corresponding to the microphone.

Specifically, when the audio data for playing through the speaker and the audio data collected through each microphone are collected, the collected audio data are both 1 frame, so that the calculated frequency domain energy of each second frequency domain data is the energy of the corresponding 1 frame of audio data. Therefore, the noise frequency domain energy corresponding to each microphone may correspond to the energy of 1 frame of audio data. For each microphone 11, when the number of frames of the audio data acquired by the microphone is greater than or equal to the preset number, the smallest third frequency domain energy may be selected from the third frequency domain energies corresponding to each frame of the audio data of the latest preset number of frames as the noise frequency domain energy corresponding to the microphone.

In the embodiment of the invention, each microphone may be unstable in its early stage. The number of frames of audio data acquired by each microphone is therefore counted, and the noise frequency domain energy corresponding to the microphone is obtained only when that number is greater than or equal to the preset number. The main microphone is thus switched only after each microphone has acquired at least the preset number of frames, that is, after each microphone has stabilized, which helps ensure the accuracy of the switching. In addition, selecting the minimum third frequency domain energy from the third frequency domain energies of the latest preset number of frames as the noise frequency domain energy corresponding to the microphone yields a more accurate noise estimate.
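The noise estimate described above can be sketched as a sliding minimum over the most recent frame energies. This is a minimal sketch: the class name `NoiseEnergyTracker`, the parameter `preset_count` and the sample energies are hypothetical, and the per-frame energies are assumed to have been computed already.

```python
from collections import deque


class NoiseEnergyTracker:
    """Track the minimum frame energy over the latest `preset_count` frames
    of one microphone (hypothetical helper, not named in the source)."""

    def __init__(self, preset_count):
        self.preset_count = preset_count
        # A bounded deque drops the oldest energy automatically.
        self.recent = deque(maxlen=preset_count)

    def push(self, frame_energy):
        """Record the frequency domain energy of the newest frame."""
        self.recent.append(frame_energy)

    def noise_energy(self):
        """Return the noise estimate, or None until enough frames have arrived."""
        if len(self.recent) < self.preset_count:
            return None
        return min(self.recent)


tracker = NoiseEnergyTracker(preset_count=4)
estimates = []
for energy in [9.0, 2.0, 7.0, 5.0, 8.0, 6.0]:
    tracker.push(energy)
    estimates.append(tracker.noise_energy())
```

Returning `None` until the preset number of frames has been seen mirrors the condition that no noise energy is obtained, and hence no switching takes place, before the microphone has stabilized.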

In an optional embodiment, in step S104, when the first frequency domain energy is smaller than the first preset threshold, selecting a candidate main microphone from the microphones according to the noise frequency domain energy and the second frequency domain energy corresponding to the microphones may include: when the first frequency domain energy is smaller than a first preset threshold value and at least one second frequency domain energy is larger than a second preset threshold value, calculating the signal-to-noise ratio corresponding to each microphone according to the noise frequency domain energy corresponding to each microphone and the second frequency domain energy; selecting a microphone with the largest signal-to-noise ratio as a candidate main microphone; or when the first frequency domain energy is smaller than a first preset threshold value and each second frequency domain energy is smaller than a second preset threshold value, selecting the microphone with the minimum noise frequency domain energy as a candidate main microphone.

Specifically, when the first frequency domain energy is smaller than the first preset threshold, it indicates that the speaker 12 is in a mute state. When at least one second frequency domain energy is larger than the second preset threshold, it indicates that at least one microphone 11 is collecting local speech, that is, someone is speaking locally. In this case the signal-to-noise ratio of each microphone 11 can be evaluated, and the microphone 11 with the largest signal-to-noise ratio is selected as the candidate main microphone. The signal-to-noise ratio is calculated as SNR = Ps/Pn, where SNR is the signal-to-noise ratio, Ps is the second frequency domain energy, and Pn is the noise frequency domain energy.

When each second frequency domain energy is smaller than the second preset threshold, it indicates that no microphone 11 is collecting local speech, that is, no one is speaking locally. In this case the noise levels of the microphones 11 may be compared, that is, the noise frequency domain energies of the microphones may be compared, and the microphone 11 with the smallest noise frequency domain energy is selected as the candidate main microphone.

In the embodiment of the invention, when the first frequency domain energy is smaller than a first preset threshold and at least one second frequency domain energy is larger than a second preset threshold, calculating the signal-to-noise ratio corresponding to each microphone according to the noise frequency domain energy corresponding to each microphone and the second frequency domain energy; selecting a microphone with the largest signal-to-noise ratio as a candidate main microphone, wherein the sound quality of the selected candidate main microphone is the highest; when the first frequency domain energy is smaller than the first preset threshold and each second frequency domain energy is smaller than the second preset threshold, the microphone with the minimum noise frequency domain energy is selected as a candidate main microphone, the noise of the selected candidate main microphone is minimum, and the microphone with the maximum noise can be prevented from being selected.
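The two selection branches can be sketched as a single function. This is a sketch under stated assumptions: the function name, the threshold values and the small `eps` guard against division by zero are illustrative, and the handling of energies exactly equal to a threshold is a choice made here, since the text does not address it.

```python
def select_candidate(speaker_energy, mic_energies, noise_energies,
                     speaker_threshold, mic_threshold):
    """Return the index of the candidate main microphone, or None while the
    speaker is treated as playing (hypothetical helper)."""
    # Assumption: energy equal to the threshold counts as "speaker playing".
    if speaker_energy >= speaker_threshold:
        return None
    indices = range(len(mic_energies))
    if any(e > mic_threshold for e in mic_energies):
        # Local speech present: pick the largest SNR = Ps / Pn.
        eps = 1e-12  # assumed guard against a zero noise estimate
        return max(indices,
                   key=lambda i: mic_energies[i] / (noise_energies[i] + eps))
    # No local speech: pick the microphone with the smallest noise energy.
    return min(indices, key=lambda i: noise_energies[i])


# Speaker silent, someone talking: SNRs are 10, 4, 20 and 5.
speech = select_candidate(0.1, [50.0, 80.0, 20.0, 10.0],
                          [5.0, 20.0, 1.0, 2.0], 1.0, 30.0)
# Speaker silent, room quiet: the noise energies decide.
quiet = select_candidate(0.1, [3.0, 4.0, 2.0, 5.0],
                         [2.5, 1.5, 1.8, 3.0], 1.0, 30.0)
# Speaker playing: no candidate is selected.
playing = select_candidate(5.0, [50.0, 80.0, 20.0, 10.0],
                           [5.0, 20.0, 1.0, 2.0], 1.0, 30.0)
```

In the first call the loudest microphone (index 1) is not chosen, because its noise energy is also the highest; the choice follows the signal-to-noise ratio rather than the raw energy.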

In an optional embodiment, in step S105, after determining that the current primary microphone is not the same as the candidate primary microphone and before switching the candidate primary microphone to the primary microphone, the method for switching the primary microphone further includes: updating the count corresponding to the candidate main microphone to be the current count plus one; if the updated count reaches a threshold, switching the candidate main microphone to the main microphone; and if the updated count is smaller than the threshold value, returning to execute the steps of collecting audio data used for playing by the loudspeaker to obtain first audio data, collecting the audio data collected by each microphone to obtain second audio data corresponding to each microphone, and obtaining noise frequency domain energy corresponding to each microphone.

Specifically, the switching of the primary microphone may be performed by counting the candidate primary microphones. That is, after selecting a candidate main microphone each time and determining that the current main microphone is different from the candidate main microphone, the count corresponding to the candidate main microphone selected each time is updated to be the current count plus one, and if the updated count of the candidate main microphone selected currently reaches the threshold, the candidate main microphone selected currently is switched to be the main microphone. And if the updated count of the currently selected candidate main microphone is smaller than the threshold value, returning to the step of collecting audio data for playing by the loudspeaker to obtain first audio data, collecting the audio data collected by each microphone to obtain second audio data corresponding to each microphone, acquiring noise frequency domain energy corresponding to each microphone, and continuously selecting one candidate main microphone from each microphone.

In the embodiment of the invention, after the current main microphone is determined to be different from the candidate main microphone and before the candidate main microphone is switched to be the main microphone, the switch is gated by counting how often each candidate main microphone has been selected. This provides de-jitter (debounce) handling for the switching of the main microphone and avoids frequent switching of the main microphone.
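The counting scheme can be sketched as follows. The class name `SwitchDebouncer`, the initial main microphone and the threshold value are hypothetical. As in the text, a count is only incremented and compared; nothing in this embodiment resets it.

```python
class SwitchDebouncer:
    """Switch the main microphone only after a candidate has been selected
    `switch_threshold` times (hypothetical helper)."""

    def __init__(self, num_mics, switch_threshold, initial_main=0):
        self.counts = [0] * num_mics
        self.switch_threshold = switch_threshold
        self.main = initial_main

    def observe(self, candidate):
        """Register one selected candidate; return True if a switch occurred."""
        if candidate == self.main:
            return False  # candidate is already the main microphone
        self.counts[candidate] += 1
        if self.counts[candidate] >= self.switch_threshold:
            self.main = candidate
            return True
        return False  # below threshold: go back and process the next frame


debouncer = SwitchDebouncer(num_mics=4, switch_threshold=3)
results = [debouncer.observe(c) for c in [2, 2, 0, 2, 2]]
```

With a 10 ms frame, a threshold of 3 would delay a switch by roughly 30 ms of agreeing frames; the appropriate value is a tuning choice that the text leaves open.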

In an optional embodiment, after step S105, the method for switching the main microphone further includes: and resetting the count corresponding to each candidate main microphone except the main microphone, returning to execute the steps of collecting audio data for playing by the loudspeaker to obtain first audio data, collecting the audio data collected by each microphone to obtain second audio data corresponding to each microphone, and obtaining noise frequency domain energy corresponding to each microphone.

Specifically, after the main microphone has been selected from the candidate main microphones, the counts accumulated by the other candidate main microphones still exist. In order to prevent frequent switching of the main microphone, the count corresponding to each candidate main microphone other than the main microphone should be reset to zero. The method then returns to the step of collecting the audio data for playing by the loudspeaker to obtain first audio data, collecting the audio data collected by each microphone to obtain second audio data corresponding to each microphone, and obtaining noise frequency domain energy corresponding to each microphone, so that a candidate main microphone can continue to be selected from the microphones for the next switching of the main microphone.

In an optional embodiment, the switching method of the main microphone further comprises: when the first frequency domain energy is larger than the first preset threshold value, resetting the count corresponding to each selected candidate main microphone, and returning to the step of collecting audio data for playing by the loudspeaker to obtain first audio data, collecting the audio data collected by each microphone to obtain second audio data corresponding to each microphone, and acquiring the noise frequency domain energy corresponding to each microphone.

Specifically, when the first frequency domain energy is greater than the first preset threshold, the speaker 12 is playing audio data, that is, it is in a playing state. To prevent the selected candidate main microphone from being the microphone 11 that receives the strongest echo signal, no candidate main microphone is selected at this time, and the count corresponding to each previously selected candidate main microphone is reset. The method then returns to the step of collecting the audio data for playing by the loudspeaker to obtain first audio data, collecting the audio data collected by each microphone to obtain second audio data corresponding to each microphone, and acquiring the noise frequency domain energy corresponding to each microphone, so that a candidate main microphone can again be selected from the microphones for the next switching of the main microphone.
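The gating on the loudspeaker energy might be sketched as follows. The function name `gate_on_speaker` and the numeric values are assumptions for illustration. The case in which the first frequency domain energy exactly equals the threshold is not addressed above; in this sketch it neither resets the counts nor permits selection.

```python
from typing import Dict


def gate_on_speaker(first_energy: float, first_threshold: float,
                    counts: Dict[int, int]) -> bool:
    """Return True if a candidate main microphone may be selected this frame.

    When the loudspeaker energy exceeds the threshold, every count is reset
    to zero and selection is skipped.
    """
    if first_energy > first_threshold:
        for mic in counts:
            counts[mic] = 0
    # Selection is allowed only when the energy is strictly below the
    # threshold (assumption: equality is treated as "do not select").
    return first_energy < first_threshold


counts = {1: 2, 2: 1}
playing = gate_on_speaker(5.0, 1.0, counts)   # loudspeaker active
silent = gate_on_speaker(0.2, 1.0, counts)    # loudspeaker quiet
```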

An embodiment of the present invention further provides a switching device for a main microphone, as shown in fig. 3, including:

An acquiring unit 21, configured to acquire audio data for playing by a speaker to obtain first audio data, acquire audio data acquired by each microphone to obtain second audio data corresponding to each microphone, and acquire noise frequency domain energy corresponding to each microphone; the detailed description of the specific implementation manner is given in step S101 of the above method embodiment, and is not repeated herein.

A preprocessing unit 22, configured to preprocess the first audio data and each second audio data to obtain corresponding first frequency domain data and each second frequency domain data; the detailed description of the specific implementation manner is given in step S102 of the above method embodiment, and is not repeated herein.

A first calculating unit 23, configured to calculate first frequency domain energy corresponding to the first frequency domain data and second frequency domain energy corresponding to each second frequency domain data; the detailed description of the specific implementation manner is given in step S103 of the above method embodiment, and is not repeated herein.

A second calculating unit 24, configured to select a candidate main microphone from the microphones according to the noise frequency domain energy and the second frequency domain energy corresponding to each microphone when the first frequency domain energy is smaller than a first preset threshold; the detailed description of the specific implementation manner is given in step S104 of the above method embodiment, and is not repeated herein.

A switching unit 25, configured to switch the candidate main microphone to the main microphone when it is determined that the current main microphone is different from the candidate main microphone; the detailed description of the specific implementation manner is given in step S105 of the above method embodiment, and is not repeated herein.
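To make the quantities handled by the acquiring, preprocessing and first calculating units concrete, the following is a minimal sketch and not the embodiment's implementation. It assumes that preprocessing reduces to a plain discrete Fourier transform of each frame (any windowing or filtering is omitted) and that frequency domain energy is the sum of squared spectral magnitudes. The noise estimate follows the rule recited in the claims, namely the minimum frame energy over the latest preset number of frames. The names `frequency_domain_energy` and `NoiseFloor` are hypothetical.

```python
import cmath
from collections import deque
from typing import Deque, Optional, Sequence


def frequency_domain_energy(frame: Sequence[float]) -> float:
    """Sum of squared DFT magnitudes of one frame (naive O(n^2) DFT)."""
    n = len(frame)
    energy = 0.0
    for k in range(n):
        x_k = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
        energy += abs(x_k) ** 2
    return energy


class NoiseFloor:
    """Tracks the minimum frame energy over the latest `size` frames."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.recent: Deque[float] = deque(maxlen=size)

    def update(self, frame: Sequence[float]) -> Optional[float]:
        """Add one frame; return the noise energy once enough frames exist."""
        self.recent.append(frequency_domain_energy(frame))
        if len(self.recent) < self.size:
            # Fewer frames than the preset number: not covered by this sketch.
            return None
        return min(self.recent)


# Hypothetical 4-sample frames, one per call, for a single microphone.
floor = NoiseFloor(size=2)
first = floor.update([1.0, 0.0, 0.0, 0.0])
second = floor.update([2.0, 0.0, 0.0, 0.0])
third = floor.update([3.0, 0.0, 0.0, 0.0])
```

A practical device would use a fast Fourier transform rather than the quadratic-time transform shown here; the naive form is used only to keep the example self-contained.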

According to the switching device for the main microphone provided by the embodiment of the invention, the sound played by the loudspeaker when playing audio data is picked up by every microphone. The device therefore calculates the first frequency domain energy corresponding to the loudspeaker and selects a candidate main microphone only when the first frequency domain energy is smaller than the first preset threshold value, that is, when the loudspeaker is not playing audio data. This prevents the selected candidate main microphone, and hence the main microphone after switching, from being the microphone with the strongest echo signal. In addition, the noise frequency domain energy corresponding to each microphone is acquired, the second frequency domain energy corresponding to each microphone is calculated, and a candidate main microphone is selected from the microphones according to the noise frequency domain energy and the second frequency domain energy corresponding to each microphone. A candidate main microphone can thus be selected when the microphones are in different states, the selected candidate main microphone has the highest sound quality, and neither the selected candidate main microphone nor the main microphone after switching is the microphone with the largest noise.
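One possible form of the selection performed by the second calculating unit, consistent with the two cases recited in claim 4, is sketched below. The signal-to-noise ratio is assumed here to be the ratio of the second frequency domain energy to the noise frequency domain energy; the text does not fix a formula. The function name `select_candidate`, the small divisor guard, and the sample values are likewise assumptions.

```python
from typing import Sequence


def select_candidate(noise_energy: Sequence[float],
                     second_energy: Sequence[float],
                     second_threshold: float) -> int:
    """Return the index of the candidate main microphone.

    Intended to be called only when the loudspeaker energy is below the
    first preset threshold.
    """
    mics = range(len(second_energy))
    if any(e > second_threshold for e in second_energy):
        # At least one microphone picks up sound above the second threshold:
        # choose the microphone with the largest signal-to-noise ratio.
        # The guard 1e-12 avoids division by zero (an assumption).
        return max(mics,
                   key=lambda i: second_energy[i] / max(noise_energy[i], 1e-12))
    # No microphone exceeds the second threshold: choose the microphone with
    # the smallest noise energy. Energies equal to the threshold fall into
    # this branch, which is a choice made for this sketch.
    return min(mics, key=lambda i: noise_energy[i])


# Three hypothetical microphones.
talking = select_candidate([1.0, 4.0, 2.0], [10.0, 20.0, 30.0], 5.0)
quiet = select_candidate([3.0, 1.0, 2.0], [0.1, 0.2, 0.3], 5.0)
```

In the first call the ratios are 10, 5 and 15, so microphone 2 is chosen even though microphone 1 has the second-largest raw energy; in the second call the microphone with the lowest noise energy, microphone 1, is chosen.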

Based on the same inventive concept as the switching method of the main microphone in the foregoing embodiment, the present invention further provides a microphone-speaker integrated device, as shown in fig. 4, including: a processor 31 and a memory 32, wherein the processor 31 and the memory 32 may be connected by a bus or other means, and the connection by the bus is illustrated in fig. 4 as an example.

The processor 31 may be a Central Processing Unit (CPU). The processor 31 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.

The memory 32, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the switching method of the primary microphone in the embodiment of the present invention. The processor 31 executes various functional applications and data processing of the processor by running non-transitory software programs, instructions and modules stored in the memory 32, that is, implements the switching method of the main microphone in the above method embodiment.

The memory 32 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 31, and the like. Further, the memory 32 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 32 may optionally include memory located remotely from the processor 31, and these remote memories may be connected to the processor 31 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

One or more of the modules described above are stored in the memory 32 and, when executed by the processor 31, perform the method of switching the primary microphone as in the embodiment shown in fig. 2.

The details of the microphone and speaker integrated device may be understood with reference to the corresponding description and effects in the embodiment shown in fig. 2, and are not described herein again.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory, a Hard Disk Drive (HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kinds described above.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable information processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable information processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable information processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable information processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A method of switching a primary microphone, comprising:

acquiring audio data for playing by a loudspeaker to obtain first audio data, acquiring the audio data acquired by each microphone to obtain second audio data corresponding to each microphone, and acquiring noise frequency domain energy corresponding to each microphone;

preprocessing the first audio data and each second audio data to obtain corresponding first frequency domain data and each second frequency domain data;

calculating first frequency domain energy of the first frequency domain data and second frequency domain energy of each second frequency domain data;

when the first frequency domain energy is smaller than a first preset threshold value, selecting a candidate main microphone from the microphones according to the noise frequency domain energy and the second frequency domain energy corresponding to the microphones;

and when the current main microphone is determined to be different from the candidate main microphone, switching the candidate main microphone into a main microphone.

2. The switching method of a main microphone according to claim 1, wherein:

when the audio data for playing by the loudspeaker is collected, the number of frames of audio data collected is one;

when the audio data collected by each microphone is collected, the number of frames of audio data collected is one.

3. The method of claim 2, wherein the obtaining noise frequency domain energy corresponding to each microphone comprises:

for each microphone:

if the number of frames of the audio data acquired by the microphone is greater than or equal to the preset number, determining the latest preset number of frames among the acquired frames, preprocessing each of these frames of audio data to obtain corresponding third frequency domain data, calculating the third frequency domain energy of each third frequency domain data, and taking the minimum third frequency domain energy as the noise frequency domain energy corresponding to the microphone.

4. The method of claim 1, wherein when the first frequency domain energy is smaller than a first predetermined threshold, selecting a candidate primary microphone from the microphones according to the noise frequency domain energy and the second frequency domain energy corresponding to the microphones comprises:

when the first frequency domain energy is smaller than a first preset threshold value and at least one second frequency domain energy is larger than a second preset threshold value, calculating the signal-to-noise ratio corresponding to each microphone according to the noise frequency domain energy corresponding to each microphone and the second frequency domain energy; selecting the microphone with the largest signal-to-noise ratio as a candidate main microphone; or

when the first frequency domain energy is smaller than a first preset threshold value and each second frequency domain energy is smaller than a second preset threshold value, selecting the microphone with the minimum noise frequency domain energy as a candidate main microphone.

5. The method of any one of claims 1-4, wherein after the determining that the current primary microphone is not the same as the candidate primary microphone and before switching the candidate primary microphone to the primary microphone, the method further comprises:

updating the count corresponding to the candidate main microphone to be the current count plus one;

if the updated count reaches a threshold, switching the candidate main microphone to a main microphone; and if the updated count is smaller than the threshold value, returning to execute the steps of collecting the audio data for playing by the loudspeaker to obtain first audio data, collecting the audio data collected by each microphone to obtain second audio data corresponding to each microphone, and obtaining noise frequency domain energy corresponding to each microphone.

6. The method of switching a primary microphone according to claim 5, further comprising, after switching the candidate primary microphone to a primary microphone:

resetting the corresponding count of each candidate main microphone except the main microphone, returning to execute the steps of collecting the audio data for playing by the loudspeaker to obtain first audio data, collecting the audio data collected by each microphone to obtain second audio data corresponding to each microphone, and obtaining the noise frequency domain energy corresponding to each microphone.

7. The method of switching a primary microphone according to claim 5, further comprising:

when the first frequency domain energy is larger than a first preset threshold value, resetting the count corresponding to each selected candidate main microphone, returning to execute the steps of collecting audio data used for playing by a loudspeaker to obtain first audio data, collecting the audio data collected by each microphone to obtain second audio data corresponding to each microphone, and obtaining noise frequency domain energy corresponding to each microphone.

8. A switching device for a main microphone, comprising:

an acquiring unit, configured to acquire audio data for playing by a loudspeaker to obtain first audio data, acquire the audio data acquired by each microphone to obtain second audio data corresponding to each microphone, and acquire noise frequency domain energy corresponding to each microphone;

a preprocessing unit, configured to preprocess the first audio data and each second audio data to obtain corresponding first frequency domain data and each second frequency domain data;

a first calculating unit, configured to calculate first frequency domain energy corresponding to the first frequency domain data and second frequency domain energy corresponding to each second frequency domain data;

a second calculating unit, configured to select a candidate main microphone from the microphones according to the noise frequency domain energy and the second frequency domain energy corresponding to the microphones when the first frequency domain energy is smaller than a first preset threshold;

and a switching unit, configured to switch the candidate main microphone into a main microphone when the current main microphone is determined to be different from the candidate main microphone.

9. A microphone-speaker integrated apparatus, comprising:

at least one processor; and a memory communicatively coupled to the at least one processor;

wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform a method of switching a primary microphone according to any one of claims 1-7.

10. A computer-readable storage medium storing computer instructions for causing a computer to perform the method of switching a primary microphone according to any one of claims 1-7.