CN110310655B - Microphone signal processing method, device, equipment and storage medium - Google Patents
Microphone signal processing method, device, equipment and storage medium
- Publication number
- CN110310655B CN201910324799.2A
- Authority
- CN
- China
- Prior art keywords
- signal
- processing
- voice
- module
- noise reduction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Circuit For Audible Band Transducer (AREA)
- Soundproofing, Sound Blocking, And Sound Damping (AREA)
Abstract
The invention provides a microphone signal processing method, apparatus, device and storage medium. After linear echo cancellation and beamforming, the signal is split into three paths. The first path undergoes first noise reduction and then first nonlinear echo suppression, after which voice presence detection is performed to obtain a voice presence detection result X. The second path undergoes second noise reduction and then first automatic gain control to obtain a voice recognition signal Y for voice recognition. X and Y are combined into two sound channels for use by the speech recognition APP. The third path undergoes third noise reduction, then second nonlinear echo suppression to further suppress residual echo, and then second automatic gain control to obtain a voice application signal Z for recording or communication APPs. Because the speech recognition APP and other voice APPs have different requirements, the invention branches the signal into three paths; the structure is flexible, and the parameters and algorithms used to process the two kinds of signal can be adjusted independently without affecting each other.
Description
Technical Field
The present invention relates to the field of speech signal processing, and more particularly, to a method, an apparatus, a device, and a storage medium for processing a microphone signal.
Background
Speech recognition applications require some pre-processing of the microphone signal, such as beamforming, acoustic echo cancellation (AEC), noise reduction (NR), automatic gain control (AGC), dereverberation (DR) and voice presence detection (VAD). In an operating system, speech recognition software is usually a general-purpose APP that acquires the voice signal directly from the sound card device and performs recognition. Beamforming, echo cancellation, dereverberation and similar processing, however, depend heavily on the hardware design and are poorly suited to being implemented separately inside application software: each application would have to implement them on its own and repeat the same calculations, some of the required information is not even available to the application, and the result is not portable. Some prior art solutions therefore implement the processing in the firmware of the microphone module, with the drawbacks that the computational load is large and the module cost is high. Others implement it in the driver, with the drawback that resources are restricted there, for example floating-point operations, locks, task scheduling and sleeping.
Disclosure of Invention
The invention aims to solve the problems in the prior art and provides a microphone signal processing method, apparatus, device and storage medium.
In a first aspect, an embodiment of the present invention provides a microphone signal processing method, including the following steps:
S1: performing linear echo cancellation (AEC) on the multi-path microphone signals together with the reference signal, to cancel the loudspeaker sound picked up by the microphones;
S2: performing beamforming on the multi-path microphone signals after linear echo cancellation, and splitting the beamformed signal into three paths, wherein
the first path undergoes first noise reduction and then first nonlinear echo suppression to further suppress residual echo, after which voice presence detection (VAD) is performed to obtain a voice presence detection result X;
the second path undergoes second noise reduction and then first automatic gain control (AGC) to obtain a voice recognition signal Y for voice recognition;
the voice presence detection result X and the voice recognition signal Y are combined into two sound channels and provided to the speech recognition APP;
two different noise reduction algorithms (first and second) are used here because excessive or poorly performed noise reduction on the signal used for speech recognition severely degrades the recognition rate, whereas the noise reduction for VAD must be strong, otherwise the VAD cannot work properly. Of these two paths, nonlinear echo suppression is applied only on the VAD path because it lowers the speech recognition rate but is very helpful for VAD. Separating the two paths secures both the recognition performance and the VAD performance, makes debugging and optimization easier, and keeps the parameters of the two paths from being coupled.
The third path undergoes third noise reduction, then second nonlinear echo suppression to further suppress residual echo, and then second automatic gain control to obtain a voice application signal Z for recording or communication APPs.
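As a rough illustration of the branching described above, the following sketch wires one beamformed frame through the three paths. It is a structural sketch only: `run_branches`, `scale` and `energy_vad` are hypothetical names, and the toy stages merely stand in for the noise reduction, nonlinear echo suppression, AGC and VAD algorithms, which the text does not specify.

```python
from typing import Callable, List, Tuple

Frame = List[float]
Stage = Callable[[Frame], Frame]


def run_branches(beamformed: Frame,
                 nr1: Stage, nes1: Stage, vad: Callable[[Frame], int],
                 nr2: Stage, agc1: Stage,
                 nr3: Stage, nes2: Stage, agc2: Stage) -> Tuple[int, Frame, Frame]:
    """Fan one beamformed frame out into the three paths and return (X, Y, Z)."""
    x = vad(nes1(nr1(beamformed)))    # path 1: NR -> nonlinear echo suppression -> VAD
    y = agc1(nr2(beamformed))         # path 2: NR -> AGC (no nonlinear echo suppression)
    z = agc2(nes2(nr3(beamformed)))   # path 3: NR -> nonlinear echo suppression -> AGC
    return x, y, z


def scale(gain: float) -> Stage:
    """Toy placeholder stage (assumption): multiplies every sample by a fixed gain."""
    return lambda frame: [gain * s for s in frame]


def energy_vad(threshold: float) -> Callable[[Frame], int]:
    """Toy placeholder VAD (assumption): 1 if the mean power exceeds the threshold."""
    return lambda frame: int(sum(s * s for s in frame) / len(frame) > threshold)


# Each path receives its own stage objects, so tuning one path cannot affect another.
x, y, z = run_branches([0.5, -0.5],
                       scale(0.5), scale(1.0), energy_vad(0.01),
                       scale(0.9), scale(2.0),
                       scale(0.8), scale(1.0), scale(1.5))
```

Passing the stages in as separate arguments mirrors the point made in the text: each path has its own parameters and algorithms.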
Preferably, in step S1, the reference signal is obtained from the loudspeaker, or from the sound card driver/voice playback software.
Preferably, in step S1, an adaptive filter is used to perform the linear echo cancellation on each microphone signal together with the reference signal.
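The text only states that an adaptive filter is used. As one possible illustration, the sketch below uses a normalized least-mean-squares (NLMS) filter, which is an assumption and not necessarily the filter intended here; the function name, tap count and step size are likewise hypothetical. For several microphones, the same routine would be run once per microphone channel against the shared reference signal.

```python
import random
from typing import List


def nlms_echo_cancel(mic: List[float], ref: List[float],
                     taps: int = 8, mu: float = 0.5, eps: float = 1e-8) -> List[float]:
    """Subtract an adaptively estimated loudspeaker echo from one microphone channel."""
    w = [0.0] * taps       # adaptive estimate of the loudspeaker-to-microphone path
    hist = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        hist = [x] + hist[:-1]
        echo_est = sum(wi * hi for wi, hi in zip(w, hist))
        e = d - echo_est                          # residual after removing the estimate
        norm = sum(h * h for h in hist) + eps     # reference power normalizes the step
        step = mu * e / norm
        w = [wi + step * hi for wi, hi in zip(w, hist)]
        out.append(e)
    return out


# Synthetic check: the microphone hears only the loudspeaker through a two-tap path.
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.6 * ref[n] + (0.3 * ref[n - 1] if n > 0 else 0.0) for n in range(len(ref))]
residual = nlms_echo_cancel(mic, ref)
```

In this synthetic case the echo path is perfectly linear, so the residual decays towards zero; with a real loudspeaker some echo remains, which is why the later nonlinear echo suppression stages exist.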
Preferably, in step S2, beamforming the multi-path microphone signals requires the angle of arrival (DOA), which is calculated according to a preset DOA estimation method.
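The DOA estimation method is left open by the text. One common approach for a pair of microphones, shown here purely as an assumed example, is to find the inter-microphone delay from the peak of the cross-correlation and convert it to an angle using the microphone spacing. The function names, the 16 kHz sample rate and the 0.1 m spacing are illustrative assumptions.

```python
import math
import random
from typing import List


def estimate_delay(a: List[float], b: List[float], max_lag: int) -> int:
    """Return the lag in samples by which b trails a, taken at the cross-correlation peak."""
    best_lag, best_score = 0, float("-inf")
    n = len(a)
    for lag in range(-max_lag, max_lag + 1):
        score = sum(a[i] * b[i + lag] for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def doa_degrees(lag: int, fs: float, spacing_m: float, c: float = 343.0) -> float:
    """Convert a sample delay to an angle from broadside under a far-field assumption."""
    s = max(-1.0, min(1.0, (lag / fs) * c / spacing_m))   # clamp before asin
    return math.degrees(math.asin(s))


rng = random.Random(1)
mic_a = [rng.uniform(-1.0, 1.0) for _ in range(256)]
mic_b = [0.0, 0.0, 0.0] + mic_a[:-3]      # same signal arriving 3 samples later
lag = estimate_delay(mic_a, mic_b, max_lag=8)
angle = doa_degrees(lag, fs=16000.0, spacing_m=0.1)
```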
Preferably, in step S2, the voice presence detection result X and the voice recognition signal Y are combined into two sound channels as follows: X is placed alone on one channel and Y is placed alone on the other channel. For example, the left channel stores the voice signal and the right channel stores the VAD information, where 0 indicates no voice and non-zero indicates voice.
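A minimal sketch of this two-channel layout follows, assuming 16-bit little-endian interleaved stereo and one VAD value per sample; the sample format, the per-sample granularity and the function names are assumptions, since the text only fixes the 0 / non-zero convention.

```python
import struct
from typing import List, Tuple


def pack_stereo(y: List[int], x: List[int]) -> bytes:
    """Interleave speech samples Y (left) with VAD values X (right) as 16-bit stereo."""
    frames = []
    for sample, flag in zip(y, x):
        frames.append(struct.pack("<hh", sample, 1 if flag else 0))
    return b"".join(frames)


def unpack_stereo(data: bytes) -> Tuple[List[int], List[int]]:
    """Split the stereo stream back into the speech signal and the VAD flags."""
    y, x = [], []
    for left, right in struct.iter_unpack("<hh", data):
        y.append(left)
        x.append(1 if right != 0 else 0)   # 0 = no voice, non-zero = voice
    return y, x


stream = pack_stereo([100, -200, 300], [0, 1, 1])
speech, flags = unpack_stereo(stream)
```

A speech recognition APP that opens the port as an ordinary stereo input could recover both pieces of information this way without any extra interface.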
Preferably, in step S2, the voice presence detection result X and the voice recognition signal Y are combined as follows: one bit of the voice recognition signal Y is used to store the voice presence detection result X. For example, the least significant bit of Y stores X, where 0 indicates no voice and 1 indicates voice. A typical voice signal is 16-bit or 24-bit, so overwriting the least significant bit with 0 or 1 produces a change that is buried in the noise and has almost no effect on the recognition rate.
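The least-significant-bit variant can be sketched as follows for signed integer samples; the function names are hypothetical. Overwriting the lowest bit changes a sample by at most one quantization step, which for 16-bit audio is roughly 90 dB below full scale.

```python
def embed_vad(sample: int, voiced: bool) -> int:
    """Overwrite the least significant bit of a signed PCM sample with the VAD flag."""
    return (sample & ~1) | (1 if voiced else 0)


def extract_vad(sample: int) -> int:
    """Read the VAD flag back: 1 means voice, 0 means no voice."""
    return sample & 1


marked = [embed_vad(s, v) for s, v in zip([1000, 1001, -3, -32768], [True, False, False, True])]
recovered = [extract_vad(s) for s in marked]
```

Because Python integers behave like two's-complement values under `&` and `|`, the same expressions work for negative samples and never leave the 16-bit range.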
Preferably, the multi-path microphone signals are acquired from the multi-microphone hardware through the sound card driver and sent to a signal processing service program. The signal processing service program processes them according to the above method and writes the processed signals to a virtual sound card driver, which emulates multiple audio input ports that provide the processed microphone signals to the speech recognition APP and to other APPs respectively. For example, the audio stream formed by combining the voice presence detection result X and the voice recognition signal Y is provided to the speech recognition APP, and the audio stream of the voice application signal Z is provided to other APPs such as recording and communication APPs.
The architecture of a signal processing service program plus a virtual sound card driver is adopted for the following reasons:
1. it is highly general: the upper-layer interface is uniform, each APP needs no processing of its own, and repeated calculation is avoided;
2. it is highly independent: the whole processing method runs inside the signal processing service program with few development restrictions, and the algorithms and code of the service program can be debugged, updated and deployed independently;
3. the signal processing runs as an application-level service program, so development is easier, resource restrictions are fewer, and debugging is convenient;
4. VAD is placed together with the signal processing, so more information is available to it, such as the reference signal and various intermediate data of the signal processing, and using this information improves the VAD result.
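The division of roles can be pictured with a toy in-process model. This is not a driver: `VirtualSoundCard`, `publish` and the port names are hypothetical, and a real virtual sound card would be implemented with the audio driver framework of the operating system in use.

```python
from collections import deque
from typing import Deque, Dict, Iterable


class VirtualSoundCard:
    """Toy in-process stand-in for a virtual sound card with named input ports."""

    def __init__(self, port_names: Iterable[str]) -> None:
        self.ports: Dict[str, Deque[bytes]] = {name: deque() for name in port_names}

    def write(self, port: str, chunk: bytes) -> None:
        """Called by the signal processing service to store processed audio."""
        self.ports[port].append(chunk)

    def read(self, port: str) -> bytes:
        """Called by an APP; returns the next chunk, or empty bytes if none is queued."""
        queue = self.ports[port]
        return queue.popleft() if queue else b""


def publish(card: VirtualSoundCard, xy_stereo: bytes, z_mono: bytes) -> None:
    """Route the combined X+Y stream and the Z stream to their separate ports."""
    card.write("recognition", xy_stereo)   # read by the speech recognition APP
    card.write("application", z_mono)      # read by recording or communication APPs


card = VirtualSoundCard(["recognition", "application"])
publish(card, b"\x01\x02\x03\x04", b"\x05\x06")
```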
In a second aspect, an embodiment of the present invention provides a microphone signal processing apparatus, including:
a linear echo cancellation module: configured to perform linear echo cancellation on the multi-path microphone signals together with the reference signal, to cancel the loudspeaker sound picked up by the microphones;
a beamforming module: configured to perform beamforming on the multi-path microphone signals output by the linear echo cancellation module;
a first noise reduction module: configured to perform noise reduction on one path of the beamformed signal;
a first nonlinear echo suppression module: configured to perform nonlinear echo suppression on the signal output by the first noise reduction module;
a voice presence detection module: configured to perform voice presence detection on the signal output by the first nonlinear echo suppression module to obtain a voice presence detection result X;
a second noise reduction module: configured to perform noise reduction on another path of the beamformed signal;
a first automatic gain control module: configured to perform automatic gain control on the signal output by the second noise reduction module to obtain a voice recognition signal Y for voice recognition;
a signal merging module: configured to combine the voice presence detection result X and the voice recognition signal Y into left and right sound channels that are provided to the speech recognition APP;
a third noise reduction module: configured to perform noise reduction on the third path of the beamformed signal;
a second nonlinear echo suppression module: configured to perform nonlinear echo suppression on the signal output by the third noise reduction module;
a second automatic gain control module: configured to perform second automatic gain control on the signal output by the second nonlinear echo suppression module to obtain a voice application signal Z for recording or communication APPs.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of any one of the methods described above.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, which program, when executed by a processor, performs the steps of any one of the methods described above.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
1. in view of the different requirements of the speech recognition APP and other voice APPs, the signal is branched into three paths: one path performs voice presence detection, and the other paths perform voice signal processing comprising noise reduction, nonlinear echo suppression and automatic gain control; the parameters and algorithms of the three signal processing parts can be adjusted independently without affecting one another;
2. the voice presence detection result X is mixed directly into the voice recognition signal Y, so no additional channel is needed to provide the VAD information; this is easy to implement and requires no change to the architecture or structure of the original system.
Drawings
Fig. 1 is a flowchart of a microphone signal processing method according to embodiment 1 of the present invention.
Fig. 2 is a schematic diagram of a left channel storing a voice signal and a right channel storing VAD information according to embodiment 1 of the present invention.
Fig. 3 is a schematic diagram of a microphone signal processing apparatus according to embodiment 2 of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
the technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
As shown in fig. 1, an embodiment of the present invention provides a microphone signal processing method, including the following steps:
S1: performing linear echo cancellation (AEC) on the multi-path microphone signals together with the reference signal, to cancel the loudspeaker sound picked up by the microphones;
S2: performing beamforming on the multi-path microphone signals after linear echo cancellation, and splitting the beamformed signal into three paths, wherein
the first path undergoes first noise reduction and then first nonlinear echo suppression to further suppress residual echo. The nonlinear echo suppression also requires the reference signal of step S1. Linear echo cancellation usually cannot completely cancel the loudspeaker sound picked up by the microphones, so the residual echo is suppressed further to make voice presence detection (VAD) more reliable; voice presence detection is then performed to obtain a voice presence detection result X;
the second path undergoes second noise reduction and then first automatic gain control (AGC) to obtain a voice recognition signal Y for voice recognition;
the voice presence detection result X and the voice recognition signal Y are combined into two sound channels and provided to the speech recognition APP;
two different noise reduction algorithms (first and second) are used here because excessive or poorly performed noise reduction on the signal used for speech recognition severely degrades the recognition rate, whereas the noise reduction for VAD must be strong, otherwise the VAD cannot work properly. Of these two paths, nonlinear echo suppression is applied only on the VAD path because it lowers the speech recognition rate but is very helpful for VAD. Separating the two paths secures both the recognition performance and the VAD performance, makes debugging and optimization easier, and keeps the parameters of the two paths from being coupled.
Executing a noise reduction algorithm requires a noise estimate, which is calculated according to a preset noise estimation method. A conventional noise estimation method may be used here.
The third path undergoes third noise reduction, then second nonlinear echo suppression to further suppress residual echo, and then second automatic gain control to obtain a voice application signal Z for recording or communication APPs.
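The noise estimation method needed by the noise reduction stages is not prescribed. As one conventional possibility, assumed here for illustration, the noise floor can be tracked by recursively averaging the power of frames that look like noise, and a floored gain can then be derived from it. The names and constants below are hypothetical; the `floor` parameter is the kind of knob that could be set lower (stronger suppression) on the VAD path and higher (gentler) on the recognition path.

```python
from typing import List


def track_noise_floor(frame_powers: List[float],
                      alpha: float = 0.9, rise: float = 1.5) -> List[float]:
    """Recursively average frame power, skipping frames well above the current estimate."""
    noise = frame_powers[0]           # assumes the stream starts with noise only
    history = []
    for p in frame_powers:
        if p <= rise * noise:         # frame looks like noise: update the estimate
            noise = alpha * noise + (1.0 - alpha) * p
        history.append(noise)         # louder frames (likely speech) leave it unchanged
    return history


def suppression_gain(power: float, noise: float, floor: float = 0.1) -> float:
    """Power-subtraction style gain, limited from below to avoid over-suppression."""
    if power <= 0.0:
        return floor
    return max(floor, 1.0 - noise / power)


powers = [1.0] * 5 + [100.0] * 3 + [1.0] * 5   # noise, a loud burst, noise again
floor_track = track_noise_floor(powers)
gains = [suppression_gain(p, n) for p, n in zip(powers, floor_track)]
```

A known limitation of this simple rule is that it cannot follow a noise level that rises abruptly by more than the `rise` factor; practical estimators add a recovery mechanism for that case.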
In step S1, the reference signal is obtained from the loudspeaker, or from the sound card driver/voice playback software.
In step S1, an adaptive filter is used to perform the linear echo cancellation on each microphone signal together with the reference signal.
In step S2, beamforming the multi-path microphone signals requires the angle of arrival (DOA), which is calculated according to a preset DOA estimation method.
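To show why beamforming needs the DOA, the sketch below implements a delay-and-sum beamformer, one of the simplest beamforming schemes and an assumption here, since the embodiment does not name one. It presumes the DOA has already been converted into an integer sample delay per channel; `delay_and_sum` and its arguments are hypothetical names.

```python
from typing import List


def delay_and_sum(channels: List[List[float]], delays: List[int]) -> List[float]:
    """Time-align each channel by its known delay and average the aligned channels.

    delays[k] is the number of samples by which channel k lags the reference
    channel for the chosen direction; reading ahead by that amount undoes the lag.
    """
    n = len(channels[0])
    out = []
    for i in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            j = i + d
            if 0 <= j < n:            # samples outside the frame count as zero
                acc += ch[j]
        out.append(acc / len(channels))
    return out


src = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 0.0]
ch0 = src
ch1 = [0.0, 0.0] + src[:-2]           # the same wavefront reaches microphone 1 two samples later
steered = delay_and_sum([ch0, ch1], [0, 2])
```

Sound from the steered direction adds coherently, while sound from other directions is misaligned and partially cancels in the average.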
In step S2, the voice presence detection result X and the voice recognition signal Y are combined into two sound channels as follows: X is placed alone on one channel and Y is placed alone on the other channel. As shown in fig. 2, the left channel stores the voice signal and the right channel stores the VAD information, where 0 indicates no voice and non-zero indicates voice.
In step S2, the voice presence detection result X and the voice recognition signal Y may alternatively be combined as follows: one bit of the voice recognition signal Y is used to store the voice presence detection result X. For example, the least significant bit of Y stores X, where 0 indicates no voice and 1 indicates voice. A typical voice signal is 16-bit or 24-bit, so overwriting the least significant bit with 0 or 1 produces a change that is buried in the noise and has almost no effect on the recognition rate.
Preferably, the multi-path microphone signals are acquired from the multi-microphone hardware through the sound card driver and sent to a signal processing service program. The signal processing service program processes them according to the above method and writes the processed signals to a virtual sound card driver, which emulates multiple audio input ports that provide the processed microphone signals to the speech recognition APP and to other APPs respectively. For example, the audio stream formed by combining the voice presence detection result X and the voice recognition signal Y is provided to the speech recognition APP, and the audio stream of the voice application signal Z is provided to other APPs such as recording and communication APPs.
The architecture of a signal processing service program plus a virtual sound card is adopted for the following reasons:
1. it is highly general: the upper-layer interface is uniform, each APP needs no processing of its own, and repeated calculation is avoided;
2. it is highly independent: the algorithms and code of the signal processing service program can be debugged, updated and deployed independently;
3. the signal processing runs as an application-level service program, so development is easier, resource restrictions are fewer, and debugging is convenient;
4. VAD is placed together with the signal processing, so more information is available to it, such as the reference signal and various intermediate data of the signal processing, and using this information improves the VAD result.
The scheme of this embodiment can be used in a video conferencing machine or an early-education machine. To support recording, remote education, voice control and similar functions, the machine needs a microphone input and a loudspeaker for sound output. The long pickup distance required and the interference from the loudspeaker signal can seriously degrade recording and speech recognition. A pre-processing module for the microphone signal is therefore needed to remove the loudspeaker echo and the environmental noise contained in the microphone signal and to adjust the signal to a suitable amplitude before it is sent to the recording software or the speech recognition module. In addition, so that the microphone signal is not sent to the speech recognition module when no voice is present, VAD is needed to detect whether a voice signal is currently present, and the microphone data is sent to the speech recognition module only when it is. The speech recognition module and the recording software can then work independently at the user application level without being concerned with the speech signal processing. This arrangement allows a very low-cost microphone module (one with no signal processing) to be used, with the signal processing performed on the main CPU of the system.
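On the consuming side, gating the recognizer with the VAD result might look like the sketch below. The frame length, the rule that a frame counts as voiced if any of its flags is set, and the name `voiced_frames` are all assumptions made for illustration.

```python
from typing import Iterator, List


def voiced_frames(samples: List[int], flags: List[int], frame_len: int) -> Iterator[List[int]]:
    """Yield only the frames in which at least one VAD flag is set."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        if any(flags[start:start + frame_len]):
            yield samples[start:start + frame_len]   # forward this frame to recognition


samples = [10, 11, 12, 13, 14, 15, 16, 17]
flags = [0, 0, 0, 0, 1, 1, 0, 0]
to_recognizer = list(voiced_frames(samples, flags, frame_len=4))
```

Only the second frame is forwarded here; the first, with no voice detected, never reaches the recognition module.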
Example 2
As shown in fig. 3, embodiment 2 of the present invention provides a microphone signal processing apparatus, including:
a linear echo cancellation module: configured to perform linear echo cancellation on the multi-path microphone signals together with the reference signal, to cancel the loudspeaker sound picked up by the microphones;
a beamforming module: configured to perform beamforming on the multi-path microphone signals output by the linear echo cancellation module;
a first noise reduction module: configured to perform noise reduction on one path of the beamformed signal;
a first nonlinear echo suppression module: configured to perform nonlinear echo suppression on the signal output by the first noise reduction module;
a voice presence detection module: configured to perform voice presence detection on the signal output by the first nonlinear echo suppression module to obtain a voice presence detection result X;
a second noise reduction module: configured to perform noise reduction on another path of the beamformed signal;
a first automatic gain control module: configured to perform automatic gain control on the signal output by the second noise reduction module to obtain a voice recognition signal Y for voice recognition;
a signal merging module: configured to combine the voice presence detection result X and the voice recognition signal Y into left and right sound channels that are provided to the speech recognition APP;
a third noise reduction module: configured to perform noise reduction on the third path of the beamformed signal;
a second nonlinear echo suppression module: configured to perform nonlinear echo suppression on the signal output by the third noise reduction module;
a second automatic gain control module: configured to perform second automatic gain control on the signal output by the second nonlinear echo suppression module to obtain a voice application signal Z for recording or communication APPs.
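The two automatic gain control modules are described only by their function. A minimal frame-based sketch of such a module is given below, assuming a target RMS level, a gain ceiling and first-order smoothing of the gain; these parameters and the names `agc` and `rms` are hypothetical, and each AGC module would carry its own settings.

```python
import math
from typing import List


def rms(frame: List[float]) -> float:
    """Root-mean-square level of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def agc(frames: List[List[float]], target_rms: float = 0.1,
        max_gain: float = 10.0, smooth: float = 0.8) -> List[List[float]]:
    """Scale frames towards a target level with a smoothed, bounded gain."""
    gain = 1.0
    out = []
    for frame in frames:
        level = rms(frame)
        if level > 1e-9:                                   # leave the gain alone on silence
            desired = min(max_gain, target_rms / level)
            gain = smooth * gain + (1.0 - smooth) * desired
        out.append([gain * s for s in frame])
    return out


quiet = [[0.01, -0.01] * 80 for _ in range(50)]            # 50 frames at RMS 0.01
levelled = agc(quiet)
```

The smoothing keeps the gain from jumping between frames, so the output level approaches the target gradually rather than pumping.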
Example 3
Embodiment 3 of the present invention provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, it implements the steps of any one of the methods described above. In this embodiment, the processor is the control center of the computer system, and may be a processor of a physical machine or a processor of a virtual machine.
Example 4
Embodiment 4 of the present invention provides a computer-readable storage medium on which a computer program is stored; the program, when executed by a processor, performs the steps of any one of the methods described above. The computer-readable storage medium may include, but is not limited to, any type of disk, including floppy disks, optical disks, DVDs, CD-ROMs, microdrives and magneto-optical disks, as well as ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
It is clear to a person skilled in the art that the solution according to the embodiments of the invention can be implemented by means of software and/or hardware. The "unit" or "module" in the present specification means software and/or hardware capable of performing a specific function by itself or in cooperation with other components.
It should be understood that the above-described embodiments of the present invention are merely examples given to illustrate the invention clearly, and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments exhaustively here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.
Claims (10)
1. A microphone signal processing method, characterized by comprising the steps of:
S1: performing linear echo cancellation on multi-path microphone signals together with a reference signal, to cancel the loudspeaker sound picked up by the microphones;
S2: performing beamforming on the multi-path microphone signals after linear echo cancellation, and splitting the beamformed signal into three paths, wherein
the first path undergoes first noise reduction and then first nonlinear echo suppression to further suppress residual echo, after which voice presence detection is performed to obtain a voice presence detection result X;
the second path undergoes second noise reduction and then first automatic gain control to obtain a voice recognition signal Y for voice recognition;
the voice presence detection result X and the voice recognition signal Y are combined into two sound channels and provided to the speech recognition APP;
and the third path undergoes third noise reduction, then second nonlinear echo suppression to further suppress residual echo, and then second automatic gain control to obtain a voice application signal Z for recording or communication APPs.
2. The microphone signal processing method according to claim 1, wherein in step S1, the reference signal is obtained from a speaker or from sound card driver/voice playing software.
3. The microphone signal processing method according to claim 1, wherein in step S1, the adaptive filter is used to perform linear echo cancellation processing on each microphone signal and the reference signal together.
4. The method as claimed in claim 1, wherein in step S2, the arrival angle is required to be known when the multi-path microphone signal is processed by beamforming, and the arrival angle is calculated according to a predetermined arrival angle estimation method.
5. The microphone signal processing method according to claim 1, wherein in step S2, the voice presence detection result X and the voice recognition signal Y are combined into two sound channels, specifically: the voice presence detection result X is placed alone on one of the channels and the voice recognition signal Y is placed alone on the other channel.
6. The microphone signal processing method according to claim 1, wherein in step S2, the voice presence detection result X and the voice recognition signal Y are combined into two sound channels, specifically: a designated bit of the voice recognition signal Y is used to store the voice presence detection result X.
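The two combining options in claims 5 and 6 can be sketched as follows. The sketch assumes 16-bit integer PCM samples and one detection flag per sample; the choice of the first channel for X, the use of the least significant bit, and all function names are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def pack_separate_channels(x_flags: Sequence[bool], y_samples: Sequence[int]) -> List[int]:
    """Claim-5 style: interleaved frames with X (0 or 1) on one channel and Y on the other."""
    frames: List[int] = []
    for flag, sample in zip(x_flags, y_samples):
        frames.extend([1 if flag else 0, sample])
    return frames


def pack_in_lsb(x_flags: Sequence[bool], y_samples: Sequence[int]) -> List[int]:
    """Claim-6 style: overwrite the least significant bit of each Y sample with X."""
    return [(s & ~1) | (1 if f else 0) for f, s in zip(x_flags, y_samples)]


def unpack_lsb(packed: Sequence[int]) -> Tuple[List[bool], List[int]]:
    """Recover the flags and the Y samples (with the borrowed bit cleared)."""
    flags = [bool(s & 1) for s in packed]
    samples = [s & ~1 for s in packed]
    return flags, samples
```

Borrowing one low-order bit changes each sample by at most one quantization step, which is why the second option can carry X without a dedicated channel; the first option leaves Y bit-exact.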
7. The microphone signal processing method according to any one of claims 1 to 6, wherein the multi-path microphone signals are acquired from the microphone hardware by a sound card driver and sent to a signal processing service program; the signal processing service program processes them according to the method; the processed signals are stored in a virtual sound card driver; and the virtual sound card driver simulates multiple audio input ports that provide the processed microphone signals to the voice recognition APP and to other APPs, respectively.
8. A microphone signal processing apparatus, comprising:
a linear echo cancellation module: configured to perform linear echo cancellation processing on multi-path microphone signals together with a reference signal, cancelling the loudspeaker sound picked up by the microphones;
a beam forming module: configured to perform beamforming on the multi-path microphone signals output by the linear echo cancellation module;
a first noise reduction module: configured to perform noise reduction processing on one path of the beamformed signal;
a first nonlinear echo suppression module: configured to perform nonlinear echo suppression processing on the signal output by the first noise reduction module;
a voice presence detection module: configured to perform voice presence detection on the signal output by the first nonlinear echo suppression module to obtain a voice presence detection result X;
a second noise reduction module: configured to perform noise reduction processing on another path of the beamformed signal;
a first automatic gain control module: configured to perform automatic gain control on the signal output by the second noise reduction module to obtain a voice recognition signal Y for voice recognition;
a signal merging module: configured to combine the voice presence detection result X and the voice recognition signal Y into left and right sound channels provided to the voice recognition APP;
a third noise reduction module: configured to perform noise reduction processing on the third path of the beamformed signal;
a second nonlinear echo suppression module: configured to perform nonlinear echo suppression processing on the signal output by the third noise reduction module;
a second automatic gain control module: configured to perform second automatic gain control processing on the signal output by the second nonlinear echo suppression module to obtain a voice application signal Z for a recording or communication APP.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1-7 are implemented when the program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910324799.2A CN110310655B (en) | 2019-04-22 | 2019-04-22 | Microphone signal processing method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110310655A (en) | 2019-10-08 |
CN110310655B (en) | 2021-10-22 |
Family
ID=68075394
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910324799.2A Active CN110310655B (en) | 2019-04-22 | 2019-04-22 | Microphone signal processing method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110310655B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112233687B (en) * | 2020-12-10 | 2021-07-16 | 统信软件技术有限公司 | Audio noise reduction device and computing equipment |
CN113823314B (en) * | 2021-08-12 | 2022-10-28 | 北京荣耀终端有限公司 | Voice processing method and electronic equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8239194B1 (en) * | 2011-07-28 | 2012-08-07 | Google Inc. | System and method for multi-channel multi-feature speech/noise classification for noise suppression |
CN104183234A (en) * | 2013-05-28 | 2014-12-03 | 展讯通信(上海)有限公司 | Method and device for processing voice signal and achieving multi-party conversation, and communication terminal |
CN105304093A (en) * | 2015-11-10 | 2016-02-03 | 百度在线网络技术(北京)有限公司 | Signal front-end processing method used for voice recognition and device thereof |
CN107547813A (en) * | 2016-06-29 | 2018-01-05 | 深圳市巨龙科教高技术股份有限公司 | A kind of system and method for acquisition process multipath audio signal |
CN108766456A (en) * | 2018-05-22 | 2018-11-06 | 出门问问信息科技有限公司 | A kind of method of speech processing and device |
CN109074816A (en) * | 2016-06-15 | 2018-12-21 | 英特尔公司 | Far field automatic speech recognition pretreatment |
CN109660904A (en) * | 2019-02-02 | 2019-04-19 | 恒玄科技(上海)有限公司 | Headphone device, audio signal processing method and system |
Also Published As
Publication number | Publication date |
---|---|
CN110310655A (en) | 2019-10-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110097891B (en) | Microphone signal processing method, device, equipment and storage medium | |
US10535362B2 (en) | Speech enhancement for an electronic device | |
US9913022B2 (en) | System and method of improving voice quality in a wireless headset with untethered earbuds of a mobile device | |
US10269369B2 (en) | System and method of noise reduction for a mobile device | |
US9197974B1 (en) | Directional audio capture adaptation based on alternative sensory input | |
US9438985B2 (en) | System and method of detecting a user's voice activity using an accelerometer | |
US9838784B2 (en) | Directional audio capture | |
KR101171494B1 (en) | Robust two microphone noise suppression system | |
EP2715725B1 (en) | Processing audio signals | |
US8712069B1 (en) | Selection of system parameters based on non-acoustic sensor information | |
US20100290615A1 (en) | Echo canceller operative in response to fluctuation on echo path | |
US20140126746A1 (en) | Signal-separation system using a directional microphone array and method for providing same | |
EP2863392B1 (en) | Noise reduction in multi-microphone systems | |
US10978086B2 (en) | Echo cancellation using a subset of multiple microphones as reference channels | |
US20170365249A1 (en) | System and method of performing automatic speech recognition using end-pointing markers generated using accelerometer-based voice activity detector | |
KR20160099640A (en) | Systems and methods for feedback detection | |
KR20120101457A (en) | Audio zoom | |
WO2014051969A1 (en) | System and method of detecting a user's voice activity using an accelerometer | |
CN112424863A (en) | Voice perception audio system and method | |
CN110310655B (en) | Microphone signal processing method, device, equipment and storage medium | |
US20150371656A1 (en) | Acoustic Echo Preprocessing for Speech Enhancement | |
KR101982812B1 (en) | Headset and method for improving sound quality thereof | |
CN111081233B (en) | Audio processing method and electronic equipment | |
CN114008999B (en) | Acoustic echo cancellation | |
US9729967B2 (en) | Feedback canceling system and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||