CN111901704B - Audio data processing method, device, equipment and computer readable storage medium - Google Patents

Info

Publication number
CN111901704B
Authority
CN
China
Prior art keywords
audio data
processing
filtering threshold
threshold value
filtering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010552971.2A
Other languages
Chinese (zh)
Other versions
CN111901704A (en)
Inventor
沈卫民
刘祖芳
骆传伏
黄猛
王志辉
王伟林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Michoi Security Technology Co ltd
Original Assignee
Shenzhen Michoi Security Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Michoi Security Technology Co ltd
Priority to CN202010552971.2A
Publication of CN111901704A
Application granted
Publication of CN111901704B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04QSELECTING
    • H04Q5/00Selecting arrangements wherein two or more subscriber stations are connected by the same line to the exchange
    • H04Q5/24Selecting arrangements wherein two or more subscriber stations are connected by the same line to the exchange for two-party-line systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M9/00Arrangements for interconnection not involving centralised switching
    • H04M9/08Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L2021/02082Noise filtering the noise being echo, reverberation of the speech

Abstract

The invention discloses an audio data processing method, which comprises the following steps: when the first talkback terminal is in a full-duplex mode, determining a filtering threshold value corresponding to audio data currently acquired by a microphone of the first talkback terminal; performing noise suppression processing and echo cancellation processing on the audio data based on the filtering threshold value to obtain target audio data; and sending the target audio data to a second talkback terminal so that the second talkback terminal can play the target audio data. The invention also discloses an audio data processing device, equipment and a computer readable storage medium. According to the invention, the filtering threshold value is determined dynamically from the acquired audio data, so that excessive echo cancellation is avoided when the echo in the audio data is cancelled according to the dynamic filtering threshold value; normal voice in the audio data is therefore neither cancelled nor reduced in volume.

Description

Audio data processing method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of building intercom technologies, and in particular, to an audio data processing method, apparatus, device, and computer readable storage medium.
Background
Building intercom is a security system: a management system that provides two-way communication and information exchange among visitors, residents, and the property management center of a multi-storey or high-rise building, and that controls access to the safe entrance passages of the community. A visitor can call and talk with a resident through the door phone in front of the unit door downstairs, or, without a key, call the property management personnel for help in opening the unit door lock. A resident can control the opening and closing of the unit door from indoors and can operate the indoor unit to call the property management personnel. The door phone can also receive a resident's alarm signal at any time and forward it to the on-duty host to notify the community security personnel. This strengthens the security of high-rise residences, greatly facilitates residents, reduces unnecessary trips up and down the stairs, and makes communication more convenient, rapid, safe, and reliable.
At present, building intercom requires full-duplex talkback. If the audio data to be transmitted is not processed, each of the two talkback ends hears echo, which seriously degrades call quality. To reduce echo, the free open-source Speex and WebRTC technologies are often used to perform echo cancellation on the audio data to be transmitted, but excessive echo cancellation can remove part of the normal voice and lower its volume.
The above is only for the purpose of assisting understanding of the technical solution of the present invention, and does not represent an admission that the above is the prior art.
Disclosure of Invention
The main object of the invention is to provide an audio data processing method, device, equipment and computer readable storage medium, so as to solve the technical problem that excessive echo cancellation in existing talkback terminals removes part of the normal voice and lowers its volume.
In order to achieve the above object, the present invention provides an audio data processing method, including the steps of:
when the first talkback terminal is in a full-duplex mode, determining a filtering threshold value corresponding to audio data currently acquired by a microphone of the first talkback terminal;
performing noise suppression processing and echo cancellation processing on the audio data based on the filtering threshold value to obtain target audio data;
and sending the target audio data to a second talkback terminal so that the second talkback terminal can play the target audio data.
Further, in an embodiment, the step of performing noise suppression processing and echo cancellation processing on the audio data based on the filtering threshold to obtain target audio data includes:
performing noise suppression processing and echo cancellation processing on the audio data based on the filtering threshold value to obtain processed audio data;
and carrying out high-pass filtering processing on the processed audio data to obtain target audio data.
Further, in an embodiment, the step of performing high-pass filtering processing on the processed audio data to obtain the target audio data includes:
carrying out high-pass filtering processing on the processed audio data to obtain filtered audio data;
and carrying out AGC automatic gain processing on the filtered audio data to obtain target audio data.
Further, in an embodiment, the step of performing AGC automatic gain processing on the filtered audio data to obtain the target audio data includes:
carrying out AGC automatic gain processing on the filtered audio data to obtain the audio data after gain;
a phase inversion operation is performed on the gained audio data to obtain target audio data.
Further, in an embodiment, the step of determining a filtering threshold corresponding to audio data currently acquired by a microphone of the first intercom terminal includes:
performing VAD detection on audio data currently acquired by a microphone of the first talkback terminal to obtain a VAD detection result;
and determining a filtering threshold corresponding to the VAD detection result.
Further, in an embodiment, the step of determining a filtering threshold corresponding to the VAD detection result includes:
acquiring a mapping relation between a preset detection result and a preset filtering threshold value;
and determining a filtering threshold corresponding to the VAD detection result based on the mapping relation.
Further, in an embodiment, the audio data processing method further includes:
and when receiving the audio data to be played sent by the second talkback terminal, playing the audio data to be played.
Further, to achieve the above object, the present invention also provides an audio data processing apparatus comprising:
the determining module is used for determining a filtering threshold value corresponding to audio data currently acquired by a microphone of a first talkback terminal when the first talkback terminal is in a full-duplex mode;
the processing module is used for carrying out noise suppression processing and echo cancellation processing on the audio data based on the filtering threshold value so as to obtain target audio data;
and the sending module is used for sending the target audio data to a second talkback terminal so that the second talkback terminal can play the target audio data.
Furthermore, to achieve the above object, the present invention also provides an audio data processing apparatus comprising: a memory, a processor and an audio data processing program stored on the memory and executable on the processor, the audio data processing program, when executed by the processor, implementing the steps of the audio data processing method as described above.
Furthermore, to achieve the above object, the present invention also provides a computer readable storage medium having stored thereon an audio data processing program, which when executed by a processor, implements the steps of the aforementioned audio data processing method.
According to the invention, when the first talkback terminal is in a full-duplex mode, a filtering threshold value corresponding to audio data currently acquired by a microphone of the first talkback terminal is determined; noise suppression processing and echo cancellation processing are then performed on the audio data based on the filtering threshold value to obtain target audio data; and the target audio data is then sent to a second talkback terminal so that the second talkback terminal can play the target audio data. Because the filtering threshold value is determined from the collected audio data, it is obtained dynamically, and cancelling the echo in the audio data according to this dynamic filtering threshold value avoids excessive echo cancellation, so that normal voice in the audio data is neither cancelled nor reduced in volume.
Drawings
FIG. 1 is a schematic diagram of an audio data processing device in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first embodiment of an audio data processing method according to the present invention;
FIG. 3 is a functional block diagram of an audio data processing apparatus according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of an audio data processing device in a hardware operating environment according to an embodiment of the present invention.
The audio data processing device of the embodiment of the invention can be a PC, and can also be a door phone or an indoor phone in an intercom system.
As shown in fig. 1, the audio data processing apparatus may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). Alternatively, the memory 1005 may be a storage device separate from the processor 1001.
Optionally, the audio data processing device may further include a camera, a Radio Frequency (RF) circuit, sensors, an audio circuit, a WiFi module, and the like. The sensors may include, for example, light sensors, motion sensors, and other sensors. In particular, the light sensor may include an ambient light sensor and a proximity sensor. Of course, the audio data processing device may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described herein again.
It will be appreciated by those skilled in the art that the audio data processing device configuration shown in fig. 1 does not constitute a limitation of the audio data processing device, which may include more or fewer components than those shown, may combine some components, or may use a different arrangement of components.
As shown in fig. 1, the memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and an audio data processing program.
In the audio data processing apparatus shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and communicating with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be used to call the audio data processing program stored in the memory 1005.
In the present embodiment, the audio data processing apparatus includes: a memory 1005, a processor 1001 and an audio data processing program stored in the memory 1005 and operable on the processor 1001, wherein when the processor 1001 calls the audio data processing program stored in the memory 1005, the following operations are performed:
when the first talkback terminal is in a full-duplex mode, determining a filtering threshold value corresponding to audio data currently acquired by a microphone of the first talkback terminal;
performing noise suppression processing and echo cancellation processing on the audio data based on the filtering threshold value to obtain target audio data;
and sending the target audio data to a second talkback terminal so that the second talkback terminal can play the target audio data.
Further, the processor 1001 may call the audio data processing program stored in the memory 1005, and also perform the following operations:
the step of performing noise suppression processing and echo cancellation processing on the audio data based on the filtering threshold to obtain target audio data comprises:
performing noise suppression processing and echo cancellation processing on the audio data based on the filtering threshold value to obtain processed audio data;
and carrying out high-pass filtering processing on the processed audio data to obtain target audio data.
Further, the processor 1001 may call the audio data processing program stored in the memory 1005, and also perform the following operations:
the step of performing high-pass filtering processing on the processed audio data to obtain target audio data includes:
performing high-pass filtering processing on the processed audio data to obtain filtered audio data;
and carrying out AGC automatic gain processing on the filtered audio data to obtain target audio data.
Further, the processor 1001 may call the audio data processing program stored in the memory 1005, and also perform the following operations:
the step of performing AGC automatic gain processing on the filtered audio data to obtain target audio data comprises:
carrying out AGC automatic gain processing on the filtered audio data to obtain the audio data after gain;
a phase inversion operation is performed on the gained audio data to obtain target audio data.
Further, the processor 1001 may call the audio data processing program stored in the memory 1005, and also perform the following operations:
the step of determining the filtering threshold corresponding to the audio data currently acquired by the microphone of the first intercom terminal includes:
performing VAD detection on audio data currently acquired by a microphone of the first talkback terminal to obtain a VAD detection result;
and determining a filtering threshold corresponding to the VAD detection result.
Further, the processor 1001 may call the audio data processing program stored in the memory 1005, and also perform the following operations:
acquiring a mapping relation between a preset detection result and a preset filtering threshold value;
and determining a filtering threshold corresponding to the VAD detection result based on the mapping relation.
Further, the processor 1001 may call the audio data processing program stored in the memory 1005, and also perform the following operations:
and when receiving the audio data to be played sent by the second talkback terminal, playing the audio data to be played.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of an audio data processing method according to the present invention.
In this embodiment, the audio data processing method includes the following steps:
step S100, when the first talkback terminal is in a full-duplex mode, determining a filtering threshold value corresponding to audio data currently acquired by a microphone of the first talkback terminal;
In this embodiment, the first intercom terminal may be an indoor unit, a door phone, or a security management terminal in a building intercom system.
In a building intercom system, two ends in an intercom state are communicated in a full duplex mode, so that audio data (or audio and video data) transmitted by an opposite end is received and played while currently acquired audio data (or audio and video data) is transmitted.
In this embodiment, when the first intercom terminal is in the full-duplex mode, audio data currently acquired by a microphone of the first intercom terminal is acquired, the currently acquired audio data includes audio data which is acquired by the microphone before the current time and is not sent to the second intercom terminal, and a corresponding filtering threshold value is determined according to the currently acquired audio data, so that association between the filtering threshold value and the audio data is realized, and the size of the filtering threshold value is not fixed. The currently acquired audio data comprises human voice data of a user using the first talkback terminal, environmental sound data of an environment where the first talkback terminal is located and audio data of the second talkback terminal played by a loudspeaker of the first talkback terminal.
Step S200, based on the filtering threshold, carrying out noise suppression processing and echo cancellation processing on the audio data to obtain target audio data;
In this embodiment, after the filtering threshold is obtained, noise suppression processing and echo cancellation processing are performed on the audio data according to the filtering threshold to obtain the target audio data. Specifically, the filtering parameters used in both the noise suppression processing and the echo cancellation processing may be adjusted according to the filtering threshold, and both kinds of processing are then performed on the audio data with the adjusted parameters. Alternatively, noise suppression processing is first performed on the audio data, the filtering parameter used in the echo cancellation processing is adjusted according to the filtering threshold, and echo cancellation processing is then performed on the noise-suppressed audio data with the adjusted parameter. For example, the filter coefficient of the filter used for echo cancellation is adjusted according to the filtering threshold, and the noise-suppressed audio data is passed through the adjusted filter, so as to avoid the elimination of normal voice and the reduction of normal voice volume caused by excessive echo cancellation.
Step S300, the target audio data is sent to a second talkback terminal, so that the second talkback terminal can play the target audio data.
In this embodiment, the second intercom terminal is an indoor unit, a door phone or a security management terminal in the building intercom system, which talkbacks (visual intercom) with the first intercom terminal. For example, the first intercom terminal may be a door phone in the building intercom system, and the second intercom terminal may be an indoor unit in the building intercom system, or the first intercom terminal may be an indoor unit in the building intercom system, and the second intercom terminal may be a door phone in the building intercom system.
After the target audio data are obtained, the target audio data are sent to a second talkback terminal, and after the second talkback terminal receives the target audio data, the second talkback terminal plays the target audio data.
Further, in an embodiment, the audio data processing method further includes:
and when receiving the audio data to be played sent by the second talkback terminal, playing the audio data to be played.
In this embodiment, the second intercom terminal processes the acquired audio data in the same processing manner to obtain audio data to be played, sends the audio data to be played to the first intercom terminal, and plays the audio data to be played by the first intercom terminal when receiving the audio data to be played sent by the second intercom terminal.
In the audio data processing method provided by this embodiment, when the first intercom terminal is in the full-duplex mode, a filtering threshold corresponding to audio data currently acquired by a microphone of the first intercom terminal is determined; noise suppression processing and echo cancellation processing are then performed on the audio data based on the filtering threshold to obtain target audio data; and the target audio data is then sent to a second talkback terminal so that the second talkback terminal can play the target audio data. Because the filtering threshold is determined from the collected audio data, it is obtained dynamically, and cancelling the echo in the audio data according to this dynamic filtering threshold avoids excessive echo cancellation, so that normal voice in the audio data is neither cancelled nor reduced in volume.
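The three steps S100 to S300 can be sketched as a single per-frame routine. This is only an illustrative sketch: the function name `process_and_send` and its callback parameters are hypothetical and do not appear in the source, and the concrete threshold, suppression, and transport logic are left to the caller because the embodiment does not fix them.

```python
from typing import Callable, List, Sequence

Frame = List[float]


def process_and_send(
    frame: Sequence[float],
    full_duplex: bool,
    determine_threshold: Callable[[Sequence[float]], float],
    suppress_noise_and_echo: Callable[[Sequence[float], float], Frame],
    send: Callable[[Frame], None],
) -> bool:
    """Run steps S100-S300 for one captured frame; return True if a frame was sent."""
    if not full_duplex:
        # The method only applies while the terminal is in full-duplex mode.
        return False
    threshold = determine_threshold(frame)               # step S100
    target = suppress_noise_and_echo(frame, threshold)   # step S200
    send(target)                                         # step S300
    return True


# Usage with stand-in callbacks (all hypothetical placeholders).
sent_frames: List[Frame] = []
process_and_send(
    [0.5, -0.25],
    True,
    lambda f: 0.5,                       # placeholder threshold rule
    lambda f, t: [x * t for x in f],     # placeholder processing
    sent_frames.append,                  # placeholder transport
)
```

Passing the stages in as callbacks keeps the sketch independent of any particular noise suppression or echo cancellation library.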
Based on the first embodiment, a second embodiment of the audio data processing method of the present invention is proposed, in which the step S200 includes:
step S210, based on the filtering threshold, performing noise suppression processing and echo cancellation processing on the audio data to obtain processed audio data;
step S220, performing high-pass filtering processing on the processed audio data to obtain target audio data.
In this embodiment, after the filtering threshold is obtained, noise suppression processing and echo cancellation processing are performed on the audio data according to the filtering threshold to obtain the processed audio data. Specifically, the filtering parameters used in both the noise suppression processing and the echo cancellation processing may be adjusted according to the filtering threshold, and both kinds of processing are then performed on the audio data with the adjusted parameters. Alternatively, noise suppression processing is first performed on the audio data, the filtering parameter used in the echo cancellation processing is adjusted according to the filtering threshold, and echo cancellation processing is then performed on the noise-suppressed audio data with the adjusted parameter. For example, the filter coefficient of the filter used for echo cancellation is adjusted according to the filtering threshold, and the noise-suppressed audio data is passed through the adjusted filter to obtain the processed audio data.
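One way to picture a filter whose coefficients are adjusted according to the filtering threshold is a normalized LMS (NLMS) adaptive echo canceller. The sketch below assumes, purely for illustration, that the filtering threshold is a value in [0, 1] used as the NLMS step size; the source does not state which filter or parameter is used, and the names `cancel_echo` and `taps` are hypothetical.

```python
import random
from typing import List, Sequence


def cancel_echo(
    mic: Sequence[float],
    far_end: Sequence[float],
    threshold: float,
    taps: int = 4,
) -> List[float]:
    """NLMS echo canceller; `threshold` (assumed in [0, 1]) is used as the step size."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end (loudspeaker) samples, newest first
    out: List[float] = []
    for m, f in zip(mic, far_end):
        history = [f] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = m - echo_estimate  # microphone signal with the estimated echo removed
        norm = sum(x * x for x in history) + 1e-8  # avoid division by zero
        step = threshold * error / norm
        weights = [w + step * x for w, x in zip(weights, history)]
        out.append(error)
    return out


# Usage: the microphone picks up only a delayed, attenuated copy of the far-end signal.
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(400)]
mic = [0.0] + [0.6 * x for x in far[:-1]]
residual = cancel_echo(mic, far, threshold=0.5)
```

With a step size of zero the filter never adapts and the microphone signal passes through unchanged, while a larger step size adapts faster; this is one possible reading of how a dynamic threshold could limit how aggressively echo is removed.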
And then, carrying out high-pass filtering processing on the processed audio data to obtain target audio data, and filtering low-frequency signals in the audio data through high-pass filtering to obtain more accurate target audio data.
In the audio data processing method provided in this embodiment, the noise suppression processing and the echo cancellation processing are performed on the audio data based on the filtering threshold to obtain the processed audio data, and then the high-pass filtering processing is performed on the processed audio data to obtain the target audio data.
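The high-pass filtering step can be illustrated with a first-order RC-style digital high-pass filter. This is a minimal sketch under assumed values: the 100 Hz cutoff, the 8000 Hz sample rate, and the function name `high_pass` are illustrative choices, not parameters given in the source.

```python
import math
from typing import List, Sequence


def high_pass(
    samples: Sequence[float],
    cutoff_hz: float = 100.0,       # assumed cutoff frequency
    sample_rate_hz: float = 8000.0,  # assumed sample rate
) -> List[float]:
    """First-order high-pass filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out: List[float] = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


# Usage: a constant (0 Hz) offset decays away, while a fast alternation passes through.
dc_removed = high_pass([1.0] * 2000)
fast_signal = high_pass([1.0, -1.0] * 1000)
```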
Based on the second embodiment, a third embodiment of the audio data processing method of the present invention is proposed, in which the step S220 includes:
step S222, performing high-pass filtering processing on the processed audio data to obtain filtered audio data;
step S223, performing AGC automatic gain processing on the filtered audio data to obtain target audio data.
In the present embodiment, after the processed audio data is obtained, high-pass filtering processing is performed on it to obtain filtered audio data; the high-pass filtering removes the low-frequency signals in the audio data.
Then, AGC (automatic gain control) processing is performed on the filtered audio data to obtain the target audio data. The AGC processing adjusts the loudness of the voice in the filtered audio data, and the gain-adjusted audio data is the target audio data; this raises the playback volume of the target audio data and prevents the sound played by the opposite terminal (the second talkback terminal) from being too quiet.
In the audio data processing method provided in this embodiment, high-pass filtering is performed on the processed audio data to obtain filtered audio data, and AGC automatic gain processing is then performed on the filtered audio data to obtain the target audio data. The AGC processing raises the loudness of the voice in the audio data, which increases the playback volume of the target audio data and prevents the sound played by the opposite terminal (the second talkback terminal) from being too quiet.
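The AGC step can be sketched as a per-frame gain that moves the frame's RMS level toward a target. The target level, the gain cap, and the name `apply_agc` are assumptions made for illustration; the source does not specify how the AGC is implemented.

```python
import math
from typing import List, Sequence


def apply_agc(
    frame: Sequence[float],
    target_rms: float = 0.1,  # assumed target level for samples in [-1, 1]
    max_gain: float = 8.0,    # assumed cap so that near-silence is not blown up
) -> List[float]:
    """Scale one frame toward `target_rms`, capping the gain and clipping to [-1, 1]."""
    if not frame:
        return []
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    if rms == 0.0:
        return list(frame)  # pure silence: nothing to amplify
    gain = min(target_rms / rms, max_gain)
    return [max(-1.0, min(1.0, x * gain)) for x in frame]


# Usage: a quiet frame is boosted (up to the cap), a loud frame is attenuated.
boosted = apply_agc([0.01, -0.01] * 80)
attenuated = apply_agc([0.5, -0.5] * 80)
```

Capping the gain is a common precaution so that background noise in quiet frames is not amplified as strongly as speech.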
A fourth embodiment of the audio data processing method of the present invention is proposed based on the third embodiment, and in this embodiment, step S223 includes:
step a, carrying out AGC automatic gain processing on the filtered audio data to obtain the audio data after gain;
and b, performing phase inversion operation on the gained audio data to obtain target audio data.
In this embodiment, when the filtered audio data is obtained, AGC automatic gain processing is performed on it to obtain the gained audio data. The AGC processing adjusts the loudness of the voice in the filtered audio data, which raises the playback volume and prevents the sound played by the opposite terminal (the second intercom terminal) from being too quiet.
After the gained audio data is obtained, performing phase inversion operation on the gained audio data to obtain target audio data, and avoiding howling corresponding to the target audio data through phase inversion.
In the audio data processing method provided in this embodiment, AGC automatic gain processing is performed on filtered audio data to obtain gained audio data, then a phase inversion operation is performed on the gained audio data to obtain target audio data, the AGC automatic gain processing is used to increase the volume of the target audio data during playing, so as to avoid that the playing sound of an opposite terminal (a second intercom terminal) is too small, and avoid howling corresponding to the target audio data through phase inversion.
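For sampled audio, a phase inversion amounts to negating every sample. The sketch below assumes signed 16-bit PCM samples, which the source does not specify, and the name `invert_phase` is hypothetical; the clamp handles the one value (-32768) whose negation does not fit in 16 bits.

```python
from typing import List, Sequence


def invert_phase(pcm16: Sequence[int]) -> List[int]:
    """Invert the phase of signed 16-bit PCM samples by negating them."""
    # -(-32768) would be 32768, which overflows int16, so clamp to the valid range.
    return [max(-32768, min(32767, -s)) for s in pcm16]


# Usage
inverted = invert_phase([0, 100, -100, -32768, 32767])
```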
A fifth embodiment of the audio data processing method of the present invention is proposed based on the first embodiment, and in this embodiment, the step S100 includes:
step S110, performing VAD detection on the audio data currently acquired by the microphone of the first intercom terminal to obtain a VAD detection result;
step S120, determining a filtering threshold corresponding to the VAD detection result.
VAD (Voice Activity Detection), also called voice endpoint detection or voice boundary detection, refers to detecting the presence of voice in a noisy environment. It is generally used in voice processing systems such as voice coding and voice enhancement, where it helps reduce the voice coding rate, save communication bandwidth, reduce the energy consumption of mobile devices, and improve the recognition rate.
In this embodiment, when the first intercom terminal is in the full-duplex mode, audio data currently acquired by a microphone of the first intercom terminal is acquired, where the currently acquired audio data includes audio data that has been acquired by the microphone before the current time and is not sent to the second intercom terminal, VAD detection is performed on the audio data currently acquired by the microphone of the first intercom terminal to obtain a VAD detection result, and a corresponding filtering threshold is determined according to the VAD detection result to implement association between the filtering threshold and the audio data, so that the filtering threshold is not fixed.
In the audio data processing method provided by this embodiment, VAD detection is performed on the audio data currently acquired by the microphone of the first intercom terminal to obtain a VAD detection result, and the filtering threshold corresponding to the VAD detection result is then determined. Because the filtering threshold is derived from the VAD detection result of the audio data, the accuracy of the filtering threshold is improved, which in turn improves the accuracy of echo cancellation on the audio data.
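The source does not say which VAD algorithm is used. As a deliberately simple stand-in, the sketch below classifies a frame as voice when its mean energy exceeds a fixed floor; the function name `detect_voice` and the `energy_floor` value are assumptions, and a production system would typically use a more robust detector.

```python
from typing import Sequence


def detect_voice(frame: Sequence[float], energy_floor: float = 1e-4) -> bool:
    """Energy-based VAD: True when the frame's mean energy exceeds `energy_floor`."""
    if not frame:
        return False
    mean_energy = sum(x * x for x in frame) / len(frame)
    return mean_energy > energy_floor


# Usage: samples are assumed to be floats in [-1, 1].
silence_result = detect_voice([0.0] * 160)
speech_like_result = detect_voice([0.1, -0.1] * 80)
```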
A sixth embodiment of the audio data processing method of the present invention is proposed based on the fifth embodiment, and in this embodiment, the step S120 includes:
step S121, obtaining a mapping relation between a preset detection result and a preset filtering threshold value;
and step S122, determining a filtering threshold corresponding to the VAD detection result based on the mapping relation.
In this embodiment, the mapping relation between preset detection results and preset filtering thresholds may be configured in advance. After the VAD detection result is obtained, this mapping relation is obtained first, and the filtering threshold corresponding to the VAD detection result is then determined from it.
In the audio data processing method provided by this embodiment, the mapping relation between preset detection results and preset filtering thresholds is obtained, and the filtering threshold corresponding to the VAD detection result is then determined based on the mapping relation. The filtering threshold can thus be obtained accurately from the VAD detection result, which improves the accuracy of the filtering threshold and, in turn, the accuracy of echo cancellation on the audio data.
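Steps S121 and S122 amount to a table lookup. The sketch below assumes the VAD detection result is one of two discrete states and that the preset thresholds are plain numbers; the enum members and the values in `PRESET_THRESHOLDS` are hypothetical, since the patent does not list them.

```python
from enum import Enum
from typing import Mapping


class VadResult(Enum):
    """Hypothetical discrete VAD detection results."""
    SILENCE = 0
    SPEECH = 1


# Step S121: preset mapping between detection results and filtering thresholds.
# The numeric values are placeholders chosen for illustration only.
PRESET_THRESHOLDS: Mapping[VadResult, float] = {
    VadResult.SILENCE: 0.8,
    VadResult.SPEECH: 0.2,
}


def lookup_filtering_threshold(
    result: VadResult,
    mapping: Mapping[VadResult, float] = PRESET_THRESHOLDS,
) -> float:
    """Step S122: return the preset threshold for the given VAD result."""
    try:
        return mapping[result]
    except KeyError:
        raise ValueError(f"no preset filtering threshold for {result!r}") from None
```

Keeping the mapping as data rather than as branching code means the thresholds can be retuned without touching the lookup logic.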
Referring to FIG. 3, which is a schematic diagram of the functional modules of an audio data processing apparatus according to an embodiment of the present invention, the audio data processing apparatus includes:
the determining module 100 is configured to determine a filtering threshold corresponding to audio data currently acquired by a microphone of the first intercom terminal when the first intercom terminal is in the full-duplex mode;
a processing module 200, configured to perform noise suppression processing and echo cancellation processing on the audio data based on the filtering threshold to obtain target audio data;
a sending module 300, configured to send the target audio data to a second intercom terminal, so that the second intercom terminal plays the target audio data.
Further, the processing module 200 is further configured to:
perform noise suppression processing and echo cancellation processing on the audio data based on the filtering threshold to obtain processed audio data;
perform high-pass filtering on the processed audio data to obtain target audio data.
Further, the processing module 200 is further configured to:
perform high-pass filtering on the processed audio data to obtain filtered audio data;
perform AGC (automatic gain control) processing on the filtered audio data to obtain target audio data.
Further, the processing module 200 is further configured to:
perform AGC processing on the filtered audio data to obtain gain-adjusted audio data;
perform a phase inversion operation on the gain-adjusted audio data to obtain target audio data.
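The post-processing chain described for the processing module (high-pass filtering, then AGC, then phase inversion) can be sketched as follows. The text does not give filter orders, cutoff frequencies, or gain rules, so this sketch assumes a first-order RC high-pass filter, a per-block peak-normalising gain, and phase inversion understood as a sign flip; `cutoff_hz`, `sample_rate`, `target_peak`, and `max_gain` are hypothetical parameters.

```python
import math
from typing import List, Sequence


def high_pass(samples: Sequence[float], cutoff_hz: float = 100.0,
              sample_rate: float = 8000.0) -> List[float]:
    """First-order RC high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    a = rc / (rc + dt)
    out: List[float] = []
    prev_x = prev_y = 0.0
    for x in samples:
        y = a * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def agc(samples: Sequence[float], target_peak: float = 0.5,
        max_gain: float = 10.0) -> List[float]:
    """Assumed block AGC: scale so the peak reaches target_peak, capped at max_gain."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # nothing to amplify in a silent block
    gain = min(target_peak / peak, max_gain)
    return [s * gain for s in samples]


def invert_phase(samples: Sequence[float]) -> List[float]:
    """Phase inversion, assumed here to mean a 180-degree polarity flip."""
    return [-s for s in samples]


def post_process(processed: Sequence[float]) -> List[float]:
    """High-pass filter, then AGC, then phase inversion, in the order described."""
    return invert_phase(agc(high_pass(processed)))
```

For example, `post_process(frame)` would be applied to the output of the noise suppression and echo cancellation stage to produce the target audio data.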
Further, the determining module 100 is further configured to:
perform VAD detection on the audio data currently acquired by the microphone of the first intercom terminal to obtain a VAD detection result;
determine a filtering threshold corresponding to the VAD detection result.
Further, the determining module 100 is further configured to:
acquire a mapping relation between a preset detection result and a preset filtering threshold;
determine a filtering threshold corresponding to the VAD detection result based on the mapping relation.
Further, the audio data processing apparatus is further configured to:
play the audio data to be played when the audio data to be played is received from the second intercom terminal.
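The module structure above can be sketched as a small class that wires the determining, processing, and sending modules together and also handles playback of received audio. The class name, method names, and callable signatures are assumptions made for illustration; the patent only names the modules and their responsibilities.

```python
from dataclasses import dataclass
from typing import Callable, List

Frame = List[float]


@dataclass
class AudioDataProcessingApparatus:
    """Hypothetical wiring of the modules described for FIG. 3."""
    determine: Callable[[Frame], float]       # determining module 100
    process: Callable[[Frame, float], Frame]  # processing module 200
    send: Callable[[Frame], None]             # sending module 300
    play: Callable[[Frame], None]             # playback of received audio

    def handle_captured_frame(self, frame: Frame, full_duplex: bool) -> bool:
        """Run the transmit path; returns True if a frame was sent.

        This section only describes the full-duplex case, so the sketch
        simply skips processing in any other mode.
        """
        if not full_duplex:
            return False
        threshold = self.determine(frame)
        target = self.process(frame, threshold)
        self.send(target)
        return True

    def handle_received_frame(self, frame: Frame) -> None:
        """Play audio data received from the second intercom terminal."""
        self.play(frame)
```

Passing the modules in as callables keeps the sketch independent of any particular noise suppression or echo cancellation implementation.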
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where an audio data processing program is stored on the computer-readable storage medium, and when executed by a processor, the audio data processing program implements the following operations:
when the first intercom terminal is in a full-duplex mode, determining a filtering threshold corresponding to audio data currently acquired by a microphone of the first intercom terminal;
performing noise suppression processing and echo cancellation processing on the audio data based on the filtering threshold to obtain target audio data;
sending the target audio data to a second intercom terminal so that the second intercom terminal plays the target audio data.
Further, the audio data processing program, when executed by the processor, further implements the following operations:
performing noise suppression processing and echo cancellation processing on the audio data based on the filtering threshold to obtain processed audio data;
performing high-pass filtering on the processed audio data to obtain target audio data.
Further, the audio data processing program, when executed by the processor, further implements the following operations:
performing high-pass filtering on the processed audio data to obtain filtered audio data;
performing AGC (automatic gain control) processing on the filtered audio data to obtain target audio data.
Further, the audio data processing program, when executed by the processor, further implements the following operations:
performing AGC processing on the filtered audio data to obtain gain-adjusted audio data;
performing a phase inversion operation on the gain-adjusted audio data to obtain target audio data.
Further, the audio data processing program, when executed by the processor, further implements the following operations:
performing VAD detection on the audio data currently acquired by the microphone of the first intercom terminal to obtain a VAD detection result;
determining a filtering threshold corresponding to the VAD detection result.
Further, the audio data processing program, when executed by the processor, further implements the following operations:
acquiring a mapping relation between a preset detection result and a preset filtering threshold;
determining a filtering threshold corresponding to the VAD detection result based on the mapping relation.
Further, the audio data processing program, when executed by the processor, further implements the following operation:
when audio data to be played is received from the second intercom terminal, playing the audio data to be played.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. An audio data processing method, characterized by comprising the steps of:
when the first intercom terminal is in a full-duplex mode, determining a filtering threshold value corresponding to audio data currently acquired by a microphone of the first intercom terminal;
performing noise suppression processing and echo cancellation processing on the audio data based on the filtering threshold value to obtain target audio data, wherein the filtering coefficient of a filter corresponding to the echo cancellation processing is adjusted according to the filtering threshold value, the noise suppression processing is performed on the audio data, and the echo cancellation processing is performed on the audio data after the noise suppression processing through the filter after the coefficient adjustment to obtain the target audio data;
and sending the target audio data to a second intercom terminal so that the second intercom terminal can play the target audio data.
2. The audio data processing method of claim 1, wherein the step of performing noise suppression processing and echo cancellation processing on the audio data based on the filtering threshold to obtain target audio data comprises:
performing noise suppression processing and echo cancellation processing on the audio data based on the filtering threshold value to obtain processed audio data;
and carrying out high-pass filtering processing on the processed audio data to obtain target audio data.
3. The audio data processing method of claim 2, wherein the step of high-pass filtering the processed audio data to obtain the target audio data comprises:
carrying out high-pass filtering processing on the processed audio data to obtain filtered audio data;
and carrying out AGC (automatic gain control) processing on the filtered audio data to obtain target audio data.
4. The audio data processing method of claim 3, wherein the step of carrying out AGC processing on the filtered audio data to obtain target audio data comprises:
carrying out AGC processing on the filtered audio data to obtain gain-adjusted audio data;
and performing a phase inversion operation on the gain-adjusted audio data to obtain target audio data.
5. The audio data processing method according to claim 1, wherein the step of determining the filtering threshold corresponding to the audio data currently collected by the microphone of the first intercom terminal includes:
performing VAD detection on audio data currently acquired by a microphone of the first intercom terminal to obtain a VAD detection result;
and determining a filtering threshold corresponding to the VAD detection result.
6. The audio data processing method of claim 5, wherein the step of determining the filtering threshold corresponding to the VAD detection result comprises:
acquiring a mapping relation between a preset detection result and a preset filtering threshold value;
and determining a filtering threshold corresponding to the VAD detection result based on the mapping relation.
7. The audio data processing method according to any one of claims 1 to 6, wherein the audio data processing method further comprises:
when audio data to be played is received from the second intercom terminal, playing the audio data to be played.
8. An audio data processing apparatus, characterized in that the audio data processing apparatus comprises:
the determining module is used for determining a filtering threshold value corresponding to audio data currently acquired by a microphone of the first intercom terminal when the first intercom terminal is in a full-duplex mode, wherein the filtering coefficient of a filter corresponding to echo cancellation processing is adjusted according to the filtering threshold value, the noise suppression processing is carried out on the audio data, and the echo cancellation processing is carried out on the audio data after the noise suppression processing through the filter after the coefficient adjustment, so that target audio data are obtained;
the processing module is used for carrying out noise suppression processing and echo cancellation processing on the audio data based on the filtering threshold value so as to obtain target audio data;
and the sending module is used for sending the target audio data to a second intercom terminal so that the second intercom terminal can play the target audio data.
9. An audio data processing apparatus characterized in that the audio data processing apparatus comprises: memory, a processor and an audio data processing program stored on the memory and executable on the processor, which audio data processing program, when executed by the processor, carries out the steps of the audio data processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon an audio data processing program which, when executed by a processor, implements the steps of the audio data processing method according to any one of claims 1 to 7.
CN202010552971.2A 2020-06-16 2020-06-16 Audio data processing method, device, equipment and computer readable storage medium Active CN111901704B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010552971.2A CN111901704B (en) 2020-06-16 2020-06-16 Audio data processing method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010552971.2A CN111901704B (en) 2020-06-16 2020-06-16 Audio data processing method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111901704A CN111901704A (en) 2020-11-06
CN111901704B true CN111901704B (en) 2022-07-22

Family

ID=73206739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010552971.2A Active CN111901704B (en) 2020-06-16 2020-06-16 Audio data processing method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111901704B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112216297B (en) * 2020-12-10 2021-02-26 全时云商务服务股份有限公司 Processing method, system, medium and device for small VoIP sound of android mobile phone terminal
CN113286228B (en) * 2021-05-28 2022-11-08 北京千丁互联科技有限公司 Building intercom audio frequency automatic adjusting method and device and building intercom equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101281745A (en) * 2008-05-23 2008-10-08 深圳市北科瑞声科技有限公司 Interactive system for vehicle-mounted voice
CN102347785A (en) * 2010-07-23 2012-02-08 联芯科技有限公司 Echo elimination method and device
CN102572646A (en) * 2011-12-31 2012-07-11 广东步步高电子工业有限公司 Denoising method and device implemented in state of listening to music by using headset
CN202475574U (en) * 2012-03-12 2012-10-03 杭州艾力特音频技术有限公司 Echo-cancelling talkback equipment
CN104980600A (en) * 2014-04-02 2015-10-14 想象技术有限公司 Auto-tuning Of Non-linear Processor Threshold
CN105913853A (en) * 2016-06-13 2016-08-31 上海盛本智能科技股份有限公司 Near-field cluster intercom echo elimination system and realization method thereof
CN110913312A (en) * 2018-09-17 2020-03-24 海信集团有限公司 Echo cancellation method and device
CN111131645A (en) * 2019-12-24 2020-05-08 河南华启思创智能科技有限公司 Improved NLMS echo cancellation algorithm

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9947337B1 (en) * 2017-03-21 2018-04-17 Omnivision Technologies, Inc. Echo cancellation system and method with reduced residual echo
US20190387368A1 (en) * 2018-06-14 2019-12-19 Motorola Solutions, Inc Communication device providing half-duplex and pseudo full-duplex operation using push-to-talk switch

Also Published As

Publication number Publication date
CN111901704A (en) 2020-11-06

Similar Documents

Publication Publication Date Title
CN106910500B (en) Method and device for voice control of device with microphone array
CN111901704B (en) Audio data processing method, device, equipment and computer readable storage medium
US9123352B2 (en) Ambient noise compensation system robust to high excitation noise
CN106164846A (en) Audio signal processing
CN107231473B (en) Audio output regulation and control method, equipment and computer readable storage medium
EP2982101B1 (en) Noise reduction
CN106791067B (en) Call volume adjusting method and device and mobile terminal
JP2002534716A (en) Voice input device with attention period
CN107785027B (en) Audio processing method and electronic equipment
CN109672775B (en) Method, device and terminal for adjusting awakening sensitivity
CN109788125B (en) Interference processing method and device and mobile terminal
CN110855313B (en) Signal control method and electronic equipment
CN108521501B (en) Voice input method, mobile terminal and computer readable storage medium
CN109951602B (en) Vibration control method and mobile terminal
CN111447223A (en) Call processing method and electronic equipment
CN111083297A (en) Echo cancellation method and electronic equipment
CN110634496A (en) Double-talk detection method and device, computer equipment and storage medium
CN117480554A (en) Voice enhancement method and related equipment
US11653184B2 (en) Call prompt method
JP6153020B2 (en) Mobile terminal, program, call system
WO2011033870A1 (en) Communication apparatus
CN109889665B (en) Volume adjusting method, mobile terminal and storage medium
CN112217948B (en) Echo processing method, device, equipment and storage medium for voice call
CN115050382A (en) In-vehicle and out-vehicle voice communication method and device, electronic equipment and storage medium
CN114187906A (en) Vehicle controller and voice awakening method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant