CN113572908A - Method, device and system for reducing noise in VoIP (Voice over Internet protocol) call - Google Patents

Method, device and system for reducing noise in VoIP (Voice over Internet protocol) call

Info

Publication number
CN113572908A
Authority
CN
China
Prior art keywords
voice
human
vadnn
stream
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110667992.3A
Other languages
Chinese (zh)
Inventor
李旭滨
陈晓松
侯宇明
刘鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunmao Internet Intelligent Technology Xiamen Co ltd
Original Assignee
Yunmao Internet Intelligent Technology Xiamen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-06-16
Filing date
2021-06-16
Publication date
2021-10-29
Application filed by Yunmao Internet Intelligent Technology Xiamen Co ltd
Priority to CN202110667992.3A
Publication of CN113572908A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic, using echo cancellers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80 Responding to QoS

Abstract

The invention provides a method, a device and a system for reducing noise in a VoIP (Voice over Internet Protocol) call. A non-human voice muting module replaces the audio of non-human voice segments with silence, which improves the overall signal-to-noise ratio and the call quality of the audio. VADNN, a voice activity detection technique based on deep learning, is used for the detection and achieves higher human voice detection accuracy in noisy environments than general-purpose energy-based voice detection.

Description

Method, device and system for reducing noise in VoIP (Voice over Internet protocol) call
Technical Field
One or more embodiments of the present invention relate to the technical field of communications, in particular to noise cancellation, and more specifically to a method, a device and a system for reducing noise in a VoIP call.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Thus, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
VoIP (Voice over Internet Protocol) digitizes an analog voice signal and transmits it in real time over an IP network in the form of data packets. For example, more and more people use network chat tools for voice chat, where the voice is carried over the Internet rather than over the traditional telephone network of a telecommunications carrier. Any technology that converts voice into IP data packets and transmits them partly or entirely over an IP network is VoIP. The greatest advantage of VoIP is that it can make broad use of the Internet and the global IP interconnection environment to provide more and better services than traditional telephony. For example, VoIP can be used on many Internet access devices, including VoIP phones, smartphones and personal computers, to make calls and send messages over cellular networks and Wi-Fi.
Noise cancellation is an important and difficult problem in signal processing, and noise can seriously affect the normal operation of a system. Network communication technology uses digital voice noise reduction to improve the experience of both parties to a call; this noise reduction mainly comprises echo cancellation, environmental noise suppression, automatic gain control of the human voice, and the like. The noise-reduced audio is expected to be clear, undistorted and free of echo. However, when the call environment is noisy or the speaker is far from the microphone, the signal-to-noise ratio of the voice is relatively low, and a general-purpose noise reduction algorithm cannot achieve a good result. The background noise not only interferes with normal conversation but also degrades echo cancellation, producing a continuous echo during the call.
In view of the above, a new noise reduction processing technology is needed to solve the problem of the influence of background noise on the communication quality during the VoIP network communication.
Disclosure of Invention
One or more embodiments of the present specification describe a method, a device and a system for reducing noise in a VoIP call. When no human voice is present in a noisy VoIP call environment, the background audio is muted based on a VADNN algorithm, which removes the influence of background noise on call quality during the VoIP call and improves the user experience.
The technical solutions provided by one or more embodiments of the specification are as follows:
In a first aspect, the present invention provides a method for reducing noise in a VoIP call, in which, based on a VADNN algorithm, the voice of the non-human voice segments in the opposite-end voice stream received in the VoIP call is replaced with mute data, and the audio is then sent to the output for playing.
In a possible implementation, replacing the voice of the non-human voice segments in the received opposite-end voice stream with mute data based on the VADNN algorithm, and then sending the audio to the output for playing, includes the following steps:
acquiring the opposite-end voice stream in the VoIP call;
performing human voice detection on the voice stream and dividing it into human voice segments and non-human voice segments;
replacing the voice of the non-human voice segments with mute data; and
playing the processed voice stream.
In a possible implementation, replacing the voice of the non-human voice segments with mute data specifically comprises:
directly clearing the data of the voice part of the non-human voice segments to 0 as the muting process.
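As a minimal sketch of "clearing the data to 0", the snippet below replaces a segment with zero bytes of the same length. It assumes the decoded audio is signed 16-bit PCM, where an all-zero sample is silence; the patent does not name a sample format, and `mute_segment` is a hypothetical helper name. For formats whose silence level is not zero (for example unsigned 8-bit PCM), the fill value would differ.

```python
def mute_segment(segment: bytes) -> bytes:
    """Return silence with the same length as the input segment.

    Assumption: the segment is signed 16-bit PCM, so all-zero bytes are silence.
    """
    # bytes(n) builds n zero bytes, which keeps the stream length and timing intact.
    return bytes(len(segment))


# Two 16-bit samples of background noise (arbitrary illustrative values).
noisy = bytes([0x10, 0x27, 0xF0, 0xD8])
silent = mute_segment(noisy)
```

Keeping the length unchanged matters because the muted segment still occupies its slot in the playback timeline.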
In a second aspect, the present invention provides a device for reducing noise in a VoIP call, including a VADNN module and a voice playing module;
the VADNN module is used for replacing the voice of the non-human voice segments in the opposite-end voice stream received in the VoIP call with mute data;
and the voice playing module is used for playing the processed voice stream.
In one possible implementation, the VADNN module includes an acquiring unit, a processing unit, and a non-human voice muting unit, where
the acquiring unit is used for acquiring the opposite-end voice stream in the VoIP call;
the processing unit is used for performing human voice detection on the voice stream and dividing it into human voice segments and non-human voice segments;
and the non-human voice muting unit is used for replacing the voice of the non-human voice segments with mute data.
In one possible implementation, the non-human voice muting unit mutes the voice part of a non-human voice segment by directly clearing its data to 0.
In a third aspect, the present invention provides a system for noise reduction in a VoIP call, the system comprising at least one processor and a memory;
the memory to store one or more program instructions;
the processor is configured to execute one or more program instructions to perform the method according to one or more of the first aspects.
In a fourth aspect, the present invention provides a chip, which is coupled to a memory in a system, so that the chip calls program instructions stored in the memory when running to implement the method according to one or more of the first aspects.
In a fifth aspect, the invention provides a computer-readable storage medium comprising one or more program instructions executable by a system according to the third aspect to implement a method according to one or more of the first aspects.
The technical solution provided by the embodiments of the invention solves the problems in the prior art and has the following advantages:
(1) a non-human voice muting module is introduced, and the voice of the non-human voice segments is replaced with silence, which improves the overall signal-to-noise ratio and the call quality of the audio;
(2) the VADNN technology is introduced, which gives higher human voice detection accuracy in noisy environments than general-purpose energy-based voice detection.
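For contrast, the sketch below shows the kind of energy-based detector that the text calls general-purpose, and why it can struggle in noise. It is not taken from the patent: the RMS measure, the 0.1 threshold and the sample values are illustrative assumptions.

```python
import math
from typing import Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of a frame of samples normalised to [-1, 1]."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def energy_vad(frame: Sequence[float], threshold: float = 0.1) -> bool:
    """Flag a frame as 'voice' purely because its energy exceeds a threshold."""
    return rms(frame) > threshold


quiet_room = [0.01, -0.02, 0.015, -0.01]  # low-level hiss, no speech
loud_noise = [0.4, -0.5, 0.45, -0.35]     # loud background noise, still no speech

quiet_is_voice = energy_vad(quiet_room)
noise_is_voice = energy_vad(loud_noise)
```

Here the loud noise frame is flagged as voice even though nobody is speaking, because energy alone cannot tell loud noise from speech. A trained classifier that does not depend on energy, as the document describes for VADNN, is intended to avoid this failure mode.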
Drawings
Fig. 1 is a schematic diagram of non-human voice muting based on VADNN according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of a method for reducing noise in a VoIP call according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a device for reducing noise in a VoIP call according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a VADNN module according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a system for reducing noise in a VoIP call according to an embodiment of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be further noted that, for the convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
To solve the problem of background noise affecting call quality during a VoIP call, the application provides a method that converts non-human voice call audio into silence based on deep-learning voice activity detection (VAD), referred to as VADNN, thereby improving call quality. VADNN is a human voice classifier trained with deep learning methods; it does not depend on voice energy, and its recognition accuracy in noisy scenes is clearly better than that of common VAD techniques. Fig. 1 is a schematic diagram of non-human voice muting based on VADNN according to an embodiment of the present invention. As shown in Fig. 1, the overall idea of the application is as follows: based on the VADNN algorithm, the voice of the non-human voice segments in the opposite-end voice stream received in the VoIP call is replaced with mute data, and the audio is then sent to the output for playing, which eliminates the interference of external noise in the non-human voice segments with the call.
Fig. 2 shows a schematic flow chart of a method for reducing noise in a VoIP call according to an embodiment of the present invention. As shown in Fig. 2, the method, which is based on a VADNN algorithm, includes the following steps:
Step 10: acquire the opposite-end voice stream in the VoIP call.
Specifically, in the VoIP call, the locally received opposite-end voice stream is intercepted and preprocessed by the non-human voice muting module before it flows to the voice playing module, instead of flowing directly to the voice playing module.
The preprocessing converts the non-human voice call audio in the received opposite-end voice stream into silence and consists of steps 20 and 30, as follows:
Step 20: perform human voice detection on the voice stream and divide it into human voice segments and non-human voice segments.
The present application is based on the VADNN algorithm; the non-human voice muting module mentioned in step 10 can be understood as a VADNN algorithm model, referred to as the VADNN model for short.
The VADNN model performs binary classification of the voice stream into human voice and non-human voice. Specifically, the VADNN algorithm performs human voice detection on the voice stream in segments of a fixed size, divides the stream into human voice segments and non-human voice segments, and labels each segment accordingly.
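The fixed-size segmentation and labeling described for step 20 could be sketched as follows. The document does not specify the VADNN model, its input format or the segment size, so `is_human_voice` is a hypothetical callable standing in for the trained classifier, and the 640-byte default (20 ms of 16 kHz, 16-bit mono PCM) is an assumption.

```python
from typing import Callable, List, Tuple

FRAME_BYTES = 640  # assumption: 20 ms of 16 kHz, 16-bit mono PCM


def label_segments(
    stream: bytes,
    is_human_voice: Callable[[bytes], bool],
    frame_bytes: int = FRAME_BYTES,
) -> List[Tuple[bytes, bool]]:
    """Split a stream into fixed-size segments and attach a voice/non-voice label."""
    labeled = []
    for start in range(0, len(stream), frame_bytes):
        segment = stream[start:start + frame_bytes]  # the last segment may be shorter
        labeled.append((segment, is_human_voice(segment)))
    return labeled


# Toy stand-in classifier for the demo only: segments starting with 0x01 count as voice.
def toy_classifier(segment: bytes) -> bool:
    return segment.startswith(b"\x01")


stream = b"\x01" * 4 + b"\x02" * 4 + b"\x01" * 2
labels = label_segments(stream, toy_classifier, frame_bytes=4)
```

Passing the classifier in as a parameter keeps the segmentation logic independent of whichever model is actually used.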
Step 30: replace the voice of the non-human voice segments with mute data.
If a segment is labeled as non-human voice, the data of that segment is directly cleared to 0; that is, the non-human voice call audio is converted into silence, which completes the non-human voice muting process.
Step 40: play the processed voice stream.
The processed voice stream is sent to the playing module for playing.
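Steps 10 to 40 can be sketched end to end as a small streaming pipeline. This is an illustration under assumptions rather than the patented implementation: segments are assumed to arrive already decoded as signed 16-bit PCM, `is_human_voice` is a hypothetical stand-in for the VADNN model, and `play` appends to a list instead of driving a real audio device.

```python
from typing import Callable, Iterable, Iterator, List


def denoise_stream(
    segments: Iterable[bytes],
    is_human_voice: Callable[[bytes], bool],
) -> Iterator[bytes]:
    """Yield each received segment, muted if it is not human voice."""
    for segment in segments:             # step 10: received opposite-end segments
        if is_human_voice(segment):      # step 20: human voice detection
            yield segment
        else:
            yield bytes(len(segment))    # step 30: clear the data to 0


def play(segments: Iterable[bytes], sink: List[bytes]) -> None:
    """Step 40 stand-in: collect segments where a real system would play them."""
    for segment in segments:
        sink.append(segment)


received = [b"\x05\x06", b"\x7f\x7f", b"\x05\x07"]
played: List[bytes] = []
# Toy classifier for the demo only: a first byte of 0x05 means human voice.
play(denoise_stream(received, lambda seg: seg[0] == 0x05), played)
```

Using a generator means each segment is processed as it arrives, which suits a real-time call better than buffering the whole stream.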
The method provided by the invention solves the problems in the prior art and has the following advantages:
(1) a non-human voice muting module is introduced, and the voice of the non-human voice segments is replaced with silence, which improves the overall signal-to-noise ratio and the call quality of the audio;
(2) the VADNN technology is introduced, which gives higher human voice detection accuracy in noisy environments than general-purpose energy-based voice detection.
Corresponding to the method of the above embodiment, the present invention further provides a device for reducing noise in a VoIP call. Fig. 3 is a schematic structural diagram of the device. As shown in Fig. 3, the device includes a VADNN module 1 and a voice playing module 2. Specifically,
the VADNN module 1 is used for replacing the voice of the non-human voice segments in the opposite-end voice stream received in the VoIP call with mute data;
and the voice playing module 2 is used for playing the processed voice stream.
In an example, Fig. 4 is a schematic structural diagram of the VADNN module. As shown in Fig. 4, the VADNN module 1 includes an acquiring unit 11, a processing unit 12, and a non-human voice muting unit 13. Specifically,
the acquiring unit 11 is configured to acquire the opposite-end voice stream in the VoIP call.
The processing unit 12 is configured to perform human voice detection on the voice stream and divide it into human voice segments and non-human voice segments.
The non-human voice muting unit 13 is configured to replace the voice of the non-human voice segments with mute data.
The functions performed by each component of the device provided in the embodiment of the present invention have been described in detail in the method above and are therefore not repeated here.
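The module structure of Fig. 3 and Fig. 4 might map onto code roughly as below, with one method per unit. All class, method and parameter names are hypothetical, the classifier is a stand-in for the VADNN model, and the 640-byte segment size is an assumption; the document only describes the units functionally.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VadnnModule:
    """Module 1: acquiring unit 11, processing unit 12, muting unit 13."""
    classify: Callable[[bytes], bool]  # hypothetical stand-in for the VADNN model
    frame_bytes: int = 640             # assumed fixed segment size

    def acquire(self, stream: bytes) -> List[bytes]:
        # Unit 11: take the opposite-end stream and cut it into fixed-size segments.
        step = self.frame_bytes
        return [stream[i:i + step] for i in range(0, len(stream), step)]

    def detect(self, segments: List[bytes]) -> List[bool]:
        # Unit 12: label each segment as human voice (True) or not (False).
        return [self.classify(s) for s in segments]

    def mute(self, segments: List[bytes], labels: List[bool]) -> bytes:
        # Unit 13: clear the data of non-human voice segments to 0.
        return b"".join(s if v else bytes(len(s)) for s, v in zip(segments, labels))

    def process(self, stream: bytes) -> bytes:
        segments = self.acquire(stream)
        return self.mute(segments, self.detect(segments))


@dataclass
class PlaybackModule:
    """Module 2: records what would be played instead of using an audio device."""
    played: List[bytes] = field(default_factory=list)

    def play(self, stream: bytes) -> None:
        self.played.append(stream)


# Toy classifier for the demo only: a leading 0xFF byte marks a noise segment.
module = VadnnModule(classify=lambda s: s[0] != 0xFF, frame_bytes=2)
player = PlaybackModule()
player.play(module.process(b"\x01\x02\xff\xfe\x03"))
```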
Corresponding to the above embodiments, an embodiment of the present invention further provides a system for reducing noise in a VoIP call. As shown in Fig. 5, the system includes at least one processor 51 and a memory 52;
the memory 52 is configured to store one or more program instructions;
the processor 51 is configured to execute the one or more program instructions to perform any of the steps of the method for reducing noise in a VoIP call described in the above embodiments.
Corresponding to the above embodiment, an embodiment of the present invention further provides a chip, where the chip is coupled to the memory in the system, so that the chip invokes the program instructions stored in the memory when running to implement the method for reducing noise in a VoIP call described in the above embodiment.
Corresponding to the above embodiments, the present invention also provides a computer storage medium including one or more program instructions, where the one or more program instructions are executed by a system for reducing noise in a VoIP call to perform the method for reducing noise in a VoIP call described above.
According to the scheme provided by the application, when no human voice is present in a noisy VoIP call environment, the background audio is muted based on the VADNN algorithm, which removes the influence of background noise on call quality during the VoIP call and improves the user experience.
Those skilled in the art will further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, the illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above embodiments further explain the objects, technical solutions and advantages of the present invention in detail. It should be understood that they are merely exemplary embodiments of the present invention and are not intended to limit its scope; any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention shall be included in the scope of the present invention.

Claims (9)

1. A method for reducing noise in a VoIP call, characterized in that, based on a VADNN algorithm, the voice of the non-human voice segments in the opposite-end voice stream received in the VoIP call is replaced with mute data, and the audio is then sent to the output for playing.
2. The method according to claim 1, wherein replacing the voice of the non-human voice segments in the received opposite-end voice stream with mute data based on the VADNN algorithm, and then sending the audio to the output for playing, comprises the following steps:
acquiring the opposite-end voice stream in the VoIP call;
performing human voice detection on the voice stream and dividing it into human voice segments and non-human voice segments;
replacing the voice of the non-human voice segments with mute data; and
playing the processed voice stream.
3. The method according to claim 1, wherein replacing the voice of the non-human voice segments with mute data specifically comprises:
directly clearing the data of the voice part of the non-human voice segments to 0 as the muting process.
4. A device for reducing noise in a VoIP call, characterized by comprising a VADNN module and a voice playing module;
the VADNN module is used for replacing the voice of the non-human voice segments in the opposite-end voice stream received in the VoIP call with mute data;
and the voice playing module is used for playing the processed voice stream.
5. The device of claim 4, wherein the VADNN module comprises an acquiring unit, a processing unit, and a non-human voice muting unit, wherein
the acquiring unit is used for acquiring the opposite-end voice stream in the VoIP call;
the processing unit is used for performing human voice detection on the voice stream and dividing it into human voice segments and non-human voice segments;
and the non-human voice muting unit is used for replacing the voice of the non-human voice segments with mute data.
6. The device of claim 5, wherein the non-human voice muting unit mutes the voice part of a non-human voice segment by directly clearing its data to 0.
7. A system for reducing noise in a VoIP call, the system comprising at least one processor and a memory;
the memory is configured to store one or more program instructions;
the processor is configured to execute the one or more program instructions to perform the method according to one or more of claims 1 to 3.
8. A chip, characterized in that it is coupled to a memory in a system such that, when running, it invokes program instructions stored in said memory to implement the method according to one or more of claims 1 to 3.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises one or more program instructions that are executable by the system of claim 7 to implement the method of one or more of claims 1 to 3.
CN202110667992.3A 2021-06-16 2021-06-16 Method, device and system for reducing noise in VoIP (Voice over Internet protocol) call Pending CN113572908A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110667992.3A CN113572908A (en) 2021-06-16 2021-06-16 Method, device and system for reducing noise in VoIP (Voice over Internet protocol) call

Publications (1)

Publication Number Publication Date
CN113572908A (en) 2021-10-29

Family

ID=78162078

Country Status (1)

Country Link
CN (1) CN113572908A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6865162B1 (en) * 2000-12-06 2005-03-08 Cisco Technology, Inc. Elimination of clipping associated with VAD-directed silence suppression
CN110085251A (en) * 2019-04-26 2019-08-02 腾讯音乐娱乐科技(深圳)有限公司 Voice extracting method, voice extraction element and Related product
CN111179975A (en) * 2020-04-14 2020-05-19 深圳壹账通智能科技有限公司 Voice endpoint detection method for emotion recognition, electronic device and storage medium
CN111243595A (en) * 2019-12-31 2020-06-05 京东数字科技控股有限公司 Information processing method and device
CN111754982A (en) * 2020-06-19 2020-10-09 平安科技(深圳)有限公司 Noise elimination method and device for voice call, electronic equipment and storage medium
CN111883182A (en) * 2020-07-24 2020-11-03 平安科技(深圳)有限公司 Human voice detection method, device, equipment and storage medium
CN112116909A (en) * 2019-06-20 2020-12-22 杭州海康威视数字技术股份有限公司 Voice recognition method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination