CN113572908A

CN113572908A - Method, device and system for reducing noise in VoIP (Voice over Internet protocol) call

Info

Publication number: CN113572908A
Application number: CN202110667992.3A
Authority: CN
Inventors: 李旭滨; 陈晓松; 侯宇明; 刘鹏
Original assignee: Yunmao Internet Intelligent Technology Xiamen Co ltd
Current assignee: Yunmao Internet Intelligent Technology Xiamen Co ltd
Priority date: 2021-06-16
Filing date: 2021-06-16
Publication date: 2021-10-29

Abstract

The invention provides a method, a device and a system for reducing noise in a VoIP (Voice over Internet Protocol) call. The invention introduces a no-human-voice mute module that replaces the audio of non-human voice segments with silence, thereby improving the overall signal-to-noise ratio and the call quality of the audio. It also introduces the VADNN technique, which achieves higher voice detection accuracy in noisy environments than general energy-based voice detection techniques.

Description

Method, device and system for reducing noise in VoIP (Voice over Internet protocol) call

Technical Field

One or more embodiments of the present invention relate to the technical field of communications, in particular to the field of noise cancellation, and more specifically to a method, an apparatus, and a system for reducing noise in a VoIP call.

Background

This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Thus, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.

VoIP (Voice over Internet Protocol) digitizes analog voice signals and transmits them in real time over an IP network in the form of data packets. For example, more and more people use network chat tools for voice chat, where the voice is transmitted not over the traditional telephone network of a telecommunications carrier but over the Internet. VoIP is the technology that converts voice into IP data packets and carries them, in part or in whole, over an IP network. The greatest advantage of VoIP is that it can make broad use of the Internet and the global IP interconnection environment to provide more and better services than traditional telephony. For example, VoIP can be used on many Internet access devices, including VoIP phones, smartphones, and personal computers, to make calls and send messages via cellular networks and Wi-Fi.

Noise cancellation is an important and difficult problem in the field of signal processing, and the presence of noise greatly affects the normal operation of a system. Network communication technology uses digital voice noise reduction to improve the experience of both parties to a call; the noise reduction mainly comprises echo cancellation, environmental noise suppression, automatic gain control of the human voice, and the like. The noise-reduced audio is expected to be clear, lossless, and free of echo. However, when the call environment is noisy, or in long-distance intercom scenarios, the signal-to-noise ratio of the voice is relatively low, and a general noise reduction algorithm cannot achieve a good result. The background noise not only interferes with normal conversation but also degrades echo cancellation, producing a continuous echo during the call.

In view of the above, a new noise reduction processing technology is needed to solve the problem of the influence of background noise on the communication quality during the VoIP network communication.

Disclosure of Invention

One or more embodiments of the present specification describe a method, an apparatus, and a system for reducing noise in a VoIP call. When there is no human voice in the noisy environment of a VoIP network call, the background audio is muted by noise reduction processing based on a VADNN algorithm, so as to remove the influence of background noise on call quality during the VoIP network call, improve the call quality, and improve the user experience.

The technical scheme provided by one or more embodiments of the specification is as follows:

In a first aspect, the present invention provides a method for reducing noise in a VoIP call, wherein, based on a VADNN algorithm, the voice of the non-human voice segment in the opposite-end voice stream received in a VoIP call is replaced with mute data, and the result is then sent to the audio output for playing.

In a possible implementation manner, the replacing, based on the VADNN algorithm, the voice of the non-human voice segment in the opposite-end voice stream in the received VoIP call with the mute data, and then sending the audio output to play includes the following steps:

acquiring an opposite-end voice flow in a VoIP call;

carrying out voice detection on the voice stream, and dividing the voice stream into a voice section and a non-voice section;

replacing the voice of the non-human voice section with mute data;

and playing the processed voice stream.
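
The four steps above can be sketched as a small streaming loop. This is only an illustrative sketch under stated assumptions: the frame representation (a list of integer PCM samples), the function name `denoise_stream`, and the threshold-based `toy_detector` are all hypothetical. In particular, `toy_detector` is a placeholder for the trained VADNN classifier, whose architecture the source does not describe.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[int]  # assumed representation: one fixed-size block of PCM samples


def denoise_stream(
    frames: Iterable[Frame],
    is_human_voice: Callable[[Frame], bool],
) -> Iterator[Frame]:
    """Yield frames for playback, replacing non-human-voice frames with silence."""
    for frame in frames:                 # step 1: acquire the opposite-end voice stream
        if is_human_voice(frame):        # step 2: voice detection on each frame
            yield frame                  # human voice passes through unchanged
        else:
            yield [0] * len(frame)       # step 3: replace with mute data


def toy_detector(frame: Frame) -> bool:
    # Hypothetical stand-in for the trained VADNN classifier (NOT the patent's model).
    return frame[0] > 1000


received = [[1200, 1300], [40, -35], [1500, 900]]
played: List[Frame] = []
for out in denoise_stream(received, toy_detector):
    played.append(out)                   # step 4: hand the processed frame to playback
```

Because the muted frame keeps its original length, the timing of the stream handed to playback is unchanged.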

In a possible implementation manner, the replacing the voice stream of the non-human voice segment with the mute data specifically includes:

directly clearing the data of the voice portion of the non-human voice segment to 0 as the mute processing.

In a second aspect, the present invention provides a device for reducing noise in a VoIP call, including a VADNN module and a voice playing module;

the VADNN module is used for replacing the voice of the non-human voice section in the opposite-end voice flow in the received VoIP call with mute data;

and the voice playing module is used for playing the processed voice stream.

In one possible implementation, the VADNN module includes an acquiring unit, a processing unit, and a no-human-voice mute unit, wherein

The acquiring unit is used for acquiring an opposite-end voice flow in the VoIP call;

the processing unit is used for carrying out voice detection on the voice stream and dividing the voice stream into a voice section and a non-voice section;

and the no-human-voice mute unit is used for replacing the voice of the non-human voice section with mute data.

In one possible implementation, the no-human-voice mute unit mutes the voice portion of the non-human voice segment by directly clearing its data to 0.

In a third aspect, the present invention provides a system for noise reduction in a VoIP call, the system comprising at least one processor and a memory;

the memory to store one or more program instructions;

the processor is configured to execute one or more program instructions to perform the method according to one or more of the first aspects.

In a fourth aspect, the present invention provides a chip, which is coupled to a memory in a system, so that the chip calls program instructions stored in the memory when running to implement the method according to one or more of the first aspects.

In a fifth aspect, the invention provides a computer readable storage medium comprising one or more program instructions executable by a system according to the third aspect to implement a method according to one or more of the first aspects.

The technical scheme provided by the embodiment of the invention solves the problems in the prior art, and has the following advantages:

(1) a no-human-voice mute module is introduced, and the voice of the non-human voice segment is replaced with silence, so that the overall signal-to-noise ratio and the call quality of the audio are improved;

(2) compared with a general voice detection technology based on energy, the VADNN technology is introduced, and the voice detection precision under the noise environment is higher.
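
To illustrate why advantage (2) matters, the following minimal sketch shows the kind of energy-based detector that the VADNN is contrasted with. The sample values and the `THRESHOLD` constant are assumptions chosen for illustration; they do not come from the source. The point is only that a loud noise frame containing no speech still exceeds an energy threshold.

```python
import math
from typing import Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def energy_vad(frame: Sequence[float], threshold: float) -> bool:
    """Energy-based decision: report 'voice' whenever the frame is loud enough."""
    return rms(frame) >= threshold


THRESHOLD = 0.1  # assumed value, e.g. tuned for a quiet room

quiet_background = [0.01, -0.02, 0.015, -0.01]   # low-level background, no speech
loud_street_noise = [0.4, -0.5, 0.45, -0.38]     # loud noise, still no speech

quiet_is_voice = energy_vad(quiet_background, THRESHOLD)
noise_is_voice = energy_vad(loud_street_noise, THRESHOLD)
```

Here the quiet frame is rejected, but the loud noise frame is reported as voice even though it contains none. A classifier that does not depend on energy, as the VADNN is described to be, is intended to avoid this kind of false detection.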

Drawings

Fig. 1 is a schematic diagram of implementing no-human-voice muting based on a VADNN according to an embodiment of the present invention;

fig. 2 is a schematic flow chart of a method for reducing noise in a VoIP call according to an embodiment of the present invention;

fig. 3 is a schematic structural diagram of a device for reducing noise in a VoIP call according to an embodiment of the present invention;

fig. 4 is a schematic structural diagram of a VADNN module according to an embodiment of the present invention;

fig. 5 is a schematic structural diagram of a system for reducing noise in a VoIP call according to an embodiment of the present invention.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be further noted that, for the convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

To solve the problem that background noise affects call quality during a VoIP network call, the application provides a method that converts non-human-voice call audio into silence based on a deep-learning VAD (voice activity detection) technique, referred to as VADNN, thereby improving call quality. The VADNN is a human voice classifier trained with a deep learning method; it does not depend on voice energy, and its recognition accuracy in noisy scenes is markedly better than that of common VAD techniques. Fig. 1 is a schematic diagram of implementing no-human-voice muting based on a VADNN according to an embodiment of the present invention. As shown in fig. 1, the overall idea of the present application is: based on a VADNN algorithm, the voice of the non-human voice section in the opposite-end voice stream received in a VoIP call is replaced with mute data, and the result is then sent to the audio output for playing, so that the interference of external noise in the non-human voice section with the call is eliminated.

Fig. 2 shows a schematic flow chart of a method for reducing noise in a VoIP call according to an embodiment of the present invention, and as shown in fig. 2, based on a VADNN algorithm, a method for reducing noise in a VoIP call includes the following steps:

Step 10, acquiring an opposite-end voice stream in the VoIP call.

Specifically, in the VoIP call, the locally received opposite-end voice stream is intercepted and preprocessed by the "no-human-voice mute" module before it flows to the voice playing module, instead of flowing to the voice playing module directly.

The preprocessing converts the non-human-voice call audio in the received opposite-end voice stream into silence, and includes steps 20 and 30, which are specifically as follows:

Step 20, carrying out voice detection on the voice stream, and dividing the voice stream into a voice section and a non-voice section.

The present application is implemented based on a VADNN algorithm, and the "no-human-voice mute" module mentioned in step 10 may be understood as a VADNN algorithm model, referred to as the VADNN model for short.

The VADNN model performs a binary classification of the voice stream into human voice and non-human voice. Specifically, the voice stream is processed in segments of a fixed size; the VADNN algorithm performs voice detection on each segment, the voice stream is thereby divided into voice sections and non-voice sections, and each section is marked with a corresponding label.
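
The fixed-size segmentation and labeling described here might look like the sketch below. It is a hedged illustration only: the `Segment` tuple layout, the `label_segments` helper, the frame size of 4 samples, and `toy_classifier` are hypothetical, and merging adjacent frames with the same label is a convenience added for the example rather than something the source specifies.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int, bool]  # (start sample, end sample, is_human_voice)


def label_segments(
    samples: Sequence[int],
    frame_size: int,
    classify: Callable[[Sequence[int]], bool],
) -> List[Segment]:
    """Classify fixed-size frames and merge neighbours that share a label."""
    segments: List[Segment] = []
    for start in range(0, len(samples), frame_size):
        end = min(start + frame_size, len(samples))
        label = classify(samples[start:end])
        if segments and segments[-1][2] == label:
            # Same label as the previous frame: extend that segment.
            segments[-1] = (segments[-1][0], end, label)
        else:
            segments.append((start, end, label))
    return segments


def toy_classifier(frame: Sequence[int]) -> bool:
    # Hypothetical stand-in for the VADNN model: "voice" if any sample is large.
    return max(abs(s) for s in frame) > 500


stream = [900, 800, 700, 600, 10, -5, 3, 8, 2, -4, 6, 1, 1000, 950, 870, 990]
segments = label_segments(stream, frame_size=4, classify=toy_classifier)
```

With these inputs the two quiet middle frames collapse into a single non-voice section spanning samples 4 to 12, which is the range a later muting step would clear.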

And step 30, replacing the voice of the non-human voice section with mute data.

If a section carries the non-human voice label, the data of that section is directly cleared to 0; that is, the non-human-voice call audio is converted into silence, which completes the no-human-voice mute processing.
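
Clearing the data to 0 can be shown at the byte level. The sketch below assumes 16-bit little-endian mono PCM, which the source does not state; `BYTES_PER_SAMPLE` and `mute_segments` are hypothetical names introduced for illustration.

```python
import struct
from typing import Iterable, Tuple

BYTES_PER_SAMPLE = 2  # assumption: 16-bit mono PCM


def mute_segments(pcm: bytes, non_voice: Iterable[Tuple[int, int]]) -> bytes:
    """Return a copy of pcm with each (start, end) sample range cleared to 0."""
    out = bytearray(pcm)
    for start, end in non_voice:
        lo = min(start * BYTES_PER_SAMPLE, len(out))
        hi = min(end * BYTES_PER_SAMPLE, len(out))
        if hi > lo:
            out[lo:hi] = bytes(hi - lo)  # zero bytes of equal length
    return bytes(out)


samples = [1200, -1100, 35, -40, 28, 1500]
pcm = struct.pack("<6h", *samples)
muted = mute_segments(pcm, [(2, 5)])      # samples 2..4 carry the non-human voice label
result = list(struct.unpack("<6h", muted))
```

The zeroed range is overwritten rather than removed, so the buffer length and therefore the playback timing stay the same.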

And step 40, playing the processed voice stream.

And sending the processed voice stream to a playing module for playing.

The method provided by the invention solves the problems in the prior art and has the advantages set out in items (1) and (2) of the Disclosure of Invention above.

Corresponding to the method of the above embodiment, the present invention further provides a device for reducing noise in a VoIP call. Fig. 3 is a schematic structural diagram of the device. As shown in fig. 3, the device includes a VADNN module 1 and a voice playing module 2; specifically,

the VADNN module 1 is used for replacing the voice of the non-human voice section in the opposite end voice flow in the received VoIP call with mute data;

and the voice playing module 2 is used for playing the processed voice stream.

In an example, fig. 4 is a schematic structural diagram of a VADNN module. As shown in fig. 4, the VADNN module 1 includes an acquiring unit 11, a processing unit 12, and a no-human-voice mute unit 13; specifically,

the acquiring unit 11 is configured to acquire an opposite-end voice stream in the VoIP call.

The processing unit 12 is configured to perform voice detection on the voice stream, and divide the voice stream into a voice section and a non-voice section.

The no-human-voice mute unit 13 is configured to replace the voice of the non-human voice segment with mute data.

The functions executed by each component in the apparatus provided in the embodiment of the present invention have been described in detail in the above-mentioned method, and therefore, redundant description is not repeated here.
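
One way to picture the decomposition of figs. 3 and 4 in software is the sketch below. The class and method names are hypothetical and simply mirror the reference numerals in the text; the classifier passed to `VADNNModule` is a placeholder for the trained model, and `VoicePlayingModule` only records frames instead of driving an audio device.

```python
from typing import Callable, Iterable, List

Frame = List[int]


class AcquisitionUnit:
    """Unit 11: obtains the opposite-end voice stream (here, any iterable of frames)."""

    def acquire(self, source: Iterable[Frame]) -> Iterable[Frame]:
        return source


class ProcessingUnit:
    """Unit 12: labels a frame as human voice (True) or non-human voice (False)."""

    def __init__(self, classifier: Callable[[Frame], bool]) -> None:
        self.classifier = classifier  # hypothetical stand-in for the trained model

    def detect(self, frame: Frame) -> bool:
        return self.classifier(frame)


class MuteUnit:
    """Unit 13: replaces a non-human-voice frame with mute data."""

    def mute(self, frame: Frame) -> Frame:
        return [0] * len(frame)


class VADNNModule:
    """Module 1: composes the three units above."""

    def __init__(self, classifier: Callable[[Frame], bool]) -> None:
        self.acquisition = AcquisitionUnit()
        self.processing = ProcessingUnit(classifier)
        self.muting = MuteUnit()

    def process(self, source: Iterable[Frame]) -> List[Frame]:
        out: List[Frame] = []
        for frame in self.acquisition.acquire(source):
            keep = self.processing.detect(frame)
            out.append(frame if keep else self.muting.mute(frame))
        return out


class VoicePlayingModule:
    """Module 2: plays the processed stream (this sketch only records the frames)."""

    def __init__(self) -> None:
        self.played: List[Frame] = []

    def play(self, frames: Iterable[Frame]) -> None:
        self.played.extend(frames)


vadnn = VADNNModule(classifier=lambda f: sum(abs(s) for s in f) > 100)
player = VoicePlayingModule()
player.play(vadnn.process([[80, 90], [3, -2]]))
```

Keeping detection and muting in separate units means the classifier can be swapped without touching the code that clears the data.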

Corresponding to the above embodiments, the embodiment of the present invention further provides a system for reducing noise in a VoIP call, specifically as shown in fig. 5, the system includes at least one processor 51 and a memory 52;

the memory 52 is configured to store one or more program instructions;

the processor 51 is configured to execute the one or more program instructions to perform any of the method steps of the method for reducing noise in a VoIP call described in the above embodiments.

Corresponding to the above embodiment, an embodiment of the present invention further provides a chip, where the chip is coupled to the memory in the system, so that the chip invokes the program instruction stored in the memory when running, so as to implement the method for reducing noise in a VoIP call as described in the above embodiment.

Corresponding to the above embodiments, the present invention also provides a computer storage medium including one or more program instructions, where the one or more program instructions are executed by a system for reducing noise in a VoIP call to perform the above-described method for reducing noise in a VoIP call.

According to the scheme provided by the application, when there is no human voice in the noisy environment of a VoIP network call, the background audio is muted by noise reduction processing based on the VADNN algorithm, so that the influence of background noise on call quality during the VoIP network call is removed, the call quality is improved, and the user experience is improved.

Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The above embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, it should be understood that the above embodiments are merely exemplary embodiments of the present invention and are not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A method for reducing noise in VoIP call is characterized in that based on VADNN algorithm, the voice of non-human voice section in opposite end voice flow in received VoIP call is replaced by mute data, and then audio output is sent for playing.

2. The method according to claim 1, wherein the step of replacing the non-human voice segment in the opposite end voice stream in the received VoIP call with mute data based on the VADNN algorithm, and then sending an audio output for playing comprises the following steps:

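acquiring an opposite-end voice stream in a VoIP call;

carrying out voice detection on the voice stream, and dividing the voice stream into a voice section and a non-voice section;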

replacing the voice of the non-human voice section with mute data;

and playing the processed voice stream.

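3. The method according to claim 1, wherein the replacing the voice stream of the non-human voice segment with silence data is specifically:

directly clearing the data of the voice portion of the non-human voice segment to 0 as the mute processing.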

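4. A device for reducing noise in VoIP conversation is characterized by comprising a VADNN module and a voice playing module;

the VADNN module is used for replacing the voice of the non-human voice section in the opposite-end voice stream in the received VoIP call with mute data;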

and the voice playing module is used for playing the processed voice stream.

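5. The apparatus of claim 4, wherein the VADNN module comprises an acquiring unit, a processing unit, and a no-human-voice mute unit, wherein

the acquiring unit is used for acquiring an opposite-end voice stream in the VoIP call;

the processing unit is used for carrying out voice detection on the voice stream and dividing the voice stream into a voice section and a non-voice section;

and the no-human-voice mute unit is used for replacing the voice of the non-human voice section with mute data.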

6. The apparatus of claim 5, wherein the no-human-voice mute unit mutes the voice portion of the non-human voice segment by directly clearing its data to 0.

7. A system for noise reduction in a VoIP conversation, the system comprising at least one processor and a memory;

the memory to store one or more program instructions;

the processor, configured to execute one or more program instructions to perform the method according to one or more of claims 1 to 3.

8. A chip, characterized in that it is coupled to a memory in a system such that it, when run, invokes program instructions stored in said memory, implementing the method according to one or more of claims 1 to 3.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises one or more program instructions that are executable by the system of claim 7 to implement the method of one or more of claims 1 to 3.