CN112562708B

CN112562708B - Nonlinear echo cancellation method, nonlinear echo cancellation device, electronic device and storage medium

Info

Publication number: CN112562708B
Application number: CN202011284238.3A
Authority: CN
Inventors: 卿睿; 韩润强; 魏建强
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-11-17
Filing date: 2020-11-17
Publication date: 2022-02-25
Anticipated expiration: 2040-11-17
Also published as: CN112562708A

Abstract

The application discloses a nonlinear echo cancellation method, a nonlinear echo cancellation device, an electronic device and a storage medium, and relates to artificial intelligence fields such as intelligent speech, natural language processing and deep learning. The method comprises: determining whether each frequency point in a voice signal to be processed meets a compression condition; and, if a frequency point is determined to meet the compression condition, compressing the signal amplitude of that frequency point. Because only frequency points that meet the condition are compressed, the scheme keeps speech distortion small.

Description

Nonlinear echo cancellation method, nonlinear echo cancellation device, electronic device and storage medium

Technical Field

The present application relates to the field of artificial intelligence, in particular to intelligent speech, natural language processing and deep learning, and more particularly to a nonlinear echo cancellation method and apparatus, an electronic device, and a storage medium.

Background

With the rapid spread and development of instant messaging systems, users place ever higher demands on call quality, and the amount of nonlinear echo that can be eliminated is an important factor affecting call quality.

To eliminate nonlinear echo, the approach commonly adopted at present is to apply an equalizer to the frequency band in which nonlinearity is severe and to directly compress (reduce) all signal amplitudes within that band. However, this kind of compression is too blunt and tends to cause large speech distortion.

Disclosure of Invention

The application provides a nonlinear echo cancellation method, a nonlinear echo cancellation device, an electronic device and a storage medium.

A non-linear echo cancellation method, comprising:

determining whether each frequency point in a voice signal to be processed meets a compression condition; and

if a frequency point is determined to meet the compression condition, compressing the signal amplitude of that frequency point.

A non-linear echo cancellation device, comprising:

a determining module, configured to determine whether each frequency point in a voice signal to be processed meets a compression condition; and

a compression module, configured to compress the signal amplitude of a frequency point when that frequency point is determined to meet the compression condition.

An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as described above.

A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method as described above.

One embodiment of the above application has the following advantages or benefits: whether each frequency point in the voice signal to be processed meets the compression condition is determined individually, and the signal amplitude is compressed only at frequency points that meet the condition. This solves the problem that prior-art compression is too blunt, makes compression more selective and targeted, and thereby keeps speech distortion small.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. In the drawings:

FIG. 1 is a flowchart of a first embodiment of a nonlinear echo cancellation method according to the present application;

FIG. 2 is a schematic diagram of total harmonic distortion curves at different loudness levels according to the present application;

FIG. 3 is a flowchart of a second embodiment of a nonlinear echo cancellation method according to the present application;

FIG. 4 is a schematic diagram of the overall implementation process of the nonlinear echo cancellation method according to the present application;

FIG. 5 is a schematic structural diagram of an embodiment of a nonlinear echo cancellation device 50 according to the present application;

FIG. 6 is a block diagram of an electronic device for implementing the method according to an embodiment of the present application.

Detailed Description

The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding; these details are to be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In addition, it should be understood that the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" herein generally indicates an "or" relationship between the objects before and after it.

FIG. 1 is a flowchart of a first embodiment of a nonlinear echo cancellation method according to the present application. As shown in FIG. 1, the method includes the following steps.

In step 101, it is determined whether each frequency point in the voice signal to be processed meets the compression condition.

In step 102, if a frequency point is determined to meet the compression condition, the signal amplitude of that frequency point is compressed.

In this method embodiment, whether each frequency point in the voice signal to be processed meets the compression condition is determined individually, and the signal amplitude is compressed only at frequency points that meet the condition. This solves the problem that prior-art compression is too blunt, makes compression more selective and targeted, and keeps speech distortion small.

The voice signal to be processed may be a voice signal acquired in real time, and whether the compression condition is met is determined separately for each frequency point.

Preferably, for any frequency point, the signal amplitude of the frequency point may be compared with the suppression threshold corresponding to that frequency point. If the signal amplitude is greater than the suppression threshold, the frequency point is determined to meet the compression condition; otherwise, it is determined not to meet the compression condition.
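The per-frequency-point comparison can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: it assumes that amplitudes and suppression thresholds are both expressed in dB with one value per frequency point, and the function name `meets_compression_condition` is hypothetical.

```python
def meets_compression_condition(amplitudes_db, thresholds_db):
    """Return one boolean per frequency point: True when the amplitude is
    strictly greater than that point's own suppression threshold."""
    if len(amplitudes_db) != len(thresholds_db):
        raise ValueError("need exactly one threshold per frequency point")
    return [a > t for a, t in zip(amplitudes_db, thresholds_db)]


# Three frequency points, each with its own (made-up) threshold in dB.
print(meets_compression_condition([-10.0, -25.0, -4.0], [-15.0, -15.0, -3.0]))
# [True, False, False]
```

The comparison is strict, so an amplitude exactly equal to its threshold is not compressed, matching the "greater than" wording.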

Each frequency point has its own suppression threshold; any two different frequency points may have the same or different suppression thresholds.

The suppression threshold may be determined in advance. Preferably, a total harmonic distortion (THD) curve of the user equipment (e.g., a voice playback device) corresponding to the voice signal to be processed may be obtained, and the suppression threshold of each frequency point is determined from the obtained THD curve.

In practice, THD curves can be obtained separately for different types of user equipment. For example, the THD curve of a given user equipment can be obtained by running a frequency sweep test on it with a test signal; preferably, THD curves are obtained at several different loudness levels.
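The application does not say how the THD value at each sweep frequency is computed. A common definition takes the ratio of the root-sum-square amplitude of the harmonics to the amplitude of the fundamental. The Python sketch here assumes that definition, a 16 kHz sampling rate, an analysis window holding a whole number of periods of the test tone, and the first five harmonics; `tone_amplitude` and `thd` are hypothetical helper names, and in a real sweep the recorded loudspeaker output would replace the synthetic signals.

```python
import math


def tone_amplitude(samples, freq_hz, fs_hz):
    """Amplitude of the component at freq_hz, estimated by correlating the
    signal with a cosine and a sine at that frequency (a single DFT bin)."""
    n = len(samples)
    w = 2.0 * math.pi * freq_hz / fs_hz
    re = sum(x * math.cos(w * k) for k, x in enumerate(samples))
    im = sum(x * math.sin(w * k) for k, x in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


def thd(samples, f0_hz, fs_hz, num_harmonics=5):
    """THD as a ratio: root-sum-square of harmonics 2..num_harmonics+1
    divided by the amplitude of the fundamental."""
    fundamental = tone_amplitude(samples, f0_hz, fs_hz)
    harmonics = [
        tone_amplitude(samples, h * f0_hz, fs_hz)
        for h in range(2, num_harmonics + 2)
        if h * f0_hz < fs_hz / 2  # ignore harmonics at or above Nyquist
    ]
    return math.sqrt(sum(a * a for a in harmonics)) / fundamental


fs, f0 = 16000, 1000
n = 1600  # exactly 100 periods of f0, so every harmonic falls on a DFT bin
clean = [0.5 * math.sin(2 * math.pi * f0 * k / fs) for k in range(n)]
distorted = [x + 0.05 * math.sin(2 * math.pi * 2 * f0 * k / fs) for k, x in enumerate(clean)]
print(round(thd(clean, f0, fs), 6))      # 0.0
print(round(thd(distorted, f0, fs), 6))  # 0.1
```

Repeating this measurement for every sweep frequency and every playback level would yield one THD curve per loudness level, which is the kind of data FIG. 2 plots.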

FIG. 2 is a schematic diagram of total harmonic distortion curves at different loudness levels according to the present application. As shown in FIG. 2, the loudness levels may include -3 dB, -15 dB, -30 dB, -50 dB, and so on; the different loudness levels correspond to different degrees of amplitude attenuation. In the dashed rectangular area of FIG. 2, for example, the curves from top to bottom correspond to -50 dB, -3 dB, -15 dB, and -30 dB, respectively. It should be noted that the loudness levels shown in FIG. 2 are merely examples and do not limit the technical solution of the present application; the specific loudness levels may be chosen according to actual needs.

For any user equipment, the way the suppression threshold of each frequency point is derived from the THD curves can also be chosen according to actual needs. Still taking FIG. 2 as an example, one possible implementation is: for any frequency point, read the vertical-axis (THD) value of that frequency point on each of the four curves, and take the loudness of the curve with the smallest value as the suppression threshold of that frequency point. Any other feasible method may of course be used to determine the suppression thresholds from the THD curves; this is only an example.
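The minimum-THD selection rule can be sketched as follows, assuming the THD curves have already been sampled onto a common grid of frequency points and stored as a mapping from loudness (dB) to a list of THD values. The function name and the numbers in the example are made up for illustration.

```python
def thresholds_from_thd_curves(thd_curves):
    """For each frequency point, return the loudness whose THD curve has the
    smallest vertical-axis value at that point.

    thd_curves: dict mapping loudness in dB -> list of THD values,
    one per frequency point, all on the same frequency grid.
    """
    levels = list(thd_curves)
    num_points = len(thd_curves[levels[0]])
    if any(len(curve) != num_points for curve in thd_curves.values()):
        raise ValueError("all curves must cover the same frequency points")
    return [min(levels, key=lambda db: thd_curves[db][i]) for i in range(num_points)]


curves = {
    -3: [0.20, 0.05, 0.30],
    -15: [0.08, 0.06, 0.10],
    -30: [0.04, 0.09, 0.12],
    -50: [0.10, 0.07, 0.25],
}
print(thresholds_from_thd_curves(curves))  # [-30, -3, -15]
```

Ties go to the loudness listed first, because Python's `min` returns the first minimal element; the application does not say how ties should be resolved.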

With the THD curves, the suppression threshold of each frequency point can be determined conveniently and accurately, and the signal amplitude can then be adjusted dynamically against these thresholds. That is, for each frequency point in the voice signal to be processed, the signal amplitude is compared with the suppression threshold of that frequency point. If the signal amplitude of a frequency point is greater than its suppression threshold, the frequency point is determined to meet the compression condition, meaning the loudspeaker is expected to produce considerable harmonic distortion at that point. The signal amplitude of the frequency point is therefore compressed, i.e., its output amplitude is reduced appropriately. This achieves dynamic compression control of nonlinear harmonics, eliminates a large amount of nonlinear harmonic distortion, and keeps the distortion of the voice signal small.

Preferably, when a frequency point is determined to meet the compression condition, its signal amplitude may be compressed according to the compression ratio corresponding to that frequency point.

The compression ratio of each frequency point can be preset individually, with specific values chosen according to actual needs; any two different frequency points may have the same or different compression ratios.
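The application leaves open how a compression ratio is applied to a signal amplitude. One conventional reading, borrowed from dynamic range compressors and used here purely as an assumption, is that the part of the amplitude above the threshold is divided by the ratio in the dB domain, so a 2:1 ratio halves the overshoot. The sketch applies each frequency point's own threshold and ratio; `compress_db` and `compress_all` are hypothetical names.

```python
def compress_db(amplitude_db, threshold_db, ratio):
    """Downward compression above the threshold (assumed dB-domain rule):
    every `ratio` dB of input above the threshold yields 1 dB of output
    above it. Amplitudes at or below the threshold pass unchanged."""
    if ratio < 1.0:
        raise ValueError("a compression ratio must be >= 1")
    if amplitude_db <= threshold_db:
        return amplitude_db
    return threshold_db + (amplitude_db - threshold_db) / ratio


def compress_all(amps_db, thresholds_db, ratios):
    """Apply each frequency point's own threshold and compression ratio."""
    return [compress_db(a, t, r) for a, t, r in zip(amps_db, thresholds_db, ratios)]


print(compress_all([-10.0, -25.0, -3.0], [-15.0, -15.0, -9.0], [2.0, 2.0, 3.0]))
# [-12.5, -25.0, -7.0]
```

A ratio of 1 leaves the signal untouched, and larger ratios pull the output closer to the threshold, so per-point ratios let heavily distorting frequency points be compressed harder than the rest.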

Further, a compression mode may also be preset for each frequency point, again chosen according to actual needs; likewise, any two different frequency points may have the same or different compression modes.

Therefore, when a frequency point is determined to meet the compression condition, its signal amplitude can be compressed according to the compression ratio and the compression mode corresponding to that frequency point.

The compression mode generally refers to how compression is applied across frequency points, also called the compression speed. Setting a suitable compression mode makes the transition between the compressed signals of adjacent frequency points smoother and more natural, which improves voice quality.
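The application describes the compression mode only as the compression speed and gives no formula. A common way to control compression speed, assumed here rather than taken from the application, is one-pole attack/release smoothing of each frequency point's gain between successive frames: gain reductions are followed quickly (attack) and recoveries slowly (release), avoiding abrupt gain jumps. This addresses smoothness over time at each frequency point; smoothing across adjacent frequency points would be another possible reading. The class name, time constants and frame hop are hypothetical placeholders.

```python
import math


def smoothing_coeff(time_constant_s, hop_s):
    """One-pole smoothing coefficient for a time constant and frame hop."""
    return math.exp(-hop_s / time_constant_s)


class GainSmoother:
    """Per-frequency-point gain smoother with separate attack and release speeds."""

    def __init__(self, num_points, attack_s=0.005, release_s=0.100, hop_s=0.010):
        self.a_att = smoothing_coeff(attack_s, hop_s)
        self.a_rel = smoothing_coeff(release_s, hop_s)
        self.gains_db = [0.0] * num_points  # start with no attenuation

    def update(self, target_gains_db):
        """Move each stored gain toward its target: more attenuation uses the
        attack coefficient, less attenuation uses the release coefficient."""
        for i, target in enumerate(target_gains_db):
            a = self.a_att if target < self.gains_db[i] else self.a_rel
            self.gains_db[i] = a * self.gains_db[i] + (1.0 - a) * target
        return list(self.gains_db)


smoother = GainSmoother(num_points=2)
print(smoother.update([-6.0, 0.0]))  # first point moves quickly toward -6 dB
print(smoother.update([0.0, 0.0]))   # then recovers slowly toward 0 dB
```

The target gain for a frequency point would be the difference between its compressed and original amplitude in dB, so points that do not meet the compression condition keep a target of 0 dB.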

Based on the above description, FIG. 3 is a flowchart of a second embodiment of the nonlinear echo cancellation method according to the present application. As shown in FIG. 3, the method includes the following steps.

In step 301, for the voice signal to be processed, steps 302 to 305 are performed for each frequency point.

In step 302, the signal amplitude of the frequency point is compared with the suppression threshold corresponding to the frequency point.

The total harmonic distortion curve of the user equipment corresponding to the voice signal to be processed can be obtained in advance, and the suppression threshold corresponding to each frequency point is determined according to the obtained total harmonic distortion curve.

In step 303, it is determined whether the signal amplitude of the frequency point is greater than its suppression threshold; if so, step 304 is executed; otherwise, step 305 is executed.

In step 304, the signal amplitude of the frequency point is compressed according to the compression ratio and the compression mode corresponding to the frequency point.

If the signal amplitude of the frequency point is greater than its suppression threshold, the frequency point is determined to meet the compression condition, and its output signal amplitude is therefore reduced appropriately.

In step 305, the frequency point is left unprocessed.

If the signal amplitude of the frequency point is not greater than its suppression threshold, the frequency point is left unprocessed.
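Steps 301 to 305 can be sketched for one frame of a short-time spectrum. The sketch assumes that each frequency point is a complex DFT bin, that amplitudes and thresholds are compared in dB, and that step 304 divides the overshoot above the threshold by the ratio in the dB domain (an assumption; the application does not define the rule). Only the magnitude of a compressed bin changes; its phase is kept. `process_frame` and `db` are hypothetical names.

```python
import cmath
import math


def db(magnitude):
    """Magnitude in dB, floored so that a zero bin does not raise."""
    return 20.0 * math.log10(max(magnitude, 1e-12))


def process_frame(bins, thresholds_db, ratios):
    """Apply steps 301-305 to one frame of complex bins, one per frequency point."""
    out = []
    for z, thr, ratio in zip(bins, thresholds_db, ratios):  # step 301: every point
        mag_db = db(abs(z))
        if mag_db > thr:                                     # steps 302-303: compare
            new_db = thr + (mag_db - thr) / ratio            # step 304: compress
            out.append(cmath.rect(10 ** (new_db / 20.0), cmath.phase(z)))
        else:
            out.append(z)                                    # step 305: unchanged
    return out


frame = [1 + 0j, 0.01j]
result = process_frame(frame, [-6.0, -6.0], [2.0, 2.0])
print([round(abs(z), 4) for z in result])  # [0.7079, 0.01]
```

Keeping the phase means the processed frame can be turned back into a time signal with the same inverse transform and overlap-add used by the rest of the audio pipeline.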

With the above introduction, FIG. 4 is a schematic diagram of the overall implementation process of the nonlinear echo cancellation method according to the present application. As shown in FIG. 4, for any user equipment, a THD curve may be generated in advance using a test signal, and the suppression threshold of each frequency point is determined from it; a real-time voice signal is then processed with the method described in this application (which may be called, for example, a dynamic frequency control method), and the processing result is output.

It should be noted that, for simplicity of explanation, the foregoing method embodiments are described as a series of acts, but those skilled in the art will appreciate that the present application is not limited by the described order of acts, because some steps may be performed in other orders or concurrently according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules involved are not necessarily required by this application. In addition, for parts not described in detail in one embodiment, reference may be made to the relevant descriptions elsewhere.

The above is a description of method embodiments, and the embodiments of the present application are further described below by way of apparatus embodiments.

FIG. 5 is a schematic structural diagram of a nonlinear echo cancellation device 50 according to an embodiment of the present application. As shown in FIG. 5, the device includes a determining module 501 and a compression module 502.

A determining module 501, configured to determine whether each frequency point in the voice signal to be processed meets a compression condition.

A compression module 502, configured to compress the signal amplitude of a frequency point when that frequency point is determined to meet the compression condition.

The voice signal to be processed may be a voice signal obtained in real time, and for each frequency point therein, the determining module 501 may determine whether it meets the compression condition.

Preferably, for any frequency point, the determining module 501 may compare the signal amplitude of the frequency point with the suppression threshold corresponding to that frequency point. If the signal amplitude is greater than the suppression threshold, the frequency point is determined to meet the compression condition; otherwise, it is determined not to meet the compression condition.

The suppression threshold may be determined in advance. As shown in FIG. 5, the device may further include a preprocessing module 500, configured to obtain the THD curve of the user equipment corresponding to the voice signal to be processed and to determine the suppression threshold of each frequency point from the obtained THD curve.

In addition, for any user equipment, the way the suppression threshold of each frequency point is derived from the THD curve can be chosen according to actual needs.

Preferably, when a frequency point is determined to meet the compression condition, the compression module 502 may compress its signal amplitude according to the compression ratio corresponding to that frequency point. The compression ratio of each frequency point can be preset individually.

Furthermore, the compression modes corresponding to the frequency points can be preset respectively. Thus, for any frequency point, when determining that the frequency point meets the compression condition, the compression module 502 may compress the signal amplitude of the frequency point according to the compression ratio and the compression manner corresponding to the frequency point.
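The module structure of device 50 can be mirrored as small Python classes, one per module, wired together in the order of the overall process of FIG. 4: thresholds are derived offline from THD curves, then each frame is checked and compressed. This is an illustrative sketch only; the class names are hypothetical, and it assumes dB-domain amplitudes and a dB-domain ratio rule, neither of which the application specifies.

```python
class PreprocessingModule:
    """Module 500: derive per-point suppression thresholds from THD curves
    (dict: loudness dB -> list of THD values, one per frequency point)."""

    def thresholds(self, thd_curves):
        levels = list(thd_curves)
        n = len(thd_curves[levels[0]])
        return [min(levels, key=lambda db: thd_curves[db][i]) for i in range(n)]


class DeterminingModule:
    """Module 501: flag frequency points whose amplitude exceeds their threshold."""

    def __init__(self, thresholds_db):
        self.thresholds_db = thresholds_db

    def flags(self, amps_db):
        return [a > t for a, t in zip(amps_db, self.thresholds_db)]


class CompressionModule:
    """Module 502: compress only the flagged frequency points."""

    def __init__(self, thresholds_db, ratios):
        self.thresholds_db = thresholds_db
        self.ratios = ratios

    def apply(self, amps_db, flags):
        return [
            t + (a - t) / r if flagged else a
            for a, t, r, flagged in zip(amps_db, self.thresholds_db, self.ratios, flags)
        ]


class NonlinearEchoCanceller:
    """Device 50: offline calibration in the constructor, per-frame processing after."""

    def __init__(self, thd_curves, ratios):
        thresholds = PreprocessingModule().thresholds(thd_curves)
        self.determining = DeterminingModule(thresholds)
        self.compression = CompressionModule(thresholds, ratios)

    def process(self, amps_db):
        return self.compression.apply(amps_db, self.determining.flags(amps_db))


device = NonlinearEchoCanceller({-3: [0.2, 0.05], -30: [0.04, 0.09]}, [2.0, 2.0])
print(device.process([-10.0, -10.0]))  # [-20.0, -10.0]
```

Splitting calibration from per-frame work keeps the real-time path to one comparison and at most one arithmetic update per frequency point.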

For the specific workflow of the device embodiment shown in FIG. 5, refer to the related description in the foregoing method embodiments; details are not repeated here.

In summary, with the device embodiment, whether each frequency point in the voice signal to be processed meets the compression condition is determined individually, and the signal amplitude is compressed only at frequency points that meet the condition. This solves the problem that prior-art compression is too blunt, makes compression more selective and targeted, and keeps speech distortion small.

The scheme can be applied in the field of artificial intelligence, in particular intelligent speech, natural language processing, and deep learning.

Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it encompasses both hardware and software technologies. AI hardware technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, and big data processing; AI software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graphs.

According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.

FIG. 6 is a block diagram of an electronic device for implementing the method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the present application described and/or claimed herein.

As shown in FIG. 6, the electronic device includes one or more processors Y01, a memory Y02, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information for a graphical user interface on an external input/output device (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In FIG. 6, one processor Y01 is taken as an example.

Memory Y02 is the non-transitory computer readable storage medium provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the methods provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the methods provided herein.

As a non-transitory computer readable storage medium, memory Y02 can be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as the program instructions/modules corresponding to the methods of the embodiments of the present application. By running the non-transitory software programs, instructions, and modules stored in memory Y02, the processor Y01 performs the various functional applications and data processing of the server, i.e., implements the methods of the above method embodiments.

The memory Y02 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Additionally, the memory Y02 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory Y02 may optionally include memory located remotely from processor Y01, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, blockchain networks, local area networks, mobile communication networks, and combinations thereof.

The electronic device may further include: an input device Y03 and an output device Y04. The processor Y01, the memory Y02, the input device Y03 and the output device Y04 may be connected by a bus or in another manner, and the connection by the bus is exemplified in fig. 6.

The input device Y03, such as a touch screen, keypad, mouse, track pad, touch pad, pointer, one or more mouse buttons, track ball, joystick, or other input device, may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device. The output device Y04 may include a display device, an auxiliary lighting device, a tactile feedback device (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display, a light emitting diode display, and a plasma display. In some implementations, the display device may be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific integrated circuits, computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special- or general-purpose and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable logic devices) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a cathode ray tube or a liquid crystal display monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks, wide area networks, blockchain networks, and the internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and virtual private server (VPS) services.

It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved, and no limitation is imposed herein.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A non-linear echo cancellation method, comprising:

determining whether each frequency point in the voice signal to be processed meets the compression condition, comprising: for any frequency point, comparing the signal amplitude of the frequency point with a suppression threshold corresponding to the frequency point, and if the signal amplitude of the frequency point is greater than the suppression threshold corresponding to the frequency point, determining that the frequency point meets the compression condition; wherein the suppression threshold is obtained in advance;

if a frequency point is determined to meet the compression condition, compressing the signal amplitude of the frequency point;

further comprising: obtaining total harmonic distortion curves, at different loudness levels, of the user equipment corresponding to the voice signal to be processed, and determining the suppression threshold corresponding to each frequency point from the total harmonic distortion curves, which comprises: for any frequency point, obtaining the vertical-axis value of the frequency point on each total harmonic distortion curve, and taking the loudness corresponding to the minimum value as the suppression threshold corresponding to the frequency point.

2. The method of claim 1, wherein,

compressing the signal amplitude of a frequency point that is determined to meet the compression condition comprises:

compressing the signal amplitude of the frequency point according to the compression ratio corresponding to the frequency point.

3. The method of claim 2, further comprising:

if a frequency point is determined to meet the compression condition, compressing the signal amplitude of the frequency point according to the compression ratio and the compression mode corresponding to the frequency point.

4. A non-linear echo cancellation device, comprising:

a determining module, configured to determine whether each frequency point in the voice signal to be processed meets the compression condition, which comprises: for any frequency point, comparing the signal amplitude of the frequency point with a suppression threshold corresponding to the frequency point, and if the signal amplitude of the frequency point is greater than the suppression threshold corresponding to the frequency point, determining that the frequency point meets the compression condition; wherein the suppression threshold is obtained in advance;

a compression module, configured to compress the signal amplitude of a frequency point when the frequency point is determined to meet the compression condition;

further comprising: a preprocessing module, configured to obtain total harmonic distortion curves, at different loudness levels, of the user equipment corresponding to the voice signal to be processed, and to determine the suppression threshold corresponding to each frequency point from the total harmonic distortion curves, which comprises: for any frequency point, obtaining the vertical-axis value of the frequency point on each total harmonic distortion curve, and taking the loudness corresponding to the minimum value as the suppression threshold corresponding to the frequency point.

5. The apparatus of claim 4, wherein,

the compression module compresses the signal amplitude of a frequency point according to the compression ratio corresponding to the frequency point when the frequency point is determined to meet the compression condition.

6. The apparatus of claim 5, wherein,

the compression module is further configured to compress the signal amplitude of a frequency point according to the compression ratio and the compression mode corresponding to the frequency point when the frequency point is determined to meet the compression condition.

7. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-3.

8. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-3.