CN112055284A - Echo cancellation method, neural network training method, apparatus, medium, and device

Info

Publication number: CN112055284A
Application number: CN201910489616.2A
Authority: CN
Inventors: 胡玉祥
Original assignee: Beijing Horizon Robotics Technology Research and Development Co Ltd
Current assignee: Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority date: 2019-06-05
Filing date: 2019-06-05
Publication date: 2020-12-08
Anticipated expiration: 2039-06-05
Also published as: CN112055284B

Abstract

The embodiments of the present disclosure disclose an echo cancellation method, a neural network training method, an apparatus, a medium, and a device. The echo cancellation method comprises: obtaining an excitation voltage value input to a loudspeaker; processing the excitation voltage value by using a first neural network to obtain a sound pressure signal value; and performing echo cancellation on the loudspeaker based on the sound pressure signal value. When the excitation voltage value is used directly as the reference, the nonlinear component introduced by the nonlinear distortion of the loudspeaker system cannot be effectively eliminated. The sound pressure signal value in the embodiments of the present disclosure is the sound pressure signal after nonlinear distortion, estimated by a machine learning method, so it can serve as the reference signal without adding hardware, which effectively improves the ability to handle the nonlinear distortion of the loudspeaker. Because no extra hardware is added, signal acquisition saturation does not occur.

Description

Echo cancellation method, neural network training method, apparatus, medium, and device

Technical Field

The present disclosure relates to sound processing technologies, and in particular, to an echo cancellation method, a neural network training method, an apparatus, a medium, and a device.

Background

In audio equipment such as smart speakers, the voice signal collected by a microphone is often interfered with by the sound played by the local loudspeaker, and this echo interference directly degrades voice collection quality. For this reason, the prior art uses an echo cancellation algorithm to cancel the echo signal in the collected voice signal, usually with the excitation voltage signal of the loudspeaker as the reference signal. However, when the loudspeaker plays at high volume, the nonlinear characteristics of the loudspeaker system cause the played signal to differ considerably from the excitation voltage signal; if the excitation voltage signal is still used as the reference signal, effective echo cancellation cannot be achieved.

Disclosure of Invention

The present disclosure is proposed to solve the above technical problems. The embodiment of the disclosure provides an echo cancellation method and device, a storage medium and an electronic device.

According to an aspect of the embodiments of the present disclosure, there is provided an echo cancellation method, including:

obtaining an excitation voltage value input to a speaker;

processing the excitation voltage value by utilizing a first neural network to obtain a sound pressure signal value;

and carrying out echo cancellation on the loudspeaker based on the sound pressure signal value.

According to another aspect of the embodiments of the present disclosure, there is provided a training method of a neural network, including:

obtaining a sample excitation voltage set, the sample excitation voltage set comprising a plurality of sample excitation voltage values, each of the sample excitation voltage values having a corresponding true sound pressure signal value;

training the first neural network based on the sample excitation voltage set.

According to still another aspect of the embodiments of the present disclosure, there is provided an echo cancellation device including:

an excitation voltage determination module for obtaining an excitation voltage value input to the speaker;

the sound pressure signal determining module is used for processing the excitation voltage value by utilizing a first neural network to obtain a sound pressure signal value;

and the echo cancellation module is used for performing echo cancellation on the loudspeaker based on the sound pressure signal value determined by the sound pressure signal determination module.

According to still another aspect of the embodiments of the present disclosure, there is provided a training apparatus for a neural network, including:

a sample set obtaining module, configured to obtain a sample excitation voltage set, where the sample excitation voltage set includes a plurality of sample excitation voltage values, and each sample excitation voltage value has a corresponding true sound pressure signal value;

and the network training module is used for training the first neural network based on the sample excitation voltage set obtained by the sample set acquisition module.

According to still another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the echo cancellation method of the above embodiments or executing the training method of the neural network of the above embodiments.

According to still another aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:

a processor;

a memory for storing the processor-executable instructions;

the processor is configured to read the executable instructions from the memory and execute the instructions to implement the echo cancellation method according to the foregoing embodiment or implement the training method of the neural network according to the foregoing embodiment.

Based on the echo cancellation method, the neural network training method, the apparatus, the medium, and the device provided by the embodiments of the present disclosure, the excitation voltage value input to the loudspeaker is obtained; the excitation voltage value is processed by a first neural network to obtain a sound pressure signal value; and echo cancellation is performed on the loudspeaker based on the sound pressure signal value. When the excitation voltage value is used directly as the reference, the nonlinear component introduced by the nonlinear distortion of the loudspeaker system cannot be effectively eliminated. The sound pressure signal value in the embodiments of the present disclosure is the sound pressure signal after nonlinear distortion, estimated by a machine learning method, so using it as the reference signal requires no additional hardware and effectively improves the ability to handle the nonlinear distortion of the loudspeaker. Because no extra hardware is added, signal acquisition saturation does not occur.

The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.

Drawings

The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.

Fig. 1 is a schematic structural diagram of an echo cancellation system to which the present disclosure is applicable.

Fig. 2 is a flowchart illustrating an echo cancellation method according to an exemplary embodiment of the disclosure.

Fig. 3 is a flowchart illustrating a training method of a neural network according to an exemplary embodiment of the present disclosure.

Fig. 4 is a schematic flow chart of step 302 in the embodiment shown in fig. 3 of the present disclosure.

Fig. 5 is a schematic flow chart of step 3022 in the embodiment shown in fig. 4 of the present disclosure.

Fig. 6 is a schematic flow chart of step 301 in the embodiment shown in fig. 3 of the present disclosure.

Fig. 7 is a schematic structural diagram of an echo cancellation device according to an exemplary embodiment of the present disclosure.

Fig. 8 is a schematic structural diagram of a training apparatus for a neural network according to an exemplary embodiment of the present disclosure.

Fig. 9 is a schematic structural diagram of a training apparatus for a neural network according to another exemplary embodiment of the present disclosure.

Fig. 10 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.

Detailed Description

Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.

It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.

It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and imply neither any particular technical meaning nor any necessary logical order between them.

It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.

It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.

In addition, the term "and/or" in the present disclosure merely describes an association relationship between associated objects and means that three relationships may exist. For example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.

It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.

Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

The disclosed embodiments may be applied to electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as terminal devices, computer systems, servers, and the like, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set top boxes, programmable consumer electronics, network pcs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.

Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.

Summary of the application

In the course of implementing the present disclosure, the inventor found that an existing echo cancellation scheme installs a microphone near the loudspeaker to directly acquire a sound pressure signal containing nonlinear distortion, and uses that signal as the reference signal for echo cancellation. This scheme has at least the following problems: it requires an additional microphone, increases hardware cost, and causes distortion of the desired speech.

Exemplary System

Fig. 1 is a schematic structural diagram of an echo cancellation system to which the present disclosure is applicable. As shown in Fig. 1, x_mic is the signal picked up by the microphone, u_e is the excitation voltage signal of the loudspeaker, p_est is the estimated sound pressure signal after nonlinear distortion, and y is the output signal after echo cancellation. AEC (Acoustic Echo Cancellation) performs echo cancellation on the signal x_mic based on the sound pressure signal p_est to obtain the output signal y.

The nonlinear loudspeaker system model in the echo cancellation system of Fig. 1 is obtained by machine learning. Optionally, the model may be a deep neural network; this embodiment does not limit its specific structure. The input signal of the model is the excitation voltage u_e(n) of the loudspeaker, and the output signal is the predicted sound pressure signal p_est(n) emitted by the loudspeaker. In the training phase, the model is trained by comparing the predicted sound pressure signal p_est(n) with the measured sound pressure signal p_mea(n), which is collected synchronously with the excitation voltage, so that a more accurate model is obtained after training. The measured sound pressure signal p_mea(n) can be obtained by actual measurement with a microphone. Once modeling (training) is complete, the microphone is no longer needed, and the output sound pressure signal can be predicted from the excitation voltage signal. The accuracy of the nonlinear loudspeaker system model may be evaluated with various indexes (e.g., the loss of the neural network); for example, the loss may be calculated by the following formula (1) or (2):

wherein p_mea(n), p_est(n), and p_mean respectively represent the measured sound pressure signal, the sound pressure signal predicted by the nonlinear loudspeaker system model, and the mean value of the measured sound pressure signal; n is the time index; N is the total length of the observation signal; and the observation signal consists of the excitation voltage signal and the sound pressure signal acquired synchronously over a period of time.
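As a rough illustration of how a loss could be computed from these symbols, the sketch below implements two common choices: a mean squared error over the N samples, and a squared error normalized by the spread of the measured signal around p_mean. Both forms, and the function names `mse_loss` and `normalized_loss`, are assumptions chosen to match the symbols defined above; they are not necessarily formulas (1) and (2) themselves.

```python
def mse_loss(p_mea, p_est):
    """Mean squared error between measured and predicted pressure (assumed form)."""
    if len(p_mea) != len(p_est):
        raise ValueError("signals must have the same length N")
    total = sum((m - e) ** 2 for m, e in zip(p_mea, p_est))
    return total / len(p_mea)


def normalized_loss(p_mea, p_est):
    """Squared error divided by the energy of p_mea around its mean (assumed form)."""
    if len(p_mea) != len(p_est):
        raise ValueError("signals must have the same length N")
    p_mean = sum(p_mea) / len(p_mea)
    error_energy = sum((m - e) ** 2 for m, e in zip(p_mea, p_est))
    signal_energy = sum((m - p_mean) ** 2 for m in p_mea)
    return error_energy / signal_energy
```

Under the normalized form, a model that always predicts p_mean scores 1 and a perfect model scores 0, which makes values comparable across recordings of different loudness.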

When the nonlinear loudspeaker system is modeled with a machine learning method, a Deep Neural Network (DNN), a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), or a Long Short-Term Memory network (LSTM) may be used. Taking a DNN as an example, the input layer receives the excitation voltage signal of the preceding 1 second, the two hidden layers have 256 and 64 neurons respectively, the output layer is the predicted sound pressure signal, and the error is the difference between the predicted sound pressure signal and the sound pressure signal actually measured by the microphone. Predicting the sound pressure signal at the current time with a DNN or CNN requires a long segment of past excitation voltage as input; RNN and LSTM structures can be used to reduce the amount of data in the input layer.
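The DNN example above can be sketched as a plain forward pass. Only the hidden-layer widths (256 and 64) come from the text; the tanh activation, the random initialization, the sampling rate implied by any particular input length, and the function names are assumptions made for illustration.

```python
import math
import random

HIDDEN_SIZES = (256, 64)  # hidden-layer widths given in the text


def init_dnn(n_inputs, seed=0):
    """Create weights for n_inputs -> 256 -> 64 -> 1 (initialization is an assumption)."""
    rng = random.Random(seed)
    sizes = (n_inputs, *HIDDEN_SIZES, 1)
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(fan_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers


def predict_pressure(layers, u_window):
    """Map a window of past excitation voltage samples to one pressure sample."""
    activations = list(u_window)
    for index, (weights, biases) in enumerate(layers):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        # tanh on hidden layers is an assumed activation; the output is linear.
        activations = pre if is_output else [math.tanh(v) for v in pre]
    return activations[0]


def count_parameters(n_inputs):
    """Number of weights and biases for a given input window length."""
    sizes = (n_inputs, *HIDDEN_SIZES, 1)
    return sum((fan_in + 1) * fan_out
               for fan_in, fan_out in zip(sizes[:-1], sizes[1:]))
```

If the 1-second window were sampled at 16 kHz (an assumed rate), `count_parameters(16000)` gives 4112769, almost all of it in the first layer. This is the input-layer cost that the RNN and LSTM alternatives are meant to reduce.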

The sound signals played by the loudspeaker during modeling include white noise, voice signals, music, frequency sweep signals, single-frequency signals, and the like. When the amount of collected data is large enough, a sufficiently accurate nonlinear loudspeaker system model can be obtained, and the sound pressure signal p_est after nonlinear distortion can be accurately estimated from the excitation voltage signal u_e across the loudspeaker terminals. Loudspeaker units from the same batch have good unit-to-unit consistency, so the excitation voltage and sound pressure signals of several loudspeakers can be collected synchronously as training data to obtain a universal nonlinear model.

Exemplary method

Fig. 2 is a flowchart illustrating an echo cancellation method according to an exemplary embodiment of the disclosure. The embodiment can be applied to an electronic device, as shown in fig. 2, and includes the following steps:

Step 201, obtaining an excitation voltage value input to a speaker.

The excitation voltage value is the voltage value input to the loudspeaker, and can be obtained at the loudspeaker input by a voltmeter or by monitoring equipment.

Step 202, processing the excitation voltage value by using a first neural network to obtain a sound pressure signal value.

In one embodiment, the sound pressure signal value may be a non-linearly distorted sound pressure signal predicted by the first neural network.

Step 203, performing echo cancellation on the loudspeaker based on the sound pressure signal value.

For example, unlike the prior art, in which the excitation voltage value is used as the reference signal, the present embodiment uses the sound pressure signal value as the reference signal to implement echo cancellation for the speaker.

The echo cancellation method provided by the present disclosure obtains an excitation voltage value input to a speaker, processes the excitation voltage value by using a first neural network to obtain a sound pressure signal value, and cancels the echo output by the loudspeaker based on the sound pressure signal value. When the excitation voltage value is used directly as the reference, the nonlinear component introduced by the nonlinear distortion of the loudspeaker system cannot be effectively eliminated. The sound pressure signal value in the embodiment of the present disclosure is the sound pressure signal after nonlinear distortion, estimated by a machine learning method, so it can serve as the reference signal without adding hardware, which effectively improves the ability to handle the nonlinear distortion of the loudspeaker. Because no extra hardware is added, signal acquisition saturation does not occur.

Optionally, step 203 comprises: taking the sound pressure signal value as a reference signal, and eliminating the echo output by the loudspeaker based on the reference signal.

Optionally, the first neural network used to obtain the sound pressure signal value may be the nonlinear loudspeaker system model shown in Fig. 1. The sound pressure signal value obtained through this model contains the nonlinear distortion, and the present embodiment uses this sound pressure signal as the reference signal for echo cancellation. Optionally, the AEC in Fig. 1 performs echo cancellation on the loudspeaker with this reference signal, which enhances the ability of the echo cancellation system to suppress nonlinear distortion.
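The disclosure does not fix the adaptive algorithm used inside the AEC block. A minimal sketch of steps 201 to 203, assuming a normalized least-mean-squares (NLMS) filter as the AEC, is given below. The simulated loudspeaker nonlinearity `tanh(2 * u)`, the three-tap echo path, and all function names are hypothetical, and the first neural network is replaced by a stand-in that is assumed to predict the loudspeaker output exactly.

```python
import math
import random


def nlms_echo_cancel(x_mic, reference, taps=8, mu=0.5, eps=1e-8):
    """Return y: x_mic minus the echo estimated from `reference` by NLMS."""
    w = [0.0] * taps
    buf = [0.0] * taps  # most recent reference samples, newest first
    y = []
    for mic_sample, ref_sample in zip(x_mic, reference):
        buf = [ref_sample] + buf[:-1]
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))
        err = mic_sample - echo_est
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * err * bi / norm for wi, bi in zip(w, buf)]
        y.append(err)
    return y


def simulate(n=4000, seed=0):
    """Simulated signals: excitation u_e, predicted pressure p_est, microphone x_mic."""
    rng = random.Random(seed)
    u_e = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # Stand-in for the first neural network (assumed to match the loudspeaker).
    p_est = [math.tanh(2.0 * u) for u in u_e]
    h = [0.6, 0.3, 0.1]  # hypothetical room echo path
    x_mic = [sum(h[k] * p_est[i - k] for k in range(len(h)) if i - k >= 0)
             for i in range(n)]
    return u_e, p_est, x_mic


def tail_power(signal):
    """Average power over the last quarter of the signal, after adaptation."""
    tail = signal[len(signal) * 3 // 4:]
    return sum(s * s for s in tail) / len(tail)


if __name__ == "__main__":
    u_e, p_est, x_mic = simulate()
    print("residual, voltage reference: ", tail_power(nlms_echo_cancel(x_mic, u_e)))
    print("residual, pressure reference:", tail_power(nlms_echo_cancel(x_mic, p_est)))
```

In this simulation the linear filter can match the echo only when the reference already contains the loudspeaker distortion, so the residual with `p_est` as reference is far smaller than with `u_e`.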

Fig. 3 is a flowchart illustrating a training method of a neural network according to an exemplary embodiment of the present disclosure. The embodiment can be applied to an electronic device, as shown in fig. 3, and includes the following steps:

Step 301, obtaining a sample excitation voltage set.

The sample excitation voltage set comprises a plurality of sample excitation voltage values, and each sample excitation voltage value has a corresponding true sound pressure signal value.

Step 302, a first neural network is trained based on a sample excitation voltage set.

According to the training method of the neural network, the first neural network is trained with the sample excitation voltage set so that it can output a sound pressure signal value based on an excitation voltage value, which improves the efficiency of echo cancellation based on the first neural network. During training, when the amount of collected data (the number of sample excitation voltage values in the sample excitation voltage set) is large enough, a sufficiently accurate nonlinear loudspeaker system model (the first neural network in this embodiment) can be obtained. The sound pressure signal value after nonlinear distortion can then be accurately estimated from the excitation voltage value across the loudspeaker terminals, which provides a better reference signal for the echo cancellation method.

As shown in fig. 4, based on the embodiment shown in fig. 3, step 302 may include the following steps:

Step 3021, processing each sample excitation voltage value in the sample excitation voltage set by using the first neural network, and obtaining a plurality of predicted sound pressure signal values.

Step 3022, adjusting the network parameters of the first neural network based on the predicted sound pressure signal values and the real sound pressure signal values.

Optionally, the first neural network provided in this embodiment may be the nonlinear loudspeaker system model of Fig. 1 obtained by machine learning, and its training may follow the training process described for that model. The sample excitation voltage value is taken as the input signal and the output signal is the predicted sound pressure signal; meanwhile, the sound pressure signal output by the loudspeaker is actually measured (for example, with an added microphone). The accuracy of the nonlinear loudspeaker system model is judged from the difference between the predicted and the measured sound pressure signals, and the network parameters of the first neural network are adjusted according to that difference. This improves the accuracy of the sound pressure signal predicted by the first neural network and, in turn, the echo cancellation achieved with it.

As shown in fig. 5, based on the embodiment shown in fig. 4, the step 3022 may include the following steps:

Step 501, determining a network loss based on a plurality of predicted sound pressure signal values and a plurality of real sound pressure signal values corresponding to the sample excitation voltage set.

In an alternative embodiment, step 501 comprises: obtaining a signal difference corresponding to each sample excitation voltage value based on a predicted sound pressure signal value corresponding to each sample excitation voltage value and a real sound pressure signal value corresponding to the predicted sound pressure signal value; the network loss is obtained based on a plurality of signal differences corresponding to a plurality of sample excitation voltage values in the sample excitation voltage set.

Optionally, in this embodiment, the network loss may be determined with reference to formula (1) or formula (2) above; this embodiment does not limit the specific method of obtaining the network loss. In this embodiment, the network loss is determined from the signal difference between each predicted sound pressure signal value and its corresponding real sound pressure signal value. Through iterative training, the difference between the sound pressure signal value predicted by the first neural network and the real sound pressure signal value is gradually reduced, which improves the prediction performance of the first neural network.

Step 502, adjusting network parameters of the first neural network based on the network loss.

Optionally, the network parameters of the first neural network may be adjusted based on the network loss by gradient backpropagation, so that every network layer in the first neural network is trained. This improves the performance of the first neural network, i.e., a more accurate predicted sound pressure signal value is obtained for an input excitation voltage value.
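Steps 3021, 501, and 502 can be sketched as a small training loop. The example below is an illustration under stated assumptions rather than the disclosed implementation: it uses a mean squared error as the network loss, a single hidden layer of 8 tanh units, full-batch gradient descent, one voltage sample per input, and synthetic "measured" data generated by a hypothetical nonlinearity `tanh(2 * u)`.

```python
import math
import random


def make_synthetic_samples(count=100, seed=0):
    """Hypothetical stand-in for measured pairs (u, p_true)."""
    rng = random.Random(seed)
    voltages = [rng.uniform(-1.0, 1.0) for _ in range(count)]
    return [(u, math.tanh(2.0 * u)) for u in voltages]


def train_first_network(samples, hidden=8, epochs=200, lr=0.05, seed=1):
    """Train a 1 -> hidden -> 1 network; returns (parameters, loss history)."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    b2 = 0.0
    n = len(samples)
    history = []
    for _ in range(epochs):
        g_w1, g_b1, g_w2, g_b2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        loss = 0.0
        for u, p_true in samples:
            # Step 3021: predict the sound pressure for this sample.
            h = [math.tanh(w1[j] * u + b1[j]) for j in range(hidden)]
            p_pred = sum(w2[j] * h[j] for j in range(hidden)) + b2
            # Step 501: accumulate the loss from the signal difference.
            diff = p_pred - p_true
            loss += diff * diff / n
            # Backpropagate the gradient of the mean squared error.
            g_out = 2.0 * diff / n
            g_b2 += g_out
            for j in range(hidden):
                g_w2[j] += g_out * h[j]
                g_pre = g_out * w2[j] * (1.0 - h[j] * h[j])
                g_w1[j] += g_pre * u
                g_b1[j] += g_pre
        history.append(loss)
        # Step 502: adjust the network parameters based on the network loss.
        for j in range(hidden):
            w1[j] -= lr * g_w1[j]
            b1[j] -= lr * g_b1[j]
            w2[j] -= lr * g_w2[j]
        b2 -= lr * g_b2
    return (w1, b1, w2, b2), history


if __name__ == "__main__":
    _, history = train_first_network(make_synthetic_samples())
    print("loss before training:", history[0], "after:", history[-1])
```

Each entry of `history` is the loss measured before that epoch's update, so comparing its first and last entries shows how far iterative training reduced the prediction error.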

As shown in fig. 6, based on the embodiment shown in fig. 3, step 301 may include the following steps:

Step 3011, collecting excitation voltage values input to the speaker in a set time period, and obtaining a plurality of excitation voltage values as a sample excitation voltage set.

Step 3012, collecting sound pressure signal values output by the speaker in a set time period, obtaining a plurality of sound pressure signal values corresponding to a plurality of excitation voltage values, and using the sound pressure signal values as real sound pressure signal values of the excitation voltage values.

In this embodiment, the signal is observed over a period of time: the excitation voltage values and sound pressure signal values of the set time period are acquired synchronously to form the sample excitation voltage set. For example, when formula (1) or (2) is used to obtain the network loss, N is the total length of the observed signal, which in this embodiment corresponds to the set time period, and n is the time index. The network loss is computed over all excitation voltage and sound pressure signals acquired within the set time period to train the first neural network, which improves the accuracy with which the first neural network processes a continuous excitation voltage signal.
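Steps 3011 and 3012 amount to pairing two synchronously acquired sequences. A minimal sketch follows; the function name `build_sample_set` and the sliding-window pairing (each window of past voltage samples paired with the pressure sample at the window's last time index) are assumptions made for illustration, motivated by the DNN example whose input is a segment of past excitation voltage.

```python
def build_sample_set(u_e, p_mea, window):
    """Pair windows of excitation voltage with the synchronously measured pressure.

    u_e and p_mea must have the same length N; the result has N - window + 1
    entries of the form (u_e[n - window + 1 .. n], p_mea[n]).
    """
    if len(u_e) != len(p_mea):
        raise ValueError("voltage and pressure must be acquired synchronously")
    if window < 1 or window > len(u_e):
        raise ValueError("window must be between 1 and the signal length")
    samples = []
    for n in range(window - 1, len(u_e)):
        u_window = list(u_e[n - window + 1:n + 1])
        samples.append((u_window, p_mea[n]))
    return samples
```

For example, `build_sample_set([0, 1, 2, 3, 4], [10, 11, 12, 13, 14], 3)` yields three pairs, the first being `([0, 1, 2], 12)`. With `window=1` each excitation voltage value is paired directly with its own real sound pressure signal value.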

Any of the echo cancellation methods or the training method of the neural network provided by the embodiments of the present disclosure may be performed by any suitable device having data processing capability, including but not limited to: terminal equipment, a server and the like. Alternatively, any one of the echo cancellation methods or the training method of the neural network provided by the embodiments of the present disclosure may be executed by a processor, for example, the processor may execute any one of the echo cancellation methods or the training method of the neural network mentioned in the embodiments of the present disclosure by calling a corresponding instruction stored in a memory. Details are not repeated below.

Exemplary devices

Fig. 7 is a schematic structural diagram of an echo cancellation device according to an exemplary embodiment of the present disclosure. The device provided by the embodiment comprises:

An excitation voltage determining module 71, configured to obtain an excitation voltage value input to the speaker.

A sound pressure signal determining module 72, configured to process the excitation voltage value obtained by the excitation voltage determining module 71 by using a first neural network to obtain a sound pressure signal value.

An echo cancellation module 73, configured to perform echo cancellation on the speaker based on the sound pressure signal value obtained by the sound pressure signal determining module 72.

The echo cancellation device provided by the present disclosure obtains an excitation voltage value input to a speaker, processes the excitation voltage value by using a first neural network to obtain a sound pressure signal value, and cancels the echo output by the speaker based on the sound pressure signal value. The sound pressure signal after nonlinear distortion is obtained by a machine learning method without adding hardware, and using the sound pressure signal value as the reference signal effectively improves the ability to handle the nonlinear distortion of the loudspeaker. Because no extra hardware is added, signal acquisition saturation does not occur.

Optionally, the echo cancellation module 73 is specifically configured to cancel the echo output by the speaker based on the reference signal, with the sound pressure signal value as the reference signal.

Fig. 8 is a schematic structural diagram of a training apparatus for a neural network according to an exemplary embodiment of the present disclosure. The device provided by the embodiment comprises:

A sample set obtaining module 81, configured to obtain a sample excitation voltage set.

A network training module 82, configured to train the first neural network based on the sample excitation voltage set obtained by the sample set obtaining module 81.

The training device of the neural network provided by the present disclosure trains the first neural network with the sample excitation voltage set so that it can output a sound pressure signal value based on an excitation voltage value, which improves the efficiency of echo cancellation based on the first neural network. During training, when the amount of collected data (the number of sample excitation voltage values in the sample excitation voltage set) is large enough, a sufficiently accurate nonlinear loudspeaker system model (the first neural network in this embodiment) can be obtained. The sound pressure signal value after nonlinear distortion can then be accurately estimated from the excitation voltage value across the loudspeaker terminals, which provides a better reference signal for the echo cancellation method.

Fig. 9 is a schematic structural diagram of a training apparatus for a neural network according to another exemplary embodiment of the present disclosure. The device provided by the embodiment comprises:

in this embodiment, the sample set acquisition module 81 includes:

An input voltage acquisition unit 811, configured to collect the excitation voltage values input to the speaker in a set time period, and obtain a plurality of excitation voltage values as the sample excitation voltage set.

An output voltage acquisition unit 812, configured to collect the sound pressure signal values output by the speaker in the set time period, obtain a plurality of sound pressure signal values corresponding to the plurality of excitation voltage values, and use each sound pressure signal value as the true sound pressure signal value of the corresponding excitation voltage value.
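The pairing of excitation voltage values with true sound pressure signal values can be sketched as follows. This is an illustration only: it assumes that both recordings cover the same set time period at the same sampling rate, so that values at the same index correspond. The names `Sample` and `build_sample_set` and the numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Sample:
    excitation_voltage: float   # value collected at the speaker input
    true_sound_pressure: float  # value collected at the speaker output


def build_sample_set(voltages: Sequence[float],
                     pressures: Sequence[float]) -> List[Sample]:
    """Pair each excitation voltage value with its true sound pressure value."""
    if len(voltages) != len(pressures):
        raise ValueError("voltage and sound pressure recordings must have equal length")
    return [Sample(v, p) for v, p in zip(voltages, pressures)]


# Made-up values for illustration only.
voltages = [0.0, 0.5, 1.0]
pressures = [0.0, 0.46, 0.76]
sample_set = build_sample_set(voltages, pressures)
```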

In this embodiment, the network training module 82 includes:

A signal prediction unit 821, configured to process each sample excitation voltage value in the sample excitation voltage set by using the first neural network, respectively, to obtain a plurality of predicted sound pressure signal values.

A parameter adjusting unit 822, configured to adjust a network parameter of the first neural network based on the predicted sound pressure signal value and the true sound pressure signal value.
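The cooperation of the signal prediction unit 821 and the parameter adjusting unit 822 can be sketched as a small training loop. The present disclosure does not specify the network structure or the optimizer, so the one-hidden-layer tanh network, the full-batch gradient descent, the learning rate, and the toy soft-clipping speaker `tanh(2x)` used to generate data are all assumptions, and the function names are hypothetical.

```python
import math


def predict(params, x):
    """Tiny stand-in for the first neural network: one tanh hidden layer."""
    w, b, v, c = params
    return sum(vj * math.tanh(wj * x + bj) for wj, bj, vj in zip(w, b, v)) + c


def mean_squared_error(params, voltages, pressures):
    """Mean squared difference between predicted and true sound pressure."""
    total = sum((predict(params, x) - y) ** 2 for x, y in zip(voltages, pressures))
    return total / len(voltages)


def train_step(params, voltages, pressures, lr=0.1):
    """One full-batch gradient-descent update of all network parameters."""
    w, b, v, c = params
    n = len(voltages)
    gw, gb, gv, gc = [0.0] * len(w), [0.0] * len(b), [0.0] * len(v), 0.0
    for x, y in zip(voltages, pressures):
        hidden = [math.tanh(wj * x + bj) for wj, bj in zip(w, b)]
        err = sum(vj * hj for vj, hj in zip(v, hidden)) + c - y
        scale = 2.0 * err / n  # derivative of the mean squared error
        gc += scale
        for j, hj in enumerate(hidden):
            gv[j] += scale * hj
            back = scale * v[j] * (1.0 - hj * hj)  # back through tanh
            gw[j] += back * x
            gb[j] += back
    return ([wj - lr * g for wj, g in zip(w, gw)],
            [bj - lr * g for bj, g in zip(b, gb)],
            [vj - lr * g for vj, g in zip(v, gv)],
            c - lr * gc)


# Toy data (assumed): a soft-clipping speaker maps voltage x to tanh(2x).
voltages = [i / 20.0 - 1.0 for i in range(41)]
pressures = [math.tanh(2.0 * x) for x in voltages]
params = ([0.5, -0.5, 1.0, -1.0], [0.0] * 4, [0.1, -0.1, 0.1, -0.1], 0.0)

initial_loss = mean_squared_error(params, voltages, pressures)
for _ in range(300):
    params = train_step(params, voltages, pressures)
final_loss = mean_squared_error(params, voltages, pressures)
```

Comparing `final_loss` with `initial_loss` shows whether the parameter adjustment moved the predicted sound pressure signal values toward the true ones on this toy data.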

Optionally, the parameter adjusting unit 822 is specifically configured to determine a network loss based on the plurality of predicted sound pressure signal values and the plurality of true sound pressure signal values corresponding to the sample excitation voltage set, and to adjust the network parameters of the first neural network based on the network loss.

Optionally, when determining the network loss, the parameter adjusting unit 822 is configured to obtain a signal difference for each sample excitation voltage value from the predicted sound pressure signal value corresponding to that sample excitation voltage value and the true sound pressure signal value corresponding to that predicted sound pressure signal value, and to obtain the network loss from the plurality of signal differences corresponding to the plurality of sample excitation voltage values in the sample excitation voltage set.
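One possible way to turn the per-sample signal differences into a single network loss is the mean of the squared differences. The present disclosure only states that the network loss is obtained from the plurality of signal differences, so the choice of a mean squared difference is an assumption, and the names `signal_differences` and `network_loss` are hypothetical.

```python
def signal_differences(predicted, true):
    """Signal difference for each sample: predicted minus true sound pressure."""
    if len(predicted) != len(true):
        raise ValueError("predicted and true value lists must have equal length")
    return [p - t for p, t in zip(predicted, true)]


def network_loss(differences):
    """Aggregate the signal differences into one loss (mean squared difference)."""
    return sum(d * d for d in differences) / len(differences)


# Made-up values for illustration only.
predicted = [1.0, 2.0, 4.0]
true = [1.0, 1.0, 2.0]
differences = signal_differences(predicted, true)  # [0.0, 1.0, 2.0]
loss = network_loss(differences)                   # (0 + 1 + 4) / 3
```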

Exemplary electronic device

Next, an electronic apparatus according to an embodiment of the present disclosure is described with reference to fig. 10. The electronic device may be either or both of the first device 100 and the second device 200, or a stand-alone device separate from them that may communicate with the first device and the second device to receive the collected input signals therefrom.

FIG. 10 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.

As shown in fig. 10, the electronic device 10 includes one or more processors 101 and memory 102.

The processor 101 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.

Memory 102 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 101 to implement the echo cancellation methods or the neural network training methods of the various embodiments of the present disclosure described above, and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.

In one example, the electronic device 10 may further include: an input device 103 and an output device 104, which are interconnected by a bus system and/or other form of connection mechanism (not shown).

For example, when the electronic device is the first device 100 or the second device 200, the input device 103 may be a microphone or a microphone array as described above for capturing an input signal of a sound source. When the electronic device is a stand-alone device, the input device 103 may be a communication network connector for receiving the acquired input signals from the first device 100 and the second device 200.

The input device 103 may also include, for example, a keyboard, a mouse, and the like.

The output device 104 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 104 may include, for example, a display, speakers, a printer, and a communication network and its connected remote output devices, among others.

Of course, for simplicity, only some of the components of the electronic device 10 relevant to the present disclosure are shown in fig. 10, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device 10 may include any other suitable components depending on the particular application.

Exemplary computer program product and computer-readable storage Medium

In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the echo cancellation method or the training method of a neural network according to various embodiments of the present disclosure described in the "exemplary methods" section of this specification above.

Program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.

Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the echo cancellation method or the training method of the neural network according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.

The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments. It should be noted, however, that the advantages, effects, and the like mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the specific details disclosed above are provided for the purpose of illustration and ease of understanding only, and the disclosure is not limited to those specific details.

In the present specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the system embodiment basically corresponds to the method embodiment, its description is relatively brief, and for the relevant points reference may be made to the corresponding description of the method embodiment.

The block diagrams of devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the term "and/or," unless the context clearly dictates otherwise. The phrase "such as" as used herein means, and is used interchangeably with, the phrase "such as but not limited to".

The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.

The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims

1. An echo cancellation method, comprising:

obtaining an excitation voltage value input to a speaker;

2. The method of claim 1, wherein the echo cancellation of the loudspeaker based on the sound pressure signal value comprises:

and taking the sound pressure signal value as a reference signal, and eliminating the echo output by the loudspeaker based on the reference signal.

3. A method of training a neural network, comprising:

training the first neural network based on the sample excitation voltage set.

4. The training method of claim 3, wherein the training the first neural network based on the sample excitation voltage set comprises:

processing each sample excitation voltage value in the sample excitation voltage set by using the first neural network respectively to obtain a plurality of predicted sound pressure signal values;

adjusting a network parameter of the first neural network based on the predicted sound pressure signal value and the true sound pressure signal value.

5. The training method of claim 4, wherein the adjusting the network parameters of the first neural network based on the predicted sound pressure signal values and the true sound pressure signal values comprises:

determining a network loss based on a plurality of the predicted sound pressure signal values and a plurality of the true sound pressure signal values corresponding to the sample excitation voltage set;

adjusting a network parameter of the first neural network based on the network loss.

6. The training method of claim 5, wherein said determining a network loss based on a plurality of said predicted sound pressure signal values and a plurality of said true sound pressure signal values for said sample excitation voltage set comprises:

obtaining a signal difference corresponding to each sample excitation voltage value based on the predicted sound pressure signal value corresponding to each sample excitation voltage value and the true sound pressure signal value corresponding to the predicted sound pressure signal value;

and obtaining the network loss based on a plurality of signal differences corresponding to a plurality of sample excitation voltage values in the sample excitation voltage set.

7. The training method of any one of claims 3-6, wherein the obtaining a sample excitation voltage set comprises:

collecting excitation voltage values input into the loudspeaker in a set time period, and obtaining a plurality of excitation voltage values as the sample excitation voltage set;

and collecting sound pressure signal values output by the loudspeaker in the set time period, obtaining a plurality of sound pressure signal values corresponding to the plurality of excitation voltage values, and taking the sound pressure signal values as real sound pressure signal values of the excitation voltage values.

8. An echo cancellation device, comprising:

the sound pressure signal determining module is used for processing the excitation voltage value obtained by the excitation voltage determining module by utilizing a first neural network to obtain a sound pressure signal value;

and the echo cancellation module is used for carrying out echo cancellation on the loudspeaker based on the sound pressure signal value obtained by the sound pressure signal determination module.

9. An apparatus for training a neural network, comprising:

10. A computer-readable storage medium storing a computer program for executing the echo cancellation method according to claim 1 or 2 or the training method for a neural network according to any one of claims 3 to 7.

11. An electronic device, the electronic device comprising:

a processor;

a memory for storing the processor-executable instructions;

the processor is configured to read the executable instructions from the memory and execute the instructions to implement the echo cancellation method of claim 1 or 2, or to implement the training method of the neural network of any one of claims 3 to 7.