CN113113035A

CN113113035A - Audio signal processing method, device and system and electronic equipment

Info

Publication number: CN113113035A
Application number: CN202010026449.0A
Authority: CN
Inventors: 纳跃跃; 刘章; 李韵; 王子腾; 田彪; 付强; 杨智慧; 马骁
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2020-01-10
Filing date: 2020-01-10
Publication date: 2021-07-13

Abstract

The application provides an audio signal processing method, which comprises the following steps: obtaining a first audio signal, wherein the first audio signal comprises a linear echo audio signal, a nonlinear echo audio signal, a useful audio signal and a noise audio signal; obtaining an audio separation model for separating the linear echo audio signal from the first audio signal; and eliminating the linear echo audio signal in the first audio signal by utilizing the audio separation model to obtain a second audio signal comprising the nonlinear echo audio signal, the useful audio signal and the noise audio signal. The audio signal processing method provided by the application can eliminate the linear echo audio signal in the audio signal collected by the equipment by using the audio separation model, and obtain the audio signal comprising the nonlinear echo audio signal, the useful audio signal and the noise audio signal, so that the elimination efficiency of the linear echo audio signal in the audio signal collected by the equipment is improved.

Description

Audio signal processing method, device and system and electronic equipment

Technical Field

The application relates to the field of audio signal processing, and particularly provides an audio signal processing method; the application also provides an audio signal processing device, an audio signal processing system, an electronic device and a storage medium.

Background

With the development of computer technology, internet technology and related technologies, most smart devices used in daily life are becoming smaller and more wearable. For such miniaturized, wearable devices, interaction solely through a keyboard, a mouse, a remote controller and the like can no longer meet users' human-computer interaction needs. Because speech is a convenient medium in person-to-person interaction, human-computer voice interaction has begun to play a prominent role in human-computer interaction.

In the human-computer voice interaction process, the intelligent device needs to collect audio signals related to user instructions and then gives corresponding feedback according to those instructions, so that human-computer voice interaction is achieved. However, in an actual human-computer interaction scenario, the audio signal collected by the smart device includes not only the audio signal related to the user instruction but also other audio signals, such as a linear echo audio signal, a nonlinear echo audio signal, a noise audio signal and the like, and these other audio signals can interfere with human-computer voice interaction.

In order to ensure effective human-computer voice interaction, the prior art generally adopts an NLMS (Normalized Least Mean Square) adaptive filter method to eliminate the linear echo audio signal in the audio signal collected by a device. The principle of the NLMS method is as follows: an adaptive filter is adjusted to simulate the linear echo path so that the simulated path approximates the actual linear echo path, thereby obtaining a prediction signal of the linear echo; the prediction signal is then subtracted from the audio signal collected by the device, which eliminates the linear echo in that signal. However, when a useful signal is present in the collected audio signal, the adaptive filter in the existing method either stops updating completely or updates only slowly. In either case, the linear echo audio signal in the collected audio signal cannot be effectively eliminated, so the existing method eliminates the linear echo audio signal with poor efficiency.
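As an illustration of the prior-art principle described above, the following is a minimal Python sketch of an NLMS echo canceller. The function name, the step size `mu`, the tap count and the synthetic echo path are illustrative assumptions and are not taken from the application.

```python
import random

def nlms_echo_cancel(mic, ref, taps=4, mu=0.5, eps=1e-8):
    """Return (residual, w): the echo-reduced signal and the learned filter."""
    w = [0.0] * taps                      # adaptive filter simulating the echo path
    residual = []
    for n in range(len(mic)):
        # Latest `taps` reference samples, newest first, zero-padded at the start.
        window = [ref[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        echo_hat = sum(wk * xk for wk, xk in zip(w, window))  # predicted linear echo
        e = mic[n] - echo_hat                                 # subtract the prediction
        norm = sum(xk * xk for xk in window) + eps
        # Normalized LMS update: the step is scaled by the reference energy.
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, window)]
        residual.append(e)
    return residual, w

# Synthetic demo (assumption): the microphone hears only the echo of the reference.
random.seed(0)
true_path = [0.6, -0.3, 0.1, 0.05]        # hypothetical linear echo path
ref = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = []
for n in range(len(ref)):
    mic.append(sum(true_path[k] * ref[n - k] for k in range(4) if n - k >= 0))

residual, w = nlms_echo_cancel(mic, ref)
```

In this sketch the filter adapts on every sample. The limitation noted in the background arises when a useful signal is also present in `mic`, in which case such a filter is usually frozen or slowed down to avoid diverging.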

Disclosure of Invention

The application provides an audio signal processing method to improve the elimination efficiency of linear echo audio signals in audio signals collected by equipment.

The present application provides an audio signal processing method, comprising:

obtaining a first audio signal, wherein the first audio signal comprises a linear echo audio signal, a nonlinear echo audio signal, a useful audio signal and a noise audio signal, the linear echo audio signal is a linear echo audio signal of an audio signal sent by a target device and used for information interaction with a user, the nonlinear echo audio signal is a nonlinear echo audio signal of an audio signal sent by the target device and used for information interaction with the user, and the useful audio signal is an audio signal sent by the user and used for information interaction with the target device;

obtaining an audio separation model for separating the linear echo audio signal from the first audio signal;

and eliminating the linear echo audio signal in the first audio signal by utilizing the audio separation model to obtain a second audio signal comprising the nonlinear echo audio signal, the useful audio signal and the noise audio signal.

Optionally, the obtaining an audio separation model for separating the linear echo audio signal from the first audio signal includes:

obtaining filter parameters and an identity matrix in the target device;

and generating a separation matrix according to the filter parameters and the identity matrix, and taking the separation matrix as the audio separation model.

Optionally, the eliminating the linear echo audio signal in the first audio signal by using the audio separation model to obtain a second audio signal including the nonlinear echo audio signal, the useful audio signal, and the noise audio signal includes: and multiplying the separation matrix with the first audio signal matrix to generate a second audio signal matrix.

Optionally, the method further includes:

obtaining information of an echo path through which the audio signal sent by the target device passes, and obtaining an identity matrix;

generating a mixing matrix according to the information of the echo path and the identity matrix;

obtaining a second analog audio signal matrix, the second analog audio signal matrix being a matrix generated for analog audio signals of the second audio signal;

multiplying the mixing matrix and a second analog audio signal matrix to generate the first audio signal matrix.
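The mixing step can be sketched as follows. This is a minimal illustration under stated assumptions: the application does not give the layout of the mixing matrix in this passage, so the sketch assumes an identity matrix whose first row carries the echo-path coefficients, and a signal vector made of one second-signal sample followed by reference samples. The names `mixing_matrix`, `matvec` and the numeric values are hypothetical.

```python
def identity(n):
    """n-by-n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mixing_matrix(echo_path):
    """Assumed layout: identity matrix with echo-path coefficients in row 0."""
    a = identity(len(echo_path) + 1)
    for k, coeff in enumerate(echo_path, start=1):
        a[0][k] = coeff
    return a

def matvec(m, v):
    """Multiply matrix m by column vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

echo_path = [0.5, -0.25]   # hypothetical echo-path information
s = [0.2, 1.0, -1.0]       # [simulated second-signal sample, reference, delayed reference]
x = matvec(mixing_matrix(echo_path), s)   # simulated first audio signal vector
```

Under this assumed layout, the first entry of `x` is the second-signal sample plus the linear echo of the reference samples, while the reference samples pass through unchanged.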

Optionally, the obtaining a second analog audio signal matrix includes:

obtaining a reference audio signal, wherein the reference audio signal is a preset analog audio signal of an audio signal sent by the target equipment and used for information interaction with a user;

obtaining a second analog audio signal generated for the second audio signal;

and obtaining a second analog audio signal matrix according to the second analog audio signal and the reference audio signal.

Optionally, the obtaining the filter parameter and the identity matrix in the target device includes:

obtaining a first audio separation matrix, wherein the first audio separation matrix is an identity matrix corresponding to the separation matrix;

multiplying the first audio separation matrix with the first audio signal matrix to obtain a first candidate audio signal matrix;

determining whether a modulus of the first candidate audio signal matrix is greater than a modulus of the first audio signal matrix;

and if so, taking the elements in the first audio separation matrix as filter parameters in the target device.

Optionally, the method further includes:

if the modulus of the first candidate audio signal matrix is not larger than the modulus of the first audio signal matrix, obtaining a second audio separation matrix according to the first audio separation matrix;

multiplying the second audio separation matrix with the first audio signal matrix to obtain a second candidate audio signal matrix;

and judging whether the modulus of the second candidate audio signal matrix is larger than the modulus of the first audio signal matrix, if so, taking elements in the second audio separation matrix as filter parameters in the target equipment, and if not, repeating the steps until the modulus of the candidate audio signal matrix is larger than the modulus of the first audio signal matrix.

Optionally, the obtaining a second audio separation matrix according to the first audio separation matrix includes:

obtaining a preset first weighted correlation matrix;

obtaining a second weighted correlation matrix according to the preset first weighted correlation matrix, the first candidate audio signal matrix and the first audio signal matrix;

and obtaining a second audio separation matrix according to the second weighted correlation matrix and the first audio separation matrix.

Optionally, the obtaining a second weighted correlation matrix according to the preset first weighted correlation matrix, the first candidate audio signal matrix, and the first audio signal matrix includes:

obtaining a preset forgetting factor and a reference audio signal;

performing a non-linear transformation on the first candidate audio signal matrix;

and obtaining the second weighted correlation matrix according to the preset forgetting factor, the reference audio signal and the first candidate audio signal matrix after nonlinear transformation.

Optionally, the obtaining a second audio separation matrix according to the second weighted correlation matrix and the first audio separation matrix includes:

obtaining a first filter parameter matrix according to the second weighted correlation matrix;

replacing elements in the first audio separation matrix with elements in the first filter parameter matrix to obtain the second audio separation matrix.

Optionally, the method further includes: outputting the second audio signal.

Optionally, the method further includes: and carrying out noise reduction processing and audio signal separation processing on the second audio signal to obtain the useful audio signal.

In another aspect of the present application, there is provided an audio signal processing apparatus including:

a first audio signal obtaining unit, configured to obtain a first audio signal, where the first audio signal includes a linear echo audio signal, a nonlinear echo audio signal, a useful audio signal, and a noise audio signal, the linear echo audio signal is a linear echo audio signal of an audio signal sent by a target device and used for information interaction with a user, the nonlinear echo audio signal is a nonlinear echo audio signal of an audio signal sent by the target device and used for information interaction with the user, and the useful audio signal is an audio signal sent by the user and used for information interaction with the target device;

an audio separation model obtaining unit configured to obtain an audio separation model for separating the linear echo audio signal from the first audio signal;

a second audio signal obtaining unit, configured to eliminate the linear echo audio signal in the first audio signal by using the audio separation model, and obtain a second audio signal including the nonlinear echo audio signal, the useful audio signal, and the noise audio signal.

In another aspect of the present application, an electronic device is provided, including:

a processor;

a memory for storing a program of an audio signal processing method, the apparatus performing the following steps after being powered on and running the program of the audio signal processing method by the processor:

In another aspect of the present application, there is provided a storage device storing a program of an audio signal processing method, the program being executed by a processor to perform the steps of:

In another aspect of the present application, there is provided an audio signal processing system including: the linear echo cancellation module and the audio signal separation module;

the linear echo cancellation module is configured to obtain a first audio signal, where the first audio signal includes a linear echo audio signal, a nonlinear echo audio signal, a useful audio signal, and a noise audio signal, the linear echo audio signal is a linear echo audio signal of an audio signal sent by a target device and used for information interaction with a user, the nonlinear echo audio signal is a nonlinear echo audio signal of an audio signal sent by the target device and used for information interaction with the user, and the useful audio signal is an audio signal sent by the user and used for information interaction with the target device; obtaining an audio separation model for separating the linear echo audio signal from the first audio signal; eliminating the linear echo audio signal in the first audio signal by using the audio separation model to obtain a second audio signal comprising the nonlinear echo audio signal, a useful audio signal and the noise audio signal; outputting the second audio signal;

the audio signal separation module is configured to obtain the second audio signal output by the linear echo cancellation module; carrying out noise reduction processing and audio signal separation processing on the second audio signal to obtain the useful audio signal; and outputting the useful audio signal.

Optionally, the system further includes a target audio signal separation module, configured to perform voice separation on the useful audio signal to obtain a plurality of target audio signals.

In another aspect of the present application, a smart TV is provided, including: pickup equipment and linear echo cancellation equipment, wherein the linear echo cancellation equipment includes an audio separation model building module and an audio separation module;

the pickup equipment is used for obtaining a first audio signal, wherein the first audio signal comprises a linear echo audio signal, a nonlinear echo audio signal, a useful audio signal and a noise audio signal, the linear echo audio signal is a linear echo audio signal of an audio signal which is sent by a target equipment and used for information interaction with a user, the nonlinear echo audio signal is a nonlinear echo audio signal of an audio signal which is sent by the target equipment and used for information interaction with the user, and the useful audio signal is an audio signal which is sent by the user and used for information interaction with the target equipment;

the audio separation model building module is used for obtaining an audio separation model used for separating the linear echo audio signal from the first audio signal;

the audio separation module is configured to eliminate the linear echo audio signal in the first audio signal by using the audio separation model, and obtain a second audio signal including the nonlinear echo audio signal, the useful audio signal, and the noise audio signal.

In another aspect, the present application provides a vehicle-mounted intelligent voice interaction device, including: pickup equipment, linear echo cancellation equipment, audio signal separation equipment, target audio signal separation equipment, and execution equipment, wherein the linear echo cancellation equipment includes an audio separation model building module and an audio separation module;

the audio separation module is configured to eliminate the linear echo audio signal in the first audio signal by using the audio separation model to obtain a second audio signal including the nonlinear echo audio signal, a useful audio signal, and the noise audio signal;

the audio signal separation device is configured to obtain the second audio signal output by the linear echo cancellation module; carrying out noise reduction processing and audio signal separation processing on the second audio signal to obtain the useful audio signal; outputting the useful audio signal;

the target audio signal separation equipment is used for carrying out voice separation on the useful audio signals to obtain a plurality of target audio signals;

the execution equipment is used for carrying out voice recognition on the target audio signals and executing corresponding instructions according to the voice recognition results of the target audio signals.

In another aspect of the present application, there is provided an audio signal processing system including: a client and a server;

the client is configured to obtain a first audio signal, where the first audio signal includes a linear echo audio signal, a nonlinear echo audio signal, a useful audio signal, and a noise audio signal, the linear echo audio signal is a linear echo audio signal of an audio signal sent by a target device and used for information interaction with a user, the nonlinear echo audio signal is a nonlinear echo audio signal of an audio signal sent by the target device and used for information interaction with the user, and the useful audio signal is an audio signal sent by the user and used for information interaction with the target device; obtaining a plurality of target audio signals provided by the server, performing voice recognition on the plurality of target audio signals, and executing corresponding instructions according to voice recognition results of the plurality of target audio signals;

the server is used for obtaining an audio separation model used for separating the linear echo audio signal from the first audio signal; eliminating the linear echo audio signal in the first audio signal by using the audio separation model to obtain a second audio signal comprising the nonlinear echo audio signal, a useful audio signal and the noise audio signal; carrying out noise reduction processing and audio signal separation processing on the second audio signal to obtain the useful audio signal; carrying out voice separation on the useful audio signals to obtain a plurality of target audio signals; outputting the plurality of target audio signals to the client.

Compared with the prior art, the method has the following advantages:

after a first audio signal comprising a linear echo audio signal, a nonlinear echo audio signal, a useful audio signal and a noise audio signal is obtained, an audio separation model for separating the linear echo audio signal from the first audio signal is further obtained, then the linear echo audio signal in the first audio signal is eliminated by using the audio separation model, and a second audio signal comprising the nonlinear echo audio signal, the useful audio signal and the noise audio signal is obtained. The audio signal processing method provided by the application can eliminate the linear echo audio signal in the audio signal collected by the equipment by using the audio separation model, and obtain the audio signal comprising the nonlinear echo audio signal, the useful audio signal and the noise audio signal, so that the elimination efficiency of the linear echo audio signal in the audio signal collected by the equipment is improved.

Drawings

Fig. 1A is a schematic diagram of a first application scenario provided in the present application.

Fig. 1B is a schematic diagram of a second application scenario embodiment provided in the present application.

Fig. 2 is a flowchart of an audio signal processing method provided in the first embodiment.

Fig. 3 is a flowchart of a second weighted correlation matrix obtaining method provided in the first embodiment of the present application.

Fig. 4 is a flowchart of a first audio signal matrix obtaining method provided in the first embodiment of the present application.

Fig. 5 is a flowchart of a second analog audio signal matrix obtaining method provided in the first embodiment of the present application.

Fig. 6 is a schematic diagram of an audio signal processing apparatus according to a second embodiment of the present application.

Fig. 7 is a schematic diagram of an electronic device provided in an embodiment of the present application.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the present application can be implemented in many ways other than those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.

In order to show the present application more clearly, an application scenario of the audio signal processing method provided in the embodiments of the present application is introduced.

Some embodiments provided by the present application may be applied to a scenario of canceling a linear echo audio signal in an audio signal acquired by a device separately, as shown in fig. 1A, which is a schematic diagram of a first application scenario embodiment provided by the present application.

The basic idea of human-computer voice interaction is as follows: when a user gives an instruction to the intelligent equipment through voice, the intelligent equipment converts the voice instruction into a text instruction through a voice recognition technology, and then uses a semantic understanding technology to understand the intention of the text instruction so as to make corresponding feedback. In the first application scenario embodiment provided by the application, the smart device may be a smart phone, a smart sound box, a smart robot, or the like. The first application scenario embodiment is described in detail below, taking a smart sound box as the smart device.

When the smart sound box 101 works, the sound pickup device 101-1 on the smart sound box 101 is always collecting surrounding audio signals, so that an instruction issued by a target user through voice can be obtained at any time. In an actual human-computer voice interaction scenario, the audio signals collected by the sound pickup device 101-1 often include not only an instruction audio signal 102 sent by the target user, but also a linear echo audio signal 104, a nonlinear echo audio signal 105, audio information 103 of a non-target user, a noise audio signal 106, and the like. In the first scenario embodiment of the present application, all audio signals collected by the sound pickup device 101-1 are referred to as the first audio signal. The nonlinear echo audio signal 105 includes, but is not limited to, sounds played by the speaker 101-1 of the smart sound box 101, such as echoes of songs played by the speaker 101-1, echoes of alert tones, and the like. In order to ensure that the audio processing system 101-3 of the smart sound box 101 can better recognize the instruction audio signal 102 sent by the target user, the linear echo cancellation module 101-3-1 in the audio processing system 101-3 first obtains all the audio signals collected by the sound pickup device 101-1, then obtains an audio separation model for separating the linear echo audio signal 104 from those audio signals, and then uses the audio separation model to eliminate the linear echo audio signal 104 from them, obtaining an audio signal that includes the instruction audio signal 102 sent by the target user, the nonlinear echo audio signal 105, the audio information 103 of the non-target user, and the noise audio signal 106. In the first scenario embodiment of the present application, an audio signal including the instruction audio signal 102 sent by the target user, the nonlinear echo audio signal 105, the audio information 103 of the non-target user, and the noise audio signal 106 is referred to as the second audio signal. After obtaining the second audio signal, the linear echo cancellation module 101-3-1 outputs the second audio signal, so that the smart sound box 101 can further process the second audio signal, thereby obtaining the instruction audio signal 102 sent by the target user.

Some embodiments provided by the present application may also be applied to a scenario in which a linear echo audio signal in an audio signal acquired by a device is eliminated and voice separation is then performed on the audio signal from which the linear echo audio signal has been removed, as shown in fig. 1B, which is a schematic diagram of a second application scenario embodiment provided by the present application.

The audio processing system 101-3 uses the linear echo cancellation module 101-3-1 to perform linear echo cancellation on a first audio signal collected by the sound pickup device 101-1, the first audio signal including an instruction audio signal 102 sent by a target user, a linear echo audio signal 104, a nonlinear echo audio signal 105, audio information 103 of a non-target user, and a noise audio signal 106, and obtains and outputs a second audio signal including the instruction audio signal 102 sent by the target user, the nonlinear echo audio signal 105, the audio information 103 of the non-target user, and the noise audio signal 106. An audio signal separation module 101-3-2 in the audio processing system 101-3 obtains the second audio signal output by the linear echo cancellation module 101-3-1, performs noise reduction processing and audio signal separation processing on the second audio signal to obtain the instruction audio signal 102 sent by the target user, and outputs it. The smart sound box 101 thus obtains the instruction audio signal 102 sent by the target user, converts it into a text instruction through a voice recognition technology, understands the intention of the text instruction by using a semantic understanding technology, and then makes corresponding feedback.

It should be noted that the two application scenarios described above are only two embodiments of the application scenarios of the audio signal processing method provided in the present application, and the two application scenario embodiments are provided to facilitate understanding of the audio signal processing method provided in the present application, and are not used to limit the audio signal processing method provided in the present application. The audio signal processing method provided by the application can also be applied to other scenes, and is not repeated here.

First embodiment

A first embodiment of the present application provides an audio signal processing method, which is described below with reference to fig. 2 to 5.

Step S201, a first audio signal is obtained, where the first audio signal includes a linear echo audio signal, a nonlinear echo audio signal, a useful audio signal, and a noise audio signal.

The linear echo audio signal is a linear echo audio signal of an audio signal sent by a target device and used for information interaction with a user, the nonlinear echo audio signal is a nonlinear echo audio signal of an audio signal sent by the target device and used for information interaction with the user, and the useful audio signal is an audio signal sent by the user and used for information interaction with the target device.

In step S202, an audio separation model for separating the linear echo audio signal from the first audio signal is obtained.

Obtaining an audio separation model for separating a linear echo audio signal from a first audio signal, comprising: obtaining filter parameters and an identity matrix in target equipment; and generating a separation matrix according to the filter parameters and the identity matrix, and taking the separation matrix as an audio separation model.

Wherein obtaining filter parameters and an identity matrix in the target device comprises: obtaining a first audio separation matrix, wherein the first audio separation matrix is an identity matrix corresponding to the separation matrix; multiplying the first audio separation matrix by the first audio signal matrix to obtain a first candidate audio signal matrix; determining whether a modulus of the first candidate audio signal matrix is greater than a modulus of the first audio signal matrix; if so, taking the elements in the first audio separation matrix as filter parameters in the target device.

If the modulus of the first candidate audio signal matrix is not larger than the modulus of the first audio signal matrix, obtaining a second audio separation matrix according to the first audio separation matrix; multiplying the second audio separation matrix by the first audio signal matrix to obtain a second candidate audio signal matrix; and judging whether the modulus of the second candidate audio signal matrix is larger than the modulus of the first audio signal matrix, if so, taking the elements in the second audio separation matrix as filter parameters in the target equipment, and if not, repeatedly executing the steps until the modulus of the candidate audio signal matrix is larger than the modulus of the first audio signal matrix.
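The control flow of this iterative search can be sketched in Python as follows. This is a hedged illustration only: the stopping test is copied from the text as written, the Euclidean norm is assumed as the "modulus", and `toy_update` is a hypothetical placeholder for the update that derives the next separation matrix from the weighted correlation matrix. The iteration cap `max_iters` is also an assumption added so that the loop always terminates.

```python
def identity(n):
    """n-by-n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matvec(m, v):
    """Multiply matrix m by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def modulus(v):
    """Euclidean norm, assumed here as the 'modulus' of a signal vector."""
    return sum(c * c for c in v) ** 0.5

def search_separation_matrix(x, update, max_iters=50):
    """Iterate from the identity matrix until the stopping test in the text holds."""
    b = identity(len(x))                     # first audio separation matrix
    for iteration in range(max_iters):
        candidate = matvec(b, x)             # candidate audio signal vector
        if modulus(candidate) > modulus(x):  # stopping test as stated in the text
            return b, iteration              # elements of b give the filter parameters
        b = update(b, candidate, x)          # next audio separation matrix
    return b, max_iters

def toy_update(b, candidate, x):
    """Hypothetical placeholder update: nudges one first-row coefficient."""
    new_b = [row[:] for row in b]
    new_b[0][1] += 0.5
    return new_b

b_final, n_updates = search_separation_matrix([1.0, 1.0], toy_update)
```

With the identity matrix the candidate equals the input, so the test fails on the first pass and at least one update is always performed.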

Specifically, obtaining the second audio separation matrix according to the first audio separation matrix includes: obtaining a preset first weighted correlation matrix; obtaining a second weighted correlation matrix according to a preset first weighted correlation matrix, a first candidate audio signal matrix and a first audio signal matrix; and obtaining a second audio separation matrix according to the second weighted correlation matrix and the first audio separation matrix.

Specifically, the specific process of obtaining the second weighted correlation matrix according to the preset first weighted correlation matrix, the first candidate audio signal matrix and the first audio signal matrix is as follows:

Please refer to fig. 3, which is a flowchart illustrating a second weighted correlation matrix obtaining method according to a first embodiment of the present application.

Step S301, obtaining a preset forgetting factor and a reference audio signal.

Step S302, a first candidate audio signal matrix is subjected to nonlinear transformation.

Step S303, a second weighted correlation matrix is obtained according to a preset forgetting factor, the reference audio signal, and the first candidate audio signal matrix after the nonlinear transformation.
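Steps S301 to S303 can be sketched as a recursive update. The exact formula is not given in this passage, so the form used below is an assumption: the new matrix is the old one scaled by the forgetting factor, plus the outer product of the reference vector weighted by the nonlinearly transformed candidate output. The nonlinearity `phi`, the function names and the numeric values are all hypothetical.

```python
def phi(z, eps=1e-6):
    """Hypothetical nonlinear transformation: weight shrinks as |z| grows."""
    return 1.0 / max(abs(z), eps)

def update_weighted_correlation(v_prev, z, ref, alpha=0.98):
    """Assumed form: V_new = alpha * V_prev + (1 - alpha) * phi(z) * ref * ref^T."""
    weight = (1.0 - alpha) * phi(z)
    n = len(ref)
    return [[alpha * v_prev[i][j] + weight * ref[i] * ref[j] for j in range(n)]
            for i in range(n)]

v1 = [[1.0, 0.0], [0.0, 1.0]]   # preset first weighted correlation matrix
v2 = update_weighted_correlation(v1, z=0.5, ref=[1.0, -1.0], alpha=0.9)
```

A forgetting factor close to 1 makes the matrix change slowly and favors stability, while a smaller value tracks changes in the echo path more quickly.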

In the first embodiment of the present application, the nonlinear transformation in the process of obtaining the second weighted correlation matrix is as follows: recording the first candidate audio signal matrix as z, recording the first candidate audio signal matrix after nonlinear transformation as phi (z), and performing pair

Obtaining the second audio separation matrix according to the second weighted correlation matrix and the first audio separation matrix further comprises: obtaining a first filter parameter matrix according to the second weighted correlation matrix; and replacing elements in the first audio separation matrix with elements in the first filter parameter matrix to obtain the second audio separation matrix.

In the first embodiment of the present application, the filter parameters are denoted b₁, b₂, ..., b_R. In the first embodiment of the present application, the separation matrix B is exemplified by a matrix having three rows and three columns, and the formula of the separation matrix B is shown in (1):

Step S203, using the audio separation model to eliminate the linear echo audio signal in the first audio signal, and obtaining a second audio signal including the nonlinear echo audio signal, the useful audio signal, and the noise audio signal.

Eliminating the linear echo audio signal in the first audio signal by using the audio separation model to obtain the second audio signal comprises: multiplying the separation matrix by the first audio signal matrix to generate a second audio signal matrix. The first audio signal matrix is denoted by x, and the second audio signal matrix is denoted by y. In the first embodiment of the present application, the formula for generating the second audio signal matrix by multiplying the separation matrix by the first audio signal matrix is shown in (2):

y = Bx    (2)
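Formula (2) can be sketched in a few lines of Python. The layout of the separation matrix is an assumption: the text only states that B is generated from the filter parameters and an identity matrix, so this sketch places b₁, ..., b_R in the first row of an identity matrix. The ordering of x (one microphone sample followed by the reference samples) and all names and sample values are likewise hypothetical.

```python
def build_separation_matrix(filter_params):
    """Identity matrix whose first-row off-diagonal entries are replaced by
    the filter parameters b_1..b_R (assumed layout, not from the source)."""
    n = len(filter_params) + 1
    b_matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k, b in enumerate(filter_params, start=1):
        b_matrix[0][k] = b
    return b_matrix


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Three-row, three-column example: R = 2 filter parameters.
B = build_separation_matrix([-0.5, -0.25])

# Hypothetical first audio signal matrix x for one frame:
# [microphone sample, reference sample 1, reference sample 2].
x = [1.0, 0.5, 0.25]

# Formula (2): y = Bx. Under the assumed layout, the first entry of y is the
# microphone sample with the weighted reference samples subtracted.
y = mat_vec(B, x)
```

Under this layout only the first row changes the signal; the remaining rows pass the reference samples through unchanged.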

In the first embodiment of the present application, when the audio separation model is used to obtain the second audio signal, the separation matrix is multiplied by the first audio signal matrix to generate the second audio signal matrix, and the second audio signal is then obtained from the second audio signal matrix. The second audio signal is the audio signal obtained after the linear echo audio signal in the first audio signal has been eliminated. The process of obtaining the first audio signal matrix in the first embodiment of the present application is as follows:

Please refer to Fig. 4, which is a flowchart of a method for obtaining the first audio signal matrix according to the first embodiment of the present application.

Step S401, obtaining information of an echo path through which an audio signal sent by a target device passes, and obtaining an identity matrix.

In the first embodiment of the present application, the information of the echo paths that the audio signal passes through is denoted a₁, a₂, ..., a_R; the number of filter parameters is consistent with the number of pieces of echo path information.

Step S402, generating a mixing matrix according to the information of the echo path and the identity matrix.

In the first embodiment of the present application, the mixing matrix is denoted by A; the mixing matrix A has the same number of rows and columns as the separation matrix B, and the formula of the mixing matrix A is shown in (3):

Step S403, obtaining a second analog audio signal matrix.

The second analog audio signal matrix is a matrix generated from the analog audio signal of the second audio signal. In the first embodiment of the present application, when an audio signal is modeled, the speech signal is modeled as a non-Gaussian signal rather than a Gaussian signal. Since a general speech signal can be modeled as a non-Gaussian signal, an analog audio signal of the audio signal can be obtained even when the audio signal includes a nonlinear echo audio signal, a useful audio signal, and a noise audio signal.

The process of obtaining the second analog audio signal matrix is as follows:

Please refer to Fig. 5, which is a flowchart of a method for obtaining the second analog audio signal matrix according to the first embodiment of the present application.

Step S501, obtaining a reference audio signal.

The reference audio signal is a preset analog audio signal for the audio signal that is sent by the target device and used for information interaction with the user. In the first embodiment of the present application, the reference audio signals are denoted r₁, r₂, ..., r_R, and the number of reference audio signals is the same as the number of filter parameters.

Step S502, obtaining a second analog audio signal generated for the second audio signal.

The second analog audio signal is denoted by s in the first embodiment of the present application.

Step S503, obtaining the second analog audio signal matrix according to the second analog audio signal and the reference audio signal.

The formula of the second analog audio signal matrix s is shown in (4):

After the second analog audio signal matrix is obtained through step S403, step S404 may be further performed to obtain the first audio signal matrix.

Step S404, multiplying the mixing matrix by the second analog audio signal matrix to generate the first audio signal matrix.

In the first embodiment of the present application, the formula for generating the first audio signal matrix by multiplying the mixing matrix and the second analog audio signal matrix is shown in (5):

x = As    (5)
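Formulas (2) and (5) can be checked against each other with a small worked example. This is a minimal sketch under stated assumptions: the mixing matrix A is taken to be an identity matrix with the echo path information a₁, ..., a_R in its first row, the second analog audio signal matrix is taken to stack the second analog audio signal and the reference signals as [s, r₁, ..., r_R], and the echo paths are single real gains. The source does not spell out these layouts, and the names and numbers below are hypothetical.

```python
def first_row_matrix(params):
    """Identity matrix with params written into the first row, columns 1..R.
    Used for both the mixing matrix A and the separation matrix B
    (assumed layout, not from the source)."""
    n = len(params) + 1
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k, p in enumerate(params, start=1):
        matrix[0][k] = p
    return matrix


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


echo_paths = [0.5, 0.25]                 # a_1, a_2 (hypothetical gains)
A = first_row_matrix(echo_paths)

# Stacked source vector: [second analog audio signal, r_1, r_2].
s = [0.125, 1.0, 2.0]

# Formula (5): x = As. The first entry is the signal plus the linear echo.
x = mat_vec(A, s)

# If the filter parameters equal the negated echo paths, B inverts A and
# formula (2), y = Bx, returns the stacked source vector.
B = first_row_matrix([-a for a in echo_paths])
y = mat_vec(B, x)
```

In this toy case y equals s exactly because the filter parameters match the echo paths; in practice the parameters are estimated, so the linear echo is removed only as well as the estimate allows.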

The audio signal processing method provided in the first embodiment of the present application further includes: outputting the second audio signal, and performing noise reduction processing and audio signal separation processing on the second audio signal to obtain the useful audio signal.

Further, the audio signal processing method provided in the first embodiment of the present application can eliminate the linear echo audio signal in the audio signal collected by a device by using the audio separation model, obtaining an audio signal that includes the nonlinear echo audio signal, the useful audio signal, and the noise audio signal. It does not need to determine the content of the audio signal collected by the device and then adjust an iteration step size according to that content to change the update speed of an adaptive filter in order to eliminate the linear echo in collected audio signals with different content.

Second embodiment

Corresponding to the audio signal processing method provided in the first embodiment of the present application, a second embodiment of the present application provides an audio signal processing apparatus. Since the apparatus embodiment is substantially similar to the first (method) embodiment, it is described relatively briefly; for relevant points, refer to the corresponding descriptions of the method embodiment. The apparatus embodiment described below is merely illustrative.

Fig. 6 is a schematic diagram of an audio signal processing apparatus provided in a second embodiment of the present application.

The audio signal processing apparatus includes:

a first audio signal obtaining unit 601, configured to obtain a first audio signal, where the first audio signal includes a linear echo audio signal, a nonlinear echo audio signal, a useful audio signal, and a noise audio signal, the linear echo audio signal is a linear echo audio signal of an audio signal sent by a target device and used for information interaction with a user, the nonlinear echo audio signal is a nonlinear echo audio signal of an audio signal sent by the target device and used for information interaction with the user, and the useful audio signal is an audio signal sent by the user and used for information interaction with the target device;

an audio separation model obtaining unit 602, configured to obtain an audio separation model for separating the linear echo audio signal from the first audio signal;

a second audio signal obtaining unit 603, configured to eliminate the linear echo audio signal in the first audio signal by using the audio separation model, and obtain a second audio signal including the nonlinear echo audio signal, the useful audio signal, and the noise audio signal.

Optionally, the audio separation model obtaining unit 602 is specifically configured to obtain a filter parameter and an identity matrix in the target device; and generating a separation matrix according to the filter parameters and the identity matrix, and taking the separation matrix as the audio separation model.

Optionally, the second audio signal obtaining unit 603 is specifically configured to multiply the separation matrix with the first audio signal matrix to generate a second audio signal matrix.

Optionally, the method further includes:

Optionally, the obtaining a second analog audio signal matrix includes:

obtaining a second analog audio signal generated for the second audio signal;

Optionally, the method further includes:

obtaining a preset first weighted correlation matrix;

obtaining a preset forgetting factor and a reference audio signal;

replacing elements in the first audio separation matrix with elements in the first filter parameter matrix to obtain the second audio separation matrix.

Optionally, the audio signal processing apparatus further includes: a second audio signal output unit, configured to output the second audio signal.

Optionally, the audio signal processing apparatus further includes: a second audio signal separation unit, configured to perform noise reduction processing and audio signal separation processing on the second audio signal to obtain the useful audio signal.

Third embodiment

Corresponding to the audio signal processing method provided in the first embodiment of the present application, a third embodiment of the present application provides an electronic device.

Fig. 7 is a schematic view of an electronic device according to the third embodiment of the present application. The electronic device includes:

a processor 701; and

a memory 702 for storing a program of an audio signal processing method, wherein after the device is powered on and the program of the audio signal processing method is run by the processor, the following steps are executed:

It should be noted that, for the detailed description of the electronic device provided in the third embodiment of the present application, reference may be made to the related description of the first embodiment of the present application, and details are not repeated here.

Fourth embodiment

Corresponding to the audio signal processing method provided in the first embodiment of the present application, a fourth embodiment of the present application provides a storage device storing a program of an audio signal processing method, the program being executed by a processor to perform the steps of:

It should be noted that, for the detailed description of the storage medium provided in the fourth embodiment of the present application, reference may be made to the related description of the first embodiment of the present application, and details are not repeated here.

Fifth embodiment

In the first embodiment described above, an audio signal processing method is provided, and correspondingly, a fifth embodiment of the present application provides an audio signal processing system. Since the audio signal processing system in the fifth embodiment is basically similar to the first embodiment of the method, the description is simple, and for the relevant points, refer to the partial description of the method embodiment. The system embodiments described below are merely illustrative.

Referring to Fig. 1B again, the audio signal processing system includes: a linear echo cancellation module 101-3-1 and an audio signal separation module 101-3-2;

the linear echo cancellation module 101-3-1 is configured to obtain a first audio signal, where the first audio signal includes a linear echo audio signal, a nonlinear echo audio signal, a useful audio signal, and a noise audio signal, the linear echo audio signal is a linear echo audio signal of an audio signal sent by a target device and used for information interaction with a user, the nonlinear echo audio signal is a nonlinear echo audio signal of an audio signal sent by the target device and used for information interaction with the user, and the useful audio signal is an audio signal sent by the user and used for information interaction with the target device; obtaining an audio separation model for separating the linear echo audio signal from the first audio signal; eliminating the linear echo audio signal in the first audio signal by using the audio separation model to obtain a second audio signal comprising the nonlinear echo audio signal, a useful audio signal and the noise audio signal; outputting the second audio signal;

the audio signal separation module 101-3-2 is configured to obtain the second audio signal output by the linear echo cancellation module 101-3-1; carrying out noise reduction processing and audio signal separation processing on the second audio signal to obtain the useful audio signal; and outputting the useful audio signal.

The audio signal processing system according to the fifth embodiment of the present application further includes: a target audio signal separation module, configured to perform voice separation on the useful audio signal to obtain a plurality of target audio signals. Specifically, when the audio signal processing system in the fifth embodiment of the present application is applied to subway ticket machines, high-speed rail ticket machines, and express cabinet groups, the environment where these devices are located is often noisy, and multiple persons may operate multiple identical devices at the same time. Take one device A loaded with the audio signal processing system in the fifth embodiment of the present application as an example. To enable device A, which is interacting with target user A, to recognize the audio signal of target user A according to the current execution step and to execute the corresponding instruction according to the voice recognition result of target user A, the audio signal processing system on device A needs to perform voice separation on the useful audio signal to obtain a plurality of target audio signals, from which device A can obtain the audio signal of target user A according to the current execution step. Specifically, when device A is a subway ticket vending machine or a high-speed railway ticket vending machine, it can execute a route query instruction, a remaining ticket query instruction, a ticket drawing instruction and the like according to the voice recognition result of target user A; when device A is an express cabinet or a meal cabinet, it can execute a password acquisition instruction, a cabinet door opening instruction, a cabinet door closing instruction and the like according to the voice recognition result of target user A.

The audio signal processing system in the fifth embodiment of the present application may also be applied to other audio signal processing scenarios, and details are not repeated here.

Sixth embodiment

In the foregoing first embodiment, an audio signal processing method is provided, and correspondingly, a sixth embodiment of the present application provides a smart television.

The smart television in the sixth embodiment of the present application includes: a pickup apparatus and a linear echo cancellation apparatus, wherein the linear echo cancellation apparatus includes: an audio separation model building module and an audio separation module;

Seventh embodiment

In the foregoing first embodiment, an audio signal processing method is provided, and correspondingly, a seventh embodiment of the present application provides an in-vehicle intelligent voice interaction device.

The in-vehicle intelligent voice interaction device in the seventh embodiment of the present application includes: a pickup apparatus, a linear echo cancellation apparatus, an audio signal separation apparatus, a target audio signal separation apparatus, and an execution apparatus, wherein the linear echo cancellation apparatus includes: an audio separation model building module and a first audio separation module;

Eighth embodiment

In the first embodiment described above, an audio signal processing method is provided, and correspondingly, an eighth embodiment of the present application provides another audio signal processing system.

The audio signal processing system in the eighth embodiment of the present application includes: a client and a server;

Although the present invention has been described with reference to the preferred embodiments, it should be understood that the scope of the present invention is not limited to the embodiments described above, and that various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the present invention.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (Flash RAM). Memory is an example of a computer-readable medium.

1. Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage media, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

2. As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims

1. An audio signal processing method, comprising:

2. The audio signal processing method of claim 1, wherein the obtaining an audio separation model for separating the linear echo audio signal from the first audio signal comprises:

obtaining filter parameters and an identity matrix in the target device;

3. The audio signal processing method according to claim 2, wherein said eliminating the linear echo audio signal in the first audio signal by using the audio separation model to obtain a second audio signal including the nonlinear echo audio signal, the useful audio signal and the noise audio signal comprises: multiplying the separation matrix by the first audio signal matrix to generate a second audio signal matrix.

4. The audio signal processing method according to claim 3, further comprising:

5. The audio signal processing method of claim 4, wherein the obtaining a second analog audio signal matrix comprises:

obtaining a second analog audio signal generated for the second audio signal;

6. The audio signal processing method of claim 2, wherein the obtaining filter parameters and an identity matrix in the target device comprises:

7. The audio signal processing method according to claim 6, further comprising:

8. The audio signal processing method of claim 7, wherein the obtaining a second audio separation matrix from the first audio separation matrix comprises:

obtaining a preset first weighted correlation matrix;

9. The audio signal processing method according to claim 7, wherein obtaining a second weighted correlation matrix according to the preset first weighted correlation matrix, the first candidate audio signal matrix and the first audio signal matrix comprises:

obtaining a preset forgetting factor and a reference audio signal;

10. The audio signal processing method of claim 8, wherein obtaining a second audio separation matrix based on the second weighted correlation matrix and the first audio separation matrix comprises:

replacing elements in the first audio separation matrix with elements in the first filter parameter matrix to obtain the second audio separation matrix.

11. The audio signal processing method according to claim 1, further comprising: outputting the second audio signal.

12. The audio signal processing method according to claim 1, further comprising: performing noise reduction processing and audio signal separation processing on the second audio signal to obtain the useful audio signal.

13. An audio signal processing apparatus, comprising:

14. An electronic device, comprising:

a processor;

15. A storage device, storing a program of an audio signal processing method, the program being executed by a processor to perform the steps of:

16. An audio signal processing system, comprising: a linear echo cancellation module and an audio signal separation module;

17. The audio signal processing system of claim 16, further comprising: a target audio signal separation module, configured to perform voice separation on the useful audio signal to obtain a plurality of target audio signals.

18. A smart television, comprising: a pickup apparatus and a linear echo cancellation apparatus, wherein the linear echo cancellation apparatus includes: an audio separation model building module and an audio separation module;

19. An in-vehicle intelligent voice interaction device, comprising: a pickup apparatus, a linear echo cancellation apparatus, an audio signal separation apparatus, a target audio signal separation apparatus, and an execution apparatus, wherein the linear echo cancellation apparatus includes: an audio separation model building module and a first audio separation module;

20. An audio signal processing system, comprising: a client and a server;