CN107886960B - Audio signal reconstruction method and device - Google Patents


Info

Publication number
CN107886960B
Authority
CN
China
Prior art keywords
audio signal
transform coefficient
sparse transform
coefficient corresponding
sparse
Legal status
Active
Application number
CN201610877571.2A
Other languages
Chinese (zh)
Other versions
CN107886960A
Inventor
蒋三新
应忍冬
文飞
贾晓立
刘佩林
肖玮
金文宇
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Priority application: CN201610877571.2A
PCT application: PCT/CN2017/103534 (published as WO2018059409A1)
Publication of CN107886960A
Application granted
Publication of CN107886960B

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/005: Correction of errors induced by the transmission channel, if related to the coding algorithm

Abstract

The embodiment of the invention discloses an audio signal reconstruction method and an audio signal reconstruction device, wherein the method comprises the following steps: acquiring compressed data corresponding to at least two audio signals; inversely quantizing the compressed data corresponding to the at least two audio signals to obtain measurement data corresponding to the at least two audio signals; acquiring a measurement matrix, and jointly reconstructing sparse transform coefficients corresponding to the at least two audio signals according to the measurement data corresponding to the at least two audio signals and the measurement matrix; and performing sparse inverse transform on the sparse transform coefficients corresponding to the at least two audio signals to obtain the at least two audio signals. By adopting the embodiment of the invention, the quality of the reconstructed audio signals can be improved.

Description

Audio signal reconstruction method and device
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a method and an apparatus for reconstructing an audio signal.
Background
With the development of communication technology, the demand for a high-quality audio-visual experience keeps growing, and scenes that require high-quality reconstruction of sound field information are becoming more common, such as teleconferencing, movies, and large online games. In recent years, Compressed Sensing (CS) theory, which fully exploits the sparsity of signals and uses their structural features to compress and reconstruct them, has become a research hotspot in the field of signal processing. CS theory holds that, since the basic goal of signal compression is to remove the redundant components contained in a signal, it is possible to directly obtain a compressed representation (i.e., compressed data) of a signal that is redundant in itself, omitting the sampling of a large number of unwanted samples. Therefore, when CS theory is applied to audio signals, sampling and compression can be combined into one process, which greatly simplifies the whole compression pipeline, breaks through the bottleneck of the Shannon sampling theorem in a certain sense, and makes the acquisition of high-resolution audio signals possible.
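The acquisition side of CS can be sketched numerically. The following is a minimal illustration, not the patent's exact scheme (the matrix size and signal are arbitrary), of how a random measurement matrix maps N samples to M < N measurements in one step:

```python
import random

# Illustrative sketch of compressed sensing acquisition: an N-sample signal
# is reduced to M < N measurements by a random measurement matrix Phi, so
# sampling and compression happen in a single linear operation.
N, M = 8, 4                      # signal length and number of measurements
random.seed(0)

x = [random.gauss(0.0, 1.0) for _ in range(N)]       # original signal
Phi = [[random.gauss(0.0, 1.0) for _ in range(N)]    # M x N random
       for _ in range(M)]                            # measurement matrix

# y = Phi @ x : the compressed representation has only M entries
y = [sum(Phi[i][j] * x[j] for j in range(N)) for i in range(M)]

print(len(x), len(y))   # → 8 4 (8 samples in, 4 measurements out)
```

Reconstruction must then invert this mapping from only M values, which is where the sparsity assumption comes in.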
The conventional audio signal reconstruction method is as follows: acquire compressed data corresponding to an audio signal; inversely quantize the compressed data to obtain measurement data corresponding to the audio signal; and reconstruct the audio signal according to the measurement data and a measurement matrix. Recovering the signal from the measurement data obtained by inverse quantization amounts to solving an underdetermined system of equations. Because an underdetermined system has more than one solution and the reconstructed audio signal can be any one of them, the similarity between the reconstructed audio signal and the originally acquired audio signal is low, which results in poor quality of the reconstructed audio signal.
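The ambiguity just described can be seen in a toy case: one measurement constraining two unknowns is satisfied by many different solutions, so a reconstructor that returns an arbitrary one may be far from the original signal. A minimal sketch:

```python
# A toy underdetermined system: one measurement, two unknowns.
# y = x0 + x1 = 3 is satisfied by many (x0, x1) pairs, so a reconstructor
# that returns "any" solution may differ greatly from the original signal.
# This is the problem the joint reconstruction in this document addresses.
candidates = [(3.0, 0.0), (0.0, 3.0), (1.5, 1.5)]
for x0, x1 in candidates:
    assert x0 + x1 == 3.0   # every candidate explains the data equally well
print(len(candidates), "distinct solutions fit the same measurement")
```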
Disclosure of Invention
The application provides an audio signal reconstruction method and an audio signal reconstruction device, which can improve the quality of an audio signal.
A first aspect provides a method of audio signal reconstruction, the method comprising: acquiring compressed data corresponding to at least two audio signals; inversely quantizing the compressed data corresponding to the at least two audio signals to obtain measurement data corresponding to the at least two audio signals; acquiring a measurement matrix, and jointly reconstructing sparse transform coefficients corresponding to the at least two audio signals according to the measurement data corresponding to the at least two audio signals and the measurement matrix; and performing sparse inverse transform on the sparse transform coefficients corresponding to the at least two audio signals to obtain the at least two audio signals.
In a specific implementation, after the terminal acquires the compressed data corresponding to the at least two audio signals, it may perform inverse quantization on that compressed data to obtain the measurement data corresponding to the at least two audio signals. The terminal may further acquire a measurement matrix, jointly reconstruct the sparse transform coefficients corresponding to the at least two audio signals according to the measurement data and the measurement matrix, and perform sparse inverse transform on those coefficients to obtain the at least two audio signals, which can improve the quality of the at least two audio signals.
In the foregoing technical solution, optionally, the at least two audio signals may include a first audio signal and a second audio signal, and the terminal jointly reconstructs sparse transform coefficients corresponding to the at least two audio signals according to the measurement data and the measurement matrix corresponding to the at least two audio signals, which may specifically be:
and calculating the sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal and the measurement matrix.
In the above technical solution, optionally, the first audio signal may correspond to a first channel, the second audio signal may correspond to a second channel, and the first audio signal and the second audio signal are audio signals acquired at the same time interval, and then the terminal calculates a sparse transform coefficient corresponding to the second audio signal according to a sparse transform coefficient corresponding to the first audio signal, measurement data corresponding to the second audio signal, and the measurement matrix, and specifically may be:
according to a first amplitude in the sparse transform coefficient corresponding to the first audio signal, determining a second amplitude in the prior sparse transform coefficient corresponding to the second audio signal, taking the second amplitude as the prior of the amplitude in the sparse transform coefficient corresponding to the second audio signal, and according to the measurement data and the measurement matrix corresponding to the second audio signal, calculating the amplitude in the sparse transform coefficient corresponding to the second audio signal, wherein the ratio of the first amplitude to the second amplitude is the ratio of the logarithm of the distance from the microphone corresponding to the first audio signal to the sound source to the logarithm of the distance from the microphone corresponding to the second audio signal to the sound source.
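As a hedged sketch of the amplitude prior stated above (the function name and the distance values are illustrative, not from the patent), the second channel's prior amplitude follows directly from the stated ratio A1/A2 = log(d1)/log(d2):

```python
import math

# Hypothetical helper implementing the stated amplitude prior: the ratio
# of the first amplitude to the second equals the ratio of the logarithms
# of the two microphone-to-source distances, i.e. A1 / A2 = log(d1) / log(d2),
# so A2 = A1 * log(d2) / log(d1).
def prior_amplitude(a1, d1, d2):
    """Prior for the second channel's amplitude from the first channel's."""
    return a1 * math.log(d2) / math.log(d1)

# Illustrative distances: mic 1 at 2 m, mic 2 at 4 m from the source
a2 = prior_amplitude(1.0, d1=2.0, d2=4.0)   # log(4)/log(2) = 2.0
print(a2)
```

The prior amplitude then seeds the reconstruction of the second channel from its own measurement data and the measurement matrix.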
In the above technical solution, optionally, the first audio signal may correspond to a first channel, the second audio signal may correspond to a second channel, and the first audio signal and the second audio signal are audio signals acquired at the same time interval, and then the terminal calculates a sparse transform coefficient corresponding to the second audio signal according to a sparse transform coefficient corresponding to the first audio signal, measurement data corresponding to the second audio signal, and the measurement matrix, and specifically may be:
according to a first phase in the sparse transform coefficient corresponding to the first audio signal, determining a second phase in the prior sparse transform coefficient corresponding to the second audio signal, taking the second phase as the prior of the phase in the sparse transform coefficient corresponding to the second audio signal, and according to the measurement data and the measurement matrix corresponding to the second audio signal, calculating the phase in the sparse transform coefficient corresponding to the second audio signal, wherein the ratio of the first phase to the second phase is the ratio of the distance from the microphone corresponding to the first audio signal to the sound source to the distance from the microphone corresponding to the second audio signal to the sound source.
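Analogously, the phase prior stated above can be sketched as follows (an illustrative helper, not the patent's implementation); the stated ratio phase1/phase2 = d1/d2 gives the second channel's prior phase:

```python
# Hypothetical sketch of the stated phase prior: the ratio of the first
# phase to the second equals the ratio of the microphone-to-source
# distances, i.e. phase1 / phase2 = d1 / d2, so phase2 = phase1 * d2 / d1.
def prior_phase(phase1, d1, d2):
    """Prior for the second channel's phase from the first channel's."""
    return phase1 * d2 / d1

# Illustrative values: phase 0.5 rad on channel 1, mics at 1 m and 3 m
p2 = prior_phase(0.5, d1=1.0, d2=3.0)
print(p2)   # → 1.5
```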
In the above technical solution, optionally, the first audio signal may correspond to a first channel, the second audio signal may correspond to a second channel, and the first audio signal and the second audio signal are audio signals acquired at the same time interval, and then the terminal calculates a sparse transform coefficient corresponding to the second audio signal according to a sparse transform coefficient corresponding to the first audio signal, measurement data corresponding to the second audio signal, and the measurement matrix, and specifically may be:
and determining a first frequency in the sparse transform coefficient corresponding to the first audio signal as a second frequency in the prior sparse transform coefficient corresponding to the second audio signal, taking the second frequency as the prior of the frequency in the sparse transform coefficient corresponding to the second audio signal, and calculating the frequency in the sparse transform coefficient corresponding to the second audio signal according to the measurement data and the measurement matrix corresponding to the second audio signal.
In the above technical solution, optionally, the first audio signal and the second audio signal may correspond to the same channel, and the first audio signal and the second audio signal are audio signals acquired at different time intervals, and then the terminal calculates a sparse transform coefficient corresponding to the second audio signal according to a sparse transform coefficient corresponding to the first audio signal, measurement data corresponding to the second audio signal, and the measurement matrix, and specifically may be:
according to the first amplitude in the sparse transform coefficient corresponding to the first audio signal, determining a second amplitude in the prior sparse transform coefficient corresponding to the second audio signal, taking the second amplitude as the prior of the amplitude in the sparse transform coefficient corresponding to the second audio signal, and according to the measurement data and the measurement matrix corresponding to the second audio signal, calculating the amplitude in the sparse transform coefficient corresponding to the second audio signal, wherein the amplitudes in the sparse transform coefficients corresponding to the audio signals of different time periods in the same channel are in a linear relationship with the sequence numbers of the frames corresponding to the audio signals of those time periods.
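The linear amplitude-versus-frame-number relation stated above suggests a simple extrapolation. This is an illustrative sketch with made-up frame numbers and amplitudes, not the patent's formula:

```python
# Sketch of the stated temporal prior: for the same channel, the amplitude
# of the sparse transform coefficient varies linearly with the frame
# sequence number. Given amplitudes observed at two earlier frames,
# extrapolate a prior amplitude for the next frame.
def prior_amplitude_linear(frame_a, amp_a, frame_b, amp_b, frame_target):
    slope = (amp_b - amp_a) / (frame_b - frame_a)
    return amp_a + slope * (frame_target - frame_a)

# Illustrative values: amplitude 2.0 at frame 1, 2.5 at frame 2
amp3 = prior_amplitude_linear(1, 2.0, 2, 2.5, 3)   # extrapolate to frame 3
print(amp3)   # → 3.0
```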
In the above technical solution, optionally, the first audio signal and the second audio signal may correspond to the same channel, and the first audio signal and the second audio signal are audio signals acquired at different time intervals, and then the terminal calculates a sparse transform coefficient corresponding to the second audio signal according to a sparse transform coefficient corresponding to the first audio signal, measurement data corresponding to the second audio signal, and the measurement matrix, and specifically may be:
and determining a first phase in the sparse transform coefficient corresponding to the first audio signal as a second phase in the prior sparse transform coefficient corresponding to the second audio signal, taking the second phase as the prior of the phase in the sparse transform coefficient corresponding to the second audio signal, and calculating the phase in the sparse transform coefficient corresponding to the second audio signal according to the measurement data and the measurement matrix corresponding to the second audio signal.
In the above technical solution, optionally, the first audio signal and the second audio signal may correspond to the same channel, and the first audio signal and the second audio signal are audio signals acquired at different time intervals, and then the terminal calculates a sparse transform coefficient corresponding to the second audio signal according to a sparse transform coefficient corresponding to the first audio signal, measurement data corresponding to the second audio signal, and the measurement matrix, and specifically may be:
according to a first frequency in the sparse transform coefficient corresponding to the first audio signal, determining a second frequency in the prior sparse transform coefficient corresponding to the second audio signal, taking the second frequency as the prior of the frequency in the sparse transform coefficient corresponding to the second audio signal, and according to the measurement data and the measurement matrix corresponding to the second audio signal, calculating the frequency in the sparse transform coefficient corresponding to the second audio signal, wherein the first frequency and the second frequency have an intersection, and the frequencies in the intersection are obtained by randomly selecting the frequencies in the first frequency.
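The random selection of shared frequencies described above can be sketched with `random.sample`; the frequency values below are invented for illustration:

```python
import random

# Sketch of the stated frequency prior across time: the prior frequency
# set for the second signal intersects the first signal's frequencies,
# and the frequencies in the intersection are chosen at random from the
# first set (here with random.sample).
random.seed(42)
first_freqs = [220, 440, 660, 880, 1100]     # frame-1 frequencies (Hz)
shared = random.sample(first_freqs, k=3)     # prior support for frame 2

assert set(shared) <= set(first_freqs)       # intersection is a subset
print(sorted(shared))
```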
In the above technical solution, optionally, the first audio signal and the second audio signal are audio signals acquired in adjacent time periods.
In the foregoing technical solution, optionally, the at least two audio signals may include a third audio signal, and the terminal jointly reconstructs sparse transform coefficients corresponding to the at least two audio signals according to the measurement data and the measurement matrix corresponding to the at least two audio signals, which may specifically be:
and calculating the sparse transform coefficient corresponding to the third audio signal according to the preset initial sparse transform coefficient, the measurement data corresponding to the third audio signal and the measurement matrix.
A second aspect provides a computer storage medium storing a program which, when executed, performs all or part of the steps of the audio signal reconstruction method provided in the first aspect of the embodiments of the present application.
A third aspect provides an audio signal reconstruction apparatus that may comprise a compressed data acquisition module, an inverse quantization module, a joint reconstruction module, and a sparse inverse transform module, which may be used to implement some or all of the steps in conjunction with the first aspect.
A fourth aspect provides a terminal comprising a processor and a memory, the processor being operable to carry out some or all of the steps in connection with the first aspect.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1A is a schematic interface diagram of a microphone array provided in an embodiment of the invention;
fig. 1B is a schematic diagram of an interface for transferring parameter information according to an embodiment of the present invention;
fig. 1C is a schematic interface diagram of a reconstructed audio signal provided in an embodiment of the present invention;
FIG. 1D is a schematic diagram of an interface for sampling an audio signal provided in an embodiment of the invention;
fig. 1E is a schematic flowchart of reconstructing an audio signal according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of an audio signal reconstruction method provided in an embodiment of the present invention;
fig. 3 is a schematic flow chart of an audio signal reconstruction method according to another embodiment of the present invention;
fig. 4 is a schematic flow chart of an audio signal compression method provided in an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an audio signal reconstruction apparatus provided in an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a terminal provided in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly described below with reference to the drawings in the embodiments of the present invention.
The embodiment of the invention provides an audio signal reconstruction method, wherein a terminal can acquire compressed data corresponding to at least two audio signals, perform inverse quantization on the compressed data corresponding to the at least two audio signals to obtain measured data corresponding to the at least two audio signals, acquire a measurement matrix, jointly reconstruct sparse transform coefficients corresponding to the at least two audio signals according to the measured data corresponding to the at least two audio signals and the measurement matrix, and perform sparse inverse transform on the sparse transform coefficients corresponding to the at least two audio signals to obtain the at least two audio signals, so that the quality of the audio signals can be improved.
In specific implementation, the terminal can collect audio signals from a sound source in parallel through an array composed of multiple microphones; the audio signal collected by any one microphone serves as the audio signal of one channel, and the audio signals of different channels are collected by different microphones. Taking the schematic interface diagram of the microphone array shown in fig. 1A as an example, the microphones (e.g., Mic_1, Mic_2, Mic_3 … Mic_N) may be arranged on the right side of the sound source, the array composed of multiple microphones may be linear, and the spacing between adjacent microphones may be the same. The audio signal collected by a microphone may be expressed as x(t) = c·cos(2πft + θ). The audio signals collected by the different microphones have spatial correlation, and the audio signal collected by any one microphone has correlations in time and frequency. The terminal may therefore sample the audio signal of each channel, perform sparse transform on it (by FFT or MDCT, etc.), perform perceptual measurement (multiply the sparse transform coefficients obtained by the sparse transform by a random measurement matrix), and add to the resulting perceptual measurement values a noise vector corresponding to the acquisition environment of the audio signal, thereby obtaining the compressed data of each channel.
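The compression chain just described (sample, sparse transform, perceptual measurement, add noise) can be sketched end to end. The sketch below uses a naive DFT as a stand-in for FFT/MDCT and assumed sizes, so it is illustrative only:

```python
import cmath
import random

# Illustrative compression side: sample one channel, apply a sparse
# transform (a naive DFT standing in for FFT/MDCT), multiply the
# coefficients by a random measurement matrix (perceptual measurement),
# then add a small noise vector modeling the acquisition environment.
random.seed(1)
N, M = 8, 4

# Sampled frame of x(t) = cos(2*pi*f*t) with f = 1 cycle per frame
x = [cmath.cos(2 * cmath.pi * 1 * n / N).real for n in range(N)]

# Sparse transform: naive DFT of the sampled frame
theta = [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
         for k in range(N)]

# Perceptual measurement: y = Phi @ theta, plus an additive noise vector
Phi = [[random.gauss(0, 1) for _ in range(N)] for _ in range(M)]
y = [sum(Phi[i][j] * theta[j] for j in range(N)) + random.gauss(0, 0.01)
     for i in range(M)]

print(len(theta), len(y))   # → 8 4 (N coefficients, M noisy measurements)
```

Quantizing `y` would then yield the compressed data for this channel.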
When the audio signals need to be reconstructed, the terminal can obtain the compressed data corresponding to at least two audio signals and perform inverse quantization on it to obtain the measurement data corresponding to the at least two audio signals. The terminal can then obtain a measurement matrix, jointly reconstruct the sparse transform coefficients corresponding to the at least two audio signals according to the measurement data and the measurement matrix, and perform sparse inverse transform on those coefficients to obtain the at least two audio signals.
It should be noted that, in the embodiment of the present invention, the terminal that performs compressed sampling of the audio signal and the terminal that reconstructs the audio signal may be the same terminal. For example, after the terminal samples the audio signal of each channel, it processes each channel's signal by sparse transform, perceptual measurement, and the like to obtain compressed data; when the terminal needs to reconstruct the audio signals, it may inversely quantize the compressed data corresponding to at least two audio signals to obtain the corresponding measurement data, jointly reconstruct the sparse transform coefficients corresponding to the at least two audio signals according to the measurement data and the measurement matrix, and perform sparse inverse transform on those coefficients to obtain the at least two audio signals. Optionally, in the embodiment of the present invention, the terminal that performs compressed sampling and the terminal that reconstructs the audio signal may be different terminals. For example, after a first terminal samples the audio signal of each channel and processes it by sparse transform, perceptual measurement, and the like to obtain compressed data, the first terminal may send the compressed data to a second terminal; when the second terminal needs to reconstruct an audio signal, it may predict the parameter information of the channel in the given frame and reconstruct the audio signal of that channel according to the compressed sampled data of the channel in that frame and its parameter information.
Referring to fig. 2, fig. 2 is a schematic flow chart of an audio signal reconstruction method according to an embodiment of the present invention, where the audio signal reconstruction method according to the embodiment of the present invention at least includes:
s201, compressed data corresponding to at least two audio signals are obtained.
The terminal may obtain compressed data corresponding to at least two audio signals. In a specific implementation, the terminal may obtain compressed data corresponding to at least two audio signals stored in a memory of the terminal, for example, after the terminal acquires an audio signal, the terminal performs compression sampling on the acquired audio signal to obtain compressed data, and stores the compressed data in the memory; for another example, after the other terminals acquire the audio signals, the other terminals perform compression sampling on the acquired audio signals to obtain compressed data, and then the other terminals send the compressed data to the terminal, and the terminal stores the compressed data in the memory.
S202, performing inverse quantization on the compressed data corresponding to the at least two audio signals to obtain the measurement data corresponding to the at least two audio signals.
S203, obtaining a measurement matrix, and jointly reconstructing sparse transform coefficients corresponding to the at least two audio signals according to the measurement data and the measurement matrix corresponding to the at least two audio signals.
The terminal can obtain the measurement matrix and jointly reconstruct sparse transform coefficients corresponding to the at least two audio signals according to the measurement data and the measurement matrix corresponding to the at least two audio signals. Wherein, the measurement matrix may be a random measurement matrix.
Optionally, the at least two audio signals may include a first audio signal and a second audio signal, and the terminal may calculate a sparse transform coefficient corresponding to the second audio signal according to a sparse transform coefficient corresponding to the first audio signal, measurement data corresponding to the second audio signal, and the measurement matrix. In a specific implementation, the terminal may use the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal, and the measurement matrix as inputs of a preset reconstruction algorithm (e.g., AMP or GAMP), and the obtained output of the preset reconstruction algorithm may be the sparse transform coefficient corresponding to the second audio signal.
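The patent names AMP or GAMP as the preset reconstruction algorithm; implementing those is involved, so the sketch below substitutes iterative soft thresholding (IST), a much simpler sparse reconstructor with the same interface (measurement data, measurement matrix, and prior coefficients as the starting point). Whether it recovers the true coefficients depends on the matrix and parameters, so no recovery is claimed:

```python
import random

# IST as a hedged stand-in for AMP/GAMP: iterate a gradient step on the
# measurement residual followed by soft thresholding, warm-started from
# the prior coefficients (e.g., those of the first audio signal).
def soft(v, lam):
    """Soft-thresholding operator: shrink |v| by lam, keep the sign."""
    return max(abs(v) - lam, 0.0) * (1.0 if v >= 0 else -1.0)

def ist(y, Phi, theta0, lam=0.05, step=0.1, iters=200):
    M, N = len(Phi), len(Phi[0])
    theta = list(theta0)                  # prior as the starting point
    for _ in range(iters):
        # residual r = y - Phi @ theta, gradient g = Phi^T @ r
        r = [y[i] - sum(Phi[i][j] * theta[j] for j in range(N))
             for i in range(M)]
        g = [sum(Phi[i][j] * r[i] for i in range(M)) for j in range(N)]
        theta = [soft(theta[j] + step * g[j], lam) for j in range(N)]
    return theta

random.seed(3)
N, M = 6, 4
true_theta = [0.0, 2.0, 0.0, 0.0, -1.0, 0.0]       # sparse coefficients
Phi = [[random.gauss(0, 1) / M ** 0.5 for _ in range(N)] for _ in range(M)]
y = [sum(Phi[i][j] * true_theta[j] for j in range(N)) for i in range(M)]

est = ist(y, Phi, theta0=[0.0] * N)
print([round(t, 1) for t in est])
```

Passing the first signal's coefficients as `theta0`, as in the text above, biases the search toward the correlated solution instead of an arbitrary one.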
Optionally, the audio signal has spatial correlation, and when the first audio signal corresponds to the first channel, the second audio signal corresponds to the second channel, and the first audio signal and the second audio signal are audio signals acquired at the same time interval, the terminal may determine a second amplitude in the prior sparse transform coefficient corresponding to the second audio signal according to a first amplitude in the sparse transform coefficient corresponding to the first audio signal, use the second amplitude as a prior of an amplitude in the sparse transform coefficient corresponding to the second audio signal, and calculate an amplitude in the sparse transform coefficient corresponding to the second audio signal according to measurement data and a measurement matrix corresponding to the second audio signal. Wherein the ratio of the first amplitude to the second amplitude is the ratio of the logarithm of the distance of the microphone corresponding to the first audio signal from the sound source to the logarithm of the distance of the microphone corresponding to the second audio signal from the sound source.
Optionally, the audio signal has spatial correlation, when the first audio signal corresponds to the first channel, the second audio signal corresponds to the second channel, and the first audio signal and the second audio signal are audio signals acquired at the same time period, the terminal may determine a second phase in a prior sparse transform coefficient corresponding to the second audio signal according to a first phase in a sparse transform coefficient corresponding to the first audio signal, use the second phase as a prior of a phase in a sparse transform coefficient corresponding to the second audio signal, and calculate a phase in a sparse transform coefficient corresponding to the second audio signal according to measurement data and a measurement matrix corresponding to the second audio signal. The ratio of the first phase to the second phase is the ratio of the distance from the microphone corresponding to the first audio signal to the sound source to the distance from the microphone corresponding to the second audio signal to the sound source.
Optionally, the audio signal has spatial correlation, when the first audio signal corresponds to the first channel, the second audio signal corresponds to the second channel, and the first audio signal and the second audio signal are audio signals acquired at the same time period, the terminal may determine a first frequency in a sparse transform coefficient corresponding to the first audio signal as a second frequency in a prior sparse transform coefficient corresponding to the second audio signal, use the second frequency as a prior of frequencies in a sparse transform coefficient corresponding to the second audio signal, and calculate a frequency in a sparse transform coefficient corresponding to the second audio signal according to measurement data and a measurement matrix corresponding to the second audio signal.
Optionally, the audio signals have a correlation in the time domain. When the first audio signal and the second audio signal correspond to the same channel and are acquired at different time intervals, the terminal may determine, according to a first amplitude in the sparse transform coefficient corresponding to the first audio signal, a second amplitude in the prior sparse transform coefficient corresponding to the second audio signal, use the second amplitude as a prior of the amplitude in the sparse transform coefficient corresponding to the second audio signal, and calculate, according to the measurement data and the measurement matrix corresponding to the second audio signal, the amplitude in the sparse transform coefficient corresponding to the second audio signal. The amplitudes of the sparse transform coefficients corresponding to the audio signals of different time periods in the same channel are in a linear relationship with the sequence numbers of the frames corresponding to those audio signals.
Optionally, the audio signal has a correlation in a time domain, when the first audio signal and the second audio signal correspond to the same channel and are acquired at different time intervals, the terminal may determine a first phase in a sparse transform coefficient corresponding to the first audio signal as a second phase in a prior sparse transform coefficient corresponding to the second audio signal, use the second phase as a prior of a phase in a sparse transform coefficient corresponding to the second audio signal, and calculate a phase in a sparse transform coefficient corresponding to the second audio signal according to measurement data and a measurement matrix corresponding to the second audio signal.
Optionally, the audio signals have a correlation in a time domain, when the first audio signal and the second audio signal correspond to the same channel and are acquired at different time intervals, the terminal may determine, according to a first frequency in a sparse transform coefficient corresponding to the first audio signal, a second frequency in a prior sparse transform coefficient corresponding to the second audio signal, use the second frequency as a prior of a frequency in a sparse transform coefficient corresponding to the second audio signal, and calculate, according to measurement data and a measurement matrix corresponding to the second audio signal, a frequency in a sparse transform coefficient corresponding to the second audio signal. And the first frequency and the second frequency have an intersection, and the frequencies in the intersection are obtained by randomly selecting the frequencies in the first frequency.
Alternatively, when the first audio signal and the second audio signal correspond to the same channel, the first audio signal and the second audio signal may be audio signals acquired in adjacent time periods. For example, the first audio signal may be an audio signal acquired during a first period of time, the second audio signal may be an audio signal acquired during a second period of time, and the first period of time may be earlier than the second period of time.
Optionally, the at least two audio signals may include a third audio signal, and the terminal may calculate a sparse transform coefficient corresponding to the third audio signal according to a preset initial sparse transform coefficient, measurement data corresponding to the third audio signal, and the measurement matrix.
Taking the schematic flow chart of reconstructing an audio signal shown in fig. 1E as an example, the terminal may obtain compressed sample data of different channels located in different frames and perform inverse quantization on it to obtain measurement data corresponding to the audio signals of the different channels located in the different frames. When the specified channel is the first channel (for example, i = 1) and the specified frame is the start frame, the terminal may initialize to obtain an initial sparse transform coefficient; when the transmission direction of the sparse transform coefficient is the forward transmission direction (for example, FB = 1), the terminal may calculate the sparse transform coefficient corresponding to the audio signal of the first channel located in the start frame according to the initialized sparse transform coefficient, the measurement data corresponding to the audio signal of the first channel located in the start frame, and the measurement matrix.
The terminal may further determine the prior sparse transform coefficient corresponding to the audio signal of the second channel (e.g., i++) located in the start frame according to the sparse transform coefficient corresponding to the audio signal of the first channel located in the start frame, use that prior as the prior of the sparse transform coefficient corresponding to the audio signal of the second channel located in the start frame, and calculate the sparse transform coefficient corresponding to the audio signal of the second channel located in the start frame according to the measurement data and the measurement matrix corresponding to the audio signal of the second channel located in the start frame, until the sparse transform coefficient corresponding to the audio signal of the last channel (e.g., i = k) located in the start frame is obtained.
After the terminal obtains the sparse transform coefficient corresponding to the audio signal of the last channel located in the start frame, the terminal may update the transmission direction of the sparse transform coefficient to the reverse transmission direction (e.g., FB ≠ 1), further determine the prior sparse transform coefficient corresponding to the audio signal of the second-to-last channel (e.g., i--) located in the start frame according to the sparse transform coefficient corresponding to the audio signal of the last channel located in the start frame, use that prior as the prior of the sparse transform coefficient corresponding to the audio signal of the second-to-last channel located in the start frame, and calculate the sparse transform coefficient corresponding to the audio signal of the second-to-last channel located in the start frame according to the corresponding measurement data and the measurement matrix, until the sparse transform coefficient corresponding to the audio signal of the first channel located in the start frame is obtained.
After the terminal obtains the sparse transform coefficient corresponding to the audio signal of the first channel located in the start frame, the terminal may further determine the prior sparse transform coefficient corresponding to the audio signal of the first channel located in the second frame according to the sparse transform coefficient corresponding to the audio signal of the first channel located in the start frame, use that prior as the prior of the sparse transform coefficient corresponding to the audio signal of the first channel located in the second frame, and calculate the sparse transform coefficient corresponding to the audio signal of the first channel located in the second frame according to the measurement data and the measurement matrix corresponding to the audio signal of the first channel located in the second frame.
After the terminal calculates the sparse transform coefficient corresponding to the audio signal of the first channel located in the second frame, it may further determine the prior sparse transform coefficient corresponding to the audio signal of the second channel (e.g., i++) located in the second frame according to the sparse transform coefficient corresponding to the audio signal of the first channel located in the second frame, use that prior as the prior of the sparse transform coefficient corresponding to the audio signal of the second channel located in the second frame, and calculate the sparse transform coefficient corresponding to the audio signal of the second channel located in the second frame according to the measurement data and the measurement matrix corresponding to the audio signal of the second channel located in the second frame, until the sparse transform coefficient corresponding to the audio signal of the last channel (e.g., i = k) located in the second frame is obtained.
After the terminal obtains the sparse transform coefficient corresponding to the audio signal of the last channel located in the second frame, the terminal may update the transmission direction of the sparse transform coefficient to the reverse transmission direction (e.g., FB ≠ 1), further determine the prior sparse transform coefficient corresponding to the audio signal of the second-to-last channel (e.g., i--) located in the second frame according to the sparse transform coefficient corresponding to the audio signal of the last channel located in the second frame, use that prior as the prior of the sparse transform coefficient corresponding to the audio signal of the second-to-last channel located in the second frame, and calculate the sparse transform coefficient corresponding to the audio signal of the second-to-last channel located in the second frame according to the corresponding measurement data and the measurement matrix, until the sparse transform coefficient corresponding to the audio signal of the first channel located in the second frame is obtained.
After the terminal obtains the sparse transform coefficient corresponding to the audio signal of the first channel located in the second frame, the terminal may further determine the prior sparse transform coefficient corresponding to the audio signal of the first channel located in the third frame according to the sparse transform coefficient corresponding to the audio signal of the first channel located in the second frame, use that prior as the prior of the sparse transform coefficient corresponding to the audio signal of the first channel located in the third frame, and calculate the sparse transform coefficient corresponding to the audio signal of the first channel located in the third frame according to the measurement data and the measurement matrix corresponding to the audio signal of the first channel located in the third frame.
Similarly, the terminal may obtain the sparse transform coefficient corresponding to the audio signal of the first channel located in the last frame through calculation in the manner described above. Assuming that the last frame is the t-th frame, the terminal may then determine the prior sparse transform coefficient corresponding to the audio signal of the first channel located in the (t-1)-th frame according to the sparse transform coefficient corresponding to the audio signal of the first channel located in the t-th frame, use that prior as the prior of the sparse transform coefficient corresponding to the audio signal of the first channel located in the (t-1)-th frame, and calculate the sparse transform coefficient corresponding to the audio signal of the first channel located in the (t-1)-th frame according to the corresponding measurement data and the measurement matrix. Further, the terminal may determine the prior sparse transform coefficient corresponding to the audio signal of the first channel located in the (t-2)-th frame according to the sparse transform coefficient corresponding to the audio signal of the first channel located in the (t-1)-th frame, use that prior as the prior of the sparse transform coefficient corresponding to the audio signal of the first channel located in the (t-2)-th frame, and calculate the sparse transform coefficient corresponding to the audio signal of the first channel located in the (t-2)-th frame according to the corresponding measurement data and the measurement matrix, until the sparse transform coefficient corresponding to the audio signal of the first channel located in the start frame is obtained.
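The zig-zag schedule traced through fig. 1E above (forward across channels, backward across channels, then on to the next frame) can be sketched as a loop skeleton; `update` is a hypothetical stand-in for the Bayesian reconstruction step that maps the prior coefficient, the frame, and the channel to a new sparse transform coefficient.

```python
def reconstruct_all(num_frames, num_channels, update):
    """Zig-zag message schedule sketched from the flow above: within each frame,
    sweep the channels forward (i = 1..k, FB = 1) and then backward (i = k..1,
    FB != 1); the last-updated coefficient seeds the prior for the next step.
    `update(frame, channel, prior)` is a hypothetical stand-in for the Bayesian
    reconstruction step and returns the new coefficient."""
    coeff = None  # preset initial sparse transform coefficient
    schedule = []
    for frame in range(1, num_frames + 1):
        for ch in range(1, num_channels + 1):      # forward pass, FB = 1
            coeff = update(frame, ch, coeff)
            schedule.append((frame, ch, "fwd"))
        for ch in range(num_channels, 0, -1):      # backward pass, FB != 1
            coeff = update(frame, ch, coeff)
            schedule.append((frame, ch, "bwd"))
    return schedule

# 2 frames x 3 channels: each frame is swept forward then backward.
order = reconstruct_all(2, 3, lambda f, c, prior: (f, c))
print(order[:4])  # -> [(1, 1, 'fwd'), (1, 2, 'fwd'), (1, 3, 'fwd'), (1, 3, 'bwd')]
```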
And S204, carrying out sparse inverse transformation on sparse transformation coefficients corresponding to the at least two audio signals to obtain the at least two audio signals.
The terminal may perform sparse inverse transform on the sparse transform coefficients corresponding to the at least two audio signals to obtain the at least two audio signals. Illustratively, the terminal may perform inverse time-frequency transform on the sparse transform coefficients corresponding to the at least two audio signals through IMDCT or IFFT to obtain the at least two audio signals.
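A minimal sketch of the round trip through the sparsifying transform, using numpy's FFT/IFFT pair as the time-frequency transform mentioned above (an MDCT/IMDCT pair would work analogously); the test signal is an illustrative assumption.

```python
import numpy as np

n_samples = 100
t = np.arange(n_samples) / n_samples

# Forward sparse transform of a two-tone test signal (FFT as the sparsifying transform).
x = np.cos(2 * np.pi * 5 * t) + 0.5 * np.cos(2 * np.pi * 7 * t)
coeffs = np.fft.rfft(x)

# Inverse sparse transform: the IFFT of the (here: exact) coefficients recovers the signal.
x_rec = np.fft.irfft(coeffs, n=n_samples)
print(np.max(np.abs(x - x_rec)) < 1e-10)  # True: lossless round trip
```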
In the audio signal reconstruction method shown in fig. 2, the terminal may obtain compressed data corresponding to at least two audio signals and perform inverse quantization on the compressed data to obtain measurement data corresponding to the at least two audio signals. The terminal may then obtain a measurement matrix, jointly reconstruct the sparse transform coefficients corresponding to the at least two audio signals according to the measurement data and the measurement matrix, and perform sparse inverse transform on those sparse transform coefficients to obtain the at least two audio signals, which can improve the quality of the audio signals.
Referring to fig. 3, fig. 3 is a schematic flowchart of an audio signal reconstruction method according to an embodiment of the present invention, where the audio signal reconstruction method according to the embodiment of the present invention at least includes:
S301, the first terminal samples the collected audio signals of each channel, where the sampled audio signals of the channels are collected in the same time period.
The first terminal may collect the audio signal through each microphone, wherein the frequencies of the audio signals collected by the microphones are the same, that is, the support sets of the audio signals collected by the microphones are the same. A support set may include at least one support indicating whether the frequency of the audio signal matches a designated frequency: the i-th bit is "1" if the audio signal contains a component at i HZ and "0" otherwise. For example, if the frequencies of the audio signal collected by each microphone are 5HZ, 7HZ, and 10HZ, the support set of the audio signal is 0000101001: the bits in positions 1 to 4 are "0" because the audio signal contains no components at 1HZ through 4HZ, the bit in position 5 is "1" because the audio signal contains a 5HZ component, the "1" bits in positions 7 and 10 likewise indicate the 7HZ and 10HZ components, and the "0" bits in positions 6, 8, and 9 indicate that those frequencies are absent.
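A small sketch of the support-set encoding described above; `support_set` is a hypothetical helper name, and the 1..10 HZ grid matches the example in the text.

```python
import numpy as np

def support_set(present_hz, max_hz):
    """Build a binary support vector: bit i-1 is 1 iff a component at i HZ is
    present. Hypothetical helper illustrating the support-set encoding above."""
    s = np.zeros(max_hz, dtype=int)
    for f in present_hz:
        s[f - 1] = 1
    return s

# Frequencies 5, 7 and 10 HZ over a 1..10 HZ grid give the support set from the text.
bits = support_set([5, 7, 10], 10)
print("".join(map(str, bits)))  # -> 0000101001
```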
Wherein, the collected audio signals have correlation in frequency domain. Illustratively, the audio signals of the respective channels may be represented as follows:
x(t) = c·cos(2πft + θ)
where t denotes a frame of the audio signal of the channel, x(t) denotes the audio signal of the channel located in the t-th frame, c denotes the amplitude of the audio signal of the channel located in the t-th frame, f denotes the frequency of the audio signal of the channel located in the t-th frame, and θ denotes the phase of the audio signal of the channel located in the t-th frame. The first terminal may determine that the audio signal of any channel is composed of a limited number of frequency components, and the frequencies of the audio signals of the same channel collected in different time periods are sparse.
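A hedged sketch of this multi-tone model: a signal built from a limited number of (c, f, θ) components has a sparse spectrum. The sample rate, component amplitudes, and phases are illustrative assumptions.

```python
import numpy as np

fs = 100                 # assumed sample rate (HZ); not specified in the text
t = np.arange(fs) / fs   # one second of samples

# A channel signal composed of a limited number of frequency components,
# each with its own amplitude c, frequency f and phase theta.
components = [(1.0, 5, 0.0), (0.5, 7, np.pi / 4), (0.25, 10, np.pi / 2)]
x = sum(c * np.cos(2 * np.pi * f * t + theta) for c, f, theta in components)

# The spectrum is sparse: energy concentrates at the few component frequencies.
spectrum = np.abs(np.fft.rfft(x)) / len(t)
peaks = np.flatnonzero(spectrum > 0.05)
print(peaks)  # -> [ 5  7 10]
```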
Wherein, the acquired audio signals have spatial correlation. Illustratively, the amplitude of the sparse transform coefficient corresponding to the audio signal of each channel acquired in the same time period is logarithmically attenuated with the acquisition distance of the audio signal of the channel, the phase of the sparse transform coefficient corresponding to the audio signal of each channel acquired in the same time period is linearly changed with the acquisition distance of the audio signal of the channel, and the support of the sparse transform coefficient corresponding to the audio signal of each channel acquired in the same time period is the same.
Wherein, the collected audio signals have time domain correlation. Illustratively, the amplitudes of the audio signals of the same channel acquired in different time periods are linearly related, the phases of the audio signals of the same channel acquired in different time periods are the same, and the supports of the audio signals of the same channel acquired in different time periods follow a Bernoulli distribution.
The first terminal may use the audio signal collected by any microphone as an audio signal of one channel, and sample the collected audio signals of the channels to obtain audio signals of the channels collected at the same time period, for example, the first terminal samples to obtain an audio signal of each channel located in a starting frame, or samples to obtain an audio signal of each channel located in a second frame, or samples to obtain an audio signal of each channel located in a last frame, and so on.
S302, the first terminal conducts sparse transformation on the audio signals of the channels obtained through sampling to obtain sparse transformation coefficients.
The first terminal may perform sparse transform on the sampled audio signals of each channel to obtain a sparse transform coefficient, that is, perform time-frequency transform on the sampled audio signals of each channel acquired in the same time period, for example, the first terminal may perform sparse transform on the sampled audio signals of each channel acquired in the same time period through an FFT algorithm or an MDCT algorithm to obtain a sparse transform coefficient.
And S303, multiplying the preset measurement matrix by the sparse transformation coefficient by the first terminal to obtain a perception measurement value.
After the first terminal obtains the sparse transform coefficients of the audio signals of the channels acquired in the same time period, the first terminal may multiply the preset measurement matrix by the sparse transform coefficients to obtain a perceptual measurement value. The number of rows of the preset measurement matrix is less than its number of columns, so that sub-Nyquist sampling can be realized and the audio signal can be recovered without distortion in the process of reconstructing the audio signal.
And S304, the first terminal adds the perceptual measurement value and the noise vector to obtain compressed data.
After the first terminal obtains the perceptual measurement value, it may quantize the perceptual measurement value to obtain compressed sample data. For example, the first terminal may obtain the noise vectors corresponding to the acquisition environments of the audio signals of the channels, and add the perceptual measurement value to the noise vectors to obtain the compressed data.
For example, the first terminal processes the audio signals of the channels in the same frame to obtain compressed data, which may be represented by the following formula:

Ym×N = Am×n · X̂n×N + W

wherein Ym×N represents the compressed sample data of the N channels (m measurements per channel), Am×n represents the preset measurement matrix, X̂n×N represents the sparse transform coefficients, and W represents the noise vector corresponding to the acquisition environment of the audio signals of the channels; Xn×N represents the audio signals of the N channels, and m < n.
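A numpy sketch of the measurement equation above; the Gaussian measurement matrix, the sparsity pattern, and the noise level are illustrative assumptions, with m < n as required for sub-Nyquist sampling.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 64, 16, 4   # n coefficient bins, m measurements (m < n), N channels

A = rng.standard_normal((m, n)) / np.sqrt(m)   # preset measurement matrix A (m x n)

# Sparse transform coefficients X_hat (n x N): a few nonzero rows per channel.
X_hat = np.zeros((n, N))
X_hat[[5, 7, 10], :] = rng.standard_normal((3, N))

W = 0.01 * rng.standard_normal((m, N))          # noise of the acquisition environment
Y = A @ X_hat + W                               # compressed sample data (m x N)
print(Y.shape)  # -> (16, 4): fewer rows than coefficients, i.e. sub-Nyquist sampling
```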
In specific implementation, the first terminal may sample to obtain audio signals of each channel located in the start frame, perform sparse transformation on the sampled audio signals of each channel located in the start frame to obtain a sparse transformation coefficient, multiply a preset measurement matrix with the sparse transformation coefficient to obtain a perceptual measurement value, and add the perceptual measurement value with a noise vector to obtain compressed data of each channel located in the start frame. Further, the first terminal may sample audio signals of each channel in the second frame, perform sparse transform on the sampled audio signals of each channel in the second frame to obtain a sparse transform coefficient, multiply a preset measurement matrix with the sparse transform coefficient to obtain a perceptual measurement value, and add the perceptual measurement value to the noise vector to obtain compressed data of each channel in the second frame. Further, the first terminal may sample audio signals of each channel in the third frame, perform sparse transform on the sampled audio signals of each channel in the third frame to obtain a sparse transform coefficient, multiply a preset measurement matrix with the sparse transform coefficient to obtain a perception measurement value, add the perception measurement value to a noise vector to obtain compressed data of each channel in the third frame, until compressed data of each channel in the last frame is obtained.
S305, the first terminal sends the compressed data to the second terminal.
After the first terminal acquires the compressed data of each channel, the first terminal may send the compressed sample data of each channel to the second terminal. Optionally, the first terminal may send the compressed sample data of each channel located in different frames to the second terminal after acquiring that data.
S306, the second terminal determines a prior sparse transform coefficient corresponding to the audio signal of the next channel of the channel in the appointed frame according to the sparse transform coefficient corresponding to the audio signal of the channel in the appointed frame.
And S307, the second terminal obtains the sparse transform coefficient corresponding to the audio signal of the next channel of the channel in the designated frame according to the prior sparse transform coefficient corresponding to the audio signal of the next channel of the channel in the designated frame, the measurement data corresponding to the audio signal of the next channel of the channel in the designated frame and the measurement matrix.
Optionally, the second terminal may use the prior sparse transform coefficient corresponding to the audio signal of the channel next to the designated frame, the measurement data corresponding to the audio signal of the channel next to the designated frame, and the measurement matrix as inputs of the bayesian algorithm, so as to obtain the sparse transform coefficient corresponding to the audio signal of the channel next to the designated frame.
Illustratively, the Bayesian algorithm can be represented as follows:

p(x, θ, c, s | y) ∝ ∏_{t=1}^{T} { ∏_{m=1}^{M} p(y_m^(t) | x^(t)) · ∏_{n=1}^{N} [ p(x_n^(t) | θ_n^(t), c_n^(t), s_n) · p(θ_n^(t) | θ_n^(t-1)) · p(c_n^(t) | c_n^(t-1)) ] } · ∏_{n=1}^{N} p(s_n)

wherein p(x, θ, c, s | y) represents the probability of obtaining x, θ, c, and s based on y; x represents the sparse transform coefficient corresponding to the audio signal, θ represents the phase in the sparse transform coefficient corresponding to the audio signal, c represents the amplitude in the sparse transform coefficient corresponding to the audio signal, s represents the frequency in the sparse transform coefficient corresponding to the audio signal, and y represents the measurement data corresponding to the audio signal; T represents the frame length of the audio signal of each channel, M represents the number of rows of the measurement matrix, and N represents the number of columns of the measurement matrix. p(y_m^(t) | x^(t)) represents the probability of obtaining y_m^(t) based on x^(t), where y_m^(t) represents the m-th measurement data corresponding to the audio signal of the t-th frame and x^(t) represents the sparse transform coefficient corresponding to the audio signal of the t-th frame. p(x_n^(t) | θ_n^(t), c_n^(t), s_n) represents the probability of obtaining x_n^(t) based on θ_n^(t), c_n^(t), and s_n, where θ_n^(t) represents the phase of the n-th channel in the sparse transform coefficient corresponding to the audio signal of the t-th frame, c_n^(t) represents the amplitude of the n-th channel in the sparse transform coefficient corresponding to the audio signal of the t-th frame, and s_n represents the frequencies in the sparse transform coefficient corresponding to the audio signal of the n-th channel. p(θ_n^(t) | θ_n^(t-1)) represents the probability of obtaining θ_n^(t) based on θ_n^(t-1), where θ_n^(t-1) represents the phase of the n-th channel in the sparse transform coefficient corresponding to the audio signal of the (t-1)-th frame. p(c_n^(t) | c_n^(t-1)) represents the probability of obtaining c_n^(t) based on c_n^(t-1), where c_n^(t-1) represents the amplitude of the n-th channel in the sparse transform coefficient corresponding to the audio signal of the (t-1)-th frame. p(s_n) represents the probability of a frequency in the sparse transform coefficient corresponding to the audio signal of the n-th channel.
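A heavily simplified sketch in the spirit of this factorized posterior: it combines a Gaussian likelihood p(y | x) with a Gaussian frame-to-frame prior p(x^(t) | x^(t-1)). The Gaussian choices and all parameters are assumptions for illustration; the embodiment factorizes further over phase, amplitude, and frequency.

```python
import numpy as np

def log_posterior(y, A, x, x_prev, sigma_y=0.1, sigma_x=1.0):
    """Unnormalized log of a factorized posterior in the spirit of the formula
    above: Gaussian likelihood p(y | x) plus a Gaussian frame-to-frame prior
    p(x^(t) | x^(t-1)). The Gaussian forms are illustrative assumptions."""
    residual = y - A @ x
    log_lik = -0.5 * np.sum(residual ** 2) / sigma_y ** 2
    log_prior = -0.5 * np.sum((x - x_prev) ** 2) / sigma_x ** 2
    return log_lik + log_prior

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 32)) / np.sqrt(8)
x_true = np.zeros(32)
x_true[[3, 9]] = 1.0
y = A @ x_true

# Coefficients that match both the measurements and the previous frame score higher.
good = log_posterior(y, A, x_true, x_true)
bad = log_posterior(y, A, np.zeros(32), x_true)
print(good > bad)  # -> True
```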
The message transmission modes may include a forward transmission mode and a reverse transmission mode. In forward transmission, the prior sparse transform coefficient corresponding to the audio signal of the next channel located in the designated frame is determined according to the sparse transform coefficient corresponding to the audio signal of the current channel located in the designated frame, and the sparse transform coefficient corresponding to the audio signal of the next channel located in the designated frame is obtained according to that prior sparse transform coefficient, the measurement data corresponding to the audio signal of the next channel located in the designated frame, and the measurement matrix. In reverse transmission, the prior sparse transform coefficient corresponding to the audio signal of the previous channel located in the designated frame is determined according to the sparse transform coefficient corresponding to the audio signal of the current channel located in the designated frame, and the sparse transform coefficient corresponding to the audio signal of the previous channel located in the designated frame is obtained according to that prior sparse transform coefficient, the corresponding measurement data, and the measurement matrix. Taking the interface schematic diagram for transferring parameter information shown in fig. 1B as an example:
y_1^(t) represents the measurement data corresponding to the audio signal of the first channel located in the t-th frame, and y_m^(t) represents the measurement data corresponding to the audio signal of the m-th channel located in the t-th frame; x_n^(t) represents the sparse transform coefficient corresponding to the audio signal of the n-th channel located in the t-th frame, and x_k^(t) represents the sparse transform coefficient corresponding to the audio signal of the k-th channel located in the t-th frame; s_n^(t) and s_k^(t) represent the frequencies of the n-th and k-th channels in the sparse transform coefficients corresponding to the audio signal of the t-th frame; θ_n^(t) and θ_k^(t) represent the phases of the n-th and k-th channels in the sparse transform coefficients corresponding to the audio signal of the t-th frame; c_n^(t) and c_k^(t) represent the amplitudes of the n-th and k-th channels in the sparse transform coefficients corresponding to the audio signal of the t-th frame.
For example, after the second terminal obtains x_n^(t), it may obtain from x_n^(t) the frequency s_n^(t), the phase θ_n^(t), and the amplitude c_n^(t) of the n-th channel in the sparse transform coefficient corresponding to the audio signal of the t-th frame, and then, according to an audio signal time-domain near-field algorithm, determine the prior sparse transform coefficient corresponding to the audio signal of the n-th channel located in the (t+1)-th frame based on that frequency, phase, and amplitude, and obtain the sparse transform coefficient corresponding to the audio signal of the n-th channel located in the (t+1)-th frame, for example its phase θ_n^(t+1) and the like, according to that prior sparse transform coefficient, the measurement data corresponding to the audio signal of the n-th channel located in the (t+1)-th frame, and the measurement matrix. For another example, the second terminal may determine the prior sparse transform coefficient corresponding to the audio signal of the n-th channel located in the t-th frame according to the sparse transform coefficient corresponding to the audio signal of the n-th channel located in the (t+1)-th frame, and obtain the sparse transform coefficient corresponding to the audio signal of the n-th channel located in the t-th frame according to that prior sparse transform coefficient, the measurement data corresponding to the audio signal of the n-th channel located in the t-th frame, and the measurement matrix. It should be noted that, in the embodiment of the present invention, the sparse transform coefficients corresponding to the audio signals of the same channel collected in different time periods are transmitted in the same channel, so that the redundancy of the channel can be improved.
Taking the interface schematic diagram of the reconstructed audio signal shown in fig. 1C as an example, the Frame length of the acquired audio signal is t, the number of channels is K, the second terminal may preset an initial sparse transform coefficient corresponding to the audio signal of the first channel (Chann 1) located in the starting Frame (Frame1), and calculate a sparse transform coefficient corresponding to the audio signal of the first channel located in the starting Frame according to the initial sparse transform coefficient corresponding to the audio signal of the first channel located in the starting Frame, the measurement data corresponding to the audio signal of the first channel located in the starting Frame, and the measurement matrix. Further, the second terminal may use a sparse transform coefficient corresponding to the audio signal of the first channel located in the starting frame as an input of an audio signal airspace near-field algorithm to obtain a prior sparse transform coefficient corresponding to the audio signal of the second channel (Chann 2) located in the starting frame, and calculate a sparse transform coefficient corresponding to the audio signal of the second channel located in the starting frame according to the prior sparse transform coefficient corresponding to the audio signal of the second channel located in the starting frame, the measurement data corresponding to the audio signal of the second channel located in the starting frame, and the measurement matrix. 
Further, the second terminal may use a sparse transform coefficient corresponding to the audio signal of the second channel located in the start frame as an input of the audio signal spatial domain near-field algorithm to obtain a prior sparse transform coefficient corresponding to the audio signal of the third channel (Chann 3) located in the start frame, and calculate a sparse transform coefficient corresponding to the audio signal of the third channel located in the start frame according to the prior sparse transform coefficient corresponding to the audio signal of the third channel located in the start frame, the measurement data corresponding to the audio signal of the third channel located in the start frame, and the measurement matrix.
Optionally, after the second terminal obtains the sparse transform coefficient corresponding to the audio signal of the K-th channel (Chann K) located in the starting frame, it may use that sparse transform coefficient as an input of the audio signal spatial-domain near-field algorithm to obtain a prior sparse transform coefficient corresponding to the audio signal of the (K-1)-th channel located in the starting frame, and calculate the sparse transform coefficient corresponding to the audio signal of the (K-1)-th channel located in the starting frame according to that prior sparse transform coefficient, the measurement data corresponding to the audio signal of the (K-1)-th channel located in the starting frame, and the measurement matrix, and so on until the sparse transform coefficient corresponding to the audio signal of the first channel located in the starting frame is obtained.
Optionally, after the second terminal obtains the sparse transform coefficient corresponding to the audio signal of the first channel located in the starting frame, it may use that sparse transform coefficient as an input of the audio signal time-domain near-field algorithm to obtain a prior sparse transform coefficient corresponding to the audio signal of the first channel located in the second frame, and calculate the sparse transform coefficient corresponding to the audio signal of the first channel located in the second frame according to that prior sparse transform coefficient, the measurement data corresponding to the audio signal of the first channel located in the second frame, and the measurement matrix. Further, the second terminal may use the sparse transform coefficient corresponding to the audio signal of the first channel located in the second frame as an input of the audio signal time-domain near-field algorithm to obtain a prior sparse transform coefficient corresponding to the audio signal of the first channel located in the third frame, and calculate the sparse transform coefficient corresponding to the audio signal of the first channel located in the third frame accordingly, and so on until the sparse transform coefficient corresponding to the audio signal of the first channel located in the t-th frame (Frame t) is obtained.
Optionally, after the second terminal obtains the sparse transform coefficient corresponding to the audio signal of the first channel located in the t-th frame (Frame t), it may use that sparse transform coefficient as an input of the audio signal time-domain near-field algorithm to obtain a prior sparse transform coefficient corresponding to the audio signal of the first channel located in the (t-1)-th frame (Frame t-1), and calculate the sparse transform coefficient corresponding to the audio signal of the first channel located in the (t-1)-th frame according to that prior sparse transform coefficient, the measurement data corresponding to the audio signal of the first channel located in the (t-1)-th frame, and the measurement matrix, and so on until the sparse transform coefficient corresponding to the audio signal of the first channel located in the first frame is obtained.
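The traversal order described in this embodiment — channel by channel across the starting frame, then frame by frame within each channel — can be sketched as follows. All helper names (`solve`, `spatial_prior`, `temporal_prior`) are hypothetical placeholders for the recovery step and the two near-field algorithms; the actual solvers are not specified here.

```python
def spatial_prior(coeff):
    """Placeholder for the audio signal spatial-domain near-field algorithm."""
    return coeff

def temporal_prior(coeff):
    """Placeholder for the audio signal time-domain near-field algorithm."""
    return coeff

def solve(prior, measurement, matrix):
    """Placeholder for recovering a sparse transform coefficient from its
    prior, the corresponding measurement data, and the measurement matrix."""
    return prior

def reconstruct(K, T, measurements, matrix, initial):
    """Reconstruct sparse transform coefficients for K channels and T frames,
    following the visiting order of this embodiment."""
    order = []                      # records the (channel, frame) visit order
    coeffs = {}
    # Starting frame: channel 1 uses the preset initial coefficient, each
    # subsequent channel uses a spatial-domain prior from the previous channel.
    prior = initial
    for k in range(1, K + 1):
        coeffs[(k, 1)] = solve(prior, measurements[(k, 1)], matrix)
        order.append((k, 1))
        prior = spatial_prior(coeffs[(k, 1)])
    # Remaining frames: each channel propagates a time-domain prior frame by frame.
    for k in range(1, K + 1):
        for f in range(2, T + 1):
            prior = temporal_prior(coeffs[(k, f - 1)])
            coeffs[(k, f)] = solve(prior, measurements[(k, f)], matrix)
            order.append((k, f))
    return order, coeffs
```

The sketch only fixes the iteration structure; any compressed-sensing solver (e.g. a weighted basis-pursuit variant) could be substituted for `solve`.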
The audio signal spatial-domain near-field algorithm may specifically rely on the following properties: for the audio signals of the channels acquired in the same time period, the amplitude of the corresponding sparse transform coefficient attenuates logarithmically with the acquisition distance of the channel, the phase of the corresponding sparse transform coefficient varies linearly with the acquisition distance of the channel, and the supports of the corresponding sparse transform coefficients are the same.
The audio signal time-domain near-field algorithm may specifically rely on the following properties: for the audio signals of the same channel acquired in different time periods, the amplitudes of the corresponding sparse transform coefficients are linearly related, the phases of the corresponding sparse transform coefficients are the same, and the supports of the corresponding sparse transform coefficients follow a Bernoulli distribution.
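A minimal numerical sketch of the two prior models, under the properties stated above. The function names and the specific parameterization (the linear slope and the Bernoulli keep probability) are assumptions for illustration only, not fixed by this embodiment.

```python
import numpy as np

def spatial_prior(coeff, d_known, d_next):
    """Spatial-domain near-field prior: amplitude scales with the ratio of the
    logarithms of the microphone-to-source distances, phase scales linearly
    with distance, and the support (nonzero positions) is unchanged."""
    amp = np.abs(coeff) * (np.log(d_next) / np.log(d_known))
    phase = np.angle(coeff) * (d_next / d_known)
    return amp * np.exp(1j * phase)

def temporal_prior(coeff, slope=1.0, keep_prob=0.8, rng=None):
    """Time-domain near-field prior: amplitudes of successive frames are
    linearly related, phases are identical, and the support of the next frame
    is a Bernoulli-distributed selection from the current support."""
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(coeff.shape) < keep_prob   # Bernoulli support selection
    return slope * coeff * keep
```

With equal distances the spatial prior reduces to the identity, which matches the intuition that a co-located microphone should see the same coefficient.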
S308, the second terminal performs sparse inverse transform on the sparse transform coefficient corresponding to the audio signal of the next channel after the channel located in the designated frame, to obtain the audio signal of the next channel after the channel located in the designated frame.
In the audio signal reconstruction method shown in fig. 3, a first terminal samples the audio signals of the channels acquired in the same time period, and performs sparse transform and perceptual measurement on the sampled audio signals of the channels to obtain compressed data. After a second terminal acquires the compressed data sent by the first terminal, it determines, according to the sparse transform coefficient corresponding to the audio signal of a channel located in a designated frame, a prior sparse transform coefficient corresponding to the audio signal of the next channel located in the designated frame; obtains the sparse transform coefficient corresponding to the audio signal of the next channel located in the designated frame according to that prior sparse transform coefficient, the measurement data corresponding to the audio signal of the next channel located in the designated frame, and the measurement matrix; and performs sparse inverse transform on that sparse transform coefficient to obtain the audio signal of the next channel located in the designated frame, so that the quality of the reconstructed audio signal can be improved.
Referring to fig. 4, fig. 4 is a schematic flow chart of an audio signal compression method according to another embodiment of the present invention, where the audio signal compression method according to the embodiment of the present invention at least includes:
s401, the terminal collects audio signals of a plurality of channels.
S402, the terminal obtains the audio signals of the channels collected in the same time period through cosine window sampling.
The terminal can obtain the audio signals of the channels collected in the same time period through cosine window sampling. Taking the interface schematic diagram of sampling audio signals shown in fig. 1D as an example, where the terminal acquires audio signals of K channels and each acquired audio signal spans t frames, the terminal may obtain the audio signals of the K channels located in the starting frame through cosine window sampling; optionally, the terminal may obtain the audio signals of the K channels located in the t-th frame through cosine window sampling, and so on.
Optionally, after the terminal obtains the audio signals of the channels collected at the same time period through cosine window sampling, the audio signals of the channels collected at the same time period obtained through sampling may be stored in a preset buffer area. In addition, when the data amount of the audio signals in the preset buffer area is larger than the capacity of the preset buffer area, the terminal can acquire the storage time of each audio signal and delete the audio signal with the earliest storage time.
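The buffering behaviour described above can be sketched as a capacity-bounded store that evicts the frame with the earliest storage time when the buffer would overflow; the class and its interface are hypothetical, since the embodiment does not specify an implementation.

```python
from collections import OrderedDict

class AudioBuffer:
    """Preset buffer area: stores sampled frames in arrival order and deletes
    the audio signal with the earliest storage time when capacity is exceeded."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.size = 0
        self.frames = OrderedDict()   # insertion order == storage time

    def store(self, key, samples):
        # Evict earliest-stored frames until the new frame fits.
        while self.size + len(samples) > self.capacity and self.frames:
            _, oldest = self.frames.popitem(last=False)
            self.size -= len(oldest)
        self.frames[key] = samples
        self.size += len(samples)
```

`OrderedDict.popitem(last=False)` removes the oldest entry, which directly models "delete the audio signal with the earliest storage time".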
S403, the terminal performs sparse transform on the sampled audio signals of the channels to obtain sparse transform coefficients.
S404, the terminal multiplies the preset measurement matrix by the sparse transform coefficient to obtain a perceptual measurement value.
S405, the terminal adds the perceptual measurement value and the noise vector to obtain compressed data.
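Steps S403 to S405 can be sketched at toy scale as follows. The DCT is used here as one plausible sparse transform, and a normalized Gaussian matrix as the preset measurement matrix; both choices are assumptions for illustration (the example in this embodiment uses a much larger 5641 × 16384 matrix).

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix, standing in for the sparse transform."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2)
    return m

def compress(frame, phi, rng):
    """S403-S405 for one frame of one channel."""
    theta = dct_matrix(len(frame)) @ frame      # S403: sparse transform coefficients
    y = phi @ theta                             # S404: perceptual measurement value
    return y + rng.standard_normal(len(y))      # S405: add the noise vector
```

Because `phi` has fewer rows than columns, the output is shorter than the input frame, which is where the compression comes from.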
S406, the terminal judges whether the sampled audio signal is located in the last frame.
The terminal may judge whether the sampled audio signal of each channel is located in the last frame; when the sampled audio signal is not located in the last frame, the terminal may further perform step S407, and when the sampled audio signal is located in the last frame, the terminal may further perform step S408.
S407, when the sampled audio signal is not located in the last frame, the terminal translates the cosine window.
For example, after the terminal obtains the audio signals of the channels located in the starting frame through cosine window sampling and obtains the compressed data of the channels located in the starting frame, if the sampled audio signal is not located in the last frame, the terminal may translate the cosine window and perform steps S402 to S405 again, that is, obtain the audio signals of the channels located in the second frame through sampling, and perform sparse transform, perceptual measurement, and other processing on the audio signals of the channels located in the second frame to obtain the compressed data of the channels located in the second frame. If the sampled audio signal is still not located in the last frame, the terminal may translate the cosine window again, obtain the audio signals of the channels located in the third frame through sampling, and process them likewise to obtain the compressed data of the channels located in the third frame, and so on until the compressed data of the channels located in the last frame is obtained.
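The sampling loop of steps S402 and S407 — window the current frame, then translate the window by half a frame for the 50% overlap mentioned in the example below — can be sketched as follows. A Hann window stands in for the cosine window, and the helper name is hypothetical.

```python
import numpy as np

def cosine_window_frames(signal, frame_len):
    """Split a signal into 50%-overlapped frames weighted by a cosine (Hann)
    window, translating the window after each frame (S402/S407)."""
    hop = frame_len // 2                      # 50% overlap between windows
    window = np.hanning(frame_len)
    frames = []
    start = 0
    while start + frame_len <= len(signal):   # stop after the last full frame
        frames.append(signal[start:start + frame_len] * window)
        start += hop                          # S407: translate the cosine window
    return frames
```

Each returned frame would then go through the sparse transform and perceptual measurement of steps S403 to S405.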
S408, when the sampled audio signal is located in the last frame, the terminal outputs an end mark.
When the audio signal obtained by the terminal through cosine window sampling is located in the last frame, the terminal may output an end mark to trigger sending the compressed data of the channels located in different frames to other terminals. Optionally, the terminal may process the audio signals of the channels located in the same frame to obtain the compressed data of the channels located in that frame, and then send that compressed data to other terminals.
Illustratively, the terminal may collect the sound source through an array of 32 microphones to obtain audio signals of 32 channels, where the frame length of the collected audio signal of each channel is 16384 samples and the cosine windows overlap by 50%. The preset measurement matrix may be a 5641 × 16384 normalized Gaussian random matrix, the noise vector corresponding to the acquisition environment of the audio signal may be Gaussian white noise with a mean of 0 and a variance of 1, and the capacity of the preset buffer area is 16384 × 32 B. The compression ratio of processing the audio signal into compressed sampling data is 1:3; that is, when the data volume of the audio signal of one channel located in the designated frame is 300 KB, the data volume of the compressed sampling data obtained by processing that audio signal is about 100 KB.
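As a quick arithmetic check, the stated matrix size is consistent with the stated 1:3 compression ratio, assuming the ratio is measured on the coefficient dimension:

```python
# A 5641 x 16384 measurement matrix maps 16384 sparse coefficients per frame
# down to 5641 measurements.
n, m = 16384, 5641
ratio = m / n
print(round(ratio, 3))       # 0.344, i.e. roughly 1:3
print(round(300 * ratio))    # a 300 KB frame compresses to about 103 KB (~100 KB)
```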
In the audio signal compression method shown in fig. 4, a terminal obtains the audio signals of the channels collected in the same time period through cosine window sampling, performs sparse transform on the sampled audio signals of the channels to obtain sparse transform coefficients, multiplies the preset measurement matrix by the sparse transform coefficients to obtain perceptual measurement values, and adds the perceptual measurement values to the noise vector to obtain compressed data. The terminal then translates the cosine window to obtain the audio signals of the channels located in the next frame of the designated frame, and further obtains the compressed data of the channels located in that next frame, so that an effective sparse model can be established to compress the audio signals.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an audio signal reconstruction apparatus according to an embodiment of the present invention, where the audio signal reconstruction apparatus according to the embodiment of the present invention at least includes a compressed data obtaining module 501, an inverse quantization module 502, a joint reconstruction module 503, and an inverse sparse transform module 504, where:
a compressed data obtaining module 501, configured to obtain compressed data corresponding to at least two audio signals;
an inverse quantization module 502, configured to perform inverse quantization on compressed data corresponding to the at least two audio signals, so as to obtain measurement data corresponding to the at least two audio signals;
a joint reconstruction module 503, configured to obtain a measurement matrix, and jointly reconstruct sparse transform coefficients corresponding to the at least two audio signals according to measurement data corresponding to the at least two audio signals and the measurement matrix;
a sparse inverse transform module 504, configured to perform sparse inverse transform on sparse transform coefficients corresponding to the at least two audio signals to obtain the at least two audio signals.
Optionally, the at least two audio signals include a first audio signal and a second audio signal, and the joint reconstruction module 503 is specifically configured to:
and calculating the sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal and the measurement matrix.
Optionally, the first audio signal corresponds to a first channel, the second audio signal corresponds to a second channel, the first audio signal and the second audio signal are audio signals acquired at the same time period, and the joint reconstruction module 503 calculates a sparse transform coefficient corresponding to the second audio signal according to a sparse transform coefficient corresponding to the first audio signal, measurement data corresponding to the second audio signal, and the measurement matrix, and is specifically configured to:
determining a second amplitude in a prior sparse transform coefficient corresponding to the second audio signal according to a first amplitude in the sparse transform coefficient corresponding to the first audio signal, wherein the ratio of the first amplitude to the second amplitude is the ratio of the logarithm of the distance from the microphone corresponding to the first audio signal to the sound source to the logarithm of the distance from the microphone corresponding to the second audio signal to the sound source;
and taking the second amplitude as the prior of the amplitude in the sparse transform coefficient corresponding to the second audio signal, and calculating the amplitude in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
Optionally, the first audio signal corresponds to a first channel, the second audio signal corresponds to a second channel, the first audio signal and the second audio signal are audio signals acquired at the same time period, and the joint reconstruction module 503 calculates a sparse transform coefficient corresponding to the second audio signal according to a sparse transform coefficient corresponding to the first audio signal, measurement data corresponding to the second audio signal, and the measurement matrix, and is specifically configured to:
determining a second phase in a sparse transform coefficient corresponding to the second audio signal according to a first phase in the sparse transform coefficient corresponding to the first audio signal, wherein the ratio of the first phase to the second phase is the ratio of the distance from a microphone corresponding to the first audio signal to a sound source to the distance from a microphone corresponding to the second audio signal to the sound source;
and taking the second phase as the prior of the phase in the sparse transform coefficient corresponding to the second audio signal, and calculating the phase in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
Optionally, the first audio signal corresponds to a first channel, the second audio signal corresponds to a second channel, the first audio signal and the second audio signal are audio signals acquired at the same time period, and the joint reconstruction module 503 calculates a sparse transform coefficient corresponding to the second audio signal according to a sparse transform coefficient corresponding to the first audio signal, measurement data corresponding to the second audio signal, and the measurement matrix, and is specifically configured to:
determining a first frequency in a sparse transform coefficient corresponding to the first audio signal as a second frequency in a prior sparse transform coefficient corresponding to the second audio signal;
and taking the second frequency as the prior of the frequency in the sparse transform coefficient corresponding to the second audio signal, and calculating the frequency in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
Optionally, the first audio signal and the second audio signal correspond to the same channel, the first audio signal and the second audio signal are audio signals acquired at different time intervals, and the joint reconstruction module 503 calculates a sparse transform coefficient corresponding to the second audio signal according to a sparse transform coefficient corresponding to the first audio signal, measurement data corresponding to the second audio signal, and the measurement matrix, and is specifically configured to:
determining a second amplitude in the prior sparse transform coefficient corresponding to the second audio signal according to a first amplitude in the sparse transform coefficient corresponding to the first audio signal, wherein the amplitudes in the sparse transform coefficients corresponding to the audio signals of different time periods of the same channel are in a linear relationship with the sequence numbers of the frames corresponding to those audio signals;
and taking the second amplitude as the prior of the amplitude in the sparse transform coefficient corresponding to the second audio signal, and calculating the amplitude in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
Optionally, the first audio signal and the second audio signal correspond to the same channel, the first audio signal and the second audio signal are audio signals acquired at different time intervals, and the joint reconstruction module 503 calculates a sparse transform coefficient corresponding to the second audio signal according to a sparse transform coefficient corresponding to the first audio signal, measurement data corresponding to the second audio signal, and the measurement matrix, and is specifically configured to:
determining a first phase in a sparse transform coefficient corresponding to the first audio signal as a second phase in a prior sparse transform coefficient corresponding to the second audio signal;
and taking the second phase as the prior of the phase in the sparse transform coefficient corresponding to the second audio signal, and calculating the phase in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
Optionally, the first audio signal and the second audio signal correspond to the same channel, the first audio signal and the second audio signal are audio signals acquired at different time intervals, and the joint reconstruction module 503 calculates a sparse transform coefficient corresponding to the second audio signal according to a sparse transform coefficient corresponding to the first audio signal, measurement data corresponding to the second audio signal, and the measurement matrix, and is specifically configured to:
determining a second frequency in a priori sparse transform coefficient corresponding to the second audio signal according to a first frequency in the sparse transform coefficient corresponding to the first audio signal, wherein the first frequency and the second frequency have an intersection, and the frequencies in the intersection are obtained by randomly selecting the frequencies in the first frequency;
and taking the second frequency as the prior of the frequency in the sparse transform coefficient corresponding to the second audio signal, and calculating the frequency in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
Optionally, the first audio signal and the second audio signal are audio signals acquired in adjacent time periods.
Optionally, the at least two audio signals include a third audio signal, and the joint reconstruction module 503 jointly reconstructs sparse transform coefficients corresponding to the at least two audio signals according to the measurement data corresponding to the at least two audio signals and the measurement matrix, specifically to:
and calculating a sparse transform coefficient corresponding to the third audio signal according to a preset initial sparse transform coefficient, the measurement data corresponding to the third audio signal and the measurement matrix.
Specifically, the audio signal reconstruction apparatus described in the embodiment of the present invention may be used to implement part or all of the processes in the embodiment of the audio signal reconstruction method described in conjunction with fig. 2 to 4 of the present invention.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a terminal according to an embodiment of the present invention. As shown in fig. 6, the terminal may include: a processor 601, a memory 602, and a network interface 603. The processor 601 is connected to the memory 602 and the network interface 603, for example, the processor 601 may be connected to the memory 602 and the network interface 603 through a bus.
The processor 601 may be a Central Processing Unit (CPU), a Network Processor (NP), or the like.
The memory 602 may be specifically configured to store audio signals of a plurality of channels, and the like. The memory 602 may include volatile memory (volatile memory), such as random-access memory (RAM); the memory may also include a non-volatile memory (non-volatile memory), such as a read-only memory (ROM), a flash memory (flash memory), a Hard Disk Drive (HDD), or a solid-state drive (SSD); the memory may also comprise a combination of memories of the kind described above.
The network interface 603 is used for communicating with other terminals, for example, receiving compressed data sent by other terminals. The network interface 603 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), etc.
The processor 601 may invoke the program code stored in the memory 602 to perform the following operations:

acquiring compressed data corresponding to at least two audio signals;
inversely quantizing the compressed data corresponding to the at least two audio signals to obtain measured data corresponding to the at least two audio signals;
acquiring a measurement matrix, and jointly reconstructing sparse transform coefficients corresponding to the at least two audio signals according to measurement data corresponding to the at least two audio signals and the measurement matrix;
and carrying out sparse inverse transformation on sparse transformation coefficients corresponding to the at least two audio signals to obtain the at least two audio signals.
Optionally, the at least two audio signals include a first audio signal and a second audio signal, and the jointly reconstructing sparse transform coefficients corresponding to the at least two audio signals according to the measurement data corresponding to the at least two audio signals and the measurement matrix includes:
and calculating the sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal and the measurement matrix.
Optionally, the calculating the sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal, and the measurement matrix includes:
determining a second amplitude in a prior sparse transform coefficient corresponding to the second audio signal according to a first amplitude in the sparse transform coefficient corresponding to the first audio signal, wherein the ratio of the first amplitude to the second amplitude is the ratio of the logarithm of the distance from the microphone corresponding to the first audio signal to the sound source to the logarithm of the distance from the microphone corresponding to the second audio signal to the sound source;
and taking the second amplitude as the prior of the amplitude in the sparse transform coefficient corresponding to the second audio signal, and calculating the amplitude in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
Optionally, the calculating the sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal, and the measurement matrix includes:
determining a second phase in a sparse transform coefficient corresponding to the second audio signal according to a first phase in the sparse transform coefficient corresponding to the first audio signal, wherein the ratio of the first phase to the second phase is the ratio of the distance from a microphone corresponding to the first audio signal to a sound source to the distance from a microphone corresponding to the second audio signal to the sound source;
and taking the second phase as the prior of the phase in the sparse transform coefficient corresponding to the second audio signal, and calculating the phase in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
Optionally, the calculating the sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal, and the measurement matrix includes:
determining a first frequency in a sparse transform coefficient corresponding to the first audio signal as a second frequency in a prior sparse transform coefficient corresponding to the second audio signal;
and taking the second frequency as the prior of the frequency in the sparse transform coefficient corresponding to the second audio signal, and calculating the frequency in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
Optionally, the first audio signal and the second audio signal correspond to the same channel, the first audio signal and the second audio signal are audio signals acquired at different time intervals, and calculating the sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal, and the measurement matrix includes:
determining a second amplitude in the prior sparse transform coefficient corresponding to the second audio signal according to a first amplitude in the sparse transform coefficient corresponding to the first audio signal, wherein the amplitudes in the sparse transform coefficients corresponding to the audio signals of different time periods of the same channel are in a linear relationship with the sequence numbers of the frames corresponding to those audio signals;
and taking the second amplitude as the prior of the amplitude in the sparse transform coefficient corresponding to the second audio signal, and calculating the amplitude in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
Optionally, the first audio signal and the second audio signal correspond to the same channel, the first audio signal and the second audio signal are audio signals acquired at different time intervals, and calculating the sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal, and the measurement matrix includes:
determining a first phase in a sparse transform coefficient corresponding to the first audio signal as a second phase in a prior sparse transform coefficient corresponding to the second audio signal;
and taking the second phase as the prior of the phase in the sparse transform coefficient corresponding to the second audio signal, and calculating the phase in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
Optionally, the first audio signal and the second audio signal correspond to the same channel, the first audio signal and the second audio signal are audio signals acquired at different time intervals, and calculating the sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal, and the measurement matrix includes:
determining a second frequency in a priori sparse transform coefficient corresponding to the second audio signal according to a first frequency in the sparse transform coefficient corresponding to the first audio signal, wherein the first frequency and the second frequency have an intersection, and the frequencies in the intersection are obtained by randomly selecting the frequencies in the first frequency;
and taking the second frequency as the prior of the frequency in the sparse transform coefficient corresponding to the second audio signal, and calculating the frequency in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
Optionally, the first audio signal and the second audio signal are audio signals acquired in adjacent time periods.
Optionally, the at least two audio signals include a third audio signal, and the jointly reconstructing sparse transform coefficients corresponding to the at least two audio signals according to the measurement data corresponding to the at least two audio signals and the measurement matrix includes:
and calculating a sparse transform coefficient corresponding to the third audio signal according to a preset initial sparse transform coefficient, the measurement data corresponding to the third audio signal and the measurement matrix.
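A minimal sketch of reconstructing from a preset initial sparse transform coefficient follows. Plain orthogonal matching pursuit is used here illustratively in place of whatever solver the embodiment employs, and the `x0` parameter (all-zero when omitted) stands in for the preset initial coefficient:

```python
import numpy as np

def omp_reconstruct(y, Phi, k, x0=None):
    """Orthogonal Matching Pursuit sketch: recover a k-sparse coefficient
    vector x from measurement data y ~= Phi @ x, starting from a preset
    initial sparse transform coefficient x0 (all-zero when None)."""
    n = Phi.shape[1]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    support = list(np.flatnonzero(x))
    residual = y - Phi @ x
    while len(support) < k:
        if np.linalg.norm(residual) < 1e-12:
            break  # measurements already explained
        # pick the atom most correlated with the current residual
        idx = int(np.argmax(np.abs(Phi.T @ residual)))
        if idx not in support:
            support.append(idx)
        # least-squares refit over the current support
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        x = np.zeros(n)
        x[support] = coef
        residual = y - Phi @ x
    return x
```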
Specifically, the terminal described in the embodiment of the present invention may be used to implement part or all of the processes in the embodiment of the audio signal reconstruction method described in conjunction with fig. 2 to 4 of the present invention.
In the description herein, references to "one embodiment," "some embodiments," "an example," "a specific example," "some examples," and the like mean that a particular feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. Such references in this specification do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the embodiments or examples, and features of different embodiments or examples, described in this specification, provided no contradiction arises.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection having one or more wires (an electronic device), a portable computer diskette (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, for instance via optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a programmable gate array, a field programmable gate array, or the like.
In addition, the modules in the embodiments of the present invention may be implemented in the form of hardware, or may be implemented in the form of software functional modules. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
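The overall decoding flow of the embodiments (inverse quantization, joint reconstruction with the previous frame's coefficients as the prior, sparse inverse transform) can be sketched as below. The uniform quantization step `q_step` and the pluggable `reconstruct` solver are assumptions made for illustration; the patent does not fix either:

```python
import numpy as np

def decode_frames(quantized_frames, q_step, Phi, Psi, reconstruct):
    """Decoder-pipeline sketch: inverse-quantize each frame's compressed
    data, jointly reconstruct the sparse transform coefficients (solver
    passed in; the previous frame's coefficients serve as the prior),
    then apply the sparse inverse transform Psi^T."""
    signals, prior = [], None
    for q in quantized_frames:
        y = q * q_step                               # inverse quantization
        coeffs = reconstruct(y, Phi @ Psi.T, prior)  # joint reconstruction
        signals.append(Psi.T @ coeffs)               # sparse inverse transform
        prior = coeffs                               # prior for the next frame
    return signals
```

With an orthonormal sparsifying basis `Psi` (rows are basis vectors), the effective dictionary seen by the solver is `Phi @ Psi.T`, so the recovered coefficients map back to the time domain by a single matrix product.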

Claims (22)

1. A method of audio signal reconstruction, comprising:
acquiring compressed data corresponding to at least two audio signals;
inversely quantizing the compressed data corresponding to the at least two audio signals to obtain measurement data corresponding to the at least two audio signals;
acquiring a measurement matrix, and jointly reconstructing sparse transform coefficients corresponding to the at least two audio signals according to measurement data corresponding to the at least two audio signals and the measurement matrix;
performing sparse inverse transformation on sparse transformation coefficients corresponding to the at least two audio signals to obtain the at least two audio signals;
wherein the at least two audio signals include a first audio signal and a second audio signal, and the jointly reconstructing sparse transform coefficients corresponding to the at least two audio signals according to the measurement data corresponding to the at least two audio signals and the measurement matrix includes:
and calculating the sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal and the measurement matrix.
2. The method of claim 1, wherein the first audio signal corresponds to a first channel, the second audio signal corresponds to a second channel, the first audio signal and the second audio signal are audio signals acquired in the same time period, and the calculating the sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal, and the measurement matrix comprises:
determining a second amplitude in an a priori sparse transform coefficient corresponding to the second audio signal according to a first amplitude in the sparse transform coefficient corresponding to the first audio signal, wherein the ratio of the first amplitude to the second amplitude is the ratio of the logarithm of the distance from a microphone corresponding to the first audio signal to a sound source to the logarithm of the distance from the microphone corresponding to the second audio signal to the sound source;
and taking the second amplitude as the prior of the amplitude in the sparse transform coefficient corresponding to the second audio signal, and calculating the amplitude in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
3. The method of claim 1, wherein the first audio signal corresponds to a first channel, the second audio signal corresponds to a second channel, the first audio signal and the second audio signal are audio signals acquired in the same time period, and the calculating the sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal, and the measurement matrix comprises:
determining a second phase in a sparse transform coefficient corresponding to the second audio signal according to a first phase in the sparse transform coefficient corresponding to the first audio signal, wherein the ratio of the first phase to the second phase is the ratio of the distance from a microphone corresponding to the first audio signal to a sound source to the distance from a microphone corresponding to the second audio signal to the sound source;
and taking the second phase as the prior of the phase in the sparse transform coefficient corresponding to the second audio signal, and calculating the phase in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
4. The method of claim 2, wherein the first audio signal corresponds to a first channel, the second audio signal corresponds to a second channel, the first audio signal and the second audio signal are audio signals acquired in the same time period, and the calculating the sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal, and the measurement matrix comprises:
determining a second phase in a sparse transform coefficient corresponding to the second audio signal according to a first phase in the sparse transform coefficient corresponding to the first audio signal, wherein the ratio of the first phase to the second phase is the ratio of the distance from a microphone corresponding to the first audio signal to a sound source to the distance from a microphone corresponding to the second audio signal to the sound source;
and taking the second phase as the prior of the phase in the sparse transform coefficient corresponding to the second audio signal, and calculating the phase in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
5. The method according to any one of claims 1 to 4, wherein the first audio signal corresponds to a first channel, the second audio signal corresponds to a second channel, the first audio signal and the second audio signal are audio signals acquired in the same time period, and the calculating the sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal, and the measurement matrix comprises:
determining a first frequency in a sparse transform coefficient corresponding to the first audio signal as a second frequency in a prior sparse transform coefficient corresponding to the second audio signal;
and taking the second frequency as the prior of the frequency in the sparse transform coefficient corresponding to the second audio signal, and calculating the frequency in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
6. The method according to claim 1, wherein the first audio signal and the second audio signal correspond to a same channel, the first audio signal and the second audio signal are audio signals acquired at different time intervals, and the calculating the sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal, and the measurement matrix comprises:
determining a second amplitude in the prior sparse transform coefficient corresponding to the second audio signal according to a first amplitude in the sparse transform coefficient corresponding to the first audio signal, wherein the amplitudes in the sparse transform coefficients corresponding to the audio signals of different time periods corresponding to the same channel are in a linear relation with the sequence numbers of the frames corresponding to the audio signals of the different time periods;
and taking the second amplitude as the prior of the amplitude in the sparse transform coefficient corresponding to the second audio signal, and calculating the amplitude in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
7. The method according to claim 1 or 6, wherein the first audio signal and the second audio signal correspond to the same channel, the first audio signal and the second audio signal are audio signals acquired at different time intervals, and the calculating the sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal, and the measurement matrix comprises:
determining a first phase in a sparse transform coefficient corresponding to the first audio signal as a second phase in a prior sparse transform coefficient corresponding to the second audio signal;
and taking the second phase as the prior of the phase in the sparse transform coefficient corresponding to the second audio signal, and calculating the phase in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
8. An audio signal reconstruction method, characterized in that the method includes all the features of the method of any one of claims 1, 6 and 7, and the first audio signal and the second audio signal correspond to the same channel, the first audio signal and the second audio signal are audio signals acquired at different time intervals, and the calculating of the sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal, and the measurement matrix includes:
determining a second frequency in a priori sparse transform coefficient corresponding to the second audio signal according to a first frequency in the sparse transform coefficient corresponding to the first audio signal, wherein the first frequency and the second frequency have an intersection, and the frequencies in the intersection are obtained by randomly selecting the frequencies in the first frequency;
and taking the second frequency as the prior of the frequency in the sparse transform coefficient corresponding to the second audio signal, and calculating the frequency in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
9. A method of audio signal reconstruction, characterized in that the method comprises all the features of the method of any one of claims 6 to 8, and in that the first audio signal and the second audio signal are audio signals acquired in adjacent time periods.
10. A method for audio signal reconstruction, wherein the method comprises all the features of the method of any one of claims 1 to 9, and wherein the at least two audio signals comprise a third audio signal, and wherein the jointly reconstructing sparse transform coefficients corresponding to the at least two audio signals from the measurement data corresponding to the at least two audio signals and the measurement matrix comprises:
and calculating a sparse transform coefficient corresponding to the third audio signal according to a preset initial sparse transform coefficient, the measurement data corresponding to the third audio signal and the measurement matrix.
11. An audio signal reconstruction apparatus, comprising:
the compressed data acquisition module is used for acquiring compressed data corresponding to at least two audio signals;
the inverse quantization module is used for performing inverse quantization on the compressed data corresponding to the at least two audio signals so as to obtain the measurement data corresponding to the at least two audio signals;
the joint reconstruction module is used for acquiring a measurement matrix and jointly reconstructing sparse transform coefficients corresponding to the at least two audio signals according to the measurement data corresponding to the at least two audio signals and the measurement matrix;
the sparse inverse transformation module is used for carrying out sparse inverse transformation on sparse transformation coefficients corresponding to the at least two audio signals to obtain the at least two audio signals;
wherein the at least two audio signals comprise a first audio signal and a second audio signal, and the joint reconstruction module is specifically configured to:
and calculating the sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal and the measurement matrix.
12. The apparatus according to claim 11, wherein the first audio signal corresponds to a first channel, the second audio signal corresponds to a second channel, the first audio signal and the second audio signal are audio signals acquired in a same time period, and the joint reconstruction module calculates a sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal, and the measurement matrix, and is specifically configured to:
determining a second amplitude in an a priori sparse transform coefficient corresponding to the second audio signal according to a first amplitude in the sparse transform coefficient corresponding to the first audio signal, wherein the ratio of the first amplitude to the second amplitude is the ratio of the logarithm of the distance from a microphone corresponding to the first audio signal to a sound source to the logarithm of the distance from the microphone corresponding to the second audio signal to the sound source;
and taking the second amplitude as the prior of the amplitude in the sparse transform coefficient corresponding to the second audio signal, and calculating the amplitude in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
13. The apparatus according to claim 11, wherein the first audio signal corresponds to a first channel, the second audio signal corresponds to a second channel, the first audio signal and the second audio signal are audio signals acquired in a same time period, and the joint reconstruction module calculates a sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal, and the measurement matrix, and is specifically configured to:
determining a second phase in a sparse transform coefficient corresponding to the second audio signal according to a first phase in the sparse transform coefficient corresponding to the first audio signal, wherein the ratio of the first phase to the second phase is the ratio of the distance from a microphone corresponding to the first audio signal to a sound source to the distance from a microphone corresponding to the second audio signal to the sound source;
and taking the second phase as the prior of the phase in the sparse transform coefficient corresponding to the second audio signal, and calculating the phase in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
14. The apparatus according to claim 12, wherein the first audio signal corresponds to a first channel, the second audio signal corresponds to a second channel, the first audio signal and the second audio signal are audio signals acquired in a same time period, and the joint reconstruction module calculates a sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal, and the measurement matrix, and is specifically configured to:
determining a second phase in a sparse transform coefficient corresponding to the second audio signal according to a first phase in the sparse transform coefficient corresponding to the first audio signal, wherein the ratio of the first phase to the second phase is the ratio of the distance from a microphone corresponding to the first audio signal to a sound source to the distance from a microphone corresponding to the second audio signal to the sound source;
and taking the second phase as the prior of the phase in the sparse transform coefficient corresponding to the second audio signal, and calculating the phase in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
15. The apparatus according to any one of claims 11 to 14, wherein the first audio signal corresponds to a first channel, the second audio signal corresponds to a second channel, the first audio signal and the second audio signal are audio signals acquired in the same time period, and the joint reconstruction module calculates a sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal, and the measurement matrix, and is specifically configured to:
determining a first frequency in a sparse transform coefficient corresponding to the first audio signal as a second frequency in a prior sparse transform coefficient corresponding to the second audio signal;
and taking the second frequency as the prior of the frequency in the sparse transform coefficient corresponding to the second audio signal, and calculating the frequency in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
16. The apparatus according to claim 11, wherein the first audio signal and the second audio signal correspond to a same channel, the first audio signal and the second audio signal are audio signals acquired at different time intervals, and the joint reconstruction module calculates a sparse transform coefficient corresponding to the second audio signal according to a sparse transform coefficient corresponding to the first audio signal, measurement data corresponding to the second audio signal, and the measurement matrix, and is specifically configured to:
determining a second amplitude in the prior sparse transform coefficient corresponding to the second audio signal according to a first amplitude in the sparse transform coefficient corresponding to the first audio signal, wherein the amplitudes in the sparse transform coefficients corresponding to the audio signals of different time periods corresponding to the same channel are in a linear relation with the sequence numbers of the frames corresponding to the audio signals of the different time periods;
and taking the second amplitude as the prior of the amplitude in the sparse transform coefficient corresponding to the second audio signal, and calculating the amplitude in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
17. The apparatus according to claim 11 or 16, wherein the first audio signal and the second audio signal correspond to a same channel, the first audio signal and the second audio signal are audio signals acquired at different time intervals, and the joint reconstruction module calculates a sparse transform coefficient corresponding to the second audio signal according to a sparse transform coefficient corresponding to the first audio signal, measurement data corresponding to the second audio signal, and the measurement matrix, and is specifically configured to:
determining a first phase in a sparse transform coefficient corresponding to the first audio signal as a second phase in a prior sparse transform coefficient corresponding to the second audio signal;
and taking the second phase as the prior of the phase in the sparse transform coefficient corresponding to the second audio signal, and calculating the phase in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
18. An audio signal reconstruction apparatus, characterized in that the apparatus includes all the features of the apparatus of claims 11, 16 and 17, and the first audio signal and the second audio signal correspond to the same channel, the first audio signal and the second audio signal are audio signals acquired at different time intervals, and the joint reconstruction module calculates a sparse transform coefficient corresponding to the second audio signal according to the sparse transform coefficient corresponding to the first audio signal, the measurement data corresponding to the second audio signal, and the measurement matrix, and is specifically configured to:
determining a second frequency in a priori sparse transform coefficient corresponding to the second audio signal according to a first frequency in the sparse transform coefficient corresponding to the first audio signal, wherein the first frequency and the second frequency have an intersection, and the frequencies in the intersection are obtained by randomly selecting the frequencies in the first frequency;
and taking the second frequency as the prior of the frequency in the sparse transform coefficient corresponding to the second audio signal, and calculating the frequency in the sparse transform coefficient corresponding to the second audio signal according to the measurement data corresponding to the second audio signal and the measurement matrix.
19. An apparatus for reconstructing an audio signal, the apparatus comprising all the features of the apparatus of any one of claims 16 to 18, wherein the first audio signal and the second audio signal are audio signals acquired in adjacent time periods.
20. An audio signal reconstruction apparatus, characterized in that said apparatus comprises all the features of the apparatus of any one of claims 11 to 19, and said at least two audio signals comprise a third audio signal, and said joint reconstruction module jointly reconstructs sparse transform coefficients corresponding to said at least two audio signals according to the measurement data corresponding to said at least two audio signals and said measurement matrix, in particular to:
and calculating a sparse transform coefficient corresponding to the third audio signal according to a preset initial sparse transform coefficient, the measurement data corresponding to the third audio signal and the measurement matrix.
21. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a computer device, is adapted to implement the method of any one of claims 1 to 10.
22. An audio signal reconstruction apparatus, comprising:
a processor and a memory,
wherein the memory stores a computer program for implementing the method of any one of claims 1 to 10 when executed by the processor.
CN201610877571.2A 2016-09-30 2016-09-30 Audio signal reconstruction method and device Active CN107886960B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610877571.2A CN107886960B (en) 2016-09-30 2016-09-30 Audio signal reconstruction method and device
PCT/CN2017/103534 WO2018059409A1 (en) 2016-09-30 2017-09-26 Audio signal reconstruction method and device


Publications (2)

Publication Number Publication Date
CN107886960A CN107886960A (en) 2018-04-06
CN107886960B true CN107886960B (en) 2020-12-01

Family

ID=61763647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610877571.2A Active CN107886960B (en) 2016-09-30 2016-09-30 Audio signal reconstruction method and device

Country Status (2)

Country Link
CN (1) CN107886960B (en)
WO (1) WO2018059409A1 (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101281749A (en) * 2008-05-22 2008-10-08 上海交通大学 Apparatus for encoding and decoding hierarchical voice and musical sound together
CN102138177A (en) * 2008-07-30 2011-07-27 法国电信 Reconstruction of multi-channel audio data
WO2012023864A1 (en) * 2010-08-20 2012-02-23 Industrial Research Limited Surround sound system
US8489403B1 (en) * 2010-08-25 2013-07-16 Foundation For Research and Technology—Institute of Computer Science ‘FORTH-ICS’ Apparatuses, methods and systems for sparse sinusoidal audio processing and transmission
WO2014046916A1 (en) * 2012-09-21 2014-03-27 Dolby Laboratories Licensing Corporation Layered approach to spatial audio coding
US20140310010A1 (en) * 2011-11-14 2014-10-16 Electronics And Telecommunications Research Institute Apparatus for encoding and apparatus for decoding supporting scalable multichannel audio signal, and method for apparatuses performing same
CN104934038A (en) * 2015-06-09 2015-09-23 天津大学 Spatial audio encoding-decoding method based on sparse expression
CN105229729A (en) * 2013-05-24 2016-01-06 杜比国际公司 Audio coder and demoder
CN105659320A (en) * 2013-10-21 2016-06-08 杜比国际公司 Audio encoder and decoder

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6006179A (en) * 1997-10-28 1999-12-21 America Online, Inc. Audio codec using adaptive sparse vector quantization with subband vector classification
DE602005014288D1 (en) * 2004-03-01 2009-06-10 Dolby Lab Licensing Corp Multi-channel audio decoding
US8190425B2 (en) * 2006-01-20 2012-05-29 Microsoft Corporation Complex cross-correlation parameters for multi-channel audio
CN101155440A (en) * 2007-09-17 2008-04-02 昊迪移通(北京)技术有限公司 Three-dimensional around sound effect technology aiming at double-track audio signal
CN102080001A (en) * 2009-12-01 2011-06-01 济南开发区星火科学技术研究所 Rare earth naphthenate-containing octane number improver for gasoline
CN103152298B (en) * 2013-03-01 2015-07-22 哈尔滨工业大学 Blind signal reconstruction method based on distribution-type compressed sensing system
RU2628177C2 (en) * 2013-05-24 2017-08-15 Долби Интернешнл Аб Methods of coding and decoding sound, corresponding machine-readable media and corresponding coding device and device for sound decoding
EP2830333A1 (en) * 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Multi-channel decorrelator, multi-channel audio decoder, multi-channel audio encoder, methods and computer program using a premix of decorrelator input signals
CN103745727A (en) * 2013-12-25 2014-04-23 南京邮电大学 Compressed sensing method of noise-containing voice signal
CN103944578B (en) * 2014-03-28 2017-08-22 电子科技大学 A kind of reconstructing method of multi signal
EP2980795A1 (en) * 2014-07-28 2016-02-03 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio encoding and decoding using a frequency domain processor, a time domain processor and a cross processor for initialization of the time domain processor
CN105828266A (en) * 2016-03-11 2016-08-03 苏州奇梦者网络科技有限公司 Signal processing method and system for microphone array

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101281749A (en) * 2008-05-22 2008-10-08 上海交通大学 Apparatus for encoding and decoding hierarchical voice and musical sound together
CN102138177A (en) * 2008-07-30 2011-07-27 法国电信 Reconstruction of multi-channel audio data
WO2012023864A1 (en) * 2010-08-20 2012-02-23 Industrial Research Limited Surround sound system
US8489403B1 (en) * 2010-08-25 2013-07-16 Foundation For Research and Technology—Institute of Computer Science ‘FORTH-ICS’ Apparatuses, methods and systems for sparse sinusoidal audio processing and transmission
US20140310010A1 (en) * 2011-11-14 2014-10-16 Electronics And Telecommunications Research Institute Apparatus for encoding and apparatus for decoding supporting scalable multichannel audio signal, and method for apparatuses performing same
WO2014046916A1 (en) * 2012-09-21 2014-03-27 Dolby Laboratories Licensing Corporation Layered approach to spatial audio coding
CN105229729A (en) * 2013-05-24 2016-01-06 杜比国际公司 Audio encoder and decoder
CN105659320A (en) * 2013-10-21 2016-06-08 杜比国际公司 Audio encoder and decoder
CN104934038A (en) * 2015-06-09 2015-09-23 天津大学 Spatial audio coding and decoding method based on sparse representation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Single-channel and multi-channel sinusoidal audio coding using compressed sensing; Anthony Griffin et al.; IEEE Transactions on Audio, Speech, and Language Processing; 20101109; full text *
Research on compressed sensing algorithms and audio applications; Zhang Wen; China Master's Theses Full-text Database, Information Science and Technology; 20120815; full text *

Also Published As

Publication number Publication date
CN107886960A (en) 2018-04-06
WO2018059409A1 (en) 2018-04-05

Similar Documents

Publication Publication Date Title
KR102257695B1 (en) Sound field re-creation device, method, and program
CN102656627B (en) Multi-channel audio processing method and device
KR102428842B1 (en) Method and apparatus for compressing and decompressing a higher order ambisonics representation for a sound field
EP2486561B1 (en) Reconstruction of a recorded sound field
CN110070896B (en) Image processing method, device and hardware device
KR20210034685A (en) Method and apparatus for compressing and decompressing a higher order ambisonics representation
US9978379B2 (en) Multi-channel encoding and/or decoding using non-negative tensor factorization
Wu et al. Compressive sensing-based speech enhancement in non-sparse noisy environments
JP2009533716A (en) Excitation processing in audio encoding and decoding
US10020000B2 (en) Method and apparatus for improved ambisonic decoding
CN107113101A Non-underdetermined estimation for compressed sensing
CN110299144B (en) Audio mixing method, server and client
CN107895580B (en) Audio signal reconstruction method and device
CN107886960B (en) Audio signal reconstruction method and device
CN113674752B (en) Noise reduction method and device for audio signal, readable medium and electronic equipment
US20120215788A1 (en) Data Processing
WO2023226572A1 (en) Feature representation extraction method and apparatus, device, medium and program product
Nahar et al. An introduction to compressive sensing and its applications
CN116072108A (en) Model generation method, voice recognition method, device, medium and equipment
CN113823312B (en) Speech enhancement model generation method and device, and speech enhancement method and device
CN113488083B (en) Data matching method, device, medium and electronic equipment
CN111147655B (en) Model generation method and device
EP3090574B1 (en) Method and apparatus for improved ambisonic decoding
Bhadoria et al. Comparative analysis of basis & measurement matrices for non-speech audio signal using compressive sensing
CN113436644B (en) Sound quality evaluation method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant