WO2018058989A1 - Method and apparatus for reconstructing an audio signal

Method and apparatus for reconstructing an audio signal

Info

Publication number
WO2018058989A1
Authority
WO
WIPO (PCT)
Prior art keywords
compressed data
audio signals
frequency domain
channel
audio signal
Prior art date
Application number
PCT/CN2017/086390
Other languages
English (en)
French (fr)
Inventor
蒋三新
应忍冬
文飞
江晓波
刘佩林
金文宇
肖玮
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2018058989A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G10L19/02 Speech or audio signals analysis-synthesis techniques using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/06 Speech or voice analysis techniques in which the extracted parameters are correlation coefficients

Definitions

  • the present invention relates to the field of communications, and in particular, to a method and apparatus for reconstructing an audio signal.
  • the reconstruction algorithm needs to meet the requirements of accuracy and speed at the same time; if the accuracy is too low or the speed too slow, practical application requirements cannot be met.
  • the compressed sampling of the signal is achieved by multiplying the original signal by a measurement matrix that needs to be passed to the signal reconstruction end to achieve recovery of the compressed signal. Similar to the conventional audio codec scheme, the compressed sampling of the audio signal is also performed in units of "frames".
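The frame-wise compressed sampling just described can be sketched as follows. This is a minimal illustration: the frame length, compression ratio, and random Gaussian measurement matrix are all hypothetical choices, since the passage above does not prescribe a specific matrix type.

```python
import numpy as np

rng = np.random.default_rng(0)

frame_len = 1024            # hypothetical frame length N
m = frame_len // 3          # compressed dimension M < N (compression ratio ~1/3)

# Measurement matrix shared by the sampling end and the reconstruction end
Phi = rng.standard_normal((m, frame_len)) / np.sqrt(m)

frame = rng.standard_normal(frame_len)   # one "frame" of the original signal

# Compressed sampling: multiply the frame by the measurement matrix
compressed = Phi @ frame

assert compressed.shape == (m,)          # M measurements per frame
```

The reconstruction end must know `Phi` (or how to regenerate it, e.g. from a shared seed) to recover the frame from `compressed`.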
  • massive multi-channel data places higher demands on the running speed of the reconstruction algorithm.
  • parallel processing can be used to increase the speed of the operation.
  • Parallel processing requires the parallel units to be independent of each other, which means that fully parallel per-channel reconstruction cannot exploit the correlation between channels, limiting reconstruction accuracy. Conversely, joint reconstruction of multiple channels couples the channels to one another, so parallel acceleration cannot be achieved.
  • the embodiment of the invention provides a method and a terminal for reconstructing an audio signal, which can solve the problem that signal reconstruction precision is poor and that channels cannot be accelerated in parallel.
  • a method for reconstructing an audio signal includes: acquiring compressed data corresponding to at least two audio signals of at least two channels, the at least two channels corresponding to the at least two audio signals; acquiring grouping information of the groups in which the channels corresponding to the at least two audio signals are located; grouping the compressed data corresponding to the at least two audio signals according to the grouping information, thereby obtaining compressed data groups; acquiring a measurement matrix, and jointly reconstructing, from the compressed data in a compressed data group and the measurement matrix, the frequency domain coefficients corresponding to the compressed data in that group; and performing a frequency domain to time domain transform on the frequency domain coefficients, thereby obtaining the audio signals corresponding to the compressed data in the compressed data group.
  • The compressed data corresponding to the at least two audio signals may be grouped according to the grouping information of the groups in which the corresponding channels are located, so that the compressed data within each resulting group can be jointly reconstructed. Joint reconstruction within a group improves accuracy, while reconstructing the groups in parallel improves the speed of joint reconstruction.
  • Optionally, the method further includes: acquiring speech/music tag information of the at least two audio signals, the tag information indicating whether the at least two audio signals are speech signals or music signals. Acquiring the compressed data corresponding to the at least two audio signals includes: obtaining a frame length corresponding to the tag information; extracting measurement data corresponding to the at least two audio signals according to the frame length; and inverse-quantizing the measurement data to obtain the compressed data corresponding to the at least two audio signals. For a music signal, whose time-varying characteristics are relatively slow, the frame length can thus be increased to improve reconstruction accuracy; moreover, for a signal of the same total length, a larger frame length reduces the number of signal frames and thereby further reduces the runtime of the signal processing algorithm.
  • Jointly reconstructing the frequency domain coefficients corresponding to the compressed data in a compressed data group includes: calculating the frequency domain coefficients corresponding to the compressed data of one channel in the group from the frequency domain coefficients corresponding to the compressed data of another channel in the group, the first channel's compressed data, and the measurement matrix.
  • The calculation may use the Approximate Message Passing (AMP) algorithm or another algorithm; the present application does not limit which algorithm is used to obtain higher-precision frequency domain coefficients for the compressed data.
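As a sketch of the AMP option named above, the following is a minimal soft-thresholding AMP that recovers a sparse coefficient vector from noiseless compressed measurements. The dimensions, threshold policy, and Gaussian measurement matrix are illustrative assumptions, not prescribed by the text.

```python
import numpy as np

def soft_threshold(u, t):
    """Soft-thresholding (shrinkage) operator."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def amp_reconstruct(y, A, n_iter=50):
    """Minimal AMP iteration recovering a sparse x from y = A x."""
    m, n = A.shape
    x = np.zeros(n)
    z = y.copy()
    for _ in range(n_iter):
        pseudo = x + A.T @ z                    # pseudo-data estimate
        tau = np.sqrt(np.mean(z ** 2))          # effective noise level of z
        x_new = soft_threshold(pseudo, tau)
        # Onsager correction keeps the effective noise approximately Gaussian
        z = y - A @ x_new + (z / m) * np.count_nonzero(x_new)
        x = x_new
    return x

# Toy problem: recover a sparse coefficient vector from compressed samples
rng = np.random.default_rng(1)
m, n, k = 128, 256, 8
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.choice([-1.0, 1.0], k)
x_hat = amp_reconstruct(A @ x_true, A)
```

In this very sparse, noiseless regime the iteration essentially recovers the true coefficients; real audio coefficients are only approximately sparse, so accuracy there depends on the transform and the priors used.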
  • Calculating one channel's frequency domain coefficients from another's includes: calculating the frequency domain coefficients corresponding to the compressed data of the (i+1)-th channel in the compressed data group from the frequency domain coefficients of the i-th channel's compressed data, the (i+1)-th channel's compressed data, and the measurement matrix, until the frequency domain coefficients corresponding to the compressed data of the k-th channel in the group are calculated, where i is a positive integer smaller than k and k is the total number of channels in the compressed data group. That is, for the compressed data in the same compressed data group, the compressed data of every channel in the group can be jointly reconstructed to improve the accuracy of the frequency domain coefficients.
  • Optionally, the method further includes: calculating the frequency domain coefficients corresponding to the compressed data of the (j-1)-th channel in the compressed data group from the frequency domain coefficients of the j-th channel's compressed data, the (j-1)-th channel's compressed data, and the measurement matrix, until the frequency domain coefficients corresponding to the compressed data of the first channel in the group are calculated, where j is a positive integer less than or equal to k and greater than 1. That is to say, for the compressed data in the same compressed data group, the frequency domain coefficients can first be computed forward from the first channel to the k-th channel, and then backward from the k-th channel to the first. One such forward-and-backward pass may be referred to as an algorithm iteration; iterations continue until the obtained frequency domain coefficients meet a preset requirement, i.e., the joint reconstruction is performed over the compressed data within the group.
  • Optionally, the method further includes: calculating the frequency domain coefficients corresponding to the compressed data of the second channel according to preset initial frequency domain coefficients, the compressed data corresponding to the first channel, and the measurement matrix. That is, the frequency domain coefficients for the first channel in the compressed data group can be preset.
  • Calculating the frequency domain coefficients corresponding to one channel's compressed data includes: determining, from the frequency domain coefficients corresponding to the previous channel's compressed data, a priori frequency domain coefficients for the next channel's compressed data; and, using these a priori coefficients as the prior, calculating the frequency domain coefficients corresponding to the next channel's compressed data in the group from that channel's compressed data and the measurement matrix. In this way the frequency domain coefficients obtained for the previous channel serve as the prior for the next channel, yielding high-precision frequency domain coefficients for the whole compressed data group.
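The forward/backward prior-passing schedule described above can be sketched as follows. The per-channel solver here is a toy prior-initialized iterative soft-thresholding routine standing in for AMP, and all dimensions and parameters are illustrative assumptions.

```python
import numpy as np

def solve_channel(y, Phi, prior, n_iter=200, step=0.1, thresh=0.01):
    """Toy per-channel solver: iterative soft thresholding, initialized
    with the prior coefficients passed from the neighboring channel."""
    theta = prior.copy()
    for _ in range(n_iter):
        theta = theta + step * Phi.T @ (y - Phi @ theta)   # gradient step
        theta = np.sign(theta) * np.maximum(np.abs(theta) - thresh, 0.0)
    return theta

def joint_reconstruct(group_data, Phi, n_passes=2):
    """Forward/backward schedule over one compressed data group:
    each channel's coefficients seed the prior of the next channel
    (forward pass), then of the previous channel (backward pass)."""
    k = len(group_data)
    n = Phi.shape[1]
    coeffs = [np.zeros(n)] * k
    prior = np.zeros(n)                     # preset initial coefficients
    for _ in range(n_passes):
        for i in range(k):                  # forward: channel 1 .. k
            coeffs[i] = solve_channel(group_data[i], Phi, prior)
            prior = coeffs[i]
        for j in range(k - 2, -1, -1):      # backward: channel k-1 .. 1
            coeffs[j] = solve_channel(group_data[j], Phi, coeffs[j + 1])
        prior = coeffs[0]
    return coeffs

# Demo: three channels sharing a common sparse structure
rng = np.random.default_rng(2)
m, n = 100, 200
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
support = rng.choice(n, 8, replace=False)
thetas = []
for _ in range(3):
    t = np.zeros(n)
    t[support] = 1.0 + 0.05 * rng.standard_normal(8)
    thetas.append(t)
coeffs = joint_reconstruct([Phi @ t for t in thetas], Phi)
```

Using the prior as an initialization is only one simple way to inject it; the text leaves open exactly how the a priori coefficients enter the per-channel computation.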
  • a compression sampling method for an audio signal includes: acquiring at least two audio signals of at least two channels, the at least two channels corresponding to the at least two audio signals; calculating the correlation between the at least two audio signals and grouping them according to the correlation, thereby obtaining grouping information of the groups in which the channels of the at least two channels are located; performing a time domain to frequency domain transform on the at least two audio signals, thereby obtaining at least two sets of frequency domain coefficients in one-to-one correspondence with the at least two audio signals; and acquiring a measurement matrix and sampling the at least two sets of frequency domain coefficients according to the measurement matrix, thereby obtaining compressed data corresponding to the at least two audio signals.
  • The reconstructing device may group the compressed data of the at least two audio signals according to the grouping information carried with the compressed data, so that the compressed data of highly correlated channels can be jointly reconstructed within each group while the groups are reconstructed in parallel, obtaining the audio signals of the at least two channels with improved reconstruction accuracy and speed.
  • Optionally, before acquiring the measurement matrix, the method further includes: determining speech/music tag information of the at least two audio signals, the tag information indicating whether the at least two audio signals are speech signals or music signals; and determining the frame length of the at least two audio signals according to the tag information. For a music signal, whose time-varying characteristics are relatively slow, the frame length can be increased to improve reconstruction accuracy; moreover, for a signal of the same total length, a larger frame length reduces the number of signal frames and thereby further reduces the runtime of the signal processing algorithm.
  • Acquiring the measurement matrix includes: obtaining the measurement matrix corresponding to the frame length. That is, for speech signals and music signals, corresponding measurement matrices can be generated according to their different frame lengths: for example, a measurement matrix matched to the music-signal frame length for a music signal, and one matched to the speech-signal frame length for a speech signal.
  • Calculating the correlation between the at least two audio signals and grouping them according to the correlation includes: selecting a first audio signal from the at least two audio signals; from the remaining audio signals, selecting the m audio signals most correlated with the first audio signal, and taking the first audio signal together with those m signals as one group of audio signals, m being a positive integer greater than or equal to 1; then selecting a second audio signal from the audio signals not yet grouped, selecting from the remaining ungrouped signals the m signals most correlated with the second audio signal, and taking the second audio signal together with those m signals as another group of audio signals; and so on, until the grouping of the at least two audio signals is complete.
  • The correlation between two audio signals may be computed, for example, from the distance between the at least two audio signals. That is to say, the correlation between two audio signals can be understood as the spatial correlation of the audio signals.
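One possible reading of this greedy grouping is sketched below, using the magnitude of the sample correlation between one frame per channel as the correlation measure; that measure, and the use of `numpy.corrcoef`, are assumptions, since the text leaves the exact computation open.

```python
import numpy as np

def group_channels(frames, m):
    """Greedy grouping: repeatedly take the first ungrouped channel plus
    the m ungrouped channels most correlated with it.
    `frames` is a (num_channels, frame_len) array, one frame per channel."""
    num_ch = frames.shape[0]
    corr = np.abs(np.corrcoef(frames))      # pairwise correlation strengths
    ungrouped = list(range(num_ch))
    groups = []
    while ungrouped:
        head = ungrouped.pop(0)
        rest = sorted(ungrouped, key=lambda c: corr[head, c], reverse=True)
        members = [head] + rest[:m]
        for c in rest[:m]:
            ungrouped.remove(c)
        groups.append(members)
    return groups

# Demo: channels 0/1 and 2/3 are near-duplicates of each other
rng = np.random.default_rng(3)
s1, s2 = rng.standard_normal(64), rng.standard_normal(64)
frames = np.stack([s1, s1 + 0.01 * rng.standard_normal(64),
                   s2, s2 + 0.01 * rng.standard_normal(64)])
groups = group_channels(frames, m=1)
```

Each group index list then becomes the grouping information attached to the compressed data, e.g. as a group identifier per channel.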
  • an apparatus for reconstructing an audio signal includes: an acquiring unit, configured to acquire compressed data of at least two audio signals of at least two channels, the at least two channels being in one-to-one correspondence with the at least two audio signals, and further configured to acquire grouping information of the groups in which the channels corresponding to the at least two audio signals are located; a grouping unit, configured to group the compressed data of the at least two audio signals according to the grouping information, thereby obtaining compressed data groups; a reconstruction unit, configured to acquire a measurement matrix and to jointly reconstruct, from the compressed data in a compressed data group and the measurement matrix, the frequency domain coefficients corresponding to the compressed data in that group; and a transform unit, configured to perform a frequency domain to time domain transform on the frequency domain coefficients, thereby obtaining the audio signals corresponding to the compressed data in the compressed data group.
  • Optionally, the acquiring unit is further configured to: acquire speech/music tag information of the at least two audio signals, the tag information indicating whether the at least two audio signals are speech signals or music signals; obtain the frame length corresponding to the tag information; extract measurement data corresponding to the at least two audio signals according to the frame length; and inverse-quantize the measurement data, thereby obtaining the compressed data corresponding to the at least two audio signals.
  • Optionally, the reconstruction unit is configured to calculate the frequency domain coefficients corresponding to the compressed data of one channel in the compressed data group from the frequency domain coefficients corresponding to the compressed data of another channel in the group, the first channel's compressed data, and the measurement matrix.
  • Optionally, the reconstruction unit is configured to: calculate the frequency domain coefficients corresponding to the compressed data of the (i+1)-th channel in the compressed data group from the frequency domain coefficients of the i-th channel's compressed data, the (i+1)-th channel's compressed data, and the measurement matrix, until the frequency domain coefficients corresponding to the compressed data of the k-th channel in the group are calculated, where i is a positive integer less than k and k is the total number of channels in the compressed data group.
  • Optionally, the reconstruction unit is further configured to: calculate the frequency domain coefficients corresponding to the compressed data of the (j-1)-th channel from the frequency domain coefficients of the j-th channel's compressed data, the (j-1)-th channel's compressed data, and the measurement matrix, until the frequency domain coefficients corresponding to the compressed data of the first channel in the group are calculated, where j is a positive integer less than or equal to k and greater than 1.
  • the reconstruction unit is further configured to: calculate a frequency domain coefficient corresponding to the compressed data corresponding to the second channel according to the preset initial frequency domain coefficient, the compressed data corresponding to the first channel, and the measurement matrix.
  • Optionally, the reconstruction unit is configured to: determine, from the frequency domain coefficients corresponding to one channel's compressed data, a priori frequency domain coefficients for another channel's compressed data; and, using those a priori coefficients as the prior, calculate the frequency domain coefficients corresponding to the other channel's compressed data in the compressed data group from that channel's compressed data and the measurement matrix.
  • a compression sampling device for an audio signal includes: an acquiring unit, configured to acquire at least two audio signals of at least two channels, the at least two channels being in one-to-one correspondence with the at least two audio signals; a grouping unit, configured to calculate the correlation between the at least two audio signals and group them according to the correlation, thereby obtaining grouping information of the groups in which the channels of the at least two channels are located; a transform unit, configured to transform the at least two audio signals from the time domain to the frequency domain, thereby obtaining at least two sets of frequency domain coefficients in one-to-one correspondence with the at least two audio signals; the acquiring unit being further configured to acquire a measurement matrix; and a sampling unit, configured to sample the at least two sets of frequency domain coefficients according to the measurement matrix, thereby obtaining compressed data corresponding to the at least two audio signals.
  • Optionally, the device further includes a determining unit, configured to: determine speech/music tag information of the at least two audio signals, the tag information indicating whether the at least two audio signals are speech signals or music signals; and determine the frame length of the at least two audio signals according to the tag information.
  • the obtaining unit is configured to: obtain a measurement matrix corresponding to the frame length according to the frame length.
  • Optionally, the grouping unit is configured to: select a first audio signal from the at least two audio signals; from the remaining audio signals, select the m audio signals most correlated with the first audio signal, and take the first audio signal together with those m signals as one group of audio signals, m being a positive integer greater than or equal to 1; then select a second audio signal from the audio signals not yet grouped, select from the remaining ungrouped signals the m signals most correlated with the second audio signal, and take the second audio signal together with those m signals as another group of audio signals; and so on, until the grouping of the at least two audio signals is complete.
  • the correlation between the at least two audio signals includes the distance between the at least two audio signals.
  • In the embodiment of the present invention, the compressed sampling device groups the at least two audio signals according to the correlation between them, thereby obtaining the grouping information of the groups in which the channels of the at least two channels are located. It then performs a time domain to frequency domain transform on the at least two audio signals, obtaining at least two sets of frequency domain coefficients in one-to-one correspondence with the audio signals, acquires a measurement matrix, and samples the frequency domain coefficients according to the measurement matrix to obtain the compressed data corresponding to the at least two audio signals. The compressed data delivered to the reconstruction device thus carries the grouping information of the groups in which the channels are located, so that the reconstruction device can group the compressed data of the at least two audio signals according to that grouping information, obtain compressed data groups, and jointly reconstruct each group from its compressed data and the measurement matrix.
  • Because the compressed sampling device groups the audio signals by their mutual correlation, highly correlated audio signals fall into the same group; the reconstruction device can therefore jointly reconstruct the strongly correlated compressed data within each group, improving the accuracy of audio signal reconstruction, while multiple groups are jointly reconstructed in parallel, improving the speed of joint reconstruction.
  • FIG. 1 is a schematic diagram of a remote conference call system according to an embodiment of the present invention;
  • FIG. 2 is a schematic flowchart of a method for compressing and sampling an audio signal according to an embodiment of the present invention;
  • FIG. 3 is a schematic flowchart of a method for reconstructing an audio signal according to an embodiment of the present invention;
  • FIG. 4 is a schematic flowchart of a method for compressing and reconstructing an audio signal according to an embodiment of the present invention;
  • FIG. 5 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
  • FIG. 6 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
  • FIG. 7 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
  • FIG. 8 is a schematic structural diagram of a computing device according to an embodiment of the present invention;
  • FIG. 9 is a schematic structural diagram of a computing device according to an embodiment of the present invention;
  • FIG. 10 is a schematic structural diagram of a computing device according to an embodiment of the present invention.
  • audio signal compression sampling and reconstruction can be applied to various application scenarios, such as a remote teleconferencing system.
  • the system can include a computing device that includes a microphone array, and a remote terminal.
  • the computing device can compress and sample the audio signal corresponding to the speaker and transmit the data to the remote terminal by wire or wirelessly; the remote terminal can then reconstruct the received data to obtain the original audio signal, so that the user on the microphone array side can hold a real-time conference call with the user on the remote terminal side.
  • the microphone array may be a group of microphones arranged at certain distances from one another; by exploiting the small differences in the time at which a sound wave arrives at each microphone in the array, the array can achieve better directivity than a single microphone.
  • the computing device can include at least two microphones, a sound source processing module, and an audio data output module.
  • the sound source processing module is configured to compress and sample the audio collected by the microphone
  • the audio data output module is configured to quantize the compressed sampled data and transmit the data to the remote terminal.
  • the remote terminal that communicates with the computing device can be a personal computer (PC), a smart phone, a multimedia terminal, or the like.
  • the present invention proposes a compression sampling method for audio signals, which groups multiple audio signals by the correlation between at least two audio signals of at least two channels and obtains grouping information of the groups in which the channels are located, so that when the remote terminal performs signal reconstruction, the highly correlated audio signals within a group can be jointly reconstructed, improving the accuracy of signal reconstruction. Correspondingly, the embodiment of the invention further provides a method for reconstructing an audio signal.
  • the terminal may group the compressed data of the at least two audio signals according to the grouping information, so that the compressed data in each group is jointly reconstructed. Because the correlation between the audio signals within a group is high, the reconstruction accuracy of the signal can be effectively improved, and reconstructing the channels of multiple groups in parallel improves the signal reconstruction speed.
  • An embodiment of the present invention provides a compression sampling method for an audio signal, as shown in FIG. 2, including:
  • the computing device acquires at least two audio signals of at least two channels, where at least two channels are in one-to-one correspondence with at least two audio signals.
  • the microphone array in the computing device can acquire at least two audio signals of at least two channels when the person speaks, and the channels are in one-to-one correspondence with the audio signals.
  • the computing device calculates a correlation between the at least two audio signals, and groups the at least two audio signals according to the correlation, thereby obtaining grouping information of a group in which the channels in the at least two channels are located.
  • In the present invention, the correlation between channels can be obtained by taking one frame of data from each channel's audio signal; audio signals with high correlation are then grouped together. This can be understood as dividing the multi-microphone array into multiple sub-arrays: joint reconstruction within a sub-array during reconstruction improves the accuracy of audio signal reconstruction, while reconstructing multiple sub-arrays in parallel improves the speed of joint reconstruction.
  • the grouping information can distinguish different groups with different identifiers, that is, each channel corresponds to the group identifier of the group to which it belongs.
  • the computing device performs time domain to frequency domain transform on the at least two audio signals, to obtain at least two sets of frequency domain coefficients, where at least two sets of frequency domain coefficients are in one-to-one correspondence with at least two audio signals.
  • the computing device can transform one frame of the audio signal corresponding to each channel from the time domain to the frequency domain, obtaining the frequency domain coefficients corresponding to that frame of data; in this way, one frame of data of each of the at least two audio signals is transformed from the time domain to the frequency domain, yielding at least two sets of frequency domain coefficients.
  • in the time domain the shape of the signal can be observed intuitively, but the signal cannot be described accurately with a limited number of parameters; frequency domain analysis decomposes a complex signal into a superposition of simple signals, giving a more accurate understanding of the signal structure. Specifically, the Modified Discrete Cosine Transform (MDCT) algorithm may be used to transform one frame of the audio signal from the time domain to the frequency domain; other algorithms may also be used, which is not limited in this application.
  • the computing device acquires a measurement matrix, and samples at least two sets of frequency domain coefficients according to the measurement matrix, so as to obtain compressed data corresponding to at least two audio signals.
  • Compressed data can be understood as compressed sampled data.
  • the computing device can determine the number of columns of the measurement matrix to be generated according to the preset frame length; for example, if the frame length is 4096, the number of columns of the measurement matrix to be generated is 4096. The number of rows can be obtained from the preset compression ratio: for example, with a preset compression ratio of 1/3, the number of rows of the measurement matrix to be generated is the number of columns multiplied by the compression ratio, i.e., 4096 times 1/3. The measurement matrix is then generated according to the preset matrix type and the obtained numbers of rows and columns.
  • the computing device can perform compressed sampling by multiplying the measurement matrix by each of the at least two sets of frequency domain coefficients, obtaining the compressed data of the at least two audio signals.
  • the compressed data corresponding to the at least two audio signals obtained after sampling may be quantized to obtain a quantized value.
  • quantization approximates the continuously varying amplitude with a finite set of amplitude values, converting the continuous amplitude of the analog signal into a finite number of discrete values at certain intervals, so that the quantized values can be encoded into a signal and transmitted to the remote terminal.
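  • The compressed sampling and quantization steps above can be sketched as follows. The Gaussian measurement matrix and the 12-bit uniform quantizer are illustrative assumptions only (the embodiments described later use structured measurement matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, ratio = 4096, 1 / 3
m = int(n * ratio)                        # rows = columns x compression ratio -> 1365

theta = np.zeros(n)                       # sparse frequency-domain coefficient vector
theta[rng.choice(n, 40, replace=False)] = rng.standard_normal(40)

phi = rng.standard_normal((m, n)) / np.sqrt(m)   # illustrative random measurement matrix
y = phi @ theta                           # compressed sampling: m values instead of n

# Uniform quantization to 12-bit levels, then inverse quantization at the receiver.
step = (y.max() - y.min()) / (2 ** 12 - 1)
q = np.round((y - y.min()) / step).astype(np.int32)
y_hat = q * step + y.min()                # dequantized values, error at most step/2
```

  The receiver works with `y_hat`; the quantization error per sample is bounded by half the quantization step.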
  • the compressed sampling method of the audio signal provided by the embodiment of the present invention can group at least two audio signals according to the correlation between the audio signals during compression sampling, and obtain group information of the group in which the channel in at least two channels is located.
  • this enables the reconstruction device to reconstruct the groups in parallel according to the grouping information. Since the audio signals within a group are highly correlated, they can be jointly reconstructed, thereby improving both the speed and the accuracy of audio signal reconstruction.
  • the embodiment of the present invention provides a method for reconstructing an audio signal. As shown in FIG. 3, after the foregoing step 204, the method further includes:
  • the terminal acquires compressed data corresponding to at least two audio signals of at least two channels, the at least two channels being in one-to-one correspondence with the at least two audio signals.
  • the terminal may receive the compressed data corresponding to the at least two audio signals of the at least two channels sent by the computing device, the at least two channels corresponding one-to-one with the at least two audio signals.
  • the compressed data can be understood as compressed and sampled data.
  • after receiving the measurement data of the audio signals, the terminal needs to inversely quantize it to obtain the compressed sampled data of the at least two audio signals, that is, the compressed data.
  • the terminal acquires group information of a group in which the channel corresponding to the at least two audio signals is located.
  • the received signal may carry parameters for reconstructing the audio signals, and the parameters may include the signal frame length of the audio signals, the measurement matrix, the sparse basis, and the grouping information of the channels. The signal frame length is the frame length corresponding to one frame of data of each channel; the measurement matrix is the matrix used for compressed sampling during the compression of the audio signals; the sparse basis is the algorithm used for the sparse transform from the time domain to the frequency domain during compressed sampling; and the grouping information of the channels is the grouping of the audio signals determined according to the correlation between them during compressed sampling, which may include the group identifier of the channel corresponding to each of the at least two audio signals.
  • the terminal groups the compressed data of the at least two audio signals according to the grouping information, to obtain a compressed data group.
  • the terminal may group the compressed data of the at least two audio signals according to the group identifiers of the channels corresponding to the at least two audio signals, that is, the compressed data of audio signals whose channels carry the same group identifier are placed into one group.
  • the terminal acquires a measurement matrix, and jointly reconstructs a frequency domain coefficient corresponding to the compressed data in the compressed data group according to the compressed data and the measurement matrix in the compressed data group.
  • for example, the terminal may calculate the frequency domain coefficients corresponding to the compressed data of another channel in the compressed data group according to the frequency domain coefficients corresponding to the compressed data of one channel in the group, the compressed data of the other channel, and the measurement matrix.
  • the calculation may use the approximate message passing (AMP) algorithm, or other algorithms, which is not limited in this application. In addition, joint reconstruction may be performed in parallel across compressed data groups, that is, a strategy of joint reconstruction within each group and parallel processing between different groups.
  • the terminal performs frequency domain to time domain transformation on the frequency domain coefficients, thereby obtaining an audio signal corresponding to the compressed data in the compressed data group.
  • the terminal may perform frequency domain to time domain transformation on the frequency domain coefficients of the compressed data.
  • for example, if the transform from the time domain to the frequency domain used the MDCT algorithm, the sparse basis in the parameters obtained by the terminal is the indication information of the MDCT algorithm, and the terminal then performs the frequency-to-time transform on the frequency domain coefficients using the inverse MDCT algorithm, thereby obtaining the audio signal collected by the computing device.
  • the terminal may group the compressed data of the at least two audio signals according to the grouping information, and then perform parallel reconstruction between the groups according to the measurement matrix and the compressed data in each compressed data group. Because the groups are reconstructed in parallel and the compressed data within a group are strongly correlated, both the speed and the accuracy of signal reconstruction can be improved.
  • the computing device acquires at least two audio signals of at least two channels, the at least two channels being in one-to-one correspondence with the at least two audio signals.
  • the microphone array in the computing device can acquire at least two audio signals of at least two channels when the person speaks, and the channels are in one-to-one correspondence with the audio signals.
  • the computing device determines speech/music tag information of the at least two audio signals, the speech/music tag information being used to indicate whether the at least two audio signals are speech signals or music signals. The music signals include signals from wind instruments, string instruments, and percussion instruments.
  • for example, the computing device may select one channel from the 32 channels, take one frame of the audio signal from that channel with a frame length of 4096, and detect whether the frame contains a speech component; if so, it determines that the speech/music tag information indicates that the at least two audio signals are speech signals, and if not, that the at least two audio signals are music signals.
  • the computing device determines a frame length of the at least two audio signals according to the speech/music tag information.
  • the frame length for music signals can be preset to be long, and that for speech signals short. For music signals, whose characteristics vary relatively slowly with time, increasing the signal frame length improves the accuracy of signal reconstruction on the one hand; on the other hand, for a signal of the same total length, increasing the frame length reduces the number of signal frames and thus further reduces the running time of the signal processing algorithm.
  • for example, if the speech/music tag information indicates that the at least two audio signals are music signals, the frame length of the audio signals is determined to be 4096; if it indicates that the at least two audio signals are speech signals, the frame length is determined to be 1024.
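  • A minimal sketch of this frame-length selection (the values 1024 and 4096 are the ones used in the example; the function name is illustrative):

```python
def select_frame_length(is_speech: bool) -> int:
    # Speech varies quickly, so shorter frames; music varies slowly,
    # so longer frames improve reconstruction accuracy and reduce frame count.
    return 1024 if is_speech else 4096
```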
  • the computing device calculates a correlation between the at least two audio signals, and groups the at least two audio signals according to the correlation, thereby obtaining grouping information of the group in which the channels in the at least two channels are located.
  • the computing device determines that the frame length is 4096, then one frame of the audio signal with a frame length of 4096 is taken from each of the 32 channels, and the correlation between the at least two audio signals is calculated according to the taken audio signal of each frame.
  • for example, the computing device acquires a first audio signal of the at least two audio signals, acquires, among the remaining audio signals other than the first audio signal, the m audio signals having the highest correlation with the first audio signal, and takes the first audio signal together with those m audio signals as one group of audio signals, m being a positive integer greater than or equal to 1. It then selects a second audio signal from the audio signals other than the first audio signal and the m audio signals most correlated with it, acquires, among the remaining ungrouped audio signals, the m audio signals having the highest correlation with the second audio signal, and takes the second audio signal together with those m audio signals as another group, and so on until the grouping of the at least two audio signals is completed.
  • the correlation between the at least two audio signals includes the distance between them, i.e., the audio signals are spatially correlated; the distance may be the Euclidean distance, so the correlation strength of two audio signals can be calculated using the Euclidean distance formula.
  • for example, let the two audio signals be the first audio signal and the second audio signal, where one frame of data of the first audio signal is x = (x₁, x₂, …, xₙ) and one frame of data of the second audio signal is y = (y₁, y₂, …, yₙ). The correlation between the first audio signal and the second audio signal can then be expressed as R(x, y) = √((x₁ − y₁)² + (x₂ − y₂)² + … + (xₙ − yₙ)²), where R(x, y) represents the correlation (Euclidean distance) between the first audio signal and the second audio signal, x₁, x₂, …, xₙ represent the audio intensity at each time point in the frame of data of the first audio signal, y₁, y₂, …, yₙ represent the audio intensity at each time point in the frame of data of the second audio signal, and n represents the frame length.
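  • A direct implementation of this Euclidean-distance measure; here, as the example suggests, a smaller distance is read as a stronger spatial correlation:

```python
import numpy as np

def correlation(x, y):
    """R(x, y): Euclidean distance between one frame of each signal.
    A smaller distance is interpreted as a stronger correlation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))
```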
  • for example, the microphone array has 32 microphones corresponding to 32 channels. One frame of the audio signal of the first channel is taken first; assuming the audio signals are music signals, the frame length of one frame is 4096. The correlations between this frame and a frame of each of the other 31 channels are calculated, and the 3 channels with the strongest correlation are grouped with the first channel. Then an audio signal of one channel is selected from the remaining 28 channels, its correlations with the audio signals of the remaining 27 channels are calculated, the 3 channels with the strongest correlation are grouped with it, and so on. In this way, the 32 channels are divided into groups of 4, yielding 8 groups.
  • since the correlation of the audio signals is the degree of correlation between the audio signals in the spatial domain, the grouping of the channels, once determined, does not change.
  • the channel may be labeled with a group identification to obtain grouping information for the group in which the channels in the at least two channels are located.
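  • The greedy grouping described above can be sketched as follows; `group_size=4` matches the 32-channel example, and the Euclidean distance plays the role of the correlation measure (smaller distance = stronger correlation). The function name is illustrative:

```python
import numpy as np

def group_channels(frames, group_size=4):
    """Greedy grouping: repeatedly take the first ungrouped channel and
    attach to it the (group_size - 1) ungrouped channels closest to it,
    using the Euclidean distance between one-frame signals."""
    remaining = list(range(len(frames)))
    groups = []
    while remaining:
        seed = remaining.pop(0)
        ranked = sorted(remaining,
                        key=lambda c: np.linalg.norm(frames[seed] - frames[c]))
        members = ranked[: group_size - 1]
        for c in members:
            remaining.remove(c)
        groups.append([seed] + members)
    return groups

# Two well-separated clusters of 4 channels each
frames = [np.full(4, v) for v in [0, 0.1, 0.2, 0.3, 10, 10.1, 10.2, 10.3]]
groups = group_channels(frames)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```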
  • the computing device performs time domain to frequency domain transformation on the at least two audio signals, to obtain at least two sets of frequency domain coefficients, where at least two sets of frequency domain coefficients are in one-to-one correspondence with at least two audio signals.
  • for example, because the audio signal needs to be transformed from the time domain to the frequency domain before being transmitted to the remote terminal, the computing device may first apply a window function to the audio signal corresponding to each channel, that is, add a Hann window; in other words, one frame of the audio signal of each channel is windowed. The windowed frame of each channel then undergoes a sparse transform from the time domain to the frequency domain, yielding the sparse transform coefficient vector corresponding to that frame, that is, the frequency domain coefficients.
  • each frame of the audio signal corresponds to one set of frequency domain coefficients, that is, the at least two sets of frequency domain coefficients are in one-to-one correspondence with the at least two audio signals. The frequency domain coefficients may be obtained using the MDCT algorithm, in which case they are MDCT coefficients; other algorithms, such as the Discrete Wavelet Transform (DWT), may also be used, which is not limited in this application.
  • the computing device acquires a measurement matrix, and samples at least two sets of frequency domain coefficients according to the measurement matrix, so as to obtain compressed data corresponding to at least two audio signals.
  • the computing device can determine the dimensions of the measurement matrix to be generated according to the frame length and the compression ratio. For example, if the audio signals are music signals with a frame length of 4096, the number of columns of the measurement matrix to be generated is 4096; if the compression ratio is preset to 1/3, the number of rows can be determined from the compression ratio and the number of columns, i.e., 4096 × (1/3) rounded, which is 1365. The measurement matrix is then generated according to the number of rows, the number of columns, and the preset matrix type.
  • the measurement matrix adopts a structured measurement matrix.
  • the type of the structured measurement matrix may be any one of a partial Fourier matrix, a partial discrete cosine transform (DCT) matrix, or a partial Bernoulli random matrix.
  • taking a partial Fourier matrix as an example, the measurement matrix may be generated as follows: first, apply the Fourier transform to the 4096 × 4096 identity matrix I, i.e., take the Fourier transform of each column of the 4096 × 4096 identity matrix I, obtaining a Fourier matrix Ψ of size 4096 × 4096; then randomly extract 1365 rows of the Fourier matrix Ψ to obtain a partial Fourier matrix Φ of size 1365 × 4096.
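  • These generation steps can be sketched as follows; a smaller 1024 × 341 case is used for the demonstration (the example's 4096 × 1365 case is identical in form):

```python
import numpy as np

def partial_fourier(n, m, rng):
    """Fourier-transform each column of the n x n identity matrix to get
    the Fourier matrix, then randomly keep m of its rows to obtain the
    m x n partial Fourier measurement matrix."""
    psi = np.fft.fft(np.eye(n), axis=0)            # Fourier matrix (n x n)
    rows = rng.choice(n, size=m, replace=False)    # random row selection
    return psi[rows, :]                            # partial Fourier matrix (m x n)

rng = np.random.default_rng(0)
phi = partial_fourier(1024, 1024 // 3, rng)        # 341 x 1024 here; 1365 x 4096 in the example
```

  Because the Fourier transform of the j-th identity column is the j-th DFT basis vector, multiplying the full matrix by a signal is equivalent to taking the signal's FFT.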
  • the frequency domain coefficients are multiplied by the measurement matrix to obtain the compressed sampled values of the frame of the audio signal, and the sampled values are then quantized and transmitted to the remote terminal.
  • the terminal receives the parameters for reconstructing the audio signals sent by the computing device, where the parameters include the measurement matrix, the sparse basis, the grouping information of the channels, the speech/music tag information of the audio signals, and the signal frame length corresponding to the tag information; the speech/music tag information is used to indicate whether the at least two audio signals are speech signals or music signals.
  • when the terminal receives the signal sent by the computing device, the signal carries the parameters of the audio signals, enabling the terminal to perform signal reconstruction according to these parameters.
  • the measurement matrix is used when the terminal runs the joint reconstruction algorithm between channels; the sparse basis indicates the time-to-frequency sparse transform algorithm used by the computing device, for example the MDCT or DWT algorithm, from which the terminal can determine the corresponding inverse transform from the frequency domain to the time domain, e.g., the inverse MDCT or inverse DWT algorithm; the grouping information of the channels may include the group identifier corresponding to each channel; the speech/music tag information may indicate the type of the audio signals, which may be a speech signal or a music signal; and the parameters further include the signal frame length corresponding to that type, for example a frame length of 1024 for a speech signal and 4096 for a music signal.
  • the terminal acquires compressed data corresponding to at least two audio signals of at least two channels.
  • the terminal obtains the frame length corresponding to the speech/music tag information. If the tag indicates a speech signal, the terminal determines the frame length to be that of a speech signal; if a music signal, that of a music signal. For example, when the terminal determines that the audio signals are speech signals, the corresponding frame length is 1024; when it determines that they are music signals, the corresponding frame length is 4096.
  • the measurement data corresponding to the at least two audio signals is then extracted. For example, if the determined frame length is 4096, the terminal takes one frame of measurement data of length 4096 from the signal received on each channel, obtaining the measurement data corresponding to the at least two audio signals of the at least two channels, with the at least two channels in one-to-one correspondence with the at least two audio signals; the measurement data of each channel is then inversely quantized to obtain the compressed data of the at least two audio signals, that is, their compressed sampled data.
  • the terminal acquires group information of a group in which the channel corresponding to the at least two audio signals is located.
  • the terminal may acquire, according to the group information carried in the parameter, group information of a group in which the channel corresponding to the at least two audio signals is located, and the group information may indicate a group identifier corresponding to each channel. For example, there are 32 channels, which are divided into 8 groups of 4 channels each. The grouping information indicates that each 4 channels carry the same group identifier.
  • the terminal groups the compressed data of the at least two audio signals according to the grouping information, thereby obtaining a compressed data group.
  • the terminal may group the compressed data of the at least two audio signals corresponding to the at least two channels according to the group identifier corresponding to each channel, thereby obtaining the compressed data group.
  • for example, the 32 channels correspond to the compressed data of 32 audio signals; there are 8 group identifiers in total, and according to the group identifier corresponding to each channel, the compressed data of the audio signals of the 4 channels sharing the same identifier are grouped together, yielding 8 compressed data groups.
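  • This grouping step can be sketched as follows, assuming the compressed data arrives as one item per channel together with a per-channel group identifier (the names are illustrative):

```python
from collections import defaultdict

def group_compressed_data(compressed, group_ids):
    """Collect the compressed data of channels that share a group identifier."""
    groups = defaultdict(list)
    for channel, gid in enumerate(group_ids):
        groups[gid].append(compressed[channel])
    return dict(groups)

# 32 channels, 8 group identifiers, 4 channels per group, as in the example
ids = [ch // 4 for ch in range(32)]
data = [f"y{ch}" for ch in range(32)]
grouped = group_compressed_data(data, ids)
```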
  • the terminal acquires a measurement matrix, and jointly reconstructs a frequency domain coefficient corresponding to the compressed data in the compressed data group according to the compressed data and the measurement matrix in the compressed data group.
  • for example, the terminal obtains the measurement matrix from the received parameters, and calculates the frequency domain coefficients corresponding to the compressed data of another channel in the compressed data group according to the frequency domain coefficients corresponding to the compressed data of one channel in the group, the compressed data of the other channel, and the measurement matrix.
  • specifically, the terminal may calculate the frequency domain coefficients corresponding to the compressed data of the (i+1)-th channel in the compressed data group according to the frequency domain coefficients corresponding to the compressed data of the i-th channel, the compressed data of the (i+1)-th channel, and the measurement matrix, continuing until the frequency domain coefficients corresponding to the compressed data of the k-th channel in the group are calculated, where i is a positive integer less than k and k is the total number of channels in the compressed data group.
  • the calculation method can adopt the AMP algorithm.
  • when the terminal calculates, in each compressed data group, from the first channel through to the k-th channel the frequency domain coefficients corresponding to the compressed data, this may be called a forward AMP iteration. Afterwards, the frequency domain coefficients corresponding to the compressed data of the (j−1)-th channel may be calculated according to the frequency domain coefficients corresponding to the compressed data of the j-th channel in the group, the compressed data of the (j−1)-th channel, and the measurement matrix, until the frequency domain coefficients corresponding to the compressed data of the first channel in the group are calculated, where j is a positive integer less than or equal to k and greater than 1; calculating from the k-th channel in each compressed data group down to the first channel to obtain the frequency domain coefficients of the first channel's compressed data may be called a backward AMP iteration.
  • the specific algorithm may be as follows: the a priori frequency domain coefficients of the compressed data of another channel are determined from the frequency domain coefficients of the compressed data of one channel; that is, the frequency domain coefficients corresponding to the compressed data of the previous channel serve as the prior for the frequency domain coefficients corresponding to the compressed data of the next channel. When the frequency domain coefficients corresponding to the next channel's compressed data are calculated from that channel's compressed data and the measurement matrix, the posterior marginal probability of those frequency domain coefficients is obtained at the same time. During the forward and backward AMP iterations, if the posterior marginal probability of the frequency domain coefficients corresponding to any channel's compressed data reaches a preset value, those frequency domain coefficients are determined to be the most accurate and are used as the frequency domain coefficients corresponding to that channel's compressed data in the compressed data group.
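  • The forward/backward schedule can be sketched as below. The actual per-channel AMP update is beyond this sketch, so a Tikhonov-regularized least-squares solve stands in for it (a hypothetical simplification): the neighbouring channel's estimate acts as the prior, mirroring the way each channel's frequency domain coefficients seed the next channel's reconstruction.

```python
import numpy as np

def reconstruct_one(y, phi, prior, lam=0.1):
    """One per-channel solve: regularized least squares pulled toward the
    prior taken from the neighbouring channel (stand-in for an AMP step)."""
    n = phi.shape[1]
    a = phi.T @ phi + lam * np.eye(n)
    b = phi.T @ y + lam * prior
    return np.linalg.solve(a, b)

def joint_reconstruct(ys, phi, sweeps=2):
    """Forward sweep (channel 1 -> k) then backward sweep (k -> 1):
    each channel's estimate seeds its neighbour as a prior."""
    k, n = len(ys), phi.shape[1]
    est = [np.zeros(n) for _ in range(k)]
    for _ in range(sweeps):
        for i in range(k):                      # forward iteration
            prior = est[i - 1] if i > 0 else np.zeros(n)
            est[i] = reconstruct_one(ys[i], phi, prior)
        for i in range(k - 1, -1, -1):          # backward iteration
            prior = est[i + 1] if i < k - 1 else est[i]
            est[i] = reconstruct_one(ys[i], phi, prior)
    return est
```

  Note that if a channel's measurements are exactly consistent with the prior, the regularized solve returns the prior unchanged, which is the fixed point the sweeps converge toward for strongly correlated channels.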
  • it should be noted that, in the audio signal compressed sampling process, the present invention uses a structured measurement matrix, such as a partial Fourier matrix, a partial DCT matrix, or a partial Bernoulli random matrix.
  • ordinarily, the time complexity of the matrix multiplication is a·b (where a and b are respectively the numbers of rows and columns of the matrix); using a structured measurement matrix can significantly reduce the complexity of the algorithm while ensuring reconstruction accuracy, lowering the time complexity of the matrix multiplication to n·log(n).
  • in contrast, the multiplication time of an unstructured matrix is related to the signal frame length not linearly but quadratically, so the choice of signal frame length is limited: the longer the frame, the greater the time complexity, and for signals such as music signals whose characteristics change relatively slowly over time, the reconstruction accuracy cannot then be improved by increasing the signal frame length. Applying a structured measurement matrix makes the choice of signal frame length more flexible, improving reconstruction accuracy while reducing computation time.
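  • The complexity claim can be illustrated for the partial Fourier case: applying Φ never requires forming the dense matrix, since y = Φθ equals an FFT of θ followed by row selection. The sketch below (1024-point, hypothetical sizes) verifies that the O(n·log n) path matches the dense O(m·n) product:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1024, 341
rows = rng.choice(n, size=m, replace=False)   # the randomly kept rows
x = rng.standard_normal(n)

# Dense application: build the m x n partial Fourier matrix and multiply, O(m*n).
phi = np.fft.fft(np.eye(n), axis=0)[rows, :]
y_dense = phi @ x

# Structured application: one FFT then row selection, O(n*log n).
y_fast = np.fft.fft(x)[rows]
```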
  • the terminal performs frequency domain to time domain transform on the frequency domain coefficients, so as to obtain an audio signal corresponding to the compressed data in the compressed data group.
  • for example, if the algorithm for the inverse transform from the frequency domain to the time domain determined from the sparse basis is the inverse MDCT algorithm, the inverse MDCT algorithm is used to inverse transform the frequency domain coefficients of the compressed data of each compressed data group; the inverse transformed signal is the reconstructed time domain signal corresponding to the compressed data in the compressed data group, that is, the audio signal.
  • in the audio signal reconstruction method provided by this embodiment, after the compressed data of the at least two audio signals of the at least two channels is received, the compressed data may be grouped according to the grouping information of the groups in which the corresponding channels are located, so that the compressed data within each resulting compressed data group can be jointly reconstructed; joint reconstruction within a group improves the accuracy of reconstruction, and parallel reconstruction between groups improves the speed of joint reconstruction.
  • the above mainly describes the solutions from the perspective of interaction between the network elements, such as the computing device and the terminal. To implement the above functions, each network element includes hardware structures and/or software modules corresponding to the respective functions.
  • the present invention can be implemented in hardware, or in a combination of hardware and computer software, in conjunction with the units and algorithm steps of the examples described in the embodiments disclosed herein. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
  • the embodiments of the present invention may divide the computing device and the terminal into function modules according to the foregoing method examples; for example, each function module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module can be implemented in the form of hardware or in the form of a software function module. It should be noted that the division of modules in the embodiments of the present invention is schematic and is only a logical function division; actual implementations may use another division manner.
  • FIG. 5 is a schematic diagram of a possible structure of the terminal involved in the foregoing embodiments, where the terminal includes an obtaining unit 501, a grouping unit 502, a reconstruction unit 503, and a transform unit 504.
  • the obtaining unit 501 is configured to support the terminal in performing processes 205, 206, and 208 in FIG. 3 and processes 408, 409, and 411 in FIG. 4; the grouping unit 502 is configured to support the terminal in performing process 207 in FIG. 3 and process 410 in FIG. 4; the reconstruction unit 503 is configured to support the terminal in performing process 208 in FIG. 3 and process 411 in FIG. 4; and the transform unit 504 is configured to support the terminal in performing process 209 in FIG. 3 and process 412 in FIG. 4. All related content of the steps involved in the foregoing method embodiments may be referred to the functional descriptions of the corresponding function modules and is not described again here.
  • FIG. 6 shows a possible structural diagram of the terminal involved in the above embodiment.
  • the terminal includes a processing module 602 and a communication module 603.
  • the processing module 602 is configured to perform control management on the actions of the terminal.
  • the processing module 602 is configured to support the terminal to perform the processes 205, 206, 207, 208, and 209 in FIG. 3 and the processes 408, 409, 410, 411, and 412 in FIG. 4.
  • the communication module 603 is configured to support the terminal to perform the process 407 of FIG. 4, and/or other processes for the techniques described herein.
  • The communication module 603 is used to support communication between the terminal and other network entities, such as communication with the computing device shown in FIG. 1, FIG. 2, or FIG. 4.
  • the terminal may further include a storage module 601 for storing program codes and data of the terminal.
  • the processing module 602 can be a processor or a controller, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof, which can implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the present disclosure.
  • the processor can also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor, and the like.
  • the communication module 603 can be a transceiver, a transceiver circuit, a communication interface, or the like.
  • the storage module 601 can be a memory.
  • When the processing module 602 is a processor, the communication module 603 is a transceiver, and the storage module 601 is a memory, the terminal involved in the embodiment of the present invention may be the terminal shown in FIG. 7.
  • the terminal includes a processor 712, a transceiver 713, a memory 711, and a bus 714.
  • the transceiver 713, the processor 712, and the memory 711 are connected to each other through a bus 714.
  • The bus 714 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in FIG. 7, but this does not mean that there is only one bus or one type of bus.
  • FIG. 8 is a schematic diagram of a possible structure of the computing device involved in the foregoing embodiment, where the computing device includes: an obtaining unit 801, a grouping unit 802, a transforming unit 803, a sampling unit 804, and a determining unit 805.
  • The obtaining unit 801 is configured to support the computing device to perform the processes 201 and 204 in FIG. 2 and the process 401 in FIG. 4; the grouping unit 802 is configured to support the computing device to perform the process 202 in FIG. 2 and the process 404 in FIG. 4; the transforming unit 803 is configured to support the computing device to perform the process 203 in FIG. 2 and the process 405 in FIG. 4; and the sampling unit 804 is configured to support the computing device to perform the process 204 in FIG. 2 and the process 406 in FIG. 4.
  • The determining unit 805 is configured to support the computing device to perform the processes 402 and 403 in FIG. 4. For all the related content of the steps involved in the foregoing method embodiments, reference may be made to the function descriptions of the corresponding function modules, and details are not described herein again.
  • FIG. 9 shows a possible structural diagram of the computing device involved in the above embodiment.
  • the computing device includes a processing module 902 and a communication module 903.
  • the processing module 902 is configured to control and manage the actions of the computing device.
  • the processing module 902 is configured to support the computing device to perform the processes 202, 203, and 204 in FIG. 2 and the processes 401, 402, 403, 404, 405, and 406 in FIG. 4, and the communication module 903 is used to support the computing device to perform the process 201 of FIG. 2, and/or other processes for the techniques described herein.
  • The communication module 903 is used to support communication between the computing device and other network entities, such as communication with the terminal shown in FIG. 1, FIG. 3, or FIG. 4.
  • the computing device can also include a storage module 901 for storing program code and data of the computing device.
  • the processing module 902 can be a processor or a controller, such as a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof, which can implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the present disclosure.
  • the processor can also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor, and the like.
  • the communication module 903 can be a transceiver, a transceiver circuit, a communication interface, or the like.
  • the storage module 901 can be a memory.
  • When the processing module 902 is a processor, the communication module 903 is a transceiver, and the storage module 901 is a memory, the computing device according to the embodiment of the present invention may be the computing device shown in FIG. 10.
  • the computing device includes: an array microphone 101, a sound source processing module 102, and an audio data output module 103.
  • the array microphone, the sound source processing module, and the audio data output module are connected to each other through a bus 104; the bus 104 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in FIG. 10, but it does not mean that there is only one bus or one type of bus.
  • the steps of a method or algorithm described in connection with the present disclosure may be implemented in hardware, or may be implemented by a processor executing software instructions.
  • the software instructions may consist of corresponding software modules, which may be stored in a random access memory (RAM), a flash memory, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor to enable the processor to read information from, and write information to, the storage medium.
  • the storage medium can also be an integral part of the processor.
  • the processor and the storage medium can be located in an ASIC. Additionally, the ASIC can be located in a core network interface device.
  • the processor and the storage medium may also exist as discrete components in the core network interface device.
  • the functions described herein can be implemented in hardware, software, firmware, or any combination thereof.
  • the functions may be stored in a computer readable medium or transmitted as one or more instructions or code on a computer readable medium.
  • Computer readable media include both computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another.
  • a storage medium may be any available media that can be accessed by a general purpose or special purpose computer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

一种音频信号的重建方法和终端,涉及通信领域,能够解决信号重建精度差和速度慢的问题。其方法为:在对至少两个音频信号进行压缩采样时,将至少两个音频信号根据至少两个音频信号间的相关性进行分组,并将分组信息传递给远程终端,远程终端可以根据分组信息将至少两个音频信号对应的压缩数据进行分组,在信号重建时采用分组间并行重建,分组内联合重建。实施例用于音频信号的压缩采样和重建。

Description

一种音频信号的重建方法和装置
本申请要求于2016年9月30日提交中国专利局、申请号为201610879165.X、发明名称为“一种音频信号的重建方法和装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本发明涉及通信领域,尤其涉及一种音频信号的重建方法和装置。
背景技术
在音频信号压缩采样与重建的过程中,重建算法需要同时满足精度与速度的要求,精度过差或速度过慢都无法满足实际的应用需求。信号的压缩采样是通过将原始信号与一个测量矩阵相乘实现,测量矩阵需要传递给信号重建端以实现压缩信号的恢复。与传统的音频编解码方案相似,音频信号的压缩采样也是以“帧”为单位进行的。
对于多麦克风阵列信号处理的特殊情况,多通道的海量数据对重建算法的运算速度提出了更高的要求。同时,各个通道的接收信号之间由于存在强相关性,也给提高重建精度带来更多的可能性。对于多通道压缩采样的音频数据,可以通过并行处理的方式来提高运算速度。但并行处理要求并行的单元之间相互独立,这意味着通道之间并行将导致通道之间的相关性无法得到利用,从而使得重建精度受限。反之,对多个通道进行联合重建将导致通道之间相互耦合,无法实现并行加速。
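下面用一段Python代码示意“压缩采样即原始信号与测量矩阵相乘”这一过程(其中的维度n、m与随机测量矩阵仅为说明性假设,并非实际系统参数):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 8   # 原始信号(一帧频域系数)的长度, 演示用的假设值
m = 4   # 测量数, 对应压缩率 m/n = 1/2

phi = rng.standard_normal((m, n))   # 测量矩阵(此处用随机矩阵示意)
x = rng.standard_normal(n)          # 一帧信号

y = phi @ x   # 压缩采样: 原始信号与测量矩阵相乘, 得到 m 个测量值

assert y.shape == (m,)
```

如上所述,测量矩阵需要传递给信号重建端,以便由测量值恢复压缩前的信号。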
发明内容
本发明实施例提供一种音频信号的重建方法和终端,能够解决信号重建精度差和通道间无法并行加速的问题。
一方面,提供一种音频信号的重建方法,包括:获取至少两个通道的至少两个音频信号对应的压缩数据,至少两个通道与至少两个音频信号一一对应;获取至少两个音频信号对应的通道所在的组的分组信息;根据分组信息,将至少两个音频信号对应的压缩数据进行分组,从而得到压缩数据组;获取测量矩阵,根据压缩数据组内的压缩数据和测量矩阵,联合重建压缩数据组内的压缩数据对应的频域系数;对频域系数进行频域到时域的变换,从而获得压缩数据组内的压缩数据对应的音频信号。于是,对于音频信号的重建端来说,在接收到至少两个通道的至少两个音频信号对应的压缩数据后,可根据至少两个音频信号对应的通道所在组的分组信息将至少两个音频信号对应的压缩数据进行分组,这样可对得到的压缩数据组内的压缩数据进行联合重建,可提升组内联合重建的精度,各组间并行进行重建可提升联合重建的速度。
在一种可能的设计中,所述方法还包括:获取至少两个音频信号的语乐音标签信息,语乐音标签信息用于指示至少两个音频信号为语音信号或乐音信号;获取至少两个通道的至少两个音频信号对应的压缩数据包括:根据语乐音标签信息,获取语乐音标签信息对应的帧长;根据帧长,提取至少两个音频信号对应的测量数据; 对测量数据进行反量化,从而获得至少两个音频信号对应的压缩数据。于是,对于乐音信号这种时变特性相对缓慢的信号,一方面,可以通过增加信号帧长来提升信号重建的精度,另一方面,对于相同长度的信号,信号帧长的增加减少了需要处理的信号帧的数量,也进一步降低了信号处理算法的运行时间。
在一种可能的设计中,根据压缩数据组内的压缩数据和测量矩阵,联合重建压缩数据组内的压缩数据对应的频域系数包括:根据压缩数据组内一个通道对应的压缩数据对应的频域系数、压缩数据组内另一个通道对应的压缩数据以及测量矩阵,计算压缩数据组内另一个通道对应的压缩数据对应的频域系数。其计算方法可以为近似信息传递(Approximate Message Passing,AMP)算法,也可以为其他的算法,本申请不做限定,可以得到精度较高的压缩数据对应的频域系数。
在一种可能的设计中,根据压缩数据组内一个通道对应的压缩数据对应的频域系数、压缩数据组内另一个通道对应的压缩数据以及测量矩阵,计算压缩数据组内另一个通道对应的压缩数据对应的频域系数包括:根据压缩数据组内第i个通道对应的压缩数据对应的频域系数、压缩数据组内第i+1个通道对应的压缩数据以及测量矩阵,计算第i+1个通道对应的压缩数据对应的频域系数,直至计算得到压缩数据组内第k个通道对应的压缩数据对应的频域系数,i为小于k的正整数,k为压缩数据组内的通道总数。也即对于同一压缩数据组内的压缩数据,可对分组内的各个通道的压缩数据进行联合重建,提升频域系数的精度。
在一种可能的设计中,方法还包括:根据压缩数据组内第j个通道对应的压缩数据对应的频域系数、压缩数据组内第j-1个通道对应的压缩数据以及测量矩阵,计算第j-1个通道对应的压缩数据对应的频域系数,直至计算得到压缩数据组内第1个通道对应的压缩数据对应的频域系数,j为小于或者等于k,并且大于1的正整数。也就是说,对于同一压缩数据组内的压缩数据来说,可以从第i个通道开始计算直至得到第k个通道对应的压缩数据对应的频域系数,再从第k个通道开始计算直至得到第j个通道对应的压缩数据对应的频域系数,该过程可以称为一个算法迭代,直至得到的压缩数据对应的频域系数达到预设要求,即通过组内的压缩数据进行联合重建得到该分组的频域系数。
在一种可能的设计中,方法还包括:根据预设的初始化频域系数、第1个通道对应的压缩数据以及测量矩阵,计算第2个通道对应的压缩数据对应的频域系数。即压缩数据组内的第1个通道对应的频域系数可进行预设。
在一种可能的设计中,根据压缩数据组内一个通道对应的压缩数据对应的频域系数、压缩数据组内另一个通道对应的压缩数据以及测量矩阵,计算压缩数据组内另一个通道对应的压缩数据对应的频域系数包括:根据一个通道对应的压缩数据对应的频域系数,确定另一个通道对应的压缩数据对应的先验频域系数;将先验频域系数作为另一个通道对应的压缩数据对应的频域系数的先验,并根据另一个通道对应的压缩数据以及测量矩阵,计算压缩数据组内另一个通道对应的压缩数据对应的频域系数。也就是说,将上一通道得到的压缩数据对应的频域系数作为下一个通道对应的压缩数据对应的先验频域系数,以先验频域系数为先验,计算下一个通道对应的压缩数据对应的频域系数,以得到该压缩数据组精度较高的频域系数。
另一方面,提供一种音频信号的压缩采样方法,包括:获取至少两个通道的至少两个音频信号,至少两个通道与至少两个音频信号一一对应;计算至少两个音频信号之间的相关性,根据相关性对至少两个音频信号进行分组,从而得到至少两个通道中的通道所在的组的分组信息;对至少两个音频信号进行时域到频域的变换,从而获得至少两组频域系数,至少两组频域系数与至少两个音频信号一一对应;获取测量矩阵,根据测量矩阵对至少两组频域系数进行采样,从而获得至少两个音频信号对应的压缩数据。这样当压缩采样后的压缩数据传输至音频信号的重建装置时,重建装置可根据压缩数据携带的分组信息对至少两个音频信号的压缩数据进行分组,以便于对相关性高的通道的压缩数据进行联合重建,分组间并行重建,得到至少两个通道的音频信号,以提升信号重建的精度和速度。
在一种可能的设计中,在获取测量矩阵之前,方法还包括:确定至少两个音频信号的语乐音标签信息,语乐音标签信息用于指示至少两个音频信号为语音信号或乐音信号;根据语乐音标签信息,确定至少两个音频信号的帧长。于是,对于乐音信号这种时变特性相对缓慢的信号,一方面,可以通过增加信号帧长来提升信号重建的精度,另一方面,对于相同长度的信号,信号帧长的增加减少了需要处理的信号帧的数量,也进一步降低了信号处理算法的运行时间。
在一种可能的设计中,获取测量矩阵包括:根据帧长,获得帧长对应的测量矩阵。也即,对于语音信号和乐音信号,根据不同的帧长可生成相应的测量矩阵。例如对于乐音信号,可生成乐音结构化测量矩阵,对于语音信号,可生成语音信号结构化测量矩阵。
在一种可能的设计中,计算至少两个音频信号之间的相关性,根据相关性对至少两个音频信号进行分组包括:获取至少两个音频信号中的第一音频信号,获取除第一音频信号外其余音频信号中与第一音频信号相关性最高的前m个音频信号,并将第一音频信号和与第一音频信号相关性最高的前m个音频信号作为一组音频信号,m为大于或等于1的正整数;从除第一音频信号和与第一音频信号相关性最高的前m个音频信号外其余音频信号中继续选取第二音频信号,并获取除第一音频信号、第二音频信号和与第一音频信号相关性最高的前m个音频信号外其余音频信号中与第二音频信号相关性最高的前m个音频信号,并将第二音频信号和与第二音频信号相关性最高的前m个音频信号作为另一组音频信号,直至至少两个音频信号的分组完成。其中,计算两个音频信号之间的相关性可以通过欧氏距离算法获取,也可以通过其他的方式获取,本申请不做限定。
在一种可能的设计中,至少两个音频信号之间的相关性包括至少两个音频信号之间的距离。也就是说,两个音频信号之间的相关性可以理解为音频信号在空间上的相关性。
再一方面,提供一种音频信号的重建装置,包括:获取单元,用于获取至少两个通道的至少两个音频信号的压缩数据,至少两个通道与至少两个音频信号一一对应;获取单元,还用于获取至少两个音频信号对应的通道所在的组的分组信息;分组单元,用于根据分组信息,将至少两个音频信号的压缩数据进行分组,从而得到压缩数据组;重建单元,用于获取测量矩阵,根据压缩数据组内的压缩数据和测量矩阵,联合重建压缩数据组内的压缩数据对应的频域系数;变换单元,用于对频域系数进行频域到时域的变换,从而获得压缩数据组内的压缩数据对应的音频信号。
在一种可能的设计中,获取单元还用于:获取至少两个音频信号的语乐音标签信息,语乐音标签信息用于指示至少两个音频信号为语音信号或乐音信号;获取单元,用于:根据语乐音标签信息,获取语乐音标签信息对应的帧长;根据帧长,提取至少两个音频信号对应的测量数据;对测量数据进行反量化,从而获得至少两个音频信号对应的压缩数据。
在一种可能的设计中,重建单元用于:根据压缩数据组内一个通道对应的压缩数据对应的频域系数、压缩数据组内另一个通道对应的压缩数据以及测量矩阵,计算压缩数据组内另一个通道对应的压缩数据对应的频域系数。
在一种可能的设计中,重建单元用于:根据压缩数据组内第i个通道对应的压缩数据对应的频域系数、压缩数据组内第i+1个通道对应的压缩数据以及测量矩阵,计算第i+1个通道对应的压缩数据对应的频域系数,直至计算得到压缩数据组内第k个通道对应的压缩数据对应的频域系数,i为小于k的正整数,k为压缩数据组内的通道总数。
在一种可能的设计中,重建单元还用于:根据压缩数据组内第j个通道对应的压缩数据对应的频域系数、压缩数据组内第j-1个通道对应的压缩数据以及测量矩阵,计算第j-1个通道对应的压缩数据对应的频域系数,直至计算得到压缩数据组内第1个通道对应的压缩数据对应的频域系数,j为小于或者等于k,并且大于1的正整数。
在一种可能的设计中,重建单元还用于:根据预设的初始化频域系数、第1个通道对应的压缩数据以及测量矩阵,计算第2个通道对应的压缩数据对应的频域系数。
在一种可能的设计中,重建单元用于:根据一个通道对应的压缩数据对应的频域系数,确定另一个通道对应的压缩数据对应的先验频域系数;将先验频域系数作为另一个通道对应的压缩数据对应的频域系数的先验,并根据另一个通道对应的压缩数据以及测量矩阵,计算压缩数据组内另一个通道对应的压缩数据对应的频域系数。
又一方面,提供一种音频信号的压缩采样装置,包括:获取单元,用于获取至少两个通道的至少两个音频信号,至少两个通道与至少两个音频信号一一对应;分组单元,用于计算至少两个音频信号之间的相关性,根据相关性对至少两个音频信号进行分组,从而得到至少两个通道中的通道所在的组的分组信息;变换单元,用于对至少两个音频信号进行时域到频域的变换,从而获得至少两组频域系数,至少两组频域系数与至少两个音频信号一一对应;获取单元,还用于获取测量矩阵;采样单元,用于根据测量矩阵对至少两组频域系数进行采样,从而获得至少两个音频信号对应的压缩数据。
在一种可能的设计中,还包括确定单元,用于:确定至少两个音频信号的语乐音标签信息,语乐音标签信息用于指示至少两个音频信号为语音信号或乐音信号;根据语乐音标签信息,确定至少两个音频信号的帧长。
在一种可能的设计中,获取单元用于:根据帧长,获得帧长对应的测量矩阵。
在一种可能的设计中,分组单元用于:获取至少两个音频信号中的第一音频信号,获取除第一音频信号外其余音频信号中与第一音频信号相关性最高的前m个音频信号,并将第一音频信号和与第一音频信号相关性最高的前m个音频信号作为一组音频信号,m为大于或等于1的正整数;从除第一音频信号和与第一音频信号相关性最高的前m个音频信号外其余音频信号中继续选取第二音频信号,并获取除第一音频信号、第二音频信号和与第一音频信号相关性最高的前m个音频信号外其余音频信号中与第二音频信号相关性最高的前m个音频信号,并将第二音频信号和与第二音频信号相关性最高的前m个音频信号作为另一组音频信号,直至至少两个音频信号的分组完成。
在一种可能的设计中,至少两个音频信号之间的相关性包括至少两个音频信号之间的距离。
由此一来,在本发明实施例中,音频信号的压缩采样装置根据至少两个音频信号之间的相关性对至少两个音频信号进行分组,从而得到至少两个通道中的通道所在组的分组信息;而后对至少两个音频信号进行时域到频域的变换,从而获得至少两组频域系数,至少两组频域系数与至少两个音频信号一一对应;获取测量矩阵,根据测量矩阵对至少两组频域系数进行采样,从而获得至少两个音频信号对应的压缩数据。这样,将至少两个通道的至少两个音频信号对应的压缩数据传输至音频信号的重建装置时,可携带至少两个音频信号对应的通道所在的组的分组信息,以便重建装置可根据分组信息将至少两个音频信号的压缩数据进行分组,从而得到压缩数据组,进而根据压缩数据组内的压缩数据和测量矩阵,联合重建压缩数据组内的压缩数据对应的频域系数,而后对频域系数进行频域到时域的变换,从而获得压缩数据组内的压缩数据对应的音频信号。也就是说,重建装置在进行联合重建时,是对压缩数据组内的压缩数据进行联合重建,由于压缩采样装置在对音频信号进行分组时是根据至少两个音频信号之间的相关性进行分组的,即相关性强的音频信号分为一组,因此重建装置在重建音频信号时可根据组内相关性强的压缩数据进行联合重建,可以提升音频信号重建的精度;多个分组可以并行进行联合重建,从而可提升联合重建的速度。
附图说明
为了更清楚地说明本发明实施例的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本发明实施例提供的一种远程电话会议系统的示意图;
图2为本发明实施例提供的一种音频信号的压缩采样方法的流程示意图;
图3为本发明实施例提供的一种音频信号的重建方法的流程示意图;
图4为本发明实施例提供的一种音频信号压缩和重建方法的流程示意图;
图5为本发明实施例提供的一种终端的结构示意图;
图6为本发明实施例提供的一种终端的结构示意图;
图7为本发明实施例提供的一种终端的结构示意图;
图8为本发明实施例提供的一种计算设备的结构示意图;
图9为本发明实施例提供的一种计算设备的结构示意图;
图10为本发明实施例提供的一种计算设备的结构示意图。
具体实施方式
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。
在本发明实施例中,音频信号压缩采样与重建可应用于多种应用场景,例如远程电话会议系统,如图1所示,该系统可包括麦克风阵列的计算设备和远程终端,该包括麦克风阵列的计算设备可对说话人对应的音频信号进行压缩采样,并通过有线或无线的方式传输至远程终端,远程终端可以对接收到的数据进行重建,得到原始的音频信号,以便麦克风阵列侧的用户与远程终端侧的用户实时进行电话会议。
在本发明实施例中,麦克风阵列可以是按一定距离排列放置的一组麦克风,通过声波抵达阵列中每个麦克风之间的微小时差的相互作用,麦克风阵列可以得到比单个的麦克风更好的指向性。计算设备可以包括至少两个麦克风、声源处理模块和音频数据输出模块。声源处理模块用于对麦克风采集到的音频进行压缩采样,音频数据输出模块用于对压缩采样后的数据进行量化后传输至远程终端。与计算设备通信的远程终端可以为个人电脑(Personal Computer,PC)、智能手机、多媒体终端等。
本发明为了解决多通道联合重建时信号重建精度差的问题,提出了一种音频信号的压缩采样方法,通过至少两个通道的至少两个音频信号之间的相关性对多个音频信号进行分组,从而得到至少两个通道中的通道所在组的分组信息,以便于在远程终端进行信号重建时将组内相关性高的音频信号之间进行联合重建,提升信号重建的精度,相应地,本发明实施例还提供一种音频信号的重建方法,终端在接收到至少两个通道的至少两个音频信号的压缩数据时,可根据分组信息对至少两个音频信号的压缩数据进行分组,以对每个组内的压缩数据进行联合重建,由于组内音频信号间相关性高,可有效提升信号的重建精度,多个分组通道并行进行重建,可提升信号的重建速度。
本发明实施例提供一种音频信号的压缩采样方法,如图2所示,包括:
201、计算设备获取至少两个通道的至少两个音频信号,至少两个通道与至少两个音频信号一一对应。
计算设备中的麦克风阵列可在人说话时采集到至少两个通道的至少两个音频信号,通道与音频信号一一对应。
202、计算设备计算至少两个音频信号之间的相关性,根据相关性对至少两个音频信号进行分组,从而得到至少两个通道中的通道所在的组的分组信息。
例如多麦克风阵列的通道之间的相对时延不同,使得不同通道的音频信号之间的相关程度有差别,也使得所有通道的音频信号联合重建精度受到影响。因此,本发明可以从每个通道的音频信号中各取一帧数据获取通道之间的相关性,进而可将相关性强度大的音频信号分为一组,可以理解为将多麦克风阵列划分为多个子阵列,以便在联合重建时在子阵列内联合重建,可提升子阵列之间并行加速和子阵列内通道间的音频信号联合重建的精度,同时对多个子阵列进行重建,可提升联合重建的速度。
分组信息可以用不同的标识区别不同的分组,即每个通道对应自身所属的组的分组标识。
203、计算设备对至少两个音频信号进行时域到频域的变换,从而获得至少两组频域系数,至少两组频域系数与至少两个音频信号一一对应。
计算设备可对各通道对应的音频信号的一帧数据进行从时域到频域的变换,得到一帧数据对应的频域系数,这样至少两个音频信号分别对应的一帧数据从时域变换到频域,可得到至少两组频域系数。这是由于时域可以直观地观测到信号的形状,但是不能用有限的参数对信号进行准确的描述,而频域分析可以将复杂信号分解为简单信号的叠加,可以更加精确地了解信号的“构造”。具体可以通过修正离散余弦变换(Modified Discrete Cosine Transform,MDCT)算法将音频信号的一帧数据从时域变换到频域,也可以采用其它的算法,本申请不做限定。
204、计算设备获取测量矩阵,根据测量矩阵对至少两组频域系数进行采样,从而获得至少两个音频信号对应的压缩数据。
压缩数据可以理解为压缩采样后的数据。
计算设备可以根据预设的帧长确定待生成的测量矩阵的列数,例如帧长为4096,那么待生成的测量矩阵的列数为4096,根据预设的压缩率可获知待生成的测量矩阵的行数,例如预设的压缩率为1/3,则待生成的测量矩阵的行数为列数4096乘以1/3取整得到,而后,再根据预设的测量矩阵的类型和所获得的行数和列数生成测量矩阵。而后,计算设备可以将测量矩阵与至少两组频域系数相乘进行压缩采样,得到至少两个音频信号的压缩数据。
而后,可以对采样后得到的至少两个音频信号对应的压缩数据进行量化,得到量化后的值。其中量化是用有限个幅度值近似原来连续变化的幅度值,把模拟信号的连续幅度变为有限数量的有一定间隔的离散值,从而可以对量化后的值进行编码,得到用于传输的信号,传输至远程终端。
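量化与反量化可用如下的均匀量化代码简化示意(Python,量化间隔step为演示用的假设值,实际系统的量化方式不限于此):

```python
import numpy as np

STEP = 0.01   # 量化间隔, 演示用的假设值

def quantize(values, step=STEP):
    """量化: 用有一定间隔的离散电平(整数索引)近似连续变化的幅度值。"""
    return np.round(values / step).astype(np.int32)

def dequantize(indices, step=STEP):
    """反量化: 由离散电平索引恢复近似的幅度值。"""
    return indices.astype(np.float64) * step

y = np.array([0.1234, -0.5678, 0.0009])
y_hat = dequantize(quantize(y))
# 均匀量化的误差不超过半个量化间隔
assert np.all(np.abs(y - y_hat) <= STEP / 2 + 1e-12)
```

重建端收到测量数据后先做反量化,再进行后续的联合重建。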
因此,本发明实施例提供的音频信号的压缩采样方法,能够在压缩采样时根据音频信号之间的相关性将至少两个音频信号进行分组,得到至少两个通道中的通道所在组的分组信息,可以使得音频信号在重建装置中根据分组信息实施组间并行重建,由于组内的音频信号相关性高,组内的音频信号可以联合重建,从而提升音频信号重建的速度与精度。
在上述音频信号压缩采样的基础上,本发明实施例提供一种音频信号的重建方法,如图3所示,在上述步骤204之后,该方法还包括:
205、终端获取至少两个通道的至少两个音频信号对应的压缩数据,至少两个通道与至少两个音频信号一一对应。
当与终端无线或有线连接的计算设备中的麦克风阵列采集到声音时,终端可接收到计算设备发送的至少两个通道的至少两个音频信号对应的压缩数据,至少两个通道与至少两个音频信号一一对应。压缩数据可以理解为压缩采样后的数据,终端在接收到的音频信号的数据时,需要对音频信号的测量数据进行反量化,得到至少两个音频信号压缩采样后的数据,即压缩数据。
206、终端获取至少两个音频信号对应的通道所在的组的分组信息。
至少两个通道的至少两个音频信号中可以携带用于重建音频信号的参数,参数可以包括音频信号的信号帧长、测量矩阵、稀疏基以及通道的分组信息。
其中信号帧长即为每个通道的一帧数据对应的帧长,测量矩阵为音频信号在压缩采样过程中生成的用于压缩采样的矩阵,稀疏基是压缩采样过程中从时域到频域进行稀疏变换所使用的算法,通道的分组信息则是压缩采样过程中根据音频信号间的相关性确定的音频信号的分组情况,分组信息中可以包括至少两个音频信号对应的通道所在的组的标识。
207、终端根据分组信息,将至少两个音频信号的压缩数据进行分组,从而得到压缩数据组。
终端可以根据至少两个音频信号对应的通道所在组的标识对至少两个音频信号的压缩数据进行分组,即将组的标识相同的音频信号的压缩数据分为一组。
208、终端获取测量矩阵,根据压缩数据组内的压缩数据和测量矩阵,联合重建压缩数据组内的压缩数据对应的频域系数。
终端可以根据压缩数据组内一个通道对应的压缩数据对应的频域系数、压缩数据组内另一个通道对应的压缩数据以及测量矩阵,计算压缩数据组内另一个通道对应的压缩数据对应的频域系数。计算方法可以采用近似信息传递AMP算法,也可以采用其它算法,本申请不做限定,同时,各压缩数据组间可以并行进行联合重建,即采用的是组内联合重建,不同组之间并行处理的策略。
209、终端对频域系数进行频域到时域的变换,从而获得压缩数据组内的压缩数据对应的音频信号。
终端在得到的每个通道的压缩数据的频域系数后,可以对该压缩数据的频域系数进行频域到时域的变换。例如压缩采样过程中,从时域到频域的变换采用的是MDCT算法,终端得到的参数中的稀疏基即为MDCT算法的指示信息,那么终端在对数据的频域系数进行频域到时域的逆变换时则采用逆MDCT算法,得到计算设备采集到的音频信号。
因此,本发明实施例提供的音频信号的重建方法中,终端可根据分组信息对至少两个音频信号的压缩数据进行分组,以便根据测量矩阵和压缩数据组内的压缩数据进行组间并行重建和组内联合重建,由于组间并行重建,且压缩数据组内的压缩数据具有强的相关性,可以提升信号重建的速度和精度。
下面对本发明的实施例进一步详细说明,本发明实施例提供一种音频信号的压缩采样和重建方法,以k=32通道的音频信号为例,如图4所示,该方法包括:
401、计算设备获取至少两个通道的至少两个音频信号,至少两个通道与至少两个音频信号一一对应。
计算设备中的麦克风阵列可在人说话时采集到至少两个通道的至少两个音频信号,通道与音频信号一一对应。
402、计算设备确定至少两个音频信号的语乐音标签信息,语乐音标签信息用于指示至少两个音频信号为语音信号或乐音信号。
乐音信号包括管乐器、弦乐器以及打击乐器等发出的信号。
本发明实施例中,计算设备可以从32个通道中任选一通道,以帧长为4096从该通道取一帧音频信号,检测该帧音频信号中是否包含语音成分,如果包含,则确定语乐音标签信息指示至少两个音频信号为语音信号,如果不包含,则确定语乐音标签信息指示至少两个音频信号为乐音信号。
403、计算设备根据语乐音标签信息确定至少两个音频信号的帧长。
由于乐音信号变化平缓,语音信号变化快,可以预设乐音信号的帧长较长,语音信号的帧长较短。于是,对于乐音信号这种时变特性相对缓慢的信号,一方面,可以通过增加信号帧长来提升信号重建的精度,另一方面,对于相同长度的信号,信号帧长的增加减少了需要处理的信号帧的数量,也进一步降低了信号处理算法的运行时间。
以乐音信号的帧长MuLen=4096,语音信号的帧长SpLen=1024为例,如果确定语乐音标签信息指示至少两个音频信号为乐音信号,则确定音频信号的帧长为4096,如果确定语乐音标签信息指示至少两个音频信号为语音信号,则确定音频信号的帧长为1024。
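按语乐音标签选择帧长的逻辑可简单示意如下(Python,帧长取文中示例值4096与1024):

```python
MU_LEN = 4096   # 文中示例的乐音信号帧长
SP_LEN = 1024   # 文中示例的语音信号帧长

def frame_length(is_speech):
    """根据语乐音标签确定帧长: 语音信号时变快用短帧, 乐音信号平缓用长帧。"""
    return SP_LEN if is_speech else MU_LEN

assert frame_length(True) == 1024
assert frame_length(False) == 4096
```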
404、计算设备计算至少两个音频信号之间的相关性,根据相关性对至少两个音频信号进行分组,从而得到至少两个通道中的通道所在的组的分组信息。
若计算设备确定帧长为4096,则从32个通道中分别取帧长为4096的一帧音频信号,并根据所取的每帧音频信号计算至少两个音频信号之间的相关性。
示例性的,计算设备获取至少两个音频信号中的第一音频信号,获取除第一音频信号外其余音频信号中与第一音频信号相关性最高的前m个音频信号,并将第一音频信号和与第一音频信号相关性最高的前m个音频信号作为一组音频信号,m为大于或等于1的正整数;从除第一音频信号和与第一音频信号相关性最高的前m个音频信号外其余音频信号中继续选取第二音频信号,并获取除第一音频信号、第二音频信号和与第一音频信号相关性最高的前m个音频信号外其余音频信号中与第二音频信号相关性最高的前m个音频信号,并将第二音频信号和与第二音频信号相关性最高的前m个音频信号作为另一组音频信号,直至至少两个音频信号的分组完成。
其中,至少两个音频信号之间的相关性包括至少两个音频信号之间的距离,即音频信号在空间上相关,该距离可以是欧氏距离,因此在计算两个音频信号的相关性强度时,可以利用欧式距离公式进行计算。
示例性的,若两个音频信号为第一音频信号和第二音频信号,第一音频信号的一帧数据为x=(x1,x2,…,xn),第二音频信号的一帧数据为y=(y1,y2,…,yn),则第一音频信号与第二音频信号的相关性表示为:
R(x,y)=√((x1-y1)²+(x2-y2)²+…+(xn-yn)²)
R(x,y)表示第一音频信号和第二音频信号的相关性,x1,x2,…xn表示第一音频信号对应的一帧数据中各时间点对应的音频强度,y1,y2,…yn表示第二音频信号对应的一帧数据中各时间点对应的音频强度,n表示帧长。
当R(x,y)值越大,表示两个音频信号的相关性强度越小,反之,表示两个音频信号的相关性强度越大。
示例性的,麦克风阵列有32个麦克风,则对应32个通道。首先任取第1个通道的一帧音频信号,假设音频信号为乐音信号,则一帧音频信号的帧长为4096。该第1个通道的一帧音频信号x=(x1,x2,…,xn),n表示帧长,也对应采集到的n个时间点的音频强度,xn表示第1个通道的音频信号在时域上的第n个时间点的音频强度。同理,每个通道的一帧音频信号的帧长相同,当第2个通道的一帧音频信号y=(y1,y2,…,yn)时,yn表示第2个通道的音频信号在时域上的第n个时间点的音频强度。在根据欧氏距离公式获取第1个通道的音频信号与第2个通道的音频信号的相关性之后,继续获取第1个通道的音频信号与第3个通道的音频信号的相关性,直至获取第1个通道的音频信号与第32个通道的音频信号的相关性,然后选取其余31个通道中与第1个通道的音频信号相关性强度最强的前3个通道,将第1个通道与该前3个通道分为一组;而后再从剩余的28个通道中任选一个通道,获取剩余的27个通道中与该通道的音频信号相关性强度最强的前3个通道,与之分为一组,以此类推,将32个通道每4个分为一组,共分为8组。
由于音频信号的相关性是音频信号间在空间域中的相关程度,因此,当取每个通道中的一帧的音频信号并将通道进行分组之后,通道的分组即确定不变。在确定通道的分组情况时,可将通道标示分组标识,以得到至少两个通道中的通道所在的组的分组信息。
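上述基于欧氏距离的贪心分组过程可用如下Python代码示意(帧数据与m的取值仅为演示假设):

```python
import numpy as np

def euclid(x, y):
    """两帧音频的欧氏距离: 距离越小, 相关性强度越大。"""
    return float(np.sqrt(np.sum((x - y) ** 2)))

def group_channels(frames, m):
    """贪心分组: 每次任取一个未分组通道, 把与它距离最小(相关性最强)的
    前 m 个未分组通道与之分为一组, 直至所有通道分组完成。"""
    remaining = list(range(len(frames)))
    groups = []
    while remaining:
        first = remaining.pop(0)
        remaining.sort(key=lambda ch: euclid(frames[first], frames[ch]))
        groups.append([first] + remaining[:m])
        remaining = remaining[m:]
    return groups

# 4 个通道、m=1 时, 空间上接近的通道被分到同一组
frames = [np.array([0.0, 0.0]), np.array([0.1, 0.0]),
          np.array([5.0, 5.0]), np.array([5.1, 5.0])]
assert group_channels(frames, 1) == [[0, 1], [2, 3]]
```

对应文中示例,32个通道、m=3时即得到8个4通道的分组。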
405、计算设备对至少两个音频信号进行时域到频域的变换,从而获得至少两组频域系数,至少两组频域系数与至少两个音频信号一一对应。
计算设备在获取到通道的分组情况后,可以对各通道对应的音频信号添加窗函数,即加Hann窗,这是由于音频信号在传输时需要从时域变换到频域后传输至远程终端,当实现工程测试信号处理时,不可能在时域上对无限长的信号进行测量和运算,而是取其有限的时间片段进行分析,然后用截取的信号时间片段进行周期延拓处理,得到虚拟的无限长的信号,然后就可以对信号进行相关分析等数学处理。但是无限长的信号被截断以后,其频谱发生了畸变,会产生频谱能量泄漏,可采用不同的截取函数对信号进行截断,截断函数称为窗函数,简称为窗。
由于计算设备是对一帧音频信号处理并传输至远程终端后再处理下一帧音频信号,因此,这里的加窗函数是对各通道的一帧音频信号进行加窗。在加窗函数之后,将每个通道的一帧加窗后的音频信号实施从时域至频域的稀疏变换,得到一帧音频信号对应的稀疏变换系数向量,即频域系数。每一帧音频信号对应一组频域系数,即至少两组频域系数与至少两个音频信号一一对应。这里除可以采用MDCT算法(得到的频域系数为MDCT系数)以外,也可以采用其它的算法,比如离散小波变换(Discrete Wavelet Transform,DWT)等,本申请不做限定。
406、计算设备获取测量矩阵,根据测量矩阵对至少两组频域系数进行采样,从而获得至少两个音频信号对应的压缩数据。
计算设备可以根据帧长和压缩率确定待生成的测量矩阵的列数,例如音频信号为乐音信号,帧长为4096,那么待生成的测量矩阵的列数为4096。例如压缩率预设为1/3,则可以根据压缩率和确定的测量矩阵的列数确定行数,行数为4096*(1/3)取整,即为1365。在确定测量矩阵的行数和列数后,进而根据行数、列数和预设的测量矩阵的类型生成测量矩阵。在本发明实施例中,测量矩阵采用结构化测量矩阵,例如结构化测量矩阵的类型可以为部分傅里叶矩阵、部分离散余弦变换(DCT)矩阵或部分伯努利随机矩阵中的任一种。
以测量矩阵的类型为部分傅里叶矩阵为例,生成测量矩阵的实现方式可以为:首先对单位矩阵I4096×4096做傅里叶变换,即对单位矩阵I4096×4096的每一列做傅里叶变换得到傅里叶矩阵Φ4096×4096,然后随机抽取傅里叶矩阵Φ4096×4096的1365行得到部分傅里叶矩阵Φ1365×4096。
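部分傅里叶测量矩阵的构造可用如下Python代码示意(为便于演示采用小规模的n、m,文中示例为n=4096、m=1365):

```python
import numpy as np

def partial_fourier_matrix(n, m, rng):
    """对 n×n 单位矩阵的每一列做傅里叶变换得到傅里叶矩阵,
    再随机抽取其中 m 行, 得到 m×n 的部分傅里叶测量矩阵。"""
    full = np.fft.fft(np.eye(n), axis=0)          # 逐列 FFT
    rows = rng.choice(n, size=m, replace=False)   # 随机抽取 m 行
    return full[rows, :]

rng = np.random.default_rng(0)
phi = partial_fourier_matrix(16, 5, rng)
assert phi.shape == (5, 16)
assert np.allclose(np.abs(phi), 1.0)   # 傅里叶矩阵各元素的模均为 1
```

将该矩阵与一组频域系数相乘,即得到该帧音频信号压缩采样后的测量值。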
对于任一通道的一帧音频信号对应的一组频域系数,将该频域系数与测量矩阵相乘,得到该帧音频信号压缩采样后的值,进而对采样后的值进行量化,得到发送给远程终端的信号,并传输至远程终端。
407、终端接收计算设备发送的重建音频信号的参数,参数包括测量矩阵、稀疏基、通道的分组信息、音频信号的语乐音标签信息以及语乐音标签信息对应的信号帧长,语乐音标签信息用于指示至少两个音频信号为语音信号或乐音信号。
终端在接收到计算设备发送的信号时,该信号中携带有音频信号的参数,用于使终端根据该参数进行信号重建。测量矩阵用于终端在通道间联合重建的算法时使用;稀疏基表示计算设备进行时域到频域的稀疏变换算法,例如可以为MDCT算法、DWT算法等,终端可以根据该稀疏基确定终端进行频域到时域的逆变换时的算法,相应地可以为逆MDCT算法、逆DWT算法等;通道的分组信息可以包括每个通道对应的组标识;音频信号的语乐音标签信息可以指示音频信号的类型,可以包括语音信号和乐音信号,该参数还包括音频信号的类型对应的信号帧长,例如语音信号的信号帧长为1024,乐音信号的信号帧长为4096。
408、终端获取至少两个通道的至少两个音频信号对应的压缩数据。
终端根据语乐音标签信息获取语乐音标签信息对应的帧长。若确定为语音信号,则终端确定帧长为语音信号对应的帧长;若确定为乐音信号,则终端确定帧长为乐音信号对应的帧长。例如当终端确定音频信号为语音信号时,其对应的帧长为语音信号对应的1024,当终端确定音频信号为乐音信号时,其对应的帧长为乐音信号对应的4096。
而后,根据帧长,提取至少两个音频信号对应的测量数据。例如当音频信号为乐音信号时,确定的帧长为4096,终端在每个通道接收到的信号中各取一帧长度为4096的测量数据,以获取到至少两个通道对应的至少两个音频信号对应的测量数据,至少两个通道与至少两个音频信号一一对应,进而对每个通道对应的测量数据进行反量化,得到至少两个音频信号的压缩数据,即至少两个音频信号的压缩采样后的数据。
409、终端获取至少两个音频信号对应的通道所在的组的分组信息。
终端可根据参数中携带的分组信息获取至少两个音频信号对应的通道所在的组的分组信息,分组信息可指示每个通道对应一个组标识。例如有32个通道,共分为8组,每组4个通道,分组信息指示每4个通道携带相同的组标识。
410、终端根据分组信息,将至少两个音频信号的压缩数据进行分组,从而得到压缩数据组。
终端可以根据每个通道对应的组标识,将至少两个通道对应的至少两个音频信号的压缩数据进行分组,从而得到压缩数据组。例如32个通道对应32个音频信号的压缩数据,根据每个通道对应一个组标识,共8个组标识,将相同组标识下的4个通道的音频信号的压缩数据分为一组,得到8个压缩数据组。
411、终端获取测量矩阵,根据压缩数据组内的压缩数据和测量矩阵,联合重建压缩数据组内的压缩数据对应的频域系数。
终端从接收到的参数中获取测量矩阵,根据压缩数据组内一个通道对应的压缩数据对应的频域系数、压缩数据组内另一个通道对应的压缩数据以及测量矩阵,计算压缩数据组内另一个通道对应的压缩数据对应的频域系数。
具体地,终端可以根据压缩数据组内第i个通道对应的压缩数据对应的频域系数、压缩数据组内第i+1个通道对应的压缩数据以及测量矩阵,计算第i+1个通道对应的压缩数据对应的频域系数,直至计算得到压缩数据组内第k个通道对应的压缩数据对应的频域系数,i为小于k的正整数,k为压缩数据组内的通道总数。其计算方法可以采用AMP算法,当终端从每个压缩数据组中的第1个通道计算至第k个通道,以获取第k个通道对应的压缩数据对应的频域系数,可以称为一次前向AMP算法迭代过程。进而可以根据压缩数据组内第j个通道对应的压缩数据对应的频域系数、压缩数据组内第j-1个通道对应的压缩数据以及测量矩阵,计算第j-1个通道对应的压缩数据对应的频域系数,直至计算得到压缩数据组内第1个通道对应的压缩数据对应的频域系数,j为小于或者等于k,并且大于1的正整数。这样当从每个压缩数据组中的第k个通道计算至第1个通道,以获取第1个通道对应的压缩数据对应的频域系数,可以称为一次后向AMP算法迭代过程。
具体地,当实现根据压缩数据组内一个通道对应的压缩数据对应的频域系数、压缩数据组内另一个通道对应的压缩数据以及测量矩阵,计算压缩数据组内另一个通道对应的压缩数据对应的频域系数时,若采用AMP算法,其具体算法可以为:根据一个通道对应的压缩数据对应的频域系数,确定另一个通道对应的压缩数据对应的先验频域系数;将先验频域系数作为另一个通道对应的压缩数据对应的频域系数的先验,并根据另一个通道对应的压缩数据以及测量矩阵,计算压缩数据组内另一个通道对应的压缩数据对应的频域系数。可以理解为,当获取到任一通道对应的压缩数据对应的频域系数时,同时可得到该通道对应的压缩数据对应的频域系数的后验边缘概率,将上一通道对应的压缩数据对应的频域系数作为下一个通道对应的压缩数据对应的先验频域系数,即该先验频域系数为下一个通道对应的压缩数据对应的频域系数的先验,若根据下一个通道对应的压缩数据以及测量矩阵,计算压缩数据组内下一个通道对应的压缩数据对应的频域系数时,同时也得到下一个通道对应的压缩数据对应的频域系数的后验边缘概率,若在前向AMP算法迭代过程和后向AMP算法迭代过程中,计算至任一通道对应的压缩数据对应的频域系数的后验边缘概率达到预设值,则确定当前通道对应的压缩数据对应的频域系数最为精准,将该通道对应的压缩数据对应的频域系数作为该通道对应的压缩数据组内的压缩数据对应的频域系数。
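作为参考,下面给出一个简化的单通道AMP迭代的Python示意(软阈值去噪加Onsager修正项;联合重建时可将相邻通道已重建出的频域系数作为初始估计x0传入)。其中的阈值选取、迭代次数、收敛判据等均为说明性假设,并非本申请方案的限定实现:

```python
import numpy as np

def soft(v, t):
    """软阈值函数, AMP 迭代中常用的去噪算子。"""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def amp_recover(y, phi, x0, iters=30):
    """简化的 AMP 迭代: 由测量值 y、测量矩阵 phi 和初始频域系数估计 x0
    重建稀疏的频域系数; x0 可取相邻通道重建结果作为先验信息。"""
    m, _ = phi.shape
    x = x0.copy()
    z = y - phi @ x
    for _ in range(iters):
        sigma = np.linalg.norm(z) / np.sqrt(m)   # 噪声尺度估计
        x_new = soft(x + phi.T @ z, sigma)       # 去噪(阈值取 sigma, 假设值)
        # 残差更新, 含 Onsager 修正项
        z = y - phi @ x_new + z * np.count_nonzero(x_new) / m
        x = x_new
    return x

rng = np.random.default_rng(1)
n, m = 64, 32
x_true = np.zeros(n)
x_true[[3, 17, 40]] = [1.5, -2.0, 1.0]              # 稀疏的"频域系数"
phi = rng.standard_normal((m, n)) / np.sqrt(m)
x_hat = amp_recover(phi @ x_true, phi, np.zeros(n))
assert x_hat.shape == (n,) and np.all(np.isfinite(x_hat))
```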
需要说明的是,本发明在音频信号压缩采样过程中采用结构化的测量矩阵,例如部分傅里叶矩阵、部分DCT矩阵和部分伯努利随机矩阵等。对于非结构化的测量矩阵如随机高斯矩阵等,矩阵乘法的时间复杂度为ab(a,b分别为矩阵的行和列),而采用结构化的测量矩阵,可以在保证重建精度的前提下显著地降低算法复杂度,可以使得矩阵乘法的时间复杂度降为nlog(n)。此外,由于非结构化矩阵的乘法运算时间与信号帧长并非线性关系,而是平方关系,会导致信号帧长选择受到限制,帧长越长,时间复杂度越大,对于乐音信号这种时变特性相对平缓的信号,不能通过增加信号帧长来提升信号的重建精度。而本申请采用结构化的测量矩阵,可以使得信号帧长的选择更加灵活,从而在降低运算时间的同时可以提升重建精度。
412、终端对频域系数进行频域到时域的变换,从而获得压缩数据组内的压缩数据对应的音频信号。
在得到每个压缩数据组内的压缩数据对应的频域系数后,根据稀疏基确定从频域到时域的逆变换的算法,例如稀疏基为MDCT算法,那么逆变换就采用逆MDCT算法,即采用逆MDCT算法对每个压缩数据组内的压缩数据的频域系数进行逆变换,逆变换后得到的信号即为重建后压缩数据组内的压缩数据对应的时域信号,即音频信号。
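频域到时域的逆变换可用如下Python代码示意。由于完整的MDCT/逆MDCT还涉及加窗与重叠相加,这里以正交DCT-II矩阵及其转置(即逆矩阵)代替,仅用于说明“正变换与逆变换互逆”的原理,并非文中MDCT的实现:

```python
import numpy as np

def dct_matrix(n):
    """正交 DCT-II 矩阵: 其转置即逆变换矩阵, 示意频域与时域的互逆变换。"""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

n = 16
C = dct_matrix(n)
frame = np.sin(2 * np.pi * np.arange(n) / n)   # 一帧时域信号
coeff = C @ frame          # 时域 -> 频域(正变换)
frame_rec = C.T @ coeff    # 频域 -> 时域(逆变换), 恢复出原时域帧
assert np.allclose(frame, frame_rec)
```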
因此,对于音频信号的重建终端来说,在接收到至少两个通道的至少两个音频信号的压缩数据后,可根据至少两个音频信号对应的通道所在组的分组信息将至少两个音频信号的压缩数据进行分组,这样可对得到的压缩数据组内的压缩数据进行联合重建,可提升组内联合重建的精度,各组间并行进行重建可提升联合重建的速度。
上述主要从各个网元之间交互的角度对本发明实施例提供的方案进行了介绍。可以理解的是,各个网元,例如计算设备、终端等为了实现上述功能,其包含了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,本发明能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范围。
本发明实施例可以根据上述方法示例对计算设备、终端进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。需要说明的是,本发明实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
在采用对应各个功能划分各个功能模块的情况下,图5示出了上述实施例中所涉及的终端的一种可能的结构示意图,终端包括:获取单元501、分组单元502、重建单元503、变换单元504。获取单元501用于支持终端执行图3中的过程205,206,208,图4中的过程408,409,411,分组单元502用于支持终端执行图3中的过程207,图4中的过程410,重建单元503用于支持终端执行图3中的过程208,图4中的过程411,变换单元504用于支持终端执行图3中的过程209,图4中的过程412。其中,上述方法实施例涉及的各步骤的所有相关内容均可以援引到对应功能模块的功能描述,在此不再赘述。
在采用集成的单元的情况下,图6示出了上述实施例中所涉及的终端的一种可能的结构示意图。终端包括:处理模块602和通信模块603。处理模块602用于对终端的动作进行控制管理,例如,处理模块602用于支持终端执行图3中的过程205、206、207、208、209,图4中的过程408、409、410、411、412,通信模块603用于支持终端执行图4中的过程407,和/或用于本文所描述的技术的其它过程。通信模块603用于支持终端与其他网络实体的通信,例如与图1、图2、或4中示出的计算设备的通信。终端还可以包括存储模块601,用于存储终端的程序代码和数据。
其中,处理模块602可以是处理器或控制器,例如可以是中央处理器(Central Processing Unit,CPU),通用处理器,数字信号处理器(Digital Signal Processor,DSP),专用集成电路(Application-Specific Integrated Circuit,ASIC),现场可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。其可以实现或执行结合本发明公开内容所描述的各种示例性的逻辑方框,模块和电路。处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,DSP和微处理器的组合等等。通信模块603可以是收发器、收发电路或通信接口等。存储模块601可以是存储器。
当处理模块602为处理器,通信模块603为收发器,存储模块601为存储器时,本发明实施例所涉及的终端可以为图7所示的终端。
参阅图7所示,该终端包括:处理器712、收发器713、存储器711以及总线714。其中,收发器713、处理器712以及存储器711通过总线714相互连接;总线714可以是外设部件互连标准(Peripheral Component Interconnect,PCI)总线或扩展工业标准结构(Extended Industry Standard Architecture,EISA)总线等。总线可以分为地址总线、数据总线、控制总线等。为便于表示,图7中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
在采用对应各个功能划分各个功能模块的情况下,图8示出了上述实施例中所涉及的计算设备的一种可能的结构示意图,计算设备包括:获取单元801、分组单元802、变换单元803、采样单元804、确定单元805。获取单元801用于支持计算设备执行图2中的过程201,204,图4中的过程401,分组单元802用于支持计算设备执行图2中的过程202,图4中的过程404,变换单元803用于支持计算设备执行图2中的203,图4中的过程405,采样单元804用于支持计算设备执行图2中的204, 图4中的过程406,确定单元805用于支持计算设备执行图4中的402,403。其中,上述方法实施例涉及的各步骤的所有相关内容均可以援引到对应功能模块的功能描述,在此不再赘述。
在采用集成的单元的情况下,图9示出了上述实施例中所涉及的计算设备的一种可能的结构示意图。计算设备包括:处理模块902和通信模块903。处理模块902用于对计算设备的动作进行控制管理,例如,处理模块902用于支持计算设备执行图2中的过程202、203、204,图4中的过程401,402,403,404,405,406,通信模块903用于支持计算设备执行图2中的过程201,和/或用于本文所描述的技术的其它过程。通信模块903用于支持计算设备与其他网络实体的通信,例如与图1、图3、或4中示出的终端的通信。计算设备还可以包括存储模块901,用于存储计算设备的程序代码和数据。
其中,处理模块902可以是处理器或控制器,例如可以是中央处理器CPU,通用处理器,数字信号处理器DSP,专用集成电路ASIC,现场可编程门阵列FPGA或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。其可以实现或执行结合本发明公开内容所描述的各种示例性的逻辑方框,模块和电路。处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,DSP和微处理器的组合等等。通信模块903可以是收发器、收发电路或通信接口等。存储模块901可以是存储器。
当处理模块902为处理器,通信模块903为收发器,存储模块901为存储器时,本发明实施例所涉及的计算设备可以为图10所示的计算设备。
参阅图10所示,该计算设备包括:阵列麦克风101、声源处理模块102和音频数据输出模块103,阵列麦克风、声源处理模块和音频数据输出模块通过总线104相互连接;总线104可以是外设部件互连标准PCI总线或扩展工业标准结构EISA总线等。总线可以分为地址总线、数据总线、控制总线等。为便于表示,图10中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
结合本发明公开内容所描述的方法或者算法的步骤可以硬件的方式来实现,也可以是由处理器执行软件指令的方式来实现。软件指令可以由相应的软件模块组成,软件模块可以被存放于随机存取存储器(Random Access Memory,RAM)、闪存、只读存储器(Read Only Memory,ROM)、可擦除可编程只读存储器(Erasable Programmable ROM,EPROM)、电可擦可编程只读存储器(Electrically EPROM,EEPROM)、寄存器、硬盘、移动硬盘、只读光盘(CD-ROM)或者本领域熟知的任何其它形式的存储介质中。一种示例性的存储介质耦合至处理器,从而使处理器能够从该存储介质读取信息,且可向该存储介质写入信息。当然,存储介质也可以是处理器的组成部分。处理器和存储介质可以位于ASIC中。另外,该ASIC可以位于核心网接口设备中。当然,处理器和存储介质也可以作为分立组件存在于核心网接口设备中。
本领域技术人员应该可以意识到,在上述一个或多个示例中,本发明所描述的功能可以用硬件、软件、固件或它们的任意组合来实现。当使用软件实现时,可以将这些功能存储在计算机可读介质中或者作为计算机可读介质上的一个或多个指令 或代码进行传输。计算机可读介质包括计算机存储介质和通信介质,其中通信介质包括便于从一个地方向另一个地方传送计算机程序的任何介质。存储介质可以是通用或专用计算机能够存取的任何可用介质。
以上所述的具体实施方式,对本发明的目的、技术方案和有益效果进行了进一步详细说明,所应理解的是,以上所述仅为本发明的具体实施方式而已,并不用于限定本发明的保护范围,凡在本发明的技术方案的基础之上,所做的任何修改、等同替换、改进等,均应包括在本发明的保护范围之内。

Claims (24)

  1. 一种音频信号的重建方法,其特征在于,包括:
    获取至少两个通道的至少两个音频信号对应的压缩数据,所述至少两个通道与所述至少两个音频信号一一对应;
    获取所述至少两个音频信号对应的通道所在的组的分组信息;
    根据所述分组信息,将所述至少两个音频信号对应的压缩数据进行分组,从而得到压缩数据组;
    获取测量矩阵,根据所述压缩数据组内的压缩数据和所述测量矩阵,联合重建所述压缩数据组内的压缩数据对应的频域系数;
    对所述频域系数进行频域到时域的变换,从而获得所述压缩数据组内的压缩数据对应的音频信号。
  2. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    获取所述至少两个音频信号的语乐音标签信息,所述语乐音标签信息用于指示所述至少两个音频信号为语音信号或乐音信号;
    所述获取至少两个通道的至少两个音频信号对应的压缩数据包括:根据所述语乐音标签信息,获取所述语乐音标签信息对应的帧长;
    根据所述帧长,提取所述至少两个音频信号对应的测量数据;
    对所述测量数据进行反量化,从而获得所述至少两个音频信号对应的压缩数据。
  3. 根据权利要求1或2所述的方法,其特征在于,所述根据所述压缩数据组内的压缩数据和所述测量矩阵,联合重建所述压缩数据组内的压缩数据对应的频域系数包括:
    根据压缩数据组内一个通道对应的压缩数据对应的频域系数、所述压缩数据组内另一个通道对应的压缩数据以及所述测量矩阵,计算所述压缩数据组内所述另一个通道对应的压缩数据对应的频域系数。
  4. 根据权利要求3所述的方法,其特征在于,根据压缩数据组内一个通道对应的压缩数据对应的频域系数、所述压缩数据组内另一个通道对应的压缩数据以及所述测量矩阵,计算所述压缩数据组内所述另一个通道对应的压缩数据对应的频域系数包括:
    根据压缩数据组内第i个通道对应的压缩数据对应的频域系数、所述压缩数据组内第i+1个通道对应的压缩数据以及所述测量矩阵,计算所述第i+1个通道对应的压缩数据对应的频域系数,直至计算得到所述压缩数据组内第k个通道对应的压缩数据对应的频域系数,i为小于k的正整数,k为所述压缩数据组内的通道总数。
  5. 根据权利要求4所述的方法,其特征在于,所述方法还包括:
    根据压缩数据组内第j个通道对应的压缩数据对应的频域系数、所述压缩数据组内第j-1个通道对应的压缩数据以及所述测量矩阵,计算所述第j-1个通道对应的压缩数据对应的频域系数,直至计算得到所述压缩数据组内第1个通道对应的压缩数据对应的频域系数,j为小于或者等于k,并且大于1的正整数。
  6. 根据权利要求4或5所述的方法,其特征在于,所述方法还包括:
    根据预设的初始化频域系数、所述第1个通道对应的压缩数据以及所述测量矩阵,计算所述第2个通道对应的压缩数据对应的频域系数。
  7. 根据权利要求3至6任一项所述的方法,其特征在于,根据压缩数据组内一个通道对应的压缩数据对应的频域系数、所述压缩数据组内另一个通道对应的压缩数据以及所述测量矩阵,计算所述压缩数据组内所述另一个通道对应的压缩数据对应的频域系数包括:
    根据所述一个通道对应的压缩数据对应的频域系数,确定所述另一个通道对应的压缩数据对应的先验频域系数;
    将所述先验频域系数作为所述另一个通道对应的压缩数据对应的频域系数的先验,并根据所述另一个通道对应的压缩数据以及所述测量矩阵,计算所述压缩数据组内所述另一个通道对应的压缩数据对应的频域系数。
  8. 一种音频信号的压缩采样方法,其特征在于,包括:
    获取至少两个通道的至少两个音频信号,所述至少两个通道与所述至少两个音频信号一一对应;
    计算所述至少两个音频信号之间的相关性,根据所述相关性对所述至少两个音频信号进行分组,从而得到所述至少两个通道中的通道所在的组的分组信息;
    对所述至少两个音频信号进行时域到频域的变换,从而获得至少两组频域系数,所述至少两组频域系数与所述至少两个音频信号一一对应;
    获取测量矩阵,根据所述测量矩阵对所述至少两组频域系数进行采样,从而获得所述至少两个音频信号对应的压缩数据。
  9. 根据权利要求8所述的方法,其特征在于,在获取测量矩阵之前,所述方法还包括:
    确定所述至少两个音频信号的语乐音标签信息,所述语乐音标签信息用于指示所述至少两个音频信号为语音信号或乐音信号;
    根据所述语乐音标签信息,确定所述至少两个音频信号的帧长。
  10. 根据权利要求9所述的方法,其特征在于,所述获取测量矩阵包括:
    根据所述帧长,获得所述帧长对应的所述测量矩阵。
  11. 根据权利要求8至10任一项所述的方法,其特征在于,所述计算所述至少两个音频信号之间的相关性,根据所述相关性对所述至少两个音频信号进行分组包括:
    获取所述至少两个音频信号中的第一音频信号,获取除所述第一音频信号外其余音频信号中与所述第一音频信号相关性最高的前m个音频信号,并将所述第一音频信号和与所述第一音频信号相关性最高的前m个音频信号作为一组音频信号,m为大于或等于1的正整数;
    从除所述第一音频信号和与所述第一音频信号相关性最高的前m个音频信号外其余音频信号中继续选取第二音频信号,并获取除所述第一音频信号、第二音频信号和与所述第一音频信号相关性最高的前m个音频信号外其余音频信号中与所述第二音频信号相关性最高的前m个音频信号,并将所述第二音频信号和与所述第二音频信号相关性最高的前m个音频信号作为另一组音频信号,直至所述至少两个音频信号的分组完成。
  12. 根据权利要求8至11任一项所述的方法,其特征在于,所述至少两个音频信号之间的相关性包括所述至少两个音频信号之间的距离。
  13. 一种音频信号的重建装置,其特征在于,包括:
    获取单元,用于获取至少两个通道的至少两个音频信号对应的压缩数据,所述至少两个通道与所述至少两个音频信号一一对应;
    所述获取单元,还用于获取所述至少两个音频信号对应的通道所在的组的分组信息;
    分组单元,用于根据所述分组信息,将所述至少两个音频信号对应的压缩数据进行分组,从而得到压缩数据组;
    重建单元,用于获取测量矩阵,根据所述压缩数据组内的压缩数据和所述测量矩阵,联合重建所述压缩数据组内的压缩数据对应的频域系数;
    变换单元,用于对所述频域系数进行频域到时域的变换,从而获得所述压缩数据组内的压缩数据对应的音频信号。
  14. 根据权利要求13所述的装置,其特征在于,所述获取单元还用于:
    获取所述至少两个音频信号的语乐音标签信息,所述语乐音标签信息用于指示所述至少两个音频信号为语音信号或乐音信号;
    所述获取单元,用于:
    根据所述语乐音标签信息,获取所述语乐音标签信息对应的帧长;
    根据所述帧长,提取所述至少两个音频信号对应的测量数据;
    对所述测量数据进行反量化,从而获得所述至少两个音频信号的压缩数据。
  15. 根据权利要求13或14所述的装置,其特征在于,所述重建单元用于:
    根据压缩数据组内一个通道对应的压缩数据对应的频域系数、所述压缩数据组内另一个通道对应的压缩数据以及所述测量矩阵,计算所述压缩数据组内所述另一个通道对应的压缩数据对应的频域系数。
  16. 根据权利要求15所述的装置,其特征在于,所述重建单元用于:
    根据压缩数据组内第i个通道对应的压缩数据对应的频域系数、所述压缩数据组内第i+1个通道对应的压缩数据以及所述测量矩阵,计算所述第i+1个通道对应的压缩数据对应的频域系数,直至计算得到所述压缩数据组内第k个通道对应的压缩数据对应的频域系数,i为小于k的正整数,k为所述压缩数据组内的通道总数。
  17. 根据权利要求16所述的装置,其特征在于,所述重建单元还用于:
    根据压缩数据组内第j个通道对应的压缩数据对应的频域系数、所述压缩数据组内第j-1个通道对应的压缩数据以及所述测量矩阵,计算所述第j-1个通道对应的压缩数据对应的频域系数,直至计算得到所述压缩数据组内第1个通道对应的压缩数据对应的频域系数,j为小于或者等于k,并且大于1的正整数。
  18. 根据权利要求16或17所述的装置,其特征在于,所述重建单元还用于:
    根据预设的初始化频域系数、所述第1个通道对应的压缩数据以及所述测量矩阵,计算所述第2个通道对应的压缩数据对应的频域系数。
  19. 根据权利要求15-18任一项所述的装置,其特征在于,所述重建单元用于:
    根据所述一个通道对应的压缩数据对应的频域系数,确定所述另一个通道对应的压缩数据对应的先验频域系数;
    将所述先验频域系数作为所述另一个通道对应的压缩数据对应的频域系数的先验,并根据所述另一个通道对应的压缩数据以及所述测量矩阵,计算所述压缩数据组内所述另一个通道对应的压缩数据对应的频域系数。
  20. 一种音频信号的压缩采样装置,其特征在于,包括:
    获取单元,用于获取至少两个通道的至少两个音频信号,所述至少两个通道与所述至少两个音频信号一一对应;
    分组单元,用于计算所述至少两个音频信号之间的相关性,根据所述相关性对所述至少两个音频信号进行分组,从而得到所述至少两个通道中的通道所在的组的分组信息;
    变换单元,用于对所述至少两个音频信号进行时域到频域的变换,从而获得至少两组频域系数,所述至少两组频域系数与所述至少两个音频信号一一对应;
    所述获取单元,还用于获取测量矩阵;
    采样单元,用于根据所述测量矩阵对所述至少两组频域系数进行采样,从而获得所述至少两个音频信号对应的压缩数据。
  21. The apparatus according to claim 20, further comprising a determining unit, configured to:
    determine speech/music label information of the at least two audio signals, the speech/music label information being used to indicate whether the at least two audio signals are speech signals or music signals; and
    determine a frame length of the at least two audio signals according to the speech/music label information.
  22. The apparatus according to claim 21, wherein the obtaining unit is configured to:
    obtain, according to the frame length, the measurement matrix corresponding to the frame length.
  23. The apparatus according to any one of claims 20 to 22, wherein the grouping unit is configured to:
    obtain a first audio signal from the at least two audio signals, obtain, from the remaining audio signals other than the first audio signal, the m audio signals having the highest correlation with the first audio signal, and take the first audio signal and the m audio signals having the highest correlation with the first audio signal as one group of audio signals, where m is a positive integer greater than or equal to 1; and
    select a second audio signal from the remaining audio signals other than the first audio signal and the m audio signals having the highest correlation with the first audio signal, obtain, from the remaining channels other than those of the first audio signal, the second audio signal, and the m audio signals having the highest correlation with the first audio signal, the m channels having the highest correlation with the second audio signal, and take the second audio signal and the m audio signals having the highest correlation with the second audio signal as another group of audio signals, and so on until the grouping of the at least two audio signals is completed.
  24. The apparatus according to any one of claims 20 to 23, wherein the correlation between the at least two audio signals comprises a distance between the at least two audio signals.
PCT/CN2017/086390 2016-09-30 2017-05-27 Audio signal reconstruction method and apparatus WO2018058989A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610879165.XA CN107895580B (zh) 2016-09-30 2016-09-30 Audio signal reconstruction method and apparatus
CN201610879165.X 2016-09-30

Publications (1)

Publication Number Publication Date
WO2018058989A1 (zh) 2018-04-05

Family

ID=61763093

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/086390 WO2018058989A1 (zh) 2016-09-30 2017-05-27 Audio signal reconstruction method and apparatus

Country Status (2)

Country Link
CN (1) CN107895580B (zh)
WO (1) WO2018058989A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110874626B (zh) * 2018-09-03 2023-07-18 华为技术有限公司 Quantization method and apparatus
WO2020211017A1 (zh) * 2019-04-17 2020-10-22 深圳市大疆创新科技有限公司 Audio signal processing method, device, and storage medium
WO2020211004A1 (zh) * 2019-04-17 2020-10-22 深圳市大疆创新科技有限公司 Audio signal processing method, device, and storage medium
CN111128230B (zh) * 2019-12-31 2022-03-04 广州市百果园信息技术有限公司 Speech signal reconstruction method, apparatus, device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1819022A * 2006-03-23 2006-08-16 北京东方利优科技发展有限公司 Coding and decoding method for fast-varying audio signals
JP2009151183A (ja) 2007-12-21 2009-07-09 Ntt Docomo Inc Multichannel speech/acoustic signal encoding apparatus and method, and multichannel speech/acoustic signal decoding apparatus and method
US20090210222A1 (en) * 2008-02-15 2009-08-20 Microsoft Corporation Multi-Channel Hole-Filling For Audio Compression
CN102081926A * 2009-11-27 2011-06-01 中兴通讯股份有限公司 Lattice vector quantization audio coding and decoding method and system
CN103714825A * 2014-01-16 2014-04-09 中国科学院声学研究所 Multichannel speech enhancement method based on an auditory perception model
CN103745724A * 2014-01-13 2014-04-23 电子科技大学 Time-frequency hybrid downmix method applied to multichannel audio decoding

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030169886A1 (en) * 1995-01-10 2003-09-11 Boyce Roger W. Method and apparatus for encoding mixed surround sound into a single stereo pair
US6345246B1 (en) * 1997-02-05 2002-02-05 Nippon Telegraph And Telephone Corporation Apparatus and method for efficiently coding plural channels of an acoustic signal at low bit rates
US7240001B2 (en) * 2001-12-14 2007-07-03 Microsoft Corporation Quality improvement techniques in an audio encoder
US7502743B2 (en) * 2002-09-04 2009-03-10 Microsoft Corporation Multi-channel audio encoding and decoding with multi-channel transform selection
CN101247129B * 2004-09-17 2012-05-23 广州广晟数码技术有限公司 Codebook allocation method for audio signal coding
WO2007011083A1 (en) * 2005-07-18 2007-01-25 Lg Electronics Inc. Apparatus and method of encoding and decoding audio signal
US8190425B2 (en) * 2006-01-20 2012-05-29 Microsoft Corporation Complex cross-correlation parameters for multi-channel audio
CN101281749A * 2008-05-22 2008-10-08 上海交通大学 Scalable joint speech and music coding apparatus and decoding apparatus
US8447591B2 (en) * 2008-05-30 2013-05-21 Microsoft Corporation Factorization of overlapping tranforms into two block transforms
CN101447190A * 2008-06-25 2009-06-03 北京大学深圳研究生院 Joint speech enhancement method combining post-filtering and spectral subtraction based on nested subarrays
WO2010003521A1 (en) * 2008-07-11 2010-01-14 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and discriminator for classifying different segments of a signal
JP4983845B2 * 2009-04-17 2012-07-25 株式会社Jvcケンウッド Audio signal transmission apparatus, audio signal reception apparatus, and audio signal transmission system
CN102982805B * 2012-12-27 2014-11-19 北京理工大学 Multichannel audio signal compression method based on tensor decomposition
TWI618050B * 2013-02-14 2018-03-11 杜比實驗室特許公司 Method and apparatus for signal decorrelation in an audio processing system
EP2838086A1 (en) * 2013-07-22 2015-02-18 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. In an reduction of comb filter artifacts in multi-channel downmix with adaptive phase alignment
CN104934032B * 2014-03-17 2019-04-05 华为技术有限公司 Method and apparatus for processing a speech signal according to frequency domain energy
CN104240712B * 2014-09-30 2018-02-02 武汉大学深圳研究院 Three-dimensional audio multichannel grouping and clustering coding method and system

Also Published As

Publication number Publication date
CN107895580B (zh) 2021-06-01
CN107895580A (zh) 2018-04-10

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17854445

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 17854445

Country of ref document: EP

Kind code of ref document: A1