CN106981292B - Multi-channel spatial audio signal compression and recovery method based on tensor modeling - Google Patents

Publication number
CN106981292B
Authority
CN
China
Prior art keywords
tensor
channel
audio signal
frame
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710342387.2A
Other languages
Chinese (zh)
Other versions
CN106981292A (en)
Inventor
王晶
谢湘
刘敏
单亚慧
费泽松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a multi-channel spatial audio signal compression and recovery method based on tensor modeling. It belongs to the technical field of audio signal processing, and in particular to spatial audio coding and decoding. Channel energy normalization is performed on the multi-channel spatial audio signals, which also yields the channel energy adjustment parameters; the audio signal of each channel is then framed and time-frequency transformed to obtain feature parameters in the frequency domain. For the training sample set, a fourth-order audio tensor is constructed and three low-rank factor matrices are obtained through tensor decomposition. A tensor operation between these matrices and the third-order audio tensor constructed from the test sample set gives a compressed core tensor, which is encoded and transmitted together with the channel energy adjustment parameters. At the decoding end, tensor reconstruction is performed with the transmitted core tensor and the trained low-rank factor matrices, and the reconstructed tensor signal undergoes inverse transformation, overlap-add and energy adjustment on each channel to recover the multi-channel spatial audio signals. By modeling the multi-path spatial audio signals as tensors and training the factor matrices in this distinctive way, the method can achieve higher compression efficiency.

Description

Multi-channel spatial audio signal compression and recovery method based on tensor modeling
Technical Field
The invention relates to a multi-channel spatial audio signal compression and recovery method, which belongs to the technical field of audio signal processing, in particular to the technical field of spatial audio coding and decoding.
Background
With the development of digital multimedia technology, spatial audio with more channels is gradually entering everyday life. Additional channels strengthen the sense of surround and improve the audiovisual experience, but the amount of multi-channel audio data also grows rapidly with the number of channels. Spatial audio coding and decoding techniques such as MPEG Surround, MPEG AAC and Dolby AC-3 have therefore been developed. Most of them down-mix the multiple channels into fewer channels at the encoding end, extract spatial parameter information that is transmitted alongside, and up-mix to a multi-channel signal at the decoding end using that information.
Traditional spatial parameter extraction generally relies on vectorization, which destroys the spatial structure and internal relationships of the original data. Multi-path spatial audio signals (for example multi-channel signals, or multiple audio signals acquired from a scene) also need a higher degree of compression to meet lower-rate transmission requirements. How to model such signals reasonably, compress them efficiently and reconstruct them effectively is therefore a key problem in spatial audio coding and decoding. A multi-path spatial audio signal can be viewed as a signal influenced by several factors; its signal space is well suited to representation by a high-order tensor, on which low-rank tensor decomposition can then be used for compression and reconstruction.
Tensor analysis originates from multilinear analysis and is a high-order generalization of vectors and matrices. It uses tensor algebra to analyze and process the high-order tensor structure of an object and has been widely applied in many fields. In recent years tensors have seen many successful applications in multimedia signal processing, for example: constructing tensor faces based on the four factors of person, expression, viewing angle and lighting; extending the traditional eigenvoice technique to a tensor-represented speaker space for speaker adaptation and speaker conversion; and, in multi-channel audio signal processing, the Chinese patent with publication number CN102982805A (published March 20, 2013), "A multi-channel audio signal compression method based on tensor decomposition", which introduced tensors into the compression of multi-channel audio signals for the first time.
Disclosure of Invention
The main purpose of the invention is to construct a high-order model of multi-channel spatial audio signals for efficient compression. It provides a multi-channel spatial audio signal compression and recovery method based on tensor modeling that both accounts for the multiple factors of channel, time and frequency and optimizes the training of the factor matrices.
To achieve this purpose, the basic idea of the method is as follows. For multi-path audio signals (such as multi-channel audio), time-domain channel energy normalization is performed first to obtain the channel energy adjustment parameters, and the audio signal of each channel undergoes energy adjustment, windowing, framing and time-frequency transformation. The signals are then divided into a training sample set and a test sample set. For the training sample set, a fourth-order audio tensor space is constructed over sample, channel, time domain and frequency domain; a low-rank approximation by tensor decomposition yields three low-rank factor matrices (the projection matrix on the sample space is set to the identity matrix), which are used to compress and recover the test sample set. For the test sample set, a third-order tensor signal over channel, time domain and frequency domain is constructed, and a low-rank core tensor is obtained through a tensor operation with the three trained low-rank factor matrices. The core tensor is encoded and transmitted together with the channel energy adjustment parameters, and at the decoding end the tensor signal is reconstructed from the core tensor and the trained factor matrices. Finally, the signal of each channel is inverse transformed, and the original multi-channel spatial audio signal is recovered by overlap-add and channel energy adjustment.
The invention provides a multi-channel spatial audio signal compression method based on tensor modeling, which comprises the following steps of:
Step one: obtain the channel energy adjustment parameters used for time-domain channel energy normalization. For multi-path spatial audio signals with N channels and M samples, compute the average energy value E_ch of each channel's audio signal, then average the P = M × N energy values to obtain the standard normalization parameter E_0, and divide E_0 by the average energy value of each channel to obtain the channel energy adjustment parameter e_ch of that channel:
$$E_{ch} = \frac{1}{C}\sum_{i=1}^{C} x_i^{2} \tag{1}$$
$$E_0 = \frac{1}{P}\sum_{ch=1}^{P} E_{ch} \tag{2}$$
$$e_{ch} = \frac{E_0}{E_{ch}} \tag{3}$$
where x_i is the i-th sampling point of the channel's audio signal and C is the number of sampling points per channel.
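A minimal NumPy sketch of step one, assuming that the average energy is the mean of the squared samples and that e_ch is E_0 divided by E_ch, as the text describes. The function name `channel_energy_params` and the toy data are illustrative only and do not come from the patent.

```python
import numpy as np

def channel_energy_params(signals):
    """signals has shape (M, N, C): samples x channels x sampling points."""
    x = np.asarray(signals, dtype=np.float64)
    E_ch = np.mean(x ** 2, axis=-1)   # average energy per channel, shape (M, N)
    E_0 = E_ch.mean()                 # mean over all P = M * N channel energies
    e_ch = E_0 / E_ch                 # adjustment parameter (assumes no silent channel)
    return E_ch, E_0, e_ch

rng = np.random.default_rng(0)
gains = rng.uniform(0.1, 2.0, size=(4, 3, 1))         # toy per-channel loudness
signals = rng.standard_normal((4, 3, 1000)) * gains   # M = 4, N = 3, C = 1000
E_ch, E_0, e_ch = channel_energy_params(signals)
normalized = signals * e_ch[..., None]                # the multiplication used in step two
```

A channel with zero energy would cause a division by zero here; a practical encoder would need to guard against that case, which the text does not discuss.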
Step two: multiply the audio data of each channel by the corresponding e_ch obtained in step one to obtain the normalized multi-channel spatial audio signal, then frame the audio signal of each channel with a Hamming window, with frame length L and frame shift K, so that the audio signal of each channel is divided into a sequence of T frames;
Step three: apply a time-frequency transform to each frame of audio signal in each channel to obtain feature parameters of length F in the frequency domain;
the time-frequency transform is an orthogonal transform, preferably the discrete cosine transform (DCT);
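Steps two and three can be sketched as follows. The sketch assumes that only complete frames are kept (no padding) and uses an orthonormal DCT-II built as an explicit matrix so that the transform is orthogonal as the text requires; the patent does not say which DCT variant is used, and the helper names `frame_signal` and `dct_matrix` are hypothetical.

```python
import numpy as np

def frame_signal(x, L, K):
    """Split a 1-D signal into Hamming-windowed frames of length L with shift K."""
    T = 1 + (len(x) - L) // K                       # number of complete frames
    idx = np.arange(L)[None, :] + K * np.arange(T)[:, None]
    return x[idx] * np.hamming(L)                   # shape (T, L)

def dct_matrix(F):
    """Orthonormal DCT-II matrix D of shape (F, F), so that D @ D.T is the identity."""
    n = np.arange(F)
    D = np.sqrt(2.0 / F) * np.cos(np.pi * (n[None, :] + 0.5) * n[:, None] / F)
    D[0, :] /= np.sqrt(2.0)
    return D

x = np.random.default_rng(0).standard_normal(4800)  # one toy channel
frames = frame_signal(x, L=960, K=480)              # 9 frames of 960 samples
D = dct_matrix(960)
coeffs = frames @ D.T                               # one row of F = 960 coefficients per frame
```

Because D is orthogonal, the frames are recovered from the coefficients with `coeffs @ D`, which is the role the inverse transform plays at the decoder.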
Step four: take the F feature parameters of each frame of each channel as one row of a matrix, so that the feature parameters of the T frames of each channel form a coefficient matrix of size T × F;
the coefficient matrices of the N channels of each sample are arranged in sequence to form a third-order audio tensor space Z of size N × T × F, whose three dimensions are channel, frame sequence and frequency-domain coefficient;
Step five: randomly select M_1 of the M multi-channel samples as the training sample set; the remaining M_2 samples form the test sample set;
for the M_1 training samples, the third-order audio tensor signals constructed in step four are arranged in sequence to form a fourth-order audio tensor space X of size M_1 × N × T × F, whose four dimensions are sample, channel, frame sequence and frequency-domain coefficient;
the parameters satisfy M = M_1 + M_2, and preferably M_1 ≥ 10;
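The tensor construction and the random split of steps four and five amount to stacking and indexing arrays. A small sketch with toy sizes (not the patent's) and random numbers standing in for the DCT coefficients; the variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, T, F = 6, 4, 9, 16                  # toy sizes: samples, channels, frames, coefficients

# Z[m] is the third-order tensor (channel x frame x frequency) of sample m;
# random values stand in for the T x F coefficient matrices of each channel
Z = rng.standard_normal((M, N, T, F))

M1 = 4                                    # training set size (the text prefers M1 >= 10)
perm = rng.permutation(M)                 # random selection of samples
train_idx, test_idx = perm[:M1], perm[M1:]

X = Z[train_idx]                          # fourth-order training tensor, shape (M1, N, T, F)
Y_set = Z[test_idx]                       # M2 third-order test tensors, each of shape (N, T, F)
```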
Step six: perform the following tensor decomposition on the fourth-order audio tensor space X constructed in step five:
$$X = S \times_1 U_s \times_2 U_c \times_3 U_t \times_4 U_f \tag{4}$$
where S is a fourth-order low-rank core tensor whose dimensions on the sample, channel, frame-sequence and frequency-domain subspaces are M_1, R, Q and O respectively, and U_s, U_c, U_t and U_f are the low-rank factor matrices obtained by projecting the audio tensor X onto the sample, channel, frame-sequence and frequency-domain subspaces respectively:
U_s is the low-rank matrix obtained by projecting the audio tensor X onto the sample space through tensor decomposition, of size M_1 × M_1; since it has no specific physical meaning in this algorithm, it is initially set to the identity matrix I;
U_c is the low-rank matrix obtained by projecting the audio tensor X onto the channel space through tensor decomposition, of size N × R, with 1 ≤ R ≤ N;
U_t is the low-rank matrix obtained by projecting the audio tensor X onto the frame-sequence space through tensor decomposition, of size T × Q, with 1 ≤ Q ≤ T;
U_f is the low-rank matrix obtained by projecting the audio tensor X onto the frequency-domain space through tensor decomposition, of size F × O, with 1 ≤ O ≤ F;
×_1, ×_2, ×_3 and ×_4 denote the tensor-matrix products along the first, second, third and fourth modes respectively. The product is defined as follows (in this definition N denotes the tensor order, not the number of channels): for an N-th order tensor
$$W \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$$
and a matrix
$$A \in \mathbb{R}^{J \times I_n}$$
the product is written W ×_n A, and the result is an N-th order tensor of size I_1 × … × I_{n−1} × J × I_{n+1} × … × I_N.
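The tensor-matrix product defined above can be sketched in a few lines of NumPy. The helper name `mode_n_product` is illustrative, and the axis index is 0-based here, whereas the text counts modes from 1.

```python
import numpy as np

def mode_n_product(W, A, n):
    """Return W x_n A for a tensor W and a matrix A of shape (J, I_n); n is 0-based."""
    # contract A's second axis with axis n of W; the new axis of size J comes out first
    out = np.tensordot(A, W, axes=(1, n))
    # move the new axis back to position n so the other axes keep their order
    return np.moveaxis(out, 0, n)

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4, 5))        # third-order tensor, I_1 x I_2 x I_3
A = rng.standard_normal((2, 4))           # J x I_2
P = mode_n_product(W, A, 1)               # size I_1 x J x I_3 = 3 x 2 x 5
```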
The parameter Q of the low-rank factor matrix U_t is preferably Q = T;
the tensor decomposition is computed by the alternating least squares (ALS) method;
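The patent names ALS but gives no algorithmic detail. One common alternating scheme for this kind of decomposition is higher-order orthogonal iteration (HOOI), sketched below as an assumption rather than as the patent's exact procedure. The sample-mode factor is kept fixed at the identity, so only the channel, frame and frequency factors are refitted in turn. All function names are hypothetical.

```python
import numpy as np

def mode_n_product(W, A, n):
    """W x_n A for A of shape (J, I_n); n is a 0-based axis index."""
    return np.moveaxis(np.tensordot(A, W, axes=(1, n)), 0, n)

def unfold(W, n):
    """Mode-n unfolding: axis n becomes the rows."""
    return np.moveaxis(W, n, 0).reshape(W.shape[n], -1)

def leading_vectors(mat, r):
    """First r left singular vectors of mat."""
    return np.linalg.svd(mat, full_matrices=False)[0][:, :r]

def train_factors(X, ranks, n_iter=5):
    """X: (M1, N, T, F); ranks = (R, Q, O). U_s is fixed to the identity."""
    modes = (1, 2, 3)                      # channel, frame, frequency axes
    U = {n: leading_vectors(unfold(X, n), r) for n, r in zip(modes, ranks)}
    for _ in range(n_iter):
        for n, r in zip(modes, ranks):
            P = X                          # project every other mode, then refit mode n
            for k in modes:
                if k != n:
                    P = mode_n_product(P, U[k].T, k)
            U[n] = leading_vectors(unfold(P, n), r)
    S = X                                  # core: sample axis untouched because U_s = I
    for n in modes:
        S = mode_n_product(S, U[n].T, n)
    return S, U[1], U[2], U[3]

rng = np.random.default_rng(0)
# toy tensor with exact multilinear rank (R, Q, O) = (2, 6, 4) on the last three modes
X = rng.standard_normal((5, 2, 6, 4))
for n, size in zip((1, 2, 3), (4, 6, 8)):
    X = mode_n_product(X, rng.standard_normal((size, X.shape[n])), n)
S, U_c, U_t, U_f = train_factors(X, ranks=(2, 6, 4))
X_hat = S
for n, U_n in zip((1, 2, 3), (U_c, U_t, U_f)):
    X_hat = mode_n_product(X_hat, U_n, n)  # reconstruction from core and factors
```

On this toy tensor the reconstruction is exact because the data have exactly the chosen ranks; on real audio the decomposition is only an approximation.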
Step seven: model each of the M_2 test samples as in step four to obtain M_2 different third-order tensor signals Y, and apply the following tensor operation between each Y and the three low-rank factor matrices obtained in step six to obtain the compressed core tensor G, of size R × Q × O:
$$G = Y \times_1 U_c^{T} \times_2 U_t^{T} \times_3 U_f^{T} \tag{5}$$
where U_c^T, U_t^T and U_f^T are the transposes of the three low-rank factor matrices U_c, U_t and U_f obtained in step six;
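The projection of step seven is a single contraction over the three modes. A sketch using `np.einsum`, with random orthonormal matrices standing in for the trained factors (an assumption made only for illustration) and U_t set to the identity for the preferred case Q = T:

```python
import numpy as np

def compress(Y, U_c, U_t, U_f):
    """Y: (N, T, F); U_c: (N, R); U_t: (T, Q); U_f: (F, O). Returns G: (R, Q, O)."""
    # G[r,q,o] = sum over n,t,f of Y[n,t,f] * U_c[n,r] * U_t[t,q] * U_f[f,o],
    # which is Y x_1 U_c^T x_2 U_t^T x_3 U_f^T written index by index
    return np.einsum('ntf,nr,tq,fo->rqo', Y, U_c, U_t, U_f, optimize=True)

rng = np.random.default_rng(0)
N, T, F, R, O = 4, 9, 16, 2, 5
U_c = np.linalg.qr(rng.standard_normal((N, R)))[0]   # stand-in for the trained U_c
U_t = np.eye(T)                                       # Q = T: no reduction over frames
U_f = np.linalg.qr(rng.standard_normal((F, O)))[0]   # stand-in for the trained U_f
Y = rng.standard_normal((N, T, F))
G = compress(Y, U_c, U_t, U_f)                        # core tensor of size R x Q x O
```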
Step eight: convert the core tensor G obtained in step seven and the channel energy adjustment parameters obtained in step one into one-dimensional signals, then quantize and encode them for transmission to the decoding end; the three low-rank factor matrices do not need to be encoded or transmitted;
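The text does not specify the quantizer or the entropy coder. The sketch below only illustrates the flattening to one dimension and its inverse, with a uniform scalar quantizer used purely as a placeholder assumption; the names `pack`, `unpack`, `quantize`, `dequantize` and the step size are hypothetical.

```python
import numpy as np

def pack(G, e_ch):
    """Flatten the core tensor and the channel parameters into one 1-D vector."""
    return np.concatenate([G.ravel(), np.ravel(e_ch)])

def unpack(vec, core_shape, n_params):
    """Inverse of pack: restore the core tensor shape and split off the parameters."""
    n_core = int(np.prod(core_shape))
    return vec[:n_core].reshape(core_shape), vec[n_core:n_core + n_params]

def quantize(vec, step):
    """Placeholder uniform quantizer (an assumption, not the patent's choice)."""
    return np.round(vec / step).astype(np.int64)

def dequantize(codes, step):
    return codes * step

rng = np.random.default_rng(0)
G = rng.standard_normal((2, 9, 5))
e_ch = rng.uniform(0.5, 2.0, size=4)
step = 1e-3
codes = quantize(pack(G, e_ch), step)                 # integers handed to the encoder
G_hat, e_hat = unpack(dequantize(codes, step), G.shape, e_ch.size)
```

With a uniform quantizer the reconstruction error of every value is at most half a step, which is the only property this sketch relies on.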
Corresponding to the above compression method, the invention also provides a multi-path spatial audio signal recovery method based on tensor modeling, comprising the following steps:
Step nine: at the decoding end, obtain the compressed core tensor G and the channel energy adjustment parameters by decoding and restoring their dimensions, and perform tensor reconstruction with G and the three trained low-rank factor matrices according to the following formula to recover the third-order tensor space Y':
$$Y' = G \times_1 U_c \times_2 U_t \times_3 U_f \tag{6}$$
where U_c, U_t and U_f are the received low-rank factor matrices, which have not been encoded; the third-order audio tensor space Y' reconstructed by the above formula has the same dimensions N × T × F as the original third-order audio tensor space Y;
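A sketch of the reconstruction in step nine, again with random orthonormal matrices as stand-ins for the trained factors (an assumption). It also illustrates that the round trip is lossy in general and exact only for tensors that already lie in the subspaces spanned by the factors.

```python
import numpy as np

def reconstruct(G, U_c, U_t, U_f):
    """G: (R, Q, O) -> Y': (N, T, F), i.e. G x_1 U_c x_2 U_t x_3 U_f."""
    return np.einsum('rqo,nr,tq,fo->ntf', G, U_c, U_t, U_f, optimize=True)

rng = np.random.default_rng(0)
N, T, F, R, O = 4, 9, 16, 2, 5
U_c = np.linalg.qr(rng.standard_normal((N, R)))[0]
U_t = np.eye(T)
U_f = np.linalg.qr(rng.standard_normal((F, O)))[0]

Y = rng.standard_normal((N, T, F))
G = np.einsum('ntf,nr,tq,fo->rqo', Y, U_c, U_t, U_f, optimize=True)  # encoder side
Y_rec = reconstruct(G, U_c, U_t, U_f)
rel_err = np.linalg.norm(Y_rec - Y) / np.linalg.norm(Y)   # lossy: R < N and O < F

# a tensor that already lies in the factor subspaces is recovered exactly
Y_low = reconstruct(rng.standard_normal((R, T, O)), U_c, U_t, U_f)
G_low = np.einsum('ntf,nr,tq,fo->rqo', Y_low, U_c, U_t, U_f, optimize=True)
Y_low_rec = reconstruct(G_low, U_c, U_t, U_f)
```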
Step ten: the third-order audio tensor space Y' obtained in step nine has N channels, each channel has a sequence of T frames, and each frame has F feature parameters in the frequency domain; the time-domain representation of each frame of audio signal is obtained by the inverse time-frequency transform corresponding to step three;
the inverse time-frequency transform is the inverse of the time-frequency transform in step three; when the time-frequency transform is the DCT, the inverse transform is the inverse discrete cosine transform (IDCT);
Step eleven: overlap-add the time-domain frames of each channel obtained in step ten, with frame length L and frame shift K, to restore the normalized multi-channel signal; finally, adjust the audio data of each channel with the transmitted channel energy adjustment parameters, that is, divide the normalized multi-channel spatial audio data by the corresponding channel energy adjustment parameter, to restore the original multi-channel spatial audio signal;
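A sketch of step eleven for a single channel, taking the time-domain frames as already produced by the inverse transform. The text does not say how the Hamming weighting is compensated during overlap-add; dividing by the overlapped sum of the window is an assumption made here so that the toy round trip is exact. The function name `overlap_add` and the toy values are illustrative.

```python
import numpy as np

def overlap_add(frames, K):
    """frames: (T, L) Hamming-weighted time-domain frames -> 1-D signal."""
    T, L = frames.shape
    w = np.hamming(L)
    out = np.zeros(K * (T - 1) + L)
    wsum = np.zeros_like(out)
    for t in range(T):
        out[t * K:t * K + L] += frames[t]    # add each frame at its shifted position
        wsum[t * K:t * K + L] += w           # track how much window weight each sample got
    return out / wsum                        # assumption: undo the window by its overlapped sum

L, K = 960, 480
rng = np.random.default_rng(0)
x = rng.standard_normal(4800)                # one channel of the original signal
e = 0.5                                      # toy channel energy adjustment parameter
T = 1 + (len(x) - L) // K
idx = np.arange(L)[None, :] + K * np.arange(T)[:, None]
frames = (x * e)[idx] * np.hamming(L)        # frames as the inverse transform would return them
x_rec = overlap_add(frames, K) / e           # overlap-add, then divide by e_ch
```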
Compared with the prior art, the invention has the following beneficial effects. In modeling multi-path spatial audio signals, the method fully considers the four influencing factors of sample, channel, time domain and frequency domain. It builds a fourth-order tensor model for the training samples and obtains the factor matrices in a single tensor decomposition, then builds a third-order tensor model for each test sample and performs a tensor operation with the trained factor matrices to obtain a low-rank core tensor, thereby achieving efficient compression. The distinctive way the factor matrices are trained strengthens the compression of redundant information between and within channels, both relative to traditional multi-channel audio coding and decoding methods and relative to other ways of training the factor matrices.
Drawings
FIG. 1 is a flow diagram of encoding and decoding a multi-path spatial audio signal using tensor decomposition;
Detailed Description
The invention is described in detail below with reference to the accompanying drawing and an embodiment, together with the technical problems solved and the advantages of the technical solution. The described embodiment is intended only to facilitate understanding of the invention and does not limit it in any way.
The invention takes multi-path spatial audio signals acquired per channel or per scene as the original database, and compresses and reconstructs them with the algorithm shown in FIG. 1:
Step one: obtain the channel energy adjustment parameters used for time-domain channel energy normalization. For multi-path spatial audio signals with N = 16 channels and M = 28 samples, compute the average energy value E_ch of each channel's audio signal, then average the P = 16 × 28 energy values to obtain the standard normalization parameter E_0, and divide E_0 by the average energy value of each channel to obtain the channel energy adjustment parameter e_ch of that channel:
$$E_{ch} = \frac{1}{C}\sum_{i=1}^{C} x_i^{2} \tag{7}$$
$$E_0 = \frac{1}{P}\sum_{ch=1}^{P} E_{ch} \tag{8}$$
$$e_{ch} = \frac{E_0}{E_{ch}} \tag{9}$$
where x_i is the i-th sampling point of the channel's audio signal and C is the number of sampling points per channel.
Step two: multiply the audio data of each channel by the corresponding one of the 16 × 28 channel energy adjustment parameters obtained in step one to obtain the normalized multi-channel spatial audio signal, then frame the audio signal of each channel with a Hamming window, with frame length L = 960 and frame shift K = 480, giving a frame sequence of length T = 899 for each channel;
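The embodiment does not state the length of each channel's signal. As a consistency check only, and assuming that only complete frames are kept with no padding, T = 899 frames with L = 960 and K = 480 correspond to at least 432,000 sampling points per channel; the helper name `num_frames` is hypothetical.

```python
def num_frames(C, L, K):
    """Number of complete frames of length L with shift K in a signal of C points."""
    return 1 + (C - L) // K

L, K, T = 960, 480, 899
C_min = (T - 1) * K + L          # shortest signal length that yields T complete frames
```

Any length from `C_min` up to `C_min + K - 1` gives the same frame count under this assumption.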
Step three: apply the discrete cosine transform (DCT) to each frame of audio signal in each channel to obtain feature parameters in the frequency domain, of length F = 960;
Step four: take the F feature parameters of each frame of each channel as one row of a matrix, so that the feature parameters of the T frames of each channel form a coefficient matrix of size T × F, that is, 899 × 960;
the coefficient matrices of the N channels of each sample are arranged in sequence to form a third-order audio tensor space Z of size N × T × F, whose three dimensions are channel, frame sequence and frequency-domain coefficient;
Step five: randomly select M_1 of the M multi-channel samples as the training sample set; the remaining M_2 samples form the test sample set;
for the M_1 training samples, the third-order audio tensor signals constructed in step four are arranged in sequence to form a fourth-order audio tensor space X of size M_1 × N × T × F, whose four dimensions are sample, channel, frame sequence and frequency-domain coefficient;
here the training sample set has M_1 = 20 samples and the test sample set has M_2 = 8 samples; the resulting fourth-order audio tensor space X has size 20 × 16 × 899 × 960, and each third-order audio tensor space Y has size 16 × 899 × 960;
Step six: perform the following tensor decomposition on the fourth-order audio tensor space X constructed in step five:
$$X \approx S \times_1 U_s \times_2 U_c \times_3 U_t \times_4 U_f \tag{10}$$
where S is a fourth-order low-rank core tensor whose dimensions on the sample, channel, frame-sequence and frequency-domain subspaces are M_1, R, Q and O respectively, and U_s, U_c, U_t and U_f are the low-rank factor matrices obtained by projecting the audio tensor X onto the sample, channel, frame-sequence and frequency-domain subspaces respectively:
U_s is the low-rank matrix obtained by projecting the audio tensor X onto the sample space through tensor decomposition, of size M_1 × M_1; since it has no specific physical meaning in this algorithm, it is initially set to the identity matrix I;
U_c is the low-rank matrix obtained by projecting the audio tensor X onto the channel space through tensor decomposition, of size N × R, with 1 ≤ R ≤ N;
U_t is the low-rank matrix obtained by projecting the audio tensor X onto the frame-sequence space through tensor decomposition, of size T × Q, with 1 ≤ Q ≤ T;
U_f is the low-rank matrix obtained by projecting the audio tensor X onto the frequency-domain space through tensor decomposition, of size F × O, with 1 ≤ O ≤ F;
×_1, ×_2, ×_3 and ×_4 denote the tensor-matrix products along the first, second, third and fourth modes respectively. The product is defined as follows (in this definition N denotes the tensor order, not the number of channels): for an N-th order tensor
$$W \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$$
and a matrix
$$A \in \mathbb{R}^{J \times I_n}$$
the product is written W ×_n A, and the result is an N-th order tensor of size I_1 × … × I_{n−1} × J × I_{n+1} × … × I_N.
The low-rank factor matrix U_s has no physical meaning in this algorithm, so it is set to the identity matrix I, of size 20 × 20. For the factor matrix U_t projected on the time domain, a low-rank projection would seriously degrade the quality of the reconstructed audio, so no low-rank projection is applied along this mode and Q = 899. By setting R and O to lower values, low-rank projection is applied on the channel and frequency-domain modes to obtain the low-rank factor matrices U_c and U_f, with 1 ≤ R ≤ N and 1 ≤ O ≤ F.
The tensor decomposition is computed by the alternating least squares (ALS) method;
Step seven: model each of the M_2 test samples as in step four to obtain M_2 different third-order tensor signals Y, and apply the following tensor operation between each Y and the three low-rank factor matrices obtained in step six to obtain the compressed core tensor G, of size R × Q × O:
$$G = Y \times_1 U_c^{T} \times_2 U_t^{T} \times_3 U_f^{T} \tag{11}$$
where U_c^T, U_t^T and U_f^T are the transposes of the three low-rank factor matrices U_c, U_t and U_f obtained in step six;
Step eight: convert the core tensor G obtained in step seven and the channel energy adjustment parameters obtained in step one into one-dimensional signals, then quantize and encode them for transmission to the decoding end; the three low-rank factor matrices do not need to be encoded or transmitted;
Step nine: at the decoding end, obtain the compressed core tensor G and the channel energy adjustment parameters by decoding and restoring their dimensions, and perform tensor reconstruction with G and the three trained low-rank factor matrices according to the following formula to recover the third-order tensor space Y':
$$Y' = G \times_1 U_c \times_2 U_t \times_3 U_f \tag{12}$$
the third-order audio tensor space Y' reconstructed by the above formula has the same dimensions N × T × F as the original third-order audio tensor space Y, namely 16 × 899 × 960;
Step ten: the third-order audio tensor space Y' obtained in step nine has 16 channels, each channel has a sequence of 899 frames, and each frame has 960 feature parameters in the frequency domain; the time-domain representation of each frame of audio signal is obtained by the inverse discrete cosine transform (IDCT) corresponding to step three;
Step eleven: overlap-add the time-domain frames of each channel obtained in step ten, with frame length L = 960 and frame shift K = 480, to restore the normalized multi-channel spatial audio signal; finally, adjust the audio data of each channel with the transmitted channel energy adjustment parameters, that is, divide the normalized multi-channel spatial audio data by the channel energy adjustment parameter of the corresponding channel, to restore the original multi-channel spatial audio signal;
To further illustrate the compression procedure, one compression case is described in detail. At the encoding end, a low-rank-approximation tensor decomposition is applied to the fourth-order audio tensor X (of size 20 × 16 × 899 × 960) along the channel, time-domain and frequency-domain modes, with parameters R = 2, Q = 899 and O = 400. This yields a low-rank factor matrix U_c of size 16 × 2, a factor matrix U_t of size 899 × 899 and a low-rank factor matrix U_f of size 960 × 400. A tensor operation between these matrices and a third-order audio tensor Y (of size 16 × 899 × 960) gives a core tensor G of size 2 × 899 × 400. At the decoding end, tensor reconstruction with the core tensor G and the trained low-rank factor matrices recovers the third-order audio tensor Y'. The factor matrices are trained with a modeling approach different from that of the test samples: the sample index is treated as an additional subspace to construct a fourth-order audio tensor; in the low-rank-approximation tensor decomposition the sample-mode factor matrix is set to the identity matrix, and low-rank projection in the other three subspaces gives the three low-rank factor matrices used for the subsequent compression and reconstruction of multi-path spatial audio.
By setting different values of R and O, core tensors G of different sizes, and hence different compression efficiencies, can be obtained. Only the channel energy adjustment parameters and the low-rank core tensor are transmitted, and the channel energy adjustment parameters occupy only a few bits compared with the core tensor, so the compression effect can be approximated by the compression percentage
$$\text{compression percentage} = \left(1 - \frac{R \times Q \times O}{N \times T \times F}\right) \times 100\% \tag{13}$$
In this regard, the experiment uses 8 multi-path spatial audio samples with 16 channels each as the test set, and the compression percentages are shown in Table 1. With R = 2 and O = 400 the compression percentage is 94.79%, which is much higher than the compression efficiency of the tensor-decomposition-based multi-channel audio signal compression described in the patent with publication number CN102982805A. Extensive experiments have shown that the novel way of training the factor matrices in the invention provides higher compression efficiency for multi-path spatial audio signal compression.
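The 94.79% figure can be checked with a few lines of arithmetic, assuming the compression percentage is one minus the ratio of core tensor elements (R × Q × O) to original tensor elements (N × T × F), and ignoring the few bits spent on the channel energy adjustment parameters. The function name is illustrative.

```python
def compression_percentage(N, T, F, R, Q, O):
    """Share of tensor elements saved by sending the core tensor instead of Y."""
    return (1.0 - (R * Q * O) / (N * T * F)) * 100.0

# embodiment values: N = 16, T = 899, F = 960, R = 2, Q = 899, O = 400
pct = compression_percentage(16, 899, 960, 2, 899, 400)
print(round(pct, 2))   # 94.79
```

Because Q = T in the embodiment, the frame dimension cancels and the percentage depends only on R/N and O/F.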
TABLE 1 Compression percentage results for multi-path spatial audio signals
[Table 1 is provided only as an image (BDA0001295534150000081) in the original publication.]

Claims (8)

1. A multi-channel spatial audio signal compression method based on tensor modeling is disclosed, wherein the multi-channel spatial audio signal refers to an audio signal with N channels and M samples, and the method is characterized by comprising the following steps of:
(S1) obtaining an average energy value E of each channel audio signalchNormalized parameter E0And a channel energy adjustment parameter e corresponding to each channelchThe formula is as follows:
Figure FDA0002345746020000011
Figure FDA0002345746020000012
Figure FDA0002345746020000013
wherein x isiC is the number of sampling points of each sound channel, and P is M multiplied by N;
(S2) Using e obtained in step S1chMultiplying the audio data of the corresponding sound channel to obtain a normalized multi-channel spatial audio signal, then framing the audio signal of each sound channel by adopting a Hamming window, wherein the frame length is L, the frame shift is K, and the audio signal of each sound channel is divided into T frame sequences;
(S3) dividing the audio signal of each sound channel into T frame sequences, and performing time-frequency transformation on each frame of audio signal to obtain characteristic parameters with the length of F on a frequency domain;
(S4) taking the F feature parameters corresponding to each frame of audio signal of each channel as each row of the matrix, that is, the feature parameters of the T frame of audio signal of each channel form a coefficient matrix with a size of T × F; sequentially arranging coefficient matrixes of N sound channels of each sample to form a third-order audio tensor space Z with the size of NxTxF, wherein three dimensions of the Z are the sound channels, a frame sequence and frequency domain coefficients;
(S5) randomly selecting M from the multi-channel audio with M samples1One sample is used as a training sample set, and the rest areNumber of samples M2=M-M1As a test sample set; to M1Four-order audio tensor signals are constructed by the training samples and are sequentially arranged to form a four-order audio tensor space X, and the size of the four-order audio tensor space is M1X N X T X F, wherein M1N, T and F are the corresponding dimensions of the four dimensions of sample, sound channel, frame sequence and frequency domain coefficient;
(S6) performing tensor decomposition on the X in step S5 as follows: x ═ S-1Us×2Uc×3Ut×4UfTherein, is1、×2、×3And-4Tensor matrix multiplication, U, representing tensors in first, second, third and fourth orders, respectivelys、Uc、UtAnd UfLow rank factor matrix of X projection under four subspaces of sample, sound channel, frame sequence and frequency domain, S is four-order low rank kernel tensor whose dimension numbers on the sample subspace, sound channel subspace, frame sequence subspace and frequency domain subspace are M1R, Q and O;
(S7) for the $M_2$ test samples, obtaining $M_2$ different third-order tensor signals Y by the tensor modeling of step S4, and applying the tensor operation of the following formula to each Y and the three low-rank factor matrices of step S6 to obtain a compressed core tensor G of size $R \times Q \times O$:
$$G = Y \times_1 U_c^{T} \times_2 U_t^{T} \times_3 U_f^{T}$$
where $U_c^{T}$, $U_t^{T}$ and $U_f^{T}$ are the transposes of the three low-rank factor matrices $U_c$, $U_t$ and $U_f$ obtained in step S6; and
(S8) converting the core tensor G obtained in step S7 and the $e_{ch}$ obtained in step S1 into one-dimensional signals, then quantizing, encoding and transmitting them; the low-rank factor matrices do not require coded transmission.
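Steps S5 to S8 can be illustrated with a minimal NumPy sketch. It rests on several assumptions that the claim does not fix: the decomposition is computed with a truncated higher-order SVD (the claim only says "tensor decomposition"), random numbers stand in for the frequency-domain features, and all function names and the small sizes are hypothetical. Quantization and entropy coding are left out.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Mode-n product; `matrix` has shape (J, tensor.shape[mode])."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))  # contracted axis lands in front
    return np.moveaxis(out, 0, mode)

def train_factors(X, R, Q, O):
    """Truncated HOSVD of the training tensor X (M1 x N x T x F).

    Assumption: U_s is the identity (as in claim 6), so only the channel,
    frame and frequency modes are decomposed.
    """
    factors = []
    for mode, rank in zip((1, 2, 3), (R, Q, O)):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :rank])
    return factors  # [U_c (N x R), U_t (T x Q), U_f (F x O)]

def compress(Y, U_c, U_t, U_f):
    """Core tensor G = Y x1 U_c^T x2 U_t^T x3 U_f^T for one test sample Y (N x T x F)."""
    G = mode_product(Y, U_c.T, 0)
    G = mode_product(G, U_t.T, 1)
    return mode_product(G, U_f.T, 2)

# Hypothetical sizes and random stand-in data.
rng = np.random.default_rng(0)
M, M1, N, T, F = 6, 4, 5, 8, 16
samples = rng.standard_normal((M, N, T, F))   # M third-order tensors (step S4)
perm = rng.permutation(M)                     # random train/test split (step S5)
X = samples[perm[:M1]]                        # fourth-order training tensor
Y = samples[perm[M1]]                         # one of the M2 test samples
U_c, U_t, U_f = train_factors(X, R=3, Q=8, O=10)   # step S6
G = compress(Y, U_c, U_t, U_f)                # step S7
payload = G.ravel()                           # 1-D signal handed to the quantizer (step S8)
```

Because the factor matrices are learned from the training set only, the same three matrices can be stored at both encoder and decoder, which is consistent with the statement that they need no coded transmission.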
2. The method of claim 1, wherein the low-rank matrix $U_t$ on the frame-sequence subspace has size $T \times Q$, where $1 \le Q \le T$.
3. The method of claim 2, wherein $Q = T$.
4. The method of compressing a multi-channel spatial audio signal according to claim 1, wherein said time-frequency transform is an orthogonal transform.
5. The method of claim 4, wherein the orthogonal transform is preferably a discrete cosine transform.
6. The method of claim 1, wherein the low-rank matrix $U_s$ is an identity matrix I of size $M_1 \times M_1$; the low-rank matrix $U_c$ has size $N \times R$, where $1 \le R \le N$; and the low-rank matrix $U_f$ has size $F \times O$, where $1 \le O \le F$.
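The size constraints in claims 2 to 6 determine how many values are transmitted per sample. The following sketch works through that arithmetic. The numeric values are hypothetical, and the count covers only the core tensor entries; it ignores the $e_{ch}$ parameters and the bit allocation of the quantizer.

```python
def value_counts(N, T, F, R, Q, O):
    """Return (original values, core values, ratio) for one N x T x F sample."""
    if not (1 <= R <= N and 1 <= Q <= T and 1 <= O <= F):
        raise ValueError("ranks must satisfy 1<=R<=N, 1<=Q<=T, 1<=O<=F")
    original = N * T * F   # entries of the third-order tensor Y
    core = R * Q * O       # entries of the core tensor G
    return original, core, original / core

# Hypothetical example with Q = T (claim 3): the frame dimension is not reduced,
# so the ratio simplifies to (N * F) / (R * O).
original, core, ratio = value_counts(N=6, T=100, F=1024, R=3, Q=100, O=512)
print(original, core, ratio)  # 614400 153600 4.0
```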
7. A multi-channel spatial audio signal recovery method based on tensor modeling, comprising the following steps in addition to the steps S1 to S8 recited in claim 1:
(S9) decoding, and performing tensor reconstruction according to the formula $Y' = G \times_1 U_c \times_2 U_t \times_3 U_f$ to recover the original third-order tensor space Y', where $U_c$, $U_t$ and $U_f$ are the received low-rank factor matrices, which are not subjected to coding, and G is the received quantized and coded core tensor;
(S10) applying to Y' the inverse of the time-frequency transform used at encoding, to obtain the time-domain representation of each frame of the audio signal; and
(S11) overlap-adding the time-domain frames of each channel obtained in step (S10), with frame length L and frame shift K, to restore the normalized multi-channel signal; and dividing the normalized multi-channel spatial audio data by the corresponding channel energy adjustment parameters, using the received quantized and coded channel energy adjustment parameters, to restore the original multi-channel spatial audio signal.
8. The tensor modeling-based multi-channel spatial audio signal recovery method as recited in claim 7, further comprising the steps of the multi-channel spatial audio signal compression method as recited in any one of claims 2 to 6.
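The recovery steps S9 to S11 can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the patented implementation: the time-frequency transform is taken to be an orthonormal DCT-II (claim 5 names the discrete cosine transform as preferred), every frame is assumed to keep all L coefficients so that F = L, $e_{ch}$ is assumed to hold one value per channel, and no synthesis window is applied before overlap-add. All function names are hypothetical.

```python
import numpy as np

def mode_product(tensor, matrix, mode):
    """Mode-n product; `matrix` has shape (J, tensor.shape[mode])."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)

def reconstruct(G, U_c, U_t, U_f):
    """Step S9: Y' = G x1 U_c x2 U_t x3 U_f."""
    Y = mode_product(G, U_c, 0)
    Y = mode_product(Y, U_t, 1)
    return mode_product(Y, U_f, 2)

def dct_matrix(L):
    """Orthonormal DCT-II matrix C (L x L); its inverse is C.T."""
    n = np.arange(L)
    C = np.sqrt(2.0 / L) * np.cos(np.pi * (n[None, :] + 0.5) * n[:, None] / L)
    C[0] /= np.sqrt(2.0)
    return C

def overlap_add(frames, K):
    """Step S11, first half: frames has shape (N, T, L); K is the frame shift."""
    N, T, L = frames.shape
    out = np.zeros((N, (T - 1) * K + L))
    for t in range(T):
        out[:, t * K : t * K + L] += frames[:, t]
    return out

def decode(G, U_c, U_t, U_f, e_ch, K):
    Y_rec = reconstruct(G, U_c, U_t, U_f)   # N x T x F coefficients
    C = dct_matrix(Y_rec.shape[-1])         # assumes F == L
    frames = Y_rec @ C                      # step S10: inverse DCT of each row
    normalized = overlap_add(frames, K)
    return normalized / e_ch[:, None]       # step S11: undo per-channel energy scaling

# Round trip with identity factors (R = N, Q = T, O = F), so no rank reduction occurs.
rng = np.random.default_rng(0)
N, T, L, K = 2, 5, 8, 4
frames_true = rng.standard_normal((N, T, L))
C = dct_matrix(L)
G = frames_true @ C.T                       # forward DCT; G equals Y for identity factors
e_ch = np.array([0.5, 2.0])                 # hypothetical per-channel parameters
signal = decode(G, np.eye(N), np.eye(T), np.eye(L), e_ch, K)
```

With reduced ranks the same `decode` call returns an approximation rather than an exact copy, and the error then depends on how much energy the discarded components carried.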
CN201710342387.2A 2017-05-16 2017-05-16 Multi-channel spatial audio signal compression and recovery method based on tensor modeling Active CN106981292B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710342387.2A CN106981292B (en) 2017-05-16 2017-05-16 Multi-channel spatial audio signal compression and recovery method based on tensor modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710342387.2A CN106981292B (en) 2017-05-16 2017-05-16 Multi-channel spatial audio signal compression and recovery method based on tensor modeling

Publications (2)

Publication Number Publication Date
CN106981292A CN106981292A (en) 2017-07-25
CN106981292B CN106981292B (en) 2020-04-14

Family

ID=59342790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710342387.2A Active CN106981292B (en) 2017-05-16 2017-05-16 Multi-channel spatial audio signal compression and recovery method based on tensor modeling

Country Status (1)

Country Link
CN (1) CN106981292B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107566383B (en) * 2017-09-12 2019-10-18 南京师范大学 A kind of Higher Dimensional Space Time field data live transmission method under limited network bandwidth constraint
CN108322858B (en) * 2018-01-25 2019-11-22 中国科学技术大学 Multi-microphone sound enhancement method based on tensor resolution
CN108595927B (en) * 2018-04-04 2023-09-19 北京市商汤科技开发有限公司 Identity authentication, unlocking and payment method and device, storage medium, product and equipment
WO2021063317A1 (en) * 2019-10-01 2021-04-08 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Tensor processing method and apparatus, electronic device
CN114235413B (en) * 2021-12-28 2023-06-30 频率探索智能科技江苏有限公司 Method for constructing multi-channel signal third-order tensor model
CN114067820B (en) * 2022-01-18 2022-06-28 深圳市友杰智新科技有限公司 Training method of voice noise reduction model, voice noise reduction method and related equipment
CN115309713B (en) * 2022-09-29 2022-12-23 江西锦路科技开发有限公司 Traffic data compression method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982805A (en) * 2012-12-27 2013-03-20 北京理工大学 Multi-channel audio signal compressing method based on tensor decomposition
CN103117059A (en) * 2012-12-27 2013-05-22 北京理工大学 Voice signal characteristics extracting method based on tensor decomposition
CN104123948A (en) * 2013-04-25 2014-10-29 索尼公司 Sound processing apparatus, method, and program
CN104375976A (en) * 2014-11-04 2015-02-25 西安电子科技大学 Hybrid matrix recognition method in underdetermined blind source separation based on tensor regular decomposition
JP2015129785A (en) * 2014-01-06 2015-07-16 日本電信電話株式会社 encoding device, decoding device, encoding method, decoding method, and program
US9576583B1 (en) * 2014-12-01 2017-02-21 Cedar Audio Ltd Restoring audio signals with mask and latent variables

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
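A new multi-channel audio signal compression method based on tensor representation and decomposition (in English); Wang Jing; Xie Xiang; Kuang Jingming; China Communications; 2014-03-31; full text *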

Also Published As

Publication number Publication date
CN106981292A (en) 2017-07-25

Similar Documents

Publication Publication Date Title
CN106981292B (en) Multi-channel spatial audio signal compression and recovery method based on tensor modeling
CN102982805B (en) Multi-channel audio signal compressing method based on tensor decomposition
RU2377653C2 (en) Reversible overlap operator for efficient lossless data compression
CN107801026A (en) Method for compressing image and device, compression of images and decompression systems
CN104240712B (en) A kind of three-dimensional audio multichannel grouping and clustering coding method and system
CN109785847B (en) Audio compression algorithm based on dynamic residual error network
CN106373583B (en) Multi-audio-frequency object coding and decoding method based on ideal soft-threshold mask IRM
CN102123278A (en) Signal source encoding method based on distributed compressive sensing technology
CN110739000B (en) Audio object coding method suitable for personalized interactive system
US9998763B2 (en) Compression of signals, images and video for multimedia, communications and other applications
CN104506752B (en) A kind of similar image compression method based on residual error compressed sensing
CN105741844B (en) A kind of digital audio watermarking algorithm based on DWT-SVD-ICA
CN105405445A (en) Parameter stereo coding, decoding method based on inter-channel transfer function
CN103237204A (en) Video signal collection and reconfiguration system based on high-dimension compressed sensing
CN107610710A (en) A kind of audio coding and coding/decoding method towards Multi-audio-frequency object
CN103903261A (en) Spectrum image processing method based on partition compressed sensing
CN106385584A (en) Spatial correlation-based distributed video compressive sensing adaptive sampling and coding method
CN101099669A (en) Electrocardiogram data compression method and decoding method based on optimum time frequency space structure code
CN104427349A (en) Bayer image compression method
CN101300633A (en) Method and apparatus for signal processing and encoding and decoding method, and apparatus therefor
Ahmed et al. Audio compression using transforms and high order entropy encoding
CN115439565A (en) Image compression method based on Transformer
CN102036075A (en) Image and digital video coding and decoding methods
CN106056640B (en) The method for compressing image for combining compressed sensing is decomposed based on anatomic element
CN107622777B (en) High-code-rate signal acquisition method based on over-complete dictionary pair

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant