CN113096672B - Multi-audio object coding and decoding method applied to low code rate - Google Patents

Multi-audio object coding and decoding method applied to low code rate

Info

Publication number
CN113096672B
CN113096672B · CN202110312781.8A
Authority
CN
China
Prior art keywords
side information
audio object
module
decoding
code stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110312781.8A
Other languages
Chinese (zh)
Other versions
CN113096672A (en)
Inventor
胡瑞敏
吴玉林
王晓晨
胡晨昊
柯善发
张灵鲲
刘文可
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202110312781.8A priority Critical patent/CN113096672B/en
Publication of CN113096672A publication Critical patent/CN113096672A/en
Application granted granted Critical
Publication of CN113096672B publication Critical patent/CN113096672B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/0204 — Coding or decoding of speech or audio signals using spectral analysis, using subband decomposition
    • G10L19/032 — Quantisation or dequantisation of spectral components
    • G10L25/18 — Speech or voice analysis characterised by the extracted parameters being spectral information of each sub-band
    • G10L25/30 — Speech or voice analysis characterised by the analysis technique, using neural networks

Abstract

The invention discloses a multi-audio object coding and decoding method applied at low code rate. In the encoding stage, a plurality of input audio objects are first transformed to the frequency domain; the frequency-domain audio object signals are then downmixed to obtain a mixed signal, and the side information matrix over the finely subdivided subbands of each single audio object is calculated; next, the encoding module of a convolutional auto-encoder performs dimensionality reduction on the side information matrix; finally, the mixed signal and the dimension-reduced side information are synthesized into a code stream. In the decoding stage, the received code stream is first decomposed to obtain the downmix signal and the side information; a dense connection module introduced into the decoder network of the convolutional auto-encoder then reconstructs the original high-dimensional side information data from the low-dimensional structure of the side information; finally, the reconstructed frequency-domain audio object signals are converted into time-domain signals. The invention can comprehensively improve the decoding quality of audio object signals at low code rates, so as to meet the user's requirement for personalized control of audio objects.

Description

Multi-audio object coding and decoding method applied to low code rate
Technical Field
The invention belongs to the technical field of digital audio signal processing and relates to an audio object coding and decoding method that compresses side information and reconstructs it with a hybrid network of a convolutional auto-encoder and dense connections. It is suitable for spatial audio personalized interaction systems at low code rates and allows the user to adjust audio objects as required.
Background
Three-dimensional (3D) audio represents an audio object with 3 degrees of freedom (e.g., azimuth, elevation, and distance), and can form sound images anywhere in 3D space. 3D audio technology is mainly used in entertainment systems to provide an immersive and personalized experience. Immersive spatial sound representations fall into three types: channel-based, higher-order-ambisonics-based, and object-based coding techniques. A channel-based sound representation feeds each channel signal to a loudspeaker fixed in position relative to the listener. Although channel-based coding techniques are well established, the audio content they produce is tied to a particular speaker configuration; the techniques are limited by the number of channels and do not meet the user's need for personalized manipulation of audio objects, especially in immersive scenes such as virtual reality and augmented reality somatosensory interactive games. Higher-order-ambisonics-based coding techniques use coefficient signals to reconstruct a 3D spatial sound field. Because the coefficient signals have no direct relation to any channel or object, ambisonics coding of either fundamental or higher order is not suitable for controlling a single object in a sound scene. In object-based coding, each audio object position is completely independent of the loudspeaker positions, and an object signal is rendered to its target position by a personalized rendering system. The object-based encoding method overcomes the dependency of the generated audio content on loudspeaker positions and achieves highly immersive effects in the sound scene, for example a bird or helicopter flying overhead, rain falling from the sky, or thunder arriving from any direction. The object-based coding framework has been successfully used in Dolby Atmos.
A typical representative of object-based coding is Spatial Audio Object Coding (SAOC). The core idea of SAOC is that only one downmix and the side-information parameters are required to transmit a plurality of object signals, so that many audio objects can be coded simultaneously at a low bit rate. However, as the number of audio objects increases and the bit rate drops, the audio objects reconstructed by SAOC suffer from spectral aliasing.
Disclosure of Invention
In order to solve the technical problems, the invention provides a multi-audio object coding and decoding method applied to low code rate, which can comprehensively improve the decoding quality of audio object signals and improve the coding efficiency under low code rate.
The invention provides a multi-audio object coding and decoding method applied to low code rate, which is used for the dimension reduction expression of audio object side information, wherein the dimension reduction expression of the audio object side information comprises the following steps:
step A1: performing time domain-frequency domain transformation on J input independent audio signals through Modified Discrete Cosine Transform (MDCT) to obtain frequency spectrums of object signals;
step A2: performing fine sub-band division on each frame of frequency spectrum data obtained in the step A1; determining the number of fine sub-band partitions according to the influence of the number of sub-bands on the frequency spectrum aliasing distortion;
step A3: calculating the downmix signals of all objects for the sub-band in the step A2 to obtain a downmix signal code stream;
step A4: calculating the side information of each object for the sub-band in the step A2 to obtain a side information matrix;
step A5: transmitting the side information matrix obtained from A4 into an encoder module of a convolution self-encoder to obtain a low-dimensional feature expression result R of the side information of the audio object, and then quantizing the side information value according to a table look-up method to obtain a side information code stream;
step A6: and D, synthesizing the code streams obtained in the step A3 and the step A5 into an output code stream, and transmitting the output code stream to a decoding end.
The invention provides a multi-audio object coding and decoding method applied to a low code rate, which is used for reconstructing original high-dimensional data from a low-dimensional structure and specifically comprises the following steps:
step B1: decomposing the received code stream to obtain a down-mixing signal code stream and a side information code stream;
step B2: decoding the down-mixing signal code stream obtained in the step B1 to obtain a down-mixing signal;
step B3: performing a dequantization operation on the side information code stream obtained in step B1 to obtain the side information;
step B4: inputting the side information obtained in the step B3 into a convolutional self-encoder decoder module with a dense connection module to obtain reconstructed audio object side information;
step B5: obtaining a reconstructed audio object spectrum according to the downmix signal obtained by the B2 and the object side information obtained by the B4;
step B6: and performing Inverse Modified Discrete Cosine Transform (IMDCT) processing according to the audio object frequency spectrum obtained by the B5 to obtain a reconstructed time domain signal of a single object.
Compared with existing audio object coding, the invention has the following advantages: the encoding module of a convolutional auto-encoder (CAE) extracts the effective features of the side information and reduces the dimensionality of the side information parameters to save bit rate, and dense connections (DenseNet) are introduced into the decoding module of the convolutional auto-encoder to enhance feature transfer between the layers of the decoding neural network, so that the audio objects are well reconstructed. The invention can therefore comprehensively improve the decoding quality of audio object signals at low code rates, so as to meet the user's requirement for personalized control of audio objects.
Drawings
FIG. 1 is a flow chart of encoding according to an embodiment of the present invention.
Fig. 2 is a decoding flow diagram of an embodiment of the present invention.
FIG. 3 is a block diagram of a convolutional autoencoder model structure according to an embodiment of the present invention.
Detailed Description
For the convenience of those skilled in the art in understanding and implementing the present invention, the technical solutions are further described below with reference to the accompanying drawings and specific embodiments. It should be understood that the embodiments described herein are only used to illustrate and explain the present invention and are not used to limit it.
the invention develops research on the basis of the existing audio object coding method and provides a multi-audio object coding and decoding method applied to low code rate. The method comprises the steps of firstly utilizing an encoding module in a convolution self-encoder to carry out dimension reduction expression on side information, then introducing dense connection into a decoding module of the convolution self-encoder to enhance feature transfer among layers of a decoding neural network, and realizing reconstruction of original high-dimensional side information data from a low-dimensional structure of the side information, so that the low-dimensional features of the side information are fully utilized, and the aim of reducing code rate is fulfilled.
The invention provides a multi-audio object coding and decoding method applied to low code rate, which comprises a coding method and a decoding method;
referring to fig. 1, the encoding method of the present embodiment is specifically implemented by the following steps:
step A1: input as a time-domain signal S of a plurality of audio objects1,S2,…,SJFor different kinds of audio object signals, such as drum set, bass, human voice, etc., the sampling frequency is 44.1kHz, the bit depth is 16 bits, and the audio format is the wav format.
In this embodiment, J independent audio signals S are inputted1,S2,…,SJPerforming time-frequency domain transformation by improving discrete cosine transform (MDCT) to obtain frequency spectrum O of object signal1,O2,…,OJ
In this embodiment, time-domain-frequency-domain conversion is performed on the audio object signal in the time domain through 2048-point modified discrete cosine transform MDCT during time synchronization to obtain a spectrum matrix of a single object, where the number of rows (number of columns) of the matrix is equal to the number of frames, and the number of columns (number of rows) is equal to the number of frequency points.
It should be noted that the frame length, the type of window function, the transformation method, etc. specified herein are only for illustrating the specific implementation steps of the present invention, and are not used to limit the present invention.
Step A2: carrying out fine subband division on each frame of the spectra O_1, O_2, …, O_J obtained in step A1;
in the embodiment, according to the influence of the number of subbands on the aliasing distortion of the restored audio object frequency spectrum, the evaluation index SDR is used for determining the number of fine subband divisions.
In this embodiment, the ERB scale divides each frame signal into 28 subbands, and each subband is uniformly subdivided into 10 finer subbands on the basis of the 28 ERB subbands.
It should be noted that the number of sub-bands specified herein is only for illustrating the specific implementation flow of the present invention, and is not used to limit the present invention.
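The fine subband division of step A2 can be sketched as follows: a minimal NumPy sketch assuming the Glasberg-Moore ERB-rate formula and the embodiment's figures (28 coarse ERB bands at 44.1 kHz, each split uniformly into 10 finer subbands). The function names and the uniform-in-Hz splitting within each ERB band are illustrative assumptions, since the patent does not specify them.

```python
import numpy as np

def erb_scale(f_hz):
    """Glasberg-Moore ERB-rate scale: number of ERBs below f_hz."""
    return 21.4 * np.log10(1.0 + 0.00437 * f_hz)

def erb_inverse(erbs):
    """Inverse of erb_scale: frequency in Hz at a given ERB-rate value."""
    return (10.0 ** (erbs / 21.4) - 1.0) / 0.00437

def fine_subband_edges(fs=44100, n_erb_bands=28, splits_per_band=10):
    """Coarse ERB bands, each uniformly subdivided -> fine subband edges in Hz."""
    nyquist = fs / 2.0
    # 28 band edges equally spaced on the ERB-rate scale, mapped back to Hz
    coarse = erb_inverse(np.linspace(0.0, erb_scale(nyquist), n_erb_bands + 1))
    edges = []
    for lo, hi in zip(coarse[:-1], coarse[1:]):
        # split each coarse band into 10 equal-width fine subbands
        edges.extend(np.linspace(lo, hi, splits_per_band + 1)[:-1])
    edges.append(coarse[-1])
    return np.array(edges)

edges = fine_subband_edges()
print(len(edges) - 1)   # → 280
```

With the embodiment's numbers this yields 28 × 10 = 280 fine subbands per frame.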
Step A3: calculating the downmix signals of all objects for the sub-band in the step A2 to obtain a downmix signal code stream;
In this embodiment, the spectrum information of all objects is combined by matrix addition to obtain the downmix signal data, and the downmix signal is calculated as

X(i, m) = sign( Σ_{j=1}^{J} O_j(i, m) ) · sqrt( Σ_{j=1}^{J} O_j(i, m)² )

where sign(·) is the sign function, used to obtain the sign of a variable; O_j(i, m) is the spectrum information of the jth object, i is the frame index, j is the object index, and m is the frequency point index.
In this embodiment, the downmix signal is encoded by an AAC encoder, and the code rate is controlled to 128kbps, so as to obtain a downmix signal code stream;
it should be noted that the use of AAC 128kbps coding for the final downmix signal is merely to illustrate the specific implementation steps of the present invention and is not intended to limit the present invention.
Step A4: for the subbands in step A2, calculating the side information of each object to obtain the side information matrices G_1, G_2, …, G_J.
In this embodiment, the side information of the object is

G_j(i, b) = P_j(i, b) / Σ_{j'=1}^{J} P_{j'}(i, b)

where P_j(i, b) represents the energy of object j in subband (i, b); I is the total number of frames, J is the number of objects, and B is the total number of subbands; 1 ≤ i ≤ I, 1 ≤ j ≤ J, 1 ≤ b ≤ B.
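A minimal sketch of the side information computation of step A4, assuming (as the surrounding text suggests) that G_j(i, b) is the ratio of object j's subband energy P_j(i, b) to the total energy of all objects in that subband; the original formula is rendered as an image, and the function signature and array layout are illustrative.

```python
import numpy as np

def side_information(spectra, edges_bins):
    """Per-subband energy ratios: G_j(i, b) = P_j(i, b) / sum over all objects.

    spectra: (J, frames, bins) object spectra.
    edges_bins: subband start bin indices, last entry = total number of bins.
    """
    J, I, _ = spectra.shape
    B = len(edges_bins) - 1
    P = np.zeros((J, I, B))
    for b in range(B):
        lo, hi = edges_bins[b], edges_bins[b + 1]
        P[:, :, b] = (spectra[:, :, lo:hi] ** 2).sum(axis=2)   # subband energy
    total = P.sum(axis=0, keepdims=True)
    return P / np.maximum(total, 1e-12)    # guard against silent subbands
```

By construction the ratios of all J objects sum to 1 in every non-silent subband, which is what makes them compact side information relative to the full spectra.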
Step A5: side information matrix G obtained for A41,G2,…,GJTransmitting the audio object side information to an encoder module of a convolution self-encoder to obtain a low-dimensional feature expression result R of the audio object side information and obtain a side information code stream;
In this embodiment, the encoder module of the convolutional self-encoder performs dimension reduction on the side information, reducing the data volume of the original side information. The side information values are then quantized according to a table look-up method, and finally the corresponding quantization indices form the output code stream.
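The table look-up quantization of step A5 can be sketched as follows. The patent does not publish the actual table, so this sketch assumes a uniform 32-entry table on [0, 1] (side-information energy ratios lie in that range) purely for illustration.

```python
import numpy as np

# Hypothetical 32-entry look-up table (5 bits per side-information value);
# the real table used by the patent is not disclosed.
TABLE = np.linspace(0.0, 1.0, 32)

def quantize(g):
    """Map each side-information value to the index of the nearest table entry."""
    return np.argmin(np.abs(g[..., None] - TABLE), axis=-1).astype(np.uint8)

def dequantize(idx):
    """Inverse look-up, as performed at the decoding end in step B3."""
    return TABLE[idx]
```

The quantization indices, not the values themselves, are written into the side information code stream; nearest-entry rounding bounds the error by half a table step.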
Step A6: and (C) synthesizing the code streams obtained in the step (A3) and the step (A5) into an output code stream, and transmitting the output code stream to a decoding end.
Synthesizing the output code stream in this embodiment means integrating the downmix signal code stream with the side information code stream: the downmix signal code stream is the output code stream after AAC coding, and the side information code stream is the quantization index code stream output by the encoder module of the convolutional self-encoder.
Referring to fig. 2, the decoding method of the present embodiment includes the following steps:
step B1: decomposing the received code stream to obtain a down-mixing signal code stream and a side information code stream;
in this embodiment, the downmix signal code stream and the side information code stream are obtained by parsing the code stream according to the code stream received by the decoding end.
Step B2: b1, carrying out AAC decoding on the down-mixed signal code stream obtained in the step B to obtain a down-mixed signal;
step B3: performing a dequantization operation on the side information code stream obtained in step B1 to obtain the side information;
step B4: inputting the side information obtained in the step B3 into a convolutional self-encoder decoder module with a dense connection module to obtain reconstructed audio object side information;
In this embodiment, the side information obtained in step B3 is input into the decoder module of the convolutional self-encoder, in which a densely connected network is added to enhance feature transfer between the layers of the decoding neural network, to obtain the reconstructed audio object side information Ĝ_1, Ĝ_2, …, Ĝ_J. The original high-dimensional side information data is thus reconstructed from the low-dimensional structure of the side information, the low-dimensional features of the side information are fully utilized, and the aim of reducing the code rate is fulfilled.
Referring to fig. 3, in the embodiment of the present invention, a dense connection network is added to a convolutional autocoder decoding module, and the structure includes three modules: module 1, module 2, and module 3;
The module 1 consists of a convolution layer, a reshaping layer, a pooling layer, and a flattening layer, and is used to extract features from the input side information data through a convolutional neural network, compress the extracted features by pooling, and further reduce the features to a low-dimensional expression with the convolution layer;
the module 2 consists of a reshaping layer and two deconvolution layers, where the reshaping layer is densely connected with the two deconvolution layers; it decodes the low-dimensional expression of the side information data features, and the dense connections are introduced to enhance feature transfer between the layers of the decoding neural network;
the module 3 consists of a deconvolution layer, a reshaping layer, and a convolution layer, and further decodes the low-dimensional expression of the side information data features; it can be seen as the inverse operation of module 1.
In this embodiment, the decoded side information is input to a decoding portion of a convolutional auto-encoder that introduces dense connections, and high-dimensional side information data is reconstructed from a low-dimensional side information structure.
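The effect of the dense connections in module 2 can be illustrated with a NumPy toy: each layer receives the concatenation of all preceding feature maps rather than only the previous layer's output, which is what enhances feature transfer between layers. Plain random matrix layers stand in for the real deconvolution layers here; the sizes, layer count, and initialization are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_block(x, n_layers=2, growth=4):
    """Toy DenseNet-style block: every layer sees the concatenation of ALL
    previous feature maps (the dense connections of module 2), not just the
    last layer's output.
    """
    features = [x]                               # running list of all outputs
    for _ in range(n_layers):
        inp = np.concatenate(features, axis=-1)  # dense connection
        w = rng.standard_normal((inp.shape[-1], growth)) * 0.1
        features.append(np.maximum(inp @ w, 0.0))   # linear layer + ReLU
    return np.concatenate(features, axis=-1)
```

The output width grows by `growth` features per layer, so later layers can reuse early low-dimensional side information features directly instead of relying on them surviving every intermediate layer.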
Step B5: obtaining a reconstructed audio object spectrum according to the downmix signal obtained by the B2 and the object side information obtained by the B4;
In this embodiment, the reconstructed audio object spectrum is

Ô_j(i, m) = X̂(i, m) · Ĝ_j(i, b), A_{b−1} ≤ m ≤ A_b − 1

where Ô_j(i, m) is the frequency-domain representation of the reconstructed audio object j, X̂(i, m) is the coded-and-decoded downmix signal, and Ĝ_j(i, b) is the dequantized side information; m is the frequency point index, and A_{b−1} and A_b − 1 represent the start and end frequency points of subband b; 1 ≤ i ≤ I, 1 ≤ j ≤ J, 1 ≤ b ≤ B.
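Step B5 — applying the dequantized side information to the decoded downmix, subband by subband — can be sketched as follows. The original formula is rendered as an image in the source; this sketch follows the surrounding variable definitions, and the function name and array layout are assumptions.

```python
import numpy as np

def reconstruct_objects(downmix_spec, G, edges_bins):
    """Per-subband reconstruction: each object's spectrum is the decoded
    downmix scaled by that object's dequantized side information in the
    subband containing frequency point m.

    downmix_spec: (frames, bins) decoded downmix spectrum.
    G: (J, frames, B) dequantized side information.
    edges_bins: B + 1 subband start bin indices.
    """
    J, I, B = G.shape
    out = np.zeros((J, I, downmix_spec.shape[1]))
    for b in range(B):
        lo, hi = edges_bins[b], edges_bins[b + 1]
        out[:, :, lo:hi] = downmix_spec[None, :, lo:hi] * G[:, :, b:b + 1]
    return out
```

When a single object carries all the energy in a subband, its side information value is 1 there and the reconstruction returns the downmix unchanged in that band.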
Step B6: performing inverse modified discrete cosine transform (IMDCT) processing on the audio object spectra Ô_1, Ô_2, …, Ô_J obtained in step B5 to obtain the reconstructed time-domain signal Ŝ_j of each single object.
In this embodiment, frequency domain-time domain transform is performed by using inverse modified discrete cosine transform IMDCT, and finally a time domain signal of a reconstructed audio object is obtained.
The invention extracts the effective features of the side information with the encoding module of a convolutional auto-encoder (CAE) and reduces the dimensionality of the side information parameters to save bit rate, and it introduces dense connections into the decoding module of the convolutional auto-encoder to enhance feature transfer between the layers of the decoding neural network, so that the audio objects are well reconstructed. The invention can therefore comprehensively improve the decoding quality of audio object signals at low code rates, so as to meet the user's requirement for personalized control of audio objects.
It should be understood that the parts of the specification not described in detail belong to the prior art.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A multi-audio object coding and decoding method applied under low code rate is characterized in that: comprises an encoding method and a decoding method;
the coding method is specifically realized by the following steps:
step A1: performing time domain-frequency domain transformation on J input independent audio signals through Modified Discrete Cosine Transform (MDCT) to obtain frequency spectrums of object signals;
step A2: performing fine sub-band division on each frame of frequency spectrum data obtained in the step A1; determining the number of fine sub-band partitions according to the influence of the number of sub-bands on the aliasing distortion of the frequency spectrum;
step A3: calculating the downmix signals of all objects for the sub-band in the step A2 to obtain a downmix signal code stream;
step A4: calculating the side information of each object for the sub-band in the step A2 to obtain a side information matrix;
step A5: transmitting the side information matrix obtained from A4 into an encoder module of a convolution self-encoder to obtain a low-dimensional feature expression result R of the side information of the audio object, and then quantizing the side information value according to a table look-up method to obtain a side information code stream;
step A6: synthesizing the code streams obtained in the step A3 and the step A5 into an output code stream, and transmitting the output code stream to a decoding end;
the decoding method is specifically realized by the following steps:
step B1: decomposing the received code stream to obtain a down-mixing signal code stream and a side information code stream;
step B2: decoding the down-mixing signal code stream obtained in the step B1 to obtain a down-mixing signal;
step B3: performing a dequantization operation on the side information code stream obtained in step B1 to obtain the side information;
step B4: inputting the side information obtained in the step B3 into a convolutional self-encoder decoder module with a dense connection module to obtain the reconstructed audio object side information;
step B5: obtaining a reconstructed audio object spectrum according to the downmix signal obtained by the B2 and the object side information obtained by the B4;
step B6: carrying out Inverse Modified Discrete Cosine Transform (IMDCT) processing according to the audio object frequency spectrum obtained by the B5 to obtain a reconstructed time domain signal of a single object;
wherein, a dense connection network is added in a decoding module of the convolution self-encoder to reconstruct original high-dimensional side information data from a low-dimensional structure of the side information;
the convolutional self-encoder decoding module is added with a dense connection network, and the structure of the convolutional self-encoder decoding module comprises three modules: module 1, module 2, and module 3;
the module 1 consists of a convolution layer, a reshaping layer, a pooling layer and a flattening layer, and is used for extracting features from the input side information data through a convolutional neural network, compressing the extracted features by pooling, and further processing the features into a low-dimensional expression with the convolution layer;
the module 2 consists of a reshaping layer and two deconvolution layers, wherein the reshaping layer is densely connected with the two deconvolution layers, and is used for decoding the low-dimensional expression of the side information data features;
the module 3 consists of a deconvolution layer, a reshaping layer and a convolution layer, and is used for further decoding the low-dimensional expression of the side information data features; its operation is the reverse of module 1.
2. The method of claim 1, wherein the method comprises: in step a1, a time-frequency domain transform is performed on the audio object signal in the time domain by a 2048-point modified discrete cosine transform MDCT to obtain a spectrum of a single object.
3. The method of claim 1, wherein the method comprises: in step a2, the evaluation index SDR is used to determine the number of fine subband divisions according to the influence of the number of subbands on the aliasing distortion of the restored audio object spectrum.
4. The method of claim 1, wherein the method comprises: in step a3, the spectral information of all objects is matrix-added to obtain downmix signal data.
5. The method for multi-audio-object coding and decoding at low code rate according to any of claims 1-4, wherein: in step A4, the side information of the object is

G_j(i, b) = P_j(i, b) / Σ_{j'=1}^{J} P_{j'}(i, b)

where P_j(i, b) represents the energy of object j in subband (i, b); I is the total number of frames, J is the number of objects, and B is the total number of subbands; 1 ≤ i ≤ I, 1 ≤ j ≤ J, 1 ≤ b ≤ B.
6. The method of claim 1, wherein the method comprises: in step B2, AAC decoding is applied to the downmix signal code stream to obtain the downmix signal before encoding.
7. The method of claim 5, wherein the method comprises: in step B5, the reconstructed audio object spectrum is

Ô_j(i, m) = X̂(i, m) · Ĝ_j(i, b), A_{b−1} ≤ m ≤ A_b − 1

where Ô_j(i, m) is the frequency-domain representation of the reconstructed audio object j, X̂(i, m) is the coded-and-decoded downmix signal, and Ĝ_j(i, b) is the dequantized side information; m is the frequency point index, and A_{b−1} and A_b − 1 represent the start and end frequency points of subband b; 1 ≤ i ≤ I, 1 ≤ j ≤ J, 1 ≤ b ≤ B.
8. The method of claim 1, wherein the method comprises: in step B6, frequency domain-time domain transform is performed by using inverse modified discrete cosine transform IMDCT to finally obtain a time domain signal of the reconstructed audio object.
CN202110312781.8A 2021-03-24 2021-03-24 Multi-audio object coding and decoding method applied to low code rate Active CN113096672B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110312781.8A CN113096672B (en) 2021-03-24 2021-03-24 Multi-audio object coding and decoding method applied to low code rate

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110312781.8A CN113096672B (en) 2021-03-24 2021-03-24 Multi-audio object coding and decoding method applied to low code rate

Publications (2)

Publication Number Publication Date
CN113096672A (en) 2021-07-09
CN113096672B (en) 2022-06-14

Family

ID=76669589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110312781.8A Active CN113096672B (en) 2021-03-24 2021-03-24 Multi-audio object coding and decoding method applied to low code rate

Country Status (1)

Country Link
CN (1) CN113096672B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107610710A (en) * 2017-09-29 2018-01-19 Wuhan University Audio encoding and decoding method for multiple audio objects
CN108596213A (en) * 2018-04-03 2018-09-28 China University of Geosciences (Wuhan) Hyperspectral remote sensing image classification method and system based on convolutional neural networks
CN110739000A (en) * 2019-10-14 2020-01-31 Wuhan University Audio object coding method suitable for personalized interactive system
CN111476342A (en) * 2019-01-23 2020-07-31 StradVision, Inc. CNN method and device using 1xH convolution
CN111508524A (en) * 2020-03-05 2020-08-07 Hefei University of Technology Method and system for identifying voice source equipment
CN112365896A (en) * 2020-10-15 2021-02-12 Wuhan University Object-oriented encoding method based on stacked sparse autoencoder

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1691348A1 (en) * 2005-02-14 2006-08-16 Ecole Polytechnique Federale De Lausanne Parametric joint-coding of audio sources


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Gang et al., "Health factor construction method based on multi-scale AlexNet networks," Systems Engineering and Electronics, 2020, (01). *


Similar Documents

Publication Publication Date Title
US11798568B2 (en) Methods, apparatus and systems for encoding and decoding of multi-channel ambisonics audio data
CA2697830C (en) A method and an apparatus for processing a signal
JP6346278B2 (en) Audio encoder, audio decoder, method, and computer program using joint encoded residual signal
JP2022160597A (en) Apparatus and method for stereo filling in multichannel coding
EP2297728B1 (en) Apparatus and method for adjusting spatial cue information of a multichannel audio signal
CN107610710B (en) Audio coding and decoding method for multiple audio objects
CN109448741B (en) 3D audio coding and decoding method and device
CN110739000B (en) Audio object coding method suitable for personalized interactive system
WO2008100099A1 (en) Methods and apparatuses for encoding and decoding object-based audio signals
US20220139409A1 (en) Audio scene encoder, audio scene decoder and related methods using hybrid encoder-decoder spatial analysis
CN110660401B (en) Audio object coding and decoding method based on high-low frequency domain resolution switching
JP2022548038A (en) Determining Spatial Audio Parameter Encoding and Related Decoding
EP2489036B1 (en) Method, apparatus and computer program for processing multi-channel audio signals
CN108417219B (en) Audio object coding and decoding method suitable for streaming media
CN113314132B (en) Audio object coding method, decoding method and device in interactive audio system
CN113096672B (en) Multi-audio object coding and decoding method applied to low code rate
CN112365896B (en) Object-oriented encoding method based on stack type sparse self-encoder
CN113314131B (en) Multistep audio object coding and decoding method based on two-stage filtering
CN113314130B (en) Audio object coding and decoding method based on frequency spectrum movement
US20240127831A1 (en) Methods, apparatus and systems for encoding and decoding of multi-channel ambisonics audio data
CN117136406A (en) Combining spatial audio streams

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant