CN110739000A - Audio object coding method suitable for personalized interactive system - Google Patents


Publication number
CN110739000A
Authority
CN
China
Prior art keywords
matrix
code stream
audio
objects
side information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910972165.8A
Other languages
Chinese (zh)
Other versions
CN110739000B (en)
Inventor
胡瑞敏
胡晨昊
王晓晨
武庭照
吴玉林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN201910972165.8A
Publication of CN110739000A
Application granted
Publication of CN110739000B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G10L19/02: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Mathematical Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses an audio object coding method suitable for a personalized interactive system. First, a plurality of audio objects to be coded are framed and converted from the time domain to the frequency domain; the objects are sorted by energy to determine the coding order; in each step of a loop, the object to be coded and the corresponding downmix signal are extracted, and the parameters and the residual of that step are calculated; the large residual matrices are decomposed by singular value decomposition; and the final downmix signal, the parameters and the residual decomposition matrices are compressed into a code stream. In the decoding stage, the residuals are reconstructed from the decomposition matrices, and the objects are then decoded and reconstructed step by step from the downmix signal according to the residual and the parameters of each object.

Description

Audio object coding method suitable for personalized interactive system
Technical Field
The invention belongs to the technical field of digital audio signal processing, and particularly relates to an audio object coding and decoding method based on multi-step progressive downmixing and reconstruction, which is suitable for a personalized interactive spatial audio system and allows users to adjust audio objects according to their own needs.
Background
Spatial audio technology based on channel coding can encode and reconstruct three-dimensional audio scenes and provides a more immersive listening experience than mono or stereo audio; examples include MPEG spatial audio coding and the NHK 22.2 loudspeaker array. As a result, spatial audio technology is becoming increasingly popular.
Many scholars and research institutes around the world have studied audio object coding and proposed various audio object coding methods. The most representative is Spatial Audio Object Coding (SAOC), proposed by the German research institute Fraunhofer [document 1], which encodes and transmits a downmix signal of a plurality of audio objects together with side information, and separates and reconstructs the audio objects from the downmix signal at the decoding end on the basis of the side information. The SAOC method can transmit a large number of audio objects at a low bit rate, which greatly improves the coding efficiency of audio objects and enables users to make personalized adjustments and interact according to their listening needs [document 2].
In the SAOC framework, to obtain a lower coding bit rate, the same parameters are used as side information within each subband. This causes aliasing distortion in the frequency domain and severely degrades the listening experience; for example, a decoded audio object signal may contain components of other object signals when played [document 3]. This problem may in turn affect the personalized spatial audio interaction service at the user end.
Document 1: Breebaart, J., Engdegård, J., Falch, C., et al.: Spatial audio object coding (SAOC) - the upcoming MPEG standard on parametric object based audio coding. In: Audio Engineering Society Convention 124. Audio Engineering Society (2008).
Document 2: Coleman, P., Franck, A., Francombe, J., et al.: An audio-visual system for object-based audio: from recording to listening. IEEE Transactions on Multimedia 20(8), 1919-.
Document 3: Wu, T., Hu, R., Wang, X., Ke, S.: Audio object coding based on optimal parameter frequency resolution. Multimedia Tools and Applications, pp. 1-16 (2019).
Document 4: Spatial audio object coding with two-step coding structure for interactive audio service. IEEE Transactions on Multimedia 13(6), 1208-1216 (2011).
Document 5: Lee, B., Kim, K., Hahn, M.: Effective residual coding method of spatial audio object coding with two-step coding structure for interactive audio services. IEICE Transactions on Information and Systems 99(7), 1949-.
Disclosure of Invention
To solve the above technical problems, the invention provides an audio object coding and decoding method based on multi-step progressive downmixing and reconstruction, which can perform high-quality audio coding and decoding at medium and low bit rates and ensures that all audio objects have good decoded sound quality.
An audio object coding method suitable for a personalized interactive system, characterized by comprising the following steps:
step A1: performing framing and windowing on an input audio object sequence, converting the time domain signal into a frequency domain signal, and obtaining a time-frequency matrix of each audio object;
step A2: according to the time-frequency matrix of each object, calculating the frequency domain energy of the objects and sorting them, and determining the object to be coded in each step of the multi-step progressive coding;
step A3: according to the determined coding order, downmixing step by step and calculating the corresponding side information, wherein step-by-step downmixing means that the data of the objects input in the current processing step are added as matrices to obtain a sum matrix, and the intermediate step-by-step downmix signals are not transmitted in the code stream; the side information comprises the object residuals and the object gain parameter matrices, and the object gain parameters are calculated from the energy ratio of the two input signals in an object pair;
step A4: decomposing the object residual in the side information into a left singular matrix, a right singular matrix and singular values by singular value decomposition;
step A5: quantizing the singular matrices, the singular values and the object gain parameters to obtain a side information code stream;
step A6: coding the final downmix signal of step A3 to obtain a downmix signal code stream;
step A7: synthesizing the code streams obtained in step A5 and step A6 into an output code stream, and transmitting the output code stream to the decoding end.
Compared with the existing audio object coding technology, the invention has the following advantages: multi-step progressive encoding and decoding is used, and residuals are used to compensate decoding distortion as far as possible, so that each audio object has good listening quality; at the same time, singular value decomposition is introduced to compress the residual information by decomposing it, which reduces the code rate. The invention can therefore ensure that high-quality audio objects are obtained by decoding at medium and low code rates, meeting the requirements of a personalized audio interaction system.
Drawings
FIG. 1 is a diagram of the encoding principle of an embodiment of the present invention;
fig. 2 is a decoding schematic diagram of an embodiment of the present invention.
Detailed Description
To help those skilled in the art understand and practice the present invention, the technical solution is described below with reference to the accompanying drawings and specific examples. It should be understood that the examples described herein only illustrate and explain the present invention and are not intended to limit it.
First, the optimal coding order is determined from the frequency domain energy of the objects, which fixes the object to be coded and the side information to be calculated in each step; the residual information of every object is finally obtained, which effectively reduces the signal distortion and aliasing of all reconstructed objects. The residual information is then decomposed into three low-dimensional matrices by singular value decomposition, thereby compressing the residual information and reducing the bit rate.
Referring to fig. 1, the present invention proposes a multi-audio-object coding method suited to a personalized interactive system. The present embodiment is illustrated with four input objects A, B, C and D, and comprises the following steps:
step A1: inputting audio objects A, B, C, D (which may include various objects such as human voice, piano, guitar, etc.), framing and windowing each object, converting the time domain signal to the frequency domain signal, and obtaining a time-frequency matrix of each audio object;
in this embodiment, the original one-dimensional sound signal in the time domain is converted into a two-dimensional spectrogram in the frequency domain by framing, windowing and modified discrete cosine transform (MDCT), and the resulting object data are output in matrix form.
The input audio object signals have a sample rate of 44.1 kHz and a bit depth of 16 bits, and are in WAV audio format.
It should be noted that the audio parameters and object types specified herein are only for illustrating the implementation process of the present invention, and are not used to limit the present invention.
In framing and windowing, each frame is 1024 samples long, a Hann window is used as the window function, and adjacent frames overlap by 50% in the time domain; the modified discrete cosine transform (MDCT) is used as the time-frequency transform, with a transform length of 2048 points. Finally, the audio object signals are output in matrix form, where the number of rows equals the number of frames and the number of columns equals the number of frequency bins (or, equivalently, the transposed layout is used).
It should be noted that the frame length, the type of window function, the transformation method, etc. specified herein are only for illustrating the specific implementation steps of the present invention, and are not used to limit the present invention.
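The framing, windowing and MDCT of step A1 can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes that each MDCT block spans 2048 samples, that consecutive blocks are shifted by 1024 samples (50% overlap), and that frequency bins are stored as rows; the function name `mdct_spectrogram` is hypothetical.

```python
import numpy as np

def mdct_spectrogram(signal, n_bins=1024):
    """Return an (n_bins, n_frames) matrix of MDCT coefficients.

    Assumption: each block spans 2 * n_bins samples and consecutive blocks
    are shifted by n_bins samples, i.e. they overlap by half a block.
    Trailing samples that do not fill a whole block are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    block_len = 2 * n_bins
    if signal.size < block_len:
        raise ValueError("signal is shorter than one MDCT block")
    n_frames = (signal.size - block_len) // n_bins + 1
    window = np.hanning(block_len)  # Hann window, as named in the embodiment
    n = np.arange(block_len)
    k = np.arange(n_bins)
    # MDCT basis cos(pi/N * (n + 1/2 + N/2) * (k + 1/2)), shape (N, 2N)
    basis = np.cos(np.pi / n_bins * np.outer(k + 0.5, n + 0.5 + n_bins / 2))
    spectrogram = np.empty((n_bins, n_frames))
    for q in range(n_frames):
        block = signal[q * n_bins : q * n_bins + block_len]
        spectrogram[:, q] = basis @ (window * block)
    return spectrogram
```

With `n_bins=1024`, each column of the result holds the 1024 MDCT coefficients of one frame, which matches the P x Q residual layout used in step A4.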
Step A2: according to the time-frequency matrix of each object, calculating the frequency domain energy of the objects to sort, and determining the object to be coded in each step in multi-step progressive coding;
in this embodiment, the frequency domain energy of each object is calculated from its matrix-form object data, a descending energy order is adopted, and the object to be coded in each step is determined accordingly; that is, audio objects with larger energy are coded first.
The proportion of each object in the total frequency domain energy is calculated as follows:
$$O_i = \frac{\|S_i\|}{\sum_{j=1}^{N} \|S_j\|}$$
where $\|S_i\|$ denotes the total energy of the $i$-th audio object, $N$ is the number of objects, and $O_i$ denotes the proportion of the $i$-th object in the total energy of all objects. The objects are sorted in descending order of $O_i$; in this example the order is D ($S_1$), B ($S_2$), A ($S_3$), C ($S_4$), and objects with larger $O_i$ are coded first. It should be noted that $i \in [1, 4]$ and the descending order specified here are merely examples of the specific implementation steps of the present invention and are not intended to limit the present invention.
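The energy-based ordering of step A2 can be illustrated with a short sketch. It assumes that the energy of an object is the sum of the squared magnitudes of its time-frequency matrix, which the text does not spell out; the function name `energy_order` is hypothetical.

```python
import numpy as np

def energy_order(objects):
    """Sort objects by their share of the total frequency-domain energy.

    objects: dict mapping an object name to its time-frequency matrix.
    Assumption: energy is the sum of squared magnitudes of the matrix.
    Returns (names in descending energy order, dict of energy shares).
    """
    energy = {name: float(np.sum(np.abs(m) ** 2)) for name, m in objects.items()}
    total = sum(energy.values())
    if total == 0.0:
        raise ValueError("all objects are silent")
    share = {name: e / total for name, e in energy.items()}
    order = sorted(share, key=share.get, reverse=True)
    return order, share
```

For four objects whose energies rank D > B > A > C, the returned order is `["D", "B", "A", "C"]`, which is the coding order used in this embodiment.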
Step A3: according to the coding order, downmixing step by step and calculating the corresponding side information (object residual and object gain parameter);
in this embodiment, step-by-step downmixing means that the data of the objects input in the current processing step are added as matrices to obtain a sum matrix, and the intermediate step-by-step downmix signals are not transmitted in the code stream; the side information comprises the object residuals and the object gain parameter matrices, and the object gain parameters are calculated from the energy ratio of the two input signals in an object pair;
the calculation formula of the object residual and the object gain parameter is as follows:
Figure BDA0002232447780000042
Figure BDA0002232447780000051
where $R(i)$ is the residual signal of the $(i+1)$-th object, $G_o(i)$ is the gain parameter of the $(i+1)$-th object, and $G_d(i)$ is the gain parameter of the $i$-th downmix signal; $X_i$ denotes the downmix signal obtained in step $i$, $P_o(i)$ is the energy of object $i$, and $P_d(i)$ is the energy of the downmix signal of step $i$. In this embodiment $N = 4$, where $N$ is the number of objects to be encoded.
It should be noted that the number N of objects defined herein is 4, which is merely an example of the implementation steps of the present invention and is not used to limit the present invention.
In this example, following the coding order determined in step A2, the multi-step downmix proceeds according to the above formulas as follows. In the first step, objects D and B form an object pair for downmixing and parameter extraction (in the first step, D is treated as the downmix signal for the calculation); the downmix signal $X_1$ of the two objects is obtained, and the gain parameter $G_o(1)$ of the second object B and its residual $R(1)$ are calculated. In the second step, the downmix signal $X_1$ and object A form an object pair for downmixing and parameter extraction; the downmix signal $X_2$ of the second step is obtained, and the gain parameter $G_o(2)$ of the third object A and its residual $R(2)$ are calculated. In the third step, the downmix signal $X_2$ and object C form an object pair for downmixing and parameter extraction; the downmix signal $X_3$ of the third step is obtained (i.e., the final downmix signal that is transmitted to the decoding end), and the gain parameter $G_o(3)$ of the fourth object C and its residual $R(3)$ are calculated. The four objects thus complete downmixing and parameter extraction in three steps.
It should be noted that the encoding sequence and the number of steps specified herein are only for illustrating the specific implementation steps of the present invention, and are not used to limit the present invention.
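The three-step procedure above can be sketched as a loop. Because the formula images for the residual and the gain parameters are not reproduced in this text, the expressions below are assumptions chosen to match the prose (gains derived from the energy ratio of the two inputs of an object pair, and a residual equal to the object minus its gain-scaled downmix); they should not be read as the patent's exact formulas. The names `energy` and `progressive_downmix` are hypothetical.

```python
import numpy as np

def energy(m):
    # Assumption: energy is the sum of squared magnitudes of the matrix.
    return float(np.sum(np.abs(m) ** 2))

def progressive_downmix(ordered_objects):
    """Downmix objects pairwise, highest energy first.

    ordered_objects: list of equally shaped time-frequency matrices that
    are not all silent. Returns (final downmix, g_o, g_d, residuals),
    with one list entry per downmix step.
    """
    downmix = ordered_objects[0]  # in step 1 the first object acts as the downmix
    g_o, g_d, residuals = [], [], []
    for obj in ordered_objects[1:]:
        p_o, p_d = energy(obj), energy(downmix)
        mixed = downmix + obj  # matrix addition of the object pair
        # ASSUMED gain forms: each input's share of the pair energy.
        g_o.append(p_o / (p_o + p_d))
        g_d.append(p_d / (p_o + p_d))
        # ASSUMED residual: what the gain-scaled downmix fails to capture.
        residuals.append(obj - g_o[-1] * mixed)
        downmix = mixed  # intermediate downmixes are not transmitted
    return downmix, g_o, g_d, residuals
```

Under these assumptions, four input objects yield three gain pairs and three residual matrices, and only the last downmix is passed on to the core coder.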
Step A4: decomposing the object residual in the side information into a left singular matrix, a right singular matrix and singular values by singular value decomposition;
in this embodiment, the residual matrices of the objects are reduced in dimension and compressed by singular value decomposition, which limits the increase in data volume caused by the residual information. Each residual matrix is decomposed into three small matrices: a left singular matrix, a singular value matrix and a right singular matrix. Only the values on the diagonal of the singular value matrix are transmitted.
Singular value decomposition (SVD) is a matrix factorization that generalizes eigenvalue decomposition to rectangular matrices; it breaks a matrix into its constituent parts so that a high-dimensional matrix can be represented by several low-dimensional matrices, which achieves data compression.
$$R(i)_{P \times Q} = U_{P \times P}\,\Lambda_{P \times Q}\,V_{Q \times Q}^{T}$$
$$\Lambda_{P \times Q} = \operatorname{diag}\left(\sigma_1, \sigma_2, \ldots, \sigma_{\min(P,Q)}\right), \quad \sigma_1 \ge \sigma_2 \ge \cdots \ge 0$$
where $R(i)_{P \times Q}$ is the residual signal of the $(i+1)$-th object, the number of rows $P$ is half the MDCT transform length, the number of columns $Q$ is the number of frames of the audio object, $U$ is the left singular matrix, $\Lambda$ is the singular value matrix, $V$ is the right singular matrix, and the singular values on the diagonal of $\Lambda$ are sorted in descending order.
For dimensionality reduction, the first $r$ singular values ($r = 50$) and the corresponding singular vectors may be selected to approximate $R(i)$ as follows:
$$R(i)_{P \times Q} \approx U_{P \times r}\,\Lambda_{r \times r}\,V_{Q \times r}^{T}$$
where $\Lambda_{r \times r}$ is the leading $r \times r$ part of the singular value matrix, and $U_{P \times r}$ and $V_{Q \times r}$ are the first 50 columns of the original left and right singular matrices, respectively. The residual signal can be approximately represented by these three matrices, which reduces the matrix dimensions and compresses the amount of side information data.
It should be noted that $r = 50$ is given only to illustrate the specific implementation steps of the present invention and is not used to limit the present invention.
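A truncated SVD of a residual matrix can be sketched with NumPy as follows. The helper names `compress_residual` and `restore_residual` are hypothetical; `numpy.linalg.svd` returns the singular values already sorted in descending order, so keeping the first `r` of them matches the selection described above.

```python
import numpy as np

def compress_residual(residual, r=50):
    """Keep the r largest singular values and their singular vectors."""
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    r = min(r, s.size)
    # u[:, :r] is P x r, s[:r] holds the diagonal of Lambda, vt[:r] is r x Q
    return u[:, :r], s[:r], vt[:r, :]

def restore_residual(u_r, s_r, vt_r):
    """Rebuild the rank-r approximation U_r * diag(s_r) * V_r^T."""
    return (u_r * s_r) @ vt_r  # scales column j of u_r by s_r[j]
```

As a rough size check under an assumed frame count of Q = 400 with P = 1024 and r = 50, the three factors hold 50 x (1024 + 400 + 1) = 71,250 values instead of the 409,600 values of the full residual matrix.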
Step A5: quantizing the singular value, the singular matrix and the object gain parameter to obtain a side information code stream;
in this embodiment, the elements of the residual decomposition matrices and the gain parameters have different value ranges, so they are normalized before quantization so that a single quantization table can be used; the closest quantization value for each element is then looked up in the quantization table, and the corresponding quantization indices are output as the side information code stream.
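Normalization followed by table-lookup quantization can be sketched as below. The uniform table, the 8-bit depth and the peak-based normalization are illustrative assumptions, as is the idea that the scale factor travels with the indices; the patent does not specify these details. For a uniform table, rounding to the nearest step is equivalent to searching the table for the closest entry.

```python
import numpy as np

def quantize(values, n_bits=8):
    """Normalize to [-1, 1], then map each element to its nearest table entry.

    Returns (indices, scale, table). The scale is assumed to be sent
    alongside the indices so that the decoder can undo the normalization.
    """
    values = np.asarray(values, dtype=float)
    scale = float(np.max(np.abs(values)))
    if scale == 0.0:
        scale = 1.0  # all-zero input: any scale reproduces it
    levels = 2 ** n_bits
    table = np.linspace(-1.0, 1.0, levels)  # shared (unified) quantization table
    step = 2.0 / (levels - 1)
    indices = np.rint((values / scale + 1.0) / step).astype(np.int64)
    indices = np.clip(indices, 0, levels - 1)
    return indices, scale, table

def dequantize(indices, scale, table):
    """Look up the table values and undo the normalization."""
    return table[indices] * scale
```

The same pair of functions can serve the singular matrices, the singular values and the gain parameters, since each input is brought to the common range before the lookup.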
Step A6: coding the final downmix signal in the step A3 to obtain a downmix signal code stream;
in this embodiment, the final downmix signal is the basis for reconstructing the object signals at the decoding end, and it is encoded with AAC at 128 kbps.
It should be noted that encoding the final downmix signal with AAC at 128 kbps only illustrates the specific implementation steps of the present invention and is not used to limit the present invention.
Step A7: and synthesizing the code streams obtained in the step A5 and the step A6 into an output code stream, and transmitting the output code stream to a decoding end.
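One possible way to combine the two code streams so that the decoder can split them again is a small header with a marker and two length fields. The layout below, including the `AOC1` marker, is purely hypothetical and is not the container format defined by the patent.

```python
import struct

MAGIC = b"AOC1"          # hypothetical 4-byte marker identifying the stream
HEADER_FORMAT = ">4sII"  # marker, downmix length, side-information length

def pack_stream(downmix_bytes, side_info_bytes):
    """Concatenate both code streams behind a fixed-size header."""
    header = struct.pack(HEADER_FORMAT, MAGIC,
                         len(downmix_bytes), len(side_info_bytes))
    return header + downmix_bytes + side_info_bytes

def unpack_stream(stream):
    """Reverse pack_stream: return (downmix_bytes, side_info_bytes)."""
    header_size = struct.calcsize(HEADER_FORMAT)
    magic, n_downmix, n_side = struct.unpack(HEADER_FORMAT, stream[:header_size])
    if magic != MAGIC:
        raise ValueError("unexpected stream marker")
    if len(stream) != header_size + n_downmix + n_side:
        raise ValueError("stream length does not match header")
    body = stream[header_size:]
    return body[:n_downmix], body[n_downmix:]
```

The decoder-side parsing is then the mirror image of the synthesis: read the header, check the marker, and slice the payload at the recorded lengths.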
Referring to fig. 2, the invention also provides a multi-audio-object decoding method suitable for a personalized interactive system. The present embodiment is illustrated with four input objects A, B, C and D, and comprises the following steps:
step B1: analyzing the received code stream to obtain a side information code stream and a final downmix signal code stream;
in this embodiment, parsing the code stream means reversing the procedure used to synthesize the output code stream, so as to obtain the final downmix signal code stream and the side information code stream.
Step B2: carrying out AAC decoding on the down-mixed signal code stream to obtain a down-mixed signal;
in this embodiment, the final downmix signal code stream is a data stream obtained after AAC encoding and compressing, and the final downmix signal before transmission can be obtained after AAC decoding.
Step B3: the side information code stream is dequantized to obtain a left singular matrix, a right singular matrix, singular values and object gain parameters;
in this embodiment, the side information was normalized when it was quantized, so it is denormalized accordingly when it is dequantized.
Step B4: performing matrix synthesis on the left singular matrix, the right singular matrix and the singular value to recover an object residual error;
in this embodiment, matrix synthesis means multiplying the left singular matrix, the singular value matrix and the right singular matrix to obtain the approximate object residual, as shown in the formula:
$$\hat{R}(i)_{P \times Q} = U_{P \times r}\,\Lambda_{r \times r}\,V_{Q \times r}^{T}$$
step B5: decoding backward according to the coding order, and circularly reconstructing an audio object frequency domain signal from the transmission downmix signal by using the side information;
separating the object from the corresponding downmix signal by using the object gain parameter, and calculating with the residual signal to compensate for aliasing distortion to obtain a reconstructed audio object frequency domain signal, as shown in the following formula:
Figure BDA0002232447780000073
Figure BDA0002232447780000074
Figure BDA0002232447780000075
where $S'_i$ is the reconstructed frequency domain object signal, $X'_i$ is the reconstructed progressive downmix signal, $G_d(i)$ is the gain parameter of the downmix signal at each step, and $\hat{R}(i)$ is the residual information obtained by matrix synthesis at the decoding end, i.e. the result of step B4. The decoding order of the objects is the reverse of the encoding order; each object is reconstructed from the progressive downmix signal in its corresponding decoding step.
In this example, following the decoding order determined in step B5, the objects are reconstructed progressively according to equations (8), (9) and (10) as follows. In the first step, object C (i.e., $S'_4$) is reconstructed from the final downmix signal $X_3$ using the gain parameter $G_o(3)$ and its residual $\hat{R}(3)$, and the progressive downmix signal $X'_2$ is reconstructed from the final downmix signal $X_3$ using the gain parameter $G_d(3)$. In the second step, object A (i.e., $S'_3$) is reconstructed from the progressive downmix signal $X'_2$ using the gain parameter $G_o(2)$ and its residual $\hat{R}(2)$, and the progressive downmix signal $X'_1$ is reconstructed from $X'_2$ using the gain parameter $G_d(2)$. In the third step, object B (i.e., $S'_2$) is reconstructed from the progressive downmix signal $X'_1$ using the gain parameter $G_o(1)$ and its residual $\hat{R}(1)$, and the reconstructed object B is subtracted from $X'_1$ to obtain the reconstructed object D (i.e., $S'_1$). Through these three decoding steps, each object is restored in turn from the corresponding progressive downmix signal, and the reconstructed signals are compensated with the residual information to reduce the loss of sound quality caused by aliasing distortion.
It should be noted that the four objects A, B, C and D and the number of decoding steps are only used to illustrate the implementation steps of the present invention and are not used to limit the present invention.
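The reverse-order reconstruction described above can be sketched as a loop. Since the decoding formula images are not reproduced in this text, the update rules below are assumptions read off the prose: an object is taken as its gain times the current downmix plus its synthesized residual, the previous downmix as the downmix gain times the current downmix, and the first object as the last downmix minus the second object. The function name `progressive_decode` is hypothetical.

```python
import numpy as np

def progressive_decode(x_final, g_o, g_d, residuals):
    """Rebuild [S'_1, ..., S'_N] (encoding order) from the final downmix.

    g_o[k] and residuals[k] belong to downmix step k + 1, i.e. to object
    k + 2; g_d[k] is the downmix gain of step k + 1 (g_d[0] is not used).
    """
    n_steps = len(g_o)
    objects = [None] * (n_steps + 1)
    x = np.asarray(x_final, dtype=float)
    for k in range(n_steps - 1, -1, -1):  # decode in reverse coding order
        # ASSUMED rule: gain-scaled downmix compensated by the residual.
        objects[k + 1] = g_o[k] * x + residuals[k]
        if k > 0:
            x = g_d[k] * x  # ASSUMED rule for the previous downmix
        else:
            objects[0] = x - objects[1]  # first object: downmix minus second
    return objects
```

With three steps, the loop first yields the fourth object, then the third, then the second, and finally the first, mirroring the C, A, B, D order of this example.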
Step B6: and converting the audio object signal in the frequency domain into the time domain by using time-frequency inverse transformation.
In this embodiment, the progressively reconstructed object signals are still frequency domain signals; they are converted to the time domain by the inverse time-frequency transform so that subsequent functions such as rendering, personalized interaction and playback can be performed. The inverse transform in the decoding method therefore consists of windowing the object frequency domain signal and applying the inverse modified discrete cosine transform (IMDCT) to obtain a continuous time domain signal.
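The inverse transform can be sketched as an IMDCT with windowed overlap-add. Note one deliberate substitution: this sketch uses a sine window instead of the Hann window named in the embodiment, because the sine window satisfies the Princen-Bradley condition that makes the overlap-add reconstruction exact. The block layout (2N samples per block, hop of N) and all function names are assumptions.

```python
import numpy as np

def _basis(n_bins):
    # MDCT basis cos(pi/N * (n + 1/2 + N/2) * (k + 1/2)), shape (N, 2N)
    n = np.arange(2 * n_bins)
    k = np.arange(n_bins)
    return np.cos(np.pi / n_bins * np.outer(k + 0.5, n + 0.5 + n_bins / 2))

def sine_window(n_bins):
    # Satisfies w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley condition)
    return np.sin(np.pi * (np.arange(2 * n_bins) + 0.5) / (2 * n_bins))

def mdct(signal, n_bins):
    """Forward MDCT of half-overlapping blocks; returns (n_bins, n_frames)."""
    window, basis = sine_window(n_bins), _basis(n_bins)
    n_frames = (len(signal) - 2 * n_bins) // n_bins + 1
    blocks = [signal[q * n_bins : q * n_bins + 2 * n_bins] for q in range(n_frames)]
    return np.stack([basis @ (window * b) for b in blocks], axis=1)

def imdct(spectrogram):
    """Inverse MDCT with synthesis windowing and overlap-add."""
    n_bins, n_frames = spectrogram.shape
    window, basis = sine_window(n_bins), _basis(n_bins)
    out = np.zeros((n_frames + 1) * n_bins)
    for q in range(n_frames):
        block = (2.0 / n_bins) * (basis.T @ spectrogram[:, q])
        out[q * n_bins : q * n_bins + 2 * n_bins] += window * block
    return out
```

Samples covered by two overlapping blocks are recovered exactly because the time-domain aliasing of neighbouring blocks cancels; the first and last `n_bins` samples are covered by only one block and are not fully reconstructed.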
Compared with the existing audio object coding method, the method has the advantages and characteristics that:
Multi-step progressive encoding and decoding is used, and residuals are used to compensate decoding distortion as far as possible, so that each audio object has good listening quality; at the same time, singular value decomposition is introduced to compress the residual information by decomposing it, which reduces the code rate. The invention can therefore ensure that high-quality audio objects are obtained by decoding at medium and low code rates, meeting the requirements of a personalized audio interaction system.

Claims (10)

1. An audio object coding method suitable for a personalized interactive system, characterized by comprising the following steps:
step A1: performing frame windowing on an input audio object sequence, converting a time domain signal into a frequency domain signal, and obtaining a time-frequency matrix of each audio object;
step A2: according to the time-frequency matrix of each object, calculating the frequency domain energy of the objects to sort, and determining the object to be coded in each step in multi-step progressive coding;
step A3: according to the determined coding order, downmixing step by step and calculating the corresponding side information, wherein step-by-step downmixing means that the data of the objects input in the current processing step are added as matrices to obtain a sum matrix, and the intermediate step-by-step downmix signals are not transmitted in the code stream; the side information comprises the object residuals and the object gain parameter matrices, and the object gain parameters are calculated from the energy ratio of the two input signals in an object pair;
step A4: decomposing the object residual error in the side information into a left singular matrix, a right singular matrix and singular values by singular value decomposition;
step A5: quantizing the singular matrix, the singular value and the object gain parameter to obtain a side information code stream;
step A6: coding the final downmix signal in the step A3 to obtain a downmix signal code stream;
step A7: and synthesizing the code streams obtained in the step A5 and the step A6 into an output code stream, and transmitting the output code stream to a decoding end.
2. The audio object coding method adapted to a personalized interactive system according to claim 1, characterized in that: in step A1, the original one-dimensional time domain sound signal is transformed into a two-dimensional spectrogram in the frequency domain by framing, windowing and modified discrete cosine transform (MDCT), and the resulting object data are output in matrix form.
3. The audio object coding method adapted to a personalized interactive system according to claim 1, characterized in that: in step A2, the frequency domain energy of each object is calculated from its matrix-form object data, a descending energy order is adopted, and the object to be coded in each step is determined; the coding order means that audio objects with larger energy are coded first;
the proportion of each object in the total frequency domain energy is calculated as follows:
$$O_i = \frac{\|S_i\|}{\sum_{j=1}^{N} \|S_j\|}$$
where $\|S_i\|$ denotes the total energy of the $i$-th audio object and $O_i$ denotes the proportion of the $i$-th object in the total energy of all objects; the objects are sorted in descending order of $O_i$, giving the order D ($S_1$), B ($S_2$), A ($S_3$), …, C ($S_N$), where $N$ is the number of objects to be encoded, and objects with larger $O_i$ are coded first.
4. The audio object coding method adapted to a personalized interactive system according to claim 1, characterized in that: in step A3, the objects are downmixed step by step and the side information of the coded objects is calculated, and the side information of only one object is calculated in each step;
the calculation formula of the object residual and the object gain parameter is as follows:
Figure FDA0002232447770000021
Figure FDA0002232447770000022
where $R(i)$ is the residual signal of the $(i+1)$-th object, $G_o(i)$ is the gain parameter of the $(i+1)$-th object, and $G_d(i)$ is the gain parameter of the $i$-th downmix signal; $X_i$ denotes the downmix signal obtained in step $i$, $P_o(i)$ is the energy of object $i$, and $P_d(i)$ is the energy of the downmix signal of step $i$; $N$ is the number of objects to be encoded.
5. The audio object coding method adapted to a personalized interaction system according to claim 1, characterized in that: in the step A4, carrying out dimension reduction compression on residual error matrixes of a plurality of objects by a singular value decomposition method, and reducing data volume increase brought by residual error information; decomposing the residual matrix into three small matrixes, namely a left singular matrix, a singular value matrix and a right singular matrix; wherein the singular value matrix transmits only the values on the matrix diagonal.
6. The method of claim 1, wherein in the step A5, the side information is quantized by a table lookup method, the element values of the residual decomposition matrix and the gain parameter matrix are normalized before quantization, the closest quantization value is looked up in a quantization table according to the size of each element value, and the corresponding quantization index is outputted as the side information quantization code stream.
7. The audio object coding method adapted to a personalized interaction system according to claim 1, characterized in that: in step a6, the final downmix signal is encoded by an AAC encoder and then a code stream is output.
8. The audio object coding method adapted to a personalized interaction system according to claim 1, characterized in that: in step a7, synthesizing an output code stream refers to merging the final downmix signal code stream and the side information code stream, and adding a flag bit for identifier resolution; and finally, the down-mixing signal code stream refers to an output code stream after AAC coding, and the side information code stream refers to a quantization index code stream output after the residual decomposition matrix and the gain parameter are quantized.
9. An audio object decoding method adapted to a personalized interactive system, characterized by decoding the code stream generated by the method of any one of claims 1-8;
the specific implementation comprises the following substeps:
step B1: parsing the received code stream to obtain the side information code stream and the downmix signal code stream;
step B2: performing AAC decoding on the downmix signal code stream to obtain the downmix signal;
step B3: dequantizing the side information to obtain the left singular matrix, the right singular matrix, the singular values and the object gain parameters;
step B4: performing matrix synthesis on the left singular matrix, the right singular matrix and the singular values to recover the object residuals;
step B5: decoding in the reverse of the encoding order, cyclically reconstructing the audio object frequency-domain signals from the transmitted downmix signal using the side information;
step B6: converting the frequency-domain audio object signals to the time domain using a time-frequency transform.
10. The audio object decoding method adapted to a personalized interactive system according to claim 9, characterized in that: in step B4, the matrix synthesis multiplies the left singular matrix, the singular value matrix and the right singular matrix to obtain an approximate object residual.
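The matrix synthesis of step B4 can be sketched in NumPy as below. The function name `synthesize_residual` and the low-rank test matrix are illustrative assumptions; in a real decoder the three factors would come from dequantized side information and the result would only approximate the original residual.

```python
import numpy as np

def synthesize_residual(u, s, vt):
    """Compute U @ diag(s) @ Vt from the transmitted factors.

    `s` holds only the diagonal of the singular value matrix, so scaling
    the columns of `u` by `s` is equivalent to multiplying by diag(s).
    """
    return (u * s) @ vt

rng = np.random.default_rng(1)
# Hypothetical rank-2 residual, so that keeping 2 singular triplets is lossless.
residual = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 32))

u, s, vt = np.linalg.svd(residual, full_matrices=False)
approx = synthesize_residual(u[:, :2], s[:2], vt[:2, :])
print(np.allclose(approx, residual))
```

For a residual of higher rank than the number of retained triplets, the product is the best approximation of that rank in the least-squares sense, which is why the claim speaks of an approximate residual.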
CN201910972165.8A 2019-10-14 2019-10-14 Audio object coding method suitable for personalized interactive system Active CN110739000B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910972165.8A CN110739000B (en) 2019-10-14 2019-10-14 Audio object coding method suitable for personalized interactive system

Publications (2)

Publication Number Publication Date
CN110739000A true CN110739000A (en) 2020-01-31
CN110739000B CN110739000B (en) 2022-02-01

Family

ID=69270038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910972165.8A Active CN110739000B (en) 2019-10-14 2019-10-14 Audio object coding method suitable for personalized interactive system

Country Status (1)

Country Link
CN (1) CN110739000B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112365896A (en) * 2020-10-15 2021-02-12 武汉大学 Object-oriented encoding method based on stack type sparse self-encoder
CN112365896B (en) * 2020-10-15 2022-06-14 武汉大学 Object-oriented encoding method based on stack type sparse self-encoder
CN112885364A (en) * 2021-01-21 2021-06-01 维沃移动通信有限公司 Audio encoding method and decoding method, audio encoding device and decoding device
WO2022156601A1 (en) * 2021-01-21 2022-07-28 维沃移动通信有限公司 Audio encoding method and apparatus, and audio decoding method and apparatus
CN112885364B (en) * 2021-01-21 2023-10-13 维沃移动通信有限公司 Audio encoding method and decoding method, audio encoding device and decoding device
CN113096672A (en) * 2021-03-24 2021-07-09 武汉大学 Multi-audio object coding and decoding method applied to low code rate
CN113096672B (en) * 2021-03-24 2022-06-14 武汉大学 Multi-audio object coding and decoding method applied to low code rate
CN113314130A (en) * 2021-05-07 2021-08-27 武汉大学 Audio object coding and decoding method based on frequency spectrum moving
CN113314131A (en) * 2021-05-07 2021-08-27 武汉大学 Multistep audio object coding and decoding method based on two-stage filtering
CN113314130B (en) * 2021-05-07 2022-05-13 武汉大学 Audio object coding and decoding method based on frequency spectrum movement
CN113314131B (en) * 2021-05-07 2022-08-09 武汉大学 Multistep audio object coding and decoding method based on two-stage filtering

Also Published As

Publication number Publication date
CN110739000B (en) 2022-02-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant