WO2023005415A1 - Method and device for encoding and decoding a multi-channel signal - Google Patents

Method and device for encoding and decoding a multi-channel signal

Info

Publication number
WO2023005415A1
Authority
WO
WIPO (PCT)
Prior art keywords
blocks
transient
channel
group
block
Application number
PCT/CN2022/096602
Inventor
孟宪波
夏丙寅
王喆
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to KR1020247004632A (published as KR20240032117A)
Priority to EP22848025.7A (published as EP4362012A1)
Publication of WO2023005415A1
Priority to US18/423,990 (published as US20240169998A1)


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/022 Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/167 Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes

Definitions

  • The present application relates to the technical field of audio processing, and in particular to a method and device for encoding and decoding multi-channel signals.
  • Compression of audio data is an indispensable link in media applications such as media communication and media broadcasting.
  • With the development of the high-definition audio and three-dimensional audio industries, demand for audio quality keeps rising, and the volume of audio data in media applications is growing rapidly as a result.
  • Current audio data compression techniques are based on basic signal-processing principles and exploit the temporal and spatial correlation of signals to compress the original audio signal (which may include a stereo signal), reducing the amount of data and thereby facilitating the transmission or storage of audio data.
  • Embodiments of the present application provide a method and device for encoding and decoding a multi-channel signal, which improve the encoding quality of the multi-channel signal and the quality of its reconstruction.
  • An embodiment of the present application provides a method for encoding a multi-channel signal, including:
  • obtaining M first transient identifiers of the M blocks of the first channel, where the first transient identifier of a first block among the M blocks indicates whether the first block is a transient block or a non-transient block;
  • the M blocks of the second channel include a second block, and the second transient identifier of the second block indicates whether the second block is a transient block or a non-transient block;
  • when the first group information and the second group information satisfy a preset condition, obtaining first adjusted group information and second adjusted group information according to the first group information and the second group information, where the first adjusted group information corresponds to the first group information and the second adjusted group information corresponds to the second group information. Three cases are possible:
  • the first adjusted group information is the same as the first group information, and the second adjusted group information is obtained by adjusting the second group information; or
  • the first adjusted group information is obtained by adjusting the first group information, and the second adjusted group information is the same as the second group information; or
  • the first adjusted group information is obtained by adjusting the first group information, and the second adjusted group information is obtained by adjusting the second group information;
  • In the embodiments of the present application, the current frame of the multi-channel signal to be encoded includes the first channel and the second channel, and each channel includes the spectra of M blocks. The M first transient identifiers of the M blocks of the first channel are obtained from the spectra of those M blocks, and the first group information of the M blocks of the first channel is obtained from the M first transient identifiers; the second group information of the M blocks of the second channel is obtained in the same way. When the first group information and the second group information satisfy the preset condition, the first adjusted group information and the second adjusted group information are obtained from them. The first spectrum to be encoded is then obtained, the second spectrum to be encoded is obtained in the same way, and finally the first and second spectra to be encoded are encoded to obtain a spectrum encoding result, which can be carried in the bitstream.
  • Therefore, in the embodiments of the present application, the group information of the M blocks of each channel is obtained from the M transient identifiers of that channel in the current frame; when the group information of the channels satisfies the preset condition, adjusted group information of the M blocks of each channel is obtained; and the spectrum to be encoded of each channel is obtained from its adjusted group information and the spectra of its M blocks. Blocks with different transient identifiers can thus be grouped, aligned, and encoded, which improves the encoding quality of the multi-channel signal.
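  • The per-channel part of the flow summarized above (block spectra, then transient identifiers, then group information) can be illustrated with a minimal Python sketch. It is not the patented method: the text does not say how the transient identifiers are derived from the block spectra, so the energy-ratio detector `transient_flags` and its `threshold` parameter are assumptions, as is the rule in the hypothetical `group_info` helper that a channel mixing transient and non-transient blocks forms two groups and any other channel forms one.

```python
from typing import List, Tuple


def transient_flags(block_spectra: List[List[float]], threshold: float = 2.0) -> List[bool]:
    """Hypothetical detector (assumption): flag a block as transient when its
    spectral energy exceeds `threshold` times the mean block energy of the frame."""
    energies = [sum(x * x for x in block) for block in block_spectra]
    mean_energy = sum(energies) / len(energies)
    if mean_energy == 0.0:
        return [False] * len(energies)  # silent frame: no transient blocks
    return [e > threshold * mean_energy for e in energies]


def group_info(flags: List[bool]) -> Tuple[int, List[bool]]:
    """Assumed grouping rule: two groups (transient / non-transient) when the
    frame mixes both kinds of block, otherwise a single group."""
    mixed = any(flags) and not all(flags)
    return (2 if mixed else 1), list(flags)


# One channel of a frame with M = 4 blocks; block 2 carries a burst of energy.
spectra = [[0.1, 0.1], [0.1, 0.1], [3.0, 2.0], [0.1, 0.1]]
flags = transient_flags(spectra)
print(flags, group_info(flags))  # [False, False, True, False] (2, [False, False, True, False])
```

Running the same two helpers on the second channel would yield the second group information that the preset condition compares against.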
  • The method further includes: encoding the first adjusted group information and the second adjusted group information to obtain a group information coding result, and writing the group information coding result into the bitstream.
  • The encoder encodes the first adjusted group information and the second adjusted group information to obtain the group information coding result; the coding method used for the adjusted group information is not limited here.
  • Encoding the adjusted group information yields the group information coding result, which is written into the bitstream so that the bitstream carries it. The decoder can then parse the bitstream to obtain the group information coding result and decode it to obtain the first adjusted group information and the second adjusted group information.
  • The first group information includes a first group count of the M blocks of the first channel, or a first group count identifier that indicates the first group count; when the first group count is greater than 1, the first group information further includes the M first transient identifiers. Alternatively, the first group information includes the M first transient identifiers.
  • The second group information includes a second group count of the M blocks of the second channel, or a second group count identifier that indicates the second group count; when the second group count is greater than 1, the second group information further includes the M second transient identifiers. Alternatively, the second group information includes the M second transient identifiers.
  • The first adjusted group information includes a first adjusted group count of the M blocks of the first channel, or a first adjusted group count identifier that indicates the first adjusted group count; when the first adjusted group count is greater than 1, the first adjusted group information further includes M first adjusted transient identifiers of the M blocks of the first channel, where the first adjusted transient identifier of the first block may differ from or be the same as the first transient identifier of the first block. Alternatively, the first adjusted group information includes the M first adjusted transient identifiers.
  • The second adjusted group information includes a second adjusted group count of the M blocks of the second channel, or a second adjusted group count identifier that indicates the second adjusted group count; when the second adjusted group count is greater than 1, the second adjusted group information further includes M second adjusted transient identifiers of the M blocks of the second channel, where the second adjusted transient identifier of the second block may differ from or be the same as the second transient identifier of the second block. Alternatively, the second adjusted group information includes the M second adjusted transient identifiers.
  • The first adjusted group information may be the same as or different from the first group information.
  • The first group information includes the first group count or the first group count identifier of the M blocks of the first channel, and the first adjusted group information includes the first adjusted group count or the first adjusted group count identifier of the M blocks of the first channel.
  • The first group count and the first adjusted group count may be the same or different. If adjusting the first group information does not change the group count, the two counts are the same; if it does, they differ. For example, the first group count may be 2 before the first group information is adjusted and the first adjusted group count 1 afterwards.
  • Likewise, the first group count identifier and the first adjusted group count identifier may be the same or different. For example, if the first group count is 2 and the first group count identifier is 1 before the adjustment, and the first adjusted group count is still 2 after the adjustment, the first adjusted group count identifier is still 1.
  • The second adjusted group information may likewise be the same as or different from the second group information.
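  • The group information described above can be modelled as a small record plus a bit writer and reader. The text leaves the coding method open, so the layout below is an assumption: one bit for the group count identifier, taken as the group count minus 1 in line with the example in which a group count of 2 has identifier 1, followed by one bit per transient identifier only when the group count exceeds 1. `GroupInfo`, `encode_group_info` and `decode_group_info` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GroupInfo:
    """Hypothetical record for one channel's (adjusted) group information."""
    group_count: int              # number of groups of the M blocks (assumed 1 or 2)
    transient_flags: List[bool]   # M transient identifiers, True = transient block


def encode_group_info(info: GroupInfo) -> List[int]:
    """Serialize group information into bits (assumed layout, not a standard)."""
    bits = [info.group_count - 1]  # group count identifier: count 2 -> 1, count 1 -> 0
    if info.group_count > 1:
        # The M transient identifiers are only written when there is more than one group.
        bits.extend(1 if f else 0 for f in info.transient_flags)
    return bits


def decode_group_info(bits: List[int], m: int) -> Tuple[GroupInfo, int]:
    """Parse group information for M blocks; returns the record and the bits consumed."""
    group_count = bits[0] + 1
    if group_count > 1:
        flags = [b == 1 for b in bits[1:1 + m]]
        return GroupInfo(group_count, flags), 1 + m
    # Assumption: with a single group no identifiers are sent and all blocks are
    # treated as non-transient.
    return GroupInfo(group_count, [False] * m), 1


info = GroupInfo(2, [False, True, True, False])
bits = encode_group_info(info)
print(bits)  # [1, 0, 1, 1, 0]
```

Writing the transient identifiers only when the group count exceeds 1 mirrors the conditional structure of the group information; the single-bit identifier is a simplification that holds only under the two-group assumption.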
  • The preset condition includes: the first group information is inconsistent with the second group information.
  • Here, "inconsistent" means that the first group information and the second group information are not exactly identical.
  • When the first group information is inconsistent with the second group information, the two may be considered to satisfy the preset condition; when they are consistent, they may be considered not to satisfy it.
  • For example, the number of groups of the M blocks in the first group information may equal that in the second group information while the M first transient identifiers in the first group information differ from the M second transient identifiers in the second group information.
  • As another example, the number of groups of the M blocks in the first group information may differ from that in the second group information.
  • The specific preset condition depends on the application scenario and is not limited here. The preset condition determines whether the first group information and the second group information are adjusted.
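  • A minimal sketch of the general "inconsistent group information" test, covering both examples above (different group counts, or equal counts with different transient identifiers). The tuple representation and the helper names `inconsistency` and `preset_condition_met` are assumptions for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of group information: (group count, M transient flags).
GroupInfo = Tuple[int, List[bool]]


def inconsistency(first: GroupInfo, second: GroupInfo) -> Optional[str]:
    """Return why the two channels' group information is inconsistent, or None
    when it is identical (preset condition not met)."""
    if first[0] != second[0]:
        return "group count differs"
    if first[1] != second[1]:
        return "transient identifiers differ"
    return None


def preset_condition_met(first: GroupInfo, second: GroupInfo) -> bool:
    """Adjust the group information only when it is inconsistent across channels."""
    return inconsistency(first, second) is not None


print(inconsistency((2, [False, True]), (2, [True, False])))  # transient identifiers differ
```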
  • The inconsistency between the first group information and the second group information may include: the M first transient identifiers indicate that the M blocks of the first channel include both transient and non-transient blocks, the M second transient identifiers indicate that the M blocks of the second channel include both transient and non-transient blocks, and the M first transient identifiers are inconsistent with the M second transient identifiers;
  • or: the M first transient identifiers indicate that the M blocks of the first channel include both transient and non-transient blocks, the M second transient identifiers indicate that the M blocks of the second channel include both transient and non-transient blocks, and the number of transient blocks of the first channel differs from the number of transient blocks of the second channel;
  • or: the M first transient identifiers indicate that the M blocks of the first channel include both transient and non-transient blocks, the M second transient identifiers indicate that the M blocks of the second channel include both transient and non-transient blocks, the M first transient identifiers are inconsistent with the M second transient identifiers, and the N-th block of the M blocks of the first channel and the N-th block of the M blocks of the second channel are both transient blocks, where 0 < N ≤ M.
  • In the first case, some of the M blocks of the first channel are transient blocks and the rest are non-transient blocks; likewise, the M blocks of the second channel include both transient and non-transient blocks.
  • The M first transient identifiers being inconsistent with the M second transient identifiers means that at least one of the M first transient identifiers differs in value from the second transient identifier with the same index.
  • For example, if block A among the M blocks of the first channel and block B among the M blocks of the second channel are both transient blocks and have the same index, the first transient identifier of block A is consistent with the second transient identifier of block B. If block C among the M blocks of the first channel is a non-transient block, block D among the M blocks of the second channel is a transient block, and block C has the same index in the first channel as block D has in the second channel, the first transient identifier of block C is inconsistent with the second transient identifier of block D.
  • If the M first transient identifiers are inconsistent with the M second transient identifiers, it may be determined that the first group information and the second group information satisfy the preset condition, and the group information needs to be adjusted.
  • If the M first transient identifiers are exactly the same as the M second transient identifiers, it may be determined that the first group information and the second group information do not satisfy the preset condition, and the group information is not adjusted.
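  • The first case can be sketched as a predicate over the two lists of transient identifiers (True = transient). The names `is_mixed` and `first_case` are hypothetical; the check follows the text: both channels contain transient and non-transient blocks, and at least one same-index pair of identifiers differs.

```python
from typing import List


def is_mixed(flags: List[bool]) -> bool:
    """True when the M blocks include both transient and non-transient blocks."""
    return any(flags) and not all(flags)


def first_case(first: List[bool], second: List[bool]) -> bool:
    """Both channels are mixed and at least one same-index pair of transient
    identifiers differs, so the group information is adjusted."""
    return (is_mixed(first) and is_mixed(second)
            and any(a != b for a, b in zip(first, second)))


# Blocks C and D share index 1: C (first channel) is non-transient, D is transient.
print(first_case([True, False, False, False], [True, True, False, False]))  # True
```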
  • In the second case, some of the M blocks of the first channel are transient blocks and the rest are non-transient blocks, so the number of transient blocks of the first channel can be counted.
  • Likewise, the M blocks of the second channel include both transient and non-transient blocks, so the number of transient blocks of the second channel can be counted.
  • If the number of transient blocks of the first channel differs from the number of transient blocks of the second channel, it may be determined that the first group information and the second group information satisfy the preset condition, and the group information needs to be adjusted.
  • If the number of transient blocks of the first channel equals the number of transient blocks of the second channel, it may be determined that the first group information and the second group information do not satisfy the preset condition, and the group information is not adjusted.
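  • The second case compares only the number of transient blocks per channel. A minimal sketch, with hypothetical helper names and True marking a transient block:

```python
from typing import List


def is_mixed(flags: List[bool]) -> bool:
    """True when the M blocks include both transient and non-transient blocks."""
    return any(flags) and not all(flags)


def second_case(first: List[bool], second: List[bool]) -> bool:
    """Both channels are mixed and their transient-block counts differ."""
    return is_mixed(first) and is_mixed(second) and sum(first) != sum(second)


print(second_case([True, False, False, False], [True, True, False, False]))  # True
```

Unlike the first case, `[True, False, False, False]` against `[False, True, False, False]` does not satisfy this condition: the identifiers differ, but each channel has exactly one transient block.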
  • In the third case, the M blocks of each channel include both transient and non-transient blocks, the M first transient identifiers are inconsistent with the M second transient identifiers, and in addition the N-th block of the M blocks of the first channel and the N-th block of the M blocks of the second channel are both transient blocks, where 0 < N ≤ M; that is, the two channels have a transient block with the same index. Neither the value of N nor the number of values N takes is limited. For example, if N takes one value, the first channel and the second channel share one transient block with the same index; if N takes two values, they share two transient blocks with the same index.
  • In this case, it may be determined that the first group information and the second group information satisfy the preset condition, and the group information is adjusted. If the M first transient identifiers are exactly the same as the M second transient identifiers, or if they are inconsistent but the first channel and the second channel have no transient block with the same index, it may be determined that the first group information and the second group information do not satisfy the preset condition, and the group information is not adjusted.
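  • The third case adds a shared-transient requirement to the first. The sketch below reports the 1-based positions N at which both channels are transient (matching the range 0 < N ≤ M used above) and applies the full condition; the helper names are hypothetical.

```python
from typing import List


def is_mixed(flags: List[bool]) -> bool:
    """True when the M blocks include both transient and non-transient blocks."""
    return any(flags) and not all(flags)


def shared_transient_positions(first: List[bool], second: List[bool]) -> List[int]:
    """1-based positions N at which both channels have a transient block."""
    return [n for n, (a, b) in enumerate(zip(first, second), start=1) if a and b]


def third_case(first: List[bool], second: List[bool]) -> bool:
    """Mixed channels, inconsistent identifiers, and at least one shared transient index."""
    return (is_mixed(first) and is_mixed(second) and first != second
            and bool(shared_transient_positions(first, second)))


print(shared_transient_positions([True, True, False, False], [True, False, True, False]))  # [1]
```

Here N takes a single value, so the channels share one transient block with the same index.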
  • In one implementation, the M blocks of the first channel and the M blocks of the second channel each have their respective indexes; the M first transient identifiers indicate that the M blocks of the first channel include both transient and non-transient blocks; the M second transient identifiers indicate that the M blocks of the second channel include both transient and non-transient blocks; and the number of transient blocks of the first channel differs from the number of transient blocks of the second channel. In this implementation:
  • when the number of transient blocks of the first channel is smaller than the number of transient blocks of the second channel, the first group information is adjusted to obtain the first adjusted group information, such that the number of transient blocks of the first channel indicated by the first adjusted group information equals the number of transient blocks of the second channel indicated by the second group information;
  • when the number of transient blocks of the first channel is greater than the number of transient blocks of the second channel, the second group information is adjusted to obtain the second adjusted group information, such that the number of transient blocks of the second channel indicated by the second adjusted group information equals the number of transient blocks of the first channel indicated by the first group information.
  • Regardless of whether the indexes of the transient blocks among the M blocks of the first channel match those among the M blocks of the second channel, the group information of the channel with fewer transient blocks is adjusted while that of the channel with more transient blocks remains unchanged, so that after adjustment the group information of the two channels indicates the same number of transient blocks.
  • When the first group information is adjusted to obtain the first adjusted group information, the adjustment may include adjusting the first transient identifiers of the M blocks. For example, the first transient identifier of the first block among the M blocks is changed from non-transient to transient, which increases the number of transient blocks of the first channel, so that the number of transient blocks of the first channel in the first adjusted group information (that is, the adjusted number of transient blocks of the first channel) equals the number of transient blocks of the second channel indicated by the second group information.
  • Likewise, when the second group information is adjusted to obtain the second adjusted group information, the adjustment may include adjusting the second transient identifiers of the M blocks. For example, the second transient identifier of the second block among the M blocks is changed from non-transient to transient, which increases the number of transient blocks of the second channel, so that the number of transient blocks of the second channel in the second adjusted group information (that is, the adjusted number of transient blocks of the second channel) equals the number of transient blocks of the first channel indicated by the first group information.
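  • The count-based adjustment can be sketched as follows. The text only requires that the channel with fewer transient blocks gain transient blocks until the counts match; which blocks to switch is not specified, so flipping, in index order, the non-transient blocks at indexes where the other channel is transient is an assumption. `equalize_counts` is a hypothetical name.

```python
from typing import List, Tuple


def equalize_counts(first: List[bool], second: List[bool]) -> Tuple[List[bool], List[bool]]:
    """Raise the transient-block count of the channel with fewer transient blocks
    until it equals the other channel's count; the other channel is unchanged."""
    a, b = list(first), list(second)
    if sum(a) == sum(b):
        return a, b
    low, high = (a, b) if sum(a) < sum(b) else (b, a)  # `low` aliases the copy to modify
    need = sum(high) - sum(low)
    for i in range(len(low)):
        if need == 0:
            break
        # Assumed choice: switch a non-transient block to transient where the
        # other channel's same-index block is transient.
        if high[i] and not low[i]:
            low[i] = True
            need -= 1
    return a, b


print(equalize_counts([True, False, False, False], [False, True, True, False]))
# ([True, True, False, False], [False, True, True, False])
```

The loop always finds enough candidates: the larger channel has at least as many transient indexes outside the smaller channel's transient set as the difference between the two counts.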
  • In another implementation, the M blocks of the first channel and the M blocks of the second channel each have their respective indexes; the M first transient identifiers indicate that the M blocks of the first channel include both transient and non-transient blocks; the M second transient identifiers indicate that the M blocks of the second channel include both transient and non-transient blocks; and the number of transient blocks of the first channel differs from the number of transient blocks of the second channel. In this implementation:
  • when the indexes of the transient blocks indicated by the M first transient identifiers are part of the indexes of the transient blocks indicated by the M second transient identifiers, at least one of the M first transient identifiers is adjusted to obtain the M first adjusted transient identifiers, such that the indexes of all transient blocks indicated by the M first adjusted transient identifiers are the same as the indexes of all transient blocks indicated by the M second transient identifiers;
  • when the indexes of the transient blocks indicated by the M second transient identifiers are part of the indexes of the transient blocks indicated by the M first transient identifiers, at least one of the M second transient identifiers is adjusted to obtain the M second adjusted transient identifiers, such that the indexes of all transient blocks indicated by the M second adjusted transient identifiers are the same as the indexes of all transient blocks indicated by the M first transient identifiers;
  • when the indexes of the transient blocks indicated by the M first transient identifiers partly coincide with the indexes of the transient blocks indicated by the M second transient identifiers, at least one of the M first transient identifiers is adjusted to obtain the M first adjusted transient identifiers and at least one of the M second transient identifiers is adjusted to obtain the M second adjusted transient identifiers, such that the indexes of all transient blocks indicated by the M first adjusted transient identifiers are the same as the indexes of all transient blocks indicated by the M second adjusted transient identifiers.
  • If the number of transient blocks of the first channel is smaller than that of the second channel and the indexes of the transient blocks indicated by the M first transient identifiers are part of the indexes of the transient blocks indicated by the M second transient identifiers, the first transient identifiers of the M blocks of the first channel are adjusted while the second transient identifiers of the M blocks of the second channel remain unchanged. This makes the numbers of transient blocks of the first channel and the second channel the same, which facilitates the subsequent encoding of the spectra of the first channel and the second channel.
  • Conversely, if the number of transient blocks of the second channel is smaller than that of the first channel and the indexes of the transient blocks indicated by the M second transient identifiers are part of the indexes of the transient blocks indicated by the M first transient identifiers, the second transient identifiers of the M blocks of the second channel are adjusted while the first transient identifiers of the M blocks of the first channel remain unchanged: at least one of the M second transient identifiers is adjusted to obtain M second adjusted transient identifiers, such that the indexes of all transient blocks indicated by the M second adjusted transient identifiers are the same as the indexes of all transient blocks indicated by the M first transient identifiers. After adjustment, the group information of the two channels indicates the same number of transient blocks, which again facilitates the subsequent encoding of the spectra of the first channel and the second channel.
  • If the numbers of transient blocks of the two channels are not equal but the indexes of the transient blocks indicated by the M first transient identifiers partly coincide with those indicated by the M second transient identifiers (that is, some, but not all, of the transient-block indexes of the first channel are the same as transient-block indexes of the second channel), the transient identifiers of both channels need to be adjusted: at least one of the M first transient identifiers is adjusted to obtain M first adjusted transient identifiers, and at least one of the M second transient identifiers is adjusted to obtain M second adjusted transient identifiers.
  • After adjustment, the indexes of all transient blocks indicated by the M first adjusted transient identifiers are the same as the indexes of all transient blocks indicated by the M second adjusted transient identifiers, so the group information of the two channels indicates the same number of transient blocks. This makes the numbers of transient blocks of the first channel and the second channel the same, which facilitates the subsequent encoding of their spectra.
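  • The index-based adjustment can be sketched by comparing the sets of transient-block indexes. The text requires only that the adjusted identifiers of both channels indicate the same transient indexes; aligning both channels on the union of their transient indexes is an assumption that reproduces the subset cases exactly and, in the partial-overlap case, only switches blocks from non-transient to transient. Disjoint index sets are outside the cases described and are left unchanged. Helper names are hypothetical.

```python
from typing import List, Set, Tuple


def transient_indexes(flags: List[bool]) -> Set[int]:
    """Indexes of the transient blocks (True = transient)."""
    return {i for i, f in enumerate(flags) if f}


def align_transient_indexes(first: List[bool],
                            second: List[bool]) -> Tuple[List[bool], List[bool], str]:
    ia, ib = transient_indexes(first), transient_indexes(second)
    if ia == ib or not (ia & ib):
        # Identical indexes, or no shared index: not adjusted in this sketch.
        return list(first), list(second), "unchanged"
    target = ia | ib  # assumption: align on the union of transient indexes
    aligned = [i in target for i in range(len(first))]
    if ia < ib:       # first channel's indexes are a proper subset of the second's
        return aligned, list(second), "first adjusted"
    if ib < ia:       # second channel's indexes are a proper subset of the first's
        return list(first), aligned, "second adjusted"
    return aligned, list(aligned), "both adjusted"  # partial overlap


print(align_transient_indexes([True, True, False, False], [False, True, True, False]))
# ([True, True, True, False], [True, True, True, False], 'both adjusted')
```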
  • Adjusting at least one of the M first transient identifiers to obtain the M first adjusted transient identifiers includes: when the first transient identifier of the first block indicates that the first block is a non-transient block and the second transient identifier of a third block among the M blocks of the second channel indicates that the third block is a transient block, adjusting the first transient identifier of the first block to a first adjusted transient identifier of the first block that indicates that the first block is a transient block, where the index of the first block is the same as the index of the third block;
  • Adjusting at least one of the M second transient identifiers to obtain the M second adjusted transient identifiers includes: when the second transient identifier of the second block indicates that the second block is a non-transient block and the first transient identifier of a fourth block among the M blocks of the first channel indicates that the fourth block is a transient block, adjusting the second transient identifier of the second block to a second adjusted transient identifier of the second block that indicates that the second block is a transient block, where the index of the second block is the same as the index of the fourth block.
  • the adjustment of the first transient identifier is taken as an example for illustration.
  • when the first transient identifier of the first block indicates that the first block is a non-transient block and the second transient identifier of the third block of the M blocks of the second channel indicates that the third block is a transient block, the first transient identifier of the first block is adjusted to the first adjusted transient identifier of the first block, which indicates that the first block is a transient block; the index of the first block is the same as the index of the third block.
  • for example, if the first transient identifier of the first block is 1, the second transient identifier of the third block is 0, and the indexes of the first block and the third block are both 4, then the first adjusted transient identifier of the first block is 0 (in this example, a value of 0 marks a transient block and a value of 1 marks a non-transient block).
  • in this way, the first channel and the second channel can be made to have the same number of transient blocks, which facilitates subsequent encoding of the spectra of the first channel and the second channel.
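The identifier adjustment described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the method's reference implementation: identifiers are modeled as booleans (`True` = transient block) instead of the 0/1 values of the example, both channels are adjusted, and the function name `align_transient_flags` is hypothetical.

```python
from typing import List, Sequence, Tuple


def align_transient_flags(
    first: Sequence[bool], second: Sequence[bool]
) -> Tuple[List[bool], List[bool]]:
    """Return the adjusted per-block transient flags of two channels.

    Hypothetical helper. True marks a transient block. A non-transient block
    in one channel is promoted to transient when the block with the same
    index in the other channel is transient.
    """
    if len(first) != len(second):
        raise ValueError("both channels must have the same number of blocks M")
    # Per-index logical OR: a block is transient if it is transient in either channel.
    merged = [a or b for a, b in zip(first, second)]
    return list(merged), list(merged)


# Mirrors the example: only the block with index 4 of the second channel is transient.
first_adj, second_adj = align_transient_flags([False] * 8, [False] * 4 + [True] + [False] * 3)
print(first_adj)
```

Because a non-transient block in either channel is promoted whenever the other channel's block at the same index is transient, the result is the per-index union of the two flag vectors, so both channels end up with identical transient blocks.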
  • the obtaining the first spectrum to be encoded according to the first adjusted grouping information and the spectra of the M blocks of the first channel includes:
  • grouping and arranging the spectra of the M blocks of the first channel according to the first adjusted grouping information, to obtain the first spectrum to be encoded;
  • the obtaining the second spectrum to be encoded according to the second adjusted grouping information and the spectra of the M blocks of the second channel includes:
  • grouping and arranging the spectra of the M blocks of the second channel according to the second adjusted grouping information, to obtain the second spectrum to be encoded.
  • after obtaining the first adjusted grouping information of the M blocks, the encoding end can use it to group and arrange the spectra of the M blocks of the current frame; grouping and arranging the spectra of the M blocks adjusts the order in which the spectra of the M blocks are arranged in the current frame.
  • the grouping and arrangement is performed according to the first adjusted grouping information of the M blocks, which is obtained from the M transient identifiers of the M blocks.
  • the grouped and arranged spectra of the M blocks therefore use the M transient identifiers of the M blocks as the basis for grouping and sorting, and grouping and sorting changes the encoding order of the spectra of the M blocks.
  • the above M blocks of the current frame may be the M blocks of the first channel of the current frame.
  • the grouping and arranging the spectra of the M blocks of the first channel according to the first adjusted grouping information to obtain the first spectrum to be encoded includes:
  • dividing the spectra of the blocks among the M blocks of the first channel whose first adjusted transient identifiers indicate transient blocks into a first transient group, and dividing the spectra of the blocks whose first adjusted transient identifiers indicate non-transient blocks into a first non-transient group; and arranging the spectra of the blocks in the first transient group before the spectra of the blocks in the first non-transient group, to obtain the first spectrum to be encoded;
  • the grouping and arranging the spectra of the M blocks of the second channel according to the second adjusted grouping information to obtain the second spectrum to be encoded includes:
  • dividing the spectra of the blocks among the M blocks of the second channel whose second adjusted transient identifiers indicate transient blocks into a second transient group, and dividing the spectra of the blocks whose second adjusted transient identifiers indicate non-transient blocks into a second non-transient group; and arranging the spectra of the blocks in the second transient group before the spectra of the blocks in the second non-transient group, to obtain the second spectrum to be encoded.
  • the encoding end groups the M blocks according to their different transient identifiers to obtain a transient group and a non-transient group, and then rearranges the positions of the M blocks in the spectrum of the current frame so that the spectra of the blocks in the transient group come before the spectra of the blocks in the non-transient group, thereby obtaining the spectrum to be encoded.
  • because the spectra of all transient blocks in the spectrum to be encoded precede the spectra of the non-transient blocks, the transient-block spectra are moved to positions of higher coding importance, so that the audio signal reconstructed after encoding and decoding with the neural network better preserves the transient characteristics.
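The grouping and arrangement step can be sketched as follows. The sketch assumes that blocks keep their original relative order within each group (the text only requires the transient group to precede the non-transient group); `group_and_arrange` is a hypothetical name, and the returned index list is just one possible way to record the new order.

```python
from typing import List, Sequence, Tuple


def group_and_arrange(
    spectra: Sequence[Sequence[float]], flags: Sequence[bool]
) -> Tuple[List[List[float]], List[int]]:
    """Place transient-block spectra before non-transient-block spectra.

    Hypothetical helper. Returns the rearranged per-block spectra and the
    original block index at each output position. Blocks keep their original
    relative order inside each group (an assumption).
    """
    if len(spectra) != len(flags):
        raise ValueError("need exactly one transient flag per block")
    transient_group = [i for i, is_transient in enumerate(flags) if is_transient]
    non_transient_group = [i for i, is_transient in enumerate(flags) if not is_transient]
    order = transient_group + non_transient_group
    arranged = [list(spectra[i]) for i in order]
    return arranged, order


# Four single-coefficient blocks; blocks 1 and 3 are transient.
arranged, order = group_and_arrange([[0.0], [1.0], [2.0], [3.0]], [False, True, False, True])
print(order)     # [1, 3, 0, 2]
print(arranged)  # [[1.0], [3.0], [0.0], [2.0]]
```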
  • the aforementioned M blocks of the current frame may be the M blocks of the first channel of the current frame.
  • the grouping and arranging the frequency spectra of the M blocks of the first channel according to the first adjustment grouping information to obtain the first frequency spectrum to be encoded includes:
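  • arranging the spectra of the blocks among the M blocks of the first channel whose first adjusted transient identifiers indicate transient blocks before the spectra of the blocks whose first adjusted transient identifiers indicate non-transient blocks, to obtain the first spectrum to be encoded;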
  • the grouping and arranging the spectrums of the M blocks of the second channel according to the second adjustment grouping information to obtain the second spectrum to be coded includes:
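  • arranging the spectra of the blocks among the M blocks of the second channel whose second adjusted transient identifiers indicate transient blocks before the spectra of the blocks whose second adjusted transient identifiers indicate non-transient blocks, to obtain the second spectrum to be encoded.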
  • in this implementation as well, the spectra of all transient blocks in the spectrum to be encoded precede the spectra of the non-transient blocks, which moves the transient-block spectra to positions of higher coding importance, so that the audio signal reconstructed after encoding and decoding with the neural network better preserves the transient characteristics.
  • the aforementioned M blocks of the current frame may be the M blocks of the first channel of the current frame.
  • before the encoding of the first spectrum to be encoded and the second spectrum to be encoded by using the encoding neural network, the method further includes:
  • performing intra-group interleaving on the first spectrum to be encoded to obtain an intra-group interleaved first spectrum, and performing intra-group interleaving on the second spectrum to be encoded to obtain an intra-group interleaved second spectrum;
  • the encoding of the first spectrum to be encoded and the second spectrum to be encoded by using the encoding neural network includes:
  • encoding, by using the encoding neural network, the intra-group interleaved first spectrum and the intra-group interleaved second spectrum.
  • the encoding end can perform intra-group interleaving according to the grouping of the M blocks of each channel, to obtain the intra-group interleaved spectra of the M blocks; these intra-group interleaved spectra can then serve as the input data of the encoding neural network.
  • the aforementioned M blocks of the current frame may be the M blocks of the first channel of the current frame.
  • the number of transient blocks indicated by the M first adjusted transient identifiers is P, and the number of non-transient blocks is Q, where P + Q = M;
  • the intra-group interleaving of the first spectrum to be encoded includes: interleaving the spectra of the P blocks, and interleaving the spectra of the Q blocks;
  • interleaving the spectra of the P blocks means interleaving the spectra of the P blocks as a whole; similarly, interleaving the spectra of the Q blocks means interleaving the spectra of the Q blocks as a whole. If the number of adjusted groups of the M blocks of the first channel is 1, the spectra of all M blocks of the first channel are interleaved within that single group to obtain the intra-group interleaved spectra of the M blocks of the first channel.
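The text does not define the interleaving pattern itself. The sketch below assumes a coefficient-by-coefficient interleave of the blocks in each group (coefficient 0 of every block, then coefficient 1, and so on), applied separately to the P transient blocks and the Q non-transient blocks of an already grouped and arranged spectrum; the function names are hypothetical.

```python
from typing import List, Sequence


def interleave_group(blocks: Sequence[Sequence[float]]) -> List[float]:
    """Interleave one group of block spectra coefficient by coefficient.

    Hypothetical helper. Output order: coefficient 0 of every block, then
    coefficient 1 of every block, and so on. An empty group yields [].
    """
    if not blocks:
        return []
    n = len(blocks[0])
    if any(len(block) != n for block in blocks):
        raise ValueError("all blocks in a group must have the same length")
    return [blocks[b][k] for k in range(n) for b in range(len(blocks))]


def interleave_arranged(arranged: Sequence[Sequence[float]], p: int) -> List[float]:
    """Interleave the P transient blocks and the Q = M - P non-transient blocks separately.

    `arranged` is assumed to hold the transient-group spectra first. When all
    blocks fall into one group (p == 0 or p == M), this reduces to
    interleaving all M blocks together.
    """
    if not 0 <= p <= len(arranged):
        raise ValueError("p must lie in [0, M]")
    return interleave_group(arranged[:p]) + interleave_group(arranged[p:])


print(interleave_arranged([[1, 2], [3, 4], [5, 6]], p=1))  # [1, 2, 3, 5, 4, 6]
```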
  • before the M first transient identifiers of the M blocks of the first channel are obtained according to the spectra of the M blocks of the first channel of the current frame of the multi-channel signal to be encoded, the method further includes:
  • obtaining a first window type of the first channel of the current frame and a second window type of the second channel of the current frame, where the first window type is a short window type or a non-short window type, and the second window type is a short window type or a non-short window type;
  • performing, according to the first window type and the second window type, the step of obtaining the M first transient identifiers of the M blocks of the first channel according to the spectra of the M blocks of the first channel of the current frame of the multi-channel signal to be encoded.
  • the encoding end may first determine the window type of the current frame, which may be a short window type or a non-short window type; for example, the encoding end determines the window type from the current frame of the multi-channel signal to be encoded.
  • the short window may also be called a short frame
  • the non-short window may also be called a non-short frame.
  • the method further includes: encoding the first window type and the second window type to obtain a window type encoding result, and writing the window type encoding result into the code stream.
  • after obtaining the first window type of the first channel and the second window type of the second channel of the current frame, the encoding end can carry the window types in the code stream by first encoding them; the encoding method used for the window types is not limited.
  • encoding the window types yields a window type encoding result, which is written into the code stream so that the code stream carries it.
  • the decoding end can obtain the window type encoding result from the code stream and parse it to obtain the first window type of the first channel and the second window type of the second channel of the current frame, and then determine, according to these two window types, whether to continue decoding the code stream to obtain the first decoded grouping information of the M blocks of the first channel.
  • the obtaining the M first transient identifiers of the M blocks of the first channel according to the spectra of the M blocks of the first channel of the current frame of the multi-channel signal to be encoded includes:
  • obtaining M first spectral energies of the M blocks according to the spectra of the M blocks of the first channel; obtaining a first spectral energy average according to the M first spectral energies; and obtaining the M first transient identifiers according to the M first spectral energies and the first spectral energy average.
  • after obtaining the M spectral energies, the encoding end can average them to obtain the spectral energy average, or it can first remove the maximum value (or several of the largest values) from the M spectral energies and then average the remaining values to obtain the spectral energy average.
  • the transient identifier of a block represents the transient characteristics of that block.
  • the aforementioned M blocks of the current frame may be the M blocks of the first channel of the current frame.
  • the transient identifier of each block can be determined through the spectral energy and the average value of the spectral energy of each block, so that the transient identifier of a block can determine the grouping information of the block.
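  • when the spectral energy of the first block is greater than K times the first spectral energy average, the first transient identifier of the first block indicates that the first block is a transient block; or
  • when the spectral energy of the first block is less than or equal to K times the first spectral energy average, the first transient identifier of the first block indicates that the first block is a non-transient block;
  • K is a real number greater than or equal to 1, and its value is not limited here.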
  • if the spectral energy of the first block is greater than K times the spectral energy average, the spectrum of the first block changes considerably compared with the other blocks of the M blocks, so the transient identifier of the first block indicates that the first block is a transient block.
  • if the spectral energy of the first block is less than or equal to K times the spectral energy average, the spectrum of the first block changes little compared with the other blocks of the M blocks, so the transient identifier of the first block indicates that the first block is a non-transient block.
  • the aforementioned M blocks of the current frame may be the M blocks of the first channel of the current frame. The method is not limited to this: the encoding end can also obtain the M transient identifiers of the M blocks in other ways, for example by computing the difference or ratio between the spectral energy of the first block and the spectral energy average and determining the M transient identifiers from that difference or ratio.
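The energy-based detection rule above can be sketched as follows. This is an illustrative sketch under assumptions not stated in the text: block spectral energy is taken as the sum of squared coefficients, `k` must be supplied by the caller because the text leaves K open, and the optional `drop_largest` parameter models the variant that removes the largest energies before averaging. The function name is hypothetical.

```python
from typing import List, Sequence


def transient_flags(
    spectra: Sequence[Sequence[float]], k: float, drop_largest: int = 0
) -> List[bool]:
    """Flag each block as transient (True) or non-transient (False).

    Hypothetical helper. Block energy is the sum of squared spectral
    coefficients (an assumption). The average optionally excludes the
    `drop_largest` largest energies. A block is transient when its energy is
    greater than k times the average, otherwise non-transient.
    """
    if k < 1:
        raise ValueError("K must be a real number >= 1")
    energies = [sum(x * x for x in block) for block in spectra]
    if not 0 <= drop_largest < len(energies):
        raise ValueError("need at least one energy left to average")
    kept = sorted(energies)[: len(energies) - drop_largest]
    average = sum(kept) / len(kept)
    return [energy > k * average for energy in energies]


# Energies 1, 1, 1, 100: only the last block exceeds 2 x the average of 25.75.
print(transient_flags([[1.0], [1.0], [1.0], [10.0]], k=2.0))  # [False, False, False, True]
```

Dropping the largest energies keeps a single very loud block from inflating the average and masking other transient blocks.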
  • the embodiment of the present application also provides a method for decoding a multi-channel signal, including:
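  • obtaining, from a code stream, first decoded grouping information of M blocks of a first channel of a current frame of a multi-channel signal, where the first decoded grouping information indicates M first decoded transient identifiers of the M blocks of the first channel;
  • obtaining, from the code stream, second decoded grouping information of M blocks of a second channel of the current frame, where the second decoded grouping information indicates M second decoded transient identifiers of the M blocks of the second channel;
  • decoding the code stream by using a decoding neural network to obtain the decoded spectra of the M blocks of the first channel and the decoded spectra of the M blocks of the second channel;
  • obtaining a first reconstructed signal of the first channel according to the first decoded grouping information and the decoded spectra of the M blocks of the first channel; and
  • obtaining a second reconstructed signal of the second channel according to the second decoded grouping information and the decoded spectra of the M blocks of the second channel.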
  • the first decoded grouping information of the M blocks of the first channel of the current frame of the multi-channel signal is obtained from the code stream and indicates the first decoded transient identifiers of the M blocks of the first channel; the second decoded grouping information of the M blocks of the second channel is obtained from the code stream in the same way. The code stream is decoded by the decoding neural network to obtain the decoded spectra of the M blocks of the first channel and of the M blocks of the second channel. The first reconstructed signal of the first channel is obtained from the first decoded grouping information and the decoded spectra of the M blocks of the first channel; similarly, the second reconstructed signal of the second channel is obtained from the second decoded grouping information and the decoded spectra of the M blocks of the second channel.
  • the decoded spectra of the M blocks of the first channel and of the M blocks of the second channel obtained by decoding the code stream correspond, respectively, to the grouped and arranged spectra of the M blocks of the first channel and of the M blocks of the second channel at the encoding end, from which the first reconstructed signal of the first channel and the second reconstructed signal of the second channel are obtained.
  • decoding and reconstruction can be performed according to blocks with different transient identifiers in the multi-channel signal, so the reconstruction effect of the multi-channel signal can be improved.
  • the obtaining the first reconstructed signal of the first channel according to the first decoded group information and the decoded spectrum of the M blocks of the first channel includes:
  • when the first decoded grouping information indicates that the number of first decoded groups of the M blocks of the first channel is greater than 1, performing inverse grouping arrangement on the decoded spectra of the M blocks of the first channel to obtain the inverse-arranged spectra of the M blocks of the first channel; and obtaining the first reconstructed signal of the first channel according to the inverse-arranged spectra of the M blocks of the first channel;
  • the obtaining the second reconstructed signal of the second channel according to the second decoded group information and the decoded spectrum of the M blocks of the second channel includes:
  • when the second decoded grouping information indicates that the number of second decoded groups of the M blocks of the second channel is greater than 1, performing inverse grouping arrangement on the decoded spectra of the M blocks of the second channel to obtain the inverse-arranged spectra of the M blocks of the second channel; and obtaining the second reconstructed signal of the second channel according to the inverse-arranged spectra of the M blocks of the second channel.
  • the decoding end obtains the first decoded grouping information of the M blocks and the decoded spectra of the M blocks of the first channel from the code stream. Because the encoding end grouped and arranged the spectra of the M blocks of the first channel, the decoding end needs to perform the inverse of that process.
  • the decoded spectra of the M blocks of the first channel are subjected to inverse grouping arrangement to obtain the inverse-arranged spectra of the M blocks of the first channel.
  • after obtaining the inverse-arranged spectra of the M blocks of the first channel, the decoding end can transform them from the frequency domain to the time domain to obtain the first reconstructed signal of the first channel.
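  • the obtaining the first reconstructed signal of the first channel according to the first decoded grouping information and the decoded spectra of the M blocks of the first channel includes:
  • performing intra-group deinterleaving on the decoded spectra of the M blocks of the first channel, and obtaining the first reconstructed signal according to the intra-group deinterleaved spectra of the M blocks of the first channel;
  • the obtaining the second reconstructed signal of the second channel according to the second decoded grouping information and the decoded spectra of the M blocks of the second channel includes:
  • performing intra-group deinterleaving on the decoded spectra of the M blocks of the second channel, and obtaining the second reconstructed signal according to the intra-group deinterleaved spectra of the M blocks of the second channel.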
  • the intra-group deinterleaving performed by the decoder is the inverse process of the intra-group interleave performed by the encoder, which will not be described in detail here.
  • the number of transient blocks indicated by the M first decoded transient identifiers is P, and the number of non-transient blocks is Q, where P + Q = M;
  • the obtaining the first reconstructed signal of the first channel according to the first decoded grouping information and the decoded spectra of the M blocks of the first channel includes:
  • performing intra-group deinterleaving on the decoded spectra of the P blocks and on the decoded spectra of the Q blocks, to obtain the intra-group deinterleaved spectra of the M blocks of the first channel;
  • performing, according to the first decoded grouping information, inverse grouping arrangement on the intra-group deinterleaved spectra of the M blocks of the first channel, to obtain the inverse-arranged spectra of the M blocks of the first channel;
  • obtaining the first reconstructed signal of the first channel according to the inverse-arranged spectra of the M blocks of the first channel.
  • deinterleaving the spectra of the P blocks means deinterleaving the spectra of the P blocks as a whole; similarly, deinterleaving the spectra of the Q blocks means deinterleaving the spectra of the Q blocks as a whole.
  • the encoding end can perform interleaving processing according to the transient group and the non-transient group respectively, so as to obtain the interleaved frequency spectrum of P blocks and the interleaved frequency spectrum of Q blocks.
  • the interleaved frequency spectrum of the P blocks and the interleaved frequency spectrum of the Q blocks can be used as input data of the encoding neural network.
  • intra-group interleaving also reduces coding side information and improves coding efficiency. Because the encoding end performs intra-group interleaving, the decoding end must perform the corresponding inverse process, that is, deinterleaving. If the number of adjusted groups of the M blocks of the first channel is 1, intra-group deinterleaving is performed on the decoded spectra of all M blocks of the first channel to obtain their intra-group deinterleaved spectra.
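A decoder-side deinterleaving sketch follows. The text does not fix the interleaving pattern, so this assumes the encoder interleaved each group coefficient by coefficient (coefficient k of block b stored at position k * num_blocks + b within its group). It splits the decoded spectrum into the P transient blocks and the Q non-transient blocks and deinterleaves each group separately; the function names are hypothetical.

```python
from typing import List, Sequence


def deinterleave_group(data: Sequence[float], num_blocks: int, n: int) -> List[List[float]]:
    """Invert a coefficient-by-coefficient interleave of `num_blocks` blocks of length n.

    Hypothetical helper; the interleave pattern is an assumption.
    """
    if len(data) != num_blocks * n:
        raise ValueError("data length must equal num_blocks * n")
    # Coefficient k of block b was stored at position k * num_blocks + b.
    return [[data[k * num_blocks + b] for k in range(n)] for b in range(num_blocks)]


def deinterleave_arranged(data: Sequence[float], p: int, m: int, n: int) -> List[List[float]]:
    """Deinterleave the P transient blocks and the Q = M - P non-transient blocks separately."""
    if not 0 <= p <= m:
        raise ValueError("p must lie in [0, M]")
    split = p * n  # the transient group occupies the first P * n coefficients
    return deinterleave_group(data[:split], p, n) + deinterleave_group(data[split:], m - p, n)


print(deinterleave_arranged([1, 2, 3, 5, 4, 6], p=1, m=3, n=2))  # [[1, 2], [3, 4], [5, 6]]
```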
  • the performing inverse grouping arrangement on the intra-group deinterleaved spectra of the M blocks of the first channel according to the first decoded grouping information includes: obtaining the indexes of the P blocks and of the Q blocks according to the first decoded grouping information, and restoring the spectra of the P blocks and the Q blocks to their original positions according to these indexes.
  • before grouping and arrangement at the encoding end, the indexes of the M blocks are continuous, for example, from 0 to M-1; after the encoding end performs grouping and arrangement, the indexes of the M blocks are no longer continuous.
  • the decoding end can obtain, according to the first decoded grouping information of the M blocks, the indexes of the P blocks and of the Q blocks among the grouped and arranged M blocks; inverse grouping arrangement then restores the M blocks to continuous index order.
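Inverse grouping arrangement can be sketched as follows, assuming the encoder placed the transient blocks first and kept ascending index order within each group. The function name is hypothetical, and the decoded transient identifiers are modeled as booleans (`True` = transient block).

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def inverse_group_arrange(arranged: Sequence[T], flags: Sequence[bool]) -> List[T]:
    """Put grouped-and-arranged block spectra back into index order 0..M-1.

    Hypothetical helper. `flags` are the decoded transient identifiers.
    """
    if len(arranged) != len(flags):
        raise ValueError("need exactly one transient flag per block")
    transient_idx = [i for i, f in enumerate(flags) if f]          # indexes of the P blocks
    non_transient_idx = [i for i, f in enumerate(flags) if not f]  # indexes of the Q blocks
    order = transient_idx + non_transient_idx  # original index of each arranged position
    # Sorting by original index restores continuous index order.
    pairs = sorted(zip(order, arranged), key=lambda pair: pair[0])
    return [spectrum for _, spectrum in pairs]


print(inverse_group_arrange([[1.0], [3.0], [0.0], [2.0]], [False, True, False, True]))
# [[0.0], [1.0], [2.0], [3.0]]
```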
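  • the method further includes:
  • obtaining the first window type of the first channel and the second window type of the second channel of the current frame from the code stream, where each window type is a short window type or a non-short window type;
  • performing, according to the first window type and the second window type, the step of obtaining the first decoded grouping information of the M blocks of the first channel of the current frame of the multi-channel signal from the code stream.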
  • the decoding end performs the reverse process of the encoding end, so the decoding end can also first determine the first window type and the second window type of the current frame.
  • the window type can be a short window type or a non-short window type.
  • the window types of the current frame are obtained from the code stream; since the current frame includes the first channel and the second channel, the first window type of the first channel and the second window type of the second channel can be obtained.
  • the first decoded grouping information includes the first decoded group count of the M blocks of the first channel, or a first decoded group count identifier that indicates the first decoded group count; when the first decoded group count is greater than 1, the first decoded grouping information further includes the M first decoded transient identifiers. Alternatively, the first decoded grouping information includes the M first decoded transient identifiers;
  • the second decoded grouping information includes the second decoded group count of the M blocks of the second channel, or a second decoded group count identifier that indicates the second decoded group count; when the second decoded group count is greater than 1, the second decoded grouping information further includes the M second decoded transient identifiers. Alternatively, the second decoded grouping information includes the M second decoded transient identifiers.
  • the encoding end carries a grouping information encoding result in the code stream, and the grouping information encoding result includes the first adjusted grouping information and the second adjusted grouping information;
  • the decoding end can obtain the first decoded grouping information and the second decoded grouping information by decoding the code stream; the first decoded grouping information corresponds to the first adjusted grouping information at the encoding end, and the second decoded grouping information corresponds to the second adjusted grouping information at the encoding end.
  • the first decoded grouping information includes the first decoded group count or the first decoded group count identifier of the M blocks of the first channel; the first decoded group count indicates the number of groups or the number of adjusted groups of the first channel, and the first decoded group count identifier likewise indicates the number of groups or the number of adjusted groups of the first channel.
  • the M first decoded transient identifiers indicate the transient identifiers or adjusted transient identifiers respectively corresponding to the M blocks of the first channel.
  • descriptions of the second decoded group information are similar to those of the first decoded group information.
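The content of the decoded grouping information can be modeled with a small data structure. This sketch assumes two groups exist exactly when the M identifiers contain both values, and it does not model the group count identifier or any bitstream syntax; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GroupInfo:
    """Hypothetical container for one channel's grouping information."""
    num_groups: int
    flags: Optional[List[bool]]  # transient identifiers, carried only if num_groups > 1


def build_group_info(flags: List[bool]) -> GroupInfo:
    if not flags:
        raise ValueError("need at least one block")
    # One group when every block has the same flag, otherwise a transient and a non-transient group.
    num_groups = len(set(flags))
    return GroupInfo(num_groups, list(flags) if num_groups > 1 else None)


def flags_for_inverse_arrangement(info: GroupInfo, m: int) -> List[bool]:
    if info.num_groups > 1:
        if info.flags is None or len(info.flags) != m:
            raise ValueError("M transient identifiers are required when there are several groups")
        return list(info.flags)
    # With a single group the arrangement is the identity, so any uniform flag vector works.
    return [False] * m


print(build_group_info([False, True, False, False]))
# GroupInfo(num_groups=2, flags=[False, True, False, False])
```

Omitting the M identifiers when there is only one group mirrors the option described above in which the identifiers are carried only when the group count is greater than 1.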
  • the embodiment of the present application also provides a multi-channel signal encoding device, including:
  • the transient identification obtaining module is used to obtain M first transient identifications of the M blocks of the first channel according to the frequency spectrum of the M blocks of the first channel of the current frame of the multi-channel signal to be encoded;
  • the M blocks of the first channel include the first block of the first channel, and the first transient identifier of the first block indicates that the first block is a transient block or indicates that the first block is a non-transient block;
  • a grouping information obtaining module configured to obtain first grouping information of the M blocks of the first channel according to the M first transient identifiers;
  • the transient identifier obtaining module is configured to obtain M second transient identifiers of the M blocks of the second channel according to the spectrum of the M blocks of the second channel of the current frame; the second The M blocks of the channel include the second block of the second channel, and the second transient identifier of the second block is used to indicate that the second block is a transient block, or indicate that the second block is non-transient blocks;
  • the grouping information obtaining module is configured to obtain the second grouping information of the M blocks of the second sound channel according to the M second transient identifiers;
  • a grouping information adjustment module configured to obtain, when the first grouping information and the second grouping information meet a preset condition, first adjusted grouping information and second adjusted grouping information according to the first grouping information and the second grouping information, where the first adjusted grouping information corresponds to the first grouping information and the second adjusted grouping information corresponds to the second grouping information; and where the first adjusted grouping information is the same as the first grouping information and the second adjusted grouping information is obtained by adjusting the second grouping information; or the first adjusted grouping information is obtained by adjusting the first grouping information and the second adjusted grouping information is the same as the second grouping information; or the first adjusted grouping information is obtained by adjusting the first grouping information and the second adjusted grouping information is obtained by adjusting the second grouping information;
  • a spectrum obtaining module configured to obtain a first spectrum to be encoded according to the first adjustment group information and the spectrum of the M blocks of the first channel;
  • the spectrum obtaining module is configured to obtain a second spectrum to be encoded according to the second adjustment group information and the spectrum of the M blocks of the second channel;
  • An encoding module configured to use an encoding neural network to encode the first spectrum to be encoded and the second spectrum to be encoded to obtain a spectrum encoding result; and write the spectrum encoding result into a code stream.
  • the components of the multi-channel signal encoding device can also perform the steps described in the aforementioned first aspect and various possible implementations.
  • the embodiment of the present application also provides a multi-channel signal decoding device, including:
  • a grouping information obtaining module configured to obtain, from a code stream, first decoded grouping information of M blocks of a first channel of a current frame of a multi-channel signal, where the first decoded grouping information indicates the first decoded transient identifiers of the M blocks of the first channel;
  • the grouping information obtaining module is configured to obtain, from the code stream, second decoded grouping information of M blocks of a second channel of the current frame, where the second decoded grouping information indicates the second decoded transient identifiers of the M blocks of the second channel;
  • a decoding module configured to use a decoding neural network to decode the code stream to obtain the decoded spectrum of the M blocks of the first channel and the decoded spectrum of the M blocks of the second channel;
  • a reconstructed signal obtaining module configured to obtain a first reconstructed signal of the first channel according to the first decoded group information and the decoded spectrum of the M blocks of the first channel;
  • the reconstructed signal obtaining module is configured to obtain a second reconstructed signal of the second channel according to the second decoded group information and the decoded spectrum of the M blocks of the second channel.
  • the constituent modules of the multi-channel signal decoding device can also perform the steps described in the aforementioned second aspect and various possible implementations.
  • an embodiment of the present application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the method described in the first aspect or the second aspect.
  • an embodiment of the present application provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method described in the first aspect or the second aspect.
  • the embodiment of the present application provides a computer-readable storage medium, including the code stream generated by the method described in the foregoing first aspect.
  • the embodiment of the present application provides a communication device, which may include entities such as terminal equipment or chips, and the communication device includes: a processor and a memory; the memory is used to store instructions; the processor is used to Executing the instructions in the memory causes the communication device to execute the method as described in any one of the aforementioned first aspect or second aspect.
  • the present application provides a chip system, which includes a processor configured to support a multi-channel signal encoding device or a multi-channel signal decoding device in implementing the functions involved in the foregoing aspects, for example, sending or processing the data and/or information involved in the foregoing methods.
  • the chip system further includes a memory, and the memory is used for storing necessary program instructions and data of the multi-channel signal encoding device or the multi-channel signal decoding device.
  • the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
  • the current frame of the multi-channel signal to be encoded includes a first channel and a second channel, and each channel includes the spectra of M blocks. M first transient identifiers of the M blocks of the first channel are obtained from the spectra of those M blocks, and first grouping information of the M blocks of the first channel is obtained according to the M first transient identifiers.
  • second grouping information of the M blocks of the second channel is obtained in the same way. When the first grouping information and the second grouping information meet a preset condition, first adjusted grouping information and second adjusted grouping information are obtained according to the first grouping information and the second grouping information. Next, a first spectrum to be encoded is obtained according to the first adjusted grouping information and the spectra of the M blocks of the first channel, and a second spectrum to be encoded is obtained similarly. Finally, the encoding neural network is used to encode the first spectrum to be encoded
  • in this way, the grouping information of the M blocks of each channel is obtained according to the M transient identifiers of that channel in the current frame. When the grouping information of the M blocks of the channels satisfies the preset condition, adjusted grouping information of the M blocks of each channel is obtained.
  • the spectrum to be encoded is then obtained according to the adjusted grouping information and the spectra of the M blocks of each channel. Blocks with different transient identifiers can therefore be grouped, arranged, and encoded, which improves the encoding quality of the multi-channel signal.
  • first decoded grouping information of the M blocks of the first channel of the current frame of the multi-channel signal is obtained from the code stream, and it indicates the first decoded transient identifiers of the M blocks of the first channel. Second decoded grouping information of the M blocks of the second channel is obtained from the code stream in the same way. The code stream is decoded by the decoding neural network to obtain the decoded spectra of the M blocks of the first channel and the decoded spectra of the M blocks of the second channel. A first reconstructed signal of the first channel is obtained from the first decoded grouping information and the decoded spectra of the M blocks of the first channel; similarly, a second reconstructed signal of the second channel is obtained from the second decoded grouping information and the decoded spectra of the M blocks of the second channel.
  • the first decoded spectra of the M blocks of the first channel and the second decoded spectra of the M blocks of the second channel, obtained by decoding the code stream, correspond respectively to the spectra of the M blocks of the first channel and of the M blocks of the second channel after grouping and arrangement at the encoding end. The first reconstructed signal of the first channel and the second reconstructed signal of the second channel can therefore be obtained from these decoded spectra together with the decoded grouping information.
  • decoding and reconstruction can be performed according to blocks with different transient identifiers in the multi-channel signal, so the reconstruction effect of the multi-channel signal can be improved.
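  • to reconstruct the signal, the decoding end has to undo the encoder-side grouping and arrangement of the block spectra using the decoded grouping information. The passage above does not specify the arrangement order, so the following is only a minimal sketch. It assumes (hypothetically) that the encoder placed all transient blocks (flag 1) before all non-transient blocks (flag 0), kept the original order inside each group, and did not reorder a frame signalled as a single group. The function name `restore_block_order` is illustrative.

```python
from typing import List, Optional


def restore_block_order(
    arranged: List[List[float]], flags: Optional[List[int]]
) -> List[List[float]]:
    """Undo a hypothetical encoder-side arrangement of block spectra.

    Assumption: the encoder placed all transient blocks (flag 1) first and all
    non-transient blocks (flag 0) after them, each group in its original order.
    `flags` is None when the decoded grouping information signals one group,
    in which case the blocks were never reordered.
    """
    if flags is None:
        return [list(s) for s in arranged]
    if len(flags) != len(arranged):
        raise ValueError("need one transient flag per decoded block")
    num_transient = sum(flags)
    transient_blocks = iter(arranged[:num_transient])
    steady_blocks = iter(arranged[num_transient:])
    # Walk the flags in original block order and take the next block from the matching group.
    return [list(next(transient_blocks) if f == 1 else next(steady_blocks)) for f in flags]
```

For example, decoded spectra `[[1.0], [3.0], [0.0], [2.0]]` with flags `[0, 1, 0, 1]` are restored to `[[0.0], [1.0], [2.0], [3.0]]`.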
  • FIG. 1 is a schematic diagram of the composition and structure of an audio processing system provided by an embodiment of the present application;
  • FIG. 2a is a schematic diagram of an audio encoder and an audio decoder provided in an embodiment of the present application applied to a terminal device;
  • FIG. 2b is a schematic diagram of an audio encoder provided by an embodiment of the present application applied to a wireless device or a core network device;
  • FIG. 2c is a schematic diagram of an audio decoder provided by an embodiment of the present application applied to a wireless device or a core network device;
  • FIG. 3a is a schematic diagram of a multi-channel encoder and a multi-channel decoder provided in an embodiment of the present application applied to a terminal device;
  • FIG. 3b is a schematic diagram of a multi-channel encoder provided by an embodiment of the present application applied to a wireless device or a core network device;
  • FIG. 3c is a schematic diagram of a multi-channel decoder provided in an embodiment of the present application applied to a wireless device or a core network device;
  • FIG. 4 is a schematic diagram of a method for encoding a multi-channel signal provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a decoding method for a multi-channel signal provided in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an audio signal encoding and decoding system provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a method for encoding a multi-channel signal provided in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a multi-channel signal decoding method provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a method for encoding a multi-channel signal provided in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a decoding method for a multi-channel signal provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a method for encoding a multi-channel signal provided in an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a decoding method for a multi-channel signal provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a method for encoding a multi-channel signal provided in an embodiment of the present application.
  • FIG. 14 is a schematic diagram of a decoding method for a multi-channel signal provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of a multi-channel signal encoding device provided by an embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of a multi-channel signal decoding device provided by an embodiment of the present application.
  • FIG. 17 is a schematic structural diagram of another multi-channel signal encoding device provided by an embodiment of the present application.
  • FIG. 18 is a schematic structural diagram of another multi-channel signal decoding device provided by an embodiment of the present application.
  • Sound is a continuous wave produced by the vibration of an object. Objects that vibrate to emit sound waves are called sound sources. When sound waves propagate through a medium (such as air, solid or liquid), the auditory organs of humans or animals can perceive sound.
  • Characteristics of sound waves include pitch, intensity, and timbre.
  • Pitch indicates how high or low a sound is.
  • Sound intensity indicates how loud a sound is.
  • Sound intensity can also be called loudness or volume.
  • the unit of sound intensity is the decibel (dB). Timbre is also called tone quality.
  • the frequency of sound waves determines the pitch of the sound. The higher the frequency, the higher the pitch.
  • the number of times an object vibrates within one second is called frequency, and the unit of frequency is hertz (Hz).
  • the frequency of sound that can be recognized by the human ear is between 20Hz and 20000Hz.
  • the amplitude of the sound wave determines the intensity of the sound. The greater the amplitude, the greater the sound intensity. The closer the distance to the sound source, the greater the sound intensity.
  • the waveform of the sound wave determines the timbre.
  • the waveforms of sound waves include square waves, sawtooth waves, sine waves, and pulse waves.
  • sounds can be divided into regular sounds and irregular sounds.
  • Irregular sound refers to sound produced by a sound source vibrating randomly. Examples of irregular sounds are noises that disturb people's work, study, and rest.
  • a regular sound refers to a sound produced by a sound source vibrating regularly. Regular sounds include speech and musical tones.
  • regular sound is an analog signal that changes continuously in the time-frequency domain. The analog signals may be referred to as audio signals (acoustic signals).
  • An audio signal is an information carrier that carries speech, music and sound effects.
  • because human hearing can distinguish the location and distribution of sound sources in space, a listener who hears a sound in space perceives not only its pitch, intensity, and timbre but also its direction.
  • Sound can also be divided into monophonic and stereophonic.
  • Mono has one sound channel: one microphone picks up the sound and one speaker plays it back.
  • Stereo has multiple sound channels, and different sound channels transmit different sound waveforms.
  • when the audio signal is a transient signal, current encoders do not extract the transient feature and transmit it in the code stream.
  • the transient feature represents the change between the spectra of adjacent blocks in a transient frame of the audio signal. Because the feature is not transmitted, the decoding end cannot obtain the transient characteristics of the reconstructed audio signal from the code stream when it reconstructs the signal, and the audio signal is reconstructed poorly.
  • the embodiments of the present application provide an audio processing technology, in particular an audio coding technology for multi-channel signals, to improve conventional audio coding systems.
  • a multi-channel signal is an audio signal that includes multiple channels; for example, the multi-channel signal may be a stereo signal.
  • Audio processing includes two parts: audio encoding and audio decoding. Audio encoding is performed at the source side and involves encoding (e.g., compressing) the raw audio to reduce the amount of data required to represent it, for more efficient storage and/or transmission. Audio decoding is performed at the destination side and includes processing that is the inverse of the encoder's, in order to reconstruct the original audio. The encoding part and the decoding part are also collectively referred to as coding.
  • the technical solutions of the embodiments of the present application can be applied to various audio processing systems. FIG. 1 is a schematic diagram of the composition and structure of an audio processing system provided by an embodiment of the present application.
  • the audio processing system 100 may include: an encoding device 101 for a multi-channel signal and a decoding device 102 for a multi-channel signal.
  • the multi-channel signal encoding device 101 can also be referred to as an audio encoding device and can be used to generate a code stream. The audio-coded code stream can then be transmitted to the multi-channel signal decoding device 102 through an audio transmission channel.
  • the multi-channel signal decoding device 102 can also be called an audio decoding device. It receives the code stream, performs its audio decoding function, and finally obtains the reconstructed signal.
  • the multi-channel signal encoding device can be applied to various terminal devices that require audio communication, wireless devices that require transcoding, and core network equipment.
  • for example, the multi-channel signal encoding device can be an audio encoder of the above-mentioned terminal device, wireless device, or core network device.
  • the multi-channel signal decoding device can be applied to various terminal devices that require audio communication, wireless devices that require transcoding, and core network devices.
  • for example, the multi-channel signal decoding device can be an audio decoder of the above-mentioned terminal device, wireless device, or core network device.
  • for example, the audio encoder may be located in a radio access network, a media gateway of the core network, a transcoding device, a media resource server, a mobile terminal, a fixed-network terminal, and the like. The audio encoder may also be an audio encoder in a virtual reality (VR) streaming service.
  • the end-to-end encoding and decoding process for an audio signal includes the following: after audio signal A passes through the acquisition module (acquisition), a preprocessing operation (audio preprocessing) is performed.
  • the preprocessing operation includes filtering out the low-frequency part of the signal.
  • the rendered signal is mapped to the listener's headphones (headphones), which may be independent headphones or headphones on a glasses device.
  • FIG. 2a is a schematic diagram of an audio encoder and an audio decoder provided in an embodiment of the present application applied to a terminal device.
  • Each terminal device may include: an audio encoder, a channel encoder, an audio decoder, and a channel decoder.
  • the channel encoder is used for channel coding the audio signal
  • the channel decoder is used for channel decoding the audio signal.
  • the first terminal device 20 may include: a first audio encoder 201 , a first channel encoder 202 , a first audio decoder 203 , and a first channel decoder 204 .
  • the second terminal device 21 may include: a second audio decoder 211 , a second channel decoder 212 , a second audio encoder 213 , and a second channel encoder 214 .
  • the first terminal device 20 is connected to a wireless or wired first network communication device 22, the first network communication device 22 is connected to a wireless or wired second network communication device 23 through a digital channel, and the second terminal device 21 is connected to a wireless or wired network communication device.
  • the second network communication device 23 may generally refer to signal transmission equipment, such as communication base stations, data exchange equipment, and the like.
  • the terminal device as the sending end first collects audio, performs audio coding on the collected audio signal, and then performs channel coding, and then transmits in a digital channel through a wireless network or a core network.
  • the terminal device at the receiving end performs channel decoding on the received signal to obtain the code stream, recovers the audio signal through audio decoding, and then plays back the audio.
  • as shown in FIG. 2b, the wireless device or core network device 25 includes: a channel decoder 251, other audio decoders 252, the audio encoder 253 provided in the embodiment of the present application, and a channel encoder 254, where the other audio decoders 252 are audio decoders other than the audio codec provided in the embodiments of the present application.
  • the channel decoder 251 first performs channel decoding on the signal entering the device, the other audio decoders 252 then perform audio decoding, and the audio encoder 253 provided by the embodiment of the present application then performs audio encoding.
  • finally, the channel encoder 254 performs channel coding on the audio signal, and the signal is transmitted once channel coding is complete.
  • the other audio decoder 252 performs audio decoding on the code stream decoded by the channel decoder 251 .
  • FIG. 2c is a schematic diagram of an audio decoder provided by an embodiment of the present application applied to a wireless device or a core network device.
  • the wireless device or core network device 25 includes: a channel decoder 251, the audio decoder 255 provided in the embodiment of the present application, other audio encoders 256, and a channel encoder 254, where the other audio encoders 256 are audio encoders other than the audio codec provided in the embodiments of the present application.
  • the channel decoder 251 first performs channel decoding on the signal entering the device, the audio decoder 255 then decodes the received audio code stream, the other audio encoders 256 then perform audio encoding, and finally the channel encoder 254 performs channel coding on the audio signal, which is transmitted once channel coding is complete.
  • the wireless device refers to equipment related to radio frequency in communication
  • the core network device refers to equipment related to core network in communication.
  • the multi-channel signal encoding device can be applied to various terminal devices that require audio communication and to wireless devices and core network devices that require transcoding; for example, the multi-channel signal encoding device may be a multi-channel encoder of the above-mentioned terminal device, wireless device, or core network device.
  • the multi-channel signal decoding device can be applied to various terminal devices that require audio communication, wireless devices that require transcoding, and core network devices.
  • for example, the multi-channel signal decoding device can be a multi-channel decoder of the above-mentioned terminal device, wireless device, or core network device.
  • FIG. 3a is a schematic diagram of the multi-channel encoder and multi-channel decoder provided by an embodiment of the present application applied to terminal devices. Each terminal device may include a multi-channel encoder, a channel encoder, a multi-channel decoder, and a channel decoder.
  • the multi-channel encoder may execute the audio encoding method provided in the embodiment of the present application
  • the multi-channel decoder may execute the audio decoding method provided in the embodiment of the present application.
  • the channel encoder is used to perform channel coding on the multi-channel signal
  • the channel decoder is used to perform channel decoding on the multi-channel signal.
  • the first terminal device 30 may include: a first multi-channel encoder 301 , a first channel encoder 302 , a first multi-channel decoder 303 , and a first channel decoder 304 .
  • the second terminal device 31 may include: a second multi-channel decoder 311 , a second channel decoder 312 , a second multi-channel encoder 313 , and a second channel encoder 314 .
  • the first terminal device 30 is connected to a wireless or wired first network communication device 32, and the first network communication device 32 is connected to a wireless or wired second network communication device 33 through a digital channel, and the second terminal device 31 is connected to a wireless or wired network communication device.
  • the second network communication device 33 is connected to a wireless or wired network communication device.
  • the aforementioned wireless or wired network communication equipment may generally refer to signal transmission equipment, such as communication base stations, data exchange equipment, and the like.
  • the terminal device as the sending end performs multi-channel coding on the collected multi-channel signal, and then performs channel coding, and then transmits it in a digital channel through a wireless network or a core network.
  • the terminal device as the receiving end performs channel decoding according to the received signal to obtain the coded code stream of the multi-channel signal, and then restores the multi-channel signal through multi-channel decoding, which is played back by the terminal device as the receiving end.
  • FIG. 3b is a schematic diagram of a multi-channel encoder provided by an embodiment of the present application applied to a wireless device or a core network device. The wireless device or core network device 35 includes a channel decoder 351, other audio decoders 352, the multi-channel encoder 353, and a channel encoder 354. These components are similar to those in FIG. 2b and are not described again here.
  • FIG. 3c is a schematic diagram of a multi-channel decoder provided by an embodiment of the present application applied to a wireless device or a core network device. The wireless device or core network device 35 includes a channel decoder 351, the multi-channel decoder 355, other audio encoders 356, and a channel encoder 354. These components are similar to those in FIG. 2c and are not described again here.
  • the audio encoding process can be a part of the multi-channel encoder, and the audio decoding process can be a part of the multi-channel decoder.
  • for example, performing multi-channel encoding on the collected multi-channel signal may consist of processing the multi-channel signal to obtain an audio signal and then encoding the obtained audio signal according to the method provided in the embodiments of the present application. The decoding end decodes the code stream of the multi-channel signal to obtain the audio signal and recovers the multi-channel signal through upmixing. Therefore, the embodiments of the present application may also be applied to multi-channel encoders and multi-channel decoders in terminal devices, wireless devices, and core network devices. If a wireless or core network device needs to perform transcoding, it must also perform the corresponding multi-channel encoding processing.
  • the method can be executed by a terminal device.
  • the terminal device can be a device for encoding a multi-channel signal (hereinafter referred to as the encoding end or the encoder).
  • for example, the encoding end may be an artificial intelligence (AI) encoder.
  • the multi-channel signal may include multiple channels, such as a first channel and a second channel, or the multiple channels may include a first channel, a second channel, a third channel, and so on.
  • the encoding process of the first channel will be described emphatically.
  • the M blocks of the first channel include a first block. The first transient identifier of the first block indicates either that the first block is a transient block or that it is a non-transient block.
  • the encoding end first obtains the multi-channel signal to be encoded, and performs frame division processing on the multi-channel signal to be encoded, so as to obtain the current frame of the multi-channel signal to be encoded.
  • the encoding process of the current frame is taken as an example for illustration, and the encoding manner of other frames of the multi-channel signal to be encoded is similar to the encoding manner of the current frame.
  • the current frame of the multi-channel signal to be encoded includes the first channel and the second channel, and each channel includes the spectra of M blocks. For example, the first channel can be the left channel and the second channel can be the right channel.
  • the first channel and the second channel may be any two channels among the plurality of channels, or the first channel and the second channel may be signals of two channels obtained from a multi-channel signal.
  • the current frame may also include 3 or more sound channels, which is not limited here.
  • the methods of obtaining transient identifiers, obtaining group information, and grouping are similar.
  • only the processing of the first channel is taken as an example.
  • for the processing of the second channel, reference may be made to the processing of the first channel; details are not repeated here.
  • after determining the current frame, the encoding end performs windowing and time-frequency transformation on it. If the current frame includes M blocks, the spectra of the M blocks in the current frame can be obtained, where M is the number of blocks included in the current frame; the value of M is not limited in the embodiments of the present application. For example, the audio signal of the current frame is divided into blocks to obtain the audio signals of M blocks, where the length of one block's audio signal equals the length of the window function used to window it. Windowing and time-frequency transformation are then performed on the audio signals of the M blocks to obtain the spectra of the M blocks.
  • the encoding end performs time-frequency transformation on the windowed audio signals of M blocks in the current frame to obtain the modified discrete cosine transform (modified discrete cosine transform, MDCT) spectrum of M blocks.
  • the MDCT spectrum is used here as an example of the spectra of the M blocks. This is not limiting; the spectra of the M blocks may also be other types of spectra.
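  • the blocking, windowing, and time-frequency transformation steps above can be sketched as follows. This is a minimal illustration, not the codec's implementation: the sine window, the 50% overlap between consecutive blocks, the direct O(N²) MDCT, and the names `block_spectra`, `m`, and `hop` are all assumptions made for the example.

```python
import math
from typing import List


def sine_window(length: int) -> List[float]:
    """Sine window of the given length (a common MDCT window choice, assumed here)."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def mdct(block: List[float]) -> List[float]:
    """Direct O(N^2) MDCT: 2N windowed time samples -> N spectral coefficients."""
    two_n = len(block)
    n = two_n // 2
    return [
        sum(
            block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(two_n)
        )
        for k in range(n)
    ]


def block_spectra(signal: List[float], m: int, hop: int) -> List[List[float]]:
    """Split `signal` into M half-overlapping blocks, window each, return M MDCT spectra.

    Assumes len(signal) == (m + 1) * hop, so block b covers samples
    [b * hop, b * hop + 2 * hop) and the window length equals the block length.
    """
    if len(signal) != (m + 1) * hop:
        raise ValueError("signal must contain (m + 1) * hop samples")
    window = sine_window(2 * hop)
    spectra = []
    for b in range(m):
        segment = signal[b * hop : b * hop + 2 * hop]
        windowed = [s * w for s, w in zip(segment, window)]
        spectra.append(mdct(windowed))
    return spectra
```

For example, `block_spectra(samples, m=8, hop=16)` on 144 samples yields 8 spectra of 16 coefficients each. A production codec would use a fast FFT-based MDCT; the direct sum is kept here only for readability.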
  • the aforementioned M blocks of the current frame may be the M blocks of the first channel of the current frame.
  • after obtaining the spectra of the M blocks, the encoding end obtains the M transient identifiers of the M blocks from their spectra.
  • the frequency spectrum of each block is used to determine the transient identifier of the block, and each block corresponds to a transient identifier
  • the transient identifier of a block indicates how the spectrum of that block changes relative to the other blocks among the M blocks.
  • the M blocks of the first channel also include a fourth block, whose index differs from that of the first block.
  • the transient flag may indicate that the first block is a transient block, or the transient flag may indicate that the first block is a non-transient block.
  • a transient identifier of "transient" means that the spectrum of the block changes significantly compared with the spectra of the other blocks among the M blocks; a transient identifier of "non-transient" means that the spectrum of the block changes little compared with the spectra of the other blocks.
  • for example, the transient flag occupies 1 bit: if its value is 0, the corresponding block is a transient block, and if its value is 1, the corresponding block is a non-transient block.
  • alternatively, if the value of the transient flag is 1, the corresponding block is a transient block, and if its value is 0, the corresponding block is a non-transient block; this is not limited here.
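  • the flag semantics above can be illustrated with a simple detector. This passage does not specify how a block's spectral change is measured, so the energy-ratio rule, the default `ratio_threshold = 4.0`, and the function names below are assumptions for illustration only; the flags follow the second convention (1 = transient, 0 = non-transient).

```python
from typing import List


def block_energy(spectrum: List[float]) -> float:
    """Energy of one block's spectrum (sum of squared coefficients)."""
    return sum(c * c for c in spectrum)


def transient_flags(spectra: List[List[float]], ratio_threshold: float = 4.0) -> List[int]:
    """Return one 1-bit flag per block: 1 = transient block, 0 = non-transient block.

    Hypothetical rule: a block is transient when its spectral energy exceeds
    `ratio_threshold` times the mean energy of the other M - 1 blocks.
    """
    m = len(spectra)
    if m < 2:
        return [0] * m  # nothing to compare against
    energies = [block_energy(s) for s in spectra]
    total = sum(energies)
    flags = []
    for e in energies:
        mean_others = (total - e) / (m - 1)
        flags.append(1 if e > ratio_threshold * mean_others else 0)
    return flags
```

For four blocks with energies 2, 2, 200, and 2, only the third block exceeds four times the mean of the others, so the flags are `[0, 0, 1, 0]`.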
  • the M transient identifiers of the M blocks are used to group the M blocks: first grouping information of the M blocks is obtained according to their M transient identifiers.
  • the first grouping information of the M blocks indicates how the M blocks are grouped, and the M transient identifiers are the basis for that grouping. For example, blocks with the same transient identifier can be placed in one group, and blocks with different transient identifiers in different groups.
  • the first grouping information includes the first number of groups, or a first group-number identifier, of the M blocks of the first channel, where the first group-number identifier indicates the first number of groups; when the first number of groups is greater than 1, the first grouping information further includes the M first transient identifiers. Alternatively, the first grouping information includes only the M first transient identifiers. In that case it does not contain the number of groups directly; the M first transient identifiers indicate it indirectly. When they indicate that the M blocks of the first channel are all transient blocks or all non-transient blocks, the number of groups is 1; when they indicate that the M blocks include both transient and non-transient blocks, the number of groups is 2.
  • the first grouping information of the M blocks can be implemented in several ways. It can include the number of groups of the M blocks, or a group-number identifier that indicates the number of groups, plus, when the number of groups is greater than 1, the M transient identifiers of the M blocks. Alternatively, it can include only the M transient identifiers of the M blocks.
  • the first grouping information of the M blocks can indicate the grouping of the M blocks, so that the encoding end can use the grouping information to arrange the spectrums of the M blocks in groups.
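  • the derivation of grouping information from the transient flags, and its use to arrange the block spectra by group, can be sketched as below. The `GroupingInfo` structure, the function names, and the choice to place the transient group first are hypothetical; only the rule "one group when all flags agree, two groups otherwise, flags carried only when there is more than one group" comes from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GroupingInfo:
    num_groups: int                      # 1 or 2
    flags: Optional[List[int]] = None    # M transient flags, carried only when num_groups > 1


def grouping_info(flags: List[int]) -> GroupingInfo:
    """Derive grouping information from M transient flags (1 = transient, 0 = non-transient)."""
    if len(set(flags)) <= 1:
        # All blocks transient, or all non-transient: one group, flags not needed.
        return GroupingInfo(num_groups=1)
    return GroupingInfo(num_groups=2, flags=list(flags))


def arrange_by_group(spectra: List[List[float]], info: GroupingInfo) -> List[List[float]]:
    """Reorder block spectra so that blocks of the same group are contiguous.

    Assumption: the transient group is placed first; order inside a group is kept.
    """
    if info.num_groups == 1 or info.flags is None:
        return [list(s) for s in spectra]
    transient = [list(s) for s, f in zip(spectra, info.flags) if f == 1]
    steady = [list(s) for s, f in zip(spectra, info.flags) if f == 0]
    return transient + steady
```

With flags `[0, 1, 0, 1]`, the spectra of blocks 1 and 3 are moved ahead of those of blocks 0 and 2, and the grouping information records two groups plus the four flags.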
  • the first grouping information of M blocks includes: the number of groups of M blocks and the transient identifiers of M blocks.
  • the transient identifiers of the M blocks can also be called grouping flag information, so the grouping information in the embodiments of the present application can include the number of groups and the grouping flag information. For example, the number of groups may be 1 or 2.
  • the group flag information is used to indicate the transient identity of the M blocks.
  • the first grouping information of M blocks includes: transient identifiers of M blocks, and the transient identifiers of M blocks may also be called grouping flag information, so grouping information in this embodiment of the application may include grouping flag information.
  • group flag information is used to indicate the transient identity of the M blocks.
  • for example, the first grouping information of the M blocks includes the number of groups of the M blocks. When the number of groups equals 1, the first grouping information does not include the M transient identifiers; when the number of groups is greater than 1, the first grouping information further includes the M transient identifiers of the M blocks.
  • the number of groups in the first grouping information of M blocks can also be replaced by a group number identifier, which is used to indicate the number of groups.
  • the quantity is 2.
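  • to show what the group-number identifier saves, here is one possible bit layout. It is purely hypothetical (the bit widths and function names are assumptions, not taken from the source): a 1-bit identifier (0 means one group, 1 means two groups), followed by the M 1-bit transient flags only when there are two groups. A single-group frame therefore costs 1 bit, and a two-group frame costs 1 + M bits.

```python
from typing import List, Optional, Tuple


def write_grouping_bits(flags: List[int]) -> List[int]:
    """Hypothetical layout: 1-bit group-number identifier, then M flag bits only for 2 groups."""
    two_groups = len(set(flags)) > 1
    bits = [1 if two_groups else 0]  # identifier: 0 -> 1 group, 1 -> 2 groups
    if two_groups:
        bits.extend(flags)           # one transient flag per block
    return bits


def read_grouping_bits(bits: List[int], m: int) -> Tuple[int, Optional[List[int]], int]:
    """Parse the layout above; returns (number of groups, flags or None, bits consumed)."""
    if bits[0] == 0:
        return 1, None, 1
    return 2, list(bits[1 : 1 + m]), 1 + m
```

With one group, this layout does not record whether all blocks are transient or all are non-transient, matching the statement that the M transient identifiers are omitted in that case; the sketch returns `None` for the flags.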
  • the M blocks of the second channel include the second block, the second transient identifier of the second block is used to indicate that the second block is a transient block, or indicates that the second block is a non-transient block;
  • steps 403 to 404 are implemented similarly to steps 401 to 402 described above, and will not be repeated here.
  • after obtaining the spectra of the M blocks of the second channel of the current frame, the encoding end obtains the M transient identifiers of those M blocks from their spectra.
  • the spectrum of each block is used to determine that block's transient identifier, so each block corresponds to one transient identifier.
  • the transient identifier of a block indicates how the spectrum of that block changes within the M blocks.
  • for example, if one of the M blocks is the second block, the second block corresponds to one transient identifier.
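The text does not say how a block's transient identifier is derived from its spectrum. Purely as an illustration, the sketch below uses a hypothetical energy-ratio test. A block is marked transient when its spectral energy is more than `ratio` times the mean energy of the other blocks in the frame. The function name `transient_flags` and the default `ratio` of 4.0 are assumptions, not values from the patent. The output uses this section's flag convention (0 = transient, 1 = non-transient).

```python
from typing import List, Sequence

TRANSIENT, NON_TRANSIENT = 0, 1  # flag convention used in this section's examples


def transient_flags(block_spectra: Sequence[Sequence[float]],
                    ratio: float = 4.0) -> List[int]:
    """Derive one transient identifier per block from the M block spectra.

    Hypothetical detector: a block is transient when its spectral energy is
    more than `ratio` times the mean energy of the other blocks in the frame.
    Assumes M >= 2.
    """
    energies = [sum(x * x for x in spectrum) for spectrum in block_spectra]
    total = sum(energies)
    m = len(energies)
    flags = []
    for e in energies:
        # Mean energy of the remaining M - 1 blocks.
        mean_others = (total - e) / (m - 1)
        flags.append(TRANSIENT if e > ratio * mean_others else NON_TRANSIENT)
    return flags
```

Consider a frame whose second block carries most of the energy, for example `[[0.1, 0.1], [1.0, 1.0], [0.1, 0.1], [0.1, 0.1]]`. This detector yields `[1, 0, 1, 1]`, the same pattern as the first-channel example used later in this section.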
  • the M blocks of the second channel also include a third block, whose index differs from that of the second block.
  • the adjustment can take one of three forms:
    • the first adjusted grouping information is the same as the first grouping information, and the second adjusted grouping information is obtained by adjusting the second grouping information;
    • the first adjusted grouping information is obtained by adjusting the first grouping information, and the second adjusted grouping information is the same as the second grouping information; or
    • the first adjusted grouping information is obtained by adjusting the first grouping information, and the second adjusted grouping information is obtained by adjusting the second grouping information.
  • the first grouping information includes, for the M blocks of the first channel, either the first number of groups or a first group number identifier that indicates the first number of groups. When the first number of groups is greater than 1, the first grouping information also includes the M first transient identifiers. Alternatively, the first grouping information includes only the M first transient identifiers;
  • the second grouping information includes, for the M blocks of the second channel, either the second number of groups or a second group number identifier that indicates the second number of groups. When the second number of groups is greater than 1, the second grouping information also includes the M second transient identifiers. Alternatively, the second grouping information includes only the M second transient identifiers;
  • the first adjusted grouping information includes, for the M blocks of the first channel, either the first adjusted number of groups or a first adjusted group number identifier that indicates it. When the first adjusted number of groups is greater than 1, the first adjusted grouping information also includes the M first adjusted transient identifiers of the M blocks of the first channel. The first adjusted transient identifier of the first block may differ from, or equal, the first transient identifier of the first block. Alternatively, the first adjusted grouping information includes only the M first adjusted transient identifiers;
  • the second adjusted grouping information includes, for the M blocks of the second channel, either the second adjusted number of groups or a second adjusted group number identifier that indicates it. When the second adjusted number of groups is greater than 1, the second adjusted grouping information also includes the M second adjusted transient identifiers of the M blocks of the second channel. The second adjusted transient identifier of the second block may differ from, or equal, the second transient identifier of the second block. Alternatively, the second adjusted grouping information includes only the M second adjusted transient identifiers.
  • the first grouping information, the second grouping information, the first adjusted grouping information, and the second adjusted grouping information may each use any of the grouping-information formats described above; this is not limited here.
  • the first grouping information includes the first number of groups or the first group number identifier of the M blocks of the first channel. The first adjusted grouping information includes the first adjusted number of groups or the first adjusted group number identifier of the M blocks of the first channel. When the first grouping information has not been adjusted, the first number of groups equals the first adjusted number of groups, and the first group number identifier equals the first adjusted group number identifier.
  • the first number of groups and the first adjusted number of groups may be the same or different. If adjusting the first grouping information does not change the number of groups, the two values are the same; if it does change the number of groups, they differ. For example, the first number of groups may be 2 before the first grouping information is adjusted, and the first adjusted number of groups may be 1 after the adjustment.
  • likewise, the first group number identifier and the first adjusted group number identifier may be the same or different. For example, before the first grouping information is adjusted, the first number of groups is 2 and the first group number identifier is 1. If the first adjusted number of groups is still 2 after the adjustment, the first adjusted group number identifier is still 1.
  • similarly, the second adjusted grouping information and the second grouping information may be the same or different; the details are not repeated here.
  • the number of transient blocks among the M blocks of the first channel, as indicated by the first adjusted grouping information, is the same as the number of transient blocks among the M blocks of the second channel, as indicated by the second adjusted grouping information.
  • the positions (indexes) of the transient blocks among the M blocks of the first channel, as indicated by the first adjusted grouping information, may be the same as or different from the positions (indexes) of the transient blocks among the M blocks of the second channel, as indicated by the second adjusted grouping information.
  • alternatively, the first adjusted grouping information and the second adjusted grouping information indicate both the same number of transient blocks and the same transient block positions (indexes) for the M blocks of the first channel and the M blocks of the second channel.
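The two properties of the adjusted grouping information described above can be checked with a small helper. The first property, an equal number of transient blocks, always holds. The second, equal transient positions, holds only in the stricter variant. This is a sketch that assumes flags are lists of 0 (transient) and 1 (non-transient), as in this section's examples; the helper names are hypothetical.

```python
from typing import List, Set

TRANSIENT = 0  # flag value of a transient block in this section's examples


def transient_indices(flags: List[int]) -> Set[int]:
    """Indexes (0 .. M-1) of the blocks whose identifier marks them as transient."""
    return {i for i, f in enumerate(flags) if f == TRANSIENT}


def same_count(flags_a: List[int], flags_b: List[int]) -> bool:
    """Required property: both channels indicate the same number of transient blocks."""
    return len(transient_indices(flags_a)) == len(transient_indices(flags_b))


def same_positions(flags_a: List[int], flags_b: List[int]) -> bool:
    """Stricter variant: the transient blocks also sit at the same indexes."""
    return transient_indices(flags_a) == transient_indices(flags_b)
```

For example, `[0, 0, 1, 1]` and `[1, 0, 0, 1]` both have two transient blocks, so `same_count` is true. Their positions {0, 1} and {1, 2} differ, however, so `same_positions` is false.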
  • the current frame includes the first channel and the second channel. If the grouping information of the two channels meets a preset condition, the grouping information needs to be adjusted.
  • the preset condition is determined according to the specific application scenario and is not limited here.
  • at least one of the first grouping information and the second grouping information can be adjusted so that the number of transient blocks of the first channel equals the number of transient blocks of the second channel, which facilitates the subsequent encoding.
  • the encoding end adjusts at least one of the first grouping information and the second grouping information to obtain the first adjusted grouping information and the second adjusted grouping information. For example, if only the first grouping information is adjusted, the first adjusted grouping information is obtained by adjusting the first grouping information, and the second adjusted grouping information is the same as the second grouping information. In another example, if only the second grouping information is adjusted, the first adjusted grouping information is the same as the first grouping information, and the second adjusted grouping information is obtained by adjusting the second grouping information.
  • in yet another example, both are adjusted: the first adjusted grouping information is obtained by adjusting the first grouping information, and the second adjusted grouping information is obtained by adjusting the second grouping information.
  • by adjusting at least one of the first grouping information and the second grouping information, the encoding end obtains adjusted grouping information that can be used for grouping and arrangement, which produces the spectrum to be encoded.
  • the preset condition includes: the first grouping information is inconsistent with the second grouping information.
  • inconsistent means that the first grouping information and the second grouping information are not completely identical. When they are inconsistent, the two may be considered to meet the preset condition; when they are consistent, the two may be considered not to meet it.
  • for example, the number of groups of the M blocks in the first grouping information may equal the number of groups of the M blocks in the second grouping information, while the M first transient identifiers in the first grouping information differ from the M second transient identifiers in the second grouping information.
  • in another example, the number of groups of the M blocks in the first grouping information differs from the number of groups of the M blocks in the second grouping information.
  • the preset condition is determined according to the specific application scenario and is not limited here. Setting the preset condition lets the encoding end determine whether the first grouping information and the second grouping information need to be adjusted.
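A hedged sketch of the basic preset condition follows: the two channels' grouping information is inconsistent when either the numbers of groups differ or the transient identifiers differ. The function name is hypothetical. It assumes that the flag list is `None` when a channel has only one group, since in that case the grouping information carries no transient identifiers.

```python
from typing import List, Optional


def grouping_inconsistent(groups_1: int, flags_1: Optional[List[int]],
                          groups_2: int, flags_2: Optional[List[int]]) -> bool:
    """Preset-condition sketch: True when the two channels' grouping information differs."""
    if groups_1 != groups_2:
        # Different numbers of groups: inconsistent.
        return True
    # Same number of groups: compare the transient identifiers index by index.
    # Both are None when there is only one group, and None == None is True.
    return flags_1 != flags_2
```

With this sketch, `grouping_inconsistent(2, [1, 0, 1, 1], 2, [0, 1, 1, 0])` is true because the identifiers differ. `grouping_inconsistent(1, None, 1, None)` is false, so no adjustment is triggered.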
  • the inconsistency between the first grouping information and the second grouping information may be any one of the following:
    • the M first transient identifiers indicate that the M blocks of the first channel include transient blocks and non-transient blocks, the M second transient identifiers indicate that the M blocks of the second channel include transient blocks and non-transient blocks, and the M first transient identifiers are inconsistent with the M second transient identifiers;
    • the M first transient identifiers indicate that the M blocks of the first channel include transient blocks and non-transient blocks, the M second transient identifiers indicate that the M blocks of the second channel include transient blocks and non-transient blocks, and the number of transient blocks of the first channel differs from the number of transient blocks of the second channel;
    • the M first transient identifiers indicate that the M blocks of the first channel include transient blocks and non-transient blocks, the M second transient identifiers indicate that the M blocks of the second channel include transient blocks and non-transient blocks, the M first transient identifiers are inconsistent with the M second transient identifiers, and the Nth block among the M blocks of the first channel and the Nth block among the M blocks of the second channel are both transient blocks, where 0 ≤ N < M.
  • in the first case, some of the M blocks of the first channel are transient blocks and the rest are non-transient blocks. The M blocks of the second channel likewise include both transient blocks and non-transient blocks.
  • the M first transient identifiers are inconsistent with the M second transient identifiers when at least one of the M first transient identifiers differs in value from the second transient identifier with the same index.
  • for example, suppose block A among the M blocks of the first channel is a transient block, block B among the M blocks of the second channel is a transient block, and blocks A and B have the same index. Then the first transient identifier of block A is consistent with the second transient identifier of block B.
  • now suppose block C among the M blocks of the first channel is a non-transient block and block D among the M blocks of the second channel is a transient block. If block C has the same index among the M blocks of the first channel as block D has among the M blocks of the second channel, the first transient identifier of block C is inconsistent with the second transient identifier of block D.
  • if the M first transient identifiers are inconsistent with the M second transient identifiers, the first grouping information and the second grouping information may be determined to meet the preset condition, and the grouping information needs to be adjusted.
  • if the M first transient identifiers are completely consistent with the M second transient identifiers, the first grouping information and the second grouping information may be determined not to meet the preset condition, and the grouping information is not adjusted.
  • in the second case, some of the M blocks of the first channel are transient blocks and the rest are non-transient blocks, so the number of transient blocks of the first channel can be counted. Similarly, the M blocks of the second channel include transient blocks and non-transient blocks, so the number of transient blocks of the second channel can be counted.
  • if the number of transient blocks of the first channel differs from the number of transient blocks of the second channel, the first grouping information and the second grouping information may be determined to meet the preset condition, and the grouping information needs to be adjusted.
  • if the number of transient blocks of the first channel equals the number of transient blocks of the second channel, the first grouping information and the second grouping information may be determined not to meet the preset condition, and the grouping information is not adjusted.
  • in the third case, the M first transient identifiers are inconsistent with the M second transient identifiers. In addition, the Nth block among the M blocks of the first channel and the Nth block among the M blocks of the second channel are both transient blocks, where 0 ≤ N < M; in other words, the two channels have a transient block at the same index. Neither the value of N nor the number of such values is limited. For example, one value of N means that the first channel and the second channel share one transient block with the same index. Two values of N mean that they share two transient blocks with the same index.
  • the first grouping information and the second grouping information may be determined not to meet the preset condition in either of two situations: the M first transient identifiers are completely consistent with the M second transient identifiers, or they are inconsistent but the first channel and the second channel have no transient block with the same index. In both situations the grouping information is not adjusted.
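The three variants of the inconsistency condition can be written as predicates over the two channels' flag lists. This is a sketch under this section's convention (0 = transient, 1 = non-transient); the function names are hypothetical. Each predicate first requires that both channels contain transient and non-transient blocks, as all three variants do.

```python
from typing import List

TRANSIENT, NON_TRANSIENT = 0, 1


def _mixed(flags: List[int]) -> bool:
    """True when a channel has both transient and non-transient blocks."""
    return TRANSIENT in flags and NON_TRANSIENT in flags


def condition_flags_differ(f1: List[int], f2: List[int]) -> bool:
    # Variant 1: at least one same-index pair of identifiers differs.
    return _mixed(f1) and _mixed(f2) and f1 != f2


def condition_count_differs(f1: List[int], f2: List[int]) -> bool:
    # Variant 2: the numbers of transient blocks differ.
    return _mixed(f1) and _mixed(f2) and f1.count(TRANSIENT) != f2.count(TRANSIENT)


def condition_shared_transient(f1: List[int], f2: List[int]) -> bool:
    # Variant 3: variant 1 holds and some index N is transient in both channels.
    shared = any(a == TRANSIENT and b == TRANSIENT for a, b in zip(f1, f2))
    return condition_flags_differ(f1, f2) and shared
```

For the pair `[0, 0, 1, 1]` and `[0, 1, 1, 1]`, all three predicates are true. For `[1, 0, 1, 1]` and `[0, 1, 1, 0]`, variants 1 and 2 hold. Variant 3 does not hold, because no index is transient in both channels.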
  • the M blocks of the first channel each have an index, and the M blocks of the second channel each have an index.
  • consider the case where the inconsistency between the first grouping information and the second grouping information is as follows: the M first transient identifiers indicate that the M blocks of the first channel include transient blocks and non-transient blocks, the M second transient identifiers indicate that the M blocks of the second channel include transient blocks and non-transient blocks, and the number of transient blocks of the first channel differs from the number of transient blocks of the second channel. If, in addition, the indexes of the transient blocks among the M blocks of the first channel do not intersect the indexes of the transient blocks among the M blocks of the second channel, then obtaining the first adjusted grouping information and the second adjusted grouping information from the first grouping information and the second grouping information in step 405 includes:
  • adjusting the first grouping information to obtain the first adjusted grouping information, where the number of transient blocks of the first channel indicated by the first adjusted grouping information equals the number of transient blocks of the second channel indicated by the second grouping information; or
  • adjusting the second grouping information to obtain the second adjusted grouping information, where the number of transient blocks of the second channel indicated by the second adjusted grouping information equals the number of transient blocks of the first channel indicated by the first grouping information.
  • the M blocks of the first channel each have an index, for example, 0 to M-1. Likewise, the M blocks of the second channel have indexes 0 to M-1.
  • the indexes of the transient blocks among the M blocks of the first channel have no intersection with the indexes of the transient blocks among the M blocks of the second channel when the two sets of indexes are completely different.
  • for example, the transient identifier of a transient block is 0 and the transient identifier of a non-transient block is 1.
  • let M = 4. The transient identifiers of the four blocks (indexes 0 to 3) of the first channel are 1011: the blocks with indexes 0, 1, 2, and 3 have transient identifiers 1, 0, 1, and 1. The transient identifiers of the four blocks (indexes 0 to 3) of the second channel are 0110: the blocks with indexes 0, 1, 2, and 3 have transient identifiers 0, 1, 1, and 0. The first channel therefore has one transient block (index 1), and the second channel has two transient blocks (indexes 0 and 3).
  • the grouping information of the channel with fewer transient blocks is adjusted, and the grouping information of the channel with more transient blocks remains unchanged. After the adjustment, the grouping information of the two channels indicates the same number of transient blocks.
  • this adjustment makes the number of transient blocks of the first channel equal to that of the second channel, which facilitates the subsequent encoding of the spectra of the first channel and the second channel.
  • the indexes of the transient blocks of the two channels have no intersection when no index corresponds to a transient block in both channels. Taking M = 4 as an example, the block with index 0 among the M blocks of the first channel and the block with index 0 among the M blocks of the second channel are not both transient blocks. The same holds for the blocks with index 1, index 2, and index 3.
  • when the first grouping information is adjusted to obtain the first adjusted grouping information, the adjustment may include changing the first transient identifiers of the M blocks. For example, the first transient identifier of the first block among the M blocks may be changed from non-transient to transient. This increases the number of transient blocks of the first channel until the number of transient blocks of the first channel in the first adjusted grouping information (that is, the adjusted number of transient blocks of the first channel) equals the number of transient blocks of the second channel indicated by the second grouping information.
  • when the second grouping information is adjusted to obtain the second adjusted grouping information, the adjustment may include changing the second transient identifiers of the M blocks. For example, the second transient identifier of the second block among the M blocks may be changed from non-transient to transient. This increases the number of transient blocks of the second channel until the number of transient blocks of the second channel in the second adjusted grouping information (that is, the adjusted number of transient blocks of the second channel) equals the number of transient blocks of the first channel indicated by the first grouping information.
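A minimal sketch of the disjoint-index adjustment follows. The channel with fewer transient blocks has non-transient identifiers changed to transient until the counts match, and the other channel is left unchanged. The text does not specify which non-transient blocks are changed. This sketch hypothetically changes blocks whose same-index block in the other channel is transient, in ascending index order. The function name `align_disjoint` is also an assumption. Because the index sets are disjoint, every transient block of the larger channel sits at a non-transient block of the smaller one, so enough candidates always exist.

```python
from typing import List, Tuple

TRANSIENT, NON_TRANSIENT = 0, 1


def align_disjoint(f1: List[int], f2: List[int]) -> Tuple[List[int], List[int]]:
    """Raise the transient count of the channel with fewer transient blocks.

    Hypothetical selection rule: change, in ascending index order, the blocks whose
    same-index block in the other channel is transient.
    """
    a, b = list(f1), list(f2)
    n1, n2 = a.count(TRANSIENT), b.count(TRANSIENT)
    if n1 == n2:
        return a, b
    # `low` is the channel to adjust; `high` stays unchanged.
    low, high = (a, b) if n1 < n2 else (b, a)
    need = abs(n1 - n2)
    for i, flag in enumerate(high):
        if need == 0:
            break
        if flag == TRANSIENT and low[i] == NON_TRANSIENT:
            low[i] = TRANSIENT  # non-transient -> transient
            need -= 1
    return a, b
```

Applied to the example above, `align_disjoint([1, 0, 1, 1], [0, 1, 1, 0])` returns `([0, 0, 1, 1], [0, 1, 1, 0])`. Both channels now indicate two transient blocks, and the second channel's identifiers are unchanged.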
  • as before, the M blocks of the first channel and the M blocks of the second channel each have an index.
  • consider again the case where the inconsistency between the first grouping information and the second grouping information is as follows: the M first transient identifiers indicate that the M blocks of the first channel include transient blocks and non-transient blocks, the M second transient identifiers indicate that the M blocks of the second channel include transient blocks and non-transient blocks, and the number of transient blocks of the first channel differs from the number of transient blocks of the second channel. If, in addition, the indexes of the transient blocks among the M blocks of the first channel intersect the indexes of the transient blocks among the M blocks of the second channel, then obtaining the first adjusted grouping information and the second adjusted grouping information from the first grouping information and the second grouping information in step 405 includes:
  • if the indexes of the transient blocks indicated by the M first transient identifiers are only part of the indexes of the transient blocks indicated by the M second transient identifiers: adjusting at least one of the M first transient identifiers to obtain M first adjusted transient identifiers. The indexes of all transient blocks indicated by the M first adjusted transient identifiers are then the same as the indexes of all transient blocks indicated by the M second transient identifiers;
  • if the indexes of the transient blocks indicated by the M second transient identifiers are only part of the indexes of the transient blocks indicated by the M first transient identifiers: adjusting at least one of the M second transient identifiers to obtain M second adjusted transient identifiers. The indexes of all transient blocks indicated by the M second adjusted transient identifiers are then the same as the indexes of all transient blocks indicated by the M first transient identifiers;
  • otherwise: adjusting at least one of the M first transient identifiers to obtain M first adjusted transient identifiers, and adjusting at least one of the M second transient identifiers to obtain M second adjusted transient identifiers. The indexes of all transient blocks indicated by the M first adjusted transient identifiers are then the same as the indexes of all transient blocks indicated by the M second adjusted transient identifiers.
  • the M blocks of the first channel each have an index, for example, 0 to M-1. Likewise, the M blocks of the second channel have indexes 0 to M-1.
  • the indexes of the transient blocks among the M blocks of the first channel intersect the indexes of the transient blocks among the M blocks of the second channel when the two sets of indexes are partly the same but not completely the same.
  • for example, the transient identifier of a transient block is 0 and the transient identifier of a non-transient block is 1.
  • let M = 4. The transient identifiers of the four blocks of the first channel are 0011, and those of the four blocks of the second channel are 0111. The first channel therefore has two transient blocks, with indexes 0 and 1, and the second channel has one transient block, with index 0. The transient block with index 0 appears in both channels, so the indexes of the transient blocks among the four blocks of the first channel intersect those among the four blocks of the second channel.
  • there are several implementations for the case in which the indexes of the transient blocks among the M blocks of the first channel intersect the indexes of the transient blocks among the M blocks of the second channel.
  • in one implementation, the first channel has fewer transient blocks than the second channel. That is, the indexes of the transient blocks indicated by the M first transient identifiers are part of the indexes of the transient blocks indicated by the M second transient identifiers.
  • the first transient identifiers of the M blocks of the first channel are then adjusted, and the second transient identifiers of the M blocks of the second channel remain unchanged.
  • at least one of the M first transient identifiers is adjusted to obtain M first adjusted transient identifiers.
  • the indexes of all transient blocks indicated by the M first adjusted transient identifiers are the same as those indicated by the M second transient identifiers, so after the adjustment the grouping information of the two channels indicates the same number of transient blocks. This adjustment makes the number of transient blocks of the first channel equal to that of the second channel, which facilitates the subsequent encoding of the spectra of the first channel and the second channel.
  • in another implementation, the second channel has fewer transient blocks than the first channel. That is, the indexes of the transient blocks indicated by the M second transient identifiers are part of the indexes of the transient blocks indicated by the M first transient identifiers.
  • the second transient identifiers of the M blocks of the second channel are then adjusted, and the first transient identifiers of the M blocks of the first channel remain unchanged.
  • at least one of the M second transient identifiers is adjusted to obtain M second adjusted transient identifiers.
  • the indexes of all transient blocks indicated by the M second adjusted transient identifiers are the same as those indicated by the M first transient identifiers, so after the adjustment the grouping information of the two channels indicates the same number of transient blocks. This adjustment makes the number of transient blocks of the first channel equal to that of the second channel, which facilitates the subsequent encoding of the spectra of the first channel and the second channel.
  • in yet another implementation, the number of transient blocks of the second channel differs from that of the first channel, and the indexes of the transient blocks indicated by the M first transient identifiers only partly coincide with those indicated by the M second transient identifiers. Some transient block indexes of the first channel are the same as some transient block indexes of the second channel, but neither set contains the other.
  • in this case, the identifiers of both channels are adjusted: at least one of the M first transient identifiers is adjusted to obtain M first adjusted transient identifiers, and at least one of the M second transient identifiers is adjusted to obtain M second adjusted transient identifiers.
  • the indexes of all transient blocks indicated by the M first adjusted transient identifiers are the same as the indexes of all transient blocks indicated by the M second adjusted transient identifiers.
  • the grouping information of the two channels then indicates the same number of transient blocks. This adjustment makes the number of transient blocks of the first channel equal to that of the second channel, which facilitates the subsequent encoding of the spectra of the first channel and the second channel.
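The choice among the three implementations above depends only on how the two sets of transient indexes relate. The sketch below classifies a pair of flag lists accordingly, using this section's convention (0 = transient, 1 = non-transient). The function name and the returned labels are hypothetical.

```python
from typing import List


def intersecting_case(f1: List[int], f2: List[int]) -> str:
    """Decide which channel(s) to adjust when the transient indexes intersect."""
    s1 = {i for i, f in enumerate(f1) if f == 0}  # transient indexes, first channel
    s2 = {i for i, f in enumerate(f2) if f == 0}  # transient indexes, second channel
    if not (s1 & s2):
        return "disjoint"       # handled by the no-intersection rule instead
    if s1 == s2:
        return "identical"      # nothing to adjust
    if s1 < s2:
        return "adjust first"   # first set is a proper part of the second
    if s2 < s1:
        return "adjust second"  # second set is a proper part of the first
    return "adjust both"        # partial overlap, neither contains the other
```

For the example above, `intersecting_case([0, 0, 1, 1], [0, 1, 1, 1])` returns `"adjust second"`, because the second channel's transient index set {0} is part of the first channel's set {0, 1}.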
  • adjusting at least one of the M first transient identifiers to obtain the M first adjusted transient identifiers includes the following. When the first transient identifier of the first block indicates that the first block is a non-transient block, and the second transient identifier of the third block among the M blocks of the second channel indicates that the third block is a transient block, the first transient identifier of the first block is adjusted to the first adjusted transient identifier of the first block. This first adjusted transient identifier indicates that the first block is a transient block. The first block and the third block have the same index.
  • adjusting at least one of the M second transient identifiers to obtain the M second adjusted transient identifiers includes the following. When the second transient identifier of the second block indicates that the second block is a non-transient block, and the first transient identifier of the fourth block among the M blocks of the first channel indicates that the fourth block is a transient block, the second transient identifier of the second block is adjusted to the second adjusted transient identifier of the second block. This second adjusted transient identifier indicates that the second block is a transient block. The second block and the fourth block have the same index.
  • The adjustment of the M first transient flags is similar to the adjustment of the M second transient flags; the adjustment of the first transient flags is used here as an example.
  • When the first transient flag of the first block indicates that the first block is a non-transient block, and the second transient flag of the third block (the block with the same index among the M blocks of the second channel) indicates that the third block is a transient block, the first transient flag of the first block is adjusted to a first adjusted transient flag indicating that the first block is a transient block.
  • For example, if the first transient flag of the first block is 1, the second transient flag of the third block is 0, and both blocks have index 4, then the first adjusted transient flag of the first block is 0 (in this example, a flag value of 0 denotes a transient block and 1 a non-transient block).
  • In this way, the first channel and the second channel end up with the same number of transient blocks, which facilitates subsequent encoding of the spectra of the first and second channels.
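The cross-channel adjustment rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `align_transient_flags` is hypothetical, and flags are modelled as booleans (`True` = transient block) to avoid depending on the numeric value convention of the flags. A block is marked transient if the block with the same index is transient in either channel, which is one way to satisfy the rule that both channels end up with identical transient-block indexes.

```python
from typing import List, Tuple


def align_transient_flags(first: List[bool], second: List[bool]) -> Tuple[List[bool], List[bool]]:
    """Hypothetical sketch: align per-block transient flags of two channels.

    True marks a transient block. A non-transient block is promoted to
    transient when the same-index block of the other channel is transient;
    blocks that are already transient are left unchanged.
    """
    if len(first) != len(second):
        raise ValueError("both channels must have the same number of blocks M")
    merged = [a or b for a, b in zip(first, second)]
    # Both channels now share the same set of transient-block indexes.
    return list(merged), list(merged)


first_adj, second_adj = align_transient_flags([False, False, True, False],
                                              [False, False, False, True])
```

After alignment both channels report the same number of transient blocks, which is the property the text relies on for joint encoding.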
  • the method executed by the encoding end further includes:
  • A1: Encode the first adjusted grouping information and the second adjusted grouping information to obtain a grouping information encoding result.
  • The encoding end encodes the first adjusted grouping information and the second adjusted grouping information to obtain the grouping information encoding result; the encoding method used for the adjusted grouping information is not limited here.
  • The grouping information encoding result is written into the code stream so that the code stream carries it; the decoding end can then obtain the grouping information encoding result by parsing the code stream and decode it to obtain the first adjusted grouping information and the second adjusted grouping information.
  • step 409 may be executed first, and then step A2 may be executed, or step A2 may be executed first, and then step 409 may be executed, or step A2 and step 409 may be executed simultaneously. There is no limit.
  • The first spectrum to be encoded is the spectrum to be encoded of the first channel of the current frame; it may also be referred to as the group-arranged spectra of the M blocks of the first channel.
  • Taking the first adjusted grouping information as an example, after obtaining the first adjusted grouping information of the M blocks, the encoding end can use it to process the spectra of the M blocks of the current frame.
  • The first adjusted grouping information may be used to adjust the order of the spectra of the M blocks in the current frame, and the first spectrum to be encoded is generated from it.
  • When the M first adjusted transient flags indicate that the M blocks of the first channel include both transient blocks and non-transient blocks, obtaining the first spectrum to be encoded according to the first adjusted grouping information and the spectra of the M blocks of the first channel includes:
  • grouping and arranging the spectra of the M blocks of the first channel according to the first adjusted grouping information to obtain the first spectrum to be encoded.
  • After obtaining the first adjusted grouping information of the M blocks, the encoding end can use it to group and arrange the spectra of the M blocks of the current frame, thereby adjusting the order of the spectra of the M blocks in the current frame.
  • This grouping arrangement is performed according to the first adjusted grouping information of the M blocks, which is in turn obtained from the M transient identifiers of the M blocks.
  • The group-arranged spectra of the M blocks therefore use the M transient identifiers of the M blocks as the basis for grouping and sorting, and this grouping and sorting changes the coding order of the spectra of the M blocks.
  • the above M blocks of the current frame may be the M blocks of the first channel of the current frame.
  • The second spectrum to be encoded is the spectrum to be encoded of the second channel of the current frame; it may also be referred to as the group-arranged spectra of the M blocks of the second channel.
  • Obtaining the second spectrum to be encoded according to the second adjusted grouping information and the spectra of the M blocks of the second channel includes:
  • grouping and arranging the spectra of the M blocks of the second channel according to the second adjusted grouping information to obtain the second spectrum to be encoded.
  • Grouping and arranging the spectra of the M blocks of the first channel to obtain the first spectrum to be encoded includes:
  • the encoding end groups the M blocks according to their transient identifiers to obtain a transient group and a non-transient group, and then rearranges the positions of the block spectra in the current frame so that the spectra of the blocks in the transient group precede the spectra of the blocks in the non-transient group, yielding the spectrum to be encoded.
  • Because the spectra of all transient blocks are placed before the spectra of the non-transient blocks, the transient spectra move to positions of higher coding importance, so the audio signal reconstructed after encoding and decoding with the neural network better preserves its transient characteristics.
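A minimal sketch of the grouping arrangement described above, assuming each block spectrum is a list of coefficients and each block carries a boolean transient flag (`True` = transient). The function name `group_arrange` is hypothetical, and keeping the original relative order inside each group is an assumption; the text only requires transient spectra to come first.

```python
from typing import List, Sequence, Tuple


def group_arrange(spectra: Sequence[Sequence[float]],
                  transient: Sequence[bool]) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical sketch: place transient-block spectra before non-transient ones.

    Returns the rearranged block spectra and, for each output position,
    the original index of the block placed there.
    """
    if len(spectra) != len(transient):
        raise ValueError("need one transient flag per block")
    transient_idx = [i for i, t in enumerate(transient) if t]
    non_transient_idx = [i for i, t in enumerate(transient) if not t]
    # Assumption: relative order is preserved inside each group.
    order = transient_idx + non_transient_idx
    return [list(spectra[i]) for i in order], order


arranged, order = group_arrange([[0.0], [1.0], [2.0], [3.0]], [False, True, False, True])
```

Here blocks 1 and 3 are transient, so `order` becomes `[1, 3, 0, 2]`; this order is exactly what a decoder would need to rebuild from the transmitted flags.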
  • the aforementioned M blocks of the current frame may be the M blocks of the first channel of the current frame.
  • The spectra of the M blocks of the second channel are grouped and arranged in the same way to obtain the second spectrum to be encoded.
  • After obtaining the first spectrum to be encoded and the second spectrum to be encoded, the encoding end can encode them with the encoding neural network to generate a spectrum encoding result, write the spectrum encoding result into the code stream, and send the code stream to the decoding end.
  • During this encoding, latent variables are generated that represent the features of the group-arranged spectra of the M blocks.
  • The method performed by the encoding end further includes:
  • D1: performing intra-group interleaving on the first spectrum to be encoded and on the second spectrum to be encoded.
  • Correspondingly, encoding the first spectrum to be encoded and the second spectrum to be encoded with the encoding neural network in step 408 includes:
  • E1: encoding the first spectrum after intra-group interleaving and the second spectrum after intra-group interleaving with the encoding neural network.
  • The encoding end can perform intra-group interleaving according to the grouping of the M blocks of each channel to obtain the intra-group-interleaved spectra of the M blocks, which may then serve as the input data of the encoding neural network.
  • the aforementioned M blocks of the current frame may be the M blocks of the first channel of the current frame.
  • The number of transient blocks indicated by the M first transient identifiers is P, that is, P of the M blocks are designated as transient blocks, and the remaining Q = M - P blocks are non-transient blocks.
  • Performing intra-group interleaving on the first spectrum to be encoded in step D1 includes interleaving the spectra of the P blocks to obtain the interleaved spectrum of the P blocks, and interleaving the spectra of the Q blocks to obtain the interleaved spectrum of the Q blocks.
  • Interleaving the spectra of the P blocks means treating the spectra of the P blocks as a whole for interleaving; similarly, the spectra of the Q blocks are interleaved as a whole.
  • Encoding, in step E1, the first spectrum after intra-group interleaving and the second spectrum after intra-group interleaving with the encoding neural network includes:
  • encoding the interleaved spectrum of the P blocks and the interleaved spectrum of the Q blocks with the encoding neural network.
  • the encoder can perform interleaving processing according to the transient group and the non-transient group, so as to obtain the interleaved frequency spectrum of P blocks and the interleaved frequency spectrum of Q blocks.
  • the interleaved frequency spectrum of the P blocks and the interleaved frequency spectrum of the Q blocks can be used as input data of the encoding neural network.
  • In this way, coding side information can also be reduced and coding efficiency improved.
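The text does not specify how the spectra of a group are interleaved. As an assumption, the sketch below uses coefficient-by-coefficient interleaving across the blocks of a group (output order `b0[0], b1[0], ..., b0[1], b1[1], ...`), similar in spirit to short-window interleaving in other transform codecs. Both function names are hypothetical; the P transient blocks (first P positions after grouping arrangement) and the Q non-transient blocks are each interleaved as a whole.

```python
from typing import List, Sequence, Tuple


def interleave_group(blocks: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical: interleave equal-length block spectra coefficient by coefficient."""
    if not blocks:
        return []
    n = len(blocks[0])
    if any(len(b) != n for b in blocks):
        raise ValueError("all blocks in a group must have the same length")
    # Coefficient k of block b ends up at position k * len(blocks) + b.
    return [blocks[b][k] for k in range(n) for b in range(len(blocks))]


def interleave_within_groups(arranged: Sequence[Sequence[float]],
                             num_transient: int) -> Tuple[List[float], List[float]]:
    """Interleave the P transient blocks and the Q = M - P non-transient blocks separately."""
    p_part = interleave_group(arranged[:num_transient])
    q_part = interleave_group(arranged[num_transient:])
    return p_part, q_part


p, q = interleave_within_groups([[1, 2], [3, 4], [5, 6]], 2)
```

With two transient blocks `[1, 2]` and `[3, 4]`, the transient part becomes `[1, 3, 2, 4]`, while the single non-transient block is passed through unchanged.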
  • In step 401, the M first transient identifiers of the M blocks of the first channel are obtained according to the spectra of the M blocks of the first channel of the current frame of the multi-channel signal to be encoded.
  • The method executed by the encoding end further includes:
  • obtaining the first window type of the first channel of the current frame, where the first window type is a short window type or a non-short window type;
  • obtaining the second window type of the second channel of the current frame, where the second window type is a short window type or a non-short window type;
  • when both the first window type and the second window type are the short window type, performing the step of obtaining the M first transient identifiers of the M blocks of the first channel according to the spectra of the M blocks of the first channel of the current frame of the multi-channel signal to be encoded.
  • the encoding end may first determine the window type of the current frame.
  • the window type may be a short window type or a non-short window type.
  • the encoding end determines the window type according to the current frame of the multi-channel signal to be encoded.
  • A short window may also be called a short frame, and a non-short window may also be called a non-short frame.
  • When the window type is the short window type, execution of the aforementioned step 401 is triggered.
  • That is, the aforementioned encoding scheme is applied when the window type of the current frame is the short window type, so that encoding is performed for the case in which the multi-channel signal is a transient signal.
  • When the encoding end performs the foregoing steps F1 to F3, the method performed by the encoding end further includes:
  • encoding the window type so that it can be carried in the code stream.
  • The encoding method used for the window type is not limited here.
  • Encoding the window type yields a window type encoding result, which is written into the code stream so that the code stream carries it.
  • The decoding end can obtain the window type encoding result from the code stream and decode it to obtain the first window type of the first channel and the second window type of the second channel of the current frame, and then determine, according to these two window types, whether to continue decoding the code stream to obtain the first decoded grouping information of the M blocks of the first channel.
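Since the window-type encoding method is left open, the sketch below assumes a hypothetical one-bit code per channel (`1` = short window, `0` = non-short window) and shows the decoder-side decision: grouping information is parsed only when both channels use the short window type. All names and the bit values are illustrative assumptions, not a format defined by the source.

```python
from typing import List, Tuple

# Hypothetical 1-bit codes for the window type of one channel.
SHORT, NON_SHORT = 1, 0


def encode_window_types(first_short: bool, second_short: bool) -> List[int]:
    """Encoder side: one bit per channel, first channel then second channel."""
    return [SHORT if first_short else NON_SHORT, SHORT if second_short else NON_SHORT]


def decode_window_types(bits: List[int]) -> Tuple[bool, bool]:
    """Decoder side: recover (first_is_short, second_is_short)."""
    return bits[0] == SHORT, bits[1] == SHORT


def should_parse_group_info(bits: List[int]) -> bool:
    """Continue with grouping information only if both channels use short windows."""
    first_short, second_short = decode_window_types(bits)
    return first_short and second_short


decision = should_parse_group_info(encode_window_types(True, True))
```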
  • In step 401, obtaining the M first transient identifiers of the M blocks of the first channel according to the spectra of the M blocks of the first channel of the current frame of the multi-channel signal to be encoded includes:
  • the M spectral energies of the M blocks may be averaged to obtain the average spectral energy; alternatively, the largest value (or several of the largest values) of the M spectral energies may be removed before averaging.
  • The spectral energy of each of the M blocks is compared with the average spectral energy to determine how much the spectrum of that block changes relative to the spectra of the other blocks, which yields the M transient identifiers of the M blocks; the transient identifier of a block represents the transient characteristic of that block.
  • the aforementioned M blocks of the current frame may be the M blocks of the first channel of the current frame.
  • the transient identifier of each block can be determined through the spectral energy and the average value of the spectral energy of each block, so that the transient identifier of a block can determine the grouping information of the block.
  • When the spectral energy of the first block is greater than K times the average spectral energy, the first transient identifier of the first block indicates that the first block is a transient block;
  • when the spectral energy of the first block is less than or equal to K times the average spectral energy, the first transient identifier of the first block indicates that the first block is a non-transient block;
  • K is a real number greater than or equal to 1.
  • K can take various values, which are not limited here.
  • If the spectral energy of the first block is greater than K times the average spectral energy, the spectrum of the first block changes considerably compared with the other blocks of the M blocks, so the transient identifier of the first block indicates that the first block is a transient block.
  • If the spectral energy of the first block is less than or equal to K times the average spectral energy, the spectrum of the first block changes little compared with the other blocks of the M blocks, so the transient identifier of the first block indicates that the first block is a non-transient block.
  • the aforementioned M blocks of the current frame may be the M blocks of the first channel of the current frame.
  • the encoder can also obtain M transient identifiers of M blocks in other ways, for example, obtain the difference or ratio between the spectral energy of the first block and the average value of the spectral energy, and according to the obtained difference or ratio value to determine M transient identifiers for M blocks.
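The energy-threshold rule above can be sketched as follows. This is an illustrative sketch under assumptions: spectral energy is taken as the sum of squared coefficients of a block, `k=2.0` is an arbitrary placeholder value for K (the text leaves K open), and `drop_max` models the optional removal of the largest energies before averaging. The function name `detect_transients` is hypothetical.

```python
from typing import List, Sequence


def detect_transients(spectra: Sequence[Sequence[float]],
                      k: float = 2.0,
                      drop_max: int = 0) -> List[bool]:
    """Hypothetical sketch: block i is transient if its energy exceeds k * mean energy.

    Assumptions: energy = sum of squared coefficients; k = 2.0 is a placeholder;
    drop_max largest energies are excluded from the average when drop_max > 0.
    """
    if k < 1:
        raise ValueError("K must be >= 1")
    energies = [sum(x * x for x in block) for block in spectra]
    kept = sorted(energies)[: len(energies) - drop_max] if drop_max > 0 else energies
    if not kept:
        raise ValueError("drop_max removes all blocks")
    mean = sum(kept) / len(kept)
    # Strictly greater than K times the mean => transient; otherwise non-transient.
    return [e > k * mean for e in energies]


flags = detect_transients([[1.0], [1.0], [1.0], [10.0]], k=2.0)
```

With energies 1, 1, 1 and 100, the mean is 25.75, so only the last block exceeds 2 × 25.75 and is flagged as transient. Excluding the maximum before averaging (`drop_max=1`) lowers the threshold, which makes a single strong onset easier to detect.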
  • In summary, the current frame of the multi-channel signal to be encoded includes a first channel and a second channel, each containing the spectra of M blocks. The M first transient identifiers of the M blocks of the first channel are obtained from the spectra of those M blocks, and the first grouping information of the M blocks of the first channel is obtained from the M first transient identifiers; the second grouping information of the M blocks of the second channel is obtained in the same way. When the first grouping information and the second grouping information meet the preset condition, the first adjusted grouping information and the second adjusted grouping information are obtained from them.
  • The first spectrum to be encoded and the second spectrum to be encoded are then obtained, and finally the encoding neural network encodes them to obtain a spectrum encoding result, which is carried in the code stream.
  • Because the grouping information of the M blocks of each channel is obtained from that channel's M transient identifiers, the adjusted grouping information of each channel is obtained when the preset condition is met, and the spectrum to be encoded is derived from the adjusted grouping information and the block spectra of each channel, blocks with different transient identifiers can be grouped, arranged and encoded, which improves the encoding quality of multi-channel signals.
  • An embodiment of the present application also provides a multi-channel signal decoding method, which can be executed by a terminal device, for example, the terminal device can be a multi-channel signal decoding device (hereinafter referred to as a decoding end or decoder, for example, the The decoding end can be an AI decoder).
  • the method performed on the decoding end in the embodiment of the present application mainly includes:
  • The decoding end receives the code stream sent by the encoding end, which carries the grouping information encoding result, and parses the code stream to obtain the first decoded grouping information of the M blocks of the first channel of the current frame of the multi-channel signal.
  • the decoding end may determine M first decoding transient identifiers of the M blocks according to the first decoding group information of the M blocks.
  • The first decoded grouping information may include the number of groups and group flag information.
  • The grouping information may include group flag information; for details, refer to the description of the foregoing encoding-end embodiments.
  • The first decoded grouping information is the grouping information obtained by the decoding end by decoding the code stream.
  • The first decoded grouping information obtained by the decoding end corresponds to the aforementioned first adjusted grouping information of the encoding end.
  • the first decoding group information is used to indicate the first decoding transient identifiers of the M blocks of the first channel, and the first decoding transient identifiers correspond to the first transient identifiers or the first adjusted transient identifiers of the encoding end.
  • the second decoded group information obtained in the subsequent steps corresponds to the aforementioned second adjusted group information
  • the second decoded transient identifier corresponds to the second transient identifier or the second adjusted transient identifier of the encoder.
  • The decoding end uses the decoding neural network to decode the code stream and obtain the decoded spectra of the M blocks of the first channel and the decoded spectra of the M blocks of the second channel.
  • At the encoding end, the spectra of the M blocks of the first channel and of the second channel were grouped and arranged before being encoded, and the spectrum encoding result was carried in the code stream. The decoded spectra of the M blocks of the first and second channels therefore correspond to the group-arranged spectra of the M blocks of those channels at the encoding end; the decoding neural network performs the inverse of the process performed by the encoding neural network.
  • the first decoded spectrum of the M blocks of the first channel corresponds to the spectrum of the M blocks of the first channel arranged in groups at the encoding end, so the first reconstruction of the first channel can be obtained through the first decoded group information Signal.
  • decoding and reconstruction can be performed according to blocks with different transient identifiers in the multi-channel signal, so the reconstruction effect of the multi-channel signal can be improved.
  • the second decoded spectrum of the M blocks of the second channel corresponds to the spectrum of the M blocks of the second channel arranged in groups at the encoding end, so the second reconstruction of the second channel can be obtained through the second decoded group information Signal.
  • decoding and reconstruction can be performed according to blocks with different transient identifiers in the multi-channel signal, so the reconstruction effect of the multi-channel signal can be improved.
  • Obtaining the first reconstructed signal of the first channel according to the first decoded grouping information and the decoded spectra of the M blocks of the first channel includes:
  • performing inverse grouping arrangement on the decoded spectra of the M blocks of the first channel according to the first decoded grouping information to obtain the inverse-grouping-arranged spectra of the M blocks of the first channel, and obtaining the first reconstructed signal of the first channel from these spectra.
  • Obtaining the second reconstructed signal of the second channel according to the second decoded grouping information and the decoded spectra of the M blocks of the second channel includes:
  • performing inverse grouping arrangement on the decoded spectra of the M blocks of the second channel according to the second decoded grouping information to obtain the inverse-grouping-arranged spectra of the M blocks of the second channel, and obtaining the second reconstructed signal of the second channel from these spectra.
  • The decoding end obtains the first decoded grouping information of the M blocks and, from the code stream, the decoded spectra of the M blocks of the first channel.
  • Because the spectra of the M blocks of the first channel were grouped and arranged at the encoding end, the decoding end must perform the reverse process: according to the first decoded grouping information of the M blocks, it performs inverse grouping arrangement on the decoded spectra of the M blocks of the first channel to obtain their inverse-grouping-arranged spectra. Inverse grouping arrangement is the inverse of the grouping arrangement performed at the encoding end.
  • After obtaining the inverse-grouping-arranged spectra of the M blocks of the first channel, the decoding end can transform them from the frequency domain to the time domain to obtain the first reconstructed signal of the first channel.
  • the implementation manner of the decoding process of the second audio channel is similar to the aforementioned decoding process of the first audio channel, and will not be repeated here.
  • step 504 obtains the first reconstructed signal of the first channel according to the first decoded group information and the decoded spectrum of the M blocks of the first channel, including:
  • De-interleaving within the group is performed on the decoded spectrum of the M blocks of the first channel to obtain the de-interleaved spectrum within the group of the M blocks of the first channel;
  • the intra-group de-interleaving performed by the decoding end is the inverse process of the intra-group interleaving at the encoding end, which will not be described in detail here.
  • Step 505 obtains the second reconstructed signal of the second channel according to the second decoded group information and the decoded spectrum of the M blocks of the second channel, including:
  • the second reconstructed signal is obtained according to the deinterleaved frequency spectrum of the M blocks of the second channel after intra-group deinterleaving.
  • The number of transient blocks indicated by the M first decoded transient identifiers is P, and the remaining Q = M - P blocks are non-transient blocks.
  • The first reconstructed signal of the first channel is obtained according to the inverse-grouping-arranged spectra of the M blocks of the first channel.
  • Deinterleaving the spectra of the P blocks means treating the spectra of the P blocks as a whole for deinterleaving; similarly, the spectra of the Q blocks are deinterleaved as a whole.
  • the encoding end can perform interleaving processing according to the transient group and the non-transient group respectively, so as to obtain the interleaved frequency spectrum of P blocks and the interleaved frequency spectrum of Q blocks.
  • the interleaved frequency spectrum of the P blocks and the interleaved frequency spectrum of the Q blocks can be used as input data of the encoding neural network.
  • In this way, coding side information can also be reduced and coding efficiency improved. Because the encoding end performs intra-group interleaving, the decoding end must perform the corresponding inverse process, that is, deinterleaving.
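A minimal deinterleaving sketch, assuming the encoder interleaved each group coefficient by coefficient across its blocks (coefficient `k` of block `b` stored at position `k * num_blocks + b`). That interleaving pattern is an assumption, not stated in the source, and `deinterleave_group` is a hypothetical name. The decoder applies it separately to the interleaved spectrum of the P transient blocks and of the Q non-transient blocks.

```python
from typing import List, Sequence


def deinterleave_group(data: Sequence[float], num_blocks: int) -> List[List[float]]:
    """Hypothetical: undo coefficient-by-coefficient interleaving of num_blocks blocks."""
    if num_blocks == 0:
        if data:
            raise ValueError("non-empty data cannot belong to an empty group")
        return []
    if len(data) % num_blocks:
        raise ValueError("length must be a multiple of the number of blocks")
    # Block b collects every num_blocks-th coefficient starting at offset b.
    return [list(data[b::num_blocks]) for b in range(num_blocks)]


p_blocks = deinterleave_group([1, 3, 2, 4], 2)
```

For the interleaved sequence `[1, 3, 2, 4]` of two blocks, the original blocks `[1, 2]` and `[3, 4]` are recovered.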
  • The number of transient blocks indicated by the M first decoded transient identifiers is P, and the number of non-transient blocks is Q = M - P.
  • Performing inverse grouping arrangement on the decoded spectra of the M blocks of the first channel includes:
  • before grouping arrangement, the indexes of the M blocks are consecutive, for example from 0 to M-1; after the encoding end performs grouping arrangement, the indexes of the M blocks are no longer consecutive.
  • According to the first decoded grouping information of the M blocks, the decoding end can obtain the indexes of the P blocks and of the Q blocks among the reconstructed group-arranged M blocks, and inverse grouping arrangement restores the M blocks to consecutive indexes.
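  • The method performed by the decoding end further includes:
  • when the first window type and the second window type of the current frame are both the short window type, performing the step of obtaining the first decoded grouping information of the M blocks of the first channel of the current frame from the code stream.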
  • the foregoing encoding scheme can be implemented only when the first window type and the second window type of the current frame are both short window types, so as to implement encoding when the multi-channel signal is a transient signal.
  • the decoding end performs the reverse process of the encoding end, so the decoding end can also first determine the first window type and the second window type of the current frame.
  • the window type can be a short window type or a non-short window type.
  • The window type of the current frame is obtained from the code stream; since the current frame includes the first channel and the second channel, the first window type of the first channel and the second window type of the second channel are obtained.
  • A short window may also be called a short frame, and a non-short window may also be called a non-short frame.
  • The first decoded grouping information includes the number of first decoded groups, or a first decoded group-number identifier, of the M blocks of the first channel, where the identifier indicates the number of first decoded groups; when the number of first decoded groups is greater than 1, the first decoded grouping information further includes the M first decoded transient identifiers. Alternatively, the first decoded grouping information includes the M first decoded transient identifiers.
  • The second decoded grouping information includes the number of second decoded groups, or a second decoded group-number identifier, of the M blocks of the second channel, where the identifier indicates the number of second decoded groups; when the number of second decoded groups is greater than 1, the second decoded grouping information further includes the M second decoded transient identifiers. Alternatively, the second decoded grouping information includes the M second decoded transient identifiers.
  • The encoding end carries the grouping information encoding result, which covers the first adjusted grouping information and the second adjusted grouping information, in the code stream. The decoding end decodes the code stream to obtain the first decoded grouping information, which corresponds to the first adjusted grouping information of the encoding end, and the second decoded grouping information, which corresponds to the second adjusted grouping information of the encoding end.
  • The first decoded grouping information includes the number of first decoded groups, or a first decoded group-number identifier, of the M blocks of the first channel; the number of first decoded groups indicates the number of groups (or the number of adjusted groups) of the first channel, and the first decoded group-number identifier is used to indicate that number.
  • the M first decoded transient identifiers are used to indicate the transient identifiers or adjusted transient identifiers respectively corresponding to the M blocks of the first sound channel.
  • the description of the second decoded group information is similar to that of the first decoded group information, and will not be repeated here.
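The source leaves the bit-level syntax of the grouping information open. The sketch below assumes a hypothetical layout that mirrors the description above: one bit identifying the number of groups (`0` = one group, `1` = two groups, i.e. both transient and non-transient blocks are present), followed by the M transient identifiers only when there is more than one group. Function names and bit values are illustrative assumptions.

```python
from typing import List, Tuple


def write_group_info(transient: List[bool]) -> List[int]:
    """Hypothetical layout: group-number bit, then M transient bits if two groups."""
    two_groups = any(transient) and not all(transient)
    bits = [1 if two_groups else 0]
    if two_groups:
        bits += [1 if t else 0 for t in transient]
    return bits


def read_group_info(bits: List[int], m: int) -> Tuple[int, List[bool]]:
    """Return (number of groups, M decoded transient flags)."""
    if bits[0] == 0:
        # Single group: the block order is unchanged by grouping arrangement,
        # so no per-block flags are sent (placeholder: all non-transient).
        return 1, [False] * m
    return 2, [b == 1 for b in bits[1:1 + m]]


bits = write_group_info([False, True, False, True])
```

For the flags `[False, True, False, True]`, the encoder writes `[1, 0, 1, 0, 1]`, and the decoder recovers two groups plus the same four flags. Sending the per-block flags only when there is more than one group is one way to keep side information small.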
  • In summary, the first decoded grouping information of the M blocks of the first channel of the current frame of the multi-channel signal is obtained from the code stream and indicates the first decoded transient identifiers of the M blocks of the first channel; the second decoded grouping information of the M blocks of the second channel is obtained from the code stream in the same way. The code stream is decoded by the decoding neural network to obtain the decoded spectra of the M blocks of the first channel and of the second channel. The first reconstructed signal of the first channel is obtained from the first decoded grouping information and the decoded spectra of the M blocks of the first channel; similarly, the second reconstructed signal of the second channel is obtained from the second decoded grouping information and the decoded spectra of the M blocks of the second channel.
  • The decoded spectra of the M blocks of the first and second channels obtained from the code stream correspond to the group-arranged spectra of the M blocks of the first and second channels at the encoding end, so the first reconstructed signal of the first channel and the second reconstructed signal of the second channel can be obtained from them.
  • decoding and reconstruction can be performed according to blocks with different transient identifiers in the multi-channel signal, so the reconstruction effect of the multi-channel signal can be improved.
  • FIG. 6 is a schematic diagram of a system architecture applied in the radio and television field according to an embodiment of this application; the architecture uses a 3D sound codec.
  • for live programs, the 3D sound signal produced by live 3D sound production is encoded with the 3D sound encoding of the embodiments of this application to obtain a code stream, which is transmitted to the user side over the radio and television network; the 3D sound decoder in the set-top box decodes the code stream to reconstruct the 3D sound signal, which is played back by a loudspeaker group.
  • for post-produced programs, the 3D sound signal produced by post-production 3D sound is encoded with the 3D sound encoding of the embodiments of this application to obtain a code stream, which is transmitted to the user side over the broadcast network or the Internet; the 3D sound decoder in a network receiver or mobile terminal decodes and reconstructs the 3D sound signal, which is played back by a loudspeaker group or headphones.
  • the embodiments of this application provide an audio codec, which may be deployed in a radio access network, a media gateway of a core network, a transcoding device, a media resource server, a mobile terminal, a fixed-network terminal, and the like. It can also be applied to audio codecs in broadcast TV, terminal media playback, and VR streaming services.
  • the encoder proposed by the embodiment of the present application is used to perform the following multi-channel signal encoding method, including:
  • a specific implementation includes the following three steps:
  • the audio signal of the current frame is a time-domain signal of L points.
  • Transient detection is performed according to the audio signal of the current frame to determine the transient information of the current frame.
  • the transient information of the current frame may include one or more of an identifier of whether the current frame is a transient signal, a location where the transient occurs in the current frame, and a parameter characterizing the degree of the transient.
  • the transient degree may be the level of the transient energy, or the ratio of the signal energy at the position where the transient occurs to the signal energy at the adjacent non-transient position.
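  • if the transient information indicates that the current frame is a transient signal, the window type of the current frame is a short window.
  • otherwise, the window type of the current frame is one of the other window types, that is, any window type except the short window.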
  • the embodiment of the present application does not limit other window types, for example, other window types may include: long windows, cut-in windows, cut-out windows, and the like.
  • if the window type of the current frame is a short window, the audio signal of the current frame is windowed with short windows and time-frequency transformed to obtain the MDCT spectra of M blocks.
  • specifically, when the window type of the current frame is a short window, M overlapping short-window functions are applied to obtain the windowed audio signals of M blocks, where M is a positive integer greater than or equal to 2.
  • the window length of the short-window function is 2L/M, where L is the frame length of the current frame, and the splicing (overlap) length is L/M.
  • for example, with M = 8 and L = 1024, the short-window length is 2L/M = 256 samples and the splicing length is L/M = 128 samples.
  • the audio signals of the M blocks after windowing are respectively subjected to time-frequency transformation to obtain the MDCT spectrum of the M blocks of the current frame.
  • for example, the windowed audio signal of the current block is 256 samples long, and its MDCT yields 128 coefficients, which constitute the MDCT spectrum of the current block.
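The short-window analysis above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the method of the embodiment: the sine window, the direct-form MDCT, and the placement of the M windows at multiples of L/M inside a buffer of L + L/M samples are choices made here for illustration; the text fixes only the window length 2L/M and the splicing length L/M.

```python
import numpy as np


def mdct(x: np.ndarray) -> np.ndarray:
    """Direct-form MDCT: 2N windowed samples -> N coefficients."""
    two_n = len(x)
    n_coef = two_n // 2
    n = np.arange(two_n)
    k = np.arange(n_coef)
    # X[k] = sum_n x[n] * cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))
    basis = np.cos(np.pi / n_coef * np.outer(k + 0.5, n + 0.5 + n_coef / 2))
    return basis @ x


def short_window_mdct(signal: np.ndarray, frame_len: int = 1024, m: int = 8) -> np.ndarray:
    """Apply M overlapping short windows and return an (M, L/M) MDCT spectrum."""
    hop = frame_len // m          # splicing (overlap) length L/M
    win_len = 2 * hop             # short-window length 2L/M
    needed = (m - 1) * hop + win_len
    if len(signal) < needed:
        raise ValueError(f"need at least {needed} samples, got {len(signal)}")
    # Assumed sine window (satisfies the Princen-Bradley condition).
    window = np.sin(np.pi * (np.arange(win_len) + 0.5) / win_len)
    blocks = [mdct(window * signal[b * hop : b * hop + win_len]) for b in range(m)]
    return np.stack(blocks)


rng = np.random.default_rng(0)
spec = short_window_mdct(rng.standard_normal(1152))
print(spec.shape)  # (8, 128)
```

Each row of `spec` is one block of 128 MDCT coefficients, matching the M = 8, L = 1024 example.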
  • in one implementation of obtaining the number of groups and the grouping flag information of the current frame in step S13: first, the MDCT spectra of the M blocks are interleaved to obtain the interleaved MDCT spectrum of the M blocks; next, encoding preprocessing is performed on the interleaved MDCT spectrum to obtain the preprocessed MDCT spectrum; then the preprocessed MDCT spectrum is deinterleaved to obtain the deinterleaved MDCT spectra of the M blocks; finally, the number of groups and the grouping flag information of the current frame are determined from the deinterleaved MDCT spectra of the M blocks.
  • interleaving the MDCT spectra of the M blocks means interleaving M MDCT spectra of length L/M into one MDCT spectrum of length L.
  • for each frequency index i, from i = 0 to L/M-1, the i-th spectral coefficients of the M blocks are placed consecutively in order of block number, from 0 to M-1.
  • the encoding preprocessing may include frequency-domain noise shaping (FDNS), temporal noise shaping (TNS), bandwidth extension (BWE), and other processing, which is not limited here.
  • the deinterleaving process is the inverse process of the interleaving process.
  • the preprocessed MDCT spectrum has length L. It is divided into M MDCT spectra of length L/M, with the coefficients in each block arranged in ascending order of frequency, which yields the deinterleaved MDCT spectra of the M blocks.
  • preprocessing the interleaved spectrum reduces the coding side information, which reduces the bits occupied by side information and improves coding efficiency.
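The interleaving and deinterleaving described above amount to a transpose between a block-major (M, L/M) layout and a bin-major length-L layout. A minimal NumPy sketch, assuming the block spectra are held in an (M, L/M) array; the function names are hypothetical:

```python
import numpy as np


def interleave(blocks: np.ndarray) -> np.ndarray:
    """(M, L/M) block spectra -> length-L spectrum.

    Output order: block 0 bin 0, block 1 bin 0, ..., block M-1 bin 0, block 0 bin 1, ...
    """
    return blocks.T.reshape(-1)


def deinterleave(spectrum: np.ndarray, m: int) -> np.ndarray:
    """Inverse of interleave: length-L spectrum -> (M, L/M) block spectra."""
    return spectrum.reshape(-1, m).T


blocks = np.arange(8 * 128).reshape(8, 128)   # blocks[b, i] = 128 * b + i
flat = interleave(blocks)
print(flat[:10].tolist())  # [0, 128, 256, 384, 512, 640, 768, 896, 1, 129]
```

Because the two functions are exact inverses, `deinterleave(interleave(x), M)` returns the original block spectra.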
  • the specific method includes the following three steps:
  • first, the MDCT spectral energy of each block is calculated and denoted enerMdct[8], where 8 is the value of M; the energy of each block is computed over its 128 MDCT coefficients (128 being the number of MDCT coefficients in one block).
  • Method 1: directly calculate the average MDCT spectral energy of the M blocks, that is, the mean of enerMdct[8], and use it as the average MDCT spectral energy avgEner.
  • Method 2: determine the block with the largest MDCT spectral energy among the M blocks, calculate the average MDCT spectral energy of the other M-1 blocks, and use it as avgEner. Alternatively, exclude several of the highest-energy blocks and use the average MDCT spectral energy of the remaining blocks as avgEner.
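The computation of enerMdct and avgEner can be sketched as follows. The text does not spell out the energy formula, so the per-block sum of squared MDCT coefficients is an assumption here, as are the function names and the `num_excluded` parameter used for the "several highest-energy blocks" variant of Method 2.

```python
import numpy as np


def block_energies(blocks: np.ndarray) -> np.ndarray:
    """enerMdct[j]: assumed to be the sum of squared MDCT coefficients of block j."""
    return np.sum(blocks.astype(float) ** 2, axis=1)


def average_energy(ener_mdct: np.ndarray, method: int = 1, num_excluded: int = 1) -> float:
    """avgEner by Method 1 (plain mean) or Method 2 (mean without the largest block(s))."""
    if method == 1:
        return float(np.mean(ener_mdct))
    # Method 2: drop the num_excluded highest-energy blocks before averaging.
    kept = np.sort(ener_mdct)[: len(ener_mdct) - num_excluded]
    return float(np.mean(kept))


ener = np.array([1.0, 1.0, 1.0, 9.0, 1.0, 1.0, 1.0, 1.0])
print(average_energy(ener, method=1))  # 2.0
print(average_energy(ener, method=2))  # 1.0
```

Method 2 keeps a single strong transient block from inflating avgEner, which would otherwise make it harder for that block to exceed K times the average.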
  • the number of groups and the grouping flag information of the current frame are determined from the MDCT spectral energies of the M blocks and the average MDCT spectral energy, and are written into the code stream.
  • specifically, the MDCT spectral energy of each block may be compared with the average MDCT spectral energy. If the MDCT spectral energy of the current block is greater than K times the average, the current block is a transient block and its transient flag is 0; otherwise, the current block is a non-transient block and its transient flag is 1.
  • the M blocks are then grouped to determine the number of groups and the grouping flag information: blocks with the same transient flag value form one group, so the M blocks are divided into N groups, where N is the number of groups.
  • the grouping flag information consists of the transient flag values of the M blocks.
  • transient blocks form the transient group, and non-transient blocks form the non-transient group.
  • if the transient flags of the blocks are not all the same, the number of groups numGroups of the current frame is 2; otherwise, it is 1.
  • the number of groups can be signaled by a group-number flag. For example, a flag value of 1 means the current frame has 2 groups, and a flag value of 0 means it has 1 group.
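The transient decision and the group count can then be sketched as below. K is left as a parameter because the text does not give its value; the function name and the returned tuple layout are hypothetical.

```python
import numpy as np


def group_info(ener_mdct: np.ndarray, avg_ener: float, k: float):
    """Return (groupIndicator, numGroups, groupNumberFlag).

    groupIndicator[j] is 0 if block j is transient (energy > K * avgEner), else 1.
    """
    group_indicator = [0 if e > k * avg_ener else 1 for e in ener_mdct]
    num_groups = 2 if len(set(group_indicator)) > 1 else 1
    group_number_flag = 1 if num_groups == 2 else 0
    return group_indicator, num_groups, group_number_flag


ener = np.array([1.0, 1.0, 1.0, 9.0, 8.0, 7.0, 6.0, 1.0])
print(group_info(ener, avg_ener=1.0, k=4.0))  # ([1, 1, 1, 0, 0, 0, 0, 1], 2, 1)
```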
  • another implementation of obtaining the number of groups and the grouping flag information in step S13 is: without interleaving or deinterleaving the MDCT spectra of the M blocks, the number of groups and the grouping flag information of the current frame are determined directly from the MDCT spectra of the M blocks, encoded, and written into the code stream.
  • Determining the number of groups and group flag information of the current frame according to the MDCT spectrum of M blocks is similar to determining the number of groups and group flag information of the current frame according to the MDCT spectrum of M blocks after deinterleaving, and will not be repeated here.
  • the non-transient group may be further divided into two or more groups, which is not limited in the embodiments of this application.
  • for example, the non-transient group can be divided into a harmonic group and a non-harmonic group.
  • the group-arranged MDCT spectrum is the spectrum to be encoded for the current frame.
  • the encoding neural network of the encoder encodes the front part of the spectrum better than the rest, so moving the transient blocks to the front preserves their encoding quality, retains more of their spectral detail, and improves overall encoding quality.
  • the MDCT spectrum arranged in groups is first interleaved within the group to obtain the MDCT spectrum interleaved within the group. Then, the encoding neural network is used to encode the interleaved MDCT spectrum within the group.
  • the intra-group interleaving is similar to the interleaving of the MDCT spectra of the M blocks performed before obtaining the number of groups and the group flag information, except that only MDCT spectra belonging to the same group are interleaved together. For example, the MDCT spectrum blocks of the transient group are interleaved with one another, and the MDCT spectrum blocks of the non-transient group are interleaved with one another.
  • the encoding neural network is pre-trained; the embodiments of this application do not limit its specific network structure or training method.
  • for example, the encoding neural network may be a fully connected network or a convolutional neural network (CNN).
  • the decoding process corresponding to the encoding end includes:
  • if the window type of the current frame is a short window, the received code stream is decoded to obtain the number of groups and the group flag information.
  • the group-number flag in the code stream may be parsed and the number of groups of the current frame determined from it. For example, a flag value of 1 means the current frame has 2 groups, and a flag value of 0 means it has 1 group.
  • decoding the group flag information from the received code stream may consist of reading M bits of group flag information. Whether the i-th block is a transient block is determined by the i-th bit: a value of 0 means the i-th block is a transient block, and a value of 1 means it is a non-transient block.
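A hypothetical bit layout makes the parsing concrete. Here the one-bit group-number flag is assumed to be followed by the M group-flag bits; the text does not state the field order, nor whether the M bits are sent when there is only one group, so this sketch is illustrative only.

```python
from typing import List, Tuple


def write_group_info(group_indicator: List[int]) -> List[int]:
    """Serialize as [groupNumberFlag, M group-flag bits] (assumed layout)."""
    num_groups = 2 if len(set(group_indicator)) > 1 else 1
    return [1 if num_groups == 2 else 0] + list(group_indicator)


def read_group_info(bits: List[int], m: int = 8) -> Tuple[int, List[int], List[int]]:
    """Parse the assumed layout; return (numGroups, groupIndicator, transient block indices)."""
    num_groups = 2 if bits[0] == 1 else 1
    group_indicator = bits[1 : 1 + m]
    transient = [i for i, bit in enumerate(group_indicator) if bit == 0]
    return num_groups, group_indicator, transient


bits = write_group_info([1, 1, 1, 0, 0, 0, 0, 1])
print(read_group_info(bits))  # (2, [1, 1, 1, 0, 0, 0, 0, 1], [3, 4, 5, 6])
```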
  • the decoding process at the decoding end corresponds to the encoding process at the encoding end. Specific steps include:
  • the decoded MDCT spectrum is obtained by using the decoding neural network.
  • according to the number of groups and the group flag information, the decoded MDCT spectra belonging to the same group can be determined.
  • intra-group deinterleaving is performed on the MDCT spectra belonging to the same group to obtain the intra-group deinterleaved MDCT spectra.
  • the intra-group deinterleaving is the same as the deinterleaving that the encoder applies to the interleaved MDCT spectra of the M blocks before obtaining the number of groups and the group flag information.
  • the inverse group arrangement at the decoding end is the inverse of the group arrangement at the encoding end.
  • the MDCT spectrum processed by intra-group deinterleaving is composed of M MDCT spectrum blocks of L/M points.
  • first, the transient blocks are processed: the MDCT spectrum of the i-th block is moved to block idx0(i), where idx0(i) is the block index of the i-th bit with flag value 0 in the group flag information, with i counted from 0.
  • the number of transient blocks, denoted num0, is the number of bits with flag value 0 in the group flag information.
  • next, the non-transient blocks are processed.
  • the MDCT spectrum of block num0 + j is moved to block idx1(j), where idx1(j) is the block index of the j-th bit with flag value 1 in the group flag information, with j counted from 0.
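The inverse group arrangement using idx0 and idx1 can be sketched with NumPy index arrays. This assumes, as in the worked example for groupIndicator 1 1 1 0 0 0 0 1 in this section, that the encoder placed the transient blocks first and the non-transient blocks after them, each group in ascending block order; the function name is hypothetical.

```python
import numpy as np


def inverse_group_arrangement(arranged: np.ndarray, group_indicator) -> np.ndarray:
    """Undo the encoder's group arrangement of (M, L/M) block spectra.

    Block i of the first num0 blocks goes to idx0(i); block num0 + j goes to idx1(j).
    """
    flags = np.asarray(group_indicator)
    idx0 = np.flatnonzero(flags == 0)   # positions of flag value 0 (transient blocks)
    idx1 = np.flatnonzero(flags == 1)   # positions of flag value 1 (non-transient blocks)
    num0 = len(idx0)
    restored = np.empty_like(arranged)
    restored[idx0] = arranged[:num0]
    restored[idx1] = arranged[num0:]
    return restored


group_indicator = [1, 1, 1, 0, 0, 0, 0, 1]
# Row r of `arranged` holds original block [3, 4, 5, 6, 0, 1, 2, 7][r]; the label is the data.
arranged = np.array([3, 4, 5, 6, 0, 1, 2, 7])[:, None] * np.ones((1, 4))
restored = inverse_group_arrangement(arranged, group_indicator)
print(restored[:, 0].tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```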
  • one specific implementation of obtaining the reconstructed audio signal is: first, the MDCT spectra of the M blocks after inverse group arrangement are interleaved to obtain the interleaved MDCT spectrum of the M blocks; next, post-decoding processing is performed on this interleaved MDCT spectrum.
  • post-decoding processing may include inverse TNS, inverse FDNS, BWE processing, and so on; each post-decoding step corresponds one-to-one to an encoding preprocessing step at the encoding end.
  • this yields the post-processed MDCT spectrum, which is then deinterleaved to obtain the deinterleaved MDCT spectra of the M blocks; finally, each of these spectra is transformed from the frequency domain to the time domain and, after synthesis windowing and overlap-add, the reconstructed audio signal is obtained.
  • another specific implementation of obtaining the reconstructed audio signal is: the MDCT spectra of the M blocks are each transformed from the frequency domain to the time domain and, after synthesis windowing and overlap-add, the reconstructed audio signal is obtained.
  • the encoding method of the multi-channel signal performed by the encoding end includes:
  • for example, the frame length is 1024, so the input signal of the current frame is a 1024-point audio signal.
  • the input signal of the current frame is divided into L blocks, and the signal energy in each block is calculated. If the signal energy changes abruptly between adjacent blocks, the current frame is regarded as a transient signal.
  • if the current frame is a transient signal, its window type is a short window; otherwise, it is a long window.
  • the window types may additionally include a cut-in window and a cut-out window.
  • let the frame number of the current frame be i; the window type of the current frame is determined from the transient detection results of frames i-1 and i-2 together with that of the current frame.
  • depending on these detection results, the window type of frame i is a long window, a cut-in window, a cut-out window, or a short window.
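A simple energy-ratio detector illustrates the block-energy transient test and the short/long window decision. The number of sub-blocks and the ratio threshold are hypothetical values, the detector reacts only to energy increases (onsets), and the cut-in and cut-out windows that depend on frames i-1 and i-2 are not modeled.

```python
import numpy as np


def is_transient(frame: np.ndarray, num_blocks: int = 16, ratio_threshold: float = 10.0) -> bool:
    """Flag a frame as transient if block energy jumps by more than ratio_threshold.

    num_blocks and ratio_threshold are illustrative values, not taken from the text.
    """
    usable = len(frame) - len(frame) % num_blocks
    blocks = frame[:usable].reshape(num_blocks, -1).astype(float)
    energy = np.sum(blocks ** 2, axis=1)
    eps = 1e-12  # avoids division by zero on silent blocks
    ratios = energy[1:] / (energy[:-1] + eps)
    return bool(np.any(ratios > ratio_threshold))


def window_type(frame: np.ndarray) -> str:
    """Short window for transient frames, long window otherwise (two-way decision only)."""
    return "short" if is_transient(frame) else "long"


steady = np.ones(1024)
onset = np.concatenate([np.zeros(600), np.ones(424)])
print(window_type(steady), window_type(onset))  # long short
```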
  • windowing and MDCT are then performed according to the window type. For the long, cut-in, and cut-out windows, the windowed signal has length 2048, for example, which yields 1024 MDCT coefficients. For the short window, 8 overlapping short windows of length 256 are spliced together, each yielding 128 MDCT coefficients; the 128 MDCT coefficients of each short window are called a block, giving 1024 MDCT coefficients in total.
  • if the window type of the current frame is a short window, the MDCT spectrum of the current frame is interleaved to obtain an interleaved MDCT spectrum.
  • that is, the MDCT spectra of the 8 blocks, each 128-dimensional, are interleaved into one MDCT spectrum of length 1024.
  • the interleaved spectrum has the form: block 0 bin 0, block 1 bin 0, block 2 bin 0, ..., block 7 bin 0, block 0 bin 1, block 1 bin 1, block 2 bin 1, ..., block 7 bin 1, ...
  • block 0 bin 0 represents the 0th frequency point of the 0th block.
  • Preprocessing may include FDNS, TNS, BWE and other processing.
  • deinterleaving, the inverse of the interleaving in step S35, is then performed to obtain the MDCT spectra of 8 blocks, each with 128 points.
  • grouping information is then determined; it may include the number of groups numGroups and the group flag information groupIndicator.
  • the grouping information may be determined by any of the schemes described above for step S13 at the encoding end. For example, let the MDCT spectral coefficients of the 8 blocks of a short frame be mdctSpectrum[8][128]; the MDCT spectral energy of each block is calculated and denoted enerMdct[8], and the average MDCT spectral energy of the 8 blocks, denoted avgEner, is then calculated in one of two ways:
  • Method 1: directly calculate the average MDCT spectral energy of the 8 blocks, that is, the mean of enerMdct[8].
  • Method 2: to reduce the influence of the highest-energy block on the average, remove the energy of that block before calculating the average.
  • if the MDCT spectral energy of the current block is greater than K times avgEner, the current block is regarded as a transient block (flag 0); otherwise, it is regarded as a non-transient block (flag 1). All transient blocks form the transient group, and all non-transient blocks form the non-transient group.
  • for example, the grouping information obtained from the preliminary decision may be:
    Block index                0 1 2 3 4 5 6 7
    Group flag groupIndicator  1 1 1 0 0 0 0 1
  • the number of groups and group flag information need to be written into the code stream and transmitted to the decoding end.
  • the specific scheme of grouping and arranging the MDCT spectrums of the M blocks according to the grouping information may be any one of the aforementioned steps S14 performed by the coding end.
  • for example, in step S38, if the grouping information is:
    Block index                0 1 2 3 4 5 6 7
    Group flag groupIndicator  1 1 1 0 0 0 0 1
  • then the block order after group arrangement is: 3 4 5 6 0 1 2 7.
  • that is, the spectra of blocks 0, 1, 2, 3, 4, 5, 6, and 7 after the arrangement are the spectra of blocks 3, 4, 5, 6, 0, 1, 2, and 7 before the arrangement, respectively.
  • S310: perform intra-group spectrum interleaving on the group-arranged MDCT spectrum to obtain the intra-group interleaved MDCT spectrum.
  • intra-group interleaving is performed for each group in a manner similar to step S35, except that interleaving is restricted to MDCT spectra belonging to the same group.
  • in the example above, the transient group (blocks 3, 4, 5, and 6 before the arrangement, that is, blocks 0, 1, 2, and 3 after it) is interleaved, and the other group (blocks 0, 1, 2, and 7 before the arrangement, that is, blocks 4, 5, 6, and 7 after it) is interleaved separately.
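Group arrangement followed by intra-group interleaving can be sketched as below, using the example grouping 1 1 1 0 0 0 0 1. This assumes exactly two groups (transient and non-transient) and uses toy 3-bin blocks so the output is easy to read; the function names are hypothetical.

```python
import numpy as np


def group_arrange(blocks: np.ndarray, group_indicator):
    """Move transient blocks (flag 0) to the front, keeping time order within each group.

    Returns the arranged (M, L/M) spectra and the original block order.
    """
    flags = np.asarray(group_indicator)
    order = np.concatenate([np.flatnonzero(flags == 0), np.flatnonzero(flags == 1)])
    return blocks[order], order


def intra_group_interleave(arranged: np.ndarray, num_transient: int) -> np.ndarray:
    """Interleave each group separately (bin-major, block-minor) and concatenate."""
    groups = [arranged[:num_transient], arranged[num_transient:]]
    return np.concatenate([g.T.reshape(-1) for g in groups if len(g) > 0])


group_indicator = [1, 1, 1, 0, 0, 0, 0, 1]
blocks = np.arange(8)[:, None] * 100 + np.arange(3)   # blocks[b, i] = 100 * b + i
arranged, order = group_arrange(blocks, group_indicator)
print(order.tolist())                                  # [3, 4, 5, 6, 0, 1, 2, 7]
out = intra_group_interleave(arranged, num_transient=4)
print(out[:8].tolist())  # [300, 400, 500, 600, 301, 401, 501, 601]
```

The transient group's coefficients occupy the front of `out`, which is the position the text says the encoding neural network handles best.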
  • the embodiment of the present application does not limit the specific method of encoding the MDCT spectrum after intra-group interleaving by using the encoding neural network.
  • for example, the intra-group interleaved MDCT spectrum is processed by the encoding neural network to generate latent variables; the latent variables are quantized, the quantized latent variables are arithmetic-coded, and the arithmetic coding result is written into the code stream.
  • if the window type of the current frame is not a short window, the MDCT spectrum of the current frame obtained in step S34 is encoded directly with the encoding neural network.
  • that is, the window function corresponding to the window type is determined and applied to the audio signal of the current frame to obtain a windowed signal; with the windows of adjacent frames overlapping, a forward time-frequency transform such as the MDCT is applied to the windowed signal to obtain the MDCT spectrum of the current frame, which is then encoded.
  • the decoding method of the multi-channel signal performed by the decoder includes:
  • the decoding neural network corresponds to the encoding neural network.
  • a specific method of decoding with the decoding neural network is: arithmetic decoding is performed on the received code stream to obtain quantized latent variables; the quantized latent variables are dequantized; and the dequantized latent variables are input to the decoding neural network to generate the decoded MDCT spectrum.
  • the MDCT spectrum blocks belonging to the same group are determined according to the number of groups and group flag information.
  • the decoded MDCT spectrum is divided into 8 blocks.
  • for example, suppose the number of groups is 2 and the group flag information groupIndicator is 1 1 1 0 0 0 0 1.
  • the group flag information contains 4 bits with flag value 0, so the MDCT spectra of the first 4 blocks of the decoded MDCT spectrum form one group, the transient group, and are deinterleaved within the group; it also contains 4 bits with flag value 1, so the MDCT spectra of the last 4 blocks form another group, the non-transient group, and are likewise deinterleaved within the group.
  • this yields the intra-group deinterleaved MDCT spectra of the 8 blocks.
  • the intra-group deinterleaved MDCT spectra are then rearranged into the spectra of M blocks in time order. Before this inverse arrangement, the short-frame spectrum after grouping has the block order 3 4 5 6 0 1 2 7, and it is restored as follows:
  • the MDCT spectrum of block 0 after intra-group deinterleaving is moved to block 3 (the position index of the first bit with flag value 0 in the group flag information is 3).
  • the MDCT spectrum of block 1 is moved to block 4 (the position index of the second bit with flag value 0 is 4).
  • the MDCT spectrum of block 2 is moved to block 5 (the position index of the third bit with flag value 0 is 5).
  • the MDCT spectrum of block 3 is moved to block 6 (the position index of the fourth bit with flag value 0 is 6).
  • similarly, the MDCT spectra of blocks 4, 5, 6, and 7 are moved to blocks 0, 1, 2, and 7, the position indices of the first to fourth bits with flag value 1.
  • if the window type of the current frame is a short window, the MDCT spectrum after inverse group arrangement is interleaved in the same way as before.
  • Post-decoding processing may include BWE inverse processing, TNS inverse processing, FDNS inverse processing and so on.
  • the reconstructed MDCT spectrum consists of the MDCT spectra of M blocks; the inverse MDCT is applied to the MDCT spectrum of each block, and after windowing and overlap-add of the inverse-transformed signals, the reconstructed audio signal of the short frame is obtained.
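The short-frame reconstruction (per-block inverse MDCT, windowing, overlap-add) can be checked with a round trip. This sketch assumes a sine window for both analysis and synthesis and scales the IMDCT by 2/N so that, under the Princen-Bradley condition, overlap-add cancels the time-domain aliasing. Samples covered by only one block (the first and last L/M) still need the neighbouring frames and are excluded from the check; all function names are hypothetical.

```python
import numpy as np


def _basis(n_coef: int) -> np.ndarray:
    """(N, 2N) MDCT basis cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))."""
    n = np.arange(2 * n_coef)
    k = np.arange(n_coef)
    return np.cos(np.pi / n_coef * np.outer(k + 0.5, n + 0.5 + n_coef / 2))


def mdct(x: np.ndarray) -> np.ndarray:
    return _basis(len(x) // 2) @ x


def imdct(coefs: np.ndarray) -> np.ndarray:
    """IMDCT scaled by 2/N so that sine-windowed overlap-add reconstructs the input."""
    n_coef = len(coefs)
    return (2.0 / n_coef) * (_basis(n_coef).T @ coefs)


def reconstruct_short_frame(block_spectra: np.ndarray) -> np.ndarray:
    """Inverse-transform each block, apply the synthesis window, and overlap-add."""
    m, hop = block_spectra.shape
    win_len = 2 * hop
    window = np.sin(np.pi * (np.arange(win_len) + 0.5) / win_len)  # assumed sine window
    out = np.zeros((m - 1) * hop + win_len)
    for b in range(m):
        out[b * hop : b * hop + win_len] += window * imdct(block_spectra[b])
    return out


# Round trip: analysis (window + MDCT per block) followed by the reconstruction above.
rng = np.random.default_rng(1)
x = rng.standard_normal(1152)
hop, win_len = 128, 256
window = np.sin(np.pi * (np.arange(win_len) + 0.5) / win_len)
spectra = np.stack([mdct(window * x[b * hop : b * hop + win_len]) for b in range(8)])
y = reconstruct_short_frame(spectra)
print(np.allclose(y[hop:-hop], x[hop:-hop]))  # True
```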
  • if the window type of the current frame is another window type, the frame is decoded with the decoding method for that frame type to obtain the reconstructed audio signal.
  • the reconstructed MDCT spectrum is obtained by using the decoding neural network.
  • in this embodiment, when the window type of the current frame is a short window, the number of groups and the grouping flag information of the current frame are obtained; the spectra of the M blocks of the current frame are grouped and arranged according to them to obtain the group-arranged spectrum; and the group-arranged spectrum is encoded with the encoding neural network.
  • in this way, the MDCT spectra containing transient features are moved to positions of higher coding importance, so that the audio signal reconstructed after neural-network encoding and decoding better preserves the transient features.
  • the embodiments of this application can also be used for stereo coding. The difference is as follows: first, the left and right channels of the stereo signal are each processed according to steps S31 to S310 at the encoding end of the previous embodiment, yielding the intra-group interleaved MDCT spectrum of the left channel and the intra-group interleaved MDCT spectrum of the right channel. Step S311 then becomes: encode the intra-group interleaved MDCT spectra of the left and right channels with the encoding neural network.
  • that is, the input of the encoding neural network is no longer the interleaved MDCT spectrum of a mono channel, but the intra-group interleaved MDCT spectra of the left and right channels.
  • the encoding neural network may be a CNN, with the intra-group interleaved MDCT spectra of the left and right channels used as its two input channels.
  • the process performed by the decoder includes:
  • the window type, number of groups, and group flag information of the left channel of the current frame are obtained.
  • the window type, number of groups, and group flag information of the right channel of the current frame are obtained.
  • the decoding neural network is used to obtain the decoded stereo MDCT spectrum.
  • for the left channel, the mono decoding steps at the decoding side of Embodiment 1 are performed to obtain the reconstructed left-channel signal.
  • for the right channel, the mono decoding steps at the decoding side of Embodiment 1 are performed to obtain the reconstructed right-channel signal.
  • the embodiment of the present application can also be used for stereo coding.
  • the encoding process for adjusting the grouping information of the left and right channels in the encoder proposed by the embodiment of the present application includes:
  • the stereo signal is divided into frames to obtain the stereo signal of the current frame.
  • the stereo signal of the current frame includes the left channel signal of the current frame and the right channel signal of the current frame.
  • the left-channel signal of the current frame is taken as the audio signal of the current frame, and its window type is determined by the method of steps S11 and S12 at the encoding end shown in FIG. 7; if the window type of the left-channel signal of the current frame is a short frame, the left-channel signal of the current frame is subjected to short-frame windowing and time-frequency transform to obtain the left-channel spectra of M blocks.
  • the right-channel signal of the current frame is taken as the audio signal of the current frame, and its window type is determined by the method of steps S11 and S12 at the encoding end shown in FIG. 7; if the window type of the right-channel signal of the current frame is a short frame, the right-channel signal of the current frame is subjected to short-frame windowing and time-frequency transform to obtain the right-channel spectra of M blocks.
  • the number of groups and the group flag information of the left channel are obtained by the method of step S13 at the encoding end shown in FIG. 7.
  • the number of groups and the group flag information of the right channel are obtained by the method of step S13 at the encoding end shown in FIG. 7.
  • S54: determine, from the group flag information of the left and right channels, whether the group flag information needs to be adjusted; if adjustment is required, determine the adjusted group flag information of the left and right channels from the group flag information of both channels.
  • if the group flag information of the left channel and that of the right channel are inconsistent, the group flag information is adjusted according to the group flag information of both channels to obtain the adjusted group flag information; otherwise, no adjustment is performed, and the group flag information of the left and right channels is used directly as the adjusted group flag information of the left and right channels, respectively.
  • complete consistency means that every flag value is equal; inconsistency covers both partial inconsistency (some flag values are equal and others are not) and complete inconsistency (no flag values are equal).
  • the comparison is performed position by position. For example, 1 1 1 0 0 0 1 1 and 1 1 1 0 0 0 0 1 are partially inconsistent.
  • 1 1 1 0 0 0 1 1 and 1 1 1 0 0 0 1 1 are completely consistent.
  • 1 1 1 0 0 0 1 1 and 0 0 0 1 1 1 0 0 are completely inconsistent.
  • specifically, the adjustment may be a bitwise AND of the group flag information of the left channel and that of the right channel, with each result bit used as the corresponding bit of the adjusted group flag information for both the left and right channels.
  • Another implementation manner is: firstly, according to the group numbers of the left and right channels, it is judged whether to compare the group flag information of the left and right channels. If the group numbers of the left and right channels are both equal to 2, the group flag information of the left and right channels is further compared to determine whether to adjust the group flag information; otherwise, no group flag information adjustment is required.
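The consistency check and AND-based adjustment of step S54 can be sketched in Python. This is a minimal sketch, not the patent's implementation: the function names are hypothetical, the flags are assumed to be lists of 0/1 integers of length M, adjustment is assumed to be triggered when the two flag vectors are not completely consistent, and passing the optional group numbers enables the alternative rule that only compares the flags when both channels have exactly two groups.

```python
from typing import List, Optional, Sequence, Tuple

Flags = List[int]


def fully_consistent(left: Sequence[int], right: Sequence[int]) -> bool:
    """True when the flags match at every corresponding position."""
    if len(left) != len(right):
        raise ValueError("both channels must carry M flags")
    return all(a == b for a, b in zip(left, right))


def adjust_group_flags(
    left: Sequence[int],
    right: Sequence[int],
    left_groups: Optional[int] = None,
    right_groups: Optional[int] = None,
) -> Tuple[Flags, Flags]:
    """Return the adjusted (left, right) group flags (hypothetical helper).

    If both group numbers are given, the flags are compared only when both
    equal 2. Inconsistent flags are replaced, for both channels, by their
    position-by-position bitwise AND.
    """
    if left_groups is not None and right_groups is not None:
        if left_groups != 2 or right_groups != 2:
            return list(left), list(right)  # no comparison, no adjustment
    if fully_consistent(left, right):
        return list(left), list(right)  # already identical: keep as-is
    merged = [a & b for a, b in zip(left, right)]  # bitwise AND per position
    return merged, list(merged)
```

For example, the two vectors from the "not completely consistent" example (1 1 1 0 0 0 1 1 and 1 1 1 0 0 0 0 1) would both become 1 1 1 0 0 0 0 1 under this sketch.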
  • the adjusted group flag information of the left and right channels is encoded, written into the code stream, and then transmitted to the decoding end.
  • the left channel spectrum of the M blocks and the right channel spectrum of the M blocks are respectively grouped and arranged to obtain the left channel spectrum and the right channel spectrum arranged in groups.
  • One method is: according to the adjusted group flag information, the group-arranged left channel spectrum is first interleaved within the group to obtain the group-interleaved left channel spectrum. Similarly, according to the adjusted group flag information, the group-arranged right channel spectrum is firstly interleaved within the group to obtain the group-interleaved right channel spectrum. Then use the encoding neural network to encode the stereo frequency spectrum interleaved in the group and write it into the code stream.
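The group arrangement followed by intra-group interleaving can be illustrated with a small sketch. The details of the group arrangement step are not repeated in this section, so the following rests on assumptions: each block is a list of N coefficients, blocks that share a flag value form one group and keep their original order inside it, the group whose flag equals a hypothetical `first_flag` (default 0) is placed first, and interleaving within a group takes coefficient k from every block in turn before moving on to coefficient k + 1.

```python
from typing import List, Sequence

Block = List[float]


def group_arrange(
    blocks: Sequence[Block], flags: Sequence[int], first_flag: int = 0
) -> List[List[Block]]:
    """Split the M blocks into groups by flag value; block order is kept
    inside each group, and the group with flag == first_flag comes first."""
    if len(blocks) != len(flags):
        raise ValueError("need exactly one flag per block")
    first = [b for b, f in zip(blocks, flags) if f == first_flag]
    rest = [b for b, f in zip(blocks, flags) if f != first_flag]
    return [group for group in (first, rest) if group]


def interleave_group(group: Sequence[Block]) -> Block:
    """Coefficient k of every block in the group, then coefficient k + 1, ..."""
    n = len(group[0])
    return [block[k] for k in range(n) for block in group]


def arrange_and_interleave(
    blocks: Sequence[Block], flags: Sequence[int], first_flag: int = 0
) -> Block:
    """Group-arrange the blocks, interleave inside each group, concatenate."""
    out: Block = []
    for group in group_arrange(blocks, flags, first_flag):
        out.extend(interleave_group(group))
    return out
```

With four two-coefficient blocks [[1, 2], [3, 4], [5, 6], [7, 8]] and flags [1, 0, 0, 1], the sketch yields [3, 5, 4, 6, 1, 7, 2, 8]: the flag-0 group is interleaved first, then the flag-1 group. The same function would be applied separately to the left and right channel spectra.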
  • the coding neural network used for stereo coding may be a CNN, in which the left channel spectrum and the right channel spectrum are each used as the input signal of one channel of the CNN.
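To make "each used as the input signal of one channel of the CNN" concrete, here is one output feature of a single 1-D convolutional layer over a two-channel input. The weights, kernel size, and the absence of padding are illustrative assumptions; this passage does not describe the network's architecture, and the function name is hypothetical.

```python
from typing import List, Sequence


def conv1d_two_channel(
    left: Sequence[float],
    right: Sequence[float],
    kernel: Sequence[Sequence[float]],
    bias: float = 0.0,
) -> List[float]:
    """One output feature of a 'valid' 1-D convolution (cross-correlation,
    as in CNN layers) whose input has two channels: 0 = left, 1 = right.

    kernel[c][j] is the weight of input channel c at tap j.
    """
    if len(left) != len(right):
        raise ValueError("left and right spectra must have the same length")
    if len(kernel) != 2 or len(kernel[0]) != len(kernel[1]):
        raise ValueError("kernel must hold one equal-length tap list per channel")
    channels = (left, right)
    taps = len(kernel[0])
    out: List[float] = []
    for t in range(len(left) - taps + 1):
        acc = bias
        for c in range(2):  # every filter sees both channels at once
            for j in range(taps):
                acc += kernel[c][j] * channels[c][t + j]
        out.append(acc)
    return out
```

Because each filter sums over both input channels, the first layer can already combine left and right spectral information.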
  • the decoding process corresponding to the encoding end shown in Figure 11 includes the following steps:
  • the window types of the left and right channels of the current frame are obtained. If the window type of the left channel of the current frame is a short frame, then decode according to the received code stream to obtain the group quantity and group flag information of the left channel. If the window type of the right channel of the current frame is a short frame, the received code stream is decoded to obtain the group number and group flag information of the right channel.
  • the decoding end corresponds to the encoding end. Specific steps include:
  • the decoded spectrum of the left channel and the decoded spectrum of the right channel are obtained by using the decoding neural network.
  • the spectrum belonging to the same group in the decoded spectrum of the left channel can be determined.
  • Intra-group deinterleaving processing is performed on the frequency spectrum belonging to the same group to obtain the left channel frequency spectrum after the intragroup deinterleaving processing.
  • the spectrum belonging to the same group in the decoded spectrum of the right channel can be determined.
  • Intra-group de-interleaving processing is performed on the frequency spectrum belonging to the same group, and the right channel frequency spectrum after the intra-group de-interleaving processing is obtained.
  • the deinterleaving process is the same as the deinterleaving process on the encoding side.
  • the left channel spectrum after the intra-group deinterleaving process is subjected to inverse group arrangement processing to obtain the left channel spectrum after the inverse group arrangement process.
  • the right channel spectrum after the intra-group deinterleaving process is subjected to inverse group arrangement processing to obtain the right channel spectrum after the inverse group arrangement process.
  • the inverse group arrangement processing is the inverse of the group arrangement in step S55 at the encoding end shown in FIG. 11, and is not described in detail here.
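At the decoding end, intra-group deinterleaving and inverse group arrangement undo the encoder-side reordering. This sketch assumes a hypothetical encoder layout in which the group whose flag equals `first_flag` is placed first and, inside each group, coefficient k of every block is taken in turn. It illustrates the inverse under those assumptions and is not the patent's procedure.

```python
from typing import List, Sequence

Block = List[float]


def deinterleave_group(
    data: Sequence[float], num_blocks: int, block_len: int
) -> List[Block]:
    """Inverse of taking coefficient k of every block in turn:
    coefficient k of block m sits at index k * num_blocks + m."""
    return [[data[k * num_blocks + m] for k in range(block_len)]
            for m in range(num_blocks)]


def inverse_arrange(
    spectrum: Sequence[float],
    flags: Sequence[int],
    block_len: int,
    first_flag: int = 0,
) -> List[Block]:
    """Rebuild the M blocks in their original order (hypothetical helper)."""
    if len(spectrum) != len(flags) * block_len:
        raise ValueError("spectrum length must equal M * block_len")
    first_idx = [i for i, f in enumerate(flags) if f == first_flag]
    rest_idx = [i for i, f in enumerate(flags) if f != first_flag]
    blocks: List[Block] = [[] for _ in flags]
    pos = 0
    for indices in (first_idx, rest_idx):
        size = len(indices) * block_len
        group = deinterleave_group(spectrum[pos:pos + size], len(indices), block_len)
        for original_index, block in zip(indices, group):
            blocks[original_index] = block  # put each block back in its slot
        pos += size
    return blocks
```

For instance, `inverse_arrange([3, 5, 4, 6, 1, 7, 2, 8], [1, 0, 0, 1], 2)` restores `[[1, 2], [3, 4], [5, 6], [7, 8]]`.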
  • a reconstructed right channel signal is obtained.
  • the specific method for obtaining the reconstructed stereo signal from the spectrum of the left and right channels is the inverse process of the encoding in step S56 at the encoding end shown in FIG. 11 , and will not be described in detail here.
  • the embodiment of the present application also includes a solution of grouping and adjusting the left and right channels of the stereo signal.
  • the encoding method is as shown in Figure 13:
  • the stereo signal of the current frame includes the left channel signal of the current frame and the right channel signal of the current frame.
  • the specific method of the transient detection of the left and right channels is the same as the step S12 shown in FIG. 7 above.
  • the method for determining the window type according to the transient state detection result is the same as the step S13 shown in FIG. 7 above.
  • if the window type of the left channel signal of the current frame is a short frame, the MDCT spectrum of the left channel of the current frame is interleaved to obtain the interleaved left channel MDCT spectrum. Encoding preprocessing is performed on the interleaved left channel MDCT spectrum to obtain the preprocessed left channel MDCT spectrum.
  • preprocessing may include FDNS, TNS, BWE, and other processing. Deinterleaving is performed on the preprocessed left channel MDCT spectrum to obtain the left channel MDCT spectrum of M blocks.
  • if the window type of the right channel signal of the current frame is a short frame, the MDCT spectrum of the right channel of the current frame is interleaved to obtain the interleaved right channel MDCT spectrum. Encoding preprocessing is performed on the interleaved right channel MDCT spectrum to obtain the preprocessed right channel MDCT spectrum. Preprocessing may include FDNS, TNS, BWE, and other processing. Deinterleaving is performed on the preprocessed right channel MDCT spectrum to obtain the right channel MDCT spectrum of M blocks.
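The short-frame path of this embodiment (interleave the M block spectra, preprocess, deinterleave back into M blocks) can be sketched as follows. The real preprocessing (FDNS, TNS, BWE) is not modeled: a caller-supplied function stands in for it. The interleaving order (coefficient k of every block in turn) and all function names are assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Block = List[float]


def interleave_blocks(blocks: Sequence[Sequence[float]]) -> List[float]:
    """Coefficient 0 of every block, then coefficient 1 of every block, ..."""
    n = len(blocks[0])
    return [block[k] for k in range(n) for block in blocks]


def deinterleave_blocks(data: Sequence[float], m: int, n: int) -> List[Block]:
    """Inverse of interleave_blocks for m blocks of n coefficients."""
    return [[data[k * m + i] for k in range(n)] for i in range(m)]


def preprocess_short_frame(
    blocks: Sequence[Sequence[float]],
    preprocess: Callable[[List[float]], List[float]],
) -> List[Block]:
    """Interleave the M block spectra, apply a stand-in for the encoding
    preprocessing (FDNS/TNS/BWE are not modelled here), then deinterleave."""
    m, n = len(blocks), len(blocks[0])
    processed = preprocess(interleave_blocks(blocks))
    if len(processed) != m * n:
        raise ValueError("preprocessing must keep the spectrum length")
    return deinterleave_blocks(processed, m, n)
```

The interleaved layout places the same frequency index of all M short blocks next to each other, so a frequency-domain tool run on the interleaved spectrum operates on one long vector; deinterleaving then returns the processed values to their blocks.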
  • for the left channel, the specific method for obtaining the group number and group flag information is the same as step S18 shown in FIG. 7 above.
  • for the right channel, the specific method for obtaining the group number and group flag information is the same as step S18 shown in FIG. 7 above.
  • S78 Determine whether to adjust the grouping flag information according to the grouping flag information of the left and right channels, and if adjustment is required, determine the adjusted grouping flag information of the left and right channels according to the grouping flag information of the left and right channels.
  • Case 1: If the grouping flag information of the left and right channels indicates that the positions of the spectrum blocks contained in the transient groups of the left and right channels are exactly the same, the grouping flag information of the left and right channels is not adjusted. That is, when the left channel transient group contains the same number of blocks as the right channel transient group, and the two transient groups contain the same blocks, the group flag information of the left and right channels is not adjusted.
  • Group flag information of the left channel 1 1 1 1 1 1 1 0 0.
  • Group flag information of the right channel 1 1 1 1 1 1 1 0 0.
  • the above grouping information indicates that the positions of the spectrum blocks contained in the transient groups of the left and right channels completely overlap, and in this case, no adjustment is required for the grouping information of the left and right channels.
  • Case 2: If the transient group of the left channel contains the same number of blocks as the transient group of the right channel, the grouping flag information of the left and right channels is not adjusted. That is, when the two transient groups contain the same number of blocks but the positions of those blocks differ between the left and right channels, the group flag information of the left and right channels is still not adjusted.
  • Group flag information of the left channel 0 0 0 1 1 1 1 1.
  • Group flag information of the right channel 1 1 1 1 1 0 0 0.
  • the above grouping information shows that the transient groups of the left and right channels contain the same number of blocks, but the positions of the blocks in the left channel transient group differ from those in the right channel transient group. In this case, no adjustment is made to the group flag information of the left and right channels.
  • if the number of transient blocks contained in the left channel transient group differs from the number contained in the right channel transient group, the group flag information of at least one of the left and right channels needs to be adjusted.
  • in case 3, the grouping flag information of one of the left and right channels is adjusted; in case 4, the grouping flag information of either one channel or both channels is adjusted.
  • Case 3: If the group flag information of the left and right channels indicates that the transient group of the left channel contains a different number of blocks from the transient group of the right channel, and the positions of the blocks contained in the two transient groups are completely different, the grouping flag information of the channel whose transient group contains fewer blocks is adjusted so that the transient groups of the left and right channels contain the same number of blocks.
  • the grouping flag information of the left channel is adjusted so that the number of blocks in the left channel transient group equals the number of blocks in the right channel transient group; for example, the transient identifier of the left-channel block with sequence number 3 (sequence numbers start from 0) can be changed to transient. The adjusted grouping information is as follows:
  • Group indicator information of the left channel groupIndicator_L 0 0 0 0 1 1 1 1.
  • the group indicator information of the right channel groupIndicator_R 1 1 1 1 0 0 0 0.
  • Case 4: If the group flag information of the left and right channels indicates that the transient group of the left channel contains a different number of blocks from the transient group of the right channel, and the positions of the blocks contained in the two transient groups are not completely the same, that is, the positions of the spectrum blocks in the transient groups of the left and right channels differ only in part, the grouping information needs to be adjusted.
  • the adjustment method may be to combine the transient groups of the left and right channels, that is, to expand the range of the transient groups.
  • in the following example, the serial numbers in the group flag information of the left and right channels start from 0, and only the group information of the right channel needs to be adjusted:
  • before adjustment, the group indicator information of the left channel groupIndicator_L 1 1 1 0 0 0 0 1.
  • before adjustment, the group indicator information of the right channel groupIndicator_R 1 1 1 1 0 0 0 1.
  • after adjustment, the group indicator information of the left channel groupIndicator_L 1 1 1 0 0 0 0 1.
  • after adjustment, the group indicator information of the right channel groupIndicator_R 1 1 1 0 0 0 0 1.
  • the adjusted group flag information of the left and right channels is encoded and written into the code stream, and then transmitted to the decoding end.
  • in the following example, the grouping information of both the left and right channels needs to be adjusted:
  • before adjustment, the group indicator information of the left channel groupIndicator_L 1 1 0 0 0 0 1 1.
  • before adjustment, the group indicator information of the right channel groupIndicator_R 1 1 1 1 0 0 0 1.
  • after adjustment, the group indicator information of the left channel groupIndicator_L 1 1 0 0 0 0 0 1.
  • after adjustment, the group indicator information of the right channel groupIndicator_R 1 1 0 0 0 0 0 1.
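Cases 1 to 4 can be sketched as a classifier followed by an adjustment. Assumptions, all labeled: flags are 0/1 lists of length M; a flag of 0 marks a block of the transient group (the reading under which the Case 3 and Case 4 examples agree with the described adjustments, and under which the Case 4 merge equals the bitwise AND of the two flag vectors); and the rule for choosing the extra blocks in Case 3 is hypothetical, chosen only to reproduce the single example given (block 3 becomes transient). Function names are illustrative.

```python
from typing import List, Sequence, Set, Tuple

TRANSIENT = 0  # assumption: a 0 flag marks a block in the transient group


def transient_positions(flags: Sequence[int]) -> Set[int]:
    return {i for i, f in enumerate(flags) if f == TRANSIENT}


def classify_case(left: Sequence[int], right: Sequence[int]) -> int:
    """Return 1-4, following Cases 1-4 in the text."""
    tl, tr = transient_positions(left), transient_positions(right)
    if tl == tr:
        return 1  # identical positions
    if len(tl) == len(tr):
        return 2  # same count, different positions
    if not (tl & tr):
        return 3  # different counts, completely different positions
    return 4  # different counts, partially overlapping positions


def _grow(positions: Set[int], target: int, m: int) -> Set[int]:
    """Hypothetical Case 3 rule: add neighbouring blocks, first after the
    last transient block, then before the first, until `target` is reached."""
    grown = set(positions)
    if not grown:
        return grown  # nothing to extend from; left unchanged in this sketch
    lo, hi = min(grown), max(grown)
    while len(grown) < target:
        if hi + 1 < m:
            hi += 1
            grown.add(hi)
        elif lo - 1 >= 0:
            lo -= 1
            grown.add(lo)
        else:
            break
    return grown


def adjust_by_case(
    left: Sequence[int], right: Sequence[int]
) -> Tuple[int, List[int], List[int]]:
    if len(left) != len(right):
        raise ValueError("both channels must carry M flags")
    m = len(left)
    case = classify_case(left, right)
    tl, tr = transient_positions(left), transient_positions(right)
    if case == 3:  # adjust only the channel with fewer transient blocks
        if len(tl) < len(tr):
            tl = _grow(tl, len(tr), m)
        else:
            tr = _grow(tr, len(tl), m)
    elif case == 4:
        tl = tr = tl | tr  # merge (expand) the two transient groups

    def to_flags(pos: Set[int]) -> List[int]:
        return [TRANSIENT if i in pos else 1 - TRANSIENT for i in range(m)]

    return case, to_flags(tl), to_flags(tr)
```

Cases 1 and 2 return the flags unchanged; only Cases 3 and 4 modify them, matching the rule that adjustment is needed only when the transient groups contain different numbers of blocks.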
  • the specific method of group arrangement processing is the same as that in step S14 shown in FIG. 7 above.
  • the left channel spectrum of the M blocks and the right channel spectrum of the M blocks are respectively grouped and arranged to obtain the left channel spectrum and the right channel spectrum arranged in groups.
  • One method is: according to the adjusted group flag information, the group-arranged left channel spectrum is first interleaved within the group to obtain the group-interleaved left channel spectrum. Similarly, according to the adjusted group flag information, the group-arranged right channel spectrum is first interleaved within the group to obtain the group-interleaved right channel spectrum. Then, the encoding neural network is used to encode the stereo spectrum interleaved within the group.
  • the coding neural network used for stereo coding may be a CNN, in which the left channel spectrum and the right channel spectrum are each used as the input signal of one channel of the CNN.
  • the decoding method is shown in Figure 14, and mainly includes the following steps:
  • the spectrum belonging to the same group in the decoded spectrum of the left channel can be determined.
  • Intra-group deinterleaving processing is performed on the frequency spectrum belonging to the same group to obtain the left channel frequency spectrum after the intragroup deinterleaving processing.
  • S87 Perform intra-group deinterleaving processing on the decoded spectrum of the right channel according to the group quantity and group flag information of the right channel, and obtain the right channel spectrum after intra-group deinterleaving processing.
  • the spectrum belonging to the same group in the decoded spectrum of the right channel can be determined.
  • Intra-group de-interleaving processing is performed on the frequency spectrum belonging to the same group, and the right channel frequency spectrum after the intra-group de-interleaving processing is obtained.
  • the deinterleaving process is the same as the deinterleaving process on the encoding side.
  • if the window type of the left channel of the current frame is a short frame, the left channel spectrum after the inverse grouping process is interleaved.
  • the window type of the right channel of the current frame is a short frame, the right channel frequency spectrum after the inverse grouping process is interleaved.
  • Post-decoding processing may include BWE, TNS inverse processing, FDNS inverse processing, and other processing.
  • the grouping flag information is adjusted according to the grouping flag information of the left channel and the grouping flag information of the right channel to obtain the adjusted grouping flag information of the left and right channels;
  • the grouping flag information is used to group and arrange the left channel spectrum of M blocks and the right channel spectrum of M blocks to obtain the grouped and arranged stereo spectrum.
  • a multi-channel signal encoding device 1500 may include: a transient identification obtaining module 1501, a grouping information obtaining module 1502, a grouping information adjusting module 1503, and a spectrum obtaining module 1504 and encoding module 1505, wherein,
  • the transient identification obtaining module is used to obtain M first transient identifications of the M blocks of the first channel according to the frequency spectrum of the M blocks of the first channel of the current frame of the multi-channel signal to be encoded;
  • the M blocks of the first channel include the first block of the first channel, and the first transient identifier of the first block is used to indicate that the first block is a transient block, or to indicate that the first block is a non-transient block;
  • a grouping information obtaining module configured to obtain first grouping information of M blocks of the first sound channel according to the M first transient identifiers
  • the transient identifier obtaining module is configured to obtain M second transient identifiers of the M blocks of the second channel according to the spectrum of the M blocks of the second channel of the current frame; the second The M blocks of the channel include the second block of the second channel, and the second transient identifier of the second block is used to indicate that the second block is a transient block, or indicate that the second block is non-transient block;
  • the grouping information obtaining module is configured to obtain the second grouping information of the M blocks of the second sound channel according to the M second transient identifiers;
  • a group information adjustment module, configured to: when the first group information and the second group information meet a preset condition, obtain first adjusted group information and second adjusted group information according to the first group information and the second group information, where the first adjusted group information corresponds to the first group information and the second adjusted group information corresponds to the second group information; and where the first adjusted group information is the same as the first group information and the second adjusted group information is obtained by adjusting the second group information; or the first adjusted group information is obtained by adjusting the first group information and the second adjusted group information is the same as the second group information; or the first adjusted group information is obtained by adjusting the first group information and the second adjusted group information is obtained by adjusting the second group information;
  • a spectrum obtaining module configured to obtain a first spectrum to be encoded according to the first adjustment group information and the spectrum of the M blocks of the first channel;
  • the spectrum obtaining module is configured to obtain a second spectrum to be encoded according to the second adjustment group information and the spectrum of the M blocks of the second channel;
  • An encoding module configured to use an encoding neural network to encode the first spectrum to be encoded and the second spectrum to be encoded to obtain a spectrum encoding result; and write the spectrum encoding result into a code stream.
  • an apparatus 1600 for decoding a multi-channel signal may include: a grouping information obtaining module 1601, a decoding module 1602, a frequency spectrum obtaining module 1603 and a reconstructed signal obtaining module 1604, wherein ,
  • the grouping information obtaining module is configured to obtain, from the code stream, first decoded grouping information of the M blocks of the first channel of the current frame of the multi-channel signal, where the first decoded grouping information is used to indicate first decoded transient identifiers of the M blocks of the first channel;
  • the grouping information obtaining module is configured to obtain the second decoding grouping information of the M blocks of the second channel of the current frame from the code stream, the second decoding grouping information is used to indicate the second a second decoded transient identifier of the M blocks of the channel;
  • a decoding module configured to use a decoding neural network to decode the code stream to obtain the decoded spectrum of the M blocks of the first channel and the decoded spectrum of the M blocks of the second channel;
  • a reconstructed signal obtaining module configured to obtain a first reconstructed signal of the first channel according to the first decoded group information and the decoded spectrum of the M blocks of the first channel;
  • the reconstructed signal obtaining module is configured to obtain a second reconstructed signal of the second channel according to the second decoded group information and the decoded spectrum of the M blocks of the second channel.
  • the embodiment of the present application also provides a computer storage medium, wherein the computer storage medium stores a program, and the program executes some or all of the steps described in the above method embodiments.
  • the encoding device 1700 for a multi-channel signal includes:
  • a receiver 1701, a transmitter 1702, a processor 1703, and a memory 1704 (the number of processors 1703 in the multi-channel signal encoding device 1700 can be one or more, one processor is taken as an example in FIG. 17 ).
  • the receiver 1701 , the transmitter 1702 , the processor 1703 and the memory 1704 may be connected through a bus or in other ways, wherein connection through a bus is taken as an example in FIG. 17 .
  • the memory 1704 may include read-only memory and random-access memory, and provides instructions and data to the processor 1703 .
  • a part of the memory 1704 may also include a non-volatile random access memory (non-volatile random access memory, NVRAM).
  • the memory 1704 stores operating systems and operating instructions, executable modules or data structures, or their subsets, or their extended sets, wherein the operating instructions may include various operating instructions for implementing various operations.
  • the operating system may include various system programs for implementing various basic services and processing hardware-based tasks.
  • the processor 1703 controls the operation of the device for encoding multi-channel signals, and the processor 1703 may also be called a central processing unit (central processing unit, CPU).
  • the various components of the multi-channel signal encoding device are coupled together through a bus system, wherein the bus system may include a power bus, a control bus, and a status signal bus in addition to a data bus.
  • the various buses are referred to as bus systems in the figures.
  • the methods disclosed in the foregoing embodiments of the present application may be applied to the processor 1703 or implemented by the processor 1703 .
  • the processor 1703 may be an integrated circuit chip, which has a signal processing capability. In the implementation process, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 1703 or instructions in the form of software.
  • the above-mentioned processor 1703 may be a general-purpose processor, a digital signal processor (digital signal processing, DSP), an application-specific integrated circuit (application specific integrated circuit, ASIC), a field-programmable gate array (field-programmable gate array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, register.
  • the storage medium is located in the memory 1704, and the processor 1703 reads the information in the memory 1704, and completes the steps of the above method in combination with its hardware.
  • the receiver 1701 can be used to receive input digital or character information, and generate signal input related to the setting and function control of the encoding device of the multi-channel signal.
  • the transmitter 1702 can include a display device such as a display screen.
  • the transmitter 1702 can be used to output digital or character information through an external interface.
  • the processor 1703 is configured to execute the methods performed by the multi-channel signal encoding apparatus shown in FIG. 4 , FIG. 7 , FIG. 9 , FIG. 11 , and FIG. 13 in the foregoing embodiments.
  • the multi-channel signal decoding device 1800 includes:
  • a receiver 1801, a transmitter 1802, a processor 1803 and a memory 1804 (the number of processors 1803 in the multi-channel signal decoding device 1800 can be one or more, one processor is taken as an example in FIG. 18 ).
  • the receiver 1801 , the transmitter 1802 , the processor 1803 and the memory 1804 may be connected through a bus or in other ways, wherein connection through a bus is taken as an example in FIG. 18 .
  • the memory 1804 may include read-only memory and random-access memory, and provides instructions and data to the processor 1803 . A portion of memory 1804 may also include NVRAM.
  • the memory 1804 stores operating systems and operating instructions, executable modules or data structures, or their subsets, or their extended sets, wherein the operating instructions may include various operating instructions for implementing various operations.
  • the operating system may include various system programs for implementing various basic services and processing hardware-based tasks.
  • the processor 1803 controls the operation of the multi-channel signal decoding device, and the processor 1803 may also be referred to as a CPU.
  • various components of the multi-channel signal decoding device are coupled together through a bus system, wherein the bus system may include a power bus, a control bus, and a status signal bus, etc. in addition to a data bus.
  • the various buses are referred to as bus systems in the figures.
  • the methods disclosed in the foregoing embodiments of the present application may be applied to the processor 1803 or implemented by the processor 1803 .
  • the processor 1803 may be an integrated circuit chip and has a signal processing capability. In the implementation process, each step of the above method may be implemented by an integrated logic circuit of hardware in the processor 1803 or instructions in the form of software.
  • the aforementioned processor 1803 may be a general processor, DSP, ASIC, FPGA or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
  • Various methods, steps, and logic block diagrams disclosed in the embodiments of the present application may be implemented or executed.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, register.
  • the storage medium is located in the memory 1804, and the processor 1803 reads the information in the memory 1804, and completes the steps of the above method in combination with its hardware.
  • the processor 1803 is configured to execute the methods performed by the multi-channel signal decoding apparatus shown in FIG. 5 , FIG. 8 , FIG. 10 , FIG. 12 , and FIG. 14 in the foregoing embodiments.
  • when the device for encoding multi-channel signals or the device for decoding multi-channel signals is a chip in a terminal, the chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit may execute the computer-executable instructions stored in the storage unit, so that the chip in the terminal executes the audio encoding method of any one of the above-mentioned first aspect, or the audio decoding method of any one of the second aspect.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit may also be a storage unit in the terminal located outside the chip, such as a read-only memory (read-only memory, ROM) or another type of static storage device that can store static information and instructions, or a random access memory (random access memory, RAM).
  • the processor mentioned above can be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling program execution of the method of the first aspect or the second aspect.
  • the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the connection relationship between the modules indicates that they have communication connections, which can be specifically implemented as one or more communication buses or signal lines.
  • the essence of the technical solution of this application, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of this application.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • when software is used for implementation, the embodiments may be implemented fully or partially in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave).
  • the computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media.
  • the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (Solid State Disk, SSD)), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

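A method and an apparatus for encoding and decoding a multi-channel signal. In the multi-channel signal encoding method, a current frame of a to-be-encoded multi-channel signal includes a first channel and a second channel. First grouping information of M blocks of the first channel and second grouping information of M blocks of the second channel are obtained. When the first grouping information and the second grouping information meet a preset condition, first adjusted grouping information and second adjusted grouping information are obtained based on the first grouping information and the second grouping information (405). Next, a first to-be-encoded spectrum is obtained based on the first adjusted grouping information and the spectra of the M blocks of the first channel (406), and a second to-be-encoded spectrum is obtained in the same way (407). Finally, an encoding neural network encodes the first to-be-encoded spectrum and the second to-be-encoded spectrum to obtain a spectrum encoding result (408), which is carried in a bitstream (409). Blocks with different transient flags can thus be grouped, adjusted, and encoded, which improves the encoding quality of the multi-channel signal.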

Description

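Method and apparatus for encoding and decoding multi-channel signal
This application claims priority to Chinese Patent Application No. 202110865298.2, filed with the China National Intellectual Property Administration on July 29, 2021 and entitled "Method and apparatus for encoding and decoding multi-channel signal", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of audio processing technologies, and in particular, to a method and an apparatus for encoding and decoding a multi-channel signal.
Background
Audio data compression is an indispensable part of media applications such as media communication and media broadcasting. With the development of the high-definition audio industry and the three-dimensional audio industry, people demand ever higher audio quality, which in turn causes rapid growth of the audio data volume in media applications.
Current audio data compression technologies are based on basic signal processing principles: they compress an original audio signal, for example a stereo signal, by exploiting the correlation of the signal in time and space, so as to reduce the data volume and facilitate transmission or storage of the audio data.
In current audio signal encoding solutions, encoding quality is low when the audio signal is a transient signal. When the decoder side reconstructs the signal, the reconstruction of a multi-channel signal is also poor.
Summary
Embodiments of this application provide a method and an apparatus for encoding and decoding a multi-channel signal, to improve the encoding quality of a multi-channel signal and the reconstruction of the multi-channel signal.
To resolve the foregoing technical problem, the embodiments of this application provide the following technical solutions:
According to a first aspect, an embodiment of this application provides a multi-channel signal encoding method, including: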
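obtaining M first transient flags of M blocks of a first channel of a current frame of a to-be-encoded multi-channel signal based on spectra of the M blocks of the first channel, where the M blocks of the first channel include a first block of the first channel, and the first transient flag of the first block indicates that the first block is a transient block or that the first block is a non-transient block;
obtaining first grouping information of the M blocks of the first channel based on the M first transient flags;
obtaining M second transient flags of M blocks of a second channel of the current frame based on spectra of the M blocks of the second channel, where the M blocks of the second channel include a second block of the second channel, and the second transient flag of the second block indicates that the second block is a transient block or that the second block is a non-transient block;
obtaining second grouping information of the M blocks of the second channel based on the M second transient flags;
when the first grouping information and the second grouping information meet a preset condition, obtaining first adjusted grouping information and second adjusted grouping information based on the first grouping information and the second grouping information, where the first adjusted grouping information corresponds to the first grouping information and the second adjusted grouping information corresponds to the second grouping information; and where the first adjusted grouping information is the same as the first grouping information and the second adjusted grouping information is obtained by adjusting the second grouping information; or the first adjusted grouping information is obtained by adjusting the first grouping information and the second adjusted grouping information is the same as the second grouping information; or the first adjusted grouping information is obtained by adjusting the first grouping information and the second adjusted grouping information is obtained by adjusting the second grouping information;
obtaining a first to-be-encoded spectrum based on the first adjusted grouping information and the spectra of the M blocks of the first channel;
obtaining a second to-be-encoded spectrum based on the second adjusted grouping information and the spectra of the M blocks of the second channel;
encoding the first to-be-encoded spectrum and the second to-be-encoded spectrum by using an encoding neural network, to obtain a spectrum encoding result; and
writing the spectrum encoding result into a bitstream.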
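In the foregoing solution, the current frame of the to-be-encoded multi-channel signal includes a first channel and a second channel, and each channel includes the spectra of M blocks. The M first transient flags of the M blocks of the first channel are obtained from the spectra of those blocks, and the first grouping information of the M blocks of the first channel is obtained from the M first transient flags; the second grouping information of the M blocks of the second channel is obtained in the same way. When the first grouping information and the second grouping information meet the preset condition, the first adjusted grouping information and the second adjusted grouping information are obtained from them. Next, the first to-be-encoded spectrum is obtained from the first adjusted grouping information and the spectra of the M blocks of the first channel, and the second to-be-encoded spectrum is obtained in the same way. Finally, the encoding neural network encodes the first and second to-be-encoded spectra to obtain the spectrum encoding result, which is carried in the bitstream. In the embodiments of this application, the grouping information of the M blocks of each channel is therefore obtained from the M transient flags of that channel in the current frame; when the grouping information of the channels meets the preset condition, adjusted grouping information is obtained for each channel; and the to-be-encoded spectrum of each channel is obtained from its adjusted grouping information and the spectra of its M blocks. Blocks with different transient flags can thus be grouped, adjusted, and encoded, which improves the encoding quality of the multi-channel signal.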
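In a possible implementation, the method further includes: encoding the first adjusted grouping information and the second adjusted grouping information to obtain a grouping information encoding result; and writing the grouping information encoding result into the bitstream. In this solution, after obtaining the first adjusted grouping information and the second adjusted grouping information, the encoder side encodes them to obtain the grouping information encoding result; the encoding scheme used for the adjusted grouping information is not limited here. The grouping information encoding result can be written into the bitstream, so that the decoder side can obtain it by parsing the bitstream and then decode it to obtain the first adjusted grouping information and the second adjusted grouping information.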
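In a possible implementation, the first grouping information includes a first group count of the M blocks of the first channel or a first group count identifier, where the first group count identifier indicates the first group count, and when the first group count is greater than 1, the first grouping information further includes the M first transient flags; or the first grouping information includes the M first transient flags;
and/or,
the second grouping information includes a second group count of the M blocks of the second channel or a second group count identifier, where the second group count identifier indicates the second group count, and when the second group count is greater than 1, the second grouping information further includes the M second transient flags; or the second grouping information includes the M second transient flags;
and/or,
the first adjusted grouping information includes a first adjusted group count of the M blocks of the first channel or a first adjusted group count identifier, where the first adjusted group count identifier indicates the first adjusted group count, and when the first adjusted group count is greater than 1, the first adjusted grouping information further includes M first adjusted transient flags of the M blocks of the first channel, where the first adjusted transient flag of the first block is either different from or the same as the first transient flag of the first block; or the first adjusted grouping information includes the M first adjusted transient flags;
and/or,
the second adjusted grouping information includes a second adjusted group count of the M blocks of the second channel or a second adjusted group count identifier, where the second adjusted group count identifier indicates the second adjusted group count, and when the second adjusted group count is greater than 1, the second adjusted grouping information further includes M second adjusted transient flags of the M blocks of the second channel, where the second adjusted transient flag of the second block is either different from or the same as the second transient flag of the second block; or the second adjusted grouping information includes the M second adjusted transient flags.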
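In the foregoing solution, the first adjusted grouping information may be the same as or different from the first grouping information. The first grouping information includes the first group count of the M blocks of the first channel or the first group count identifier, and the first adjusted grouping information includes the first adjusted group count of the M blocks of the first channel or the first adjusted group count identifier. When the first grouping information is not adjusted, the first group count equals the first adjusted group count, and the first group count identifier equals the first adjusted group count identifier. When the first grouping information is adjusted, the first group count and the first adjusted group count may be the same or different: if the adjustment does not change the group count, they are the same; if it does, they differ. For example, the first group count is 2 before the adjustment and the first adjusted group count is 1 after it. Likewise, when the first grouping information is adjusted, the first group count identifier and the first adjusted group count identifier may be the same or different. For example, if the first group count is 2 and the first group count identifier is 1 before the adjustment, and the first adjusted group count is still 2 after the adjustment, the first adjusted group count identifier is still 1. Similarly, the second adjusted grouping information may be the same as or different from the second grouping information.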
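In a possible implementation, the preset condition includes: the first grouping information is inconsistent with the second grouping information. In this solution, "inconsistent" means that the first grouping information and the second grouping information are not completely identical. When they are inconsistent, they can be regarded as meeting the preset condition; when they are consistent, they can be regarded as not meeting it. For example, the group count of the M blocks in the first grouping information may equal that in the second grouping information while the M first transient flags in the first grouping information differ from the M second transient flags in the second grouping information. As another example, the two group counts may differ. The preset condition is determined according to the specific application scenario and is not limited here. The preset condition is used to determine whether to adjust the first grouping information and the second grouping information.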
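In a possible implementation, that the first grouping information is inconsistent with the second grouping information includes: the M first transient flags indicate that the M blocks of the first channel include both transient and non-transient blocks, the M second transient flags indicate that the M blocks of the second channel include both transient and non-transient blocks, and the M first transient flags are inconsistent with the M second transient flags;
or,
that the first grouping information is inconsistent with the second grouping information includes: the M first transient flags indicate that the M blocks of the first channel include both transient and non-transient blocks, the M second transient flags indicate that the M blocks of the second channel include both transient and non-transient blocks, and the number of transient blocks of the first channel differs from the number of transient blocks of the second channel;
or,
that the first grouping information is inconsistent with the second grouping information includes: the M first transient flags indicate that the M blocks of the first channel include both transient and non-transient blocks, the M second transient flags indicate that the M blocks of the second channel include both transient and non-transient blocks, the M first transient flags are inconsistent with the M second transient flags, and the N-th block of the M blocks of the first channel and the N-th block of the M blocks of the second channel are both transient, where 0≤N<M.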
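In one implementation of the foregoing solution, some of the M blocks of the first channel are transient blocks and others are non-transient blocks; likewise, the M blocks of the second channel include both transient and non-transient blocks. That the M first transient flags are inconsistent with the M second transient flags means that at least one of the M first transient flags has a different value from the second transient flag with the same index. For example, if block A of the first channel is a transient block, block B of the second channel is a transient block, and block A has the same index among the M blocks of the first channel as block B has among the M blocks of the second channel, the first transient flag of block A is consistent with the second transient flag of block B. As another example, if block C of the first channel is a non-transient block, block D of the second channel is a transient block, and block C has the same index as block D, the first transient flag of block C is inconsistent with the second transient flag of block D. In the embodiments of this application, when the M first transient flags are inconsistent with the M second transient flags, it can be determined that the first grouping information and the second grouping information meet the preset condition, and the grouping information needs to be adjusted. When the M first transient flags are completely consistent with the M second transient flags, it can be determined that the preset condition is not met, and the grouping information is not adjusted.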
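In another implementation of the foregoing solution, some of the M blocks of the first channel are transient blocks and others are non-transient blocks, so the number of transient blocks of the first channel can be counted; likewise, the M blocks of the second channel include both transient and non-transient blocks, so the number of transient blocks of the second channel can be counted. In the embodiments of this application, when the number of transient blocks of the first channel differs from that of the second channel, it can be determined that the first grouping information and the second grouping information meet the preset condition, and the grouping information needs to be adjusted. When the two numbers are the same, it can be determined that the preset condition is not met, and the grouping information is not adjusted.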
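In yet another implementation of the foregoing solution, some of the M blocks of the first channel are transient blocks and others are non-transient blocks; likewise, the M blocks of the second channel include both transient and non-transient blocks. That the M first transient flags are inconsistent with the M second transient flags means that at least one of the M first transient flags has a different value from the second transient flag with the same index. For example, if block A of the first channel is a transient block, block B of the second channel is a transient block, and block A has the same index as block B, the first transient flag of block A is consistent with the second transient flag of block B. As another example, if block C of the first channel is a non-transient block, block D of the second channel is a transient block, and block C has the same index as block D, the first transient flag of block C is inconsistent with the second transient flag of block D. The N-th block of the M blocks of the first channel and the N-th block of the M blocks of the second channel are both transient, where 0≤N<M, and the N-th block of the first channel has the same index as the N-th block of the second channel. Neither the value of N nor the number of values N takes is limited. For example, if N takes one value, the first channel and the second channel share one transient block index; if N takes two values, they share two transient block indices. In the embodiments of this application, when the M first transient flags are inconsistent with the M second transient flags and the N-th blocks of both channels are transient, it can be determined that the first grouping information and the second grouping information meet the preset condition, and the grouping information needs to be adjusted. When the M first transient flags are completely consistent with the M second transient flags, or when they are inconsistent but the two channels have no transient blocks with the same index, it can be determined that the preset condition is not met, and the grouping information is not adjusted.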
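In a possible implementation, each of the M blocks of the first channel has its own index, and each of the M blocks of the second channel has its own index;
when the inconsistency between the first grouping information and the second grouping information is that the M first transient flags indicate that the M blocks of the first channel include both transient and non-transient blocks, the M second transient flags indicate that the M blocks of the second channel include both transient and non-transient blocks, and the number of transient blocks of the first channel differs from that of the second channel, and if the indices of the transient blocks among the M blocks of the first channel do not intersect the indices of the transient blocks among the M blocks of the second channel, the obtaining first adjusted grouping information and second adjusted grouping information based on the first grouping information and the second grouping information includes:
when the number of transient blocks of the first channel is less than the number of transient blocks of the second channel, adjusting the first grouping information to obtain the first adjusted grouping information, where the number of transient blocks of the first channel indicated by the first adjusted grouping information equals the number of transient blocks of the second channel indicated by the second grouping information;
or,
when the number of transient blocks of the first channel is greater than the number of transient blocks of the second channel, adjusting the second grouping information to obtain the second adjusted grouping information, where the number of transient blocks of the second channel indicated by the second adjusted grouping information equals the number of transient blocks of the first channel indicated by the first grouping information.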
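In the foregoing solution, when the numbers of transient blocks of the two channels differ and the transient block indices of the first channel do not intersect those of the second channel, the grouping information of the channel with fewer transient blocks is adjusted while that of the channel with more transient blocks is kept unchanged, so that after the adjustment the grouping information of both channels indicates the same number of transient blocks. This makes the two channels have the same number of transient blocks, which facilitates subsequent encoding of their spectra. When the first channel has fewer transient blocks than the second channel, the first grouping information is adjusted to obtain the first adjusted grouping information. Specifically, the adjustment may include adjusting the first transient flags of the M blocks, for example, changing the first transient flag of the first block from non-transient to transient, so that the number of transient blocks of the first channel increases until the number of transient blocks of the first channel in the first adjusted grouping information (that is, the adjusted number of transient blocks of the first channel) equals the number of transient blocks of the second channel indicated by the second grouping information. When the first channel has more transient blocks than the second channel, the second grouping information is adjusted to obtain the second adjusted grouping information. Specifically, the adjustment may include adjusting the second transient flags of the M blocks, for example, changing the second transient flag of the second block from non-transient to transient, so that the number of transient blocks of the second channel increases until the number of transient blocks of the second channel in the second adjusted grouping information (that is, the adjusted number of transient blocks of the second channel) equals the number of transient blocks of the first channel indicated by the first grouping information.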
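In a possible implementation, each of the M blocks of the first channel has its own index, and each of the M blocks of the second channel has its own index;
when the inconsistency between the first grouping information and the second grouping information is that the M first transient flags indicate that the M blocks of the first channel include both transient and non-transient blocks, the M second transient flags indicate that the M blocks of the second channel include both transient and non-transient blocks, and the number of transient blocks of the first channel differs from that of the second channel, and if the indices of the transient blocks among the M blocks of the first channel intersect the indices of the transient blocks among the M blocks of the second channel, the obtaining first adjusted grouping information and second adjusted grouping information based on the first grouping information and the second grouping information includes:
when the indices of the transient blocks indicated by the M first transient flags are a subset of the indices of the transient blocks indicated by the M second transient flags, adjusting at least one of the M first transient flags to obtain M first adjusted transient flags, where the indices of all transient blocks indicated by the M first adjusted transient flags are the same as the indices of all transient blocks indicated by the M second transient flags;
when the indices of the transient blocks indicated by the M second transient flags are a subset of the indices of the transient blocks indicated by the M first transient flags, adjusting at least one of the M second transient flags to obtain M second adjusted transient flags, where the indices of all transient blocks indicated by the M second adjusted transient flags are the same as the indices of all transient blocks indicated by the M first transient flags;
when the indices of the transient blocks indicated by the M first transient flags partially overlap the indices of the transient blocks indicated by the M second transient flags, adjusting at least one of the M first transient flags to obtain the M first adjusted transient flags and adjusting at least one of the M second transient flags to obtain the M second adjusted transient flags, where the indices of all transient blocks indicated by the M first adjusted transient flags are the same as the indices of all transient blocks indicated by the M second adjusted transient flags.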
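In one implementation of the foregoing solution, for example, the indices of the transient blocks indicated by the M first transient flags are a subset of those indicated by the M second transient flags, so the first channel has fewer transient blocks than the second channel. In this case, the first transient flags of the M blocks of the first channel need to be adjusted, while the second transient flags of the M blocks of the second channel are kept unchanged: at least one of the M first transient flags is adjusted to obtain the M first adjusted transient flags, and the indices of all transient blocks indicated by the M first adjusted transient flags are the same as those indicated by the M second transient flags. After the adjustment, the grouping information of the two channels indicates the same number of transient blocks, which facilitates subsequent encoding of the spectra of the first channel and the second channel.
In another implementation of the foregoing solution, for example, the indices of the transient blocks indicated by the M second transient flags are a subset of those indicated by the M first transient flags, so the second channel has fewer transient blocks than the first channel. In this case, the second transient flags of the M blocks of the second channel need to be adjusted, while the first transient flags of the M blocks of the first channel are kept unchanged: at least one of the M second transient flags is adjusted to obtain the M second adjusted transient flags, and the indices of all transient blocks indicated by the M second adjusted transient flags are the same as those indicated by the M first transient flags. After the adjustment, the grouping information of the two channels indicates the same number of transient blocks, which facilitates subsequent encoding of the spectra of the first channel and the second channel.
In yet another implementation of the foregoing solution, for example, the number of transient blocks of the second channel is not equal to that of the first channel, and the indices of the transient blocks indicated by the M first transient flags partially overlap those indicated by the M second transient flags. Here, "partially overlap" means that some transient block indices of the first channel coincide with some transient block indices of the second channel, but the two sets are not identical. In this case, the transient flags of the M blocks of both channels need to be adjusted: at least one of the M first transient flags is adjusted to obtain the M first adjusted transient flags, at least one of the M second transient flags is adjusted to obtain the M second adjusted transient flags, and the indices of all transient blocks indicated by the M first adjusted transient flags are the same as those indicated by the M second adjusted transient flags. After the adjustment, the grouping information of the two channels indicates the same number of transient blocks, which facilitates subsequent encoding of the spectra of the first channel and the second channel.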
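In a possible implementation, the adjusting at least one of the M first transient flags to obtain the M first adjusted transient flags includes:
when the first transient flag of the first block indicates that the first block is a non-transient block, if the second transient flag of a third block of the M blocks of the second channel indicates that the third block is a transient block, adjusting the first transient flag of the first block to a first adjusted transient flag of the first block, where the first adjusted transient flag of the first block indicates that the first block is a transient block, and the index of the first block is the same as the index of the third block;
the adjusting at least one of the M second transient flags to obtain the M second adjusted transient flags includes:
when the second transient flag of the second block indicates that the second block is a non-transient block, if the first transient flag of a fourth block of the M blocks of the first channel indicates that the fourth block is a transient block, adjusting the second transient flag of the second block to a second adjusted transient flag of the second block, where the second adjusted transient flag of the second block indicates that the second block is a transient block, and the index of the second block is the same as the index of the fourth block.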
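In the foregoing solution, taking the adjustment of the first transient flags as an example: when the first transient flag of the first block indicates that the first block is a non-transient block, if the second transient flag of the third block of the M blocks of the second channel indicates that the third block is a transient block, the first transient flag of the first block is adjusted to the first adjusted transient flag of the first block, which indicates that the first block is a transient block, where the first block and the third block have the same index. For example, with 0 denoting a transient block and 1 a non-transient block, if the first transient flag of the first block is 1, the second transient flag of the third block is 0, and both blocks have index 4, the first adjusted transient flag of the first block is 0. This adjustment makes the numbers of transient blocks of the first channel and the second channel the same, which facilitates subsequent encoding of their spectra.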
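The per-block rule above (mark a block transient when the block with the same index in the other channel is transient) can be sketched in Python as follows. This is a minimal illustration, not the patent's reference implementation: it assumes transient flags are held as Python booleans (True = transient) rather than the 0/1 bit values used in the text, and the function name `adjust_to_union` is hypothetical. Applying the rule to both channels leaves each channel's transient indices equal to the union of the two original index sets, which covers both the subset case and the partial-overlap case described earlier.

```python
from typing import List, Tuple


def adjust_to_union(first: List[bool], second: List[bool]) -> Tuple[List[bool], List[bool]]:
    """Hypothetical helper: flags are True for a transient block, False otherwise.

    A non-transient block in one channel becomes transient whenever the block
    with the same index in the other channel is transient; transient blocks
    are never changed back to non-transient.
    """
    if len(first) != len(second):
        raise ValueError("both channels must contain the same number M of blocks")
    first_adjusted = [a or b for a, b in zip(first, second)]
    second_adjusted = [b or a for a, b in zip(first, second)]
    return first_adjusted, second_adjusted


# Example: M = 8, the first channel is transient at index 4 only,
# the second channel at indices 4 and 5 (a subset case).
ch1 = [i == 4 for i in range(8)]
ch2 = [i in (4, 5) for i in range(8)]
adj1, adj2 = adjust_to_union(ch1, ch2)
print([i for i, t in enumerate(adj1) if t])  # [4, 5]
print([i for i, t in enumerate(adj2) if t])  # [4, 5]
```

In the subset case the second channel comes back unchanged, so only the channel with fewer transient blocks is actually modified, matching the description above.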
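In a possible implementation, when the first adjusted group count is greater than 1 or the M first adjusted transient flags indicate that the M blocks of the first channel include both transient and non-transient blocks, the obtaining a first to-be-encoded spectrum based on the first adjusted grouping information and the spectra of the M blocks of the first channel includes:
performing grouping and arrangement on the spectra of the M blocks of the first channel based on the first adjusted grouping information, to obtain the first to-be-encoded spectrum;
when the second adjusted group count is greater than 1 or the M second adjusted transient flags indicate that the M blocks of the second channel include both transient and non-transient blocks, the obtaining a second to-be-encoded spectrum based on the second adjusted grouping information and the spectra of the M blocks of the second channel includes:
performing grouping and arrangement on the spectra of the M blocks of the second channel based on the second adjusted grouping information, to obtain the second to-be-encoded spectrum.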
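In the foregoing solution, taking the first adjusted grouping information as an example, after obtaining the first adjusted grouping information of the M blocks, the encoder side may use it to group and arrange the spectra of the M blocks of the current frame, thereby changing the order of the spectra of the M blocks within the current frame. The grouping and arrangement is based on the first adjusted grouping information of the M blocks, which is in turn obtained from the M transient flags of the M blocks; the resulting grouped and arranged spectra therefore use the M transient flags as the basis for ordering, and the arrangement changes the encoding order of the spectra of the M blocks. It should be noted that the M blocks of the current frame may be the M blocks of the first channel of the current frame.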
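In a possible implementation, the performing grouping and arrangement on the spectra of the M blocks of the first channel based on the first adjusted grouping information, to obtain the first to-be-encoded spectrum includes:
assigning, to a first transient group, the spectra of those of the M blocks of the first channel that the first adjusted transient flags of the M blocks indicate as transient blocks, and assigning, to a first non-transient group, the spectra of those of the M blocks of the first channel that the first adjusted transient flags indicate as non-transient blocks; and arranging the spectra of the blocks in the first transient group before the spectra of the blocks in the first non-transient group, to obtain the first to-be-encoded spectrum;
or,
the performing grouping and arrangement on the spectra of the M blocks of the second channel based on the second adjusted grouping information, to obtain the second to-be-encoded spectrum includes:
assigning, to a second transient group, the spectra of those of the M blocks of the second channel that the second adjusted transient flags of the M blocks indicate as transient blocks, and assigning, to a second non-transient group, the spectra of those of the M blocks of the second channel that the second adjusted transient flags indicate as non-transient blocks; and arranging the spectra of the blocks in the second transient group before the spectra of the blocks in the second non-transient group, to obtain the second to-be-encoded spectrum.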
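In the foregoing solution, after obtaining the first adjusted grouping information of the M blocks, the encoder side groups the M blocks according to their transient flags to obtain a transient group and a non-transient group, and then rearranges the positions of the M blocks within the spectrum of the current frame so that the spectra of the blocks in the transient group precede those of the blocks in the non-transient group, yielding the to-be-encoded spectrum. That is, in the to-be-encoded spectrum, the spectra of all transient blocks come before the spectra of the non-transient blocks. The spectra of the transient blocks are thereby moved to positions of higher encoding importance, so that the audio signal reconstructed after neural-network encoding and decoding better preserves transient characteristics. The M blocks of the current frame may be the M blocks of the first channel of the current frame.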
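A minimal sketch of the grouping and arrangement described above, using NumPy. It assumes the spectra of one channel are stored as an (M, L) array with one row per block, that flags are booleans (True = transient), and that blocks keep their original relative order inside each group; the text does not state the within-group order, and the function name `group_arrange` is hypothetical.

```python
import numpy as np


def group_arrange(spectra: np.ndarray, flags):
    """Reorder block spectra so that all transient blocks come first.

    spectra: array of shape (M, L), one row of L coefficients per block.
    flags:   M booleans, True for a transient block (assumed representation).
    Returns the rearranged (M, L) array and the original block index of each row.
    """
    flags = np.asarray(flags, dtype=bool)
    if spectra.shape[0] != flags.size:
        raise ValueError("need exactly one flag per block")
    transient_idx = np.flatnonzero(flags)        # transient group
    non_transient_idx = np.flatnonzero(~flags)   # non-transient group
    order = np.concatenate([transient_idx, non_transient_idx])
    return spectra[order], order


# Example: 4 blocks of 2 coefficients; blocks 1 and 3 are transient.
spec = np.arange(8).reshape(4, 2)
arranged, order = group_arrange(spec, [False, True, False, True])
print(order.tolist())     # [1, 3, 0, 2]
print(arranged.tolist())  # [[2, 3], [6, 7], [0, 1], [4, 5]]
```

When all flags are equal (group count 1), `order` is simply 0 to M-1 and the spectra are left as they are.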
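In a possible implementation, the performing grouping and arrangement on the spectra of the M blocks of the first channel based on the first adjusted grouping information, to obtain the first to-be-encoded spectrum includes:
arranging the spectra of those of the M blocks of the first channel that the first adjusted transient flags of the M blocks indicate as transient blocks before the spectra of those of the M blocks of the first channel that the first adjusted transient flags indicate as non-transient blocks, to obtain the first to-be-encoded spectrum;
or,
the performing grouping and arrangement on the spectra of the M blocks of the second channel based on the second adjusted grouping information, to obtain the second to-be-encoded spectrum includes:
arranging the spectra of those of the M blocks of the second channel that the second adjusted transient flags of the M blocks indicate as transient blocks before the spectra of those of the M blocks of the second channel that the second adjusted transient flags indicate as non-transient blocks, to obtain the second to-be-encoded spectrum.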
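In the foregoing solution, after obtaining the first adjusted grouping information of the M blocks, the encoder side determines the transient flag of each of the M blocks from it and first identifies P transient blocks and Q non-transient blocks among the M blocks, where M=P+Q. The spectra of the blocks indicated as transient by the M first adjusted transient flags are arranged before the spectra of the blocks indicated as non-transient, to obtain the to-be-encoded spectrum. That is, in the to-be-encoded spectrum, the spectra of all transient blocks come before the spectra of the non-transient blocks. The spectra of the transient blocks are thereby moved to positions of higher encoding importance, so that the audio signal reconstructed after neural-network encoding and decoding better preserves transient characteristics. The M blocks of the current frame may be the M blocks of the first channel of the current frame.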
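In a possible implementation, before the encoding the first to-be-encoded spectrum and the second to-be-encoded spectrum by using an encoding neural network, the method further includes:
performing intra-group interleaving on the first to-be-encoded spectrum, to obtain an intra-group interleaved first spectrum;
performing intra-group interleaving on the second to-be-encoded spectrum, to obtain an intra-group interleaved second spectrum;
the encoding the first to-be-encoded spectrum and the second to-be-encoded spectrum by using an encoding neural network includes:
encoding the intra-group interleaved first spectrum and the intra-group interleaved second spectrum by using the encoding neural network.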
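In the foregoing solution, after obtaining the to-be-encoded spectra (for example, the first to-be-encoded spectrum and the second to-be-encoded spectrum), the encoder side may first perform intra-group interleaving according to the grouping of the M blocks of each channel, to obtain intra-group interleaved spectra of the M blocks, which may then serve as the input data of the encoding neural network. The M blocks of the current frame may be the M blocks of the first channel of the current frame. Intra-group interleaving can also reduce the side information to be encoded and improve encoding efficiency.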
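In a possible implementation, P of the M blocks of the first channel are indicated as transient blocks by the M first adjusted transient flags, and Q of the M blocks of the first channel are indicated as non-transient blocks by the M first adjusted transient flags, where M=P+Q;
the performing intra-group interleaving on the first to-be-encoded spectrum includes:
interleaving the spectra of the P blocks, to obtain interleaved spectra of the P blocks;
interleaving the spectra of the Q blocks, to obtain interleaved spectra of the Q blocks.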
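In the foregoing solution, interleaving the spectra of the P blocks includes interleaving the spectra of the P blocks as a whole; likewise, interleaving the spectra of the Q blocks includes interleaving the spectra of the Q blocks as a whole. If the adjusted group count of the M blocks of the first channel is 1, intra-group interleaving is performed on the spectra of all M blocks of the first channel as a single group, to obtain the intra-group interleaved spectra of the M blocks of the first channel.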
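The text does not specify how the spectra within a group are interleaved. The sketch below assumes one common choice, coefficient-wise interleaving, in which coefficient k of every block in a group is placed next to coefficient k of the other blocks in that group; treat it as an illustrative assumption, not the method defined by the patent. The P transient blocks (assumed to occupy the first P rows after grouping and arrangement) and the Q non-transient blocks are interleaved separately, each as a whole, and a single group (P = 0 or P = M) degenerates to interleaving all M blocks together. The de-interleaving functions show the inverse step the decoder side would perform. All function names are hypothetical.

```python
import numpy as np


def interleave_group(group: np.ndarray) -> np.ndarray:
    """(G, L) block spectra -> 1-D array ordered coefficient by coefficient."""
    return group.T.reshape(-1)


def deinterleave_group(flat: np.ndarray, num_blocks: int) -> np.ndarray:
    """Inverse of interleave_group: 1-D array of G*L values -> (G, L)."""
    return flat.reshape(-1, num_blocks).T


def intra_group_interleave(arranged: np.ndarray, num_transient: int) -> np.ndarray:
    """arranged: (M, L) spectra with the P transient blocks in the first P rows."""
    groups = (arranged[:num_transient], arranged[num_transient:])
    return np.concatenate([interleave_group(g) for g in groups if g.shape[0] > 0])


def intra_group_deinterleave(flat: np.ndarray, num_transient: int, num_blocks: int) -> np.ndarray:
    """Undo intra_group_interleave given P (num_transient) and M (num_blocks)."""
    block_len = flat.size // num_blocks
    split = num_transient * block_len
    parts = ((flat[:split], num_transient), (flat[split:], num_blocks - num_transient))
    return np.concatenate([deinterleave_group(p, g) for p, g in parts if g > 0], axis=0)


# Example: M = 4 blocks of L = 3 coefficients, P = 1 transient block (row 0).
spec = np.arange(12).reshape(4, 3)
flat = intra_group_interleave(spec, num_transient=1)
print(flat.tolist())  # [0, 1, 2, 3, 6, 9, 4, 7, 10, 5, 8, 11]
```

Because the interleaving is a fixed permutation determined only by P and M, no extra side information beyond the grouping information is needed to invert it.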
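In a possible implementation, before the obtaining M first transient flags of M blocks of a first channel of a current frame of a to-be-encoded multi-channel signal based on spectra of the M blocks of the first channel, the method further includes:
obtaining a first window type of the first channel, where the first window type is a short window type or a non-short window type;
obtaining a second window type of the second channel, where the second window type is a short window type or a non-short window type;
performing the step of obtaining the M first transient flags of the M blocks of the first channel based on the spectra of the M blocks of the first channel of the current frame of the to-be-encoded multi-channel signal only when both the first window type and the second window type are the short window type.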
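In the foregoing solution, the encoder side may first determine the window type of the current frame, which may be a short window type or a non-short window type; for example, the encoder side determines the window type based on the current frame of the to-be-encoded multi-channel signal. A short window may also be referred to as a short frame, and a non-short window as a non-short frame. When the window type is the short window type, the foregoing step of obtaining the M first transient flags of the M blocks of the first channel is triggered. In the embodiments of this application, the foregoing encoding solution is performed when the window type of the current frame is the short window type, which enables encoding when the multi-channel signal is a transient signal.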
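In a possible implementation, the method further includes:
encoding the first window type and the second window type to obtain a window type encoding result;
writing the window type encoding result into the bitstream.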
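In the foregoing solution, after obtaining the first window type of the first channel and the second window type of the second channel of the current frame, the encoder side may carry the window types in the bitstream by first encoding them; the encoding scheme used for the window types is not limited here. The resulting window type encoding result can be written into the bitstream, so that the decoder side can obtain it from the bitstream and parse it to obtain the first window type of the first channel and the second window type of the second channel of the current frame. Based on these window types, the decoder side determines whether to continue decoding the bitstream to obtain the first decoded grouping information of the M blocks of the first channel.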
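In a possible implementation, the obtaining M first transient flags of M blocks of a first channel of a current frame of a to-be-encoded multi-channel signal based on spectra of the M blocks of the first channel includes:
obtaining M first spectral energies of the M blocks of the first channel based on the spectra of the M blocks of the first channel;
obtaining a first spectral energy average of the M blocks of the first channel based on the M first spectral energies;
obtaining the M first transient flags based on the M first spectral energies and the first spectral energy average.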
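In the foregoing solution, after obtaining the M spectral energies, the encoder side may average them to obtain the spectral energy average, or may first exclude the largest value or several largest values among the M spectral energies and then average the rest. The spectral energy of each of the M blocks is compared with the spectral energy average to determine how the spectrum of that block changes relative to the spectra of the other blocks, and the M transient flags of the M blocks are obtained accordingly; the transient flag of a block represents the transient characteristic of that block. The M blocks of the current frame may be the M blocks of the first channel of the current frame. In the embodiments of this application, the transient flag of each block can be determined from its spectral energy and the spectral energy average, and the transient flag of a block in turn determines the grouping information of that block.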
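In a possible implementation, when the first spectral energy of the first block is greater than K times the first spectral energy average, the first transient flag of the first block indicates that the first block is a transient block; or
when the first spectral energy of the first block is less than or equal to K times the first spectral energy average, the first transient flag of the first block indicates that the first block is a non-transient block;
where K is a real number greater than or equal to 1.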
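In the foregoing solution, K may take various values, which are not limited here. Taking the determination of the transient flag of the first block among the M blocks as an example: when the spectral energy of the first block is greater than K times the spectral energy average, the spectrum of the first block changes greatly relative to the other blocks, and the transient flag of the first block indicates that it is a transient block. When the spectral energy of the first block is less than or equal to K times the spectral energy average, the spectrum of the first block changes little relative to the other blocks, and the transient flag indicates that it is a non-transient block. The M blocks of the current frame may be the M blocks of the first channel of the current frame. Without limitation, the encoder side may also obtain the M transient flags in other ways, for example, by computing the difference or ratio between the spectral energy of the first block and the spectral energy average and determining the M transient flags from that difference or ratio.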
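A minimal sketch of the energy-based rule above. It assumes that a block's spectral energy is the sum of its squared spectral coefficients (the text does not fix the energy measure) and returns booleans with True meaning transient. The optional `drop_largest` parameter models the variant in which the largest energy value or values are excluded before averaging, and `k` is the threshold multiple K (K ≥ 1); the default K = 2.0 is a hypothetical value, not one taken from the source.

```python
import numpy as np


def transient_flags(spectra: np.ndarray, k: float = 2.0, drop_largest: int = 0):
    """spectra: (M, L) block spectra of one channel. Returns M booleans."""
    if k < 1:
        raise ValueError("K must be a real number >= 1")
    num_blocks = spectra.shape[0]
    if not 0 <= drop_largest < num_blocks:
        raise ValueError("drop_largest must leave at least one energy to average")
    energies = np.sum(np.asarray(spectra, dtype=float) ** 2, axis=1)  # one energy per block
    kept = np.sort(energies)[: num_blocks - drop_largest]             # optionally drop the largest
    mean_energy = kept.mean()
    # Transient when the block energy strictly exceeds K times the average energy.
    return (energies > k * mean_energy).tolist()


# Example: block 2 carries far more energy than the others.
spec = np.array([[1.0, 1.0], [1.0, 1.0], [10.0, 10.0], [1.0, 1.0]])
print(transient_flags(spec))                  # [False, False, True, False]
print(transient_flags(spec, drop_largest=1))  # [False, False, True, False]
```

Excluding the largest energies keeps a single strong onset from inflating the average, which matters when only a moderately louder block should still be flagged as transient.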
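According to a second aspect, an embodiment of this application further provides a multi-channel signal decoding method, including:
obtaining, from a bitstream, first decoded grouping information of M blocks of a first channel of a current frame of a multi-channel signal, where the first decoded grouping information indicates first decoded transient flags of the M blocks of the first channel;
obtaining, from the bitstream, second decoded grouping information of M blocks of a second channel of the current frame, where the second decoded grouping information indicates second decoded transient flags of the M blocks of the second channel;
decoding the bitstream by using a decoding neural network, to obtain decoded spectra of the M blocks of the first channel and decoded spectra of the M blocks of the second channel;
obtaining a first reconstructed signal of the first channel based on the first decoded grouping information and the decoded spectra of the M blocks of the first channel; and
obtaining a second reconstructed signal of the second channel based on the second decoded grouping information and the decoded spectra of the M blocks of the second channel.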
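In the foregoing solution, the first decoded grouping information of the M blocks of the first channel of the current frame of the multi-channel signal is obtained from the bitstream and indicates the first decoded transient flags of those blocks; the second decoded grouping information of the M blocks of the second channel is obtained from the bitstream in the same way. The bitstream is decoded by the decoding neural network to obtain the decoded spectra of the M blocks of the first channel and of the second channel. The first reconstructed signal of the first channel is obtained from the first decoded grouping information and the decoded spectra of the M blocks of the first channel, and the second reconstructed signal of the second channel is obtained likewise. The decoded spectra of the M blocks of the first channel and of the second channel correspond, respectively, to the grouped and arranged spectra of the M blocks of the first channel and of the second channel on the encoder side, so the first and second reconstructed signals can be obtained by using the first and second decoded grouping information. Because decoding and reconstruction take into account blocks with different transient flags in the multi-channel signal, the reconstruction of the multi-channel signal is improved.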
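In a possible implementation, the obtaining a first reconstructed signal of the first channel based on the first decoded grouping information and the decoded spectra of the M blocks of the first channel includes:
when the first decoded grouping information indicates that a first decoded group count of the M blocks of the first channel is greater than 1, performing inverse grouping and arrangement on the decoded spectra of the M blocks of the first channel, to obtain inversely grouped and arranged spectra of the M blocks of the first channel;
obtaining the first reconstructed signal of the first channel based on the inversely grouped and arranged spectra of the M blocks of the first channel;
the obtaining a second reconstructed signal of the second channel based on the second decoded grouping information and the decoded spectra of the M blocks of the second channel includes:
when the second decoded grouping information indicates that a second decoded group count of the M blocks of the second channel is greater than 1, performing inverse grouping and arrangement on the decoded spectra of the M blocks of the second channel, to obtain inversely grouped and arranged spectra of the M blocks of the second channel;
obtaining the second reconstructed signal of the second channel based on the inversely grouped and arranged spectra of the M blocks of the second channel.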
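In the foregoing solution, taking the reconstruction of the first channel as an example, the decoder side obtains the first decoded grouping information of the M blocks and also obtains the decoded spectra of the M blocks of the first channel from the bitstream. Because the encoder side grouped and arranged the spectra of the M blocks of the first channel, the decoder side needs to perform the inverse process: it performs inverse grouping and arrangement on the decoded spectra of the M blocks of the first channel based on the first decoded grouping information, to obtain inversely grouped and arranged spectra of the M blocks of the first channel. This inverse grouping and arrangement is the inverse of the grouping and arrangement on the encoder side. After obtaining the inversely grouped and arranged spectra of the M blocks of the first channel, the decoder side may transform them from the frequency domain to the time domain to obtain the first reconstructed signal of the first channel.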
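In a possible implementation, the obtaining a first reconstructed signal of the first channel based on the first decoded grouping information and the decoded spectra of the M blocks of the first channel includes:
performing intra-group de-interleaving on the decoded spectra of the M blocks of the first channel, to obtain intra-group de-interleaved spectra of the M blocks of the first channel;
obtaining the first reconstructed signal based on the intra-group de-interleaved spectra of the M blocks of the first channel;
the obtaining a second reconstructed signal of the second channel based on the second decoded grouping information and the decoded spectra of the M blocks of the second channel includes:
performing intra-group de-interleaving on the decoded spectra of the M blocks of the second channel, to obtain intra-group de-interleaved spectra of the M blocks of the second channel;
obtaining the second reconstructed signal based on the intra-group de-interleaved spectra of the M blocks of the second channel.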
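In the foregoing solution, the intra-group de-interleaving performed on the decoder side is the inverse of the intra-group interleaving on the encoder side and is not described in detail here.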
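In a possible implementation, P of the M blocks of the first channel are indicated as transient blocks by the M first decoded transient flags, and Q of the M blocks of the first channel are indicated as non-transient blocks by the M first decoded transient flags, where M=P+Q;
the obtaining a first reconstructed signal of the first channel based on the first decoded grouping information and the decoded spectra of the M blocks of the first channel includes:
performing intra-group de-interleaving on the decoded spectra of the P blocks of the first channel and on the decoded spectra of the Q blocks of the first channel, to obtain intra-group de-interleaved spectra of the M blocks of the first channel;
performing inverse grouping and arrangement on the intra-group de-interleaved spectra of the M blocks of the first channel based on the first decoded grouping information, to obtain inversely grouped and arranged spectra of the M blocks of the first channel;
obtaining the first reconstructed signal of the first channel based on the inversely grouped and arranged spectra of the M blocks of the first channel.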
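In the foregoing solution, de-interleaving the spectra of the P blocks includes de-interleaving the spectra of the P blocks as a whole; likewise, de-interleaving the spectra of the Q blocks includes de-interleaving the spectra of the Q blocks as a whole. The encoder side may interleave the transient group and the non-transient group separately, to obtain the interleaved spectra of the P blocks and of the Q blocks, which may serve as the input data of the encoding neural network. Intra-group interleaving can also reduce the side information to be encoded and improve encoding efficiency. Because the encoder side performs intra-group interleaving, the decoder side needs to perform the corresponding inverse process, namely de-interleaving. If the adjusted group count of the M blocks of the first channel is 1, intra-group de-interleaving is performed on the decoded spectra of all M blocks of the first channel as a single group, to obtain the intra-group de-interleaved spectra of the M blocks of the first channel.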
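In a possible implementation, the performing inverse grouping and arrangement on the intra-group de-interleaved spectra of the M blocks of the first channel based on the first decoded grouping information includes:
obtaining indices of the P blocks of the first channel based on the first decoded grouping information;
obtaining indices of the Q blocks of the first channel based on the first decoded grouping information;
performing the inverse grouping and arrangement on the intra-group de-interleaved spectra of the M blocks of the first channel based on the indices of the P blocks and the indices of the Q blocks.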
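In the foregoing solution, before the encoder side groups and arranges the spectra of the M blocks, the indices of the M blocks are consecutive, for example, from 0 to M-1. After the grouping and arrangement, the indices of the M blocks are no longer consecutive. Based on the first decoded grouping information of the M blocks, the decoder side can obtain the indices of the P blocks and of the Q blocks among the reconstructed grouped and arranged M blocks, and the inverse grouping and arrangement restores the M blocks to consecutive index order.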
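Assuming the encoder placed the transient blocks first and kept ascending index order inside each group, the decoder side can recover the P and Q indices from the decoded flags and scatter the rows back to their original positions. The sketch below is illustrative only; the function name and the boolean flag representation (True = transient) are assumptions, not part of the source.

```python
import numpy as np


def inverse_group_arrange(arranged: np.ndarray, decoded_flags) -> np.ndarray:
    """arranged: (M, L) decoded spectra, transient blocks first.
    decoded_flags: M booleans (True = transient) in the original block order.
    """
    flags = np.asarray(decoded_flags, dtype=bool)
    p_indices = np.flatnonzero(flags)    # indices of the P transient blocks
    q_indices = np.flatnonzero(~flags)   # indices of the Q non-transient blocks
    order = np.concatenate([p_indices, q_indices])
    restored = np.empty_like(arranged)
    restored[order] = arranged           # row i of `arranged` goes back to block order[i]
    return restored


# Example: blocks 1 and 3 were transient, so the encoder sent rows in order 1, 3, 0, 2.
original = np.arange(8).reshape(4, 2)
received = original[[1, 3, 0, 2]]
print(inverse_group_arrange(received, [False, True, False, True]).tolist())
# [[0, 1], [2, 3], [4, 5], [6, 7]]
```

The decoder needs no extra side information for this step: the permutation is fully determined by the decoded transient flags.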
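In a possible implementation, the method further includes:
obtaining a first window type of the first channel of the current frame from the bitstream;
obtaining a second window type of the second channel of the current frame from the bitstream;
performing the step of obtaining, from the bitstream, the first decoded grouping information of the M blocks of the first channel of the current frame of the multi-channel signal only when both the first window type and the second window type are the short window type.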
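In the foregoing solution, the encoding solution described above is performed only when both the first window type and the second window type of the current frame are the short window type, which enables encoding when the multi-channel signal is a transient signal. The decoder side performs the inverse of the encoder-side process, so it may also first determine the first window type and the second window type of the current frame, each of which may be a short window type or a non-short window type. For example, the decoder side obtains the window types of the current frame from the bitstream; because the current frame includes the first channel and the second channel, it obtains the first window type of the first channel and the second window type of the second channel.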
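In a possible implementation, the first decoded grouping information includes a first decoded group count of the M blocks of the first channel or a first decoded group count identifier, where the first decoded group count identifier indicates the first decoded group count, and when the first decoded group count is greater than 1, the first decoded grouping information further includes M first decoded transient flags; or the first decoded grouping information includes the M first decoded transient flags;
and/or,
the second decoded grouping information includes a second decoded group count of the M blocks of the second channel or a second decoded group count identifier, where the second decoded group count identifier indicates the second decoded group count, and when the second decoded group count is greater than 1, the second decoded grouping information further includes M second decoded transient flags; or the second decoded grouping information includes the M second decoded transient flags.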
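In the foregoing solution, the encoder side carries the grouping information encoding result in the bitstream, and this result includes the first adjusted grouping information and the second adjusted grouping information. By decoding the bitstream, the decoder side obtains the first decoded grouping information and the second decoded grouping information, which correspond to the first adjusted grouping information and the second adjusted grouping information on the encoder side, respectively. For example, the first decoded grouping information includes the first decoded group count of the M blocks of the first channel or the first decoded group count identifier; the first decoded group count represents the group count or adjusted group count of the first channel, and the first decoded group count identifier indicates that group count or adjusted group count. The M first decoded transient flags indicate the transient flags or adjusted transient flags of the M blocks of the first channel. The second decoded grouping information is analogous to the first decoded grouping information.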
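According to a third aspect, an embodiment of this application further provides a multi-channel signal encoding apparatus, including:
a transient flag obtaining module, configured to obtain M first transient flags of M blocks of a first channel of a current frame of a to-be-encoded multi-channel signal based on spectra of the M blocks of the first channel, where the M blocks of the first channel include a first block of the first channel, and the first transient flag of the first block indicates that the first block is a transient block or that the first block is a non-transient block;
a grouping information obtaining module, configured to obtain first grouping information of the M blocks of the first channel based on the M first transient flags;
the transient flag obtaining module is further configured to obtain M second transient flags of M blocks of a second channel of the current frame based on spectra of the M blocks of the second channel, where the M blocks of the second channel include a second block of the second channel, and the second transient flag of the second block indicates that the second block is a transient block or that the second block is a non-transient block;
the grouping information obtaining module is further configured to obtain second grouping information of the M blocks of the second channel based on the M second transient flags;
a grouping information adjustment module, configured to: when the first grouping information and the second grouping information meet a preset condition, obtain first adjusted grouping information and second adjusted grouping information based on the first grouping information and the second grouping information, where the first adjusted grouping information corresponds to the first grouping information and the second adjusted grouping information corresponds to the second grouping information; and where the first adjusted grouping information is the same as the first grouping information and the second adjusted grouping information is obtained by adjusting the second grouping information; or the first adjusted grouping information is obtained by adjusting the first grouping information and the second adjusted grouping information is the same as the second grouping information; or the first adjusted grouping information is obtained by adjusting the first grouping information and the second adjusted grouping information is obtained by adjusting the second grouping information;
a spectrum obtaining module, configured to obtain a first to-be-encoded spectrum based on the first adjusted grouping information and the spectra of the M blocks of the first channel;
the spectrum obtaining module is further configured to obtain a second to-be-encoded spectrum based on the second adjusted grouping information and the spectra of the M blocks of the second channel;
an encoding module, configured to encode the first to-be-encoded spectrum and the second to-be-encoded spectrum by using an encoding neural network to obtain a spectrum encoding result, and write the spectrum encoding result into a bitstream.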
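In the third aspect of this application, the modules of the multi-channel signal encoding apparatus may further perform the steps described in the first aspect and its possible implementations. For details, see the foregoing descriptions of the first aspect and its possible implementations.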
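According to a fourth aspect, an embodiment of this application further provides a multi-channel signal decoding apparatus, including:
a grouping information obtaining module, configured to obtain, from a bitstream, first decoded grouping information of M blocks of a first channel of a current frame of a multi-channel signal, where the first decoded grouping information indicates first decoded transient flags of the M blocks of the first channel;
the grouping information obtaining module is further configured to obtain, from the bitstream, second decoded grouping information of M blocks of a second channel of the current frame, where the second decoded grouping information indicates second decoded transient flags of the M blocks of the second channel;
a decoding module, configured to decode the bitstream by using a decoding neural network, to obtain decoded spectra of the M blocks of the first channel and decoded spectra of the M blocks of the second channel;
a reconstructed signal obtaining module, configured to obtain a first reconstructed signal of the first channel based on the first decoded grouping information and the decoded spectra of the M blocks of the first channel;
the reconstructed signal obtaining module is further configured to obtain a second reconstructed signal of the second channel based on the second decoded grouping information and the decoded spectra of the M blocks of the second channel.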
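In the fourth aspect of this application, the modules of the multi-channel signal decoding apparatus may further perform the steps described in the second aspect and its possible implementations. For details, see the foregoing descriptions of the second aspect and its possible implementations.
According to a fifth aspect, an embodiment of this application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the method according to the first aspect or the second aspect.
According to a sixth aspect, an embodiment of this application provides a computer program product including instructions that, when run on a computer, cause the computer to perform the method according to the first aspect or the second aspect.
According to a seventh aspect, an embodiment of this application provides a computer-readable storage medium including a bitstream generated by the method according to the first aspect.
According to an eighth aspect, an embodiment of this application provides a communication apparatus, which may include an entity such as a terminal device or a chip. The communication apparatus includes a processor and a memory; the memory is configured to store instructions, and the processor is configured to execute the instructions in the memory, so that the communication apparatus performs the method according to either the first aspect or the second aspect.
According to a ninth aspect, this application provides a chip system. The chip system includes a processor, configured to support a multi-channel signal encoding apparatus or a multi-channel signal decoding apparatus in implementing the functions in the foregoing aspects, for example, sending or processing the data and/or information in the foregoing methods. In a possible design, the chip system further includes a memory, configured to store program instructions and data necessary for the multi-channel signal encoding apparatus or the multi-channel signal decoding apparatus. The chip system may consist of a chip, or may include a chip and other discrete components.
It can be learned from the foregoing technical solutions that the embodiments of this application have the following advantages: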
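In the embodiments of this application, the current frame of the to-be-encoded multi-channel signal includes a first channel and a second channel, and each channel includes the spectra of M blocks. The M first transient flags of the M blocks of the first channel are obtained from the spectra of those blocks, and the first grouping information of the M blocks of the first channel is obtained from the M first transient flags; the second grouping information of the M blocks of the second channel is obtained in the same way. When the first grouping information and the second grouping information meet the preset condition, the first adjusted grouping information and the second adjusted grouping information are obtained from them. Next, the first to-be-encoded spectrum is obtained from the first adjusted grouping information and the spectra of the M blocks of the first channel, and the second to-be-encoded spectrum is obtained in the same way. Finally, the encoding neural network encodes the first and second to-be-encoded spectra to obtain the spectrum encoding result, which is carried in the bitstream. The grouping information of the M blocks of each channel is therefore obtained from the M transient flags of that channel in the current frame; when the grouping information of the channels meets the preset condition, adjusted grouping information is obtained for each channel; and the to-be-encoded spectrum of each channel is obtained from its adjusted grouping information and the spectra of its M blocks. Blocks with different transient flags can thus be grouped, adjusted, and encoded, which improves the encoding quality of the multi-channel signal.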
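In another embodiment of this application, the first decoded grouping information of the M blocks of the first channel of the current frame of the multi-channel signal is obtained from the bitstream and indicates the first decoded transient flags of those blocks; the second decoded grouping information of the M blocks of the second channel is obtained from the bitstream in the same way. The bitstream is decoded by the decoding neural network to obtain the decoded spectra of the M blocks of the first channel and of the second channel. The first reconstructed signal of the first channel is obtained from the first decoded grouping information and the decoded spectra of the M blocks of the first channel, and the second reconstructed signal of the second channel is obtained likewise. The decoded spectra of the M blocks of the first channel and of the second channel correspond, respectively, to the grouped and arranged spectra of the M blocks of the first channel and of the second channel on the encoder side, so the first and second reconstructed signals can be obtained by using the first and second decoded grouping information. Because decoding and reconstruction take into account blocks with different transient flags in the multi-channel signal, the reconstruction of the multi-channel signal is improved.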
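Brief Description of Drawings
FIG. 1 is a schematic structural diagram of an audio processing system according to an embodiment of this application;
FIG. 2a is a schematic diagram in which an audio encoder and an audio decoder according to an embodiment of this application are applied to terminal devices;
FIG. 2b is a schematic diagram in which an audio encoder according to an embodiment of this application is applied to a wireless device or a core network device;
FIG. 2c is a schematic diagram in which an audio decoder according to an embodiment of this application is applied to a wireless device or a core network device;
FIG. 3a is a schematic diagram in which a multi-channel encoder and a multi-channel decoder according to an embodiment of this application are applied to terminal devices;
FIG. 3b is a schematic diagram in which a multi-channel encoder according to an embodiment of this application is applied to a wireless device or a core network device;
FIG. 3c is a schematic diagram in which a multi-channel decoder according to an embodiment of this application is applied to a wireless device or a core network device;
FIG. 4 is a schematic diagram of a multi-channel signal encoding method according to an embodiment of this application;
FIG. 5 is a schematic diagram of a multi-channel signal decoding method according to an embodiment of this application;
FIG. 6 is a schematic diagram of an audio signal encoding and decoding system according to an embodiment of this application;
FIG. 7 is a schematic diagram of a multi-channel signal encoding method according to an embodiment of this application;
FIG. 8 is a schematic diagram of a multi-channel signal decoding method according to an embodiment of this application;
FIG. 9 is a schematic diagram of a multi-channel signal encoding method according to an embodiment of this application;
FIG. 10 is a schematic diagram of a multi-channel signal decoding method according to an embodiment of this application;
FIG. 11 is a schematic diagram of a multi-channel signal encoding method according to an embodiment of this application;
FIG. 12 is a schematic diagram of a multi-channel signal decoding method according to an embodiment of this application;
FIG. 13 is a schematic diagram of a multi-channel signal encoding method according to an embodiment of this application;
FIG. 14 is a schematic diagram of a multi-channel signal decoding method according to an embodiment of this application;
FIG. 15 is a schematic structural diagram of a multi-channel signal encoding apparatus according to an embodiment of this application;
FIG. 16 is a schematic structural diagram of a multi-channel signal decoding apparatus according to an embodiment of this application;
FIG. 17 is a schematic structural diagram of another multi-channel signal encoding apparatus according to an embodiment of this application;
FIG. 18 is a schematic structural diagram of another multi-channel signal decoding apparatus according to an embodiment of this application.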
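Description of Embodiments
The following describes the embodiments of this application with reference to the accompanying drawings.
In the specification, claims, and accompanying drawings of this application, the terms "first", "second", and the like are intended to distinguish between similar objects but do not necessarily describe a specific order or sequence. It should be understood that terms used in this way are interchangeable in appropriate circumstances; this is merely a way of distinguishing objects with the same attributes when describing the embodiments of this application. In addition, the terms "include", "have", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include other units not expressly listed or inherent to such a process, method, product, or device.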
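Sound is a continuous wave produced by the vibration of an object. An object that vibrates and emits sound waves is called a sound source. As sound waves propagate through a medium (such as air, a solid, or a liquid), the auditory organs of humans or animals can perceive the sound.
The characteristics of a sound wave include pitch, sound intensity, and timbre. Pitch indicates how high or low a sound is. Sound intensity indicates how loud a sound is and may also be called loudness or volume; its unit is the decibel (dB). Timbre is also called tone quality.
The frequency of a sound wave determines its pitch: the higher the frequency, the higher the pitch. The number of times an object vibrates per second is called its frequency, measured in hertz (Hz). The human ear can recognize sounds with frequencies between 20 Hz and 20000 Hz.
The amplitude of a sound wave determines the sound intensity: the larger the amplitude, the greater the intensity. The closer to the sound source, the greater the intensity.
The waveform of a sound wave determines its timbre. Sound waveforms include square waves, sawtooth waves, sine waves, pulse waves, and the like.
Based on these characteristics, sounds can be classified into regular sounds and irregular sounds. An irregular sound is produced by a sound source vibrating irregularly; an example is noise that disturbs people's work, study, or rest. A regular sound is produced by a sound source vibrating regularly and includes speech and musical sounds. When represented electrically, a regular sound is an analog signal that varies continuously in the time-frequency domain. This analog signal may be called an audio signal (acoustic signals). An audio signal is an information carrier that carries speech, music, and sound effects.
Because human hearing can distinguish the spatial distribution of sound source positions, a listener hearing sound in space can perceive not only its pitch, intensity, and timbre but also its direction.
Sound can also be classified into mono and stereo. Mono has one sound channel, using one microphone to pick up the sound and one loudspeaker to play it back. Stereo has multiple sound channels, with different channels carrying different sound waveforms.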
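When an audio signal is a transient signal, current encoders do not extract transient features and transmit them in the bitstream. A transient feature represents how the spectra of adjacent blocks change within a transient frame of the audio signal. As a result, when the decoder side reconstructs the signal, it cannot obtain the transient features of the reconstructed audio signal from the bitstream, and the reconstructed audio signal is poor.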
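The embodiments of this application provide an audio processing technology, in particular an audio encoding technology for multi-channel signals, to improve conventional audio encoding systems. A multi-channel signal is an audio signal that includes multiple channels; for example, it may be a stereo signal. Audio processing includes audio encoding and audio decoding. Audio encoding is performed on the source side and includes encoding (for example, compressing) the original audio to reduce the amount of data needed to represent it, for more efficient storage and/or transmission. Audio decoding is performed on the destination side and includes processing inverse to that of the encoder to reconstruct the original audio. The encoding part and the decoding part are also collectively referred to as coding (codec). The following describes implementations of the embodiments of this application in detail with reference to the accompanying drawings.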
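The technical solutions in the embodiments of this application can be applied to various audio processing systems. FIG. 1 is a schematic structural diagram of an audio processing system according to an embodiment of this application. The audio processing system 100 may include a multi-channel signal encoding apparatus 101 and a multi-channel signal decoding apparatus 102. The multi-channel signal encoding apparatus 101, also called an audio encoding apparatus, may be configured to generate a bitstream, which can then be transmitted through an audio transmission channel to the multi-channel signal decoding apparatus 102. The multi-channel signal decoding apparatus 102, also called an audio decoding apparatus, receives the bitstream, performs its audio decoding function, and finally obtains the reconstructed signal.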
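In the embodiments of this application, the multi-channel signal encoding apparatus can be applied to various terminal devices that require audio communication, and to wireless devices and core network devices that require transcoding; for example, it may be the audio encoder of such a terminal device, wireless device, or core network device. Likewise, the multi-channel signal decoding apparatus can be applied to the same kinds of devices; for example, it may be the audio decoder of such a terminal device, wireless device, or core network device. For example, the audio encoder may be deployed in a radio access network, a media gateway of a core network, a transcoding device, a media resource server, a mobile terminal, a fixed network terminal, and the like; the audio encoder may also be an audio encoder used in a virtual reality (VR) streaming service.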
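In the embodiments of this application, taking audio encoding and decoding modules (audio encoding and audio decoding) used in a virtual reality streaming (VR streaming) service as an example, the end-to-end audio encoding and decoding process is as follows. Audio signal A passes through an acquisition module (acquisition) and then undergoes audio preprocessing (audio preprocessing), which includes filtering out the low-frequency part of the signal, for example with 20 Hz or 50 Hz as the cutoff, and extracting direction information from the signal. The signal is then encoded (audio encoding), encapsulated (file/segment encapsulation), and delivered (delivery) to the decoder side. The decoder side first decapsulates (file/segment decapsulation), then decodes (audio decoding), and performs binaural rendering (audio rendering) on the decoded signal. The rendered signal is mapped to the listener's headphones (headphones), which may be standalone headphones or headphones on a glasses device.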
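FIG. 2a is a schematic diagram in which an audio encoder and an audio decoder according to an embodiment of this application are applied to terminal devices. Each terminal device may include an audio encoder, a channel encoder, an audio decoder, and a channel decoder. Specifically, the channel encoder is configured to perform channel encoding on the audio signal, and the channel decoder is configured to perform channel decoding on the audio signal. For example, a first terminal device 20 may include a first audio encoder 201, a first channel encoder 202, a first audio decoder 203, and a first channel decoder 204. A second terminal device 21 may include a second audio decoder 211, a second channel decoder 212, a second audio encoder 213, and a second channel encoder 214. The first terminal device 20 is connected to a wireless or wired first network communication device 22, the first network communication device 22 is connected to a wireless or wired second network communication device 23 through a digital channel, and the second terminal device 21 is connected to the wireless or wired second network communication device 23. The wireless or wired network communication devices may generally refer to signal transmission devices, such as communication base stations and data switching devices.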
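In audio communication, a terminal device acting as the transmit end first captures audio, performs audio encoding on the captured audio signal, then performs channel encoding, and transmits the result over a digital channel through a wireless network or a core network. A terminal device acting as the receive end performs channel decoding on the received signal to obtain the bitstream, recovers the audio signal through audio decoding, and plays it back.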
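FIG. 2b is a schematic diagram in which an audio encoder according to an embodiment of this application is applied to a wireless device or a core network device. The wireless device or core network device 25 includes a channel decoder 251, another audio decoder 252, the audio encoder 253 provided in the embodiments of this application, and a channel encoder 254, where the other audio decoder 252 is an audio decoder other than the audio decoder provided in the embodiments of this application. In the wireless device or core network device 25, the channel decoder 251 first performs channel decoding on the signal entering the device, the other audio decoder 252 then performs audio decoding on the bitstream output by the channel decoder 251, the audio encoder 253 provided in the embodiments of this application then performs audio encoding, and finally the channel encoder 254 performs channel encoding on the audio signal before it is transmitted.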
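FIG. 2c is a schematic diagram in which an audio decoder according to an embodiment of this application is applied to a wireless device or a core network device. The wireless device or core network device 25 includes a channel decoder 251, the audio decoder 255 provided in the embodiments of this application, another audio encoder 256, and a channel encoder 254, where the other audio encoder 256 is an audio encoder other than the audio encoder provided in the embodiments of this application. In the wireless device or core network device 25, the channel decoder 251 first performs channel decoding on the signal entering the device, the audio decoder 255 then decodes the received audio bitstream, the other audio encoder 256 then performs audio encoding, and finally the channel encoder 254 performs channel encoding on the audio signal before it is transmitted. If transcoding is required in a wireless device or core network device, corresponding audio encoding processing needs to be performed. Here, a wireless device is a radio-frequency-related device in communication, and a core network device is a core-network-related device in communication.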
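In some embodiments of this application, the multi-channel signal encoding apparatus can be applied to various terminal devices that require audio communication, and to wireless devices and core network devices that require transcoding; for example, it may be the multi-channel encoder of such a terminal device, wireless device, or core network device. Likewise, the multi-channel signal decoding apparatus can be applied to the same kinds of devices; for example, it may be the multi-channel decoder of such a terminal device, wireless device, or core network device.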
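FIG. 3a is a schematic diagram in which a multi-channel encoder and a multi-channel decoder according to an embodiment of this application are applied to terminal devices. Each terminal device may include a multi-channel encoder, a channel encoder, a multi-channel decoder, and a channel decoder. The multi-channel encoder can perform the audio encoding method provided in the embodiments of this application, and the multi-channel decoder can perform the audio decoding method provided in the embodiments of this application. Specifically, the channel encoder is configured to perform channel encoding on the multi-channel signal, and the channel decoder is configured to perform channel decoding on the multi-channel signal. For example, a first terminal device 30 may include a first multi-channel encoder 301, a first channel encoder 302, a first multi-channel decoder 303, and a first channel decoder 304. A second terminal device 31 may include a second multi-channel decoder 311, a second channel decoder 312, a second multi-channel encoder 313, and a second channel encoder 314. The first terminal device 30 is connected to a wireless or wired first network communication device 32, the first network communication device 32 is connected to a wireless or wired second network communication device 33 through a digital channel, and the second terminal device 31 is connected to the wireless or wired second network communication device 33. The wireless or wired network communication devices may generally refer to signal transmission devices, such as communication base stations and data switching devices. In audio communication, the terminal device acting as the transmit end performs multi-channel encoding on the captured multi-channel signal, then performs channel encoding, and transmits the result over a digital channel through a wireless network or a core network. The terminal device acting as the receive end performs channel decoding on the received signal to obtain the multi-channel signal bitstream, recovers the multi-channel signal through multi-channel decoding, and plays it back.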
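FIG. 3b is a schematic diagram in which a multi-channel encoder according to an embodiment of this application is applied to a wireless device or a core network device. The wireless device or core network device 35 includes a channel decoder 351, another audio decoder 352, a multi-channel encoder 353, and a channel encoder 354. This is similar to FIG. 2b and is not described again here.
FIG. 3c is a schematic diagram in which a multi-channel decoder according to an embodiment of this application is applied to a wireless device or a core network device. The wireless device or core network device 35 includes a channel decoder 351, a multi-channel decoder 355, another audio encoder 356, and a channel encoder 354. This is similar to FIG. 2c and is not described again here.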
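Audio encoding processing may be part of a multi-channel encoder, and audio decoding processing may be part of a multi-channel decoder. For example, performing multi-channel encoding on the captured multi-channel signal may consist of processing the captured multi-channel signal to obtain an audio signal and then encoding that audio signal according to the method provided in the embodiments of this application; the decoder side decodes the multi-channel signal bitstream to obtain the audio signal and recovers the multi-channel signal through upmixing. Therefore, the embodiments of this application can also be applied to multi-channel encoders and multi-channel decoders in terminal devices, wireless devices, and core network devices. If transcoding is required in a wireless device or core network device, corresponding multi-channel encoding processing needs to be performed.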
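The following first describes a multi-channel signal encoding method provided in an embodiment of this application. The method may be performed by a terminal device, for example, a multi-channel signal encoding apparatus (hereinafter referred to as the encoder side or encoder; for example, the encoder side may be an artificial intelligence (AI) encoder). In the embodiments of this application, a multi-channel signal may include multiple channels, for example a first channel and a second channel, or a first channel, a second channel, a third channel, and so on. The following embodiments focus on the encoding process of the first channel; the encoding processes of the other channels follow the same approach and are not described separately for each channel. As shown in FIG. 4, the encoding process performed by the encoder side in an embodiment of this application is as follows: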
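401. Obtain M first transient flags of M blocks of a first channel of a current frame of a to-be-encoded multi-channel signal based on spectra of the M blocks of the first channel, where the M blocks of the first channel include a first block of the first channel, and the first transient flag of the first block indicates that the first block is a transient block or that the first block is a non-transient block.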
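The encoder side first obtains the to-be-encoded multi-channel signal and divides it into frames to obtain the current frame. The following embodiments use the encoding of the current frame as an example; other frames of the to-be-encoded multi-channel signal are encoded similarly. The current frame of the to-be-encoded multi-channel signal includes a first channel and a second channel, and each channel includes the spectra of M blocks. For example, the first channel may be a left channel and the second channel a right channel; alternatively, the first channel and the second channel may be any two of multiple channels, or may be two channel signals derived from the multi-channel signal. Without limitation, in the embodiments of this application the current frame may also include three or more channels. Obtaining transient flags, obtaining grouping information, and grouping and arrangement are performed similarly for the first channel and the second channel; the following embodiments describe only the processing of the first channel, and the processing of the second channel follows the same approach and is not repeated.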
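After determining the current frame, the encoder side windows it and performs a time-frequency transform. If the current frame includes M blocks, the spectra of the M blocks of the current frame can be obtained, where M is the number of blocks in the current frame; the value of M is not limited in the embodiments of this application. For example, the audio signal of the current frame is divided into blocks to obtain the audio signals of M blocks, where the audio signal of a block has the same length as the window function used to window it; the audio signals of the M blocks are then windowed and transformed to the frequency domain to obtain the spectra of the M blocks. For example, the encoder side performs a time-frequency transform on the windowed audio signals of the M blocks of the current frame to obtain modified discrete cosine transform (MDCT) spectra of the M blocks. The following embodiments take MDCT spectra as an example; without limitation, the spectra of the M blocks may also be other spectra. The M blocks of the current frame may be the M blocks of the first channel of the current frame.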
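The blocking, windowing, and MDCT step can be sketched as below. This is a textbook direct-form MDCT, not code from the source; the sine window, the 50% overlap between consecutive blocks (so a window of 2L samples yields L coefficients and the hop is L), and the function names are assumptions made for illustration. A production encoder would normally use an FFT-based MDCT instead of the O(L^2) matrix product shown here.

```python
import numpy as np


def mdct(block: np.ndarray) -> np.ndarray:
    """Direct-form MDCT: 2L windowed samples -> L coefficients."""
    two_l = block.size
    half = two_l // 2
    n = np.arange(two_l)
    k = np.arange(half)
    basis = np.cos(np.pi / half * (n[None, :] + 0.5 + half / 2) * (k[:, None] + 0.5))
    return basis @ block


def block_spectra(frame: np.ndarray, num_blocks: int, hop: int) -> np.ndarray:
    """Split a frame into M overlapping blocks, window them, and return (M, hop) MDCT spectra."""
    win_len = 2 * hop
    needed = (num_blocks - 1) * hop + win_len
    if frame.size < needed:
        raise ValueError(f"need at least {needed} samples for {num_blocks} blocks")
    window = np.sin(np.pi * (np.arange(win_len) + 0.5) / win_len)  # sine window (assumed)
    rows = [mdct(window * frame[m * hop : m * hop + win_len]) for m in range(num_blocks)]
    return np.stack(rows)


# Example: M = 8 short blocks with L = 64 coefficients each.
rng = np.random.default_rng(0)
frame = rng.standard_normal(9 * 64)
spectra = block_spectra(frame, num_blocks=8, hop=64)
print(spectra.shape)  # (8, 64)
```

The resulting (M, L) array is the per-channel input assumed by the transient-flag and grouping sketches elsewhere in this description.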
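After obtaining the spectra of the M blocks, the encoder side obtains the M transient flags of the M blocks from those spectra. The spectrum of each block is used to determine that block's transient flag, so each block corresponds to one transient flag, and the transient flag of a block indicates how the spectrum of that block changes relative to the M blocks. For example, if one of the M blocks is the first block, the first block corresponds to one transient flag. The M blocks of the current frame may be the M blocks of the first channel of the current frame. As another example, if the M blocks of the first channel include a fourth block, the fourth block has a different index from the first block.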
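In some embodiments of this application, the transient flag can take values in several ways; for example, a transient flag may indicate that the first block is a transient block or that it is a non-transient block. A flag of "transient" means that the spectrum of the block changes considerably relative to the spectra of the other blocks among the M blocks, and a flag of "non-transient" means that it changes little. For example, the transient flag occupies 1 bit: a value of 0 indicates that the corresponding block is a transient block and a value of 1 indicates that it is a non-transient block. Alternatively, a value of 1 indicates a transient block and a value of 0 a non-transient block; this is not limited here.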
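402. Obtain first grouping information of the M blocks of the first channel based on the M first transient flags.
After obtaining the M transient flags of the M blocks, the encoder side uses them to group the M blocks and obtains the first grouping information of the M blocks from them. The first grouping information can represent how the M blocks are grouped, with the M transient flags serving as the basis for grouping; for example, blocks with the same transient flag may be placed in one group, and blocks with different transient flags in different groups. The M blocks of the current frame may be the M blocks of the first channel of the current frame.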
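In some embodiments of this application, the first grouping information includes the first group count of the M blocks of the first channel or the first group count identifier, where the first group count identifier indicates the first group count, and when the first group count is greater than 1, the first grouping information further includes the M first transient flags. Alternatively, the first grouping information includes the M first transient flags; that is, the first grouping information may not include the group count directly, and the group count is instead indicated indirectly by the M first transient flags: when the M first transient flags indicate that the M blocks of the first channel are all transient blocks or all non-transient blocks, the group count is 1, and when they indicate that the M blocks include both transient and non-transient blocks, the group count is 2.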
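The first grouping information of the M blocks can be implemented in multiple ways. It includes the group count of the M blocks or a group count identifier, where the group count identifier indicates the group count, and when the group count is greater than 1, it further includes the M transient flags of the M blocks; alternatively, it includes the M transient flags of the M blocks. The first grouping information indicates how the M blocks are grouped, so the encoder side can use it to group and arrange the spectra of the M blocks. The M blocks of the current frame may be the M blocks of the first channel of the current frame.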
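For example, the first grouping information of the M blocks includes the group count of the M blocks and the transient flags of the M blocks. The transient flags of the M blocks may also be called grouping flag information, so in the embodiments of this application the grouping information may include the group count and the grouping flag information. For example, the group count may be 1 or 2, and the grouping flag information indicates the transient flags of the M blocks.
For example, the first grouping information of the M blocks includes the transient flags of the M blocks, which may also be called grouping flag information, so in the embodiments of this application the grouping information may include only the grouping flag information, which indicates the transient flags of the M blocks.
For example, the first grouping information of the M blocks includes a group count of 1; that is, when the group count equals 1, the first grouping information does not include the M transient flags, and when the group count is greater than 1, it further includes the M transient flags of the M blocks.
As another example, the group count in the first grouping information may be replaced with a group count identifier indicating the group count; for example, a group count identifier of 0 indicates a group count of 1, and a group count identifier of 1 indicates a group count of 2.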
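403. Obtain M second transient flags of M blocks of a second channel of the current frame based on spectra of the M blocks of the second channel, where the M blocks of the second channel include a second block of the second channel, and the second transient flag of the second block indicates that the second block is a transient block or that the second block is a non-transient block.
404. Obtain second grouping information of the M blocks of the second channel based on the M second transient flags.
Steps 403 and 404 are implemented similarly to steps 401 and 402 and are not described again here.
After obtaining the spectra of the M blocks of the second channel of the current frame, the encoder side obtains the M transient flags of the M blocks from those spectra. The spectrum of each block is used to determine that block's transient flag, so each block corresponds to one transient flag, and the transient flag of a block indicates how the spectrum of that block changes relative to the M blocks. For example, if one of the M blocks is the second block, the second block corresponds to one transient flag. As another example, if the M blocks of the second channel include a third block, the third block has a different index from the second block.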
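405. When the first grouping information and the second grouping information meet a preset condition, obtain first adjusted grouping information and second adjusted grouping information based on the first grouping information and the second grouping information, where the first adjusted grouping information corresponds to the first grouping information and the second adjusted grouping information corresponds to the second grouping information.
The first adjusted grouping information is the same as the first grouping information and the second adjusted grouping information is obtained by adjusting the second grouping information; or the first adjusted grouping information is obtained by adjusting the first grouping information and the second adjusted grouping information is the same as the second grouping information; or the first adjusted grouping information is obtained by adjusting the first grouping information and the second adjusted grouping information is obtained by adjusting the second grouping information.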
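In some embodiments of this application, the first grouping information includes a first group count of the M blocks of the first channel or a first group count identifier, where the first group count identifier indicates the first group count, and when the first group count is greater than 1, the first grouping information further includes the M first transient flags; or the first grouping information includes the M first transient flags;
and/or,
the second grouping information includes a second group count of the M blocks of the second channel or a second group count identifier, where the second group count identifier indicates the second group count, and when the second group count is greater than 1, the second grouping information further includes the M second transient flags; or the second grouping information includes the M second transient flags;
and/or,
the first adjusted grouping information includes a first adjusted group count of the M blocks of the first channel or a first adjusted group count identifier, where the first adjusted group count identifier indicates the first adjusted group count, and when the first adjusted group count is greater than 1, the first adjusted grouping information further includes M first adjusted transient flags of the M blocks of the first channel, where the first adjusted transient flag of the first block is either different from or the same as the first transient flag of the first block; or the first adjusted grouping information includes the M first adjusted transient flags;
and/or,
the second adjusted grouping information includes a second adjusted group count of the M blocks of the second channel or a second adjusted group count identifier, where the second adjusted group count identifier indicates the second adjusted group count, and when the second adjusted group count is greater than 1, the second adjusted grouping information further includes M second adjusted transient flags of the M blocks of the second channel, where the second adjusted transient flag of the second block is either different from or the same as the second transient flag of the second block; or the second adjusted grouping information includes the M second adjusted transient flags.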
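Specifically, the first grouping information, the second grouping information, the first adjusted grouping information, and the second adjusted grouping information may each be implemented in any of the foregoing ways of implementing grouping information; this is not limited here.
It should be noted that the first adjusted grouping information may be the same as or different from the first grouping information; for details, see the foregoing description of the two. The first grouping information includes the first group count of the M blocks of the first channel or the first group count identifier, and the first adjusted grouping information includes the first adjusted group count of the M blocks of the first channel or the first adjusted group count identifier. When the first grouping information is not adjusted, the first group count equals the first adjusted group count, and the first group count identifier equals the first adjusted group count identifier. When the first grouping information is adjusted, the first group count and the first adjusted group count may be the same or different: if the adjustment does not change the group count, they are the same; if it does, they differ. For example, the first group count is 2 before the adjustment and the first adjusted group count is 1 after it. Likewise, when the first grouping information is adjusted, the first group count identifier and the first adjusted group count identifier may be the same or different. For example, if the first group count is 2 and the first group count identifier is 1 before the adjustment, and the first adjusted group count is still 2 after the adjustment, the first adjusted group count identifier is still 1. Similarly, the second adjusted grouping information may be the same as or different from the second grouping information, which is not described again here.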
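In one implementation, the number of transient blocks among the M blocks of the first channel indicated by the first adjusted grouping information is the same as the number of transient blocks among the M blocks of the second channel indicated by the second adjusted grouping information. In this case, the positions (indices) of the transient blocks among the M blocks of the first channel indicated by the first adjusted grouping information may or may not be the same as the positions (indices) of the transient blocks among the M blocks of the second channel indicated by the second adjusted grouping information.
In another implementation, the number of transient blocks among the M blocks of the first channel indicated by the first adjusted grouping information is the same as the number of transient blocks among the M blocks of the second channel indicated by the second adjusted grouping information, and the positions (indices) of those transient blocks are also the same.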
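The current frame includes the first channel and the second channel. If the grouping information of the two channels meets the preset condition, the grouping information needs to be adjusted; the preset condition is determined according to the specific application scenario and is not limited here. By determining whether the first grouping information and the second grouping information meet the preset condition, at least one of them can be adjusted so that the first channel and the second channel have the same number of transient blocks, which facilitates the subsequent encoding operation.
When the first grouping information and the second grouping information meet the preset condition, the encoder side needs to adjust at least one of them to obtain the first adjusted grouping information and the second adjusted grouping information. For example, if only the first grouping information is adjusted, the first adjusted grouping information is obtained by adjusting the first grouping information and the second adjusted grouping information is the same as the second grouping information. As another example, if only the second grouping information is adjusted, the first adjusted grouping information is the same as the first grouping information and the second adjusted grouping information is obtained by adjusting the second grouping information. As yet another example, if both are adjusted, the first adjusted grouping information is obtained by adjusting the first grouping information and the second adjusted grouping information is obtained by adjusting the second grouping information. By adjusting at least one of the two, the encoder side obtains adjusted grouping information that can be used for grouping and arrangement, yielding the to-be-encoded spectra.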
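In some embodiments of this application, the preset condition includes: the first grouping information is inconsistent with the second grouping information.
Here, "inconsistent" means that the first grouping information and the second grouping information are not completely identical. When they are inconsistent, they can be regarded as meeting the preset condition; when they are consistent, they can be regarded as not meeting it. For example, the group count of the M blocks in the first grouping information may equal that in the second grouping information while the M first transient flags in the first grouping information differ from the M second transient flags in the second grouping information. As another example, the two group counts may differ. The preset condition is determined according to the specific application scenario and is not limited here. The preset condition is used to determine whether to adjust the first grouping information and the second grouping information.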
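In some embodiments of this application, the inconsistency between the first grouping information and the second grouping information can be implemented in multiple ways. For example, that the first grouping information is inconsistent with the second grouping information includes: the M first transient flags indicate that the M blocks of the first channel include both transient and non-transient blocks, the M second transient flags indicate that the M blocks of the second channel include both transient and non-transient blocks, and the M first transient flags are inconsistent with the M second transient flags;
or,
that the first grouping information is inconsistent with the second grouping information includes: the M first transient flags indicate that the M blocks of the first channel include both transient and non-transient blocks, the M second transient flags indicate that the M blocks of the second channel include both transient and non-transient blocks, and the number of transient blocks of the first channel differs from the number of transient blocks of the second channel;
or,
that the first grouping information is inconsistent with the second grouping information includes: the M first transient flags indicate that the M blocks of the first channel include both transient and non-transient blocks, the M second transient flags indicate that the M blocks of the second channel include both transient and non-transient blocks, the M first transient flags are inconsistent with the M second transient flags, and the N-th block of the M blocks of the first channel and the N-th block of the M blocks of the second channel are both transient, where 0≤N<M.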
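In one implementation, some of the M blocks of the first channel are transient blocks and others are non-transient blocks; likewise, the M blocks of the second channel include both transient and non-transient blocks. That the M first transient flags are inconsistent with the M second transient flags means that at least one of the M first transient flags has a different value from the second transient flag with the same index. For example, if block A of the first channel is a transient block, block B of the second channel is a transient block, and the index of block A among the M blocks of the first channel is the same as the index of block B among the M blocks of the second channel, the first transient flag of block A is consistent with the second transient flag of block B. As another example, if block C of the first channel is a non-transient block, block D of the second channel is a transient block, and block C has the same index as block D, the first transient flag of block C is inconsistent with the second transient flag of block D. In the embodiments of this application, when the M first transient flags are inconsistent with the M second transient flags, it can be determined that the first grouping information and the second grouping information meet the preset condition, and the grouping information needs to be adjusted. When the M first transient flags are completely consistent with the M second transient flags, it can be determined that the preset condition is not met, and the grouping information is not adjusted.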
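In another implementation, some of the M blocks of the first channel are transient blocks and others are non-transient blocks, so the number of transient blocks of the first channel can be counted; likewise, the M blocks of the second channel include both transient and non-transient blocks, so the number of transient blocks of the second channel can be counted. In the embodiments of this application, when the number of transient blocks of the first channel differs from that of the second channel, it can be determined that the first grouping information and the second grouping information meet the preset condition, and the grouping information needs to be adjusted. When the two numbers are the same, it can be determined that the preset condition is not met, and the grouping information is not adjusted.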
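In yet another implementation, some of the M blocks of the first channel are transient blocks and others are non-transient blocks; likewise, the M blocks of the second channel include both transient and non-transient blocks. That the M first transient flags are inconsistent with the M second transient flags means that at least one of the M first transient flags has a different value from the second transient flag with the same index. For example, if block A of the first channel is a transient block, block B of the second channel is a transient block, and the index of block A is the same as the index of block B, the first transient flag of block A is consistent with the second transient flag of block B. As another example, if block C of the first channel is a non-transient block, block D of the second channel is a transient block, and block C has the same index as block D, the first transient flag of block C is inconsistent with the second transient flag of block D. The N-th block of the M blocks of the first channel and the N-th block of the M blocks of the second channel are both transient, where 0≤N<M, and the N-th block of the first channel has the same index as the N-th block of the second channel. Neither the value of N nor the number of values N takes is limited. For example, if N takes one value, the first channel and the second channel share one transient block index; if N takes two values, they share two transient block indices. In the embodiments of this application, when the M first transient flags are inconsistent with the M second transient flags and the N-th blocks of both channels are transient, it can be determined that the first grouping information and the second grouping information meet the preset condition, and the grouping information needs to be adjusted. When the M first transient flags are completely consistent with the M second transient flags, or when they are inconsistent but the two channels have no transient blocks with the same index, it can be determined that the preset condition is not met, and the grouping information is not adjusted.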
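The three forms of the preset condition above can be written as simple predicates over the two channels' flags. This is a hedged sketch: it assumes boolean flags (True = transient), and the function names are hypothetical. Which predicate an encoder actually uses is a design choice the text leaves open.

```python
from typing import List


def has_mixed_blocks(flags: List[bool]) -> bool:
    """True when a channel has both transient and non-transient blocks."""
    return any(flags) and not all(flags)


def flags_inconsistent(first: List[bool], second: List[bool]) -> bool:
    """Condition 1: both channels mixed and at least one same-index flag differs."""
    return has_mixed_blocks(first) and has_mixed_blocks(second) and first != second


def counts_differ(first: List[bool], second: List[bool]) -> bool:
    """Condition 2: both channels mixed and their transient-block counts differ."""
    return has_mixed_blocks(first) and has_mixed_blocks(second) and sum(first) != sum(second)


def inconsistent_with_shared_transient(first: List[bool], second: List[bool]) -> bool:
    """Condition 3: condition 1 holds and some N-th block is transient in both channels."""
    shared = any(a and b for a, b in zip(first, second))
    return flags_inconsistent(first, second) and shared


ch1 = [False, True, False, False]
ch2 = [False, True, True, False]
print(flags_inconsistent(ch1, ch2), counts_differ(ch1, ch2),
      inconsistent_with_shared_transient(ch1, ch2))  # True True True
```

For flags [F, T, F, F] and [T, F, F, T], conditions 1 and 2 hold but condition 3 does not, because no index is transient in both channels.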
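Further, in some embodiments of this application, each of the M blocks of the first channel has an index, and each of the M blocks of the second channel has an index.
Consider the case in which the inconsistency between the first and second grouping information is as follows: the M first transient flags indicate that the M blocks of the first channel include transient and non-transient blocks, the M second transient flags indicate that the M blocks of the second channel include transient and non-transient blocks, and the number of transient blocks of the first channel differs from that of the second channel. In this case, if the indexes of the transient blocks among the M blocks of the first channel do not intersect the indexes of the transient blocks among the M blocks of the second channel, obtaining the first adjusted grouping information and the second adjusted grouping information from the first and second grouping information in step 405 includes:
when the first channel has fewer transient blocks than the second channel, adjusting the first grouping information to obtain the first adjusted grouping information, where the number of transient blocks of the first channel indicated by the first adjusted grouping information equals the number of transient blocks of the second channel indicated by the second grouping information;
or,
when the first channel has more transient blocks than the second channel, adjusting the second grouping information to obtain the second adjusted grouping information, where the number of transient blocks of the second channel indicated by the second adjusted grouping information equals the number of transient blocks of the first channel indicated by the first grouping information.
Specifically, the M blocks of the first channel have indexes, for example 0 to M-1; likewise, the M blocks of the second channel have indexes, for example 0 to M-1. The transient-block indexes of the two channels not intersecting means that they are completely different. For example, let the transient flag of a transient block be 0 and that of a non-transient block be 1, and let M = 4. The transient flags of the 4 blocks of the first channel (indexes 0 to 3) are 1011, that is, the flags of blocks 0, 1, 2 and 3 are 1, 0, 1 and 1 respectively. The transient flags of the 4 blocks of the second channel (indexes 0 to 3) are 0110, that is, the flags of blocks 0, 1, 2 and 3 are 0, 1, 1 and 0 respectively. The first channel therefore has one transient block, with index 1, and the second channel has two transient blocks, with indexes 0 and 3, so the transient-block indexes of the two channels do not intersect.
When the two channels have different numbers of transient blocks and their transient-block indexes do not intersect, the grouping information of the channel with fewer transient blocks is adjusted, while that of the channel with more transient blocks remains unchanged. After the adjustment, the grouping information of both channels indicates the same number of transient blocks. Giving the first and second channels the same number of transient blocks in this way facilitates the subsequent encoding of their spectra. Here, "the transient-block indexes do not intersect" means that there is no index at which the blocks of both channels are transient. Taking M = 4 as an example, for each of the indexes 0, 1, 2 and 3, the block of the first channel and the block of the second channel with that index are never both transient. They may both be non-transient, as block 2 is in the example above.
When the first channel has fewer transient blocks than the second channel, the first grouping information is adjusted to obtain the first adjusted grouping information. Specifically, the adjustment may include adjusting the first transient flags of the M blocks. For example, the first transient flag of a first block among the M blocks is changed from non-transient to transient, which increases the number of transient blocks of the first channel. The adjustment continues until the number of transient blocks in the first adjusted grouping information (that is, the adjusted number of transient blocks of the first channel) equals the number of transient blocks of the second channel indicated by the second grouping information.
When the first channel has more transient blocks than the second channel, the second grouping information is adjusted to obtain the second adjusted grouping information. Specifically, the adjustment may include adjusting the second transient flags of the M blocks. For example, the second transient flag of a second block among the M blocks is changed from non-transient to transient, which increases the number of transient blocks of the second channel. The adjustment continues until the number of transient blocks in the second adjusted grouping information (that is, the adjusted number of transient blocks of the second channel) equals the number of transient blocks of the first channel indicated by the first grouping information.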
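Further, in some embodiments of this application, each of the M blocks of the first channel has an index, and each of the M blocks of the second channel has an index.
Consider the case in which the inconsistency between the first and second grouping information is as follows: the M first transient flags indicate that the M blocks of the first channel include transient and non-transient blocks, the M second transient flags indicate that the M blocks of the second channel include transient and non-transient blocks, and the number of transient blocks of the first channel differs from that of the second channel. In this case, if the indexes of the transient blocks among the M blocks of the first channel intersect the indexes of the transient blocks among the M blocks of the second channel, obtaining the first adjusted grouping information and the second adjusted grouping information from the first and second grouping information in step 405 includes:
when the indexes of the transient blocks indicated by the M first transient flags are a subset of the indexes of the transient blocks indicated by the M second transient flags, adjusting at least one of the M first transient flags to obtain M first adjusted transient flags, where the indexes of all transient blocks indicated by the M first adjusted transient flags are the same as those indicated by the M second transient flags;
when the indexes of the transient blocks indicated by the M second transient flags are a subset of the indexes of the transient blocks indicated by the M first transient flags, adjusting at least one of the M second transient flags to obtain M second adjusted transient flags, where the indexes of all transient blocks indicated by the M second adjusted transient flags are the same as those indicated by the M first transient flags;
when the indexes of the transient blocks indicated by the M first transient flags partially coincide with those indicated by the M second transient flags, adjusting at least one of the M first transient flags to obtain M first adjusted transient flags and adjusting at least one of the M second transient flags to obtain M second adjusted transient flags, where the indexes of all transient blocks indicated by the M first adjusted transient flags are the same as those indicated by the M second adjusted transient flags.
Specifically, the M blocks of the first channel have indexes, for example 0 to M-1; likewise, the M blocks of the second channel have indexes, for example 0 to M-1. The transient-block indexes of the two channels intersecting means that they are partly, but not completely, the same. For example, let the transient flag of a transient block be 0 and that of a non-transient block be 1, and let M = 4. The transient flags of the 4 blocks of the first channel are 0011, and those of the second channel are 0111. The first channel then has two transient blocks, with indexes 0 and 1, and the second channel has one transient block, with index 0. Index 0 belongs to a transient block in each channel, so the transient-block indexes of the two channels intersect.
The case in which the transient-block indexes of the two channels intersect can be handled in several ways.
In one implementation, the first channel has fewer transient blocks than the second channel, and the indexes of the transient blocks indicated by the M first transient flags are a subset of those indicated by the M second transient flags. In this case, the first transient flags of the M blocks of the first channel are adjusted and the second transient flags of the M blocks of the second channel remain unchanged. At least one of the M first transient flags is adjusted to obtain M first adjusted transient flags, and the indexes of all transient blocks indicated by the M first adjusted transient flags are the same as those indicated by the M second transient flags. After the adjustment, the grouping information of both channels indicates the same number of transient blocks, which facilitates the subsequent encoding of the spectra of the first and second channels.
In another implementation, the second channel has fewer transient blocks than the first channel, and the indexes of the transient blocks indicated by the M second transient flags are a subset of those indicated by the M first transient flags. In this case, the second transient flags of the M blocks of the second channel are adjusted and the first transient flags of the M blocks of the first channel remain unchanged. At least one of the M second transient flags is adjusted to obtain M second adjusted transient flags, and the indexes of all transient blocks indicated by the M second adjusted transient flags are the same as those indicated by the M first transient flags. After the adjustment, the grouping information of both channels indicates the same number of transient blocks, which facilitates the subsequent encoding of the spectra of the first and second channels.
In yet another implementation, the two channels have different numbers of transient blocks, and the transient-block indexes indicated by the M first transient flags only partially coincide with those indicated by the M second transient flags. That is, some transient-block indexes of the first channel coincide with transient-block indexes of the second channel, but the two sets are not identical and neither set contains the other. In this case, both the first transient flags of the first channel and the second transient flags of the second channel need to be adjusted. At least one of the M first transient flags is adjusted to obtain M first adjusted transient flags, and at least one of the M second transient flags is adjusted to obtain M second adjusted transient flags. The indexes of all transient blocks indicated by the M first adjusted transient flags are then the same as those indicated by the M second adjusted transient flags. After the adjustment, the grouping information of both channels indicates the same number of transient blocks, which facilitates the subsequent encoding of the spectra of the first and second channels.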
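The following describes how the transient flags are adjusted in the embodiments of this application. For example, adjusting at least one of the M first transient flags to obtain the M first adjusted transient flags includes:
when the first transient flag of a first block indicates that the first block is a non-transient block, and the second transient flag of a third block among the M blocks of the second channel indicates that the third block is a transient block, changing the first transient flag of the first block into a first adjusted transient flag that indicates that the first block is a transient block, where the first block and the third block have the same index;
adjusting at least one of the M second transient flags to obtain the M second adjusted transient flags includes:
when the second transient flag of a second block indicates that the second block is a non-transient block, and the first transient flag of a fourth block among the M blocks of the first channel indicates that the fourth block is a transient block, changing the second transient flag of the second block into a second adjusted transient flag that indicates that the second block is a transient block, where the second block and the fourth block have the same index.
The M first transient flags are adjusted in the same way as the M second transient flags, so the first transient flags are used as the example. When the first transient flag of the first block indicates a non-transient block, and the second transient flag of the third block of the second channel, which has the same index, indicates a transient block, the first transient flag of the first block is changed into a first adjusted transient flag indicating that the first block is a transient block. For example, if the first transient flag of the first block is 1, the second transient flag of the third block is 0, and both blocks have index 4, then the first adjusted transient flag of the first block is 0. This adjustment gives the first and second channels the same number of transient blocks, which facilitates the subsequent encoding of their spectra.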
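When this rule is applied at every index, a block becomes transient whenever the block with the same index in the other channel is transient. The following is a minimal Python sketch. It assumes 0 = transient and 1 = non-transient; under that convention the rule reduces to a per-index bitwise AND of the two flags. The function name is hypothetical. The sketch targets the overlapping-index cases, where both channels end up with identical transient-block indexes.

```python
TRANSIENT = 0  # assumed convention: 0 = transient block, 1 = non-transient block


def union_adjust(first: list[int], second: list[int]) -> tuple[list[int], list[int]]:
    """Promote a non-transient block to transient when the same-index block of
    the other channel is transient. With 0 = transient, this is a per-index AND."""
    merged = [a & b for a, b in zip(first, second)]
    return merged, list(merged)
```

For the M = 4 example above (0011 and 0111), block 1 of the second channel is promoted. Both channels then have transient blocks at indexes 0 and 1, and the first channel is left unchanged.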
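In some embodiments of this application, the method performed by the encoder further includes:
A1. Encoding the first adjusted grouping information and the second adjusted grouping information to obtain a grouping information encoding result.
A2. Writing the grouping information encoding result into the bitstream.
After obtaining the first and second adjusted grouping information, the encoder encodes them to obtain the grouping information encoding result. The coding method used for the adjusted grouping information is not limited here. The grouping information encoding result can be written into the bitstream so that the bitstream carries it. The decoder can then obtain the grouping information encoding result by parsing the bitstream and decode it into the first adjusted grouping information and the second adjusted grouping information.
It should be noted that step A2 and the subsequent step 409 have no fixed order. Step 409 may be performed before step A2, step A2 may be performed before step 409, or the two may be performed simultaneously. This is not limited here.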
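406. Obtain a first to-be-encoded spectrum according to the first adjusted grouping information and the spectra of the M blocks of the first channel.
The first to-be-encoded spectrum is the to-be-encoded spectrum of the first channel of the current frame. It can also be called the group-rearranged spectra of the M blocks of the first channel.
Take the first adjusted grouping information as an example. After obtaining the first adjusted grouping information of the M blocks, the encoder can use it to process the spectra of the M blocks of the current frame. The first adjusted grouping information can be used to adjust the order in which the spectra of the M blocks are arranged in the current frame, and the first to-be-encoded spectrum is generated from it.
In some embodiments of this application, when the first adjusted number of groups is greater than 1, or the M first adjusted transient flags indicate that the M blocks of the first channel include both transient and non-transient blocks, obtaining the first to-be-encoded spectrum according to the first adjusted grouping information and the spectra of the M blocks of the first channel includes:
performing group rearrangement on the spectra of the M blocks of the first channel according to the first adjusted grouping information to obtain the first to-be-encoded spectrum.
After obtaining the first adjusted grouping information of the M blocks, the encoder can use it to rearrange the spectra of the M blocks of the current frame by group, which changes the order in which the spectra of the M blocks are arranged in the current frame. The rearrangement is based on the first adjusted grouping information of the M blocks, which is obtained from the M transient flags of the M blocks. The rearranged spectra of the M blocks are therefore sorted by group using the M transient flags as the criterion, and this sorting changes the order in which the spectra of the M blocks are encoded. Note that the M blocks of the current frame here may be the M blocks of the first channel of the current frame.
407. Obtain a second to-be-encoded spectrum according to the second adjusted grouping information and the spectra of the M blocks of the second channel.
The second to-be-encoded spectrum is the to-be-encoded spectrum of the second channel of the current frame. It can also be called the group-rearranged spectra of the M blocks of the second channel.
In some embodiments of this application, when the second adjusted number of groups is greater than 1, or the M second adjusted transient flags indicate that the M blocks of the second channel include both transient and non-transient blocks, obtaining the second to-be-encoded spectrum according to the second adjusted grouping information and the spectra of the M blocks of the second channel includes:
performing group rearrangement on the spectra of the M blocks of the second channel according to the second adjusted grouping information to obtain the second to-be-encoded spectrum.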
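In some embodiments of this application, performing group rearrangement on the spectra of the M blocks of the first channel according to the first adjusted grouping information to obtain the first to-be-encoded spectrum includes:
B1. Assigning the spectra of the blocks of the first channel that the first adjusted transient flags indicate as transient blocks to a first transient group, and assigning the spectra of the blocks of the first channel that the first adjusted transient flags indicate as non-transient blocks to a first non-transient group; and placing the spectra of the blocks in the first transient group before the spectra of the blocks in the first non-transient group to obtain the first to-be-encoded spectrum.
After obtaining the first adjusted grouping information of the M blocks, the encoder groups the M blocks by transient flag into a transient group and a non-transient group. It then rearranges the positions of the M blocks within the spectrum of the current frame so that the spectra of the blocks in the transient group precede those of the blocks in the non-transient group, yielding the to-be-encoded spectrum. In the to-be-encoded spectrum, the spectra of all transient blocks therefore precede those of the non-transient blocks. This moves the transient-block spectra to positions of higher coding importance, so that the audio signal reconstructed after neural-network encoding and decoding better preserves transient characteristics. The M blocks of the current frame here may be the M blocks of the first channel of the current frame.
Or, performing group rearrangement on the spectra of the M blocks of the second channel according to the second adjusted grouping information to obtain the second to-be-encoded spectrum includes:
B2. Assigning the spectra of the blocks of the second channel that the second adjusted transient flags indicate as transient blocks to a second transient group, and assigning the spectra of the blocks of the second channel that the second adjusted transient flags indicate as non-transient blocks to a second non-transient group; and placing the spectra of the blocks in the second transient group before the spectra of the blocks in the second non-transient group to obtain the second to-be-encoded spectrum.
In some embodiments of this application, performing group rearrangement on the spectra of the M blocks of the first channel according to the first adjusted grouping information to obtain the first to-be-encoded spectrum includes:
C1. Placing the spectra of the blocks of the first channel that the first adjusted transient flags indicate as transient blocks before the spectra of the blocks of the first channel that the first adjusted transient flags indicate as non-transient blocks, to obtain the first to-be-encoded spectrum.
After obtaining the first adjusted grouping information of the M blocks, the encoder determines the transient flag of each of the M blocks from it and identifies P transient blocks and Q non-transient blocks, where M = P + Q. It then places the spectra of the blocks that the M first adjusted transient flags indicate as transient before the spectra of the blocks that the M first adjusted transient flags indicate as non-transient, yielding the to-be-encoded spectrum. In the to-be-encoded spectrum, the spectra of all transient blocks therefore precede those of the non-transient blocks. This moves the transient-block spectra to positions of higher coding importance, so that the audio signal reconstructed after neural-network encoding and decoding better preserves transient characteristics. The M blocks of the current frame here may be the M blocks of the first channel of the current frame.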
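The transient-first rearrangement can be sketched in a few lines. This sketch assumes 0 = transient and 1 = non-transient, and it assumes that blocks keep their time order inside each group, which agrees with the FIG. 9 example later in this description (indicator 1 1 1 0 0 0 0 1 gives block order 3 4 5 6 0 1 2 7). The function name is hypothetical.

```python
TRANSIENT = 0  # assumed convention: 0 = transient block, 1 = non-transient block


def group_rearrange(blocks: list, flags: list[int]) -> tuple[list, list[int]]:
    """Move transient-block spectra in front of non-transient ones.

    Time order is kept inside each group (an assumption). Returns the rearranged
    blocks and, for each output position, the original block index.
    """
    order = [i for i, f in enumerate(flags) if f == TRANSIENT]
    order += [i for i, f in enumerate(flags) if f != TRANSIENT]
    return [blocks[i] for i in order], order
```

Returning `order` alongside the blocks is a design choice: the decoder can rebuild this order from the transmitted flags alone, so the order itself never needs to be sent.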
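Or, performing group rearrangement on the spectra of the M blocks of the second channel according to the second adjusted grouping information to obtain the second to-be-encoded spectrum includes:
placing the spectra of the blocks of the second channel that the second adjusted transient flags indicate as transient blocks before the spectra of the blocks of the second channel that the second adjusted transient flags indicate as non-transient blocks, to obtain the second to-be-encoded spectrum.
408. Encode the first to-be-encoded spectrum and the second to-be-encoded spectrum by using an encoding neural network to obtain a spectrum encoding result;
409. Write the spectrum encoding result into the bitstream.
In this embodiment, after obtaining the first and second to-be-encoded spectra, the encoder can encode them with the encoding neural network to generate the spectrum encoding result. It then writes the spectrum encoding result into the bitstream and can send the bitstream to the decoder.
In one possible implementation, the encoder uses the to-be-encoded spectra as the input data of the encoding neural network. Alternatively, it first applies other processing to them and uses the result as input. The encoding neural network produces latent variables that represent the features of the group-rearranged spectra of the M blocks.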
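In some embodiments of this application, before the first and second to-be-encoded spectra are encoded by the encoding neural network in step 408, the method performed by the encoder further includes:
D1. Performing intra-group interleaving on the first to-be-encoded spectrum to obtain an intra-group interleaved first spectrum;
D2. Performing intra-group interleaving on the second to-be-encoded spectrum to obtain an intra-group interleaved second spectrum;
In this scenario, encoding the first and second to-be-encoded spectra by using the encoding neural network in step 408 includes:
E1. Encoding the intra-group interleaved first spectrum and the intra-group interleaved second spectrum by using the encoding neural network.
After obtaining the to-be-encoded spectra (for example, the first and second to-be-encoded spectra), the encoder can first perform intra-group interleaving according to the grouping of the M blocks of each channel. This yields the intra-group interleaved spectra of the M blocks, which can then serve as the input data of the encoding neural network. The M blocks of the current frame here may be the M blocks of the first channel of the current frame. Intra-group interleaving can also reduce the side information to be encoded and improve coding efficiency.
In some embodiments of this application, P of the M blocks of the first channel are indicated as transient blocks by the M first transient flags, and Q are indicated as non-transient blocks, where M = P + Q. The values of P and Q are not limited in this embodiment.
Specifically, performing intra-group interleaving on the first to-be-encoded spectrum in step D1 includes:
D11. Interleaving the spectra of the P blocks to obtain the interleaved spectra of the P blocks;
D12. Interleaving the spectra of the Q blocks to obtain the interleaved spectra of the Q blocks.
Interleaving the spectra of the P blocks means interleaving them as a whole; likewise, interleaving the spectra of the Q blocks means interleaving them as a whole.
It should be noted that if the adjusted number of groups of the M blocks of the first channel is 1, the spectra of all M blocks of the first channel are interleaved as one group to obtain their intra-group interleaved spectra.
When steps D11 and D12 are performed, encoding the intra-group interleaved first spectrum and the intra-group interleaved second spectrum by using the encoding neural network in step E1 includes:
encoding the interleaved spectra of the P blocks and the interleaved spectra of the Q blocks by using the encoding neural network.
In D11 and D12, the encoder interleaves the transient group and the non-transient group separately. This yields the interleaved spectra of the P blocks and of the Q blocks, which can serve as the input data of the encoding neural network. Intra-group interleaving can also reduce the side information to be encoded and improve coding efficiency.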
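Steps D11 and D12 can be sketched as two independent bin-major interleavings, one for the transient group and one for the non-transient group. This sketch assumes that the input is already group-rearranged (the P transient blocks first). It also assumes that interleaving means "bin 0 of every block, then bin 1 of every block, ...", as described for step S13 of FIG. 7 later in this description. Both function names are hypothetical.

```python
def interleave(blocks: list[list[float]]) -> list[float]:
    """Bin-major interleaving: bin 0 of every block, then bin 1 of every block, ..."""
    if not blocks:
        return []
    return [blk[k] for k in range(len(blocks[0])) for blk in blocks]


def intra_group_interleave(rearranged: list[list[float]], p: int) -> list[float]:
    """Interleave the first P blocks (transient group) and the remaining Q blocks
    (non-transient group) separately, then concatenate the two results."""
    return interleave(rearranged[:p]) + interleave(rearranged[p:])
```

Passing p = 0 or p = M covers the single-group case, in which all M blocks are interleaved together.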
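In some embodiments of this application, step 401 obtains the M first transient flags of the M blocks of the first channel from the spectra of the M blocks of the first channel of the current frame of the to-be-encoded multi-channel signal. Before step 401, the method performed by the encoder further includes:
F1. Obtaining a first window type of the first channel, where the first window type is a short window type or a non-short window type;
F2. Obtaining a second window type of the second channel, where the second window type is a short window type or a non-short window type;
F3. Performing the step of obtaining the M first transient flags of the M blocks of the first channel from the spectra of the M blocks of the first channel of the current frame of the to-be-encoded multi-channel signal only when both the first window type and the second window type are the short window type.
Before performing step 401, the encoder can first determine the window type of the current frame, which may be a short window type or a non-short window type. For example, the encoder determines the window type from the current frame of the to-be-encoded multi-channel signal. A short window may also be called a short frame, and a non-short window may also be called a non-short frame. When the window type is the short window type, step 401 is triggered. In this embodiment, the foregoing encoding scheme is performed when the window type of the current frame is the short window type, so that the multi-channel signal can be encoded when it is a transient signal.
In some embodiments of this application, when the encoder performs steps F1 to F3, the method performed by the encoder further includes:
G1. Encoding the first window type and the second window type to obtain a window type encoding result;
G2. Writing the window type encoding result into the bitstream.
After obtaining the first window type of the first channel and the second window type of the second channel of the current frame, the encoder can carry the window types in the bitstream by first encoding them. The coding method used for the window types is not limited here. The resulting window type encoding result can be written into the bitstream so that the bitstream carries it. The decoder can then obtain the window type encoding result from the bitstream and parse it to obtain the first window type of the first channel and the second window type of the second channel of the current frame. From these two window types, the decoder determines whether to continue decoding the bitstream to obtain the first decoded grouping information of the M blocks of the first channel.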
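In some embodiments of this application, obtaining the M first transient flags of the M blocks of the first channel from the spectra of the M blocks of the first channel of the current frame of the to-be-encoded multi-channel signal in step 401 includes:
H1. Obtaining M first spectral energies of the M blocks of the first channel from the spectra of the M blocks of the first channel;
H2. Obtaining a first average spectral energy of the M blocks of the first channel from the M first spectral energies;
H3. Obtaining the M first transient flags from the M first spectral energies and the first average spectral energy.
After obtaining the M spectral energies, the encoder can average them to obtain the average spectral energy. Alternatively, it can exclude the largest value, or the several largest values, among the M spectral energies and then average the rest. Comparing the spectral energy of each block with the average spectral energy shows how the spectrum of that block changes relative to the spectra of the other blocks among the M blocks. From this comparison the M transient flags of the M blocks are obtained; the transient flag of a block indicates the transient characteristic of that block. The M blocks of the current frame here may be the M blocks of the first channel of the current frame. In this embodiment, the transient flag of each block is determined from its spectral energy and the average spectral energy, and the transient flag of a block in turn determines the grouping information of that block.
Further, in some embodiments of this application, when the first spectral energy of the first block is greater than K times the first average spectral energy, the first transient flag of the first block indicates that the first block is a transient block; or,
when the first spectral energy of the first block is less than or equal to K times the first average spectral energy, the first transient flag of the first block indicates that the first block is a non-transient block;
where K is a real number greater than or equal to 1.
K can take various values, which are not limited here. Take the determination of the transient flag of the first block among the M blocks as an example. When the spectral energy of the first block is greater than K times the average spectral energy, the spectrum of the first block changes too much relative to the other blocks among the M blocks, so the transient flag of the first block indicates that it is a transient block. When the spectral energy of the first block is less than or equal to K times the average spectral energy, the spectrum of the first block changes little relative to the other blocks, so its transient flag indicates that it is a non-transient block. The M blocks of the current frame here may be the M blocks of the first channel of the current frame.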
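Steps H1 to H3 can be illustrated with the following sketch, which supports both averaging options: the plain mean, and the mean after excluding the largest energies. Several details are assumptions not fixed by the text: block energy is computed as the sum of squared coefficients, flags use 0 = transient and 1 = non-transient, and the function and parameter names are hypothetical.

```python
def transient_flags(spectra: list[list[float]], k: float = 2.0, drop_max: int = 0) -> list[int]:
    """Return one flag per block: 0 = transient, 1 = non-transient (assumed convention).

    spectra  -- M blocks of spectral coefficients (e.g. MDCT bins)
    k        -- threshold factor K >= 1 (K = 2 is the example value in the text)
    drop_max -- number of largest block energies excluded from the average
                (0 = plain mean; 1 = exclude the single largest energy)
    """
    energies = [sum(c * c for c in blk) for blk in spectra]  # energy as a sum of squares (assumption)
    kept = sorted(energies)[: len(energies) - drop_max]
    avg = sum(kept) / len(kept)
    return [0 if e > k * avg else 1 for e in energies]
```

Excluding the largest energy matters when one block stands out only moderately. With seven blocks of energy 1.0 and one of energy 2.25, the plain mean (1.15625) puts the threshold at 2.3125, so no block is flagged. Dropping the maximum puts the threshold at 2.0, so the last block is flagged as transient.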
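The encoder can also obtain the M transient flags of the M blocks in other ways, without limitation. For example, it can compute the difference or the ratio between the spectral energy of the first block and the average spectral energy, and determine the M transient flags from that difference or ratio.
The foregoing encoder-side examples can be summarized as follows. The current frame of the to-be-encoded multi-channel signal includes a first channel and a second channel, each with the spectra of M blocks. The M first transient flags of the M blocks of the first channel are obtained from the spectra of those blocks, and the first grouping information of the M blocks of the first channel is obtained from the M first transient flags. The second grouping information of the M blocks of the second channel is obtained in the same way. When the first and second grouping information satisfy the preset condition, the first and second adjusted grouping information are obtained from them. The first to-be-encoded spectrum is then obtained from the first adjusted grouping information and the spectra of the M blocks of the first channel, and the second to-be-encoded spectrum is obtained in the same way. Finally, the encoding neural network encodes the first and second to-be-encoded spectra to obtain the spectrum encoding result, which is carried in the bitstream. In short, the grouping information of the M blocks of each channel is obtained from the M transient flags of that channel in the current frame. When the grouping information of the channels satisfies the preset condition, adjusted grouping information is obtained for each channel. The to-be-encoded spectra are then obtained from the adjusted grouping information and the spectra of the M blocks of each channel. Blocks with different transient flags can thus be grouped, adjusted and encoded, which improves the coding quality of the multi-channel signal.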
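An embodiment of this application further provides a method for decoding a multi-channel signal. The method can be performed by a terminal device, for example a multi-channel signal decoding apparatus (hereinafter referred to as the decoder side or the decoder; for example, an AI decoder). As shown in FIG. 5, the method performed by the decoder in this embodiment mainly includes:
501. Obtain, from the bitstream, first decoded grouping information of the M blocks of the first channel of the current frame of the multi-channel signal, where the first decoded grouping information indicates first decoded transient flags of the M blocks of the first channel.
The decoder receives the bitstream sent by the encoder, in which the encoder carries the grouping information encoding result. By parsing the bitstream, the decoder obtains the first decoded grouping information of the M blocks of the current frame of the audio signal, and from it the M first decoded transient flags of the M blocks. For example, the first decoded grouping information may include the number of groups and the group indicator information. As another example, the grouping information may include only the group indicator information. For details, see the foregoing description of the encoder embodiments.
Note that the first decoded grouping information is the grouping information that the decoder obtains by decoding the bitstream. For example, if the encoder carries the first adjusted grouping information in the bitstream, the first decoded grouping information obtained by the decoder corresponds to that first adjusted grouping information. The first decoded grouping information indicates the first decoded transient flags of the M blocks of the first channel, which correspond to the first transient flags or the first adjusted transient flags at the encoder. Similarly, the second decoded grouping information obtained in a later step corresponds to the second adjusted grouping information, and the second decoded transient flags correspond to the second transient flags or the second adjusted transient flags at the encoder.
502. Obtain, from the bitstream, second decoded grouping information of the M blocks of the second channel of the current frame, where the second decoded grouping information indicates second decoded transient flags of the M blocks of the second channel.
503. Decode the bitstream by using a decoding neural network to obtain decoded spectra of the M blocks of the first channel and decoded spectra of the M blocks of the second channel.
After obtaining the bitstream, the decoder decodes it with the decoding neural network to obtain the decoded spectra of the M blocks of the first channel and of the second channel. The encoder rearranged the spectra of the M blocks of both channels by group before encoding them and carried the spectrum encoding result in the bitstream. The decoded spectra of the M blocks of the first and second channels therefore correspond to the group-rearranged spectra of the M blocks of the first and second channels at the encoder. The decoding neural network at the decoder performs the inverse of the process performed by the encoding neural network at the encoder.
504. Obtain a first reconstructed signal of the first channel according to the first decoded grouping information and the decoded spectra of the M blocks of the first channel.
The decoded spectra of the M blocks of the first channel correspond to the group-rearranged spectra of the M blocks of the first channel at the encoder, so the first reconstructed signal of the first channel can be obtained by using the first decoded grouping information. During signal reconstruction, blocks with different transient flags in the multi-channel signal can be decoded and reconstructed accordingly, which improves the reconstruction quality of the multi-channel signal.
505. Obtain a second reconstructed signal of the second channel according to the second decoded grouping information and the decoded spectra of the M blocks of the second channel.
The decoded spectra of the M blocks of the second channel correspond to the group-rearranged spectra of the M blocks of the second channel at the encoder, so the second reconstructed signal of the second channel can be obtained by using the second decoded grouping information. During signal reconstruction, blocks with different transient flags in the multi-channel signal can be decoded and reconstructed accordingly, which improves the reconstruction quality of the multi-channel signal.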
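In some embodiments of this application, obtaining the first reconstructed signal of the first channel according to the first decoded grouping information and the decoded spectra of the M blocks of the first channel includes:
when the first decoded grouping information indicates that the first decoded number of groups of the M blocks of the first channel is greater than 1, performing inverse group rearrangement on the decoded spectra of the M blocks of the first channel to obtain the inverse-rearranged spectra of the M blocks of the first channel;
obtaining the first reconstructed signal of the first channel from the inverse-rearranged spectra of the M blocks of the first channel;
obtaining the second reconstructed signal of the second channel according to the second decoded grouping information and the decoded spectra of the M blocks of the second channel includes:
when the second decoded grouping information indicates that the second decoded number of groups of the M blocks of the second channel is greater than 1, performing inverse group rearrangement on the decoded spectra of the M blocks of the second channel to obtain the inverse-rearranged spectra of the M blocks of the second channel;
obtaining the second reconstructed signal of the second channel from the inverse-rearranged spectra of the M blocks of the second channel.
Take the reconstruction of the first channel as an example. The decoder obtains the first decoded grouping information of the M blocks and also obtains the decoded spectra of the M blocks of the first channel from the bitstream. Because the encoder rearranged the spectra of the M blocks of the first channel by group, the decoder must perform the inverse process. It performs inverse group rearrangement on the decoded spectra of the M blocks of the first channel according to the first decoded grouping information of the M blocks to obtain the inverse-rearranged spectra of the M blocks of the first channel. The inverse group rearrangement is the inverse of the group rearrangement at the encoder.
After obtaining the inverse-rearranged spectra of the M blocks of the first channel, the decoder can transform them from the frequency domain to the time domain to obtain the first reconstructed signal of the first channel.
The second channel is decoded in a similar way to the first channel, and details are not repeated here.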
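In some embodiments of this application, obtaining the first reconstructed signal of the first channel according to the first decoded grouping information and the decoded spectra of the M blocks of the first channel in step 504 includes:
I1. Performing intra-group deinterleaving on the decoded spectra of the M blocks of the first channel to obtain the intra-group deinterleaved spectra of the M blocks of the first channel;
J1. Obtaining the first reconstructed signal from the intra-group deinterleaved spectra of the M blocks of the first channel.
The intra-group deinterleaving performed by the decoder is the inverse of the intra-group interleaving at the encoder, and is not described in detail here.
Obtaining the second reconstructed signal of the second channel according to the second decoded grouping information and the decoded spectra of the M blocks of the second channel in step 505 includes:
performing intra-group deinterleaving on the decoded spectra of the M blocks of the second channel to obtain the intra-group deinterleaved spectra of the M blocks of the second channel;
obtaining the second reconstructed signal from the intra-group deinterleaved spectra of the M blocks of the second channel.
In some embodiments of this application, P of the M blocks of the first channel are indicated as transient blocks by the M first decoded transient flags, and Q are indicated as non-transient blocks, where M = P + Q;
obtaining the first reconstructed signal of the first channel according to the first decoded grouping information and the decoded spectra of the M blocks of the first channel includes:
performing intra-group deinterleaving on the decoded spectra of the P blocks of the first channel and on the decoded spectra of the Q blocks of the first channel to obtain the intra-group deinterleaved spectra of the M blocks of the first channel;
performing inverse group rearrangement on the intra-group deinterleaved spectra of the M blocks of the first channel according to the first decoded grouping information to obtain the inverse-rearranged spectra of the M blocks of the first channel;
obtaining the first reconstructed signal of the first channel from the inverse-rearranged spectra of the M blocks of the first channel.
Deinterleaving the spectra of the P blocks means deinterleaving them as a whole; likewise, deinterleaving the spectra of the Q blocks means deinterleaving them as a whole.
The encoder can interleave the transient group and the non-transient group separately, obtaining the interleaved spectra of the P blocks and of the Q blocks, which can serve as the input data of the encoding neural network. Intra-group interleaving can also reduce the side information to be encoded and improve coding efficiency. Because the encoder performs intra-group interleaving, the decoder must perform the corresponding inverse process, that is, deinterleaving.
It should be noted that if the adjusted number of groups of the M blocks of the first channel is 1, the decoded spectra of all M blocks of the first channel are deinterleaved as one group to obtain their intra-group deinterleaved spectra.
In some embodiments of this application, P of the M blocks of the first channel are indicated as transient blocks by the M first decoded transient flags, and Q are indicated as non-transient blocks, where M = P + Q;
performing inverse group rearrangement on the decoded spectra of the M blocks of the first channel according to the first decoded grouping information includes:
K1. Obtaining the indexes of the P blocks of the first channel from the first decoded grouping information;
K2. Obtaining the indexes of the Q blocks of the first channel from the first decoded grouping information;
K3. Performing inverse group rearrangement on the decoded spectra of the M blocks of the first channel according to the indexes of the P blocks and the indexes of the Q blocks.
Before the encoder rearranges the spectra of the M blocks by group, the indexes of the M blocks are consecutive, for example 0 to M-1. After the rearrangement, they are no longer in order. From the first decoded grouping information of the M blocks, the decoder can obtain the indexes of the P blocks and of the Q blocks among the reconstructed group-rearranged M blocks. The inverse group rearrangement then restores the M blocks to consecutive index order.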
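In some embodiments of this application, the method performed by the decoder further includes:
L1. Obtaining the first window type of the first channel of the current frame from the bitstream;
L2. Obtaining the second window type of the second channel of the current frame from the bitstream;
L3. Performing the step of obtaining, from the bitstream, the first decoded grouping information of the M blocks of the first channel of the current frame only when both the first window type and the second window type are the short window type.
In this embodiment, the foregoing scheme is performed only when both the first and second window types of the current frame are the short window type, so that the multi-channel signal can be coded when it is a transient signal. The decoder performs the inverse of the encoder's process, so it can also first determine the first and second window types of the current frame, each of which may be a short window type or a non-short window type. For example, the decoder obtains the window types of the current frame from the bitstream. Because the current frame includes the first channel and the second channel, the decoder obtains the first window type of the first channel and the second window type of the second channel. A short window may also be called a short frame, and a non-short window may also be called a non-short frame. When the window types are the short window type, step 501 is triggered.
In some embodiments of this application, the first decoded grouping information includes a first decoded number of groups of the M blocks of the first channel, or a first decoded number-of-groups flag that indicates the first decoded number of groups. When the first decoded number of groups is greater than 1, the first decoded grouping information further includes the M first decoded transient flags. Alternatively, the first decoded grouping information includes the M first decoded transient flags;
and/or,
the second decoded grouping information includes a second decoded number of groups of the M blocks of the second channel, or a second decoded number-of-groups flag that indicates the second decoded number of groups. When the second decoded number of groups is greater than 1, the second decoded grouping information further includes the M second decoded transient flags. Alternatively, the second decoded grouping information includes the M second decoded transient flags.
The encoder carries the grouping information encoding result, which includes the first and second adjusted grouping information, in the bitstream. By decoding the bitstream, the decoder obtains the first and second decoded grouping information, which correspond respectively to the first and second adjusted grouping information at the encoder. For example, the first decoded grouping information includes the first decoded number of groups of the M blocks of the first channel or the first decoded number-of-groups flag. The first decoded number of groups represents the number of groups or the adjusted number of groups of the first channel, and the first decoded number-of-groups flag indicates that number. The M first decoded transient flags indicate the transient flags or adjusted transient flags of the M blocks of the first channel. The second decoded grouping information is analogous and is not described again.
The foregoing decoder-side examples can be summarized as follows. The first decoded grouping information of the M blocks of the first channel of the current frame of the multi-channel signal is obtained from the bitstream and indicates the first decoded transient flags of the M blocks of the first channel. The second decoded grouping information of the M blocks of the second channel is obtained from the bitstream in the same way. The bitstream is decoded with the decoding neural network to obtain the decoded spectra of the M blocks of the first and second channels. The first reconstructed signal of the first channel is obtained from the first decoded grouping information and the decoded spectra of the M blocks of the first channel. Likewise, the second reconstructed signal of the second channel is obtained from the second decoded grouping information and the decoded spectra of the M blocks of the second channel. The decoded spectra of the M blocks of the first and second channels obtained from the bitstream correspond respectively to the group-rearranged spectra of the M blocks of the first and second channels at the encoder. The first and second reconstructed signals can therefore be obtained by using the first and second decoded grouping information. During signal reconstruction, blocks with different transient flags in the multi-channel signal can be decoded and reconstructed accordingly, which improves the reconstruction quality of the multi-channel signal.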
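To facilitate understanding and implementation of the foregoing solutions of the embodiments of this application, corresponding application scenarios are described below by way of example.
FIG. 6 is a schematic diagram of a system architecture applied in the broadcast and television field according to an embodiment of this application. The embodiments of this application can be applied to live broadcast and post-production scenarios of broadcast television, or to 3D audio codecs in terminal media playback.
In the live broadcast scenario, the 3D audio signal produced for a live program is encoded by a 3D audio encoder that applies the embodiments of this application to obtain a bitstream. The bitstream is transmitted over the broadcast television network to the user side, where the 3D audio decoder in a set-top box decodes it to reconstruct the 3D audio signal, which is played back through a loudspeaker array. In the post-production scenario, the 3D audio signal produced in post-production is encoded by a 3D audio encoder that applies the embodiments of this application to obtain a bitstream. The bitstream is transmitted over the broadcast television network or the Internet to the user side, where the 3D audio decoder in a network receiver or mobile terminal decodes it to reconstruct the 3D audio signal, which is played back through a loudspeaker array or headphones.
The embodiments of this application provide an audio codec, which may be deployed in radio access networks, media gateways of core networks, transcoding devices, media resource servers, mobile terminals, fixed-network terminals and the like. It can also be applied to audio codecs in broadcast television, terminal media playback and VR streaming services.
The application scenarios of the encoder and the decoder in the embodiments of this application are described separately below.
As shown in FIG. 7, an encoder that applies the embodiments of this application performs the following multi-channel signal encoding method:
S11. Determine the window type of the current frame.
Obtain the audio signal of the current frame, determine the window type of the current frame from it, and write the window type into the bitstream.
A specific implementation includes the following three steps:
1) Divide the to-be-encoded audio signal into frames to obtain the audio signal of the current frame.
For example, if the frame length of the current frame is L samples, the audio signal of the current frame is an L-point time-domain signal.
2) Perform transient detection on the audio signal of the current frame to determine the transient information of the current frame.
There are many transient detection methods, and this embodiment does not limit the choice. The transient information of the current frame may include one or more of the following: a flag indicating whether the current frame is a transient signal, the position at which the transient occurs in the current frame, and a parameter characterizing the degree of the transient. The degree of the transient may be the transient energy, or the ratio of the signal energy at the transient position to the signal energy at adjacent non-transient positions.
3) Determine the window type of the current frame from its transient information, encode the window type, and write the encoding result into the bitstream.
If the transient information indicates that the current frame is a transient signal, the window type of the current frame is a short window.
If the transient information indicates that the current frame is a non-transient signal, the window type of the current frame is a window type other than the short window. This embodiment does not limit the other window types; for example, they may include a long window, a switch-in window and a switch-out window.
S12. If the window type of the current frame is a short window, apply short-window windowing to the audio signal of the current frame and perform a time-frequency transform to obtain the MDCT spectra of the M blocks of the current frame.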
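For example, if the window type of the current frame is a short window, windowing is performed with M overlapping short window functions to obtain the windowed audio signals of M blocks, where M is a positive integer greater than or equal to 2. For example, the window length of the short window function is 2L/M and the overlap length is L/M, where L is the frame length of the current frame. With M = 8 and L = 1024, the short window length is 256 samples and the overlap length is 128 samples.
Perform a time-frequency transform on the windowed audio signal of each of the M blocks to obtain the MDCT spectra of the M blocks of the current frame.
For example, if the windowed audio signal of the current block is 256 samples long, the MDCT yields 128 MDCT coefficients, which form the MDCT spectrum of the current block.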
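The following small sketch checks the arithmetic of this step. For L = 1024 and M = 8 it gives a 256-sample window, a 128-sample overlap and 128 MDCT coefficients per block. The MDCT here is the textbook direct O(N²) definition, not necessarily the fast transform a real codec would use. The sine window is an assumption, because the text does not specify the window shape. The function names are hypothetical.

```python
import math


def short_window_params(frame_len: int, num_blocks: int) -> tuple[int, int, int]:
    """Window length 2L/M, overlap length L/M, and MDCT coefficients per block (L/M)."""
    hop = frame_len // num_blocks
    return 2 * hop, hop, hop


def sine_window(length: int) -> list[float]:
    # Assumed window shape; it satisfies w[n]^2 + w[n + length/2]^2 = 1.
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def mdct(block: list[float]) -> list[float]:
    """Direct MDCT: 2N windowed samples -> N coefficients,
    X[k] = sum_n x[n] * cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))."""
    n = len(block) // 2
    return [sum(x * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i, x in enumerate(block))
            for k in range(n)]
```

For example, windowing 256 samples with `sine_window(256)` and passing the product to `mdct` yields the 128 coefficients that form the MDCT spectrum of one block.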
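S13. Obtain the number of groups and the group indicator information of the current frame from the MDCT spectra of the M blocks, encode them, and write the encoding result into the bitstream.
In one implementation, the following is done before the number of groups and the group indicator information of the current frame are obtained in step S13. First, the MDCT spectra of the M blocks are interleaved to obtain interleaved MDCT spectra of the M blocks. Next, encoding preprocessing is applied to the interleaved MDCT spectra to obtain a preprocessed MDCT spectrum. Then the preprocessed MDCT spectrum is deinterleaved to obtain deinterleaved MDCT spectra of the M blocks. Finally, the number of groups and the group indicator information of the current frame are determined from the deinterleaved MDCT spectra of the M blocks.
Interleaving the MDCT spectra of the M blocks combines M MDCT spectra of length L/M into one MDCT spectrum of length L. The M spectral coefficients at frequency bin i of the M blocks are placed together in block order from 0 to M-1, followed by the M spectral coefficients at bin i+1 in block order from 0 to M-1, and so on, with i ranging from 0 to L/M-1.
The encoding preprocessing may include frequency domain noise shaping (FDNS), temporal noise shaping (TNS), bandwidth extension (BWE) and other processing, which is not limited here.
Deinterleaving is the inverse of interleaving. The preprocessed MDCT spectrum has length L. It is split into M MDCT spectra of length L/M, with the coefficients in each block arranged in ascending frequency order, which yields the deinterleaved MDCT spectra of the M blocks. Preprocessing the interleaved spectrum reduces the side information to be encoded, and hence the bits it occupies, which improves coding efficiency.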
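Interleaving and deinterleaving, as described above, amount to a transpose between "block-major" and "bin-major" order. The following is a minimal sketch using plain Python lists; the function names are hypothetical.

```python
def interleave(blocks: list[list[float]]) -> list[float]:
    """M blocks of L/M bins -> one length-L spectrum: bin i of blocks 0..M-1, then bin i+1, ..."""
    m, n = len(blocks), len(blocks[0])
    return [blocks[b][i] for i in range(n) for b in range(m)]


def deinterleave(spectrum: list[float], m: int) -> list[list[float]]:
    """Inverse: split a length-L spectrum back into M blocks in ascending frequency order.
    Block b owns positions b, b + M, b + 2M, ... of the interleaved spectrum."""
    return [spectrum[b::m] for b in range(m)]
```

Because `deinterleave(interleave(blocks), M)` returns the original blocks, any per-coefficient preprocessing applied between the two steps is the only thing that changes the block spectra.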
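The number of groups and the group indicator information of the current frame are determined from the deinterleaved MDCT spectra of the M blocks in the following three steps:
a) Calculate the MDCT spectral energies of the M blocks.
Assume the deinterleaved MDCT coefficients of the M blocks are mdctSpectrum[8][128]. The MDCT spectral energy of each block is calculated and denoted enerMdct[8], where 8 is the value of M and 128 is the number of MDCT coefficients in a block.
b) Calculate the average MDCT spectral energy from the MDCT spectral energies of the M blocks, mainly by one of the following two methods:
Method one: directly calculate the average of the MDCT spectral energies of the M blocks, that is, the average of enerMdct[8], and use it as the average MDCT spectral energy avgEner.
Method two: determine the block with the largest MDCT spectral energy among the M blocks, and calculate the average MDCT spectral energy of the other M-1 blocks as avgEner. Alternatively, calculate the average MDCT spectral energy of the blocks other than the several blocks with the largest energies as avgEner.
c) Determine the number of groups and the group indicator information of the current frame from the MDCT spectral energies of the M blocks and the average MDCT spectral energy, and write them into the bitstream.
Specifically, the MDCT spectral energy of each block is compared with the average MDCT spectral energy. If the MDCT spectral energy of the current block is greater than K times the average, the current block is a transient block and its transient flag is 0. Otherwise, the current block is a non-transient block and its transient flag is 1. K is greater than or equal to 1, for example K = 2. The M blocks are grouped according to their transient flags to determine the number of groups and the group indicator information. Blocks with the same transient flag value form one group; if the M blocks are divided into N groups, N is the number of groups. The group indicator information consists of the transient flag values of the M blocks.
For example, the transient blocks form a transient group and the non-transient blocks form a non-transient group. Specifically, if the transient flags of the blocks are not all the same, the number of groups numGroups of the current frame is 2; otherwise, it is 1. The number of groups may be represented by a number-of-groups flag: a flag of 1 indicates that the current frame has 2 groups, and a flag of 0 indicates that it has 1 group. The group indicator information groupIndicator of the current frame is determined from the transient flags of the M blocks, for example by arranging the transient flags of the M blocks in block order.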
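The following sketch derives numGroups, the number-of-groups flag and groupIndicator from the per-block transient flags, as described in step c). The bit layout in `to_bits` is hypothetical. It writes one number-of-groups flag bit, followed by the M indicator bits only when there are two groups, consistent with the decoder reading the indicator only when the number of groups is greater than 1. Both function names are also hypothetical.

```python
def group_info(flags: list[int]) -> tuple[int, int, list[int]]:
    """Return (numGroups, number-of-groups flag, groupIndicator) for one frame.

    flags: per-block transient flags, 0 = transient, 1 = non-transient.
    """
    num_groups = 2 if len(set(flags)) > 1 else 1
    num_groups_flag = 1 if num_groups == 2 else 0  # 1 -> 2 groups, 0 -> 1 group
    group_indicator = list(flags)                  # transient flags in block order
    return num_groups, num_groups_flag, group_indicator


def to_bits(num_groups_flag: int, group_indicator: list[int]) -> list[int]:
    """Hypothetical layout: the flag bit, then the M indicator bits only for 2 groups."""
    return [num_groups_flag] + (group_indicator if num_groups_flag else [])
```

With the FIG. 9 example flags 1 1 1 0 0 0 0 1, this gives numGroups = 2, flag = 1 and nine bits in total. A frame with no transient block costs a single bit.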
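In another implementation, interleaving and deinterleaving are skipped before the number of groups and the group indicator information are obtained in step S13. The number of groups and the group indicator information of the current frame are determined directly from the MDCT spectra of the M blocks, then encoded, and the encoding result is written into the bitstream.
Determining the number of groups and the group indicator information directly from the MDCT spectra of the M blocks is similar to determining them from the deinterleaved MDCT spectra of the M blocks, and is not described again.
Write the number of groups and the group indicator information of the current frame into the bitstream.
In addition, the non-transient group may be further divided into two or more groups, which is not limited in this embodiment. For example, the non-transient group may be divided into a harmonic group and a non-harmonic group.
S14. Rearrange the MDCT spectra of the M blocks by group according to the number of groups and the group indicator information of the current frame to obtain the group-rearranged MDCT spectrum, which is the to-be-encoded spectrum of the current frame.
If the number of groups of the current frame is 2, the spectra of the audio signal of the M blocks of the current frame need to be rearranged by group. The blocks belonging to the transient group are moved to the front, and the blocks belonging to the non-transient group are moved to the back. The encoding neural network of the encoder codes the spectra placed at the front more effectively. Moving the transient blocks to the front therefore ensures that they are coded well, which preserves more spectral detail of the transient blocks and improves coding quality.
The group rearrangement according to the number of groups and the group indicator information of the current frame may be applied to the MDCT spectra of the M blocks of the current frame or to the deinterleaved MDCT spectra of the M blocks of the current frame.
S15. Encode the group-rearranged MDCT spectrum by using the encoding neural network, and write the result into the bitstream.
The group-rearranged MDCT spectrum first undergoes intra-group interleaving to obtain an intra-group interleaved MDCT spectrum, which is then encoded by the encoding neural network. Intra-group interleaving is similar to the interleaving of the MDCT spectra of the M blocks performed before the number of groups and the group indicator information are obtained. The difference is that only MDCT spectra belonging to the same group are interleaved together. For example, the MDCT spectral blocks of the transient group are interleaved with each other, and the MDCT spectral blocks of the non-transient group are interleaved with each other.
The encoding neural network is trained in advance. This embodiment does not limit its network structure or training method. For example, the encoding neural network may be a fully connected network or a convolutional neural network (CNN).
As shown in FIG. 8, the decoding procedure corresponding to the encoder includes:
S21. Decode the received bitstream to obtain the window type of the current frame.
S22. If the window type of the current frame is a short window, decode the received bitstream to obtain the number of groups and the group indicator information.
The number-of-groups flag in the bitstream can be parsed to determine the number of groups of the current frame. For example, a flag of 1 indicates that the current frame has 2 groups, and a flag of 0 indicates that it has 1 group.
If the number of groups of the current frame is greater than 1, the group indicator information can be obtained by decoding the received bitstream.
Decoding the group indicator information from the received bitstream may consist of reading M bits of group indicator information from the bitstream. The value of the ith bit determines whether the ith block is a transient block: a value of 0 indicates that the ith block is a transient block, and a value of 1 indicates that it is a non-transient block.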
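On the decoder side, parsing reverses this signaling. The following sketch assumes the same hypothetical layout as on the encoder side: one number-of-groups flag bit, followed by the M group-indicator bits only when the flag signals two groups. A bit value of 0 marks a transient block. The function name is hypothetical.

```python
def read_group_info(bits: list[int], m: int) -> tuple[int, list[int]]:
    """Parse (numGroups, indexes of transient blocks) from a hypothetical bit layout.

    With a single group no indicator bits are present, so the index list is empty.
    """
    num_groups = 2 if bits[0] == 1 else 1
    if num_groups == 1:
        return 1, []
    indicator = bits[1:1 + m]
    return 2, [i for i, b in enumerate(indicator) if b == 0]  # bit value 0 -> transient block
```

For the nine bits 1 1 1 1 0 0 0 0 1 with M = 8, this returns two groups with transient blocks 3, 4, 5 and 6.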
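S23. Obtain the decoded MDCT spectrum from the received bitstream by using the decoding neural network.
The decoding procedure at the decoder corresponds to the encoding procedure at the encoder. The specific steps are as follows:
First, decode the received bitstream with the decoding neural network to obtain the decoded MDCT spectrum.
Then, determine from the number of groups and the group indicator information which decoded MDCT spectra belong to the same group, and perform intra-group deinterleaving on the MDCT spectra of each group to obtain the intra-group deinterleaved MDCT spectrum. This intra-group deinterleaving works in the same way as the deinterleaving of the interleaved MDCT spectra of the M blocks that the encoder performs before obtaining the number of groups and the group indicator information.
S24. Perform inverse group rearrangement on the intra-group deinterleaved MDCT spectrum according to the number of groups and the group indicator information to obtain the inverse-rearranged MDCT spectrum.
If the number of groups of the current frame is greater than 1, inverse group rearrangement must be performed on the intra-group deinterleaved MDCT spectrum according to the group indicator information. The inverse group rearrangement at the decoder is the inverse of the group rearrangement at the encoder.
For example, assume that the intra-group deinterleaved MDCT spectrum consists of M MDCT spectral blocks of L/M points each. The block index idx0(i) of the ith transient block is determined from the group indicator information. The MDCT spectrum of the ith block in the intra-group deinterleaved MDCT spectrum is used as the MDCT spectrum of block idx0(i) in the inverse-rearranged MDCT spectrum. Here, idx0(i) is the block index of the ith flag with value 0 in the group indicator information, with i starting from 0. The number of transient blocks, denoted num0, is the number of bits with value 0 in the group indicator information. After the transient blocks are processed, the non-transient blocks are processed. The block index idx1(j) of the jth non-transient block is determined from the group indicator information. The MDCT spectrum of block num0+j in the intra-group deinterleaved MDCT spectrum is used as the MDCT spectrum of block idx1(j) in the inverse-rearranged MDCT spectrum. Here, idx1(j) is the block index of the jth flag with value 1 in the group indicator information, with j starting from 0.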
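The idx0(i)/idx1(j) mapping just described can be written directly as follows. This is a minimal sketch; the function name is hypothetical, and blocks are represented as opaque list items.

```python
def inverse_rearrange(blocks: list, group_indicator: list[int]) -> list:
    """Undo the transient-first rearrangement using idx0(i) and idx1(j)."""
    idx0 = [b for b, flag in enumerate(group_indicator) if flag == 0]  # transient blocks
    idx1 = [b for b, flag in enumerate(group_indicator) if flag == 1]  # non-transient blocks
    num0 = len(idx0)
    out = [None] * len(blocks)
    for i, b in enumerate(idx0):
        out[b] = blocks[i]            # ith received block -> block idx0(i)
    for j, b in enumerate(idx1):
        out[b] = blocks[num0 + j]     # (num0 + j)th received block -> block idx1(j)
    return out
```

With groupIndicator 1 1 1 0 0 0 0 1, the received order 3 4 5 6 0 1 2 7 is restored to 0 1 2 3 4 5 6 7.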
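S25. Obtain the reconstructed audio signal of the current frame from the inverse-rearranged MDCT spectrum.
One specific implementation for obtaining the reconstructed audio signal from the inverse-rearranged MDCT spectrum is as follows. First, interleave the inverse-rearranged MDCT spectra of the M blocks to obtain interleaved MDCT spectra of the M blocks. Next, apply decoding post-processing to the interleaved MDCT spectra to obtain a post-processed MDCT spectrum. The post-processing may include inverse TNS, inverse FDNS and BWE processing, and corresponds one-to-one to the encoding preprocessing at the encoder. Then deinterleave the post-processed MDCT spectrum to obtain deinterleaved MDCT spectra of the M blocks. Finally, transform the deinterleaved MDCT spectrum of each of the M blocks from the frequency domain to the time domain, and apply synthesis windowing and overlap-add to obtain the reconstructed audio signal.
Another specific implementation is to transform the MDCT spectrum of each of the M blocks directly from the frequency domain to the time domain, and apply synthesis windowing and overlap-add to obtain the reconstructed audio signal.
As shown in FIG. 9, the multi-channel signal encoding method performed by the encoder includes:
S31. Divide the input signal into frames to obtain the input signal of the current frame.
For example, with a frame length of 1024, the input signal of the current frame is a 1024-point audio signal.
S32. Perform transient detection on the input signal of the current frame to obtain a transient detection result.
For example, the input signal of the current frame is divided into L blocks and the signal energy of each block is calculated. If the signal energy changes abruptly between adjacent blocks, the current frame is considered a transient signal. L is a positive integer greater than 2, for example L = 8. Concretely, if the difference between the signal energies of adjacent blocks is greater than a preset threshold, the current frame is considered a transient signal; otherwise, it is considered a non-transient signal.
S33. Determine the window type of the current frame according to the transient detection result.
If the transient detection result of the current frame is a transient signal, the window type of the current frame is a short window; otherwise, it is a long window.
In addition to the short window and the long window, a switch-in window and a switch-out window may be added. Let the frame number of the current frame be i. The window type of the current frame is determined from the transient detection results of frames i-1 and i-2 and of the current frame.
If the transient detection results of frames i, i-1 and i-2 are all non-transient, the window type of frame i is a long window.
If the transient detection result of frame i is transient and those of frames i-1 and i-2 are non-transient, the window type of frame i is a switch-in window.
If the transient detection results of frames i and i-1 are non-transient and that of frame i-2 is transient, the window type of frame i is a switch-out window.
For all other combinations of the transient detection results of frames i, i-1 and i-2, the window type of frame i is a short window.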
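The four rules above form a small decision table over three boolean detection results. The following sketch follows the rules exactly as stated. The string labels and the function name are hypothetical.

```python
def window_type(t_i: bool, t_prev1: bool, t_prev2: bool) -> str:
    """Window type of frame i from the transient detection results of frames i, i-1, i-2."""
    if not t_i and not t_prev1 and not t_prev2:
        return "long"
    if t_i and not t_prev1 and not t_prev2:
        return "switch_in"
    if not t_i and not t_prev1 and t_prev2:
        return "switch_out"
    return "short"  # every other combination
```

Only the three listed patterns map to the long, switch-in and switch-out windows, so the remaining five of the eight possible combinations select the short window.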
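S34. Perform windowing and a time-frequency transform according to the window type of the current frame to obtain the MDCT spectrum of the current frame.
Windowing and the MDCT are performed according to the long, switch-in, switch-out or short window type. For the long, switch-in and switch-out windows, if the windowed signal is 2048 samples long, 1024 MDCT coefficients are obtained. For the short window, 8 overlapping short windows of length 256 are applied, each yielding 128 MDCT coefficients. The 128 MDCT coefficients of each short window are called a block, giving 1024 MDCT coefficients in total.
Determine whether the window type of the current frame is a short window. If so, perform step S35 below; otherwise, perform step S312 below.
S35. If the window type of the current frame is a short window, interleave the MDCT spectrum of the current frame to obtain the interleaved MDCT spectrum.
If the window type of the current frame is a short window, the MDCT spectra of the 8 blocks are interleaved; that is, eight 128-dimensional MDCT spectra are interleaved into one MDCT spectrum of length 1024.
The interleaved spectrum may take the following form: block 0 bin 0, block 1 bin 0, block 2 bin 0, …, block 7 bin 0, block 0 bin 1, block 1 bin 1, block 2 bin 1, …, block 7 bin 1, ….
Here, block 0 bin 0 denotes frequency bin 0 of block 0.
S36. Apply encoding preprocessing to the interleaved MDCT spectrum to obtain the preprocessed MDCT spectrum.
The preprocessing may include FDNS, TNS, BWE and other processing.
S37. Deinterleave the preprocessed MDCT spectrum to obtain the MDCT spectra of the M blocks.
Deinterleaving is performed in the reverse manner of step S35, yielding the MDCT spectra of 8 blocks of 128 points each.
S38. Determine the grouping information from the MDCT spectra of the M blocks.
The grouping information may include the number of groups numGroups and the group indicator information groupIndicator. Any of the schemes of step S13 performed by the encoder may be used to determine the grouping information from the MDCT spectra of the M blocks. For example, let the MDCT coefficients of the 8 blocks in a short frame be mdctSpectrum[8][128]. The MDCT spectral energy of each block is calculated and denoted enerMdct[8]. The average MDCT spectral energy of the 8 blocks, denoted avgEner, is then calculated by one of the following two methods:
Method 1: directly calculate the average of the MDCT spectral energies of the 8 blocks, that is, the average of enerMdct[8].
Method 2: to reduce the influence of the block with the largest energy among the 8 blocks, remove the energy of that block before calculating the average.
The MDCT spectral energy of each block is compared with the average energy. If it is greater than a certain multiple of the average energy, the current block is considered a transient block (flag 0); otherwise, it is considered a non-transient block (flag 1). All transient blocks form the transient group, and all non-transient blocks form the non-transient group.
For example, if the window type of the current frame is a short window, the preliminarily determined grouping information may be:
Number of groups numGroups: 2.
Block index: 0 1 2 3 4 5 6 7.
Group indicator information groupIndicator: 1 1 1 0 0 0 0 1.
The number of groups and the group indicator information must be written into the bitstream and transmitted to the decoder.
S39. Rearrange the MDCT spectra of the M blocks by group according to the grouping information to obtain the group-rearranged MDCT spectrum.
Any of the schemes of step S14 performed by the encoder may be used to rearrange the MDCT spectra of the M blocks by group according to the grouping information.
For example, the blocks of the short frame that belong to the transient group are placed at the front, and the blocks that belong to the other groups are placed at the back.
Continuing the example of step S38, if the grouping information is:
Block index: 0 1 2 3 4 5 6 7.
Group indicator information groupIndicator: 1 1 1 0 0 0 0 1.
then the rearranged spectrum has the following form:
Block index: 3 4 5 6 0 1 2 7.
That is, blocks 0, 1, 2 and 3 after rearrangement hold the spectra of blocks 3, 4, 5 and 6 before rearrangement. Blocks 4, 5 and 6 after rearrangement hold the spectra of blocks 0, 1 and 2 before rearrangement. Block 7 after rearrangement holds the spectrum of block 7 before rearrangement.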
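S310. Perform intra-group spectral interleaving on the group-rearranged MDCT spectrum to obtain the intra-group interleaved MDCT spectrum.
Each group of the group-rearranged MDCT spectrum is interleaved internally in a way similar to step S35. The difference is that the interleaving is limited to MDCT spectra belonging to the same group.
Continuing the example above, the transient group in the rearranged spectrum is interleaved. It consists of blocks 3, 4, 5 and 6 before rearrangement, that is, blocks 0, 1, 2 and 3 after rearrangement. The other group is also interleaved. It consists of blocks 0, 1, 2 and 7 before rearrangement, that is, blocks 4, 5, 6 and 7 after rearrangement.
S311. Encode the intra-group interleaved MDCT spectrum by using the encoding neural network.
This embodiment does not limit the specific method of encoding the intra-group interleaved MDCT spectrum with the encoding neural network. For example, the intra-group interleaved MDCT spectrum is processed by the encoding neural network to generate latent variables. The latent variables are quantized to obtain quantized latent variables. The quantized latent variables are then arithmetic-coded, and the arithmetic coding result is written into the bitstream.
S312. If the current frame is not a short frame, encode its MDCT spectrum by using the encoding method corresponding to its frame type.
Frames of other types may be encoded without grouping, rearrangement or intra-group interleaving. For example, the MDCT spectrum of the current frame obtained in step S34 is encoded directly with the encoding neural network.
For example, the window function corresponding to the window type is determined and applied to the audio signal of the current frame to obtain a windowed signal. With the windows of adjacent frames overlapping, a forward time-frequency transform such as the MDCT is applied to the windowed signal to obtain the MDCT spectrum of the current frame. The MDCT spectrum of the current frame is then encoded.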
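As shown in FIG. 10, the multi-channel signal decoding method performed by the decoder includes:
S41. Decode the received bitstream to obtain the window type of the current frame.
Determine whether the window type of the current frame is a short window. If so, perform step S42 below; otherwise, perform step S410 below.
S42. If the window type of the current frame is a short window, decode the received bitstream to obtain the number of groups and the group indicator information.
S43. Decode the received bitstream by using the decoding neural network to obtain the decoded MDCT spectrum.
The decoding neural network corresponds to the encoding neural network. For example, decoding may proceed as follows. Arithmetic decoding is performed on the received bitstream to obtain quantized latent variables. The quantized latent variables are dequantized to obtain dequantized latent variables. The dequantized latent variables are fed to the decoding neural network, which generates the decoded MDCT spectrum.
S44. Perform intra-group deinterleaving on the decoded MDCT spectrum according to the number of groups and the group indicator information to obtain the intra-group deinterleaved MDCT spectrum.
The MDCT spectral blocks that belong to the same group are determined from the number of groups and the group indicator information. For example, the decoded MDCT spectrum is divided into 8 blocks, the number of groups is 2, and groupIndicator is 1 1 1 0 0 0 0 1. Four bits in groupIndicator have the value 0, so the MDCT spectra of the first 4 blocks of the decoded MDCT spectrum form one group, the transient group, which is deinterleaved within the group. Four bits have the value 1, so the MDCT spectra of the last 4 blocks form another group, the non-transient group, which is also deinterleaved within the group. The MDCT spectra of the 8 blocks obtained in this way constitute the intra-group deinterleaved MDCT spectrum of the 8 blocks.
S45. Perform inverse group rearrangement on the intra-group deinterleaved MDCT spectrum according to the number of groups and the group indicator information to obtain the inverse-rearranged MDCT spectrum.
According to groupIndicator, the intra-group deinterleaved MDCT spectrum is rearranged into the spectra of M blocks in chronological order.
For example, with 2 groups and groupIndicator 1 1 1 0 0 0 0 1, blocks 0, 1, 2 and 3 obtained by intra-group deinterleaving become blocks 3, 4, 5 and 6. This is because the first, second, third and fourth bits with value 0 in groupIndicator are at positions 3, 4, 5 and 6. Blocks 4, 5 and 6 become blocks 0, 1 and 2, because the first, second and third bits with value 1 are at positions 0, 1 and 2. Block 7 is left unchanged and remains block 7.
At the encoder, the short-frame spectrum after group rearrangement has the form: block index 3 4 5 6 0 1 2 7.
At the decoder, inverse group rearrangement restores the spectra of the 8 blocks of the short frame to chronological order: block index 0 1 2 3 4 5 6 7.
S46. Interleave the inverse-rearranged MDCT spectrum to obtain the interleaved MDCT spectrum.
If the window type of the current frame is a short window, the inverse-rearranged MDCT spectrum is interleaved in the same way as described above.
S47. Apply decoding post-processing to the interleaved MDCT spectrum to obtain the post-processed MDCT spectrum.
The decoding post-processing may include inverse BWE, inverse TNS, inverse FDNS and other processing.
S48. Deinterleave the post-processed MDCT spectrum to obtain the reconstructed MDCT spectrum.
S49. Apply the inverse MDCT and windowing to the reconstructed MDCT spectrum to obtain the reconstructed audio signal.
The reconstructed MDCT spectrum includes the MDCT spectra of M blocks, and the inverse MDCT is applied to each block separately. After windowing and overlap-add of the inverse-transformed signals, the reconstructed audio signal of the short frame is obtained.
S410. If the window type of the current frame is another window type, decode by using the decoding method corresponding to that frame type to obtain the reconstructed audio signal.
For example, the received bitstream is decoded with the decoding neural network to obtain the reconstructed MDCT spectrum. The inverse transform and overlap-add (OLA) are then performed according to the window type (long, switch-in or switch-out window) to obtain the reconstructed audio signal.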
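With the method proposed in the embodiments of this application, if the window type of the current frame is a short window, the following is done. The number of groups and the group indicator information of the current frame are obtained from the spectra of its M blocks. The spectra of the M blocks are rearranged by group according to this information. The group-rearranged spectrum is then encoded with the encoding neural network. When the audio signal of the current frame is a transient signal, this moves the MDCT spectra that contain transient characteristics to positions of higher coding importance. The audio signal reconstructed after neural-network encoding and decoding therefore better preserves transient characteristics.
The embodiments of this application can also be used for stereo coding, with the following differences. First, the left and right channels of the stereo signal are each processed according to encoder steps S31 to S310 of the foregoing embodiment. This yields the intra-group interleaved MDCT spectrum of the left channel and that of the right channel. Step S311 then becomes: encode the intra-group interleaved MDCT spectra of the left and right channels by using the encoding neural network.
The input to the encoding neural network is therefore no longer the intra-group interleaved MDCT spectrum of a single channel. Instead, it is the intra-group interleaved MDCT spectra of the left and right channels, obtained by processing each channel according to steps S31 to S310.
The encoding neural network may be a CNN, with the intra-group interleaved MDCT spectra of the left and right channels as the inputs of two channels of the CNN.
Correspondingly, the procedure performed by the decoder includes:
Decode the received bitstream to obtain the window type, the number of groups and the group indicator information of the left channel of the current frame.
Decode the received bitstream to obtain the window type, the number of groups and the group indicator information of the right channel of the current frame.
Decode the received bitstream with the decoding neural network to obtain the decoded stereo MDCT spectrum.
Process the decoded MDCT spectrum of the left channel according to the single-channel decoding steps of the decoder in Embodiment 1, using the window type, the number of groups and the group indicator information of the left channel of the current frame, to obtain the reconstructed left-channel signal.
Process the decoded MDCT spectrum of the right channel according to the single-channel decoding steps of the decoder in Embodiment 1, using the window type, the number of groups and the group indicator information of the right channel of the current frame, to obtain the reconstructed right-channel signal.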
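The embodiments of this application can also be used for stereo coding. FIG. 11 shows an encoding procedure in which an encoder that applies the embodiments of this application adjusts the grouping information of the left and right channels. The procedure includes:
S51. Obtain the left-channel spectra of M blocks and the right-channel spectra of M blocks of the stereo signal of the current frame.
Divide the stereo signal into frames to obtain the stereo signal of the current frame, which includes the left-channel signal and the right-channel signal of the current frame.
Take the left-channel signal of the current frame as the audio signal of the current frame, and determine its window type by the method of encoder steps S11 and S12 shown in FIG. 7. If the window type of the left-channel signal of the current frame is a short frame, apply short-frame windowing and a time-frequency transform to the left-channel signal to obtain the left-channel spectra of M blocks.
Likewise, take the right-channel signal of the current frame as the audio signal of the current frame, and determine its window type by the method of encoder steps S11 and S12 shown in FIG. 7. If the window type of the right-channel signal of the current frame is a short frame, apply short-frame windowing and a time-frequency transform to the right-channel signal to obtain the right-channel spectra of M blocks.
S52. Obtain the number of groups and the group indicator information of the left channel from the left-channel spectra of the M blocks.
If the window type of the left-channel signal of the current frame is a short frame, obtain the number of groups and the group indicator information of the left channel from the left-channel spectra of the M blocks by the method of encoder step S13 shown in FIG. 7.
S53. Obtain the number of groups and the group indicator information of the right channel from the right-channel spectra of the M blocks.
If the window type of the right-channel signal of the current frame is a short frame, obtain the number of groups and the group indicator information of the right channel from the right-channel spectra of the M blocks by the method of encoder step S13 shown in FIG. 7.
S54. Determine, from the group indicator information of the left and right channels, whether to adjust the group indicator information. If adjustment is needed, determine the adjusted group indicator information of the left and right channels from the group indicator information of both channels.
Adjustment is performed when the number of groups of the left channel equals that of the right channel, the flag values of the group indicator information of the two channels are inconsistent, and the numbers of transient blocks indicated by the group indicator information of the left and right channels differ. In that case, the group indicator information is adjusted based on the group indicator information of both channels to obtain adjusted group indicator information. Otherwise, no adjustment is performed, and the group indicator information of the left and right channels is used directly as their adjusted group indicator information. This is the case when the flag values of the two channels' group indicator information are completely consistent, or when they are inconsistent but the two channels have the same number of transient blocks.
Completely consistent means that every flag value is equal. Inconsistent covers both partially inconsistent and completely inconsistent, meaning that some or all flag values differ. The comparison is made position by position. For example, 1 1 1 0 0 0 1 1 and 1 1 1 0 0 0 0 1 are partially inconsistent; 1 1 1 0 0 0 1 1 and 1 1 1 0 0 0 1 1 are completely consistent; and 1 1 1 0 0 0 1 1 and 0 0 0 1 1 1 0 0 are completely inconsistent.
Specifically, the adjustment may be a bitwise AND of the group indicator information of the left channel and that of the right channel. The result is used as the value of the corresponding bit in the adjusted group indicator information of both channels. Because 0 marks a transient block, the AND marks a block as transient whenever it is transient in either channel.
In another implementation, the numbers of groups of the left and right channels are first used to decide whether to compare their group indicator information. If both channels have 2 groups, their group indicator information is further compared to decide whether to adjust it; otherwise, no adjustment is needed.
The adjusted group indicator information of the left and right channels is encoded, written into the bitstream and transmitted to the decoder.
S55. Rearrange the left-channel spectra of the M blocks and the right-channel spectra of the M blocks by group according to the adjusted group indicator information of the left and right channels to obtain the group-rearranged stereo spectrum.
The group rearrangement is performed in the same way as in step S14 shown in FIG. 7. The left-channel spectra and the right-channel spectra of the M blocks are each rearranged by group according to the adjusted group indicator information, yielding the group-rearranged left-channel spectrum and right-channel spectrum.
S56. Encode the group-rearranged stereo spectrum by using the encoding neural network.
One method is as follows. According to the adjusted group indicator information, perform intra-group interleaving on the group-rearranged left-channel spectrum to obtain the intra-group interleaved left-channel spectrum. Likewise, according to the adjusted group indicator information, perform intra-group interleaving on the group-rearranged right-channel spectrum to obtain the intra-group interleaved right-channel spectrum. Then encode the intra-group interleaved stereo spectrum with the encoding neural network and write the result into the bitstream.
The encoding neural network used for stereo coding may be a CNN, with the left-channel spectrum and the right-channel spectrum each serving as the input signal of one channel of the CNN.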
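As shown in FIG. 12, the decoding procedure corresponding to the encoder shown in FIG. 11 includes the following steps:
S61. Decode the received bitstream to obtain the number of groups and the group indicator information of the left and right channels of the current frame.
Decode the received bitstream to obtain the window types of the left and right channels of the current frame. If the window type of the left channel of the current frame is a short frame, decode the received bitstream to obtain the number of groups and the group indicator information of the left channel. If the window type of the right channel of the current frame is a short frame, decode the received bitstream to obtain the number of groups and the group indicator information of the right channel.
S62. Decode the received bitstream by using the decoding neural network to obtain the intra-group deinterleaved stereo spectrum.
The decoder corresponds to the encoder. The specific steps are as follows:
First, decode the received bitstream with the decoding neural network to obtain the decoded left-channel spectrum and the decoded right-channel spectrum.
Then, determine from the number of groups and the group indicator information of the left channel which parts of the decoded left-channel spectrum belong to the same group, and perform intra-group deinterleaving on each group to obtain the intra-group deinterleaved left-channel spectrum. Likewise, determine from the number of groups and the group indicator information of the right channel which parts of the decoded right-channel spectrum belong to the same group, and perform intra-group deinterleaving on each group to obtain the intra-group deinterleaved right-channel spectrum. The deinterleaving is the same as the deinterleaving at the encoder.
S63. Perform inverse group rearrangement on the intra-group deinterleaved stereo spectrum according to the number of groups and the group indicator information of the left and right channels to obtain the inverse-rearranged stereo spectrum.
Perform inverse group rearrangement on the intra-group deinterleaved left-channel spectrum according to the number of groups and the group indicator information of the left channel to obtain the inverse-rearranged left-channel spectrum. Likewise, perform inverse group rearrangement on the intra-group deinterleaved right-channel spectrum according to the number of groups and the group indicator information of the right channel to obtain the inverse-rearranged right-channel spectrum. The inverse group rearrangement is the inverse of the group rearrangement in encoder step S55 shown in FIG. 11, and is not described in detail here.
S64. Obtain the reconstructed stereo signal from the reconstructed stereo spectrum.
Obtain the reconstructed left-channel signal from the reconstructed left-channel spectrum, and the reconstructed right-channel signal from the reconstructed right-channel spectrum. The specific method of obtaining the reconstructed stereo signal from the spectra of the left and right channels is the inverse of the encoding in encoder step S56 shown in FIG. 11, and is not described in detail here.
In the foregoing embodiment, the window types of the left and right channels of the stereo signal may both be short windows while the group indicator information of the two channels is inconsistent. In that case, after neural-network encoding and decoding, the transient characteristics of the reconstructed audio signal are not recovered well for the blocks whose group flag values differ between the left and right channels. The embodiments of this application therefore further include a scheme that adjusts the grouping of the left and right channels of a stereo signal.
In one embodiment of this application, the encoding method is shown in FIG. 13:
S71. Divide the stereo signal into frames to obtain the stereo signal of the current frame.
The stereo signal of the current frame includes the left-channel signal and the right-channel signal of the current frame.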
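S72. Perform transient detection on the left and right channels of the stereo signal of the current frame separately to obtain the transient detection results of the left and right channels.
The transient detection of the left and right channels is performed in the same way as in step S32 shown in FIG. 9.
S73. Determine the window types of the left-channel and right-channel signals of the current frame from the transient detection results of the left and right channels.
The window type is determined from the transient detection result in the same way as in step S33 shown in FIG. 9.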
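S74. If the window type of the left-channel signal of the current frame is a short frame, obtain the left-channel spectra of M blocks from the left-channel signal of the current frame.
If the window type of the left-channel signal of the current frame is a short frame, apply short-frame windowing and the MDCT to the left-channel signal of the current frame to obtain the left-channel MDCT spectra of M blocks. Interleave the left-channel MDCT spectra of the current frame to obtain the interleaved left-channel MDCT spectrum. Apply encoding preprocessing, which may include FDNS, TNS, BWE and other processing, to the interleaved left-channel MDCT spectrum to obtain the preprocessed left-channel MDCT spectrum. Deinterleave the preprocessed left-channel MDCT spectrum to obtain the left-channel MDCT spectra of M blocks.
S75. If the window type of the right-channel signal of the current frame is a short frame, obtain the right-channel spectra of M blocks from the right-channel signal of the current frame.
If the window type of the right-channel signal of the current frame is a short frame, apply short-frame windowing and the MDCT to the right-channel signal of the current frame to obtain the right-channel MDCT spectra of M blocks. Interleave the right-channel MDCT spectra of the current frame to obtain the interleaved right-channel MDCT spectrum. Apply encoding preprocessing, which may include FDNS, TNS, BWE and other processing, to the interleaved right-channel MDCT spectrum to obtain the preprocessed right-channel MDCT spectrum. Deinterleave the preprocessed right-channel MDCT spectrum to obtain the right-channel MDCT spectra of M blocks.
S76. Obtain the number of groups and the group indicator information of the left channel from the left-channel spectra of the M blocks.
The number of groups and the group indicator information are obtained in the same way as in step S38 shown in FIG. 9.
S77. Obtain the number of groups and the group indicator information of the right channel from the right-channel spectra of the M blocks.
The number of groups and the group indicator information are obtained in the same way as in step S38 shown in FIG. 9.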
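S78. Determine, from the group indicator information of the left and right channels, whether to adjust the group indicator information. If adjustment is needed, determine the adjusted group indicator information of the left and right channels from the group indicator information of both channels.
Case 1: if the group indicator information of the left and right channels indicates that the transient groups of the two channels contain spectral blocks at exactly the same positions, the group indicator information of the two channels is not adjusted. That is, if the transient group of the left channel contains the same number of blocks as that of the right channel, and at the same positions, the group indicator information of the left and right channels is not adjusted.
For example:
Group indicator information of the left channel: 1 1 1 1 1 1 0 0.
Group indicator information of the right channel: 1 1 1 1 1 1 0 0.
This grouping information shows that the spectral blocks contained in the transient groups of the left and right channels are at completely overlapping positions. In this case, no adjustment of the grouping information of either channel is needed.
Case 2: if the transient group of the left channel contains the same number of blocks as that of the right channel, but at different positions, the group indicator information of the left and right channels is not adjusted.
For example:
Group indicator information of the left channel: 0 0 0 1 1 1 1 1.
Group indicator information of the right channel: 1 1 1 1 1 0 0 0.
This grouping information shows that the transient groups of the left and right channels contain the same number of blocks, but at different positions. In this case, no adjustment of the group indicator information of either channel is needed.
In cases 3 and 4 below, the transient groups of the left and right channels contain different numbers of transient blocks, so the group indicator information of at least one channel needs to be adjusted. In case 3, the group indicator information of one channel is adjusted. In case 4, the group indicator information of one channel or of both channels is adjusted.
Case 3: the group indicator information of the left and right channels indicates that their transient groups contain different numbers of blocks, and the positions of the blocks in the two transient groups are completely different. In this case, the group indicator information of the channel whose transient group contains fewer blocks is adjusted, so that the transient groups of both channels contain the same number of blocks.
For example:
Group indicator information of the left channel groupIndicator_L: 0 0 0 1 1 1 1 1.
Group indicator information of the right channel groupIndicator_R: 1 1 1 1 0 0 0 0.
The group indicator information of the left channel is then adjusted so that the transient group of the left channel contains the same number of blocks as that of the right channel. For example, the transient flag of the left-channel block with index 3 (indexes start from 0) can be changed to transient, which gives the following adjusted grouping information:
Group indicator information of the left channel groupIndicator_L: 0 0 0 0 1 1 1 1.
Group indicator information of the right channel groupIndicator_R: 1 1 1 1 0 0 0 0.
This adjustment ensures that the transient groups of the left and right channels contain the same number of blocks.
Case 4: the group indicator information indicates that the transient groups of the left and right channels contain different numbers of blocks, and the positions of the blocks in the two transient groups are not completely different, that is, they differ only in part. In this case, the grouping information needs to be adjusted. The adjustment may take the union of the transient groups of the left and right channels, that is, expand the transient groups.
In the following example, the group indicator information of both channels is indexed from 0, and the grouping information of the right channel needs to be adjusted:
Group indicator information of the left channel groupIndicator_L: 1 1 1 0 0 0 0 1.
Group indicator information of the right channel groupIndicator_R: 1 1 1 1 0 0 0 1.
Taking the union of the transient groups of the left and right channels, that is, expanding the transient groups, gives the following adjusted grouping information for this example:
Group indicator information of the left channel groupIndicator_L: 1 1 1 0 0 0 0 1.
Group indicator information of the right channel groupIndicator_R: 1 1 1 0 0 0 0 1.
The right-channel block with index 3 is moved from the non-transient group to the transient group. The left and right channels then have the same number of transient blocks, and their transient groups contain spectral blocks at the same positions. The adjusted group indicator information of the left and right channels is encoded, written into the bitstream and transmitted to the decoder.
In the following example, the grouping information of both the left and right channels needs to be adjusted:
Group indicator information of the left channel groupIndicator_L: 1 1 0 0 0 0 1 1.
Group indicator information of the right channel groupIndicator_R: 1 1 1 1 0 0 0 1.
Taking the union of the transient groups of the left and right channels, that is, expanding the transient groups, gives the following adjusted grouping information for this example:
Group indicator information of the left channel groupIndicator_L: 1 1 0 0 0 0 0 1.
Group indicator information of the right channel groupIndicator_R: 1 1 0 0 0 0 0 1.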
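The four cases of S78 can be combined into one function and checked against the examples above. This is a minimal sketch, assuming 0 = transient and 1 = non-transient; the function name is hypothetical. In case 3, the text requires only that the counts match. Promoting the lowest-index non-transient blocks of the channel with fewer transient blocks is a hypothetical policy, chosen here because it reproduces the case 3 example. Case 4 uses the union of the transient groups, which under this flag convention is a bitwise AND.

```python
TRANSIENT = 0  # assumed convention: 0 = transient block, 1 = non-transient block


def adjust_stereo(left: list[int], right: list[int]) -> tuple[list[int], list[int]]:
    """Sketch of the S78 case analysis for two group indicators of equal length M."""
    left, right = list(left), list(right)
    n_l, n_r = left.count(TRANSIENT), right.count(TRANSIENT)
    shared = any(a == TRANSIENT and b == TRANSIENT for a, b in zip(left, right))
    if n_l == n_r:
        # Cases 1 and 2: equal transient counts, nothing to adjust.
        return left, right
    if not shared:
        # Case 3: disjoint positions. Promote non-transient blocks of the channel with
        # fewer transient blocks, lowest index first (hypothetical policy).
        fewer = left if n_l < n_r else right
        need = abs(n_l - n_r)
        for i, flag in enumerate(fewer):
            if need == 0:
                break
            if flag != TRANSIENT:
                fewer[i] = TRANSIENT
                need -= 1
        return left, right
    # Case 4: overlapping positions. Take the union of the transient groups,
    # which is a bitwise AND when 0 marks a transient block.
    merged = [a & b for a, b in zip(left, right)]
    return merged, list(merged)
```

Run on the examples of cases 1 to 4, the function reproduces the adjusted indicators listed above. Note that case 4 always leaves both channels with identical indicators, whereas case 3 only equalizes the counts.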
S79.根据左、右声道调整后的分组标志信息,对M个块的左声道频谱和M个块的右声道频谱进行分组排列,获得分组排列的立体声频谱。
分组排列处理的具体方法同前述图7所示的步骤S14中的一致。根据调整后的分组标志信息,分别对M个块的左声道频谱和M个块的右声道频谱进行分组排列,获得分组排列的左声道频谱和右声道频谱。
S710.利用编码神经网络对分组排列的立体声频谱进行编码,写入码流。
一种方法是:根据调整后的分组标志信息,对分组排列的左声道频谱先进行组内交织处理,获得组内交织的左声道频谱。同样,根据调整后的分组标志信息,对分组排列的右 声道频谱先进行组内交织处理,获得组内交织的右声道频谱。然后,再利用编码神经网络,对组内交织的立体声频谱进行编码。
立体声编码使用的编码神经网路可以是CNN网络,其中左声道频谱和右声道频谱分别作为CNN网络中一个通道的输入信号。
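In some embodiments of this application, the decoding method, shown in FIG. 14, mainly includes the following steps:
S81. Decode the received bitstream to obtain the window type of the left channel of the current frame.
S82. Decode the received bitstream to obtain the window type of the right channel of the current frame.
S83. If the window type of the left channel of the current frame is the short window type, decode the received bitstream to obtain the group count and grouping flag information of the left channel.
S84. If the window type of the right channel of the current frame is the short window type, decode the received bitstream to obtain the group count and grouping flag information of the right channel.
S85. Decode the received bitstream with the decoding neural network to obtain a decoded left-channel spectrum and a decoded right-channel spectrum.
S86. Based on the group count and grouping flag information of the left channel, apply intra-group de-interleaving to the decoded left-channel spectrum to obtain an intra-group de-interleaved left-channel spectrum.
Specifically, the group count and grouping flag information of the left channel identify which parts of the decoded left-channel spectrum belong to the same group. Intra-group de-interleaving is then applied to the spectra within each group, yielding the intra-group de-interleaved left-channel spectrum.
S87. Based on the group count and grouping flag information of the right channel, apply intra-group de-interleaving to the decoded right-channel spectrum to obtain an intra-group de-interleaved right-channel spectrum.
Likewise, the group count and grouping flag information of the right channel identify which parts of the decoded right-channel spectrum belong to the same group. Intra-group de-interleaving is then applied to the spectra within each group, yielding the intra-group de-interleaved right-channel spectrum. The de-interleaving is the inverse of the intra-group interleaving performed at the encoder.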
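Under the same bin-wise interleaving assumption, the decoder-side inverse works in two steps. It first splits the decoded spectrum at the boundary between the P transient blocks and the Q non-transient blocks, then de-interleaves each part. A sketch, with illustrative names:

```python
from typing import List, Sequence


def deinterleave(coeffs: Sequence[float], n_blocks: int) -> List[List[float]]:
    """Undo bin-wise interleaving of `n_blocks` equally long blocks."""
    if n_blocks == 0:
        return []
    if len(coeffs) % n_blocks:
        raise ValueError("length must be a multiple of the block count")
    n_bins = len(coeffs) // n_blocks
    return [[coeffs[k * n_blocks + j] for k in range(n_bins)] for j in range(n_blocks)]


def intra_group_deinterleave(coeffs: Sequence[float], num_transient: int,
                             num_blocks: int) -> List[List[float]]:
    """Split the decoded spectrum into the transient and non-transient groups,
    then de-interleave each group on its own."""
    if num_blocks <= 0 or len(coeffs) % num_blocks:
        raise ValueError("spectrum length must be a positive multiple of M")
    split = num_transient * (len(coeffs) // num_blocks)
    return (deinterleave(coeffs[:split], num_transient)
            + deinterleave(coeffs[split:], num_blocks - num_transient))


print(intra_group_deinterleave([1, 3, 2, 4, 5, 6], num_transient=2, num_blocks=3))
# [[1, 2], [3, 4], [5, 6]]
```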
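S88. Based on the group count and grouping flag information of the left channel, apply inverse grouping arrangement to the intra-group de-interleaved left-channel spectrum to obtain an inversely grouped left-channel spectrum.
The inverse grouping arrangement is performed in the same way as in step S24 of FIG. 8 described above.
S89. Based on the group count and grouping flag information of the right channel, apply inverse grouping arrangement to the intra-group de-interleaved right-channel spectrum to obtain an inversely grouped right-channel spectrum.
The inverse grouping arrangement is performed in the same way as in step S24 of FIG. 8 described above.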
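Step S24 of FIG. 8 is not reproduced here. Claim 22, however, describes the inverse arrangement: recover the indexes of the P transient blocks and the Q non-transient blocks from the decoded grouping information, then use them to restore the block order. Below is a sketch of that index-based approach. It assumes the encoder kept each group in ascending index order and that 0 marks a transient block.

```python
from typing import List, Sequence

TRANSIENT = 0  # assumption inferred from the encoder-side examples: 0 = transient group


def inverse_group_arrange(arranged: Sequence[Sequence[float]],
                          flags: Sequence[int]) -> List[List[float]]:
    """Return block spectra to time order.

    The indexes of the P transient blocks and of the Q non-transient blocks
    are recovered from the decoded flags; `arranged` is assumed to hold the
    P transient blocks first, each group in ascending index order.
    """
    if len(arranged) != len(flags):
        raise ValueError("need exactly one flag per block")
    p_idx = [i for i, f in enumerate(flags) if f == TRANSIENT]
    q_idx = [i for i, f in enumerate(flags) if f != TRANSIENT]
    restored: List[List[float]] = [[] for _ in flags]
    for pos, idx in enumerate(p_idx + q_idx):
        restored[idx] = list(arranged[pos])
    return restored


print(inverse_group_arrange([[1.0], [3.0], [0.0], [2.0]], [1, 0, 1, 0]))
# [[0.0], [1.0], [2.0], [3.0]]
```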
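S810. Interleave the inversely grouped left-channel spectrum to obtain an interleaved left-channel spectrum.
This interleaving is performed when the window type of the left channel of the current frame is the short window type.
S811. Interleave the inversely grouped right-channel spectrum to obtain an interleaved right-channel spectrum.
This interleaving is performed when the window type of the right channel of the current frame is the short window type.
S812. Apply decoding post-processing to the interleaved left-channel spectrum to obtain a post-processed left-channel spectrum.
S813. Apply decoding post-processing to the interleaved right-channel spectrum to obtain a post-processed right-channel spectrum.
Decoding post-processing may include bandwidth extension (BWE), inverse temporal noise shaping (TNS), inverse frequency-domain noise shaping (FDNS), and similar processing.
S814. De-interleave the post-processed left-channel spectrum to obtain a reconstructed left-channel spectrum.
S815. De-interleave the post-processed right-channel spectrum to obtain a reconstructed right-channel spectrum.
S816. Apply an inverse MDCT and de-windowing to the reconstructed left-channel spectrum to obtain a reconstructed left-channel signal.
S817. Apply an inverse MDCT and de-windowing to the reconstructed right-channel spectrum to obtain a reconstructed right-channel signal.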
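In this embodiment of this application, the grouping flag information of the left channel and of the right channel is adjusted jointly, producing adjusted grouping flag information for both channels. The left-channel spectra of the M blocks and the right-channel spectra of the M blocks are then grouped and arranged according to this adjusted information to obtain a grouped-and-arranged stereo spectrum. Adjusting the two channels' grouping flag information keeps their grouping consistent when the grouped-and-arranged stereo spectrum is fed to the encoding neural network. As a result, the transient characteristics of both channels of the reconstructed stereo signal are well recovered.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. However, a person skilled in the art should understand that this application is not limited to the described order of actions, because according to this application some steps may be performed in another order or simultaneously. A person skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by this application.
To better implement the foregoing solutions of the embodiments of this application, related apparatuses for implementing them are provided below.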
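Referring to FIG. 15, a multi-channel signal encoding apparatus 1500 provided in an embodiment of this application may include a transient identifier obtaining module 1501, a grouping information obtaining module 1502, a grouping information adjustment module 1503, a spectrum obtaining module 1504, and an encoding module 1505, where:
The transient identifier obtaining module is configured to obtain M first transient identifiers of M blocks of a first channel of a current frame of a to-be-encoded multi-channel signal, based on the spectra of those M blocks. The M blocks of the first channel include a first block of the first channel, and the first transient identifier of the first block indicates whether the first block is a transient block or a non-transient block.
The grouping information obtaining module is configured to obtain first grouping information of the M blocks of the first channel based on the M first transient identifiers.
The transient identifier obtaining module is further configured to obtain M second transient identifiers of M blocks of a second channel of the current frame, based on the spectra of those M blocks. The M blocks of the second channel include a second block of the second channel, and the second transient identifier of the second block indicates whether the second block is a transient block or a non-transient block.
The grouping information obtaining module is further configured to obtain second grouping information of the M blocks of the second channel based on the M second transient identifiers.
The grouping information adjustment module is configured to obtain first adjusted grouping information and second adjusted grouping information when the first grouping information and the second grouping information satisfy a preset condition. The adjusted information is derived from the first grouping information and the second grouping information: the first adjusted grouping information corresponds to the first grouping information, and the second adjusted grouping information corresponds to the second grouping information. One of three combinations applies. In the first, the first adjusted grouping information is the same as the first grouping information, and the second adjusted grouping information is obtained by adjusting the second grouping information. In the second, the first adjusted grouping information is obtained by adjusting the first grouping information, and the second adjusted grouping information is the same as the second grouping information. In the third, both are obtained by adjusting the first and the second grouping information respectively.
The spectrum obtaining module is configured to obtain a first to-be-encoded spectrum based on the first adjusted grouping information and the spectra of the M blocks of the first channel.
The spectrum obtaining module is further configured to obtain a second to-be-encoded spectrum based on the second adjusted grouping information and the spectra of the M blocks of the second channel.
The encoding module is configured to encode the first to-be-encoded spectrum and the second to-be-encoded spectrum with an encoding neural network to obtain a spectrum encoding result, and to write the spectrum encoding result into a bitstream.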
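Referring to FIG. 16, a multi-channel signal decoding apparatus 1600 provided in an embodiment of this application may include a grouping information obtaining module 1601, a decoding module 1602, a spectrum obtaining module 1603, and a reconstructed signal obtaining module 1604, where:
The grouping information obtaining module is configured to obtain, from a bitstream, first decoded grouping information of M blocks of a first channel of a current frame of a multi-channel signal. The first decoded grouping information indicates first decoded transient identifiers of the M blocks of the first channel.
The grouping information obtaining module is further configured to obtain, from the bitstream, second decoded grouping information of M blocks of a second channel of the current frame. The second decoded grouping information indicates second decoded transient identifiers of the M blocks of the second channel.
The decoding module is configured to decode the bitstream with a decoding neural network to obtain decoded spectra of the M blocks of the first channel and decoded spectra of the M blocks of the second channel.
The reconstructed signal obtaining module is configured to obtain a first reconstructed signal of the first channel based on the first decoded grouping information and the decoded spectra of the M blocks of the first channel.
The reconstructed signal obtaining module is further configured to obtain a second reconstructed signal of the second channel based on the second decoded grouping information and the decoded spectra of the M blocks of the second channel.
It should be noted that the information exchange and execution processes between the modules/units of the foregoing apparatuses are based on the same concept as the method embodiments of this application. They therefore achieve the same technical effects as those method embodiments. For details, refer to the descriptions in the foregoing method embodiments; they are not repeated here.
An embodiment of this application further provides a computer storage medium that stores a program. When executed, the program performs some or all of the steps described in the foregoing method embodiments.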
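The following describes another multi-channel signal encoding apparatus provided in an embodiment of this application. Referring to FIG. 17, the multi-channel signal encoding apparatus 1700 includes:
a receiver 1701, a transmitter 1702, a processor 1703, and a memory 1704. The apparatus 1700 may have one or more processors 1703; FIG. 17 uses one processor as an example. In some embodiments of this application, the receiver 1701, the transmitter 1702, the processor 1703, and the memory 1704 may be connected by a bus or in another manner; FIG. 17 uses a bus connection as an example.
The memory 1704 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1703. Part of the memory 1704 may further include a non-volatile random access memory (NVRAM). The memory 1704 stores an operating system and operation instructions, executable modules or data structures, or a subset or superset thereof. The operation instructions may include various instructions for implementing various operations. The operating system may include various system programs for implementing basic services and processing hardware-based tasks.
The processor 1703 controls the operation of the multi-channel signal encoding apparatus and may also be referred to as a central processing unit (CPU). In a specific application, the components of the encoding apparatus are coupled together through a bus system. Besides a data bus, the bus system may include a power bus, a control bus, a status signal bus, and the like. For clarity, however, all of these buses are referred to as the bus system in the figure.
The methods disclosed in the foregoing embodiments of this application may be applied to, or implemented by, the processor 1703. The processor 1703 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing methods may be completed by hardware integrated logic circuits in the processor 1703 or by instructions in the form of software. The processor 1703 may be one of the following:
- a general-purpose processor;
- a digital signal processor (DSP);
- an application-specific integrated circuit (ASIC);
- a field-programmable gate array (FPGA) or another programmable logic device;
- a discrete gate or transistor logic device;
- a discrete hardware component.
It can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed with reference to the embodiments of this application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1704; the processor 1703 reads information from the memory 1704 and, together with its hardware, completes the steps of the foregoing methods.
The receiver 1701 may be configured to receive input digital or character information and to generate signal inputs related to settings and function control of the multi-channel signal encoding apparatus. The transmitter 1702 may include a display device such as a display screen, and may be configured to output digital or character information through an external interface.
In this embodiment of this application, the processor 1703 is configured to perform the methods performed by the multi-channel signal encoding apparatus shown in FIG. 4, FIG. 7, FIG. 9, FIG. 11, and FIG. 13 of the foregoing embodiments.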
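The following describes another multi-channel signal decoding apparatus provided in an embodiment of this application. Referring to FIG. 18, the multi-channel signal decoding apparatus 1800 includes:
a receiver 1801, a transmitter 1802, a processor 1803, and a memory 1804. The apparatus 1800 may have one or more processors 1803; FIG. 18 uses one processor as an example. In some embodiments of this application, the receiver 1801, the transmitter 1802, the processor 1803, and the memory 1804 may be connected by a bus or in another manner; FIG. 18 uses a bus connection as an example.
The memory 1804 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1803. Part of the memory 1804 may further include NVRAM. The memory 1804 stores an operating system and operation instructions, executable modules or data structures, or a subset or superset thereof. The operation instructions may include various instructions for implementing various operations. The operating system may include various system programs for implementing basic services and processing hardware-based tasks.
The processor 1803 controls the operation of the multi-channel signal decoding apparatus and may also be referred to as a CPU. In a specific application, the components of the decoding apparatus are coupled together through a bus system. Besides a data bus, the bus system may include a power bus, a control bus, a status signal bus, and the like. For clarity, however, all of these buses are referred to as the bus system in the figure.
The methods disclosed in the foregoing embodiments of this application may be applied to, or implemented by, the processor 1803. The processor 1803 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing methods may be completed by hardware integrated logic circuits in the processor 1803 or by instructions in the form of software. The processor 1803 may be a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed with reference to the embodiments of this application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1804; the processor 1803 reads information from the memory 1804 and, together with its hardware, completes the steps of the foregoing methods.
In this embodiment of this application, the processor 1803 is configured to perform the methods performed by the multi-channel signal decoding apparatus shown in FIG. 5, FIG. 8, FIG. 10, FIG. 12, and FIG. 14 of the foregoing embodiments.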
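In another possible design, the multi-channel signal encoding apparatus or decoding apparatus is a chip in a terminal. The chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute computer-executable instructions stored in a storage unit, so that the chip in the terminal performs the audio encoding method of any implementation of the first aspect or the audio decoding method of any implementation of the second aspect. Optionally, the storage unit is inside the chip, such as a register or a cache. It may alternatively be a storage unit in the terminal located outside the chip, such as a read-only memory (ROM), another type of static storage device capable of storing static information and instructions, or a random access memory (RAM).
The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling program execution of the method of the first aspect or the second aspect.
It should also be noted that the apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the accompanying drawings of the apparatus embodiments provided in this application, a connection between modules indicates a communication connection between them, which may be implemented as one or more communication buses or signal lines.
From the descriptions of the foregoing implementations, a person skilled in the art can clearly understand that this application may be implemented by software plus necessary general-purpose hardware. It may also be implemented by dedicated hardware, including an application-specific integrated circuit, a dedicated CPU, a dedicated memory, dedicated components, and the like. Generally, any function performed by a computer program can easily be implemented by corresponding hardware, and the hardware structure used to implement the same function may vary, for example an analog circuit, a digital circuit, or a dedicated circuit. For this application, however, a software implementation is the better choice in most cases. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc. It includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented completely or partially in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated completely or partially. The computer may be a general-purpose computer, a dedicated computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, they may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center. Transmission may be wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.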

Claims (32)

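  1. A multi-channel signal encoding method, comprising:
    obtaining, based on spectra of M blocks of a first channel of a current frame of a to-be-encoded multi-channel signal, M first transient identifiers of the M blocks of the first channel, wherein the M blocks of the first channel comprise a first block of the first channel, and the first transient identifier of the first block indicates that the first block is a transient block or indicates that the first block is a non-transient block;
    obtaining first grouping information of the M blocks of the first channel based on the M first transient identifiers;
    obtaining, based on spectra of M blocks of a second channel of the current frame, M second transient identifiers of the M blocks of the second channel, wherein the M blocks of the second channel comprise a second block of the second channel, and the second transient identifier of the second block indicates that the second block is a transient block or indicates that the second block is a non-transient block;
    obtaining second grouping information of the M blocks of the second channel based on the M second transient identifiers;
    when the first grouping information and the second grouping information satisfy a preset condition, obtaining first adjusted grouping information and second adjusted grouping information based on the first grouping information and the second grouping information, wherein the first adjusted grouping information corresponds to the first grouping information and the second adjusted grouping information corresponds to the second grouping information; and wherein the first adjusted grouping information is the same as the first grouping information and the second adjusted grouping information is obtained by adjusting the second grouping information; or the first adjusted grouping information is obtained by adjusting the first grouping information and the second adjusted grouping information is the same as the second grouping information; or the first adjusted grouping information is obtained by adjusting the first grouping information and the second adjusted grouping information is obtained by adjusting the second grouping information;
    obtaining a first to-be-encoded spectrum based on the first adjusted grouping information and the spectra of the M blocks of the first channel;
    obtaining a second to-be-encoded spectrum based on the second adjusted grouping information and the spectra of the M blocks of the second channel;
    encoding the first to-be-encoded spectrum and the second to-be-encoded spectrum by using an encoding neural network, to obtain a spectrum encoding result; and
    writing the spectrum encoding result into a bitstream.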
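  2. The method according to claim 1, further comprising:
    encoding the first adjusted grouping information and the second adjusted grouping information to obtain a grouping information encoding result; and
    writing the grouping information encoding result into the bitstream.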
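  3. The method according to claim 1 or 2, wherein the first grouping information comprises a first group count of the M blocks of the first channel or a first group count identifier, the first group count identifier indicating the first group count, and when the first group count is greater than 1, the first grouping information further comprises the M first transient identifiers; or the first grouping information comprises the M first transient identifiers;
    and/or
    the second grouping information comprises a second group count of the M blocks of the second channel or a second group count identifier, the second group count identifier indicating the second group count, and when the second group count is greater than 1, the second grouping information further comprises the M second transient identifiers; or the second grouping information comprises the M second transient identifiers;
    and/or
    the first adjusted grouping information comprises a first adjusted group count of the M blocks of the first channel or a first adjusted group count identifier, the first adjusted group count identifier indicating the first adjusted group count, and when the first adjusted group count is greater than 1, the first adjusted grouping information further comprises M first adjusted transient identifiers of the M blocks of the first channel, wherein the first adjusted transient identifier of the first block is different from or the same as the first transient identifier of the first block; or the first adjusted grouping information comprises the M first adjusted transient identifiers;
    and/or
    the second adjusted grouping information comprises a second adjusted group count of the M blocks of the second channel or a second adjusted group count identifier, the second adjusted group count identifier indicating the second adjusted group count, and when the second adjusted group count is greater than 1, the second adjusted grouping information further comprises M second adjusted transient identifiers of the M blocks of the second channel, wherein the second adjusted transient identifier of the second block is different from or the same as the second transient identifier of the second block; or the second adjusted grouping information comprises the M second adjusted transient identifiers.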
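  4. The method according to claim 3, wherein the preset condition comprises: the first grouping information is inconsistent with the second grouping information.
  5. The method according to claim 4, wherein the first grouping information being inconsistent with the second grouping information comprises: the M first transient identifiers indicate that the M blocks of the first channel comprise transient blocks and non-transient blocks, the M second transient identifiers indicate that the M blocks of the second channel comprise transient blocks and non-transient blocks, and the M first transient identifiers are inconsistent with the M second transient identifiers;
    or
    the first grouping information being inconsistent with the second grouping information comprises: the M first transient identifiers indicate that the M blocks of the first channel comprise transient blocks and non-transient blocks, the M second transient identifiers indicate that the M blocks of the second channel comprise transient blocks and non-transient blocks, and the number of transient blocks of the first channel differs from the number of transient blocks of the second channel;
    or
    the first grouping information being inconsistent with the second grouping information comprises: the M first transient identifiers indicate that the M blocks of the first channel comprise transient blocks and non-transient blocks, the M second transient identifiers indicate that the M blocks of the second channel comprise transient blocks and non-transient blocks, the M first transient identifiers are inconsistent with the M second transient identifiers, and the N-th block of the M blocks of the first channel and the N-th block of the M blocks of the second channel are both transient, where 0≤N<M.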
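  6. The method according to claim 5, wherein the M blocks of the first channel have respective indexes, and the M blocks of the second channel have respective indexes;
    when the first grouping information being inconsistent with the second grouping information comprises: the M first transient identifiers indicate that the M blocks of the first channel comprise transient blocks and non-transient blocks, the M second transient identifiers indicate that the M blocks of the second channel comprise transient blocks and non-transient blocks, and the number of transient blocks of the first channel differs from the number of transient blocks of the second channel, and if the indexes of the transient blocks among the M blocks of the first channel do not intersect the indexes of the transient blocks among the M blocks of the second channel, the obtaining first adjusted grouping information and second adjusted grouping information based on the first grouping information and the second grouping information comprises:
    when the number of transient blocks of the first channel is less than the number of transient blocks of the second channel, adjusting the first grouping information to obtain the first adjusted grouping information, wherein the number of transient blocks of the first channel indicated by the first adjusted grouping information is equal to the number of transient blocks of the second channel indicated by the second grouping information;
    or
    when the number of transient blocks of the first channel is greater than the number of transient blocks of the second channel, adjusting the second grouping information to obtain the second adjusted grouping information, wherein the number of transient blocks of the second channel indicated by the second adjusted grouping information is equal to the number of transient blocks of the first channel indicated by the first grouping information.
  7. The method according to claim 5, wherein the M blocks of the first channel have respective indexes, and the M blocks of the second channel have respective indexes;
    when the first grouping information being inconsistent with the second grouping information comprises: the M first transient identifiers indicate that the M blocks of the first channel comprise transient blocks and non-transient blocks, the M second transient identifiers indicate that the M blocks of the second channel comprise transient blocks and non-transient blocks, and the number of transient blocks of the first channel differs from the number of transient blocks of the second channel, and if the indexes of the transient blocks among the M blocks of the first channel intersect the indexes of the transient blocks among the M blocks of the second channel, the obtaining first adjusted grouping information and second adjusted grouping information based on the first grouping information and the second grouping information comprises:
    when the indexes of the transient blocks indicated by the M first transient identifiers are a subset of the indexes of the transient blocks indicated by the M second transient identifiers, adjusting at least one of the M first transient identifiers to obtain the M first adjusted transient identifiers, wherein the indexes of all transient blocks indicated by the M first adjusted transient identifiers are the same as the indexes of all transient blocks indicated by the M second transient identifiers;
    when the indexes of the transient blocks indicated by the M second transient identifiers are a subset of the indexes of the transient blocks indicated by the M first transient identifiers, adjusting at least one of the M second transient identifiers to obtain the M second adjusted transient identifiers, wherein the indexes of all transient blocks indicated by the M second adjusted transient identifiers are the same as the indexes of all transient blocks indicated by the M first transient identifiers;
    when the indexes of the transient blocks indicated by the M first transient identifiers partially coincide with the indexes of the transient blocks indicated by the M second transient identifiers, adjusting at least one of the M first transient identifiers to obtain the M first adjusted transient identifiers, and adjusting at least one of the M second transient identifiers to obtain the M second adjusted transient identifiers, wherein the indexes of all transient blocks indicated by the M first adjusted transient identifiers are the same as the indexes of all transient blocks indicated by the M second adjusted transient identifiers.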
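  8. The method according to claim 7, wherein the adjusting at least one of the M first transient identifiers to obtain the M first adjusted transient identifiers comprises:
    when the first transient identifier of the first block indicates that the first block is a non-transient block, if the second transient identifier of a third block of the M blocks of the second channel indicates that the third block is a transient block, adjusting the first transient identifier of the first block to the first adjusted transient identifier of the first block, wherein the first adjusted transient identifier of the first block indicates that the first block is a transient block, and the index of the first block is the same as the index of the third block;
    and the adjusting at least one of the M second transient identifiers to obtain the M second adjusted transient identifiers comprises:
    when the second transient identifier of the second block indicates that the second block is a non-transient block, if the first transient identifier of a fourth block of the M blocks of the first channel indicates that the fourth block is a transient block, adjusting the second transient identifier of the second block to the second adjusted transient identifier of the second block, wherein the second adjusted transient identifier of the second block indicates that the second block is a transient block, and the index of the second block is the same as the index of the fourth block.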
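  9. The method according to any one of claims 3 to 8, wherein when the first adjusted group count is greater than 1 or the M first adjusted transient identifiers indicate that the M blocks of the first channel comprise transient blocks and non-transient blocks, the obtaining a first to-be-encoded spectrum based on the first adjusted grouping information and the spectra of the M blocks of the first channel comprises:
    grouping and arranging the spectra of the M blocks of the first channel based on the first adjusted grouping information to obtain the first to-be-encoded spectrum;
    and when the second adjusted group count is greater than 1 or the M second adjusted transient identifiers indicate that the M blocks of the second channel comprise transient blocks and non-transient blocks, the obtaining a second to-be-encoded spectrum based on the second adjusted grouping information and the spectra of the M blocks of the second channel comprises:
    grouping and arranging the spectra of the M blocks of the second channel based on the second adjusted grouping information to obtain the second to-be-encoded spectrum.
  10. The method according to claim 9, wherein the grouping and arranging the spectra of the M blocks of the first channel based on the first adjusted grouping information to obtain the first to-be-encoded spectrum comprises:
    assigning to a first transient group the spectra of those of the M blocks of the first channel that the first adjusted transient identifiers of the M blocks indicate as transient blocks, and assigning to a first non-transient group the spectra of those of the M blocks of the first channel that the first adjusted transient identifiers of the M blocks indicate as non-transient blocks; and arranging the spectra of the blocks in the first transient group before the spectra of the blocks in the first non-transient group to obtain the first to-be-encoded spectrum;
    or
    the grouping and arranging the spectra of the M blocks of the second channel based on the second adjusted grouping information to obtain the second to-be-encoded spectrum comprises:
    assigning to a second transient group the spectra of those of the M blocks of the second channel that the second adjusted transient identifiers of the M blocks indicate as transient blocks, and assigning to a second non-transient group the spectra of those of the M blocks of the second channel that the second adjusted transient identifiers of the M blocks indicate as non-transient blocks; and arranging the spectra of the blocks in the second transient group before the spectra of the blocks in the second non-transient group to obtain the second to-be-encoded spectrum.
  11. The method according to claim 9, wherein the grouping and arranging the spectra of the M blocks of the first channel based on the first adjusted grouping information to obtain the first to-be-encoded spectrum comprises:
    arranging the spectra of those of the M blocks of the first channel that the first adjusted transient identifiers of the M blocks indicate as transient blocks before the spectra of those that the first adjusted transient identifiers of the M blocks indicate as non-transient blocks, to obtain the first to-be-encoded spectrum;
    or
    the grouping and arranging the spectra of the M blocks of the second channel based on the second adjusted grouping information to obtain the second to-be-encoded spectrum comprises:
    arranging the spectra of those of the M blocks of the second channel that the second adjusted transient identifiers of the M blocks indicate as transient blocks before the spectra of those that the second adjusted transient identifiers of the M blocks indicate as non-transient blocks, to obtain the second to-be-encoded spectrum.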
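  12. The method according to any one of claims 3 to 11, wherein before the encoding the first to-be-encoded spectrum and the second to-be-encoded spectrum by using an encoding neural network, the method further comprises:
    performing intra-group interleaving on the first to-be-encoded spectrum to obtain an intra-group interleaved first spectrum; and
    performing intra-group interleaving on the second to-be-encoded spectrum to obtain an intra-group interleaved second spectrum;
    and the encoding the first to-be-encoded spectrum and the second to-be-encoded spectrum by using an encoding neural network comprises:
    encoding the intra-group interleaved first spectrum and the intra-group interleaved second spectrum by using the encoding neural network.
  13. The method according to claim 12, wherein P of the M blocks of the first channel are indicated as transient blocks by the M first adjusted transient identifiers, and Q of the M blocks of the first channel are indicated as non-transient blocks by the M first adjusted transient identifiers, where M=P+Q;
    and the performing intra-group interleaving on the first to-be-encoded spectrum comprises:
    interleaving the spectra of the P blocks to obtain interleaved spectra of the P blocks; and
    interleaving the spectra of the Q blocks to obtain interleaved spectra of the Q blocks.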
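  14. The method according to any one of claims 1 to 13, wherein before the obtaining, based on spectra of M blocks of a first channel of a current frame of a to-be-encoded multi-channel signal, M first transient identifiers of the M blocks of the first channel, the method further comprises:
    obtaining a first window type of the first channel, wherein the first window type is a short window type or a non-short window type;
    obtaining a second window type of the second channel, wherein the second window type is a short window type or a non-short window type; and
    performing the step of obtaining, based on spectra of M blocks of a first channel of a current frame of a to-be-encoded multi-channel signal, M first transient identifiers of the M blocks of the first channel only when both the first window type and the second window type are the short window type.
  15. The method according to claim 14, further comprising:
    encoding the first window type and the second window type to obtain a window type encoding result; and
    writing the window type encoding result into the bitstream.
  16. The method according to any one of claims 1 to 15, wherein the obtaining, based on spectra of M blocks of a first channel of a current frame of a to-be-encoded multi-channel signal, M first transient identifiers of the M blocks of the first channel comprises:
    obtaining M first spectral energies of the M blocks of the first channel based on the spectra of the M blocks of the first channel;
    obtaining a first spectral energy average of the M blocks of the first channel based on the M first spectral energies; and
    obtaining the M first transient identifiers based on the M first spectral energies and the first spectral energy average.
  17. The method according to claim 16, wherein when the first spectral energy of the first block is greater than K times the first spectral energy average, the first transient identifier of the first block indicates that the first block is a transient block; or
    when the first spectral energy of the first block is less than or equal to K times the first spectral energy average, the first transient identifier of the first block indicates that the first block is a non-transient block;
    where K is a real number greater than or equal to 1.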
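  18. A multi-channel signal decoding method, comprising:
    obtaining, from a bitstream, first decoded grouping information of M blocks of a first channel of a current frame of a multi-channel signal, wherein the first decoded grouping information indicates first decoded transient identifiers of the M blocks of the first channel;
    obtaining, from the bitstream, second decoded grouping information of M blocks of a second channel of the current frame, wherein the second decoded grouping information indicates second decoded transient identifiers of the M blocks of the second channel;
    decoding the bitstream by using a decoding neural network to obtain decoded spectra of the M blocks of the first channel and decoded spectra of the M blocks of the second channel;
    obtaining a first reconstructed signal of the first channel based on the first decoded grouping information and the decoded spectra of the M blocks of the first channel; and
    obtaining a second reconstructed signal of the second channel based on the second decoded grouping information and the decoded spectra of the M blocks of the second channel.
  19. The method according to claim 18, wherein the obtaining a first reconstructed signal of the first channel based on the first decoded grouping information and the decoded spectra of the M blocks of the first channel comprises:
    when the first decoded grouping information indicates that a first decoded group count of the M blocks of the first channel is greater than 1, performing inverse grouping arrangement on the decoded spectra of the M blocks of the first channel to obtain inversely arranged spectra of the M blocks of the first channel; and
    obtaining the first reconstructed signal of the first channel based on the inversely arranged spectra of the M blocks of the first channel;
    and the obtaining a second reconstructed signal of the second channel based on the second decoded grouping information and the decoded spectra of the M blocks of the second channel comprises:
    when the second decoded grouping information indicates that a second decoded group count of the M blocks of the second channel is greater than 1, performing inverse grouping arrangement on the decoded spectra of the M blocks of the second channel to obtain inversely arranged spectra of the M blocks of the second channel; and
    obtaining the second reconstructed signal of the second channel based on the inversely arranged spectra of the M blocks of the second channel.
  20. The method according to claim 18, wherein the obtaining a first reconstructed signal of the first channel based on the first decoded grouping information and the decoded spectra of the M blocks of the first channel comprises:
    performing intra-group de-interleaving on the decoded spectra of the M blocks of the first channel to obtain intra-group de-interleaved spectra of the M blocks of the first channel; and
    obtaining the first reconstructed signal based on the intra-group de-interleaved spectra of the M blocks of the first channel;
    and the obtaining a second reconstructed signal of the second channel based on the second decoded grouping information and the decoded spectra of the M blocks of the second channel comprises:
    performing intra-group de-interleaving on the decoded spectra of the M blocks of the second channel to obtain intra-group de-interleaved spectra of the M blocks of the second channel; and
    obtaining the second reconstructed signal based on the intra-group de-interleaved spectra of the M blocks of the second channel.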
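  21. The method according to claim 18, wherein P of the M blocks of the first channel are indicated as transient blocks by the M first decoded transient identifiers, and Q of the M blocks of the first channel are indicated as non-transient blocks by the M first decoded transient identifiers, where M=P+Q;
    and the obtaining a first reconstructed signal of the first channel based on the first decoded grouping information and the decoded spectra of the M blocks of the first channel comprises:
    performing intra-group de-interleaving on the decoded spectra of the P blocks of the first channel and on the decoded spectra of the Q blocks of the first channel, to obtain intra-group de-interleaved spectra of the M blocks of the first channel;
    performing inverse grouping arrangement on the intra-group de-interleaved spectra of the M blocks of the first channel based on the first decoded grouping information, to obtain inversely arranged spectra of the M blocks of the first channel; and
    obtaining the first reconstructed signal of the first channel based on the inversely arranged spectra of the M blocks of the first channel.
  22. The method according to claim 21, wherein
    the performing inverse grouping arrangement on the intra-group de-interleaved spectra of the M blocks of the first channel based on the first decoded grouping information comprises:
    obtaining indexes of the P blocks of the first channel based on the first decoded grouping information;
    obtaining indexes of the Q blocks of the first channel based on the first decoded grouping information; and
    performing the inverse grouping arrangement on the intra-group de-interleaved spectra of the M blocks of the first channel based on the indexes of the P blocks and the indexes of the Q blocks.
  23. The method according to any one of claims 18 to 22, further comprising:
    obtaining a first window type of the first channel of the current frame from the bitstream;
    obtaining a second window type of the second channel of the current frame from the bitstream; and
    performing the step of obtaining, from a bitstream, first decoded grouping information of M blocks of a first channel of a current frame of a multi-channel signal only when both the first window type and the second window type are the short window type.
  24. The method according to any one of claims 18 to 23, wherein the first decoded grouping information comprises a first decoded group count of the M blocks of the first channel or a first decoded group count identifier, the first decoded group count identifier indicating the first decoded group count, and when the first decoded group count is greater than 1, the first decoded grouping information further comprises M first decoded transient identifiers; or the first decoded grouping information comprises the M first decoded transient identifiers;
    and/or
    the second decoded grouping information comprises a second decoded group count of the M blocks of the second channel or a second decoded group count identifier, the second decoded group count identifier indicating the second decoded group count, and when the second decoded group count is greater than 1, the second decoded grouping information further comprises M second decoded transient identifiers; or the second decoded grouping information comprises the M second decoded transient identifiers.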
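  25. A multi-channel signal encoding apparatus, comprising:
    a transient identifier obtaining module, configured to obtain, based on spectra of M blocks of a first channel of a current frame of a to-be-encoded multi-channel signal, M first transient identifiers of the M blocks of the first channel, wherein the M blocks of the first channel comprise a first block of the first channel, and the first transient identifier of the first block indicates that the first block is a transient block or indicates that the first block is a non-transient block;
    a grouping information obtaining module, configured to obtain first grouping information of the M blocks of the first channel based on the M first transient identifiers;
    the transient identifier obtaining module being further configured to obtain, based on spectra of M blocks of a second channel of the current frame, M second transient identifiers of the M blocks of the second channel, wherein the M blocks of the second channel comprise a second block of the second channel, and the second transient identifier of the second block indicates that the second block is a transient block or indicates that the second block is a non-transient block;
    the grouping information obtaining module being further configured to obtain second grouping information of the M blocks of the second channel based on the M second transient identifiers;
    a grouping information adjustment module, configured to: when the first grouping information and the second grouping information satisfy a preset condition, obtain first adjusted grouping information and second adjusted grouping information based on the first grouping information and the second grouping information, wherein the first adjusted grouping information corresponds to the first grouping information and the second adjusted grouping information corresponds to the second grouping information; and wherein the first adjusted grouping information is the same as the first grouping information and the second adjusted grouping information is obtained by adjusting the second grouping information; or the first adjusted grouping information is obtained by adjusting the first grouping information and the second adjusted grouping information is the same as the second grouping information; or the first adjusted grouping information is obtained by adjusting the first grouping information and the second adjusted grouping information is obtained by adjusting the second grouping information;
    a spectrum obtaining module, configured to obtain a first to-be-encoded spectrum based on the first adjusted grouping information and the spectra of the M blocks of the first channel;
    the spectrum obtaining module being further configured to obtain a second to-be-encoded spectrum based on the second adjusted grouping information and the spectra of the M blocks of the second channel; and
    an encoding module, configured to encode the first to-be-encoded spectrum and the second to-be-encoded spectrum by using an encoding neural network to obtain a spectrum encoding result, and to write the spectrum encoding result into a bitstream.
  26. A multi-channel signal decoding apparatus, comprising:
    a grouping information obtaining module, configured to obtain, from a bitstream, first decoded grouping information of M blocks of a first channel of a current frame of a multi-channel signal, wherein the first decoded grouping information indicates first decoded transient identifiers of the M blocks of the first channel;
    the grouping information obtaining module being further configured to obtain, from the bitstream, second decoded grouping information of M blocks of a second channel of the current frame, wherein the second decoded grouping information indicates second decoded transient identifiers of the M blocks of the second channel;
    a decoding module, configured to decode the bitstream by using a decoding neural network to obtain decoded spectra of the M blocks of the first channel and decoded spectra of the M blocks of the second channel;
    a reconstructed signal obtaining module, configured to obtain a first reconstructed signal of the first channel based on the first decoded grouping information and the decoded spectra of the M blocks of the first channel; and
    the reconstructed signal obtaining module being further configured to obtain a second reconstructed signal of the second channel based on the second decoded grouping information and the decoded spectra of the M blocks of the second channel.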
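  27. A multi-channel signal encoding apparatus, comprising at least one processor, wherein the at least one processor is configured to be coupled to a memory and to read and execute instructions in the memory to implement the method according to any one of claims 1 to 17.
  28. The multi-channel signal encoding apparatus according to claim 27, further comprising the memory.
  29. A multi-channel signal decoding apparatus, comprising at least one processor, wherein the at least one processor is configured to be coupled to a memory and to read and execute instructions in the memory to implement the method according to any one of claims 18 to 24.
  30. The multi-channel signal decoding apparatus according to claim 29, further comprising the memory.
  31. A computer-readable storage medium, comprising instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 17 or claims 18 to 24.
  32. A computer-readable storage medium, comprising a bitstream generated by the method according to any one of claims 1 to 17.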
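Claims 16 and 17 detect transients by comparing each block's spectral energy with K times the mean energy over the M blocks. Claim 3 lets the grouping information carry a group count and, when that count exceeds 1, the per-block transient identifiers. Below is a minimal Python sketch of both, resting on three assumptions. Energy is taken as the sum of squared spectral coefficients, and K defaults to an arbitrary 2.0. Mixed transient and non-transient blocks give a group count of 2, otherwise 1. `GroupingInfo` and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


def transient_flags(block_spectra: Sequence[Sequence[float]], k: float = 2.0) -> List[bool]:
    """Flag a block as transient when its spectral energy exceeds K times the mean.

    Energy is the sum of squared coefficients (an assumption); the claims only
    require K to be a real number >= 1, so the default of 2.0 is illustrative.
    """
    if k < 1:
        raise ValueError("K must be a real number >= 1")
    if not block_spectra:
        return []
    energies = [sum(c * c for c in block) for block in block_spectra]
    mean = sum(energies) / len(energies)
    return [e > k * mean for e in energies]


@dataclass
class GroupingInfo:
    group_count: int
    flags: Optional[List[bool]]  # carried only when group_count > 1


def grouping_info(flags: Sequence[bool]) -> GroupingInfo:
    """Hypothetical packing: two groups when transient and non-transient blocks
    are mixed, otherwise a single group with no per-block flags."""
    mixed = any(flags) and not all(flags)
    return GroupingInfo(2, list(flags)) if mixed else GroupingInfo(1, None)


flags = transient_flags([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [10.0, 0.0]])
print(flags)                              # [False, False, False, True]
print(grouping_info(flags).group_count)   # 2
```

The strict "greater than" comparison matches claim 17: a block whose energy exactly equals K times the mean is non-transient.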
PCT/CN2022/096602 2021-07-29 2022-06-01 Multi-channel signal encoding and decoding method and apparatus WO2023005415A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
KR1020247004632A KR20240032117A (ko) 2021-07-29 2022-06-01 Multi-channel signal encoding and decoding method and apparatus
EP22848025.7A EP4362012A1 (en) 2021-07-29 2022-06-01 Encoding and decoding methods and apparatuses for multi-channel signals
US18/423,990 US20240169998A1 (en) 2021-07-29 2024-01-26 Multi-Channel Signal Encoding and Decoding Method and Apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110865298.2 2021-07-29
CN202110865298.2A CN115691514A (zh) 2021-07-29 2021-07-29 Multi-channel signal encoding and decoding method and apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/423,990 Continuation US20240169998A1 (en) 2021-07-29 2024-01-26 Multi-Channel Signal Encoding and Decoding Method and Apparatus

Publications (1)

Publication Number Publication Date
WO2023005415A1 true WO2023005415A1 (zh) 2023-02-02

Family

ID=85057730

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/096602 WO2023005415A1 (zh) 2021-07-29 2022-06-01 Multi-channel signal encoding and decoding method and apparatus

Country Status (5)

Country Link
US (1) US20240169998A1 (zh)
EP (1) EP4362012A1 (zh)
KR (1) KR20240032117A (zh)
CN (1) CN115691514A (zh)
WO (1) WO2023005415A1 (zh)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
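CN1783727A (zh) * 2002-08-21 2006-06-07 中山正音数字技术有限公司 Encoding method for compression coding of multi-channel digital audio signals
US20040049379A1 (en) * 2002-09-04 2004-03-11 Microsoft Corporation Multi-channel audio encoding and decoding
CN101055721A (zh) * 2004-09-17 2007-10-17 广州广晟数码技术有限公司 Multi-channel digital audio encoding device and method therefor
CN101246689A (zh) * 2004-09-17 2008-08-20 广州广晟数码技术有限公司 Audio encoding system
JP2007011384A (ja) * 2006-07-07 2007-01-18 Victor Co Of Japan Ltd Audio encoding method and audio decoding method
CN102157151A (zh) * 2010-02-11 2011-08-17 华为技术有限公司 Multi-channel signal encoding method, decoding method, apparatus, and system
CN103295577A (zh) * 2013-05-27 2013-09-11 深圳广晟信源技术有限公司 Analysis window switching method and apparatus for audio signal encoding
CN108885876A (zh) * 2016-03-10 2018-11-23 奥兰治 Optimized encoding and decoding of spatialization information for parametric encoding and decoding of a multi-channel audio signal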

Also Published As

Publication number Publication date
EP4362012A1 (en) 2024-05-01
US20240169998A1 (en) 2024-05-23
CN115691514A (zh) 2023-02-03
KR20240032117A (ko) 2024-03-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22848025

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022848025

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 20247004632

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 1020247004632

Country of ref document: KR

Ref document number: KR1020247004632

Country of ref document: KR

ENP Entry into the national phase

Ref document number: 2022848025

Country of ref document: EP

Effective date: 20240125

NENP Non-entry into the national phase

Ref country code: DE