WO2021208792A1 - Audio signal encoding method, decoding method, encoding device, and decoding device - Google Patents

Audio signal encoding method, decoding method, encoding device, and decoding device

Info

Publication number
WO2021208792A1
WO2021208792A1 PCT/CN2021/085920 CN2021085920W
Authority
WO
WIPO (PCT)
Prior art keywords
frequency
frequency range
information
current frame
range
Prior art date
Application number
PCT/CN2021/085920
Other languages
English (en)
French (fr)
Inventor
夏丙寅
李佳蔚
王喆
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to BR112022020773A priority Critical patent/BR112022020773A2/pt
Priority to EP21788941.9A priority patent/EP4131261A4/en
Priority to MX2022012891A priority patent/MX2022012891A/es
Priority to KR1020227039651A priority patent/KR20230002697A/ko
Publication of WO2021208792A1 publication Critical patent/WO2021208792A1/zh
Priority to US17/965,979 priority patent/US20230048893A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/0204 ... using spectral analysis, e.g. transform vocoders or subband vocoders, using subband decomposition
    • G10L19/04 ... using predictive techniques
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G10L19/167 Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • G10L19/20 Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
    • G10L19/22 Mode decision, i.e. based on audio signal content versus external parameters
    • G10L21/038 Speech enhancement, e.g. noise reduction or echo cancellation, using band spreading techniques
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, the extracted parameters being spectral information of each sub-band

Definitions

  • This application relates to the field of communications, and in particular to an audio signal encoding method, decoding method, encoding device, and decoding device.
  • In audio coding, the high-frequency part and the low-frequency part of the audio data are typically processed separately, and the correlation between signals in different frequency bands is often further exploited for coding, for example by generating the high-band signal from the low-band signal through spectrum duplication or band extension.
  • However, the high-frequency spectrum often contains tonal components that are not similar to the low-frequency spectrum, and existing solutions cannot process these dissimilar tonal components, which lowers the quality of the encoded data. How to obtain high-quality encoded data has therefore become an urgent problem to be solved.
  • This application provides an audio signal encoding method, decoding method, encoding device, and decoding device, which are used to implement higher-quality audio encoding and decoding and improve user experience.
  • In a first aspect, the present application provides an audio signal encoding method, including: acquiring a current frame of an audio signal, the current frame including a high-band signal and a low-band signal; obtaining parameters of the band extension according to the high-band signal, the low-band signal, and preset configuration information of the band extension; obtaining frequency region information, where the frequency region information indicates a first frequency range in the high-band signal in which tonal component detection needs to be performed; performing tonal component detection in the first frequency range to obtain information on the tonal components of the high-band signal; and multiplexing the band-extension parameters and the tonal component information into a payload bitstream.
  • In this way, the tonal components can be detected in the frequency range indicated by the frequency region information. This frequency range is determined from the configuration information of the band extension and the sampling frequency of the audio signal, so the detected tonal component information can cover more of the frequency ranges in which the tonal components of the high-band signal are dissimilar to the low-band signal. Encoding based on tonal component information that covers these frequency ranges improves the coding quality.
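To make the encoder-side flow concrete, here is a minimal, self-contained Python sketch of the steps summarized above. All helper logic is hypothetical toy code (spectral magnitudes as lists, per-region gains as the band-extension parameters, simple threshold-based tone detection); the patent does not specify these algorithms.

```python
def encode_frame_sketch(high_band, low_band, bwe_config, sample_rate_hz):
    """Toy sketch of the encoding flow: BWE parameters plus tonal detection.

    high_band / low_band are lists of spectral magnitudes (a hypothetical
    representation); bwe_config is a dict with a 'num_regions' entry.
    """
    # Step 1: band-extension parameters (here: one average gain per BWE region).
    num_regions = bwe_config["num_regions"]
    region_len = max(1, len(high_band) // num_regions)
    bwe_params = [
        sum(high_band[i * region_len:(i + 1) * region_len]) / region_len
        for i in range(num_regions)
    ]

    # Step 2: frequency region info: the first frequency range in which tonal
    # components are detected (here simply the whole high band).
    freq_region_info = {"first_count": num_regions}

    # Step 3: tonal component detection: flag bins well above the mean magnitude.
    mean_mag = sum(high_band) / len(high_band)
    tonal_info = [(i, m) for i, m in enumerate(high_band) if m > 3 * mean_mag]

    # Step 4: multiplex into a payload "bitstream" (a dict stands in here).
    return {"bwe": bwe_params, "regions": freq_region_info, "tones": tonal_info}
```

A frame whose high band contains one strong tone would yield one `(bin, amplitude)` pair in `tones`, alongside the per-region gains.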
  • In a possible implementation, the method of the first aspect may further include: multiplexing the frequency region information into a configuration bitstream. The frequency region information can thus be sent to the decoding device through the configuration bitstream, so that the decoding device can decode according to the frequency range indicated by the frequency region information, allowing the information on the tonal components of the high-band signal that are dissimilar to the low-band signal to be decoded, further improving the decoding quality.
  • In a possible implementation, acquiring the frequency region information may include: determining the frequency region information according to the sampling frequency of the audio signal and the configuration information of the band extension.
  • The audio signal has one or more frames. The corresponding frequency region information can be determined when each frame is encoded, or multiple frames can share the same frequency region information; multiple implementations are provided and can be chosen according to the actual application scenario.
  • In a possible implementation, the frequency region information may include at least one of the following: a first number, identification information, relationship information, or a frequency region change number. The first number is the number of frequency regions within the first frequency range; the identification information indicates whether the first frequency range is the same as the second frequency range corresponding to the band extension indicated by the configuration information; the relationship information indicates, when the first frequency range differs from the second frequency range, the size relationship between the two ranges; and the frequency region change number is, when the two ranges differ, the number of frequency regions by which the first frequency range and the second frequency range differ. The frequency range in which tonal component detection needs to be performed can therefore be determined accurately from the frequency region information.
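As one possible representation, the four fields enumerated above could be carried in a small structure like the following Python sketch. The field names are hypothetical; the patent names the quantities but does not define a concrete encoding.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FrequencyRegionInfo:
    """Hypothetical container for the frequency region information fields."""
    first_count: Optional[int] = None         # first number: regions in the first frequency range
    same_as_bwe_range: Optional[bool] = None  # identification info: first range == BWE (second) range?
    first_range_larger: Optional[bool] = None # relationship info: size relation when the ranges differ
    region_delta: Optional[int] = None        # frequency region change number: regions by which they differ
```

For example, an encoder that extends the band-extension range by one region might signal `FrequencyRegionInfo(first_count=9, same_as_bwe_range=False, first_range_larger=True, region_delta=1)`.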
  • In a possible implementation, the configuration information of the band extension includes an upper limit of the band extension and/or a second number, where the second number is the number of frequency regions in the second frequency range. The method may further include: determining the first number according to one or more of the encoding rate of the current frame, the number of channels of the audio signal, the sampling frequency of the audio signal, the upper limit of the band extension, or the second number. The number of frequency regions in which tonal components need to be detected can thus be determined accurately from these quantities.
  • the upper limit of the frequency band extension includes one or more of the following: the highest frequency in the second frequency range, the highest frequency point sequence number, the highest frequency band sequence number, or the highest frequency region sequence number.
  • In a possible implementation, the number of channels of the audio signal is at least one. Determining the first number according to one or more of the encoding rate of the current frame, the number of channels, the sampling frequency, the upper limit of the band extension, or the second number may include: determining a first judgment flag of the current channel in the current frame according to the encoding rate and the number of channels of the current frame, and determining the first number of the current channel according to the first judgment flag combined with the second number; or, determining a second judgment flag of the current channel in the current frame according to the sampling frequency and the upper limit of the band extension, and determining the first number of the current channel according to the second judgment flag combined with the second number; or, determining both the first judgment flag (from the encoding rate and the number of channels) and the second judgment flag (from the sampling frequency and the upper limit of the band extension), and determining the first number of the current channel according to both flags combined with the second number. The first number can thus be determined in a variety of ways, so that the number of frequency regions in which tonal components need to be detected is determined accurately.
  • In a possible implementation, determining the first judgment flag of the current channel in the current frame according to the encoding rate and the number of channels of the current frame may include: obtaining the average encoding rate of each channel in the current frame from the encoding rate and the number of channels, and obtaining the first judgment flag of the current channel from the average encoding rate and a first threshold.
  • The first judgment flag thus indicates whether the average encoding rate is greater than the first threshold, so that the first number obtained subsequently is more accurate.
  • In a possible implementation, determining the first judgment flag of the current channel in the current frame according to the encoding rate and the number of channels of the current frame may also include: determining the actual encoding rate allocated to the current channel from the encoding rate and the number of channels, and obtaining the first judgment flag of the current channel from the actual encoding rate and a second threshold.
  • Each channel may thus be allocated an actual encoding rate, and the first judgment flag indicates whether the actual encoding rate of the current channel is greater than the second threshold, so that the first number obtained subsequently is more accurate.
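The two variants of the first judgment flag described above can be sketched as follows. The threshold values and the per-channel rate allocation used in the test values are illustrative assumptions, not values from the patent.

```python
def first_flag_from_average_rate(total_rate_bps, num_channels, first_threshold_bps):
    """Variant 1: set the flag when the average per-channel rate exceeds the first threshold."""
    average_rate = total_rate_bps / num_channels
    return average_rate > first_threshold_bps

def first_flag_from_actual_rate(channel_rates_bps, channel_index, second_threshold_bps):
    """Variant 2: set the flag when the rate actually allocated to the current channel
    exceeds the second threshold."""
    return channel_rates_bps[channel_index] > second_threshold_bps
```

For a 128 kbps stereo frame and a first threshold of 48 kbps, variant 1 sets the flag; at 64 kbps total it does not.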
  • In a possible implementation, determining the second judgment flag of the current channel in the current frame according to the sampling frequency and the upper limit of the band extension may include: when the upper limit of the band extension includes the highest frequency, comparing whether the highest frequency included in the upper limit of the band extension is the same as the highest frequency of the audio signal to determine the second judgment flag of the current channel in the current frame; or, when the upper limit of the band extension includes the highest frequency band sequence number, comparing whether the highest frequency band sequence number included in the upper limit of the band extension is the same as the highest frequency band sequence number of the audio signal, which is determined by the sampling frequency, to determine the second judgment flag of the current channel in the current frame.
  • The second judgment flag can thus be determined by comparing the highest frequency, the highest frequency bin sequence number, the highest frequency band sequence number, or the highest frequency region sequence number included in the upper limit of the band extension with the corresponding value for the audio signal. This indicates whether the highest frequency of the audio signal exceeds the upper frequency limit of the band extension, so that a more accurate first number is obtained.
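A sketch of the frequency-based comparison for the second judgment flag: the signal's highest representable frequency is taken as half the sampling frequency (Nyquist). This helper is an illustrative assumption; the patent also allows the comparison to be done on frequency bin, band, or region sequence numbers instead of frequencies.

```python
def second_flag(bwe_upper_limit_hz, sample_rate_hz):
    """Set the flag when the signal's highest frequency exceeds the BWE upper limit.

    The signal's highest representable frequency is sample_rate / 2 (Nyquist);
    if it lies above the band-extension upper limit, tonal detection may need
    extra frequency regions beyond the BWE range.
    """
    signal_highest_hz = sample_rate_hz / 2
    return signal_highest_hz > bwe_upper_limit_hz
```

At a 32 kHz sampling rate (Nyquist 16 kHz), a 14 kHz band-extension upper limit sets the flag, while a 16 kHz limit does not.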
  • In a possible implementation, determining the first number of the current channel in the current frame may include: if both the first judgment flag and the second judgment flag meet a preset condition, adding one or more frequency regions to the second number corresponding to the band extension to obtain the first number of the current channel; or, if the first judgment flag or the second judgment flag does not meet the preset condition, taking the second number corresponding to the band extension as the first number of the current channel.
  • When both flags meet the preset condition, the frequency range in which tonal components need to be detected exceeds the frequency range corresponding to the band extension, so the number of frequency regions is increased; the frequency regions used for tonal component detection then extend beyond the band-extension range, and the resulting tonal component information can cover all the tonal component information in the current frame, improving the coding quality. When either flag does not meet the condition, tonal detection is performed over the frequency range corresponding to the band extension in the current frame, which likewise covers the tonal component information in that range.
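The rule above combines the two flags into the first number. In this sketch, the increment of one extra region is an illustrative default; the patent allows one or more regions to be added.

```python
def first_number(second_number, flag1, flag2, extra_regions=1):
    """First number of frequency regions for tonal detection on the current channel.

    If both judgment flags meet the preset condition, one or more frequency
    regions are added on top of the band-extension region count
    (second_number); otherwise the band-extension region count is used as-is.
    """
    if flag1 and flag2:
        return second_number + extra_regions
    return second_number
```

With eight band-extension regions, both flags set yields nine detection regions; otherwise the count stays at eight.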
  • In a possible implementation, the lower limit of the first frequency range is the same as the lower limit of the second frequency range of the band extension indicated by the configuration information. When the first number included in the frequency region information is less than or equal to the second number corresponding to the band extension, the distribution of frequency regions in the first frequency range is the same as the distribution of frequency regions in the second frequency range indicated by the configuration information. When the first number is greater than the second number, the upper frequency limit of the first frequency range is greater than the upper frequency limit of the second frequency range; the distribution of frequency regions in the part where the first and second frequency ranges overlap is the same as the distribution within the second frequency range, and the distribution of frequency regions in the non-overlapping part is determined according to a preset method.
  • Because the lower limits of the two ranges coincide, the division of the frequency regions in the first frequency range can subsequently be determined by comparing the number of frequency regions in the first frequency range with the number in the second frequency range, so that the frequency regions included in the first frequency range are determined accurately.
  • In a possible implementation, the frequency regions in the non-overlapping part of the first frequency range and the second frequency range satisfy the following conditions: the width of each such frequency region is less than or equal to a preset value, and its upper frequency limit is less than or equal to the highest frequency of the audio signal. Constraining the division of the non-overlapping part in this way achieves a more reasonable division of frequency regions.
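One way to satisfy the constraints above (each added region no wider than a preset value, and never exceeding the signal's highest frequency) is the greedy division sketched below. This is an illustrative preset method, not one given in the patent.

```python
def divide_extra_regions(bwe_upper_hz, signal_highest_hz, max_width_hz, num_extra):
    """Divide the non-overlapping span [bwe_upper_hz, signal_highest_hz] into regions.

    Each region is at most max_width_hz wide, and no region's upper limit
    exceeds the signal's highest frequency; division stops early if the span
    is exhausted.
    """
    regions, lower = [], bwe_upper_hz
    for _ in range(num_extra):
        upper = min(lower + max_width_hz, signal_highest_hz)
        if upper <= lower:
            break  # nothing left above the band-extension upper limit
        regions.append((lower, upper))
        lower = upper
    return regions
```

For a 14 kHz band-extension upper limit, a 16 kHz signal ceiling, and a 1.5 kHz width cap, two extra regions come out as 14.0-15.5 kHz and 15.5-16.0 kHz, the second one truncated at the signal's highest frequency.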
  • In the implementations of this application, the frequency range is divided hierarchically: the first frequency range can be divided into one or more frequency regions, and each frequency region can be divided into one or more frequency bands.
  • The frequency bands in a frequency range can be ordered, each with a different sequence number, so that frequencies can be compared by comparing the sequence numbers of the frequency bands.
  • In a possible implementation, the number of frequency regions in the first frequency range is a preset number. Setting the number of frequency regions in which tonal components need to be detected to a preset number directly reduces the workload.
  • The preset number may or may not be written into the configuration bitstream.
  • In a possible implementation, the tonal component information may include a position-quantity parameter of the tonal components and an amplitude parameter or energy parameter of the tonal components.
  • In a possible implementation, the tonal component information may also include a noise floor parameter of the high-band signal.
  • In a second aspect, the present application provides a decoding method, including: obtaining a payload bitstream; demultiplexing the payload bitstream to obtain the band-extension parameters and the tonal component information of the current frame of the audio signal; obtaining the high-band signal of the current frame according to the band-extension parameters; performing reconstruction according to the tonal component information and the frequency region information to obtain a reconstructed tonal signal, where the frequency region information indicates a first frequency range in the current frame in which tonal components need to be reconstructed; and obtaining the decoded signal of the current frame according to the high-band signal and the reconstructed tonal signal.
  • In this way, the frequency range in which tonal components need to be reconstructed can be determined according to the frequency region information. This frequency range is determined from the configuration information of the band extension and the sampling frequency of the audio signal, so the frequency region information can be used to reconstruct the tonal components that are dissimilar between the high-band signal and the low-band signal, improving the decoding quality.
  • In a possible implementation, the method may further include: obtaining a configuration bitstream, and obtaining the frequency region information from the configuration bitstream. Decoding can then be performed according to the frequency range indicated by the frequency region information included in the configuration bitstream, so that the information on the tonal components of the high-band signal that are dissimilar to the low-band signal can be decoded, improving the decoding quality.
  • In a possible implementation, the frequency region information may include at least one of the following: a first number, identification information, relationship information, or a frequency region change number. The first number is the number of frequency regions within the first frequency range; the identification information indicates whether the first frequency range is the same as the second frequency range corresponding to the band extension; the relationship information indicates, when the first frequency range differs from the second frequency range, the size relationship between the two ranges; and the frequency region change number is, when the two ranges differ, the number of frequency regions by which the first frequency range and the second frequency range differ.
  • In a possible implementation, performing reconstruction according to the tonal component information and the frequency region information to obtain the reconstructed tonal signal includes: determining, from the frequency region information, that the number of frequency regions in which tonal components need to be reconstructed is the first number; determining, from the first number, each frequency region in the first frequency range used for tonal component reconstruction; and reconstructing the tonal components in the first frequency range according to the tonal component information to obtain the reconstructed tonal signal.
  • Tonal component reconstruction can thus be performed over the frequency range indicated by the frequency region information, so that the information on the dissimilar tonal components of the high-band and low-band signals can be decoded, improving the decoding quality.
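A minimal sketch of the decoder-side reconstruction step: given the tonal component positions and amplitudes recovered from the payload bitstream, place them into a spectrum covering the first frequency range. The `(bin, amplitude)` representation is a hypothetical simplification; noise-floor shaping and windowing are omitted.

```python
def reconstruct_tonal_spectrum(num_bins, tonal_components):
    """Build the reconstructed tonal signal's spectrum from (bin, amplitude) pairs.

    num_bins covers the first frequency range determined from the frequency
    region information; bins carrying no tonal component stay at zero.
    """
    spectrum = [0.0] * num_bins
    for bin_index, amplitude in tonal_components:
        spectrum[bin_index] = amplitude
    return spectrum
```

The decoded high band would then be obtained by combining this spectrum with the band-extension-generated high-band spectrum.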
  • In a possible implementation, the lower limit of the first frequency range is the same as the lower limit of the second frequency range of the band extension indicated by the configuration information. Determining each frequency region may include: if the first number is less than or equal to the second number, determining the distribution of the frequency regions in the first frequency range according to the distribution of the frequency regions in the second frequency range, where the second number is the number of frequency regions in the second frequency range; or, if the first number is greater than the second number, determining that the upper frequency limit of the first frequency range is greater than the upper frequency limit of the second frequency range, determining the distribution of the frequency regions in the part where the first and second frequency ranges overlap according to the distribution in the second frequency range, and determining the distribution in the non-overlapping part in a preset manner, to obtain the distribution of each frequency region in the first frequency range.
  • Because the lower limits of the two ranges coincide, the division of the frequency regions in the first frequency range can subsequently be determined by comparing the numbers of frequency regions in the two ranges, so that the frequency regions included in the first frequency range are determined accurately.
  • In a possible implementation, the frequency regions in the non-overlapping part of the first frequency range and the second frequency range satisfy the following conditions: the width of each such frequency region is less than or equal to a preset value, and its upper frequency limit is less than or equal to the highest frequency of the audio signal. Constraining the division of the non-overlapping part in this way achieves a more reasonable division of frequency regions.
  • In a third aspect, this application provides an encoding device, including:
  • an audio acquisition module, configured to acquire the current frame of the audio signal, the current frame including a high-band signal and a low-band signal;
  • a parameter acquisition module, configured to obtain the band-extension parameters of the current frame according to the high-band signal, the low-band signal, and the preset configuration information of the band extension;
  • a frequency acquisition module, configured to acquire frequency region information, where the frequency region information indicates a first frequency range in the high-band signal in which tonal component detection needs to be performed;
  • a tonal component encoding module, configured to perform tonal component detection in the first frequency range to obtain the tonal component information of the high-band signal; and
  • a bitstream multiplexing module, configured to multiplex the band-extension parameters and the tonal component information into a payload bitstream.
  • In a possible implementation, the bitstream multiplexing module is further configured to multiplex the frequency region information into a configuration bitstream.
  • the frequency acquisition module is specifically configured to determine the frequency region information according to the sampling frequency of the audio signal and the configuration information of the frequency band extension.
  • In a possible implementation, the frequency region information includes at least one of the following: a first number, identification information, relationship information, or a frequency region change number, where the first number is the number of frequency regions within the first frequency range, the identification information indicates whether the first frequency range is the same as the second frequency range corresponding to the band extension, the relationship information indicates the size relationship between the first and second frequency ranges when they differ, and the frequency region change number is the number of frequency regions by which the first and second frequency ranges differ when they differ.
  • In a possible implementation, the frequency region information includes at least the first number, the configuration information of the band extension includes an upper limit of the band extension and/or a second number, and the second number is the number of frequency regions in the second frequency range; the frequency acquisition module is specifically configured to determine the first number according to one or more of the encoding rate of the current frame, the number of channels of the audio signal, the sampling frequency, the upper limit of the band extension, or the second number.
  • the upper limit of the frequency band extension includes one or more of the following: the highest frequency in the second frequency range, the highest frequency point sequence number, the highest frequency band sequence number, or the highest frequency region sequence number.
  • the number of audio signal channels is at least one
  • the frequency acquisition module is specifically configured to: determine a first judgment identifier of the current channel in the current frame according to the encoding rate of the current frame; determine a second judgment identifier of the current channel in the current frame according to the upper limit of the band extension; and determine the first quantity of the current channel in the current frame according to the first judgment identifier and/or the second judgment identifier, combined with the second number.
  • the frequency acquisition module is specifically configured to: obtain the average coding rate of each channel in the current frame according to the coding rate of the current frame and the number of channels; and obtain the first judgment identifier of the current channel according to the average coding rate and a first threshold.
  • the frequency acquisition module can be specifically used to: determine the actual encoding rate of the current channel according to the encoding rate of the current frame and the number of channels; and obtain the first judgment identifier of the current channel according to the actual encoding rate of the current channel and a second threshold.
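The two ways of deriving the first judgment identifier described above can be sketched as follows (function names and threshold values are illustrative assumptions; the patent does not fix concrete numbers):

```python
def first_flag_from_average_rate(frame_rate_bps: int, num_channels: int,
                                 first_threshold_bps: int) -> int:
    """Compare the average per-channel coding rate against a first threshold."""
    average_rate = frame_rate_bps / num_channels
    return 1 if average_rate >= first_threshold_bps else 0

def first_flag_from_actual_rate(actual_rate_bps: int, second_threshold_bps: int) -> int:
    """Compare the actual coding rate of the current channel against a second threshold."""
    return 1 if actual_rate_bps >= second_threshold_bps else 0

# e.g. a 128 kbps frame shared by 2 channels averages 64 kbps per channel,
# which clears an assumed 48 kbps threshold
flag = first_flag_from_average_rate(128_000, 2, 48_000)  # -> 1
```

The flag then feeds into the selection of the first quantity, together with the second number, as stated above.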
  • the frequency acquisition module may be specifically configured to: when the upper limit of the band extension includes the highest frequency, compare the highest frequency included in the upper limit of the band extension with the highest frequency of the audio signal to determine the second judgment identifier of the current channel in the current frame; or, when the upper limit of the band extension includes the highest frequency band sequence number, compare the highest frequency band sequence number included in the upper limit of the band extension with the highest frequency band sequence number of the audio signal to determine the second judgment identifier of the current channel in the current frame, where the highest frequency band sequence number of the audio signal is determined by the sampling frequency.
  • the frequency acquisition module can be specifically configured to use the second number corresponding to the frequency band extension as the first number of the current channel.
  • the lower limit of the first frequency range is the same as the lower limit of the second frequency range for band extension indicated by the configuration information; when the first number is less than or equal to the second number corresponding to the band extension, the distribution of frequency regions in the first frequency range is determined according to the distribution of frequency regions in the second frequency range; when the first number is greater than the second number, the upper frequency limit of the first frequency range is greater than the upper frequency limit of the second frequency range, the distribution of frequency regions in the overlapping part of the first frequency range and the second frequency range is the same as the distribution of frequency regions in the second frequency range, and the distribution of frequency regions in the non-overlapping part of the first frequency range and the second frequency range is determined in a preset manner.
  • the width of each frequency region in the non-overlapping part of the first frequency range and the second frequency range is smaller than a preset value, and the upper frequency limit of each frequency region in the non-overlapping part of the first frequency range and the second frequency range is less than or equal to the highest frequency of the audio signal.
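One possible "preset manner" satisfying the two constraints above, dividing the non-overlapping part into regions of a fixed maximum width capped at the signal's highest frequency, can be sketched as follows (this is an illustrative assumption; the actual preset manner is implementation-specific):

```python
def split_non_overlap(start_hz: int, signal_max_hz: int, preset_width_hz: int):
    """Divide [start_hz, signal_max_hz] into frequency regions no wider than
    preset_width_hz, with every region's upper limit <= the signal's highest
    frequency, as required by the constraints above."""
    regions = []
    lo = start_hz
    while lo < signal_max_hz:
        hi = min(lo + preset_width_hz, signal_max_hz)
        regions.append((lo, hi))
        lo = hi
    return regions

# non-overlapping part from 14 kHz up to a 16 kHz signal ceiling, 1 kHz preset width
print(split_non_overlap(14_000, 16_000, 1_000))  # [(14000, 15000), (15000, 16000)]
```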
  • the frequency range corresponding to the high-band signal includes at least one frequency region, where one frequency region includes at least one frequency band.
  • the number of frequency regions in the first frequency range is a preset number.
  • the tonal component information includes a position quantity parameter of the tonal component, and an amplitude parameter or an energy parameter of the tonal component.
  • the tonal component information further includes a noise floor parameter of the high-band signal.
  • this application provides a decoding device, including:
  • the acquisition module is used to acquire the payload code stream
  • the demultiplexing module is used to demultiplex the payload code stream to obtain the frequency band extension parameters and tone component information of the current frame of the audio signal;
  • the frequency band extension decoding module is used to obtain the high frequency band signal of the current frame according to the parameters of the frequency band extension;
  • the reconstruction module is used to reconstruct according to the tonal component information and frequency region information to obtain a reconstructed tonal signal, and the frequency region information is used to indicate the first frequency range in the current frame where the tonal component needs to be reconstructed;
  • the signal decoding module is used to obtain the decoded signal of the current frame according to the high frequency band signal and the reconstructed tone signal.
  • the obtaining module may also be used to: obtain a configuration code stream; obtain frequency region information according to the configuration code stream.
  • the frequency region information includes at least one of the following: a first quantity, identification information, relationship information, or a frequency region change quantity, where the first quantity is the number of frequency regions within the first frequency range; the identification information is used to indicate whether the first frequency range and the second frequency range corresponding to the band extension are the same; the relationship information is used to indicate the magnitude relationship between the first frequency range and the second frequency range when the first frequency range is different from the second frequency range; and the frequency region change quantity is the number of frequency regions that differ between the first frequency range and the second frequency range when the first frequency range is different from the second frequency range.
  • the reconstruction module may be specifically configured to: determine, according to the frequency region information, the number of frequency regions in which tonal components need to be reconstructed as the first number; determine, according to the first number, each frequency region in the first frequency range in which tonal component reconstruction is to be performed; and reconstruct the tonal components in the first frequency range according to the tonal component information to obtain the reconstructed tonal signal.
  • the lower limit of the first frequency range is the same as the lower limit of the second frequency range for band expansion indicated by the configuration information
  • the acquiring module may be specifically configured to: if the first number is less than or equal to the second number, determine the frequency regions in the overlapping part of the first frequency range and the second frequency range according to the distribution of frequency regions in the second frequency range, where the second number is the number of frequency regions in the second frequency range; if the first number is greater than the second number, determine that the upper frequency limit of the first frequency range is greater than the upper frequency limit of the second frequency range, determine the frequency regions in the overlapping part of the first frequency range and the second frequency range according to the distribution of frequency regions in the second frequency range, and determine the distribution of frequency regions in the non-overlapping part of the first frequency range and the second frequency range in a preset manner, so as to obtain the distribution of each frequency region in the first frequency range.
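The decoder-side derivation of the first-range region distribution described above can be sketched as follows (a simplified sketch; the preset manner for the non-overlapping part is assumed to be fixed-width regions, and all names are hypothetical):

```python
def first_range_regions(second_regions, first_qty, signal_max_hz, preset_width_hz=1_000):
    """second_regions: list of (low_hz, high_hz) tuples describing the
    band-extension (second) frequency range, ordered low to high."""
    second_qty = len(second_regions)
    if first_qty <= second_qty:
        # overlapping part: reuse the band-extension distribution directly
        return second_regions[:first_qty]
    regions = list(second_regions)       # overlap keeps the second-range layout
    lo = second_regions[-1][1]
    for _ in range(first_qty - second_qty):  # non-overlap: preset manner
        hi = min(lo + preset_width_hz, signal_max_hz)
        regions.append((lo, hi))
        lo = hi
    return regions

tiles = first_range_regions([(8_000, 10_000), (10_000, 12_000)], 3, 16_000)
# -> [(8000, 10000), (10000, 12000), (12000, 13000)]
```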
  • the frequency regions divided in the non-overlapping part of the first frequency range and the second frequency range meet the following conditions: the width of each frequency region is less than a preset value, and the upper frequency limit of each frequency region is less than or equal to the highest frequency of the audio signal.
  • the tonal component information includes a position quantity parameter of the tonal component, and an amplitude parameter or an energy parameter of the tonal component.
  • the tonal component information further includes a noise floor parameter of the high-band signal.
  • the present application provides an encoding device, including: a processor and a memory, wherein the processor and the memory are interconnected through a line, and the processor calls the program code in the memory to execute the processing-related functions in the audio signal encoding method shown in any one of the foregoing first aspect.
  • the present application provides a decoding device, including: a processor and a memory, wherein the processor and the memory are interconnected through a line, and the processor calls the program code in the memory to execute the processing-related functions in the decoding method shown in any one of the foregoing second aspect.
  • the present application provides a communication system, including: an encoding device and a decoding device, where the encoding device is configured to execute the audio signal encoding method shown in any one of the foregoing first aspect, and the decoding device is configured to execute the decoding method shown in any one of the foregoing second aspect.
  • an embodiment of the present application provides a digital processing chip.
  • the chip includes a processor and a memory.
  • the memory and the processor are interconnected by wires, and instructions are stored in the memory.
  • the processor is configured to execute the processing-related functions in the first aspect or any optional implementation manner of the first aspect, or in the second aspect or any optional implementation manner of the second aspect.
  • the embodiments of the present application provide a computer-readable storage medium, including instructions, which, when run on a computer, cause the computer to execute the method in the first aspect or any optional implementation manner of the first aspect, or in the second aspect or any optional implementation manner of the second aspect.
  • the embodiments of the present application provide a computer program product containing instructions, which, when run on a computer, cause the computer to execute the method in the foregoing first aspect or any optional implementation manner of the first aspect, or in the second aspect or any optional implementation manner of the second aspect.
  • the present application provides a network device, which can be applied to a device such as an encoding device or a decoding device.
  • the network device is coupled with a memory, and is configured to read and execute instructions stored in the memory, so that the network device implements the steps of the method provided in any one of the first aspect to the second aspect of the present application.
  • optionally, the network device is a chip or a system on chip.
  • the present application provides a computer-readable storage medium that stores a payload code stream generated according to the method provided in any one of the first aspect to the second aspect of the present application.
  • the present application provides a computer program stored on a computer-readable storage medium. The computer program includes instructions which, when executed, implement the method provided by any embodiment of any one of the first aspect to the second aspect of the present application.
  • FIG. 1 is a schematic diagram of the architecture of a communication system provided by this application.
  • FIG. 2 is a schematic structural diagram of another communication system provided by this application.
  • FIG. 3 is a schematic structural diagram of an encoding and decoding device provided by this application.
  • FIG. 4 is a schematic structural diagram of another encoding and decoding device provided by this application.
  • FIG. 5 is a schematic flowchart of an audio signal encoding method provided by this application.
  • FIG. 6A is a schematic diagram of a frequency region division method provided by an embodiment of this application.
  • FIG. 6B is a schematic diagram of another frequency region division method provided by an embodiment of this application.
  • FIG. 6C is a schematic diagram of another frequency region division method provided by an embodiment of this application.
  • FIG. 7 is a schematic flowchart of a decoding method provided by this application.
  • FIG. 8 is a schematic structural diagram of an encoding device provided by this application.
  • FIG. 9 is a schematic structural diagram of a decoding device provided by this application.
  • FIG. 10 is a schematic structural diagram of another encoding device provided by this application.
  • FIG. 11 is a schematic structural diagram of another decoding device provided by this application.
  • This application provides an audio signal encoding method, decoding method, encoding device, and decoding device, which are used to implement higher-quality audio encoding and decoding and improve user experience.
  • the audio signal encoding method and decoding method provided in this application can be applied to various systems with data transmission.
  • FIG. 1 is a schematic diagram of the architecture of a communication system provided by the present application.
  • the communication system may include multiple devices, such as terminals or servers, etc., and the multiple devices may be connected through a network.
  • the network can be a wired communication network or a wireless communication network, such as: a fifth-generation mobile communication technology (5th-Generation, 5G) system, a long-term evolution (LTE) system, a global system for mobile communication (GSM) network, a code division multiple access (CDMA) network, a wideband code division multiple access (WCDMA) network, a wireless fidelity (WiFi) network, a WAN, and other communication networks or communication systems.
  • the number of the terminal device may be one or multiple, such as terminal 1, terminal 2, or terminal 3 as shown in FIG. 1.
  • the terminal in the communication system may include a head mounted display device (head mount display, HMD); the head mounted display device may be a combination of a VR box and a terminal, a VR all-in-one machine, a personal computer (PC) VR, an augmented reality (AR) device, a mixed reality (MR) device, etc.
  • the terminal equipment may also include cellular phones, smart phones, personal digital assistants (PDA), tablet computers, laptop computers, personal computers (PC), or computing devices deployed on the user side, etc.
  • the number of servers can be one or more.
  • the multiple servers can be distributed servers or centralized servers, which can be adjusted according to actual application scenarios. There is no restriction on this.
  • the aforementioned terminal or server can be used as an encoding device or as a decoding device. It can be understood that the aforementioned terminal or server can execute the audio signal encoding method provided in this application, and can also execute the decoding method used in this application.
  • the encoding device and the decoding device may also be independent devices. For example, one terminal may be used as the encoding device and the other terminal may be used as the decoding device.
  • two terminals are taken as an example below to describe the communication system provided in the present application in more detail.
  • both the terminal 1 and the terminal 2 may include an audio collection module, a multi-channel encoder, a channel encoder, a channel decoder, a multi-channel decoder, and an audio playback module.
  • the terminal 1 performs the audio signal encoding method and the terminal 2 performs the decoding method as an example for a brief exemplary description.
  • the specific steps performed please refer to the description in FIG. 4 or FIG. 5 below.
  • the audio collection module of the terminal 1 can obtain audio signals
  • the audio collection module can include devices such as sensors, microphones, cameras, and recorders, or the audio collection module can also directly receive audio signals sent by other devices.
  • when the audio signal is a multi-channel signal, the audio signal is encoded by a multi-channel encoder, and then the signal obtained by the multi-channel encoder is encoded by a channel encoder to obtain an encoded code stream.
  • the code stream is transmitted to the network device 1 in the communication network, and the network device 1 transmits to the network device 2 through the digital channel, and then the network device 2 transmits the code stream to the terminal 2.
  • the network device 1 or the network device 2 may be a forwarding device in a communication network, such as a router or a switch.
  • after receiving the coded code stream, the terminal 2 performs channel decoding on the coded code stream through a channel decoder to obtain a channel-decoded signal, and then obtains the audio signal through a multi-channel decoder.
  • the audio playback module can play back the audio signal.
  • the audio playback module may include devices such as speakers or earphones.
  • the audio signal can also be collected by the audio collection module of the terminal 2, and the coded stream is obtained through the multi-channel encoder and the channel encoder, and the coded stream is sent to the terminal 1 via the communication network. Then, it is decoded by the channel decoder and multi-channel decoder of the terminal 1 to obtain the audio signal, and the audio is played through the audio playback module of the terminal 1.
  • the encoding device in the communication system may be a forwarding device that does not have audio collection and audio playback functions.
  • FIG. 3 is a schematic structural diagram of an encoding device provided in the present application.
  • the encoding device may include a channel decoder 301, an audio decoder 302, a multi-channel encoder 303, and a channel encoder 304.
  • channel decoding can be performed by the channel decoder 301 to obtain a channel decoded signal.
  • the audio decoder 302 performs audio decoding on the channel decoded signal to obtain an audio signal.
  • the audio signal is multi-channel encoded by the multi-channel encoder 303 to obtain a multi-channel encoded signal.
  • the channel encoder 304 performs channel encoding on the multi-channel encoded signal to obtain the updated code stream, and the updated code stream is sent to other devices to complete the forwarding of the code stream.
  • the types of encoders and decoders used may also be different.
  • the channel decoded signal is multi-channel decoded by the multi-channel decoder 402 to recover the audio signal.
  • the audio signal is encoded by the audio encoder 403, and the data encoded by the audio encoder 403 is channel-encoded by the channel encoder 404 to obtain an updated coded stream.
  • the aforementioned multi-channel audio signal scene has been introduced.
  • the aforementioned multi-channel audio signal can also be replaced with a stereo signal, a two-channel signal, etc. Taking a stereo signal as an example, the multi-channel encoder can be replaced with a stereo encoder, and the multi-channel decoder can be replaced with a stereo decoder, etc.
  • Three-dimensional audio has become a new trend in the development of audio services because it can bring users a better immersive experience.
  • Three-dimensional audio can be understood as including multi-channel audio.
  • the original audio signal format that needs to be compressed and encoded can be divided into: a channel-based audio signal format, an object-based audio signal format, a scene-based audio signal format, and a mixed format of any of the three audio signal formats.
  • the audio signals that the audio encoder needs to compress and encode include multiple signals, which can also be understood as multiple channels. Under normal circumstances, the audio encoder uses the correlation between channels to down-mix multiple signals to obtain down-mixed signals and multi-channel coding parameters.
  • the number of channels included in the downmix signal is much smaller than the number of channels of the input audio signal.
  • for example, a multi-channel signal can be downmixed into a stereo signal, and the downmix signal is then encoded. Alternatively, the stereo signal can be further downmixed into a mono signal and stereo coding parameters, and the mono signal obtained after the downmixing is encoded.
  • the number of bits used for encoding downmix signals and multi-channel encoding parameters is much smaller than that of independently encoding multi-channel input signals. Therefore, the workload of the encoder can be reduced, and the data volume of the encoded code stream obtained after encoding can be reduced, and the transmission efficiency can be improved.
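As a minimal sketch of the downmix idea described above, a simple passive stereo downmix can be written as follows (illustrative only; a real codec derives coding parameters per band and applies gains, not a plain mid/side split):

```python
def downmix_stereo_to_mono(left, right):
    """Downmix a stereo frame into a mono (mid) signal plus a crude
    per-sample side residual standing in for the stereo coding parameters."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

mid, side = downmix_stereo_to_mono([1.0, 0.5], [0.0, 0.5])
# mid == [0.5, 0.5], side == [0.5, 0.0]
```

Encoding `mid` plus the (typically coarsely quantized) parameters takes far fewer bits than encoding both channels independently, which is the saving described above.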
  • the correlation between signals of different frequency bands is often further used for coding.
  • the encoding device encodes the low-frequency band signal and the correlation data between the low-frequency band signal and the high-frequency band, so as to encode the high-frequency band signal with a smaller number of bits, thereby reducing the encoding bit rate of the entire encoder.
  • for example, in codecs such as the 3rd generation partnership project (3GPP) enhanced voice services (EVS) codec and moving picture experts group (MPEG) audio codecs, the correlation between signals of different frequency bands is utilized, and the high-band signals are encoded by using frequency band extension technology or spectrum replication technology.
  • this application provides an audio signal encoding method and decoding method for improving the encoding and decoding quality of audio signals. Even in a scenario where the high-frequency spectrum contains tonal components that are not similar to the low-frequency spectrum, a high-quality code stream can be obtained, so that the decoder can decode high-quality audio signals, improving user experience.
  • FIG. 5 is a schematic flowchart of an audio signal encoding method provided by the present application, as follows.
  • the current frame may be any frame in the audio signal, and the current frame may include a high-band signal and a low-band signal, and the frequency of the high-band signal is higher than the frequency of the low-band signal.
  • the division between high-band signals and low-band signals can be determined by a frequency band threshold: signals above the frequency band threshold are high-band signals, and signals below the frequency band threshold are low-band signals. The frequency band threshold can be determined according to the transmission bandwidth and the processing capability of the encoder or decoder, which is not limited in this application.
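Conceptually, the split can be expressed on a frame's spectral coefficients (the bin index standing in for the frequency band threshold is an illustrative assumption):

```python
def split_bands(spectrum, threshold_bin):
    """spectrum: spectral coefficients ordered from low to high frequency.
    Coefficients below the band threshold form the low-band signal,
    the remaining coefficients form the high-band signal."""
    return spectrum[:threshold_bin], spectrum[threshold_bin:]

low, high = split_bands([0.9, 0.7, 0.2, 0.1], threshold_bin=2)
# low == [0.9, 0.7], high == [0.2, 0.1]
```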
  • the high-band signal and the low-band signal are relative terms. For example, a signal lower than a certain frequency (that is, the frequency band threshold) is a low-band signal, and a signal higher than that frequency is a high-band signal (the signal at the threshold frequency itself can be classified as either a low-band signal or a high-band signal).
  • the frequency varies according to the bandwidth of the current frame. For example, when the current frame is a 0-8 kHz wideband signal, the frequency may be 4 kHz; when the current frame is a 0-16 kHz ultra-wideband signal, the frequency may be 8 kHz.
  • the audio signal in the embodiment of the present application may include multiple frames.
  • the current frame may specifically refer to a certain frame in the audio signal.
  • the codec of the current frame of the audio signal is used as an example.
  • the previous frame or the next frame of the current frame in the audio signal can be coded and decoded according to the codec mode of the current frame, and the codec process of the previous frame or the next frame of the current frame will not be explained one by one.
  • the audio signal in the embodiment of the present application may be a mono audio signal, or may also be a stereo signal (or a multi-channel signal).
  • the stereo signal can be the original stereo signal, it can also be a stereo signal composed of two signals (the left channel signal and the right channel signal) included in the multi-channel signal, or it can be a multi-channel signal.
  • the audio signal may be a multi-channel signal or a single-channel signal.
  • when the audio signal is a multi-channel signal, the signal of each channel can be encoded.
  • only the encoding process of the signal of one of the channels (hereinafter referred to as the current channel) is taken as an example for illustration.
  • the following steps 502-506 can be performed for each channel in the audio signal, and the repeated steps will not be repeated in this application.
  • the term "sound channel" mentioned in this application can also be replaced with "channel", and the aforementioned "multi-sound-channel" can be replaced with "multi-channel"; in the following embodiments, both are referred to as channels.
  • in the process of encoding the high-band signal and the low-band signal, the high frequency band can be divided into multiple frequency regions.
  • the frequency band extension parameters can be determined in units of frequency regions, that is to say, each frequency region has its own frequency band extension parameters.
  • the parameters of the frequency band extension may include different parameters in different scenarios, and the parameters specifically included in the parameters of the frequency band extension may be determined according to actual application scenarios.
  • the parameters of the frequency band expansion may include high-band linear predictive coding (linear predictive coding, LPC) parameters, high-band gain or filtering parameters, and so on.
  • the parameters of the frequency band extension may also include parameters such as a time domain envelope or a frequency domain envelope.
  • the configuration information of the frequency band extension may be pre-configured information, which may be specifically determined according to the data processing capability of the encoder or the decoder.
  • the configuration information of the frequency band extension may include an upper limit of the frequency band extension or a second number, etc., where the second number is the number of frequency regions for which the frequency band is extended.
  • the second frequency range corresponding to the frequency band extension can be indicated by the upper limit of the frequency band extension or the second quantity.
  • the lower frequency limit of the second frequency range can usually be fixed, for example, the frequency band threshold in step 501. The upper limit of the frequency band extension can indicate the upper frequency limit of the second frequency range, so that the second frequency range can be determined according to the determined lower frequency limit and upper frequency limit.
  • alternatively, the second number can be used to query, through a preset table, the boundary of the frequency region corresponding to the second number, thereby determining the second frequency range.
  • the upper limit of the frequency band extension included in the configuration information of the frequency band extension may include, but is not limited to, one or more of the following: the highest frequency value in the second frequency range, the highest frequency point sequence number, the highest frequency band sequence number, or the highest frequency region sequence number.
  • the highest frequency point sequence number is the sequence number of the highest frequency point in the second frequency range; the highest frequency band sequence number is the sequence number of the frequency band with the highest frequency in the second frequency range; and the highest frequency region sequence number is the sequence number of the frequency region with the highest frequency in the second frequency range.
  • the aforementioned highest frequency point sequence number, highest frequency band sequence number, and highest frequency region sequence number may increase as the value of the frequency increases.
  • the sequence number of the lower frequency point is smaller than the sequence number of the higher frequency point.
  • the sequence number of the lower-frequency band is smaller than the sequence number of the higher-frequency band, and the sequence number of the lower-frequency region is smaller than the sequence number of the higher-frequency region.
  • the numbering of frequency points, frequency bands or frequency regions can be numbered according to a preset sequence, or a fixed number can be assigned to each frequency point, frequency band or frequency region, which can be specifically based on actual application scenarios. Make adjustments, and this application does not limit it.
  • the encoding parameters of the high-band signal or the low-band signal can also be obtained.
  • for example, time-domain noise shaping parameters, frequency-domain noise shaping parameters, or spectrum quantization parameters of the high-band signal or the low-band signal can be obtained; the time-domain and frequency-domain noise shaping parameters are used to preprocess the spectral coefficients to be coded, which can improve the quantization coding efficiency of the spectral coefficients.
  • the spectral quantization parameters are the quantized spectral coefficients and the corresponding gain parameters.
  • the frequency region information is used to indicate the first frequency range in the high-band signal of the current frame.
  • the frequency range that needs to be detected for tonal components is referred to as the first frequency range
  • the frequency range corresponding to the frequency band extension indicated by the configuration information is referred to as the second frequency range
  • the lower frequency limit of the first frequency range and that of the second frequency range are the same, so this is not repeated in the following.
  • the frequency area information includes one or more of the following: a first quantity, identification information, relationship information, or a frequency area change quantity, and so on.
  • the first number is the number of frequency regions in the first frequency range.
  • the frequency range can be divided into frequency regions (tiles); each frequency region can be divided into at least one frequency band according to a preset frequency band division method, and a frequency band can be understood as a scale factor band (SFB).
  • the frequency regions may be divided in units of 1 kHz, and then within each frequency region, the frequency bands may be divided in units of 200 Hz.
  • the corresponding frequency widths of different frequency regions may be the same or different; the frequency widths corresponding to different frequency bands may be the same or different.
  • the identification information is used to indicate whether the first frequency range and the second frequency range corresponding to the frequency band extension are the same. For example, when the identification information includes 0, it means that the first frequency range is different from the second frequency range, and when the identification information includes 1, it means that the first frequency range is the same as the second frequency range.
  • the relationship information is used to indicate the magnitude relationship between the first frequency range and the second frequency range. For example, 2 bits may be used to indicate whether the two ranges are the same, or one is larger or smaller: when the relationship information includes 00, the first frequency range is equal to the second frequency range; when it includes 01, the first frequency range is greater than the second frequency range; when it includes 10, the first frequency range is smaller than the second frequency range; and so on.
  • the number of frequency region changes is the number of frequency regions with a difference between the first frequency range and the second frequency range.
  • the range of the number of frequency region changes can be [-N, N], where N means that the first frequency range has N more frequency regions than the second frequency range, and -N means that the first frequency range has N fewer frequency regions than the second frequency range.
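As an illustrative sketch only (the field names, bit values, and helper functions below are assumptions, not taken from this application), the relationship information and the number of frequency region changes described above could be derived and consumed as follows:

```python
# Hypothetical sketch of deriving/consuming the frequency-region information
# fields: 2-bit relationship info plus a frequency-region change count.

RELATION_EQUAL, RELATION_GREATER, RELATION_SMALLER = 0b00, 0b01, 0b10

def pack_region_info(first_num: int, second_num: int) -> dict:
    """Derive relationship info and the frequency-region change count."""
    delta = first_num - second_num          # in [-N, N]
    if delta == 0:
        relation = RELATION_EQUAL
    elif delta > 0:
        relation = RELATION_GREATER         # first range exceeds second range
    else:
        relation = RELATION_SMALLER
    return {"relation": relation, "num_tile_change": abs(delta)}

def unpack_first_number(info: dict, second_num: int) -> int:
    """Recover the first number from relationship info and change count."""
    if info["relation"] == RELATION_GREATER:
        return second_num + info["num_tile_change"]
    if info["relation"] == RELATION_SMALLER:
        return second_num - info["num_tile_change"]
    return second_num
```

A decoder receiving only the second number plus these two fields can thus reconstruct the first number without it being transmitted explicitly.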
  • the frequency area information includes at least the first number.
  • the frequency area information also includes but is not limited to one or more of identification information, relationship information, or the number of frequency area changes.
  • indicating the first frequency range through the frequency region information can be understood as follows: when the frequency region information includes the first number, the boundary of each of the first number of frequency regions, that is, the frequency range covered by each frequency region, can be determined by querying a preset table, thereby obtaining the first frequency range.
  • the lower boundary of the first frequency region in the first number of frequency regions is the lower boundary of the second frequency range for band expansion. It is understandable that when the first number of frequency regions are continuous in the frequency domain, the first frequency range can also be determined only from the lower boundary of the first frequency region and the upper boundary of the last frequency region.
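As a hypothetical illustration of the table-lookup approach described above (the boundary values and function names are invented for this sketch and do not come from this application):

```python
# Illustrative preset tile-boundary table in Hz: tile i covers
# [TILE_BOUNDARY[i], TILE_BOUNDARY[i+1]). Values are made up.
TILE_BOUNDARY = [8000, 9000, 10000, 11000, 12000, 14000, 16000]

def first_frequency_range(first_number: int) -> tuple:
    """With contiguous tiles, the lower bound of tile 0 and the upper bound
    of the last of `first_number` tiles give the first frequency range."""
    assert 1 <= first_number < len(TILE_BOUNDARY)
    return TILE_BOUNDARY[0], TILE_BOUNDARY[first_number]
```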
  • the frequency region information includes identification information
  • the identification information indicates that the first frequency range and the second frequency range are the same
  • the second frequency range may be used as the first frequency range.
  • the relationship information can be used to determine the magnitude relationship between the first frequency range and the second frequency range, for example, whether the first frequency range is larger than the second frequency range, or the second frequency range is larger than the first frequency range, and so on.
  • the frequency area information may also include relationship information. In this case, the relationship information may also indicate that the first frequency range and the second frequency range are the same.
  • the size relationship between the first frequency range and the second frequency range can be determined according to the relationship information, and the number of frequency regions by which the two ranges differ is then determined according to the number of frequency region changes; the specific range of the first frequency range is then determined according to a preset method, such as a table lookup or a preset bandwidth plan. For example, if the first frequency range and the second frequency range are not the same, the relationship information can be used to determine which of the two is greater.
  • if the first frequency range is greater than the second frequency range, then according to the number of frequency regions in the part where the first frequency range and the second frequency range do not overlap, a preset table is queried, or the part is divided according to a preset bandwidth, so that the boundaries of the frequency regions in the non-overlapping part are obtained, thereby determining the exact frequency range covered by the first frequency range.
  • Method 1: Determine the frequency region information according to the sampling frequency of the audio signal and the preset configuration information of the frequency band extension.
  • the frequency region information includes at least the first number, and the number of audio signal channels is at least one.
  • step 503 may specifically include: determining the first number of current channels according to one or more of the encoding rate of the current frame, the number of audio signal channels, the sampling frequency, the upper limit of band expansion, or the second number.
  • the first number can be determined according to the first judgment identifier of the current channel, according to the second judgment identifier, or according to both the first and second judgment identifiers of the current channel.
  • the first judgment identifier of each channel in the current frame, including that of the current channel, can be determined according to the encoding rate of the current frame and the number of channels, and the second judgment identifier can be determined according to the sampling frequency and the upper limit of the frequency band extension. The encoding rate of the current frame is the total encoding rate of all channels in the current frame.
  • the specific method for obtaining the first judgment identifier of the current channel may include, but is not limited to, one or more of the following:
  • when the average coding rate of the current channel is higher than 24 kbps, the value of the first judgment flag of the current channel is determined to be 1; when the average coding rate is not higher than 24 kbps, the first judgment flag of the current channel is determined to be 0.
  • the method of determining the actual encoding rate of each channel can include multiple methods. For example, the encoding rate can be randomly assigned to each channel, or the encoding rate can be assigned to each channel according to the data size of each channel.
  • or a fixed coding rate can be allocated to each channel, and so on.
  • the specific allocation method can be adjusted according to actual application scenarios. For example, if the total encoding rate available for the current audio signal (that is, the encoding rate of the current frame) is 256 kbps, and the audio signal has three channels (channel 1, channel 2, and channel 3), an encoding rate can be allocated to each of the three channels, for example 192 kbps for channel 1, 44 kbps for channel 2, and 20 kbps for channel 3. Then, the actual encoding rate of each channel is compared with 64 kbps (that is, the second threshold).
  • when the actual encoding rate of the current channel is higher than 64 kbps, the value of the first judgment flag of the current channel is determined to be 1; when it is not higher than 64 kbps, the flag is determined to be 0. Thus the first judgment flag of channel 1 has the value 1, and those of channel 2 and channel 3 have the value 0.
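The per-channel judgment logic in the example above can be sketched as follows; the 24 kbps and 64 kbps thresholds come from the examples in the text, while the function names are assumptions:

```python
# Sketch of the first judgment flag logic described above.

def first_flag_from_average(bitrate_tot: int, n_channels: int,
                            thr_avg: int = 24_000) -> int:
    """Flag is 1 when the average per-channel rate exceeds the first threshold."""
    return 1 if bitrate_tot / n_channels > thr_avg else 0

def first_flag_from_actual(bitrate_ch: int, thr_ch: int = 64_000) -> int:
    """Flag is 1 when the channel's actual rate exceeds the second threshold."""
    return 1 if bitrate_ch > thr_ch else 0

# Example from the text: 256 kbps split as 192 / 44 / 20 kbps over 3 channels.
flags = [first_flag_from_actual(r) for r in (192_000, 44_000, 20_000)]
# Only channel 1 exceeds 64 kbps, so flags == [1, 0, 0].
```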
  • the specific method for obtaining the second judgment identifier of the current channel may include: when the upper limit of the frequency band extension includes the value of the highest frequency, comparing whether this value is the same as the value of the highest frequency of the audio signal to determine the second judgment identifier; the highest frequency of the audio signal is usually half of the sampling frequency, although the sampling frequency can also be set to be greater than twice the highest frequency. Alternatively, when the upper limit of the frequency band extension includes the highest frequency band number, comparing whether this number is the same as the highest frequency band number of the audio signal to determine the second judgment identifier; the highest frequency band number of the audio signal is determined by the sampling frequency and may be the sequence number of the frequency band in which the highest frequency of the audio signal is located.
  • the data included in the upper limit of the band extension and the data of the highest frequency of the acquired audio signal can be converted to the same type, and the data of the same type can then be compared to obtain the second judgment identifier.
  • if the upper limit of the frequency band extension includes the value of the highest frequency and the highest frequency point number of the audio signal is obtained, the highest frequency value corresponding to that frequency point number can be determined, and the value of the highest frequency included in the upper limit of the frequency band extension can be compared with the determined highest frequency value of the audio signal, thereby obtaining the second judgment identifier.
  • if the two values are the same, the value of the second judgment identifier may be 0; otherwise, the value of the second judgment identifier may be 1.
  • alternatively, the frequency band number corresponding to the upper limit of the frequency band extension is compared with the highest frequency band number of the audio signal; if they are the same, the value of the second judgment flag may be 0, and otherwise the value of the second judgment flag is 1.
  • the highest frequency corresponding to the upper limit of the band extension does not exceed the highest frequency of the audio signal.
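A minimal sketch of the second judgment flag comparison, assuming the highest frequency of the audio signal is half the sampling frequency as stated above (function and parameter names are assumptions):

```python
# Sketch of the second judgment flag: compare the band-extension upper limit
# with the highest frequency of the audio signal (usually Fs/2).

def second_flag(bwe_upper_hz: float, sampling_rate_hz: float) -> int:
    """0 when the band-extension upper limit already reaches the signal's
    highest frequency, 1 when it falls short (extra tiles may be needed)."""
    highest_hz = sampling_rate_hz / 2.0
    return 0 if bwe_upper_hz >= highest_hz else 1
```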
  • the specific manner of determining the first quantity may include:
  • the preset condition may be: the average encoding rate of the current channel is greater than the first threshold, or the actual encoding rate of the current channel is greater than the second threshold; and the highest frequency band sequence number included in the upper limit of the frequency band extension differs from the highest frequency band sequence number of the audio signal.
  • the number of added frequency regions can be determined according to the difference between the highest frequency of the audio signal and the upper limit of the band extension, and this difference can be divided into one or more frequency regions.
  • the upper frequency limit of the first frequency range is higher than the highest frequency corresponding to the upper limit of the frequency band extension, so that more tonal component information in the high frequency band signal can be detected.
  • the aforementioned preset condition may be that both the first judgment flag and the second judgment flag are 1. If the first and second judgment flags of the current channel are both 1, one or more frequency regions are added to the second number to obtain the first number of the current channel. The added one or more frequency regions may be obtained by dividing the part of the first frequency range above the upper limit of the frequency band extension according to a preset dividing manner.
  • otherwise, the second quantity is used as the first quantity. It can be understood that when the highest frequency of the audio signal is within the second frequency range, the second frequency range can be directly used as the first frequency range, and the tonal components in the first frequency range can still be detected comprehensively.
  • whether to add additional frequency regions (tile) to the second number to obtain the first number of the current channel can be jointly determined by the following two conditions:
  • bitrate_ch = bitrate_tot / n_channels
  • for the frequency band expansion processing, for example Intelligent Gap Filling (IGF), the IGF cutoff SFB sequence number can be compared with the total SFB number to judge whether the frequency range corresponding to IGF can cover the full band of the audio signal; if it cannot cover the full band, one or more tiles are added.
  • igfStopSfb is the IGF cutoff SFB sequence number
  • nr_of_sfb_long is the total number of SFBs
  • flag_addTile is the first judgment flag
  • num_tiles is the number of tiles in the IGF band
  • num_tiles_detect is the number of tiles for tone component detection.
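Combining the two conditions, a possible reading of the tile-count decision, reusing the variable names defined above (the exact decision logic in the application may differ):

```python
# Sketch: decide the number of tiles for tonal-component detection from the
# rate condition and the IGF coverage condition, using the names defined
# above (igfStopSfb, nr_of_sfb_long, num_tiles, num_tiles_detect).

def tiles_for_detection(flag_rate: int, igfStopSfb: int, nr_of_sfb_long: int,
                        num_tiles: int, extra_tiles: int = 1) -> int:
    """Add tiles only when the rate condition holds and IGF does not already
    cover the full band (its cutoff SFB is below the last SFB)."""
    flag_addTile = 1 if (flag_rate == 1 and igfStopSfb < nr_of_sfb_long) else 0
    num_tiles_detect = num_tiles + extra_tiles if flag_addTile else num_tiles
    return num_tiles_detect
```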
  • the number of frequency regions in the first frequency range may also be a preset number.
  • the preset number may be determined by the user, or may be determined according to an empirical value, and may be specifically adjusted according to actual application scenarios.
  • the preset number may be written in the configuration code stream, or may not be written in the configuration code stream.
  • the default number of frequency regions between the encoding device and the decoding device may be the number of frequency regions included in the second frequency range plus N, where N may be a preset positive integer.
  • other information of the current channel can also be acquired, such as identification information, relationship information, or the number of frequency region changes. For example, it is possible to compare whether the first frequency range and the second frequency range are the same to obtain the identification information; to compare the magnitude relationship between the first frequency range and the second frequency range to obtain the relationship information; or to take the difference between the first number and the second number to obtain the number of frequency region changes, and so on.
  • Method 2: Obtain the frequency region information used in the previous frame or the first frame of the audio signal as the frequency region information of the current frame.
  • the frequency region information can be obtained by the aforementioned method when encoding the previous frame of the current frame.
  • the frequency region information can be directly read; the frequency region information can also be obtained when encoding the first frame of the audio signal. For example, all the frames included in the audio signal can be encoded using the same frequency region information, thereby reducing the workload of the encoding device and improving the encoding efficiency.
  • the frequency region information can be obtained in a variety of ways: the frequency region information used in each frame can be dynamically determined in real time through Method 1, so that the frequency range indicated by the frequency region information can adaptively cover the frequency range in which the tonal components of the high-frequency and low-frequency signals of each frame are not similar, improving the coding quality; alternatively, multiple frames can share the same frequency region information, reducing the workload of calculating the frequency region information and improving coding quality and efficiency. Therefore, the audio signal encoding method provided in this application can flexibly adapt to more scenarios.
  • the boundaries of each frequency region requiring tonal component detection can also be determined based on the frequency region information, so that the first frequency range can be determined more accurately. It can be understood that after determining the number of frequency regions in the first frequency range, it is also necessary to determine how each frequency region in the first frequency range is divided.
  • the lower limit of the first frequency range is the same as the lower limit of the second frequency range for band expansion indicated by the configuration information; when the first number is less than or equal to the second number, the distribution of the frequency regions in the first frequency range is the same as the distribution of the frequency regions in the second frequency range indicated in the configuration information, that is, the frequency regions in the first frequency range are divided in the same manner as the frequency regions in the second frequency range.
  • when the first number is greater than the second number, the upper frequency limit of the first frequency range is greater than that of the second frequency range, that is, the first frequency range covers and exceeds the second frequency range. In the part where the first frequency range overlaps the second frequency range, the distribution of the frequency regions is the same as that in the second frequency range, that is, the overlapping part is divided in the same way as the frequency regions in the second frequency range; the distribution of the frequency regions in the non-overlapping part of the first frequency range and the second frequency range is determined according to a preset method, that is, the frequency regions in the non-overlapping part are divided according to the preset method.
  • the division of the frequency regions of the frequency band extension is usually pre-configured, that is, the configuration information may include the division of each frequency region in the second frequency range. When the first number is less than or equal to the second number, the first frequency range can be divided according to the frequency region division manner of the second frequency range, so as to obtain each frequency region in the first frequency range. For example, if the frequency regions in the second frequency range are divided in units of 1 kHz, the first frequency range can also be divided in units of 1 kHz to obtain one or more frequency regions in the first frequency range.
  • the first frequency range can completely cover and be greater than the second frequency range.
  • the part that overlaps with the second frequency range can be divided according to the frequency region division method in the second frequency range.
  • for the part of the first frequency range that does not overlap with the second frequency range, that is, the frequency regions corresponding to the difference between the first number and the second number, the division can be performed in a preset manner, so as to accurately determine the boundaries of each frequency region included in the first frequency range that needs tonal component detection.
  • the preset manner may include a preset width, a frequency upper limit of the frequency region, and the like.
  • FIG. 6A For a scenario where the first number is less than or equal to the second number, refer to FIG. 6A, where the frequency region in the first frequency range is divided in the same manner as the frequency region in the second frequency range.
  • FIG. 6B For a scenario where the first number is greater than the second number, refer to FIG. 6B, where the frequency region division method of the part of the first frequency range that overlaps with the second frequency range is the same as the frequency region division method in the second frequency range.
  • the one or more frequency regions by which the first frequency range exceeds the second frequency range can be divided in a preset manner; that is, for the non-overlapping part, the division method of the frequency regions may be the same as or different from that of the overlapping part.
  • the non-overlapping part can be divided into one or more frequency regions.
  • the non-overlapping part can also be merged into the last frequency region of the overlapping part, as shown in FIG. 6C.
  • the conditions that the divided frequency regions need to meet may include: the upper frequency limit of each frequency region is less than or equal to the highest frequency of the audio signal, and the width of each frequency region is less than or equal to a preset value.
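A minimal sketch of dividing the non-overlapping part into frequency regions under the two conditions above (upper limit capped at the signal's highest frequency, width capped at a preset value); the names and the greedy strategy are assumptions:

```python
# Sketch: split the band above the band-extension upper limit into frequency
# regions, each at most max_width_hz wide and never exceeding the signal's
# highest frequency.

def divide_remaining(bwe_upper_hz: float, highest_hz: float,
                     max_width_hz: float) -> list:
    """Return (lower, upper) pairs for the newly added frequency regions."""
    regions = []
    lower = bwe_upper_hz
    while lower < highest_hz:
        upper = min(lower + max_width_hz, highest_hz)  # enforce both caps
        regions.append((lower, upper))
        lower = upper
    return regions
```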
  • the number of frequency region changes included in the aforementioned frequency region information is the number of frequency regions included in the non-overlapping portion of the first frequency range and the second frequency range.
  • the frequency bands in the frequency regions can be numbered; the frequency band number corresponding to the upper frequency limit of each frequency region in the non-overlapping part is less than or equal to the frequency band number corresponding to the highest frequency of the audio signal, the width of each frequency region in the non-overlapping part is less than or equal to the preset value, and the frequency band number corresponding to the highest frequency of the audio signal is determined by the sampling frequency and the frequency band division method.
  • the upper frequency limit of the lower frequency region is the lower limit of the higher frequency region.
  • the number of frequency regions in the first frequency range and the division method of each frequency region are determined, so that subsequent tonal component detection can be performed per frequency region and the tonal components can be detected more comprehensively.
  • the tonal component detection may be performed in units of frequency regions, or the tonal component detection may be performed in units of frequency bands in the frequency region.
  • the boundaries of each frequency region included in the first frequency range are also determined.
  • the method for determining the boundary of each frequency region included in the first frequency range may include: if the first number is less than or equal to the second number, determining the boundaries of the frequency regions included in the first frequency range according to the boundaries of the frequency regions in the second frequency range. If the first number is greater than the second number, for the part of the first frequency range that overlaps with the second frequency range, the boundaries of the frequency regions in the second frequency range can be used to determine the boundaries of the corresponding frequency regions in the first frequency range; for the part of the first frequency range that does not overlap with the second frequency range, the frequency regions may be divided according to a preset division method, and their boundaries determined accordingly.
  • the method of determining the boundary of each frequency region in the first frequency range may include: if the first number is less than or equal to the second number, taking the boundary of each frequency region in the second frequency range corresponding to the frequency band extension as the boundary of each frequency region in the first frequency range; if the first number is greater than the second number, taking the boundary of each frequency region in the second frequency range as the boundary of at least one low-frequency region in the first frequency range, and determining the boundary of at least one high-frequency region according to a preset method, where a low-frequency region is a frequency region in the first frequency range whose upper frequency limit is lower than the upper limit of the frequency band extension.
  • determining the boundary of the at least one high-frequency region according to a preset method may specifically include: taking the upper frequency limit of the frequency region that is adjacent to the first frequency region and whose frequency is lower than the first frequency region as the lower frequency limit of the first frequency region, and determining the upper frequency limit of the first frequency region according to a preset manner, where the first frequency region is included in the at least one high-frequency region. The upper frequency limit of the first frequency region is less than or equal to the highest frequency of the audio signal, and the width of the first frequency region is less than or equal to a preset value; or, the frequency band number corresponding to the upper frequency limit of the first frequency region is less than or equal to the frequency band number corresponding to the highest frequency of the audio signal, and the width of the first frequency region is less than or equal to the preset value, where the frequency band number corresponding to the highest frequency of the audio signal is determined by the sampling frequency and the preset frequency band division method.
  • the following takes a specific application scenario as an example to illustrate the manner of determining each frequency region in the first frequency range.
  • the tile boundary can be the SFB sequence number of the boundary, or the frequency of the boundary, or both.
  • the newly added tiles do not need to cover the entire remaining high band from the IGF cutoff frequency to Fs/2; therefore, the maximum width of a newly added tile can be limited to 128 frequency points, that is, the width of the frequency region is less than or equal to the preset value. Here, Fs is the sampling frequency.
  • the method for determining the width of the newly added tile and the method for updating the tile banding table and the tile-sfb correspondence table are as follows:
  • igfStopSfb is the ending SFB sequence number of IGF
  • sfbIdx is the SFB sequence number
  • tileWidth_new is the width of the new tile
  • nr_of_sfb_long is the total SFB number
  • sfb_offset is the SFB boundary
  • the lower limit of the i-th SFB is sfb_offset[i], and the upper limit is sfb_offset[i+1]
  • tile_sfb_wrap represents the correspondence between tiles and sfb.
  • the starting SFB sequence number of the i-th tile is tile_sfb_wrap[i]
  • the ending SFB sequence number is tile_sfb_wrap[i+1]-1.
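Using the variable names defined above, one possible sketch of appending a new tile while keeping its width within the 128-point cap (the update rules here are a reading of the description, not the application's verbatim algorithm):

```python
# Sketch: extend the tile banding tables with one new tile starting at the
# IGF ending SFB (igfStopSfb), accumulating SFB widths from sfb_offset until
# the 128-frequency-point cap would be exceeded.

def add_tile(sfb_offset, tile_sfb_wrap, igfStopSfb, nr_of_sfb_long,
             max_width=128):
    """Append one tile covering SFBs [igfStopSfb, stopSfb); its total width
    (tileWidth_new) does not exceed `max_width` spectral points."""
    width, stopSfb = 0, igfStopSfb
    while stopSfb < nr_of_sfb_long:
        sfb_w = sfb_offset[stopSfb + 1] - sfb_offset[stopSfb]
        if width + sfb_w > max_width:
            break
        width += sfb_w
        stopSfb += 1
    tileWidth_new = width
    # tile i starts at tile_sfb_wrap[i] and ends at tile_sfb_wrap[i+1]-1,
    # so appending stopSfb makes the new last tile span igfStopSfb..stopSfb-1.
    tile_sfb_wrap.append(stopSfb)
    return tileWidth_new, tile_sfb_wrap
```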
  • the boundary of each frequency region in the first frequency range can be determined, so that the tonal component detection can be performed more accurately.
  • tonal component detection is performed on the first frequency range to obtain the tonal component information of the high frequency band signal.
  • the tonal component information may include a position quantity parameter of the tonal components, and an amplitude parameter or energy parameter of the tonal components.
  • the tonal component information also includes the noise floor parameter of the high-band signal.
  • the position quantity parameter means that the position of the tonal components and the number of tonal components are represented by the same parameter.
  • alternatively, the tonal component information may include a position parameter of the tonal components, a quantity parameter of the tonal components, and an amplitude parameter or energy parameter of the tonal components; in this case, the position and the quantity of the tonal components are represented by different parameters.
  • the first frequency range indicated in the frequency region information may include one or more frequency regions (tile), one frequency region may include one or more frequency bands, and one frequency band may include one or more subbands.
  • Step 504 may specifically include: determining, according to the high-band signal of the current frequency region among the first number of frequency regions in the high-band signal, the position quantity parameter of the tonal components of the current frequency region and the amplitude parameter or energy parameter of the tonal components of the current frequency region, and so on.
  • before determining the tonal components of the current frequency region, it can be determined whether the current frequency region includes tonal components.
  • when the current frequency region includes tonal components, the high-band signal of the current frequency region is used to determine the position quantity parameter of the tonal components of the current frequency region and the amplitude parameter or energy parameter of those tonal components. In this way, only the parameters of the frequency regions with tonal components are obtained, thereby improving the coding efficiency.
  • the tonal component information of the current frame also includes the tonal component indication information, and the tonal component indication information is used to indicate whether the tonal component is included in the current frequency region.
  • this enables the audio decoder to perform decoding according to the indication information, which improves decoding efficiency.
  • determining the tonal component information of the current frequency region based on the high-band signal of the current frequency region may include: performing a peak search within the current frequency region according to its high-band signal to obtain at least one of peak number information, peak position information, and peak amplitude information of the current frequency region; and determining, according to at least one of the peak number information, peak position information, and peak amplitude information, the position quantity parameter of the tonal components in the current frequency region and the amplitude parameter or energy parameter of the tonal components in the current frequency region.
  • the high-band signal for peak search may be a frequency domain signal or a time domain signal.
  • the peak search may be specifically performed according to at least one of the power spectrum, the energy spectrum, or the amplitude spectrum of the current frequency region.
  • determining the position quantity parameter of the tonal component in the current frequency region and the amplitude parameter or energy parameter of the tonal component in the current frequency region according to at least one of the peak quantity information, peak position information, and peak amplitude information of the current frequency region may include: determining the position information, quantity information, and amplitude information of the tonal components in the current frequency region according to at least one of the peak quantity information, peak position information, and peak amplitude information of the current frequency region; and then determining, based on the position information, quantity information, and amplitude information of the tonal components in the current frequency region, the position quantity parameter of the tonal component in the current frequency region and the amplitude parameter or energy parameter of the tonal component in the current frequency region.
  • the information of the parameters of the frequency band extension and the information of the tonal component may be stream-multiplexed to obtain the payload code stream.
  • in addition to performing code stream multiplexing on the frequency band extension parameters and tonal component information, code stream multiplexing may also be performed in combination with other information of the low-band signal or the high-band signal, for example, low-band coding parameters, time-domain noise shaping parameters, frequency-domain noise shaping parameters, or spectrum quantization parameters, so as to obtain a high-quality payload code stream.
  • the signal type information can be used to indicate whether a certain frequency region or frequency band contains a tonal component. If there is no tonal component, signal type information indicating that no tonal component exists in that frequency region or frequency band can be written into the code stream, thereby indicating the absence of tonal components and improving decoding efficiency. If there is a tonal component, the tonal component information needs to be written into the code stream, together with signal type information indicating which frequency regions contain tonal components; the frequency band extension parameters, or the time-domain noise shaping parameters, frequency-domain noise shaping parameters, or spectrum quantization parameters, are also written into the code stream to improve coding quality.
  • the frequency region information can be code stream multiplexed to obtain the configuration code stream.
  • the frequency region information can be written into the configuration code stream, so that the decoding device can decode the audio signal according to the frequency region information included in the configuration code stream, so that the tonal components of the frequency range indicated by the frequency region information can be decoded. Perform reconstruction to obtain high-quality decoded data.
  • step 506 in the embodiment of the present application is an optional step.
  • step 506 does not need to be performed for every frame; it can be performed once during multiplexing, that is, multiple frames of the audio signal can share the same frequency region information, thereby reducing occupied resources and improving coding efficiency.
  • step 506 can also be executed when each frame is encoded, which is not limited in this application.
  • the payload code stream can carry specific information of each frame of the audio signal
  • the configuration code stream can carry configuration information common to each frame of the audio signal.
  • the payload code stream and the configuration code stream can be independent code streams, or they can be included in the same code stream; that is, the payload code stream and the configuration code stream can be different parts of the same code stream, which can be adjusted according to actual application scenarios. This application does not limit this.
  • the tonal component can be detected according to the frequency range indicated by the frequency region information, so that the detected tonal component information can cover more frequency ranges in which the high-band spectrum is dissimilar to the low-band spectrum, thereby improving coding quality.
  • FIG. 7 is a schematic flowchart of a decoding method provided by the present application, described as follows.
  • the code stream is demultiplexed to obtain the frequency band extension parameter and tone component information of the current frame of the audio signal.
  • the information of the tone component may include the position quantity parameter of the tone component, and the amplitude parameter or energy parameter of the tone component.
  • the position quantity parameter indicates that the position and the quantity of the tonal components are represented by the same parameter.
  • alternatively, the tonal component information includes a position parameter of the tonal component, a quantity parameter of the tonal component, and an amplitude parameter or energy parameter of the tonal component; in this case, the position and the quantity of the tonal components are represented by different parameters.
  • the frequency range corresponding to the high-band signal includes at least one frequency region, one frequency region includes at least one frequency band, and one frequency band includes at least one subband. Accordingly, the position quantity parameter of the tonal component of the high-band signal of the current frame includes the position quantity parameter of the tonal component of each of the at least one frequency region, and the amplitude parameter or energy parameter of the tonal component of the high-band signal of the current frame includes the amplitude parameter or energy parameter of the tonal component of each of the at least one frequency region.
  • the amplitude parameter or energy parameter of the tonal component can be organized in units of frequency regions, or alternatively in units of frequency bands or subbands, etc., which can be adjusted according to actual application scenarios.
  • performing code stream demultiplexing on the payload code stream to obtain the tonal component information of the current frame of the audio signal may include: acquiring the position quantity parameter of the tonal component of the current frequency region or current frequency band of the at least one frequency region; and parsing the amplitude parameter or energy parameter of the tonal component in the current frequency region or current frequency band from the payload code stream according to that position quantity parameter.
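  • a minimal sketch of this parse order (read the position quantity parameter first, then use it to read the remaining fields) might look as follows; the bit widths and the `read_bits` reader are hypothetical:

```python
def parse_tonal_info(read_bits, num_regions):
    """Parse per-region tonal component information from a payload stream.

    For each region: first the position-quantity parameter (tone_cnt),
    then tone_cnt position parameters and tone_cnt amplitude parameters.
    The 4/6/8-bit field widths are illustrative only.
    """
    info = []
    for _ in range(num_regions):
        tone_cnt = read_bits(4)
        positions = [read_bits(6) for _ in range(tone_cnt)]
        amplitudes = [read_bits(8) for _ in range(tone_cnt)]
        info.append({"cnt": tone_cnt, "pos": positions, "amp": amplitudes})
    return info
```

  • the key point the sketch shows is that the amplitude parameters cannot be parsed until the position quantity parameter of the region has been read, since it determines how many fields follow.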
  • in addition to the frequency band extension parameters and tonal component information of the current frame, parameters related to the low-band signal can also be obtained from the payload code stream, such as low-band coding parameters, time-domain noise shaping parameters, frequency-domain noise shaping parameters, spectrum quantization parameters, etc.
  • the audio signal may be a multi-channel signal or a single-channel signal.
  • the payload code stream of each channel's signal can be demultiplexed and the signal reconstructed.
  • the decoding process of the signal of only one channel (hereinafter referred to as the current channel) is taken as an example for illustration. In practical applications, steps 702 to 707 can be performed for each channel of the audio signal, and the repeated steps are not described again in this application.
  • time-domain expansion can be performed according to frequency-band expansion parameters, such as high-band LPC parameters, high-band gain or filtering parameters, etc., to obtain high-band signals.
  • frequency domain expansion can be performed according to parameters such as time domain envelope or frequency domain envelope to obtain a high frequency band signal.
  • it is also possible to decode according to the low-band coding parameters obtained by demultiplexing the code stream, to obtain a low-band signal.
  • the high-band signal can also be restored in combination with the low-band signal to obtain a more accurate high-band signal. It can be understood that, after the payload code stream is demultiplexed, the correlation information between the low-band signal and the high-band signal can be obtained; once the low-band signal is obtained, the high-band signal can be recovered according to that correlation information.
  • the configuration code stream sent by the encoding device may be received, and the configuration code stream may include part of the configuration parameters when the encoding device performs encoding.
  • for the configuration code stream, please refer to the relevant description in the foregoing step 506; details are not repeated here.
  • the configuration code stream can be demultiplexed to obtain frequency region information.
  • steps 704-705 in this application are optional. Steps 704-705 may be executed once when a code stream corresponding to a certain frame of the audio signal is received, that is, multiple frames can share the frequency region information; alternatively, steps 704-705 may be executed for the code stream corresponding to each received frame of the audio signal, which can be adjusted according to actual application scenarios.
  • the encoding device may also send the configuration information of the frequency band extension to the decoding device through the configuration code stream, or the encoding device and the decoding device may share preset configuration information, which may be specifically adjusted according to actual application scenarios.
  • the tonal components in the frequency range indicated by the frequency region information are reconstructed according to the tonal component information, to obtain a reconstructed tone signal.
  • for ease of description, the frequency range in which tonal components need to be reconstructed is referred to as the first frequency range, and the frequency range corresponding to the frequency band extension is referred to as the second frequency range. The lower frequency limit of the first frequency range is the same as the lower frequency limit of the second frequency range; this is not repeated below.
  • the first frequency range may be divided into one or more frequency regions, and one frequency region may include one or more frequency bands.
  • performing reconstruction according to the tonal component information and the frequency region information may specifically include: determining, according to the frequency region information, the number of frequency regions in which tonal components need to be reconstructed as the first number; determining, according to the first number, each frequency region in the first frequency range for tonal component reconstruction; and reconstructing, within the first frequency range, the tonal components according to the tonal component information to obtain a reconstructed tone signal.
  • determining each frequency region in the first frequency range for tonal component reconstruction may include: if the first number is less than or equal to the second number of frequency regions in the second frequency range, determining the distribution of the frequency regions in the first frequency range according to the distribution of the frequency regions in the second frequency range, that is, determining each frequency region in the first frequency range according to the division of the frequency regions in the second frequency range; if the first number is greater than the second number, determining the distribution of the frequency regions in the overlapping part of the first frequency range and the second frequency range according to the distribution of the frequency regions in the second frequency range, and determining the distribution of the frequency regions in the non-overlapping part of the first frequency range and the second frequency range according to a preset method, thereby obtaining the distribution of each frequency region in the first frequency range.
  • that is, the overlapping part of the first frequency range and the second frequency range can be divided according to the frequency division of the second frequency range, and the non-overlapping part of the first frequency range and the second frequency range can be divided according to a preset method, to obtain each frequency region in the first frequency range in which tonal components need to be reconstructed. In this way, the second number of the second frequency range can be combined to accurately determine the number of frequency regions in the frequency range in which tonal component reconstruction is required.
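  • the two branches above can be sketched as follows, with region boundaries expressed as frequency-bin edges; the fixed-width extension rule for the non-overlapping part and all names are assumptions:

```python
def regions_for_reconstruction(first_num, bwe_edges, max_bin, preset_width):
    """Return the bin edges of the first_num frequency regions of the
    first frequency range.

    bwe_edges: edges of the second (band-extension) range, e.g.
    [t0, t1, ..., tN] for N regions. If first_num <= N, the division of
    the second range is reused; otherwise the overlapping part keeps the
    second range's division and the part above it is split into
    preset_width-wide regions capped at max_bin.
    """
    second_num = len(bwe_edges) - 1
    if first_num <= second_num:
        return bwe_edges[:first_num + 1]
    edges = list(bwe_edges)
    for _ in range(first_num - second_num):
        edges.append(min(edges[-1] + preset_width, max_bin))
    return edges
```

  • capping at max_bin reflects the constraint that the upper frequency limit of a region must not exceed the highest frequency of the audio signal.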
  • the upper frequency limit of the frequency region is less than or equal to the highest frequency of the audio signal; usually the upper frequency limit of the frequency region is less than or equal to half of the sampling frequency, and the width of the frequency region is less than or equal to a preset value.
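  • as an illustrative check of these constraints (expressed in frequency bins; the function and parameter names are assumptions):

```python
def region_valid(lower_bin, upper_bin, nyquist_bin, preset_width):
    """Check a frequency region against the constraints in the text:
    the upper limit must not exceed half the sampling frequency
    (expressed here as nyquist_bin) and the region width must not
    exceed preset_width.
    """
    return upper_bin <= nyquist_bin and (upper_bin - lower_bin) <= preset_width
```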
  • the configuration information of the frequency band extension can be obtained through the configuration code stream, or locally. The second frequency range for the frequency band extension, and the distribution or division of the frequency regions within the second frequency range, can be determined from the configuration information, so that the distribution of the frequency regions in the first frequency range is determined according to the distribution of the frequency regions in the second frequency range indicated by the configuration information.
  • the reconstruction can be performed in units of frequency regions, or reconstruction can be performed in units of frequency bands.
  • the number of frequency regions (tiles) in which tonal components need to be reconstructed may be denoted num_tiles_detect.
  • the reconstructed tone signal obtained after reconstruction may be a time domain signal or a frequency domain signal.
  • the tonal component information may include the position parameter, quantity parameter, amplitude parameter, etc. of the tonal components, and the quantity parameter of the tonal components indicates the number of tonal components.
  • the reconstruction of a tonal component at a given position may specifically proceed as follows.
  • first, the position of the tonal component is calculated from the position parameter of the tonal component, for example:
  • tone_pos = tile[p] + (sfb + 0.5) * tone_res[p]
  • tile[p] is the starting frequency point of the p-th frequency region
  • sfb is the subband number of the tonal component in the frequency region
  • tone_res[p] is the frequency-domain resolution of the p-th frequency region
  • sfb, the subband number of the tonal component within the frequency region, is the position parameter of the tonal component; the 0.5 means that the tonal component is placed at the center of the subband in which it exists.
  • the reconstructed tonal components can also be located in other positions of the subband.
  • then, the amplitude of the tonal component is calculated from the amplitude parameter of the tonal component, for example:
  • tone_val = pow(2.0, 0.25 * tone_val_q[p][tone_idx] - 4.0)
  • tone_val_q[p][tone_idx] represents the amplitude parameter corresponding to the tone_idx position parameter in the p-th frequency region
  • tone_val represents the amplitude value of the frequency point corresponding to the tone_idx position parameter in the p-th frequency region.
  • tone_idx belongs to [0, tone_cnt[p]-1], and tone_cnt[p] is the number of tone components in the p-th frequency region.
  • the frequency-domain signal corresponding to the position tone_pos of the tonal component satisfies: the value of the frequency-domain signal at position tone_pos is given by the amplitude value tone_val, where
  • tone_val represents the amplitude value of the frequency point corresponding to the tone_idx-th position parameter in the p-th frequency region
  • tone_pos indicates the position of the tonal component corresponding to the tone_idx-th position parameter in the p-th frequency region.
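  • putting the formulas above together, a reconstruction sketch for the tonal components might look as follows (treating the spectrum as a real-valued array is an illustrative simplification; the variable names follow the text):

```python
def reconstruct_tones(spectrum, tile, tone_res, tone_pos_q, tone_val_q):
    """Write reconstructed tonal components into a frequency-domain buffer.

    tile[p]         starting frequency bin of the p-th frequency region
    tone_res[p]     frequency-domain resolution of the p-th region
    tone_pos_q[p]   subband numbers (position parameters) in region p
    tone_val_q[p]   matching quantized amplitude parameters

    Per the text:
      tone_pos = tile[p] + (sfb + 0.5) * tone_res[p]
      tone_val = pow(2.0, 0.25 * tone_val_q[p][tone_idx] - 4.0)
    and the frequency-domain signal at tone_pos takes the value tone_val.
    """
    for p in range(len(tile)):
        for tone_idx, sfb in enumerate(tone_pos_q[p]):
            tone_pos = int(tile[p] + (sfb + 0.5) * tone_res[p])
            tone_val = pow(2.0, 0.25 * tone_val_q[p][tone_idx] - 4.0)
            spectrum[tone_pos] = tone_val
    return spectrum
```

  • with tile[p] = 10, tone_res[p] = 2, and subband number 3, the component lands at bin 10 + 3.5 * 2 = 17; a quantized amplitude of 16 dequantizes to 2^(4 - 4) = 1.0.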
  • in addition to obtaining the decoded signal of the current frame from the high-band signal and the reconstructed tone signal, the low-band signal can also be combined to obtain a more complete decoded signal of the current frame.
  • the tonal components are restored in combination with the high-band signal, so as to obtain the details of the high-band part and the tonal components of the current frame, and the current frame is then restored in combination with the low-band signal, to obtain a current frame containing complete tonal components.
  • when the decoding device restores the tonal components, it can combine the frequency region information provided by the encoding device to restore the tonal components in the first frequency range, so that the obtained current frame includes more complete tonal components. Even in scenarios where the high-band spectrum often contains tonal components dissimilar to the low-band spectrum, the decoded current frame can still have richer tonal components, improving decoding quality and user experience.
  • this application provides an encoding device for executing the audio signal encoding method shown in FIG. 5 above.
  • FIG. 8 is a schematic structural diagram of an encoding device provided by the present application, as described below.
  • the encoding device may include:
  • the audio acquisition module 801 is used to acquire the current frame of the audio signal, and the current frame includes a high-band signal and a low-band signal;
  • the parameter obtaining module 802 is configured to obtain the frequency band extension parameters of the current frame according to the high-band signal, the low-band signal, and the preset configuration information of the frequency band extension;
  • the frequency acquisition module 803 is configured to acquire frequency region information, and the frequency region information is used to indicate the first frequency range in the high-band signal that needs to be detected for tonal components;
  • the tonal component encoding module 804 is configured to perform tonal component detection in the first frequency range to obtain the information of the tonal component of the high-band signal;
  • the code stream multiplexing module 805 is configured to perform code stream multiplexing on the frequency band extension parameters and the tonal component information to obtain the payload code stream.
  • the encoding device may further include:
  • the code stream multiplexing module 805 is also used to perform code stream multiplexing on the frequency region information to obtain a configuration code stream.
  • the frequency acquisition module 803 is specifically configured to determine the frequency region information according to the sampling frequency of the audio signal and the configuration information of the frequency band extension.
  • the frequency region information includes at least one of the following: a first number, identification information, relationship information, or a frequency region change number, where the first number is the number of frequency regions within the first frequency range; the identification information is used to indicate whether the first frequency range is the same as the second frequency range corresponding to the frequency band extension; the relationship information is used to indicate the relationship between the first frequency range and the second frequency range when they differ; and the frequency region change number is the number of frequency regions by which the first frequency range differs from the second frequency range when they differ.
  • the frequency region information includes at least the first number, the configuration information of the frequency band extension includes a frequency band extension upper limit and/or a second number, and the second number is the number of frequency regions in the second frequency range.
  • the frequency acquisition module 803 is specifically configured to determine the first number according to one or more of the encoding rate of the current frame, the number of channels of the audio signal, the sampling frequency, the frequency band extension upper limit, or the second number.
  • the upper limit of the frequency band extension includes one or more of the following: the highest frequency in the second frequency range, the highest frequency point sequence number, the highest frequency band sequence number, or the highest frequency region sequence number.
  • the number of audio signal channels is at least one
  • the frequency acquisition module 803 is specifically configured to: obtain a first judgment identifier of the current channel according to the encoding rate of the current frame; and determine the first number of the current channel in the current frame according to the first judgment identifier combined with the second number.
  • the frequency obtaining module 803 is specifically configured to: obtain the average encoding rate of each channel in the current frame according to the encoding rate of the current frame and the number of channels; and obtain the first judgment identifier of the current channel according to the average encoding rate and a first threshold.
  • the frequency acquisition module 803 can be specifically configured to: determine the actual encoding rate of the current channel according to the encoding rate of the current frame and the number of channels; and obtain the first judgment identifier of the current channel according to the actual encoding rate of the current channel and a second threshold.
  • when the frequency band extension upper limit includes the highest frequency, the frequency acquisition module 803 may be specifically configured to compare the highest frequency included in the frequency band extension upper limit with the highest frequency of the audio signal, and determine the first number of the current channel in the current frame accordingly; the highest frequency of the audio signal is determined by the sampling frequency.
  • the frequency acquisition module 803 may be specifically configured to use the second number corresponding to the frequency band extension as the first number of the current channel.
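  • one way the per-channel rate test and the second number could combine, as a hedged sketch (the threshold value and the rule of adding one extra region when the rate is high are assumptions; the text only says the decision combines the judgment identifier with the second number):

```python
def first_quantity(frame_rate, num_channels, second_number,
                   rate_threshold, extra_regions=1):
    """Derive the first number (frequency regions for tonal detection).

    The average encoding rate per channel is compared against a threshold
    to form the first judgment identifier; a high rate here yields
    extra_regions more regions than the band-extension range, while a
    low rate reuses the second number.
    """
    avg_rate = frame_rate / num_channels
    high_rate = avg_rate >= rate_threshold   # first judgment identifier
    return second_number + extra_regions if high_rate else second_number
```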
  • the lower limit of the first frequency range is the same as the lower limit of the second frequency range for the frequency band extension indicated by the configuration information. When the first number included in the frequency region information is less than or equal to the second number corresponding to the frequency band extension, the distribution of the frequency regions in the first frequency range is the same as the distribution of the frequency regions in the second frequency range. When the first number is greater than the second number, the upper frequency limit of the first frequency range is greater than the upper frequency limit of the second frequency range; the distribution of the frequency regions in the overlapping part of the first frequency range and the second frequency range is the same as the distribution of the frequency regions in the second frequency range, and the distribution of the frequency regions in the non-overlapping part is determined according to the preset method.
  • the frequency regions in the non-overlapping part of the first frequency range and the second frequency range satisfy the following conditions: the width of each frequency region in the non-overlapping part is less than a preset value, and its upper frequency limit is less than or equal to the highest frequency of the audio signal.
  • the frequency range corresponding to the high-band signal includes at least one frequency region, where one frequency region includes at least one frequency band.
  • the number of frequency regions in the first frequency range is a preset number.
  • the pitch component information includes a position quantity parameter of the pitch component, and an amplitude parameter or an energy parameter of the pitch component.
  • the pitch component information further includes a noise floor parameter of the high-band signal.
  • this application provides a decoding device for executing the decoding method shown in FIG. 7 above.
  • FIG. 9 is a schematic structural diagram of a decoding device provided by the present application, described as follows.
  • the decoding device may include:
  • the obtaining module 901 is used to obtain the payload code stream
  • the demultiplexing module 902 is configured to perform code stream demultiplexing on the payload code stream to obtain the frequency band extension parameters and tone component information of the current frame of the audio signal;
  • the frequency band extension decoding module 903 is configured to obtain the high frequency band signal of the current frame according to the parameters of the frequency band extension;
  • the reconstruction module 904 is configured to perform reconstruction according to the tonal component information and frequency region information to obtain a reconstructed tone signal, and the frequency region information is used to indicate the first frequency range in the current frame where the tonal component needs to be reconstructed;
  • the signal decoding module 905 is used to obtain the decoded signal of the current frame according to the high frequency band signal and the reconstructed tone signal.
  • the obtaining module 901 may also be used to: obtain a configuration code stream; obtain frequency region information according to the configuration code stream.
  • the frequency region information includes at least one of the following: a first number, identification information, relationship information, or a frequency region change number, where the first number is the number of frequency regions within the first frequency range; the identification information is used to indicate whether the first frequency range is the same as the second frequency range corresponding to the frequency band extension; the relationship information is used to indicate the relationship between the first frequency range and the second frequency range when they differ; and the frequency region change number is the number of frequency regions by which the first frequency range differs from the second frequency range when they differ.
  • the reconstruction module 904 may be specifically configured to: determine, according to the frequency region information, the number of frequency regions in which tonal components need to be reconstructed as the first number; determine, according to the first number, each frequency region in the first frequency range for tonal component reconstruction; and reconstruct, within the first frequency range, the tonal components according to the tonal component information to obtain a reconstructed tone signal.
  • the lower limit of the first frequency range is the same as the lower limit of the second frequency range for the frequency band extension indicated by the configuration information, and the acquiring module can be specifically configured to: if the first number is less than or equal to the second number, determine the distribution of each frequency region in the first frequency range according to the distribution of the frequency regions in the second frequency range, the second number being the number of frequency regions in the second frequency range; if the first number is greater than the second number, determine that the upper frequency limit of the first frequency range is greater than the upper frequency limit of the second frequency range, determine the distribution of the frequency regions in the overlapping part of the first frequency range and the second frequency range according to the distribution of the frequency regions in the second frequency range, and determine the distribution of the frequency regions in the non-overlapping part according to the preset method, to obtain each frequency region in the first frequency range.
  • the frequency regions in the non-overlapping part of the first frequency range and the second frequency range satisfy the following conditions: the width of each frequency region divided in the non-overlapping part is less than the preset value, and its upper frequency limit is less than or equal to the highest frequency of the audio signal.
  • the pitch component information includes a position quantity parameter of the pitch component, and an amplitude parameter or an energy parameter of the pitch component.
  • the pitch component information further includes a noise floor parameter of the high-band signal.
  • the encoding device 1000 may include a processor 1001, a memory 1002, and a transceiver 1003.
  • the processor 1001, the memory 1002, and the transceiver 1003 are interconnected by wires.
  • the memory 1002 stores program instructions and data.
  • the memory 1002 stores the program instructions and data corresponding to the steps executed by the encoding device in the foregoing embodiment corresponding to FIG. 5.
  • the processor 1001 is configured to execute the steps executed by the encoding device shown in any of the foregoing embodiments in FIG. 5, for example, may execute steps 501 to 505 in the foregoing FIG. 5, and so on.
  • the transceiver 1003 can be used to receive and send data, for example, can be used to perform step 506 in FIG. 5 described above.
  • the encoding device 1000 may include more or fewer components than those shown in FIG. 10; this is only an exemplary description in this application and is not limiting.
  • the decoding device 1100 may include a processor 1101, a memory 1102, and a transceiver 1103.
  • the processor 1101, the memory 1102, and the transceiver 1103 are interconnected by wires.
  • the memory 1102 stores program instructions and data.
  • the memory 1102 stores the program instructions and data corresponding to the steps executed by the decoding device in the foregoing embodiment corresponding to FIG. 7.
  • the processor 1101 is configured to execute the steps executed by the decoding device shown in any of the foregoing embodiments in FIG. 7, for example, may execute steps 702, 703, 705-707, etc. in the foregoing FIG. 7.
  • the transceiver 1103 can be used to receive and send data, for example, can be used to perform step 701 or 704 in FIG. 7 described above.
  • the decoding device 1100 may include more or fewer components than those shown in FIG. 11; this is only an exemplary description in this application and is not limiting.
  • the present application also provides a communication system, which may include an encoding device and a decoding device.
  • the encoding device may be the encoding device shown in FIG. 8 or FIG. 10, and may be used to execute the steps performed by the encoding device in any of the implementation manners shown in FIG. 5 above.
  • the decoding device may be the decoding device shown in FIG. 9 or FIG. 11, and may be used to execute the steps performed by the decoding device in any of the embodiments shown in FIG. 7.
  • This application provides a network device that can be applied to devices such as encoding devices or decoding devices.
  • the network device is coupled with a memory and is configured to read and execute instructions stored in the memory, so that the network device implements the steps of the method performed by the encoding device or the decoding device in any of the foregoing embodiments in FIGS. 5-7.
  • the network device is a chip or a system on a chip.
  • the present application provides a chip system including a processor, configured to support the encoding device or the decoding device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
  • the chip system further includes a memory, and the memory is used to store necessary program instructions and data.
  • the chip system can be composed of chips, and can also include chips and other discrete devices.
  • when the chip system is a chip in an encoding device or a decoding device, the chip includes a processing unit and a communication unit.
  • the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute the computer-executable instructions stored in the storage unit, so that the chip in the encoding device or the decoding device, etc. executes the steps of the method executed by the encoding device or the decoding device in any one of the embodiments of FIGS. 5-7.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit may also be a storage unit located outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
  • the embodiments of the present application also provide a processor, which is configured to be coupled with a memory and used to execute methods and functions related to an encoding device or a decoding device in any one of the foregoing embodiments.
  • the embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a computer, it implements the method flow related to the encoding device or the decoding device in any of the foregoing method embodiments.
  • the computer may be the foregoing encoding device or decoding device.
  • the processor mentioned in the chip system, encoding device, or decoding device in the above embodiments of this application, or the processor provided in the above embodiments of this application, may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the number of processors in the chip system, encoding device, or decoding device in the above embodiments of the present application may be one or more, and may be adjusted according to the actual application scenario; this is merely an exemplary description and is not limiting.
  • the number of memories in the embodiments of the present application may be one or more, and may be adjusted according to the actual application scenario; this is merely an exemplary description and is not limiting.
  • the memory or readable storage medium mentioned in the chip system, encoding device, or decoding device in the above embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • the volatile memory may be a random access memory (RAM), used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
  • the processor in this application may be integrated with the memory, or the processor and the memory may be connected through an interface. It can be adjusted according to actual application scenarios and is not limited.
  • the embodiments of the present application also provide a computer program or a computer program product including a computer program.
  • when the computer program is executed on a computer, the computer is enabled to implement the method performed by the encoding device or the decoding device in any of the foregoing method embodiments.
  • the computer may be the aforementioned encoding device or decoding device.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) manner.
  • the computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or a data center integrating one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a logical function division; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • based on such an understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, another network device, or the like) to execute all or part of the steps of the methods described in the embodiments in FIGS. 5-7 of this application.
  • the storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • the word "if" used herein may be interpreted as "when", "once", "in response to determining", or "in response to detecting".
  • similarly, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".


Abstract

An audio signal encoding method, comprising: obtaining a current frame of an audio signal (501), the current frame comprising a high-band signal and a low-band signal; obtaining band-extension parameters of the current frame according to the high-band signal, the low-band signal, and band-extension configuration information (502); obtaining frequency region information (503), the frequency region information indicating a first frequency range in the high-band signal in which tonal-component detection is to be performed; performing tonal-component detection in the first frequency range to obtain information about tonal components of the high-band signal (504); and multiplexing the band-extension parameters and the tonal-component information into a payload bitstream (505). A corresponding decoding method, an encoding device, a decoding device, a communication system, a network device, and a computer-readable storage medium are also disclosed.

Description

Audio signal encoding method, decoding method, encoding device, and decoding device
This application claims priority to Chinese Patent Application No. 202010297340.0, filed with the China National Intellectual Property Administration on April 15, 2020 and entitled "Audio signal encoding method, decoding method, encoding device, and decoding device", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the communication field, and in particular to an audio signal encoding method, a decoding method, an encoding device, and a decoding device.
Background
With social progress and continuous technological development, users place ever higher demands on audio services. How to provide users with higher-quality service at a limited coding bitrate, or to provide the same quality of service at a lower coding bitrate, has long been a focus of audio codec research.
Typically, when audio data is encoded, its high-frequency part and low-frequency part are processed separately. To reduce the coding bitrate, the correlation between signals in different frequency bands is often further exploited; for example, a high-band signal is generated from the low-band signal by spectrum replication or band extension. However, the high-band spectrum often contains tonal components that are dissimilar to the low-band spectrum, and existing schemes cannot handle these dissimilar tonal components, which lowers the coding quality of the actual encoded data. How to obtain high-quality encoded data has therefore become an urgent problem.
Summary
This application provides an audio signal encoding method, a decoding method, an encoding device, and a decoding device, to achieve higher-quality audio coding and decoding and improve user experience.
According to a first aspect, this application provides an audio signal encoding method, comprising: obtaining a current frame of an audio signal, the current frame comprising a high-band signal and a low-band signal; obtaining band-extension parameters of the current frame according to the high-band signal, the low-band signal, and preset band-extension configuration information; obtaining frequency region information, the frequency region information indicating a first frequency range in the high-band signal in which tonal-component detection is to be performed; performing tonal-component detection in the first frequency range to obtain information about tonal components of the high-band signal; and multiplexing the band-extension parameters and the tonal-component information into a payload bitstream.
Therefore, in embodiments of this application, tonal components can be detected in the frequency range indicated by the frequency region information, which is determined from the band-extension configuration information and the sampling frequency of the audio signal. The detected tonal-component information can thus cover more of the frequency range in which the tonal components of the high-band signal are dissimilar to those of the low-band signal, and encoding based on this wider-coverage tonal-component information improves coding quality.
In a possible implementation, the method of the first aspect may further comprise: multiplexing the frequency region information into a configuration bitstream. The frequency region information can thus be sent to the decoding device via the configuration bitstream, so that the decoding device can decode in the frequency range indicated by the frequency region information included in the configuration bitstream, decode the information about tonal components that are dissimilar between the high-band and low-band signals, and further improve decoding quality.
In a possible implementation, obtaining the frequency region information may comprise: determining the frequency region information according to the sampling frequency of the audio signal and the configuration information. In embodiments of this application, the audio signal has one or more frames; the corresponding frequency region information may be determined when each frame is encoded, or multiple frames may use the same frequency region information. Multiple implementations are provided and may be chosen according to the actual application scenario.
In a possible implementation, the frequency region information may comprise at least one of the following: a first quantity, identification information, relationship information, or a frequency-region change quantity. The first quantity is the number of frequency regions within the first frequency range; the identification information indicates whether the first frequency range is the same as the second frequency range corresponding to the band extension indicated by the configuration information; the relationship information indicates, when the first and second frequency ranges differ, the magnitude relationship between them; and the frequency-region change quantity is, when the first and second frequency ranges differ, the number of frequency regions by which they differ. The frequency range in which tonal-component detection is needed can thus be determined accurately from the frequency region information.
In a possible implementation, the band-extension configuration information comprises a band-extension upper limit and/or a second quantity, the second quantity being the number of frequency regions within the second frequency range. The method may further comprise: determining the first quantity according to one or more of the coding rate of the current frame, the number of channels of the audio signal, the sampling frequency of the audio signal, the band-extension upper limit, or the second quantity. The number of frequency regions in which tonal-component detection is needed can thus be determined accurately.
In a possible implementation, the band-extension upper limit comprises one or more of the following: the highest frequency, the highest frequency-bin index, the highest frequency-band index, or the highest frequency-region index within the second frequency range.
In a possible implementation, the audio signal has at least one channel, and the foregoing determination of the first quantity according to one or more of the coding rate of the current frame, the number of channels, the sampling frequency, the band-extension upper limit, or the second quantity may comprise: determining a first decision flag of the current channel in the current frame according to the coding rate of the current frame and the number of channels, the coding rate of the current frame being the coding rate of the current frame as a whole, and determining the first quantity of the current channel according to the first decision flag in combination with the second quantity; or determining a second decision flag of the current channel in the current frame according to the sampling frequency and the band-extension upper limit, and determining the first quantity of the current channel according to the second decision flag in combination with the second quantity; or determining the first decision flag according to the coding rate of the current frame and the number of channels and the second decision flag according to the sampling frequency and the band-extension upper limit, and determining the first quantity of the current channel in the current frame according to the first decision flag and the second decision flag in combination with the second quantity.
Therefore, in embodiments of this application, the first quantity can be determined in combination with the second quantity in multiple ways, so that the number of frequency regions in which tonal-component detection is needed is determined accurately.
In a possible implementation, determining the first decision flag of the current channel in the current frame according to the coding rate of the current frame and the number of channels may comprise: obtaining the average coding rate of each channel in the current frame from the coding rate of the current frame and the number of channels; and obtaining the first decision flag of the current channel from the average coding rate and a first threshold.
In this way, the first decision flag indicates whether the average coding rate exceeds the first threshold, making the subsequently obtained first quantity more accurate.
In a possible implementation, determining the first decision flag of the current channel in the current frame according to the coding rate of the current frame and the number of channels may alternatively comprise: determining the actual coding rate of the current channel from the coding rate of the current frame and the number of channels; and obtaining the first decision flag of the current channel from the actual coding rate of the current channel and a second threshold.
An actual coding rate can thus be allocated to each channel, and the first decision flag indicates whether the actual coding rate of the current channel exceeds the second threshold, making the subsequently obtained first quantity more accurate.
In a possible implementation, determining the second decision flag of the current channel in the current frame according to the sampling frequency and the band-extension upper limit may comprise: when the band-extension upper limit comprises the highest frequency, comparing whether the highest frequency comprised in the band-extension upper limit is the same as the highest frequency of the audio signal to determine the second decision flag of the current channel in the current frame; or, when the band-extension upper limit comprises the highest frequency-band index, comparing whether the highest frequency-band index comprised in the band-extension upper limit is the same as the highest frequency-band index of the audio signal, which is determined by the sampling frequency, to determine the second decision flag of the current channel in the current frame.
In embodiments of this application, the second decision flag may be determined by comparing the highest frequency comprised in the band-extension upper limit with the highest frequency of the audio signal, or by comparing the highest frequency-bin index, highest frequency-band index, or highest frequency-region index comprised in the band-extension upper limit with the corresponding highest frequency-bin index, highest frequency-band index, or highest frequency-region index of the audio signal, thereby determining whether the highest frequency of the audio signal exceeds the upper frequency limit of the band extension and obtaining a more accurate first quantity.
In a possible implementation, the foregoing determination of the first quantity of the current channel in the current frame may comprise: if both the first decision flag and the second decision flag satisfy a preset condition, adding one or more frequency regions to the second quantity corresponding to the band extension, as the first quantity of the current channel; or, if the first decision flag or the second decision flag does not satisfy the preset condition, using the second quantity corresponding to the band extension as the first quantity of the current channel.
Therefore, in embodiments of this application, when both the first and second decision flags satisfy the preset condition, the frequency range in which tonal components need to be detected exceeds the frequency range corresponding to the band extension, so the number of frequency regions is increased; the number of frequency regions in which tonal-component detection is performed can then cover the frequency range corresponding to the band extension, so that the resulting tonal-component information can cover the information of all tonal components in the current frame of the audio signal, improving coding quality. When the first or second decision flag does not satisfy the preset condition, tonal detection can be performed on the frequency range corresponding to the band extension in the current frame, which likewise fully covers the information of all tonal components in the current frame, improving coding quality.
In a possible implementation, the lower limit of the first frequency range is the same as the lower limit of the second frequency range, indicated by the configuration information, in which band extension is performed. When the frequency region information comprises a first quantity less than or equal to the second quantity corresponding to the band extension, the distribution of frequency regions within the first frequency range is the same as the distribution of frequency regions within the second frequency range indicated by the configuration information. When the first quantity is greater than the second quantity, the upper frequency limit of the first frequency range is greater than that of the second frequency range; the distribution of frequency regions in the part where the first and second frequency ranges overlap is the same as the distribution within the second frequency range, and the distribution of frequency regions in the part of the first frequency range not overlapping the second frequency range is determined in a preset manner.
Therefore, in embodiments of this application, since the lower limit of the first frequency range is the same as the lower limit of the second frequency range of the band extension, the division of the first frequency range into frequency regions can subsequently be determined by comparing the number of frequency regions within the first frequency range with the number within the second frequency range, so that the frequency regions comprised in the first frequency range are determined accurately.
In a possible implementation, the frequency regions in the non-overlapping part of the first and second frequency ranges satisfy the following conditions: the width of each frequency region in the part of the first frequency range not overlapping the second frequency range is less than or equal to a preset value, and the upper frequency limit of each frequency region in the non-overlapping part is less than or equal to the highest frequency of the audio signal. This constrains how the part of the first frequency range not overlapping the second frequency range is divided, namely that the width does not exceed the preset value and the upper frequency limit of each frequency region does not exceed the highest frequency of the audio signal, achieving a more reasonable division into frequency regions.
In a possible implementation, embodiments of this application divide the frequency range: the first frequency range may be divided into one or more frequency regions, and each frequency region may in turn be divided into one or more frequency bands. Furthermore, the frequency bands within a frequency range may be ordered, with each band having a distinct index, so that frequencies can be compared by comparing band indices.
In a possible implementation, the number of frequency regions within the first frequency range is a preset quantity. In this way, the number of frequency regions in which tonal-component detection is needed may be set to a preset quantity, directly reducing the workload.
Optionally, when the number of frequency regions within the first frequency range is a preset quantity, the preset quantity may or may not be written into the configuration bitstream.
In a possible implementation, the tonal-component information may comprise a position-and-quantity parameter of the tonal components and an amplitude parameter or energy parameter of the tonal components.
In a possible implementation, the tonal-component information may further comprise a noise-floor parameter of the high-band signal.
According to a second aspect, this application provides a decoding method, comprising: obtaining a payload bitstream; demultiplexing the payload bitstream to obtain band-extension parameters and tonal-component information of a current frame of an audio signal; obtaining a high-band signal of the current frame according to the band-extension parameters; performing reconstruction according to the tonal-component information and frequency region information to obtain a reconstructed tonal signal, the frequency region information indicating a first frequency range in the current frame in which tonal-component reconstruction is to be performed; and obtaining a decoded signal of the current frame according to the high-band signal and the reconstructed tonal signal.
In embodiments of this application, the frequency range in which tonal-component reconstruction is needed can be determined from the frequency region information, which is determined from the band-extension configuration information and the sampling frequency of the audio signal; tonal components that are dissimilar between the high-band and low-band signals can thus be reconstructed according to the frequency region information, improving decoding quality.
In a possible implementation, the method may further comprise: obtaining a configuration bitstream; and obtaining the frequency region information from the configuration bitstream. Decoding can thus be performed in the frequency range indicated by the frequency region information comprised in the configuration bitstream, so that the information about tonal components dissimilar between the high-band and low-band signals is decoded, improving decoding quality.
In a possible implementation, the frequency region information may comprise at least one of the following: a first quantity, identification information, relationship information, or a frequency-region change quantity. The first quantity is the number of frequency regions within the first frequency range; the identification information indicates whether the first frequency range is the same as the second frequency range corresponding to the band extension; the relationship information indicates, when the first and second frequency ranges differ, the magnitude relationship between them; and the frequency-region change quantity is, when the first and second frequency ranges differ, the number of frequency regions by which they differ.
In a possible implementation, performing reconstruction according to the tonal-component information and the frequency region information to obtain the reconstructed tonal signal comprises: determining, from the frequency region information, that the number of frequency regions in which tonal-component reconstruction is needed is the first quantity; determining, from the first quantity, the frequency regions within the first frequency range in which tonal-component reconstruction is performed; and reconstructing the tonal components within the first frequency range according to the tonal-component information to obtain the reconstructed tonal signal.
Therefore, in embodiments of this application, tonal-component reconstruction can be performed based on the frequency range indicated by the frequency region information, so that the information about tonal components dissimilar between the high-band and low-band signals is decoded, improving decoding quality.
In a possible implementation, the lower limit of the first frequency range is the same as the lower limit of the second frequency range, indicated by the configuration information, in which band extension is performed. Determining, from the first quantity, the frequency regions within the first frequency range in which tonal-component reconstruction is performed may comprise: if the first quantity is less than or equal to the second quantity, determining the distribution of frequency regions within the first frequency range according to the distribution of frequency regions within the second frequency range, the second quantity being the number of frequency regions within the second frequency range; if the first quantity is greater than the second quantity, determining that the upper frequency limit of the first frequency range is greater than that of the second frequency range, determining the distribution of frequency regions in the overlapping part of the first and second frequency ranges according to the distribution within the second frequency range, and determining the distribution of frequency regions in the non-overlapping part of the first and second frequency ranges in a preset manner, thereby obtaining the distribution of all frequency regions within the first frequency range. In embodiments of this application, since the lower limits of the two ranges coincide, the division of the first frequency range into frequency regions can subsequently be determined by comparing the numbers of frequency regions in the two ranges, so that the frequency regions comprised in the first frequency range are determined accurately.
In a possible implementation, the frequency regions in the non-overlapping part of the first and second frequency ranges satisfy the following conditions: the width of each such frequency region is less than or equal to a preset value, and its upper frequency limit is less than or equal to the highest frequency of the audio signal. This constrains how the part of the first frequency range not overlapping the second frequency range is divided, namely that the width does not exceed the preset value and the upper frequency limit does not exceed the highest frequency of the audio signal, achieving a more reasonable division into frequency regions.
According to a third aspect, this application provides an encoding device, comprising:
an audio obtaining module, configured to obtain a current frame of an audio signal, the current frame comprising a high-band signal and a low-band signal;
a parameter obtaining module, configured to obtain band-extension parameters of the current frame according to the high-band signal, the low-band signal, and preset band-extension configuration information;
a frequency obtaining module, configured to obtain frequency region information, the frequency region information indicating a first frequency range in the high-band signal in which tonal-component detection is to be performed;
a tonal-component encoding module, configured to perform tonal-component detection in the first frequency range to obtain information about tonal components of the high-band signal; and
a bitstream multiplexing module, configured to multiplex the band-extension parameters and the tonal-component information into a payload bitstream.
For beneficial effects of the third aspect and any possible implementation thereof, refer to the description of the first aspect and any possible implementation thereof.
In a possible implementation, in the encoding device:
the bitstream multiplexing module is further configured to multiplex the frequency region information into a configuration bitstream.
In a possible implementation, the frequency obtaining module is specifically configured to determine the frequency region information according to the sampling frequency of the audio signal and the band-extension configuration information.
In a possible implementation, the frequency region information comprises at least one of the following: a first quantity, identification information, relationship information, or a frequency-region change quantity. The first quantity is the number of frequency regions within the first frequency range; the identification information indicates whether the first frequency range is the same as the second frequency range corresponding to the band extension; the relationship information indicates, when the first and second frequency ranges differ, the magnitude relationship between them; and the frequency-region change quantity is, when the first and second frequency ranges differ, the number of frequency regions by which they differ.
In a possible implementation, the frequency region information comprises at least the first quantity, and the band-extension configuration information comprises a band-extension upper limit and/or a second quantity, the second quantity being the number of frequency regions within the second frequency range;
the frequency obtaining module is specifically configured to determine the first quantity according to one or more of the coding rate of the current frame, the number of channels of the audio signal, the sampling frequency, the band-extension upper limit, or the second quantity.
In a possible implementation, the band-extension upper limit comprises one or more of the following: the highest frequency, the highest frequency-bin index, the highest frequency-band index, or the highest frequency-region index within the second frequency range.
In a possible implementation, the audio signal has at least one channel;
the frequency obtaining module is specifically configured to:
determine a first decision flag of the current channel in the current frame according to the coding rate of the current frame and the number of channels, the coding rate of the current frame being the coding rate of the current frame as a whole, and determine the first quantity of the current channel according to the first decision flag in combination with the second quantity;
or,
determine a second decision flag of the current channel in the current frame according to the sampling frequency and the band-extension upper limit, and determine the first quantity of the current channel according to the second decision flag in combination with the second quantity;
or,
determine the first decision flag of the current channel in the current frame according to the coding rate of the current frame and the number of channels, and the second decision flag of the current channel in the current frame according to the sampling frequency and the band-extension upper limit, and determine the first quantity of the current channel in the current frame according to the first decision flag and the second decision flag in combination with the second quantity.
In a possible implementation, the frequency obtaining module is specifically configured to: obtain the average coding rate of each channel in the current frame from the coding rate of the current frame and the number of channels; and obtain the first decision flag of the current channel from the average coding rate and a first threshold.
In a possible implementation, the frequency obtaining module may be specifically configured to: determine the actual coding rate of the current channel from the coding rate of the current frame and the number of channels; and obtain the first decision flag of the current channel from the actual coding rate of the current channel and a second threshold.
In a possible implementation, the frequency obtaining module may be specifically configured to: when the band-extension upper limit comprises the highest frequency, compare whether the highest frequency comprised in the band-extension upper limit is the same as the highest frequency of the audio signal to determine the second decision flag of the current channel in the current frame; or, when the band-extension upper limit comprises the highest frequency-band index, compare whether the highest frequency-band index comprised in the band-extension upper limit is the same as the highest frequency-band index of the audio signal, which is determined by the sampling frequency, to determine the second decision flag of the current channel in the current frame.
In a possible implementation, the frequency obtaining module may be specifically configured to:
if both the first decision flag and the second decision flag satisfy a preset condition, add one or more frequency regions to the second quantity corresponding to the band extension, as the first quantity of the current channel; or
if the first decision flag or the second decision flag does not satisfy the preset condition, use the second quantity corresponding to the band extension as the first quantity of the current channel.
In a possible implementation, the lower limit of the first frequency range is the same as the lower limit of the second frequency range, indicated by the configuration information, in which band extension is performed. When the frequency region information comprises a first quantity less than or equal to the second quantity corresponding to the band extension, the distribution of frequency regions within the first frequency range is the same as the distribution of frequency regions within the second frequency range; when the first quantity is greater than the second quantity, the upper frequency limit of the first frequency range is greater than that of the second frequency range, the distribution of frequency regions in the overlapping part of the first and second frequency ranges is the same as the distribution within the second frequency range, and the distribution of frequency regions in the non-overlapping part of the first and second frequency ranges is determined in a preset manner.
In a possible implementation, the width of each frequency region in the non-overlapping part of the first and second frequency ranges is less than a preset value, and the upper frequency limit of each frequency region in the non-overlapping part is less than or equal to the highest frequency of the audio signal.
In a possible implementation, the frequency range corresponding to the high-band signal comprises at least one frequency region, and one frequency region comprises at least one frequency band.
In a possible implementation, the number of frequency regions within the first frequency range is a preset quantity.
In a possible implementation, the tonal-component information comprises a position-and-quantity parameter of the tonal components and an amplitude parameter or energy parameter of the tonal components.
In a possible implementation, the tonal-component information further comprises a noise-floor parameter of the high-band signal.
According to a fourth aspect, this application provides a decoding device, comprising:
an obtaining module, configured to obtain a payload bitstream;
a demultiplexing module, configured to demultiplex the payload bitstream to obtain band-extension parameters and tonal-component information of a current frame of an audio signal;
a band-extension decoding module, configured to obtain a high-band signal of the current frame according to the band-extension parameters;
a reconstruction module, configured to perform reconstruction according to the tonal-component information and frequency region information to obtain a reconstructed tonal signal, the frequency region information indicating a first frequency range in the current frame in which tonal-component reconstruction is to be performed; and
a signal decoding module, configured to obtain a decoded signal of the current frame according to the high-band signal and the reconstructed tonal signal.
For beneficial effects of the fourth aspect and any possible implementation thereof, refer to the description of the second aspect and any possible implementation thereof.
In a possible implementation, the obtaining module may further be configured to: obtain a configuration bitstream; and obtain the frequency region information from the configuration bitstream.
In a possible implementation, the frequency region information comprises at least one of the following: a first quantity, identification information, relationship information, or a frequency-region change quantity. The first quantity is the number of frequency regions within the first frequency range; the identification information indicates whether the first frequency range is the same as the second frequency range corresponding to the band extension; the relationship information indicates, when the first and second frequency ranges differ, the magnitude relationship between them; and the frequency-region change quantity is, when the first and second frequency ranges differ, the number of frequency regions by which they differ.
In a possible implementation, the reconstruction module may be specifically configured to: determine, from the frequency region information, that the number of frequency regions in which tonal-component reconstruction is needed is the first quantity; determine, from the first quantity, the frequency regions in the first frequency range in which tonal-component reconstruction is performed; and reconstruct the tonal components within the first frequency range according to the tonal-component information to obtain the reconstructed tonal signal.
In a possible implementation, the lower limit of the first frequency range is the same as the lower limit of the second frequency range, indicated by the configuration information, in which band extension is performed; the obtaining module may be specifically configured to: if the first quantity is less than or equal to the second quantity, determine the frequency regions in the overlapping part of the first and second frequency ranges following the distribution of frequency regions within the second frequency range, the second quantity being the number of frequency regions within the second frequency range; if the first quantity is greater than the second quantity, determine that the upper frequency limit of the first frequency range is greater than that of the second frequency range, determine the distribution of frequency regions in the overlapping part of the first and second frequency ranges according to the distribution within the second frequency range, and determine the distribution of frequency regions in the non-overlapping part of the first and second frequency ranges in a preset manner, thereby obtaining the distribution of all frequency regions within the first frequency range.
In a possible implementation, the frequency regions divided in the non-overlapping part of the first and second frequency ranges satisfy the following conditions: the width of each such frequency region is less than a preset value, and its upper frequency limit is less than or equal to the highest frequency of the audio signal.
In a possible implementation, the tonal-component information comprises a position-and-quantity parameter of the tonal components and an amplitude parameter or energy parameter of the tonal components.
In a possible implementation, the tonal-component information further comprises a noise-floor parameter of the high-band signal.
According to a fifth aspect, this application provides an encoding device, comprising a processor and a memory interconnected by a line, wherein the processor invokes program code in the memory to perform the processing-related functions of the audio signal encoding method according to any one of the first aspect.
According to a sixth aspect, this application provides a decoding device, comprising a processor and a memory interconnected by a line, wherein the processor invokes program code in the memory to perform the processing-related functions of the decoding method according to any one of the second aspect.
According to a seventh aspect, this application provides a communication system, comprising an encoding device and a decoding device, the encoding device being configured to perform the audio signal encoding method according to any one of the first aspect, and the decoding device being configured to perform the decoding method according to any one of the second aspect.
According to an eighth aspect, an embodiment of this application provides a digital processing chip, comprising a processor and a memory interconnected by a line, the memory storing instructions, the processor being configured to perform the processing-related functions of the first aspect or any optional implementation thereof, or of the second aspect or any optional implementation thereof.
According to a ninth aspect, an embodiment of this application provides a computer-readable storage medium comprising instructions that, when run on a computer, cause the computer to perform the method of the first aspect or any optional implementation thereof, or of the second aspect or any optional implementation thereof.
According to a tenth aspect, an embodiment of this application provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the method of the first aspect or any optional implementation thereof, or of the second aspect or any optional implementation thereof.
According to an eleventh aspect, this application provides a network device that can be applied to devices such as encoding devices or decoding devices. The network device is coupled with a memory and is configured to read and execute instructions stored in the memory, so that the network device implements the steps of the method provided in any implementation of any one of the first to second aspects of this application. In a possible design, the network device is a chip or a system on a chip.
According to a twelfth aspect, this application provides a computer-readable storage medium storing a payload bitstream generated according to the method provided in any implementation of any one of the first to second aspects of this application.
According to a thirteenth aspect, this application provides a computer program stored on a computer-readable storage medium, the computer program comprising instructions that, when executed, implement the method provided in any implementation of any one of the first to second aspects of this application.
Brief Description of Drawings
FIG. 1 is a schematic architectural diagram of a communication system according to this application;
FIG. 2 is a schematic structural diagram of another communication system according to this application;
FIG. 3 is a schematic structural diagram of a codec device according to this application;
FIG. 4 is a schematic structural diagram of another codec device according to this application;
FIG. 5 is a schematic flowchart of an audio signal encoding method according to this application;
FIG. 6A is a schematic diagram of a frequency-region division manner according to an embodiment of this application;
FIG. 6B is a schematic diagram of another frequency-region division manner according to an embodiment of this application;
FIG. 6C is a schematic diagram of another frequency-region division manner according to an embodiment of this application;
FIG. 7 is a schematic flowchart of a decoding method according to this application;
FIG. 8 is a schematic structural diagram of an encoding device according to this application;
FIG. 9 is a schematic structural diagram of a decoding device according to this application;
FIG. 10 is a schematic structural diagram of another encoding device according to this application;
FIG. 11 is a schematic structural diagram of another decoding device according to this application.
Detailed Description
This application provides an audio signal encoding method, a decoding method, an encoding device, and a decoding device, to achieve higher-quality audio coding and decoding and improve user experience.
First, the audio signal encoding method and decoding method provided in this application can be applied to various systems in which data is transmitted.
For example, refer to FIG. 1, a schematic architectural diagram of a communication system provided in this application.
The communication system may include multiple devices, such as terminals or servers, which may be connected through a network.
The network may be a wired communication network or a wireless communication network, for example a fifth-generation mobile communication technology (5th-Generation, 5G) system, a long term evolution (LTE) system, a global system for mobile communication (GSM), a code division multiple access (CDMA) network, or a wideband code division multiple access (WCDMA) network, or other communication networks or systems such as wireless fidelity (WiFi) or a wide area network.
There may be one or more terminal devices, such as terminal 1, terminal 2, or terminal 3 shown in FIG. 1. Specifically, a terminal in the communication system may include a head mount display (HMD) device, which may be a combination of a VR box and a terminal, an all-in-one VR headset, a personal computer (PC) VR, an augmented reality (AR) device, or a mixed reality (MR) device; the terminal device may also include a cellular phone, a smart phone, a personal digital assistant (PDA), a tablet computer, a laptop computer, a personal computer (PC), or a computing device deployed on the user side.
There may be one or more servers. When there are multiple servers in the communication system, they may be distributed servers or centralized servers, which may be adjusted according to the actual application scenario; this application does not limit this.
Specifically, the aforementioned terminal or server may serve either as the encoding device or as the decoding device. It can be understood that the aforementioned terminal or server may perform the audio signal encoding method provided in this application or the decoding method provided in this application. Of course, the encoding device and the decoding device may also be mutually independent devices; for example, one terminal may serve as the encoding device and another terminal as the decoding device.
More specifically, referring to FIG. 2, the communication system provided in this application is described in more detail below using two terminals as an example.
Both terminal 1 and terminal 2 may include an audio collection module, a multi-channel encoder, a channel encoder, a channel decoder, a multi-channel decoder, and an audio playback module.
The following briefly takes terminal 1 performing the audio signal encoding method and terminal 2 performing the decoding method as an example; for the specific steps performed, refer to the descriptions in FIG. 4 or FIG. 5 below.
The audio collection module of terminal 1 may obtain an audio signal; this module may include devices such as a sensor, a microphone, a camera, or a recorder, or it may directly receive an audio signal sent by another device.
If the audio signal is a multi-channel signal, it is encoded by the multi-channel encoder, and the signal encoded by the multi-channel encoder is then encoded by the channel encoder to obtain an encoded bitstream.
The encoded bitstream is then transmitted to network device 1 in the communication network, transmitted by network device 1 through a digital channel to network device 2, and then transmitted by network device 2 to terminal 2. Network device 1 or network device 2 may be a forwarding device in the communication network, such as a router or a switch.
After receiving the encoded bitstream, terminal 2 performs channel decoding on it through the channel decoder to obtain a channel-decoded signal.
The multi-channel decoder then performs multi-channel decoding on the channel-decoded signal to obtain the audio signal, and the audio playback module can play back the audio signal. The audio playback module may include devices such as a speaker or earphones.
In addition, the audio collection module of terminal 2 may also collect an audio signal, obtain an encoded bitstream via the multi-channel encoder and the channel encoder, and send the encoded bitstream over the communication network to terminal 1; the bitstream is then decoded by the channel decoder and multi-channel decoder of terminal 1 to obtain the audio signal, which is played through the audio playback module of terminal 1.
In another scenario, the encoding device in the communication system may be a forwarding device without audio collection and audio playback functions. For example, FIG. 3 shows a schematic structural diagram of an encoding device provided in this application. The encoding device may include a channel decoder 301, an audio decoder 302, a multi-channel encoder 303, and a channel encoder 304. When an encoded bitstream is received, channel decoding may be performed by the channel decoder 301 to obtain a channel-decoded signal; the audio decoder 302 then performs audio decoding on the channel-decoded signal to obtain an audio signal; the multi-channel encoder 303 then performs multi-channel encoding on the audio signal to obtain a multi-channel encoded signal; finally, the channel encoder 304 performs channel encoding on the multi-channel encoded signal to obtain an updated encoded bitstream, which is sent to another device, completing forwarding of the encoded bitstream.
The types of encoders and decoders used may also differ in different scenarios. For example, as shown in FIG. 4, after an encoded bitstream is received and decoded by the channel decoder 401 into a channel-decoded signal, the multi-channel decoder 402 performs multi-channel decoding on the channel-decoded signal to recover the audio signal; the audio encoder 403 then encodes the audio signal, and the channel encoder 404 performs channel encoding on the data encoded by the audio encoder 403 to obtain an updated encoded bitstream.
In addition, the foregoing introduces the multi-channel audio signal scenario; the aforementioned multi-channel signal may also be replaced with a stereo signal, a two-channel signal, and so on. Taking a stereo signal as an example, the aforementioned multi-channel audio signal may be replaced with a stereo signal, the multi-channel encoder with a stereo encoder, and the multi-channel decoder with a stereo decoder.
The encoding process of an audio signal is introduced below using a specific scenario as an example. Three-dimensional audio, which can give users a better immersive experience, has become a new trend in audio services; three-dimensional audio may be understood as audio including multiple channels. To implement three-dimensional audio services, the original audio signal formats requiring compression encoding can be divided into: channel-based audio signal formats, object-based audio signal formats, scene-based audio signal formats, and mixed formats of any of the three. For audio signals in each of the foregoing formats, the audio signal to be compression-encoded by the audio encoder includes multiple signals, which may also be understood as multiple channels. Usually, the audio encoder exploits inter-channel correlation to downmix the multiple signals, obtaining a downmix signal and multi-channel coding parameters. The number of channels included in the downmix signal is usually far smaller than the number of channels of the input audio signal; for example, a multi-channel signal may be downmixed into a stereo signal, and the downmix signal is then encoded. The stereo signal may optionally be further downmixed into a mono signal and stereo coding parameters, and the downmixed mono signal is encoded. The number of bits used to encode the downmix signal and the multi-channel coding parameters is far smaller than that needed to independently encode the multi-channel input signal; this reduces the encoder's workload and the data volume of the resulting encoded bitstream, improving transmission efficiency.
In addition, to reduce the coding bitrate, the correlation between signals in different frequency bands is often further exploited. The encoding device encodes the low-band signal and the correlation data between the low-band and high-band signals, so that the high-band signal is encoded with fewer bits, reducing the coding bitrate of the whole encoder. For example, the Enhance Voice Services (EVS) codec of the 3rd Generation Partnership Project (3GPP), or Moving Picture Experts Group (MPEG) codecs, exploit the correlation between different frequency bands during encoding and encode the high-band signal using band-extension or spectrum-replication techniques. In an actual audio signal, however, the high-band spectrum often contains tonal components dissimilar to the low-band spectrum; if these dissimilar tonal components are not encoded and reconstructed, the coding and decoding quality of the audio may be low.
Therefore, this application provides an audio signal encoding method and a decoding method for improving the coding and decoding quality of audio signals: even in scenarios where the high-band spectrum contains tonal components dissimilar to the low-band spectrum, a high-quality encoded bitstream can be obtained, so that the decoding side can decode a high-quality audio signal, improving user experience.
The audio signal encoding method and the decoding method provided in this application are described in detail below.
First, the audio signal encoding method provided in this application is introduced. Refer to FIG. 5, a schematic flowchart of an audio signal encoding method provided in this application, described as follows.
501. Obtain a current frame of an audio signal.
The current frame may be any frame in the audio signal and may include a high-band signal and a low-band signal, the frequency of the high-band signal being higher than that of the low-band signal. The division between high-band and low-band signals may be determined by a band threshold: a signal above the threshold is a high-band signal and a signal below it is a low-band signal. The band threshold may be determined according to the transmission bandwidth and the processing capability of the encoder or decoder; this application does not limit this.
High-band and low-band signals are relative: for example, a signal below a certain frequency (the band threshold) is the low-band signal and a signal above it is the high-band signal (the signal at that frequency may be assigned either to the low band or to the high band). The frequency differs with the bandwidth of the current frame; for example, when the current frame is a 0-8 kHz wideband signal, the frequency may be 4 kHz, and when the current frame is a 0-16 kHz super-wideband signal, the frequency may be 8 kHz.
It should be noted that the audio signal in embodiments of this application may include multiple frames, and the current frame may refer specifically to one frame of the audio signal. The coding and decoding of the current frame of the audio signal is described as an example in the embodiments; the frame preceding or following the current frame may be coded and decoded correspondingly in the same manner as the current frame, and this is not described one by one. In addition, the audio signal in embodiments of this application may be a mono audio signal, or may be a stereo signal (or a multi-channel signal). The stereo signal may be an original stereo signal, a stereo signal composed of two signals (a left-channel signal and a right-channel signal) included in a multi-channel signal, or a stereo signal composed of two signals produced from at least three signals included in a multi-channel signal; this is not limited in embodiments of this application.
It should further be noted that, in embodiments of this application, the audio signal may be a multi-channel signal or a single-channel signal. When the audio signal is a multi-channel signal, the signal of each channel may be encoded; in the embodiments, only the encoding process of the signal of one channel (hereinafter referred to as the current channel) is described as an example. In practice, steps 502-506 below may be performed for every channel of the audio signal, and repeated steps are not described again. It should be understood that the two Chinese terms for "channel" (声道 and 通道) mentioned in this application are interchangeable here; "channel" is used in the following for ease of understanding.
502. Obtain band-extension parameters of the current frame according to the high-band signal, the low-band signal, and preset band-extension configuration information.
In encoding the high-band and low-band signals, the high band may be divided into multiple frequency regions, and the band-extension parameters may be determined per frequency region; that is, each frequency region has its own band-extension parameters.
Specifically, the band-extension parameters may include different parameters in different scenarios, as determined by the actual application scenario. For example, in time-domain bandwidth extension, the band-extension parameters may include high-band linear predictive coding (LPC) parameters, a high-band gain, or filtering parameters; in frequency-domain bandwidth extension, the band-extension parameters may further include parameters such as a time-domain envelope or a frequency-domain envelope.
The band-extension configuration information may be preconfigured, for example according to the data processing capability of the encoder or decoder. In a possible implementation, the band-extension configuration information may include a band-extension upper limit or a second quantity, the second quantity being the number of frequency regions in which band extension is performed. Specifically, the second frequency range corresponding to the band extension may be indicated by the band-extension upper limit or the second quantity. For example, the lower frequency limit of the second frequency range can usually be fixed, such as the band threshold in step 501, and the band-extension upper limit indicates the upper frequency limit of the second frequency range, so that the second frequency range is determined from the determined lower and upper limits. For another example, if the configuration information includes the second quantity, and the lower frequency limit of the second frequency range is fixed (such as the band threshold in step 501), the boundaries of the frequency regions corresponding to the second frequency range can be looked up in a preset table to determine the second frequency range.
Specifically, the band-extension upper limit included in the configuration information may include, but is not limited to, one or more of the following: the value of the highest frequency, the highest frequency-bin index, the highest frequency-band index, or the highest frequency-region index within the second frequency range. The highest frequency-bin index is the index of the highest-frequency bin within the second frequency range, the highest frequency-band index is the index of the highest-frequency band within the second frequency range, and the highest frequency-region index is the index of the highest-frequency region within the second frequency range. These indices may increase with frequency: the index of a lower-frequency bin is smaller than that of a higher-frequency bin, the index of a lower-frequency band is smaller than that of a higher-frequency band, and the index of a lower-frequency region is smaller than that of a higher-frequency region. The numbering of bins, bands, or regions may follow a preset order, or a fixed number may be assigned to each bin, band, or region, adjusted according to the actual application scenario; this application does not limit this.
In addition, according to the high-band signal, the low-band signal, and the band-extension configuration information, coding parameters of the high-band or low-band signal may be obtained besides the band-extension parameters of the current frame, such as time-domain noise shaping parameters, frequency-domain noise shaping parameters, or spectrum quantization parameters. The time-domain and frequency-domain noise shaping parameters are used to preprocess the spectral coefficients to be encoded and can improve the quantization coding efficiency of the spectral coefficients; the spectrum quantization parameters are the quantized spectral coefficients and the corresponding gain parameters.
503. Obtain frequency region information.
The frequency region information indicates a first frequency range in the high-band signal of the current frame.
In embodiments of this application, the frequency range in which tonal-component detection is needed is called the first frequency range, the frequency range corresponding to the band extension indicated by the configuration information is called the second frequency range, and the lower frequency limits of the first and second frequency ranges are the same; this is not repeated below.
In a possible implementation, the frequency region information includes one or more of the following: a first quantity, identification information, relationship information, a frequency-region change quantity, or the like.
The first quantity is the number of frequency regions within the first frequency range.
It should be noted that, in this application, a frequency range may be divided into frequency regions (tiles); each frequency region may in turn be divided into at least one frequency band according to a preset band-division manner, and one frequency band may be understood as one scale factor band (SFB). For example, frequency regions may be divided in units of 1 kHz, and within each frequency region, frequency bands may be divided in units of 200 Hz. It can be understood that the frequency widths corresponding to different frequency regions may be the same or different, and the frequency widths corresponding to different frequency bands may be the same or different.
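For illustration, the tile/SFB division described above can be sketched as follows. The uniform 1 kHz tile width and 200 Hz band width are the example values from the text; the uniform-width scheme itself is an assumption, since the text notes that region and band widths may differ.

```python
def divide_range(f_low_hz, f_high_hz, tile_width_hz=1000, band_width_hz=200):
    """Split [f_low_hz, f_high_hz) into frequency regions (tiles), and each
    tile into scale factor bands (SFBs). Uniform widths are assumed here
    purely for illustration."""
    tiles = []
    f = f_low_hz
    while f < f_high_hz:
        tile_hi = min(f + tile_width_hz, f_high_hz)
        bands = []
        b = f
        while b < tile_hi:
            bands.append((b, min(b + band_width_hz, tile_hi)))
            b += band_width_hz
        tiles.append({"range": (f, tile_hi), "bands": bands})
        f = tile_hi
    return tiles

# Example: divide the 8-12 kHz high band into 1 kHz tiles of 200 Hz SFBs.
tiles = divide_range(8000, 12000)
```

With these example widths, the 8-12 kHz range yields four tiles of five SFBs each.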
The identification information indicates whether the first frequency range is the same as the second frequency range corresponding to the band extension. For example, when the identification information is 0, the first and second frequency ranges differ; when it is 1, they are the same.
The relationship information indicates the magnitude relationship between the first and second frequency ranges. For example, 2 bits may indicate the relationship, such as equal, increased, or decreased: when the relationship information is 00, the first and second frequency ranges are equal; when it is 01, the first frequency range is greater than the second; and when it is 10, the first frequency range is smaller than the second.
The frequency-region change quantity is the number of frequency regions by which the first and second frequency ranges differ. For example, its range may be [-N, N]: N means the first frequency range has N more frequency regions than the second, and -N means the first frequency range has N fewer frequency regions than the second.
Usually, in practical application scenarios, the frequency region information includes at least the first quantity; optionally, it also includes, but is not limited to, one or more of the identification information, the relationship information, or the frequency-region change quantity.
Indicating the first frequency range through the frequency region information can be understood as follows: when the frequency region information includes the first quantity, the boundary of each of the first quantity of frequency regions, i.e. the frequency range covered by each region, can be determined by looking up a preset table, thereby obtaining the first frequency range. The lower boundary of the first of the first quantity of frequency regions is the lower boundary of the second frequency range in which band extension is performed. It can be understood that, when the first quantity of frequency regions are contiguous in the frequency domain, the first frequency range may also be determined only from the lower boundary of the first frequency region and the upper boundary of the last frequency region.
In addition, when the frequency region information includes the identification information: if the identification information indicates that the first and second frequency ranges are the same, the second frequency range may be used as the first frequency range. If the identification information indicates that they differ, the magnitude relationship between the first and second frequency ranges may be determined from the relationship information, for example the first frequency range is greater than the second, or the second is greater than the first. Of course, if the identification information indicates that the ranges are the same, the frequency region information may still include the relationship information, in which case the relationship information also indicates that the ranges are the same. When it is determined from the identification information or the relationship information that the ranges differ, the magnitude relationship between them may be determined from the relationship information, the number of frequency regions in the part where the two ranges differ may be determined from the frequency-region change quantity, and the exact extent of the first frequency range may then be determined in a preset manner, such as table lookup or a preset bandwidth plan. For example, if the first and second frequency ranges differ, the relationship information determines which range is larger; if the first frequency range is greater than the second, the boundaries of the frequency regions in the non-overlapping part of the two ranges can be obtained from the number of frequency regions in that part by looking up a preset table or dividing by a preset bandwidth, thereby determining the exact frequency range covered by the first frequency range.
Specifically, the frequency region information can be obtained in multiple ways, introduced below.
Manner 1: determine the frequency region information according to the sampling frequency of the audio signal and the preset band-extension configuration information.
The frequency region information includes at least the first quantity, and the audio signal has at least one channel. The current channel of the at least one channel is used below as an example to describe step 503. Step 503 may specifically include: determining the first quantity of the current channel according to one or more of the coding rate of the current frame, the number of channels of the audio signal, the sampling frequency, the band-extension upper limit, or the second quantity.
Specifically, the first quantity may be determined according to the first decision flag of the current channel, according to the second decision flag, or according to both the first and second decision flags of the current channel. Beforehand, the first decision flag of each channel in the current frame, including that of the current channel, may be determined according to the coding rate of the current frame and the number of channels; or the second decision flag may be determined according to the sampling frequency and the band-extension upper limit. The coding rate of the current frame is the total coding rate of all channels in the current frame.
More specifically, the first decision flag of the current channel may be obtained in, but not limited to, one or more of the following ways:
1. Obtain the average coding rate of each channel in the current frame from the coding rate of the current frame and the number of channels; compare the average coding rate with a first threshold to obtain the first decision flag of the current channel. For example, dividing the coding rate of the current frame by the number of channels gives the average coding rate of each channel. The average coding rate is compared with the first threshold, and the first decision flag of the current channel is obtained from the comparison result. For example, when the average coding rate is higher than 24 kbps, i.e. 24000 bits per second (the first threshold, which may also be another value such as 32 kbps or 128 kbps), the value of the first decision flag of the current channel is determined to be 1; when the average coding rate is not higher than 24 kbps, the first decision flag of the current channel is determined to be 0.
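The average-rate criterion above can be sketched as follows; the function name is illustrative, and the 24 kbps default follows the example in the text.

```python
def first_flag_from_average_rate(bitrate_tot_bps, n_channels, threshold_bps=24000):
    """First decision flag via the average-rate criterion: 1 if the
    per-channel average coding rate exceeds the threshold, else 0."""
    bitrate_ch = bitrate_tot_bps / n_channels
    return 1 if bitrate_ch > threshold_bps else 0
```

For instance, a 256 kbps frame with three channels averages about 85 kbps per channel, which exceeds 24 kbps, so the flag is 1.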
2. Determine the actual coding rate of each channel in the current frame from the coding rate of the current frame and the number of channels; compare each channel's actual coding rate with a second threshold to obtain each channel's first decision flag. That is, an actual coding rate may be allocated to each channel from the total coding rate of the current frame, and comparing each channel's actual coding rate with the second threshold gives each channel's first decision flag. The actual coding rate of each channel may be determined in multiple ways: coding rates may be allocated randomly, or according to the data size of each channel (the larger a channel's data volume, the higher the allocated rate), or in a fixed manner; the specific allocation may be adjusted according to the actual application scenario. For example, if the total coding rate available to the current audio signal (the coding rate of the current frame) is 256 kbps and the audio signal has three channels, channel 1, channel 2, and channel 3, coding rates may be allocated to the three channels as, say, 192 kbps to channel 1, 44 kbps to channel 2, and 20 kbps to channel 3. Each channel's actual coding rate is then compared with 64 kbps (the second threshold): when the current channel's actual coding rate is higher than 64 kbps, its first decision flag is determined to be 1, and when it is not higher than 64 kbps, the flag is determined to be 0. The resulting first decision flag of channel 1 is 1, and those of channels 2 and 3 are 0.
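The per-channel allocation variant above can be sketched as follows; the allocation itself is taken as given, and the 64 kbps threshold follows the example in the text.

```python
def first_flags_from_actual_rates(actual_rates_bps, threshold_bps=64000):
    """First decision flag per channel: 1 if the channel's allocated
    (actual) coding rate exceeds the threshold, else 0."""
    return [1 if r > threshold_bps else 0 for r in actual_rates_bps]

# Example allocation from the text: 192 / 44 / 20 kbps across three channels.
flags = first_flags_from_actual_rates([192000, 44000, 20000])
```

This reproduces the example outcome: only channel 1 exceeds 64 kbps.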
More specifically, the second decision flag of the current channel may be obtained as follows: when the band-extension upper limit includes the value of the highest frequency, compare whether the value of the highest frequency included in the band-extension upper limit is the same as the value of the highest frequency of the audio signal to determine the second decision flag; the highest frequency of the audio signal is usually half the sampling frequency, although the sampling frequency may also be set greater than twice the highest frequency. Alternatively, when the band-extension upper limit includes the highest frequency-band index, compare whether the highest frequency-band index included in the band-extension upper limit is the same as the highest frequency-band index of the audio signal to determine the second decision flag; the highest frequency-band index of the audio signal is determined by the sampling frequency and may be the index of the frequency band containing the highest frequency of the audio signal. In addition, the highest frequency-bin index included in the band-extension upper limit may be compared with the highest frequency-bin index of the audio signal, or the highest frequency-region index included in the band-extension upper limit may be compared with the highest frequency-region index of the audio signal, to determine the second decision flag.
In addition, when the type of data included in the band-extension upper limit differs from the type of data for the obtained highest frequency of the audio signal, both may be converted to the same type, and the same-typed data may then be compared to obtain the second decision flag. For example, when the band-extension upper limit includes the value of the highest frequency but the highest frequency-bin index of the audio signal is obtained, the value of the highest frequency corresponding to that bin index may be determined and compared with the value of the highest frequency included in the band-extension upper limit to obtain the second decision flag.
As a specific example of determining the second decision flag: if the value of the highest frequency included in the band-extension upper limit equals the highest frequency of the audio signal, the second decision flag may be 0, and otherwise 1. For another example, the band index corresponding to the band-extension upper limit is compared with the highest band index of the audio signal: when the highest band index included in the band-extension upper limit equals the highest band index of the audio signal, the second decision flag may be 0, and otherwise 1. Usually, the highest frequency corresponding to the band-extension upper limit does not exceed the highest frequency of the audio signal.
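A minimal sketch of the second-flag comparison follows; the names are illustrative, and the highest frequency of the signal is taken as half the sampling frequency, as the text notes is usual.

```python
def second_flag(bwe_upper_hz, sampling_rate_hz):
    """Second decision flag: 0 when the band-extension upper limit already
    reaches the signal's highest frequency (sampling_rate_hz / 2), else 1."""
    signal_max_hz = sampling_rate_hz / 2
    return 0 if bwe_upper_hz == signal_max_hz else 1
```

At a 48 kHz sampling rate, an upper limit of 24 kHz covers the full band (flag 0), while 16 kHz does not (flag 1).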
Further, the first quantity may be determined as follows:
If both the first and second decision flags of the current channel satisfy a preset condition, one or more frequency regions are added to the second quantity as the first quantity of the current channel; the specific number of added frequency regions may be adjusted according to the actual application scenario. Specifically, the preset condition may be: the average coding rate of the current channel is greater than the first threshold, or the actual coding rate of the current channel is greater than the second threshold; and one of the following holds: the highest frequency-band index included in the band-extension upper limit is not equal to the highest frequency-band index of the audio signal, or the highest frequency-bin index included in the band-extension upper limit is not equal to the highest frequency-bin index of the audio signal.
For example, the number of added frequency regions may be determined from the difference between the highest frequency of the audio signal and the band-extension upper limit, dividing that difference into one or more frequency regions, so that the upper frequency limit of the first frequency range is higher than the highest frequency corresponding to the band-extension upper limit and more tonal-component information in the high-band signal can be detected. As a specific example, the preset condition may be that both the first and second decision flags are 1: if both flags of the current channel are 1, one or more frequency regions are added to the second quantity as the first quantity of the current channel. The added frequency region or regions may be obtained by dividing the part of the first frequency range above the band-extension upper limit in a preconfigured division manner.
If at least one of the first and second decision flags does not satisfy the preset condition, the second quantity is used as the first quantity. It can be understood that, when the highest frequency of the audio signal is within the second frequency range, the second frequency range may be used directly as the first frequency range and tonal-component detection performed on the first frequency range, which likewise achieves more comprehensive detection of tonal components in the high-band signal.
For ease of understanding, the determination of the first quantity of the current channel is illustrated below using a specific application scenario.
Usually, whether to add an additional frequency region (tile) to the second quantity to obtain the first quantity of the current channel may be jointly determined by the following two conditions:
1. When the overall coding rate of the audio signal is low, the bit consumption introduced by an additional tile may negatively affect the coding result, possibly reducing coding efficiency or coding quality. Therefore, whether to add an additional tile may first be chosen according to the coding rate of each channel. Let the total encoder rate be bitrate_tot and the number of channels be n_channels; the number of bits per channel is then bitrate_ch = bitrate_tot / n_channels. Alternatively, bitrate_ch may be allocated per channel from bitrate_tot. bitrate_ch is compared with the preset first threshold; if it exceeds the first threshold, the flag flag_addTile (the first decision flag) is set to 1, and otherwise flag_addTile = 0.
2. The band-extension processing may be checked, for example by comparing the intelligent gap filling (IGF) stop SFB index with the total number of SFBs, to determine whether the frequency range corresponding to IGF can cover the full band of the audio signal; if it cannot cover the full band of the audio signal, one or more tiles are added.
The manner of determining whether to add a tile by combining the two preceding conditions is as follows:
if (flag_addTile && (igfStopSfb < nr_of_sfb_long)) {
    num_tiles_detect = num_tiles + 1;
} else {
    num_tiles_detect = num_tiles;
}
where igfStopSfb is the IGF stop SFB index, nr_of_sfb_long is the total number of SFBs, flag_addTile is the first decision flag, num_tiles is the number of tiles in the IGF band, and num_tiles_detect is the number of tiles in which tonal-component detection is performed.
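A runnable sketch of this decision follows. The variable names are taken from the text; the exact branch structure is an assumption reconstructed from the description of the two conditions (the original formula appears only as an image).

```python
def num_tiles_to_detect(flag_add_tile, igf_stop_sfb, nr_of_sfb_long, num_tiles):
    """Number of tiles in which tonal-component detection runs.

    One extra tile is appended only when the per-channel rate flag is set
    AND the IGF stop SFB does not already reach the last SFB of the signal
    (i.e. IGF does not cover the full band)."""
    if flag_add_tile and igf_stop_sfb < nr_of_sfb_long:
        return num_tiles + 1
    return num_tiles
```

For example, with the rate flag set and IGF stopping short of the full band, one tile is added; otherwise the IGF tile count is used unchanged.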
In a possible implementation, the number of frequency regions within the first frequency range may also be a preset quantity. Specifically, the preset quantity may be determined by the user or from empirical values, and may be adjusted according to the actual application scenario.
Optionally, when the number of frequency regions within the first frequency range is a preset quantity, the preset quantity may or may not be written into the configuration bitstream. For example, the encoding and decoding devices may by default take the number of frequency regions to be the number of frequency regions included in the second frequency range plus N, where N may be a preset positive integer.
In addition to the first quantity of the current channel, other information of the current channel may be obtained, such as the identification information, the relationship information, or the frequency-region change quantity. For example, whether the first and second frequency ranges are the same may be compared to obtain the identification information; the magnitude relationship between the first and second frequency ranges may be compared to obtain the relationship information; and the difference between the first and second quantities may be compared to obtain the frequency-region change quantity.
Manner 2: use the frequency region information used by the previous frame or by the first frame of the audio signal as the frequency region information of the current frame.
The frequency region information may have been obtained through Manner 1 when the previous frame of the current frame was encoded, and may simply be read when the current frame is obtained; the frequency region information may also have been obtained through Manner 1 when the first frame of the audio signal was encoded. For example, all frames included in the audio signal may be encoded using the same frequency region information, reducing the workload of the encoding device and improving coding efficiency.
Therefore, in embodiments of this application, the frequency region information can be obtained in multiple ways. The frequency region information used by each frame may be determined dynamically in real time through Manner 1, so that the frequency range indicated by the frequency region information adaptively covers, in each frame, the frequency range in which the tonal components of the high-band and low-band signals are dissimilar, improving coding quality; or multiple frames may share the same frequency region information, reducing the workload of computing the frequency region information and improving coding quality and efficiency. The audio signal encoding method provided in this application can thus flexibly adapt to more scenarios.
此外,除了确定需要进行音调成分检测的频率区域的第一数量之外,还可以根据频率区域信息确定需要进行音调成分检测的各个频率区域的边界,即第一频率区域边界,从而可以更准确地确定第一频率范围。可以理解为,在确定第一频率范围内的频率区域的数量之后,还需要确定第一频率范围内的各个频率区域的划分方式。
具体地,第一频率范围的下限与配置信息指示的进行频带扩展的第二频率范围的下限相同;当第一数量小于或等于第二数量时,第一频率范围内的频率区域的分布与配置信息中指示的第二频率范围内的频率区域的分布相同,即第一频率范围内的频率区域的划分方式与第二频率范围内的频率区域的划分方式相同。当第一数量大于第二数量时,第一频率范围的频率上限大于第二频率范围的频率上限,即第一频率范围覆盖且大于第二频率范围,第一频率范围与第二频率范围重合部分的频率区域的分布与第二频率范围内的频率区域的分布相同,即第一频率范围与第二频率范围重合部分的频率区域的划分方式与第二频率范围内的频率区域的划分方式相同,第一频率范围与第二频率范围的不重合部分内的频率区域的分布是根据预设方式确定的,即第一频率范围与第二频率范围的不重合部分内的频率区域根据预设方式划分。
可以理解的是，通常频带扩展的频率区域的划分方式为预先配置的，即配置信息中可以包括第二频率范围内的各个频率区域的划分，当第一数量小于或者等于频带扩展对应的第二数量时，可以按照第二频率范围内的频率区域划分方式对第一频率范围进行划分，从而得到第一频率范围内的各个频率区域。例如，若第二频率范围内的频率区域以1KHz为单位进行划分，则也可以以1KHz为单位对第一频率范围进行划分，得到第一频率范围内的一个或多个频率区域。当第一数量大于频带扩展对应的第二数量时，可以确定第一频率范围的频率上限大于第二频率范围的上限，第一频率范围可以完全覆盖并大于第二频率范围，第一频率范围中与第二频率范围重合的部分可以按照第二频率范围内的频率区域划分方式进行划分，第一频率范围中与第二频率范围不重合的部分，即第一数量与第二数量的差值对应的频率区域，可以按照预设的方式进行划分，从而准确地确定出需要进行音调成分检测的第一频率范围所包括的各个频率区域的边界。该预设方式可以包括按照预设的宽度、频率区域的频率上限等。
示例性地，为便于理解，第一数量小于或者等于第二数量的场景可以参阅图6A，其中，第一频率范围内的频率区域的划分方式与第二频率范围内的频率区域划分方式相同。第一数量大于第二数量的场景可以参阅图6B，其中，第一频率范围中与第二频率范围重合的部分的频率区域划分方式与第二频率范围内的频率区域划分方式相同，而对于第一频率范围相对于第二频率范围多出的一个或多个频率区域，即第一数量和第二数量之间的差值对应的频率区域的划分，则可以按照预设方式进行划分，即对于第一频率范围和第二频率范围不重合部分，其频率区域的划分方式可以与重合部分的频率区域划分方式相同或者不相同。例如，可以将不重合的部分划分为一个或者多个频率区域，当然，也可以将不重合的部分划分至重合部分的最后一个频率区域中，如图6C所示。
其中，若将不重合的部分划分为一个或者多个频率区域，其划分的频率区域需要满足的条件可以包括：该频率区域的频率上限小于或等于音频信号的最高频率(通常即采样频率的一半)，且频率区域的宽度小于或等于预设值。
可以理解的是,前述的频率区域信息中所包括的频率区域变化数量,即为第一频率范围和第二频率范围不重合部分所包括的频率区域的数量。
在一种具体的场景中,可以对频率区域中的频带进行编号,则不重合部分内的频率区域的频率上限对应的频带序号小于或等于音频信号的最高频率对应的频带序号,且不重合部分内的频率区域的宽度小于或等于预设值,音频信号的最高频率对应的频带序号由采样频率和频带划分方式确定。
应理解,对于相邻的两个频率区域,频率较低的频率区域的频率上限,为频率较高的频率区域的下限。
因此,在本申请实施方式中,确定了第一频率范围内的频率区域的数量,以及各个频率区域的划分方式,从而使得后续在进行音调成分检测时,可以按照频率区域进行检测,得到更全面的音调成分检测。例如,可以以频率区域为单位进行音调成分检测,或者,以频率区域内的频带为单位进行音调成分检测等。
可以理解为,在确定第一频率范围所包括的频率区域的第一数量之后,还确定第一频率范围所包括的各个频率区域的边界。具体地,确定第一频率范围所包括的各个频率区域的边界方式可以包括:若第一数量小于或等于第二数量,则按照第二频率范围内的各个频率区域的边界确定第一频率范围所包括的各个频率区域的边界。若第一数量大于第二数量,则对于第一频率范围中与第二频率范围重合的部分,可以按照第二频率范围内的各个频率区域的边界确定第一频率范围所包括的各个频率区域的边界,对于第一频率范围中与第二频率范围不重合的部分,可以按照预设划分方式划分频率区域,并确定该频率区域的边界。
具体地，确定第一频率范围内的各个频率区域的边界的方式可以包括：若第一数量小于或等于第二数量，则将频带扩展对应的第二频率范围内各个频率区域的边界作为第一频率范围内的各个频率区域的边界；若第一数量大于第二数量，则将第二频率范围内各个频率区域的边界，作为第一频率范围内的至少一个低频区域的边界，并根据预设方式确定至少一个高频区域的边界，低频区域为第一频率范围中，频率上限低于频带扩展上限的频率区域，高频区域为第一频率范围中，频率下限高于或等于频带扩展上限的频率区域。
其中,以至少一个高频区域中的第一频率区域为例进行示例性说明,根据预设方式确定至少一个高频区域的边界具体可以包括:将与第一频率区域相邻,且频率低于第一频率区域的频率区域的频率上限作为第一频率区域的频率下限,根据预设方式确定第一频率区域的频率上限,第一频率区域包括于至少一个高频区域中;其中,第一频率区域的频率上限小于或等于音频信号的最高频率,且第一频率区域的宽度小于或等于预设值;或者,第一频率区域的频率上限对应的频带序号小于或等于音频信号的最高频率对应的频带序号,且第一频率区域的宽度小于或等于预设值,音频信号的最高频率对应的频带序号由采样频率和预设的频带划分方式确定。
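上述边界确定逻辑可以用如下Python片段示意(假设性实现：用边界频点列表表示各频率区域的划分，宽度上限与最高频率约束按上文描述施加，变量名均为示意)：

```python
def first_range_boundaries(second_bounds, num1, max_freq, max_width):
    """second_bounds: 第二频率范围内各频率区域的边界(长度 = 第二数量 + 1)。
    返回第一频率范围内num1个频率区域的边界列表。"""
    num2 = len(second_bounds) - 1
    if num1 <= num2:
        # 直接沿用第二频率范围内的频率区域边界
        return second_bounds[:num1 + 1]
    bounds = list(second_bounds)
    for _ in range(num1 - num2):
        lower = bounds[-1]                        # 相邻低频区域的上限作为下限
        upper = min(lower + max_width, max_freq)  # 宽度与最高频率的双重约束
        bounds.append(upper)
    return bounds
```

例如，第二频率范围的边界为[8000, 10000, 12000, 16000](第二数量为3)，第一数量为4、最高频率24000、最大宽度3000时，新增高频区域的边界为[16000, 19000]。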
下面以具体的应用场景为例对确定第一频率范围内的各个频率区域的方式进行示例性说明。
通常，在确定了需要进行音调成分检测的tile数量之后，还需要根据该tile数量确定音调成分检测的tile边界。tile边界可以是边界的SFB序号，也可以是边界的频率，也可以是两者均包括。
为提高音调成分检测和编码效率,新增tile无需覆盖从IGF截止频率到Fs/2的整个剩余高频带,因此可以限定新增tile的最大宽度为128个频点,即频率区域的宽度小于或等于预设值。其中,Fs为采样频率。
示例性地,新增tile的宽度的确定方式、以及tile分带表和tile-sfb对应表的更新方式如下:
(原文此处为伪代码附图Figure PCTCN2021085920-appb-000002，给出新增tile宽度的计算以及tile分带表和tile-sfb对应表的更新伪代码)
其中，igfStopSfb为IGF的截止SFB序号，sfbIdx为SFB序号，tileWidth_new为新增tile的宽度，nr_of_sfb_long为总SFB数量，sfb_offset为SFB边界，第i个SFB的下限是sfb_offset[i]，上限是sfb_offset[i+1]，tile_sfb_wrap代表的是tile和sfb的对应关系，第i个tile的起始SFB序号是tile_sfb_wrap[i]，终止SFB序号是tile_sfb_wrap[i+1]-1。
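新增tile宽度的计算(原文以附图伪代码给出)可以示意性重构如下，其中128个频点的上限即上文所述的最大宽度限制；sfb_offset等变量含义与上文一致，具体实现细节为假设，且此处省略了tile分带表与tile-sfb对应表的更新：

```python
def new_tile_width(sfb_offset, igf_stop_sfb, nr_of_sfb_long, max_width=128):
    start = sfb_offset[igf_stop_sfb]    # 新增tile从IGF截止频点开始
    end = sfb_offset[nr_of_sfb_long]    # 剩余高频带的上边界频点
    # 新增tile无需覆盖整个剩余高频带, 宽度不超过128个频点
    return min(end - start, max_width)
```

例如，剩余高频带宽度为200个频点时，新增tile宽度被截断为128；剩余高频带仅60个频点时，新增tile宽度即为60。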
因此,本申请实施方式中,可以确定第一频率范围内的各个频率区域的边界,从而可以更准确地进行音调成分检测。
504、在第一频率范围进行音调成分检测以获取高频带信号的音调成分的信息。
其中,在确定了频率区域信息所指示第一频率范围之后,对该第一频率范围进行音调成分检测,得到高频带信号的音调成分信息。
具体地，该音调成分信息可以包括音调成分的位置数量参数、以及所述音调成分的幅度参数或能量参数；可选地，音调成分的信息还可以包括高频带信号的噪声基底参数。其中，位置数量参数是指用同一个参数同时表示音调成分的位置和音调成分的数量。在另一种实施方式中，音调成分信息可以包括音调成分的位置参数、音调成分的数量参数以及所述音调成分的幅度参数或能量参数；在这种情况下，音调成分的位置和数量采用不同的参数表示。
更具体地,频率区域信息中所指示的第一频率范围可以包括一个或者多个频率区域(tile),一个频率区域可以包括一个或者多个频带,一个频带可以包括一个或多个子带。步骤504具体可以包括:根据高频带信号中第一数量的频率区域中的当前频率区域的高频带信号,确定当前频率区域的音调成分的位置数量参数和当前频率区域的音调成分的幅度参数或能量参数等。
除了以频率区域为单位进行音调成分检测之外,还可以以频带为单位,或者以子带为单位进行音调成分的检测,此处不再赘述。
在确定当前频率区域的音调成分的信息之前，可以确定当前频率区域中是否包括音调成分，在当前频率区域内包括音调成分时，才根据当前频率区域的高频带信号，确定当前频率区域的音调成分的位置数量参数和当前频率区域的音调成分的幅度参数或能量参数，从而仅获取具有音调成分的频率区域的参数，提高编码效率。
相应地，当前帧的音调成分的信息还包括音调成分指示信息，音调成分指示信息用于指示当前频率区域内是否包括音调成分，使得音频解码器可以根据该指示信息进行解码，提高解码效率。
其中,在一个实施方式中,根据当前频率区域的高频带信号,确定当前频率区域的音调成分的信息,可以包括:根据至少一个频率区域中的当前频率区域的高频带信号在当前频率区域内进行峰值搜索,以获得当前区域的峰值数量信息、峰值位置信息以及峰值幅度信息中的至少一种;根据当前频率区域的峰值数量信息、峰值位置信息以及峰值幅度信息中的至少一种,确定当前频率区域的音调成分的位置数量参数和当前频率区域的音调成分的幅度参数或能量参数。
其中,进行峰值搜索的高频带信号可以是频域信号,也可以是时域信号。
具体地,在一个实施方式中,峰值搜索具体可以根据当前频率区域的功率谱、能量谱或幅度谱中的至少一种进行。
其中，在一个实施方式中，根据当前频率区域的峰值数量信息、峰值位置信息以及峰值幅度信息中的至少一种，确定当前频率区域的音调成分的位置数量参数和当前频率区域的音调成分的幅度参数或能量参数，可以包括：根据当前频率区域的峰值数量信息、峰值位置信息以及峰值幅度信息中的至少一种，确定当前频率区域的音调成分的位置信息、数量信息以及幅度信息；根据当前频率区域的音调成分的位置信息、数量信息以及幅度信息确定当前频率区域的音调成分的位置数量参数和当前频率区域的音调成分的幅度参数或能量参数。
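以功率谱上的峰值搜索为例，检测当前频率区域内音调成分的过程可以用如下Python片段示意(简化实现：仅以"高于左右邻点且超过阈值"作为峰值判据，该判据与阈值均为假设，实际编码器的峰值搜索可能更复杂)：

```python
def search_peaks(power_spec, threshold):
    """返回当前频率区域内的峰值位置列表与峰值幅度列表。"""
    positions, amplitudes = [], []
    for i in range(1, len(power_spec) - 1):
        p = power_spec[i]
        # 局部极大且超过阈值的频点视为候选音调成分
        if p > power_spec[i - 1] and p > power_spec[i + 1] and p > threshold:
            positions.append(i)
            amplitudes.append(p)
    return positions, amplitudes
```

峰值数量即len(positions)，三者可进一步量化为上文所述的位置数量参数和幅度参数或能量参数。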
505、对频带扩展的参数和音调成分的信息进行码流复用,以得到载荷码流。
其中,在得到频带扩展的参数和高频带信号的音调成分的信息之后,可以对频带扩展的参数和音调成分的信息进行码流复用,以得到载荷码流。
具体地,在进行码流复用时,除了对频带扩展的参数和音调成分的信息进行码流复用之外,还可以结合低频带信号或者高频带信号的其他信息进行码流复用,如结合低频带的编码参数、时域噪声整形参数、频域噪声整形参数或频谱量化参数等进行码流复用,从而得到高质量的载荷码流。
具体地,在进行码流复用时,可以通过信号类型信息来指示某一个频率区域或者某一个频带是否存在音调成分,若不存在音调成分,则可以在码流中写入指示某一个频率区域或者频带不存在音调成分的信号类型信息,从而指示某一个频率区域或者频带不存在音调成分,提高解码效率;若存在音调成分,则需要将音调成分的信息写入码流中,同时还将指示哪些频率区域存在音调成分的信号类型信息写入码流中,以及将频带扩展的参数或者时域噪声整形参数、频域噪声整形参数或频谱量化参数等写入码流中,提高编码质量。
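按频率区域写入信号类型信息与音调成分信息的复用过程，可以用如下Python片段示意(假设性实现：用元组列表模拟码流结构，实际码流为比特级复用且字段格式由编解码器约定)：

```python
def mux_tone_info(regions):
    """regions: 每个频率区域的音调参数, 无音调成分时为None。"""
    stream = []
    for params in regions:
        if params is None:
            stream.append(("signal_type", 0))  # 指示该区域不存在音调成分
        else:
            stream.append(("signal_type", 1))  # 指示该区域存在音调成分
            stream.append(("tone_params", params))
    return stream
```

对不存在音调成分的区域仅写入信号类型信息，可避免写入无意义的音调参数。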
506、对频率区域信息进行码流复用,得到配置码流。
其中,在得到频率区域信息之后,可以对频率区域信息进行码流复用,得到配置码流。
具体地,可以将频率区域信息写入配置码流中,以使解码设备可以根据配置码流中所包括的频率区域信息对音频信号进行解码,从而可以对频率区域信息指示的频率范围的音调成分进行重建,得到高质量的解码数据。
需要说明的是，本申请实施例中的步骤506为可选步骤，可以仅在对音频信号的第一帧进行码流复用时执行步骤506，而无需在对每一帧进行码流复用时再执行步骤506，即音频信号中的多帧可以共用相同的频率区域信息，从而降低占用的资源，提高编码效率。当然，也可以在对每一帧进行编码时执行步骤506，本申请对此并不作限定。
可以理解的是,载荷码流中可以携带音频信号的各个帧的具体信息,配置码流中可以携带音频信号中各个帧共用的配置信息。载荷码流和配置码流可以是相互独立的码流,也可以包括于同一码流中,即载荷码流和配置码流可以是同一码流中的不同部分,具体可以根据实际应用场景进行调整,本申请对此并不作限定。
因此,本申请实施方式中,可以根据频率区域信息所指示的频率范围进行音调成分的检测,从而使检测得到的音调成分的信息可以覆盖更多的高频带信号与低频带信号之间音调成分不相似的频率范围,从而提高编码质量。
前述对本申请提供的音频编码的方法进行了详细介绍,下面对本申请提供的解码方法进行详细阐述。
参阅图7,本申请提供的一种解码方法的流程示意图,如下所述。
701、获取载荷码流。
其中,该载荷码流可以参阅前述步骤505中的相关描述,此处不再赘述。
702、对载荷码流进行码流解复用,以得到音频信号的当前帧的频带扩展的参数和音调成分的信息。
其中,在获取到载荷码流之后,对该码流进行码流解复用,以得到音频信号的当前帧的频带扩展参数和音调成分信息。
具体地，音调成分的信息可以包括音调成分的位置数量参数、以及所述音调成分的幅度参数或能量参数。其中，位置数量参数是指用同一个参数同时表示音调成分的位置和音调成分的数量。在另一种实施方式中，音调成分的信息包括音调成分的位置参数、音调成分的数量参数以及所述音调成分的幅度参数或能量参数；在这种情况下，音调成分的位置和数量采用不同的参数表示。
在一种可能的实施方式中，高频带信号对应的频率范围中包括至少一个频率区域，一个频率区域包括至少一个频带，一个频带包括至少一个子带；相应地，当前帧的高频带信号的音调成分的位置数量参数包括至少一个频率区域各自的音调成分的位置数量参数，当前帧的高频带信号的音调成分的幅度参数或能量参数包括至少一个频率区域各自的音调成分的幅度参数或能量参数。可以理解为，音调成分的信息可以是以频率区域为单位的，当然，也可以是以频带为单位或者以子带为单位等，具体可以根据实际应用场景进行调整。
在一种可能的实施方式中,对载荷码流进行码流解复用,以得到音频信号的当前帧的音调成分的信息包括:获取至少一个频率区域的当前频率区域或者当前频带的音调成分的位置数量参数;根据当前频率区域或者当前频带的音调成分的位置数量参数从载荷码流中解析当前频率区域或者当前频带的音调成分的幅度参数或能量参数。
此外,对载荷码流进行码流解复用,除了可以得到音频信号的当前帧的频带扩展的参数和音调成分的信息之外,还可以获取到低频带信号相关的参数,如:低频带编码参数、时域噪声整形参数、频域噪声整形参数、频谱量化参数等。
需要说明的是，本申请实施方式中，音频信号可以是多通道信号，也可以是单通道信号。当音频信号为多通道信号时，可以对每个通道的信号的载荷码流进行解复用以及信号重建等，在本申请实施方式中，仅以其中一个通道(以下称为当前通道)的信号的解码过程为例进行示例性说明，在实际应用中，对于音频信号中的每个通道都可以执行步骤702-707，本申请对重复的步骤不再赘述。
703、根据频带扩展的参数得到当前帧的高频带信号。
其中,该频带扩展的参数可以参阅前述步骤502中的相关描述,此处不再赘述。
具体地,在时域扩展的场景中,可以根据频带扩展的参数,如高频带LPC参数、高频带增益或滤波参数等进行时域扩展,得到高频带信号。又或者,在频域扩展的场景中,可以根据时域包络或频域包络等参数,进行频域扩展,得到高频带信号。
此外，还可以根据对码流解复用得到的低频带的编码参数进行解码，得到低频带信号。在根据频带扩展的参数进行频带扩展时，还可以结合低频带信号对高频带信号进行恢复，得到更准确的高频带信号。可以理解为，在对载荷码流进行解复用之后，可以得到低频带信号与高频带信号之间的相关信息，在得到低频带信号之后，可以根据低频带信号以及低频带信号与高频带信号之间的相关信息，对高频带信号进行恢复，从而得到高频带信号。
704、获取配置码流。
其中,可以接收编码设备发送的配置码流,该配置码流中可以包括编码设备进行编码时的部分配置参数。该配置码流可以参阅前述步骤506中的相关描述,此处不再赘述。
705、根据配置码流获取频率区域信息。
其中,在得到配置码流之后,即可对配置码流进行解复用,得到频率区域信息。
该频率区域信息可以参阅前述步骤503中的相关描述,此处不再赘述。
需要说明的是,本申请中的步骤704-705为可选步骤,可以是在接收到音频信号的某一帧对应的码流时执行步骤704-705,即多帧可以共用频率区域信息,也可以是在接收到音频信号的每一帧对应的码流都执行步骤704-705,具体可以根据实际应用场景进行调整。
此外,编码设备也可以将频带扩展的配置信息通过配置码流发送至解码设备,也可以是编码设备和解码设备共用预设的配置信息,具体可以根据实际应用场景进行调整。
706、根据音调成分的信息和频率区域信息进行重建,得到重建音调信号。
在获取到频率区域信息之后,根据音调成分信息对频率区域信息指示的频率范围进行重建,得到重建音调信号。
本申请以下实施方式中,将需要进行音调成分重建的频率范围称为第一频率范围,将频带扩展对应的频率范围称为第二频率范围,且第一频率范围的频率下限和第二频率范围的频率下限相同,以下不再赘述。
其中,第一频率范围可以划分为一个或多个频率区域,一个频率区域可以包括一个或多个频带。根据音调成分的信息和频率区域信息进行重建,具体可以包括:根据频率区域信息,确定需要进行音调成分重建的频率区域的数量为第一数量;根据第一数量,确定第一频率范围内进行音调成分重建的各个频率区域;在第一频率范围内,根据音调成分的信息对音调成分进行重建,得到重建音调信号。
更具体地,根据第一数量,确定第一频率范围内进行音调成分重建的各个频率区域,可以包括:若第一数量小于或等于第二频率范围内的频率区域的第二数量,则根据第二频率范围内的频率区域的分布确定第一频率范围内的频率区域的分布,即按照第二频率范围内的频率区域的划分方式确定第一频率范围内的各个频率区域;若第一数量大于第二数量,则根据第二频率范围内的频率区域的分布确定第一频率范围与第二频率范围重合部分内的频率区域的分布,并根据预设方式确定第一频率范围与第二频率范围不重合部分内的频率区域的分布,从而得到第一频率范围内的各个频率区域的分布。可以理解为,若第一数量大于第二数量,可以按照第二频率范围内的频率划分方式对第一频率范围与第二频率范围重合部分进行划分,以及按照预设方式对第一频率范围与第二频率范围的不重合部分进行划分,得到需要进行音调成分重建的第一频率范围内的各个频率区域。从而可以结合第二频率范围内的第二数量,来准确地确定需要进行音调成分重建的频率范围内的频率区域的数量。
可选地，第一频率范围中与第二频率范围不重合的部分内的频率区域可以满足以下条件：该频率区域的频率上限小于或等于音频信号的最高频率(通常即采样频率的一半)，且频率区域的宽度小于或等于预设值。
其中,应理解,可以通过配置码流获取到频带扩展的配置信息,或者,也可以从本地获取频带扩展的配置信息,通过该配置信息确定进行频带扩展的第二频率范围,以及该第二频率范围内的频率区域的分布或划分方式等,从而根据该配置信息指示的第二频率范围内的频率区域的分布确定第一频率范围内的频率区域的分布。
在进行音调成分重建时,可以以频率区域为单位进行重建,也可以以频带为单位进行重建。参阅前述步骤503中的相关描述,需要进行音调成分重建的tile的数量可以是num_tiles_detect。
下面以频率区域为单位进行音调成分的重建为例进行示例性说明。重建后得到的重建音调信号可以是时域信号,也可以是频域信号。
具体地，音调成分的信息中可以包括音调成分的位置参数、数量参数、幅度参数等，音调成分的数量参数表征了音调成分的数量。一个位置上的音调成分的重建方法，具体可以是：
(1)计算音调成分的位置。
具体地可以是:根据音调成分的位置参数计算音调成分位置。
tone_pos=tile[p]+(sfb+0.5)*tone_res[p]
其中，tile[p]为第p个频率区域的起始频点，sfb为频率区域内存在音调成分的子带序号，tone_res[p]为第p个频率区域的频域分辨率(即第p个频率区域中的子带宽度信息)。频率区域内存在音调成分的子带序号即为音调成分的位置参数。0.5表示存在音调成分的子带中音调成分的位置位于子带的中心。当然，重建的音调成分也可以位于子带的其他位置。
(2)计算音调成分的幅度。
具体地可以是:根据音调成分的幅度参数计算音调成分的幅度。
具体地,可以是:
tone_val=pow(2.0,0.25*tone_val_q[p][tone_idx]-4.0)
其中,tone_val_q[p][tone_idx]表示第p个频率区域内的第tone_idx个位置参数对应的幅度参数,tone_val表示第p个频率区域内第tone_idx个位置参数对应的频点的幅度值。
tone_idx的取值范围为[0, tone_cnt[p]-1]，tone_cnt[p]为第p个频率区域中音调成分的数量。
(3)根据音调成分的位置和音调成分的幅度进行重建，得到重建音调信号。
音调成分的位置tone_pos对应的频域信号,满足:
pSpectralData[tone_pos]=tone_val
其中,pSpectralData[tone_pos]表示音调成分的位置tone_pos对应的频域信号,tone_val表示第p个频率区域内第tone_idx个位置参数对应的频点的幅度值。tone_pos表示第p个频率区域内第tone_idx个位置参数对应的音调成分的位置。
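上述步骤(1)至(3)的音调成分重建可以整合为如下Python片段，其中位置与幅度公式按原文实现；tone_pos的取整方式、频谱数组尺寸等细节为假设：

```python
import math

def reconstruct_tones(spectrum, tile, tone_res, p, sfb_list, tone_val_q):
    """重建第p个频率区域内的音调成分。
    sfb_list: 存在音调成分的子带序号(位置参数);
    tone_val_q: 各频率区域的幅度参数。"""
    for tone_idx, sfb in enumerate(sfb_list):
        # tone_pos = tile[p] + (sfb + 0.5) * tone_res[p]
        tone_pos = int(tile[p] + (sfb + 0.5) * tone_res[p])
        # tone_val = 2^(0.25 * tone_val_q[p][tone_idx] - 4.0)
        tone_val = math.pow(2.0, 0.25 * tone_val_q[p][tone_idx] - 4.0)
        spectrum[tone_pos] = tone_val  # 即 pSpectralData[tone_pos] = tone_val
    return spectrum
```

例如，tile[0]=256、tone_res[0]=16、位置参数sfb=1、幅度参数16时，音调位置为256+1.5×16=280，幅度为2^0=1.0。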
707、根据高频带信号和重建音调信号,得到当前帧的解码信号。
其中,除了根据高频带信号和重建音调信号得到当前帧的解码信号之外,还可以结合低频带信号,得到更完整的当前帧的解码信号。
具体地,在得到重建音调信号之后,结合高频带信号进行音调成分恢复,从而得到当前帧中的高频带部分的具体细节以及音调成分,并结合低频带信号对当前帧进行恢复,从而得到包含了完整的音调成分的当前帧。
因此,本申请实施方式中,解码设备在对音调成分进行恢复时,可以结合编码设备提供的频率区域信息,对第一频率范围内的音调成分进行恢复,从而使得到的当前帧包括了更完整的音调成分,即使在高频带的频谱中往往存在与低频带的频谱不相似的音调成分的场景中,也可以使解码得到的当前帧具有更丰富的音调成分,提高解码质量,从而提高用户体验。
前述对本申请提供的音频信号编码方法和解码方法进行了详细介绍,下面基于前述提供的方法,对本申请提供的装置进行详细介绍。
首先,本申请提供一种编码设备,用于执行上述图5所示的音频信号编码方法。请参阅图8,本申请提供的一种编码设备的结构示意图,如下所述。
该编码设备可以包括:
音频获取模块801,用于获取音频信号的当前帧,当前帧包括高频带信号和低频带信号;
参数获取模块802,用于根据高频带信号、低频带信号和预设的频带扩展的配置信息得到当前帧的频带扩展的参数;
频率获取模块803,用于获取频率区域信息,频率区域信息用于指示高频带信号中需要进行音调成分检测的第一频率范围;
音调成分编码模块804,用于在第一频率范围进行音调成分检测以获取高频带信号的音调成分的信息;
码流复用模块805,用于对频带扩展的参数和音调成分的信息进行码流复用,以得到载荷码流。
在一种可能的实施方式中,编码设备还可以包括:
码流复用模块805,还用于对频率区域信息进行码流复用,得到配置码流。
在一种可能的实施方式中,频率获取模块803,具体用于根据音频信号的采样频率以及频带扩展的配置信息确定频率区域信息。
在一种可能的实施方式中,频率区域信息包括以下至少一种:第一数量、标识信息、关系信息或者频率区域变化数量,第一数量为第一频率范围内的频率区域的数量,标识信息用于指示第一频率范围与频带扩展对应的第二频率范围是否相同,关系信息用于在第一频率范围与第二频率范围不相同时,指示第一频率范围与第二频率范围之间的大小关系,频率区域变化数量为在第一频率范围与第二频率范围不相同时,第一频率范围与第二频率范围之间存在差异的频率区域的数量。
在一种可能的实施方式中,频率区域信息至少包括第一数量,频带扩展的配置信息包括频带扩展上限和/或第二数量,第二数量为第二频率范围内的频率区域的数量;
频率获取模块803，具体用于根据当前帧的编码速率、音频信号的通道数量、采样频率、频带扩展上限或者第二数量中的一个或多个，确定第一数量。
在一种可能的实施方式中,频带扩展上限包括以下一个或多个:第二频率范围内的最高频率、最高频点序号、最高频带序号或最高频率区域序号。
在一种可能的实施方式中,音频信号的通道数量为至少一个;
频率获取模块803,具体用于:
根据当前帧的编码速率和通道数量，确定当前帧中当前通道的第一判断标识；根据第一判断标识，结合第二数量，确定当前通道的第一数量；
或者,
根据采样频率和频带扩展上限,确定当前帧中当前通道的第二判断标识;根据第二判断标识,结合第二数量,确定当前通道的第一数量;
或者,
根据当前帧的编码速率和通道数量,确定当前帧中当前通道的第一判断标识,以及根据采样频率和频带扩展上限,确定当前帧中当前通道的第二判断标识;根据第一判断标识和第二判断标识,结合第二数量,确定当前帧中当前通道的第一数量。
在一种可能的实施方式中,频率获取模块803,具体用于:根据当前帧的编码速率和通道数量得到当前帧中的每个通道的平均编码速率;根据平均编码速率和第一阈值,得到当前通道的第一判断标识。
在一种可能的实施方式中,频率获取模块803,具体可以用于:根据当前帧的编码速率和通道数量确定当前通道的实际编码速率;根据当前通道的实际编码速率和第二阈值,得到当前通道的第一判断标识。
在一种可能的实施方式中,频率获取模块803,具体可以用于:当频带扩展上限包括最高频率时,比较频带扩展上限包括的最高频率与音频信号的最高频率是否相同,确定当前帧中当前通道的第二判断标识;或者,当频带扩展上限包括最高频带序号时,比较频带扩展上限包括的最高频带序号与音频信号的最高频带序号是否相同,确定当前帧中当前通道的第二判断标识,音频信号的最高频带序号由采样频率确定。
在一种可能的实施方式中,频率获取模块803,具体可以用于:
若第一判断标识和第二判断标识都符合预设条件,则在频带扩展对应的第二数量基础上增加一个或多个频率区域,作为当前通道的第一数量;或
若第一判断标识或第二判断标识不满足预设条件,则将频带扩展对应的第二数量作为当前通道的第一数量。
在一种可能的实施方式中，第一频率范围的下限与配置信息指示的进行频带扩展的第二频率范围的下限相同；当频率区域信息包括的第一数量小于或等于频带扩展对应的第二数量时，第一频率范围内频率区域的分布与第二频率范围内的频率区域的分布相同；当第一数量大于第二数量时，第一频率范围的频率上限大于第二频率范围的频率上限，第一频率范围与第二频率范围重合部分的频率区域的分布与第二频率范围内的频率区域的分布相同，第一频率范围与第二频率范围的不重合部分内的频率区域的分布是按照预设方式确定的。
在一种可能的实施方式中,第一频率范围与第二频率范围的不重合部分内的频率区域满足以下条件:第一频率范围与第二频率范围的不重合部分内的频率区域的宽度小于预设值,且第一频率范围与第二频率范围的不重合部分内的频率区域的频率上限小于或等于音频信号的最高频率。
在一种可能的实施方式中,高频带信号对应的频率范围包括至少一个频率区域,其中,一个频率区域包括至少一个频带。
在一种可能的实施方式中,第一频率范围内的频率区域的数量为预设数量。
在一种可能的实施方式中,音调成分信息包括音调成分的位置数量参数、以及音调成分的幅度参数或能量参数。
在一种可能的实施方式中,音调成分信息还包括高频带信号的噪声基底参数。
其次,本申请提供一种解码设备,用于执行上述图7所示的解码方法。参阅图9,本申请提供的一种解码设备的结构示意图,如下所述。
该解码设备可以包括:
获取模块901,用于获取载荷码流;
解复用模块902,用于对载荷码流进行码流解复用,以得到音频信号的当前帧的频带扩展的参数和音调成分的信息;
频带扩展解码模块903,用于根据频带扩展的参数得到当前帧的高频带信号;
重建模块904,用于根据音调成分的信息和频率区域信息进行重建,得到重建音调信号,频率区域信息用于指示当前帧中需要进行音调成分重建的第一频率范围;
信号解码模块905,用于根据高频带信号和重建音调信号,得到当前帧的解码信号。
在一种可能的实施方式中,获取模块901,还可以用于:获取配置码流;根据配置码流获取频率区域信息。
在一种可能的实施方式中,频率区域信息包括以下至少一种:第一数量、标识信息、关系信息或者频率区域变化数量,第一数量为第一频率范围内的频率区域的数量,标识信息用于指示第一频率范围与频带扩展对应的第二频率范围是否相同,关系信息用于在第一频率范围与第二频率范围不相同时,指示第一频率范围与第二频率范围之间的大小关系,频率区域变化数量为在第一频率范围与第二频率范围不相同时,第一频率范围与第二频率范围之间存在差异的频率区域的数量。
在一种可能的实施方式中,重建模块904,具体可以用于:根据频率区域信息,确定需要进行音调成分重建的频率区域的数量为第一数量;根据第一数量,确定第一频率范围进行音调成分重建的各个频率区域;在第一频率范围内,根据音调成分的信息对音调成分进行重建,得到重建音调信号。
在一种可能的实施方式中,第一频率范围的下限与配置信息指示的进行频带扩展的第二频率范围的下限相同,获取模块,具体可以用于:若第一数量小于或等于第二数量,则根据第二频率范围内的频率区域的分布确定第一频率范围内的各个频率区域的分布,第二数量为第二频率范围内的频率区域的数量;若第一数量大于第二数量,则确定第一频率范围的频率上限大于第二频率范围的频率上限,根据第二频率范围内的频率区域的分布确定 第一频率范围与第二频率范围重合部分内的频率区域的分布,以及根据预设方式确定第一频率范围与第二频率范围的不重合部分内的频率区域的分布,得到第一频率范围内的各个频率区域。
在一种可能的实施方式中,第一频率范围与第二频率范围的不重合部分内的频率区域满足以下条件:第一频率范围与第二频率范围的不重合部分内划分的频率区域的宽度小于预设值,且第一频率范围与第二频率范围的不重合部分内划分的频率区域的频率上限小于或等于音频信号的最高频率。
在一种可能的实施方式中,音调成分信息包括音调成分的位置数量参数、以及音调成分的幅度参数或能量参数。
在一种可能的实施方式中,音调成分信息还包括高频带信号的噪声基底参数。
参阅图10,本申请提供的另一种编码设备的结构示意图。该编码设备1000可以包括处理器1001、存储器1002和收发器1003。该处理器1001、存储器1002和收发器1003通过线路互联。其中,存储器1002中存储有程序指令和数据。
存储器1002中存储了前述图5对应的实施方式中,由编码设备执行的步骤对应的程序指令以及数据。
处理器1001,用于执行前述图5中任一实施例所示的由编码设备执行的步骤,例如,可以执行前述图5中的步骤501-505等。
收发器1003可以用于进行数据的接收和发送,例如,可以用于执行前述图5中的步骤506。
一种实现方式中,编码设备1000可以包括相对于图10更多或更少的部件,本申请对此仅仅是示例性说明,并不作限定。
参阅图11,本申请提供的另一种解码设备的结构示意图。该解码设备1100可以包括处理器1101、存储器1102和收发器1103。该处理器1101、存储器1102和收发器1103通过线路互联。其中,存储器1102中存储有程序指令和数据。
存储器1102中存储了前述图7对应的实施方式中,由解码设备执行的步骤对应的程序指令以及数据。
处理器1101,用于执行前述图7中任一实施例所示的由解码设备执行的步骤,例如,可以执行前述图7中的步骤702、703、705-707等。
收发器1103可以用于进行数据的接收和发送,例如,可以用于执行前述图7中的步骤701或704。
一种实现方式中,解码设备1100可以包括相对于图11更多或更少的部件,本申请对此仅仅是示例性说明,并不作限定。
本申请还提供了一种通信系统,该通信系统可以包括:编码设备以及解码设备。
该编码设备可以是前述图8或图10所示的编码设备，可以用于执行前述图5中所示的任一实施方式中由编码设备执行的步骤。
该解码设备可以是前述图9或图11所示的解码设备，可以用于执行前述图7中所示的任一实施方式中由解码设备执行的步骤。
本申请提供了一种网络设备，该网络设备可以应用于编码设备或者解码设备等设备中，网络设备与存储器耦合，用于读取并执行所述存储器中存储的指令，使得所述网络设备实现前述图5-7中任一实施方式中由编码设备或者解码设备执行的方法的步骤。在一种可能的设计中，该网络设备为芯片或片上系统。
本申请提供了一种芯片系统，该芯片系统包括处理器，用于支持编码设备或者解码设备实现上述方面中所涉及的功能，例如发送或处理上述方法中所涉及的数据和/或信息。在一种可能的设计中，所述芯片系统还包括存储器，所述存储器用于保存必要的程序指令和数据。该芯片系统，可以由芯片构成，也可以包括芯片和其他分立器件。
在另一种可能的设计中，当该芯片系统为编码设备或者解码设备等内的芯片时，芯片包括：处理单元和通信单元，所述处理单元例如可以是处理器，所述通信单元例如可以是输入/输出接口、管脚或电路等。该处理单元可执行存储单元存储的计算机执行指令，以使该编码设备或者解码设备等内的芯片执行上述图5-7中任一项实施例中编码设备或者解码设备执行的方法的步骤。可选地，所述存储单元为所述芯片内的存储单元，如寄存器、缓存等，所述存储单元还可以是所述编码设备或者解码设备等内的位于所述芯片外部的存储单元，如只读存储器(read-only memory，ROM)或可存储静态信息和指令的其他类型的静态存储设备，随机存取存储器(random access memory，RAM)等。
本申请实施例还提供了一种处理器,用于与存储器耦合,用于执行上述各实施例中任一实施例中涉及编码设备或者解码设备的方法和功能。
本申请实施例还提供了一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被计算机执行时实现上述任一方法实施例中与编码设备或者解码设备相关的方法流程。对应的,该计算机可以为上述编码设备或者解码设备。
应理解,本申请以上实施例中的芯片系统、编码设备或者解码设备等中提及的处理器,或者本申请上述实施例提供的处理器,可以是中央处理单元(central processing unit,CPU),还可以是其他通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
还应理解,本申请中以上实施例中的芯片系统、编码设备或者解码设备等中的处理器的数量可以是一个,也可以是多个,可以根据实际应用场景调整,此处仅仅是示例性说明,并不作限定。本申请实施例中的存储器的数量可以是一个,也可以是多个,可以根据实际应用场景调整,此处仅仅是示例性说明,并不作限定。
还应理解，本申请以上实施例中的芯片系统、编码设备或者解码设备等中提及的存储器或可读存储介质等，可以是易失性存储器或非易失性存储器，或可包括易失性和非易失性存储器两者。其中，非易失性存储器可以是只读存储器(read-only memory，ROM)、可编程只读存储器(programmable ROM，PROM)、可擦除可编程只读存储器(erasable PROM，EPROM)、电可擦除可编程只读存储器(electrically EPROM，EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory，RAM)，其用作外部高速缓存。通过示例性但不是限制性说明，许多形式的RAM可用，例如静态随机存取存储器(static RAM，SRAM)、动态随机存取存储器(dynamic RAM，DRAM)、同步动态随机存取存储器(synchronous DRAM，SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM，DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM，ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM，SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM，DR RAM)。
还需要说明的是,当编码设备或者解码设备包括处理器(或处理单元)与存储器时,本申请中的处理器可以是与存储器集成在一起的,也可以是处理器与存储器通过接口连接,可以根据实际应用场景调整,并不作限定。
本申请实施例还提供了一种计算机程序或包括计算机程序的一种计算机程序产品,该计算机程序在某一计算机上执行时,将会使所述计算机实现上述任一方法实施例中编码设备或者解码设备执行的方法流程。对应的,该计算机可以为上述的编码设备或者解码设备。
在上述图5-7中各个实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时，全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中，或者从一个计算机可读存储介质向另一计算机可读存储介质传输，例如，所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如，软盘、硬盘、磁带)、光介质(例如，DVD)、或者半导体介质(例如固态硬盘Solid State Disk(SSD))等。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各 个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者其他网络设备等)执行本申请图5-7中各个实施例所述方法的全部或部分步骤。而该存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的术语在适当情况下可以互换,这仅仅是描述本申请的实施例中对相同属性的对象在描述时所采用的区分方式。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,以便包含一系列单元的过程、方法、系统、产品或设备不必限于那些单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它单元。
本申请各实施例中提供的消息/帧/信息、模块或单元等的名称仅为示例,可以使用其他名称,只要消息/帧/信息、模块或单元等的作用相同即可。
在本申请实施例中使用的术语是仅仅出于描述特定实施例的目的,而非旨在限制本发明。在本申请实施例中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义。还应当理解,在本申请的描述中,除非另有说明,“/”表示前后关联的对象是一种“或”的关系,例如,A/B可以表示A或B;本申请中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况,其中A,B可以是单数或者复数。
取决于语境,如在此所使用的词语“如果”或“若”可以被解释成为“在……时”或“当……时”或“响应于确定”或“响应于检测”。类似地,取决于语境,短语“如果确定”或“如果检测(陈述的条件或事件)”可以被解释成为“当确定时”或“响应于确定”或“当检测(陈述的条件或事件)时”或“响应于检测(陈述的条件或事件)”。
以上所述,以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的范围。

Claims (49)

  1. 一种音频信号编码方法,其特征在于,包括:
    获取音频信号的当前帧,所述当前帧包括高频带信号和低频带信号;
    根据所述高频带信号、所述低频带信号和预设的频带扩展的配置信息得到所述当前帧的频带扩展的参数;
    获取频率区域信息,所述频率区域信息用于指示所述高频带信号中需要进行音调成分检测的第一频率范围;
    在所述第一频率范围进行音调成分检测以获取所述高频带信号的音调成分的信息;
    对所述频带扩展的参数和所述音调成分的信息进行码流复用,以得到载荷码流。
  2. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    对所述频率区域信息进行码流复用,得到配置码流。
  3. 根据权利要求1或2所述的方法,其特征在于,所述获取频率区域信息,包括:
    根据所述音频信号的采样频率以及所述频带扩展的配置信息确定所述频率区域信息。
  4. 根据权利要求1-3中任一项所述的方法,其特征在于,所述频率区域信息包括以下至少一种:第一数量、标识信息、关系信息或者频率区域变化数量,所述第一数量为所述第一频率范围内的频率区域的数量,所述标识信息用于指示所述第一频率范围与所述配置信息指示的频带扩展对应的第二频率范围是否相同,所述关系信息用于在所述第一频率范围与所述第二频率范围不相同时,指示所述第一频率范围与所述第二频率范围之间的大小关系,所述频率区域变化数量为在所述第一频率范围与所述第二频率范围不相同时,所述第一频率范围与所述第二频率范围之间存在差异的频率区域的数量。
  5. 根据权利要求4所述的方法,其特征在于,所述频率区域信息至少包括所述第一数量,所述频带扩展的配置信息包括频带扩展上限和/或第二数量,所述第二数量为所述第二频率范围内的频率区域的数量;
    所述方法还包括:
    根据所述当前帧的编码速率、所述音频信号的通道数量、所述音频信号的采样频率、所述频带扩展上限或者所述第二数量中的一个或多个,确定所述第一数量。
  6. 根据权利要求5所述的方法,其特征在于,所述频带扩展上限包括以下一个或多个:所述第二频率范围内的最高频率、最高频点序号、最高频带序号或最高频率区域序号。
  7. 根据权利要求5或6所述的方法,其特征在于,所述音频信号的通道数量为至少一个;
    所述根据当前帧的编码速率、所述音频信号的通道数量、所述采样频率、所述频带扩展上限或者所述第二数量中的一个或多个,确定所述第一数量,包括:
    根据所述当前帧的编码速率和所述通道数量,确定所述当前帧中当前通道的第一判断标识;根据所述第一判断标识,结合所述第二数量,确定所述当前通道的第一数量;
    或者,
    根据所述采样频率和所述频带扩展上限,确定所述当前帧中当前通道的第二判断标识;根据所述第二判断标识,结合所述第二数量,确定所述当前通道的第一数量;
    或者,
    根据所述当前帧的编码速率和所述通道数量,确定所述当前帧中当前通道的第一判断标识,以及根据所述采样频率和所述频带扩展上限,确定所述当前帧中当前通道的第二判断标识;根据所述第一判断标识和所述第二判断标识,结合所述第二数量,确定所述当前帧中当前通道的第一数量。
  8. 根据权利要求7所述的方法,其特征在于,所述根据所述当前帧的编码速率和所述通道数量,确定所述当前帧中当前通道的第一判断标识,包括:
    根据所述当前帧的编码速率和所述通道数量得到所述当前帧中的每个通道的平均编码速率;
    根据所述平均编码速率和第一阈值,得到所述当前通道的所述第一判断标识。
  9. 根据权利要求7所述的方法,其特征在于,所述根据所述当前帧的编码速率和所述通道数量,确定所述当前帧中当前通道的第一判断标识,包括:
    根据所述当前帧的编码速率和所述通道数量确定所述当前通道的实际编码速率;
    根据所述当前通道的实际编码速率和第二阈值,得到所述当前通道的所述第一判断标识。
  10. 根据权利要求6-8中任一项所述的方法,其特征在于,
    当所述频带扩展上限包括所述最高频率时,所述根据所述采样频率和所述频带扩展上限,确定所述当前帧中当前通道的第二判断标识,包括:
    比较所述频带扩展上限包括的最高频率与所述音频信号的最高频率是否相同,确定所述当前帧中当前通道的第二判断标识;或者,
    当所述频带扩展上限包括所述最高频带序号时,所述根据所述采样频率和所述频带扩展上限,确定所述当前帧中当前通道的第二判断标识,包括:
    比较所述频带扩展上限包括的最高频带序号与所述音频信号的最高频带序号是否相同,确定所述当前帧中当前通道的第二判断标识,所述音频信号的最高频带序号由所述采样频率确定。
  11. 根据权利要求7-10中任一项所述的方法,其特征在于,所述结合所述第二数量,确定所述当前帧中当前通道的第一数量,包括:
    若所述第一判断标识和所述第二判断标识都符合预设条件,则在所述第二频率范围内的所述第二数量基础上增加一个或多个频率区域,作为所述当前通道的第一数量;或,
    若所述第一判断标识或所述第二判断标识不满足所述预设条件,则将所述频带扩展对应的所述第二数量作为所述当前通道的第一数量。
  12. 根据权利要求4-11中任一项所述的方法,其特征在于,所述第一频率范围的下限与所述配置信息指示的进行频带扩展的第二频率范围的下限相同;
    当所述第一数量小于或等于所述第二频率范围内的频率区域的第二数量时,所述第一频率范围内频率区域的分布与所述第二频率范围内的频率区域的分布相同;
    当所述第一数量大于所述第二数量时，所述第一频率范围的频率上限大于所述第二频率范围的频率上限，所述第一频率范围与所述第二频率范围重合部分的频率区域的分布与所述第二频率范围内的频率区域的分布相同，所述第一频率范围中与所述第二频率范围的不重合部分内的频率区域的分布是按照预设方式确定的。
  13. 根据权利要求12所述的方法,其特征在于,所述第一频率范围与所述第二频率范围的不重合部分内的频率区域的宽度小于或等于预设值,且所述第一频率范围与所述第二频率范围的不重合部分内的频率区域的频率上限小于或等于所述音频信号的最高频率。
  14. 根据权利要求1-4中任一项所述的方法,其特征在于,所述第一频率范围内的频率区域的数量为预设数量。
  15. 一种解码方法,其特征在于,包括:
    获取载荷码流;
    对所述载荷码流进行码流解复用,以得到音频信号的当前帧的频带扩展的参数和音调成分的信息;
    根据所述频带扩展的参数得到所述当前帧的高频带信号;
    根据所述音调成分的信息和频率区域信息进行重建,得到重建音调信号,所述频率区域信息用于指示所述当前帧中需要进行音调成分重建的第一频率范围;
    根据所述高频带信号和所述重建音调信号,得到所述当前帧的解码信号。
  16. 根据权利要求15所述的方法,其特征在于,所述方法还包括:
    获取配置码流;
    根据所述配置码流获取所述频率区域信息。
  17. 根据权利要求15或16所述的方法,其特征在于,所述频率区域信息包括以下至少一种:第一数量、标识信息、关系信息或者频率区域变化数量,所述第一数量为所述第一频率范围内的频率区域的数量,所述标识信息用于指示所述第一频率范围与所述频带扩展对应的第二频率范围是否相同,所述关系信息用于在所述第一频率范围与所述第二频率范围不相同时,指示所述第一频率范围与所述第二频率范围之间的大小关系,所述频率区域变化数量为在所述第一频率范围与所述第二频率范围不相同时,所述第一频率范围与所述第二频率范围之间存在差异的频率区域的数量。
  18. 根据权利要求17所述的方法,其特征在于,所述根据所述音调成分的信息和频率区域信息进行重建,得到重建音调信号,包括:
    根据所述频率区域信息,确定需要进行音调成分重建的频率区域的数量为所述第一数量;
    根据所述第一数量,确定所述第一频率范围内进行音调成分重建的各个频率区域;
    在所述第一频率范围内,根据所述音调成分的信息对音调成分进行重建,得到所述重建音调信号。
  19. 根据权利要求18所述的方法,其特征在于,所述第一频率范围的下限与所述配置信息指示的进行频带扩展的第二频率范围的下限相同,所述根据所述第一数量,确定所述第一频率范围内进行音调成分重建的各个频率区域,包括:
    若所述第一数量小于或等于第二数量，则根据所述第二频率范围内的频率区域的分布确定所述第一频率范围内的频率区域的分布，所述第二数量为所述第二频率范围内的频率区域的数量；
    若所述第一数量大于所述第二数量,则确定所述第一频率范围的频率上限大于所述第二频率范围的频率上限,根据所述第二频率范围内的频率区域的分布确定所述第一频率范围与所述第二频率范围重合部分内的频率区域的分布,以及根据预设方式确定所述第一频率范围与所述第二频率范围的不重合部分内的频率区域的分布,得到所述第一频率范围内的频率区域的分布。
  20. 根据权利要求19所述的方法,其特征在于,所述第一频率范围与所述第二频率范围的不重合部分内划分的频率区域的宽度小于或等于预设值,且所述第一频率范围与所述第二频率范围的不重合部分内划分的频率区域的频率上限小于或等于所述音频信号的最高频率。
  21. 一种编码设备,其特征在于,包括:
    音频获取模块,用于获取音频信号的当前帧,所述当前帧包括高频带信号和低频带信号;
    参数获取模块,用于根据所述高频带信号、所述低频带信号和预设的频带扩展的配置信息得到所述当前帧的频带扩展的参数;
    频率获取模块,用于获取频率区域信息,所述频率区域信息用于指示所述高频带信号中需要进行音调成分检测的第一频率范围;
    音调成分编码模块,用于在所述第一频率范围进行音调成分检测以获取所述高频带信号的音调成分的信息;
    码流复用模块,用于对所述频带扩展的参数和所述音调成分的信息进行码流复用,以得到载荷码流。
  22. 根据权利要求21所述的编码设备,其特征在于,所述编码设备还包括:
    所述码流复用模块,还用于对所述频率区域信息进行码流复用,得到配置码流。
  23. 根据权利要求21或22所述的编码设备,其特征在于,
    所述频率获取模块,具体用于根据所述音频信号的采样频率以及所述频带扩展的配置信息确定所述频率区域信息。
  24. 根据权利要求21-23中任一项所述的编码设备,其特征在于,所述频率区域信息包括以下至少一种:第一数量、标识信息、关系信息或者频率区域变化数量,所述第一数量为所述第一频率范围内的频率区域的数量,所述标识信息用于指示所述第一频率范围与所述频带扩展对应的第二频率范围是否相同,所述关系信息用于在所述第一频率范围与所述第二频率范围不相同时,指示所述第一频率范围与所述第二频率范围之间的大小关系,所述频率区域变化数量为在所述第一频率范围与所述第二频率范围不相同时,所述第一频率范围与所述第二频率范围之间存在差异的频率区域的数量。
  25. 根据权利要求24所述的编码设备,其特征在于,所述频率区域信息至少包括所述第一数量,所述频带扩展的配置信息包括频带扩展上限和/或第二数量,所述第二数量为所述第二频率范围内的频率区域的数量;
    所述频率获取模块，具体用于根据当前帧的编码速率、所述音频信号的通道数量、所述音频信号的采样频率、所述频带扩展上限或者所述第二数量中的一个或多个，确定所述第一数量。
  26. 根据权利要求25所述的编码设备,其特征在于,所述频带扩展上限包括以下一个或多个:所述第二频率范围内的最高频率、最高频点序号、最高频带序号或最高频率区域序号。
  27. 根据权利要求25或26所述的编码设备,其特征在于,所述音频信号的通道数量为至少一个;
    所述频率获取模块,具体用于:
    根据所述当前帧的编码速率和所述通道数量，确定所述当前帧中当前通道的第一判断标识；根据所述第一判断标识，结合所述第二数量，确定所述当前通道的第一数量；
    或者,
    根据所述采样频率和所述频带扩展上限,确定所述当前帧中当前通道的第二判断标识;根据所述第二判断标识,结合所述第二数量,确定所述当前通道的第一数量;
    或者,
    根据所述当前帧的编码速率和所述通道数量,确定所述当前帧中当前通道的第一判断标识,以及根据所述采样频率和所述频带扩展上限,确定所述当前帧中当前通道的第二判断标识;根据所述第一判断标识和所述第二判断标识,结合所述第二数量,确定所述当前帧中当前通道的第一数量。
  28. 根据权利要求27所述的编码设备,其特征在于,所述频率获取模块,具体用于:
    根据所述当前帧的编码速率和所述通道数量得到所述当前帧中的每个通道的平均编码速率;
    根据所述平均编码速率和第一阈值,得到所述当前通道的所述第一判断标识。
  29. 根据权利要求27所述的编码设备,其特征在于,所述频率获取模块,具体用于:
    根据所述当前帧的编码速率和所述通道数量确定所述当前通道的实际编码速率;
    根据所述当前通道的实际编码速率和第二阈值,得到所述当前通道的所述第一判断标识。
  30. 根据权利要求26-29中任一项所述的编码设备,其特征在于,所述频率获取模块,具体用于:
    当所述频带扩展上限包括所述最高频率时,比较所述频带扩展上限包括的最高频率与所述音频信号的最高频率是否相同,确定所述当前帧中当前通道的第二判断标识;或者,
    当所述频带扩展上限包括所述最高频带序号时,比较所述频带扩展上限包括的最高频带序号与所述音频信号的最高频带序号是否相同,确定所述当前帧中当前通道的第二判断标识,所述音频信号的最高频带序号由所述采样频率确定。
  31. 根据权利要求27-30中任一项所述的编码设备,其特征在于,所述频率获取模块,具体用于:
    若所述第一判断标识和所述第二判断标识都符合预设条件，则在所述频带扩展对应的所述第二数量基础上增加一个或多个频率区域，作为所述当前通道的第一数量；或，
    若所述第一判断标识或所述第二判断标识不满足所述预设条件,则将所述频带扩展对应的所述第二数量作为所述当前通道的第一数量。
  32. 根据权利要求21-31中任一项所述的编码设备,其特征在于,所述第一频率范围的下限与所述配置信息指示的进行频带扩展的第二频率范围的下限相同;
    当所述频率区域信息包括的第一数量小于或等于所述频带扩展对应的第二数量时，所述第一频率范围内频率区域的分布与所述第二频率范围内的频率区域的分布相同；
    当所述第一数量大于所述第二数量时,所述第一频率范围的频率上限大于所述第二频率范围的频率上限,所述第一频率范围与所述第二频率范围重合部分的频率区域的分布与所述第二频率范围内的频率区域的分布相同,所述第一频率范围与所述第二频率范围的不重合部分内的频率区域的分布为根据预设方式确定的。
  33. 根据权利要求32所述的编码设备,其特征在于,所述第一频率范围与所述第二频率范围的不重合部分内的频率区域的宽度小于预设值,且所述第一频率范围与所述第二频率范围的不重合部分内的频率区域的频率上限小于或等于所述音频信号的最高频率。
  34. 根据权利要求21-33中任一项所述的编码设备,其特征在于,所述高频带信号对应的频率范围包括至少一个频率区域,其中,一个频率区域包括至少一个频带。
  35. 根据权利要求21-24中任一项所述的编码设备,其特征在于,所述第一频率范围内的频率区域的数量为预设数量。
  36. 一种解码设备,其特征在于,包括:
    获取模块,用于获取载荷码流;
    解复用模块,用于对所述载荷码流进行码流解复用,以得到音频信号的当前帧的频带扩展的参数和音调成分的信息;
    频带扩展解码模块,用于根据所述频带扩展的参数得到所述当前帧的高频带信号;
    重建模块,用于根据所述音调成分的信息和频率区域信息进行重建,得到重建音调信号,所述频率区域信息用于指示所述当前帧中需要进行音调成分重建的第一频率范围;
    信号解码模块,用于根据所述高频带信号和所述重建音调信号,得到所述当前帧的解码信号。
  37. 根据权利要求36所述的解码设备,其特征在于,所述获取模块,还用于:
    获取配置码流;
    根据所述配置码流获取所述频率区域信息。
  38. 根据权利要求36或37所述的解码设备,其特征在于,所述频率区域信息包括以下至少一种:第一数量、标识信息、关系信息或者频率区域变化数量,所述第一数量为所述第一频率范围内的频率区域的数量,所述标识信息用于指示所述第一频率范围与所述频带扩展对应的第二频率范围是否相同,所述关系信息用于在所述第一频率范围与所述第二频率范围不相同时,指示所述第一频率范围与所述第二频率范围之间的大小关系,所述频率区域变化数量为在所述第一频率范围与所述第二频率范围不相同时,所述第一频率范围与所述第二频率范围之间存在差异的频率区域的数量。
  39. 根据权利要求38所述的解码设备,其特征在于,所述重建模块,具体用于:
    根据所述频率区域信息,确定需要进行音调成分重建的频率区域的数量为所述第一数量;
    根据所述第一数量,确定所述第一频率范围进行音调成分重建的各个频率区域;
    在所述第一频率范围内,根据所述音调成分的信息对音调成分进行重建,得到所述重建音调信号。
  40. 根据权利要求39所述的解码设备,其特征在于,所述第一频率范围的下限与所述配置信息指示的进行频带扩展的第二频率范围的下限相同,所述获取模块,具体用于:
    若所述第一数量小于或等于第二数量,则根据所述第二频率范围内的频率区域的分布确定所述第一频率范围内的各个频率区域的分布,所述第二数量为所述第二频率范围内的频率区域的数量;
    若所述第一数量大于所述第二数量,则确定所述第一频率范围的频率上限大于所述第二频率范围的频率上限,根据所述第二频率范围内的频率区域的分布确定所述第一频率范围与所述第二频率范围重合部分内的频率区域的分布,以及根据预设方式确定所述第一频率范围与所述第二频率范围的不重合部分内的频率区域的分布,得到所述第一频率范围内的各个频率区域的分布。
  41. 根据权利要求40所述的解码设备,其特征在于,所述第一频率范围与所述第二频率范围的不重合部分内划分的频率区域的宽度小于预设值,且所述第一频率范围与所述第二频率范围的不重合部分内划分的频率区域的频率上限小于或等于所述音频信号的最高频率。
  42. 一种编码设备,其特征在于,包括:包括处理器,所述处理器和存储器耦合,所述存储器存储有程序,当所述存储器存储的程序指令被所述处理器执行时实现权利要求1至14中任一项所述的方法。
  43. 一种解码设备,其特征在于,包括:包括处理器,所述处理器和存储器耦合,所述存储器存储有程序,当所述存储器存储的程序指令被所述处理器执行时实现权利要求15至20中任一项所述的方法。
  44. 一种通信系统,其特征在于,包括:编码设备和解码设备;
    所述编码设备为如权利要求21-35中任一项所述的编码设备;
    所述解码设备为如权利要求36-41中任一项所述的解码设备。
  45. 一种计算机可读存储介质,包括程序,当所述程序在计算机上运行时,使得所述计算机执行如权利要求1-14或15-20中任一项所述的方法。
  46. 一种网络设备，包括处理器和存储器，其特征在于，所述处理器与存储器耦合，用于读取并执行所述存储器中存储的指令，实现如权利要求1-14或15-20中任一项所述的方法的步骤。
  47. 如权利要求46所述的网络设备,其特征在于,所述网络设备为芯片或片上系统。
  48. 一种计算机可读存储介质,其特征在于,存储有根据如权利要求1-14任一项所述的方法生成的载荷码流。
  49. 一种存储于计算机可读存储介质上的计算机程序,所述计算机程序包括指令,当 所述指令被执行时,实现如权利要求1-14或15-20中任一项所述的方法。
PCT/CN2021/085920 2020-04-15 2021-04-08 音频信号编码方法、解码方法、编码设备以及解码设备 WO2021208792A1 (zh)

Priority Applications (5)

Application Number Priority Date Filing Date Title
BR112022020773A BR112022020773A2 (pt) 2020-04-15 2021-04-08 Método de codificação de sinal de áudio, método de decodificação, dispositivo de codificação, e dispositivo de decodificação
EP21788941.9A EP4131261A4 (en) 2020-04-15 2021-04-08 AUDIO SIGNAL ENCODING METHOD, DECODING METHOD, ENCODING DEVICE AND DECODING DEVICE
MX2022012891A MX2022012891A (es) 2020-04-15 2021-04-08 Método de codificación de señal de audio, método de decodificación, dispositivo de codificación y dispositivo de decodificación.
KR1020227039651A KR20230002697A (ko) 2020-04-15 2021-04-08 오디오 신호 인코딩 방법, 디코딩 방법, 인코딩 기기 및 디코딩 기기
US17/965,979 US20230048893A1 (en) 2020-04-15 2022-10-14 Audio Signal Encoding Method, Decoding Method, Encoding Device, and Decoding Device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010297340.0A CN113593586A (zh) 2020-04-15 2020-04-15 音频信号编码方法、解码方法、编码设备以及解码设备
CN202010297340.0 2020-04-15

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/965,979 Continuation US20230048893A1 (en) 2020-04-15 2022-10-14 Audio Signal Encoding Method, Decoding Method, Encoding Device, and Decoding Device

Publications (1)

Publication Number Publication Date
WO2021208792A1 true WO2021208792A1 (zh) 2021-10-21

Family

ID=78083913

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/085920 WO2021208792A1 (zh) 2020-04-15 2021-04-08 音频信号编码方法、解码方法、编码设备以及解码设备

Country Status (7)

Country Link
US (1) US20230048893A1 (zh)
EP (1) EP4131261A4 (zh)
KR (1) KR20230002697A (zh)
CN (1) CN113593586A (zh)
BR (1) BR112022020773A2 (zh)
MX (1) MX2022012891A (zh)
WO (1) WO2021208792A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113192517B (zh) 2020-01-13 2024-04-26 华为技术有限公司 一种音频编解码方法和音频编解码设备
CN114550732B (zh) * 2022-04-15 2022-07-08 腾讯科技(深圳)有限公司 一种高频音频信号的编解码方法和相关装置

Citations (9)

Publication number Priority date Publication date Assignee Title
CN1950686A (zh) * 2004-05-14 2007-04-18 松下电器产业株式会社 编码装置、解码装置以及编码/解码方法
CN101164104A (zh) * 2005-04-20 2008-04-16 Qnx软件操作系统(威美科)有限公司 用于改善语音质量和可懂度的系统
CN101903944A (zh) * 2007-12-18 2010-12-01 Lg电子株式会社 用于处理音频信号的方法和装置
CN104584124A (zh) * 2013-01-22 2015-04-29 松下电器产业株式会社 带宽扩展参数生成装置、编码装置、解码装置、带宽扩展参数生成方法、编码方法、以及解码方法
CN105280190A (zh) * 2015-09-16 2016-01-27 深圳广晟信源技术有限公司 带宽扩展编码和解码方法以及装置
CN105453175A (zh) * 2013-07-22 2016-03-30 弗劳恩霍夫应用研究促进协会 用于对编码音频信号进行解码的设备、方法及计算机程序
CN106463143A (zh) * 2014-03-03 2017-02-22 三星电子株式会社 用于带宽扩展的高频解码的方法及设备
US10224048B2 (en) * 2016-12-27 2019-03-05 Fujitsu Limited Audio coding device and audio coding method
EP3576088A1 (en) * 2018-05-30 2019-12-04 Fraunhofer Gesellschaft zur Förderung der Angewand Audio similarity evaluator, audio encoder, methods and computer program

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
KR101355376B1 (ko) * 2007-04-30 2014-01-23 삼성전자주식회사 고주파수 영역 부호화 및 복호화 방법 및 장치
CN101662288B (zh) * 2008-08-28 2012-07-04 华为技术有限公司 音频编码、解码方法及装置、系统
EP2273493B1 (en) * 2009-06-29 2012-12-19 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Bandwidth extension encoding and decoding

Patent Citations (9)

Publication number Priority date Publication date Assignee Title
CN1950686A (zh) * 2004-05-14 2007-04-18 松下电器产业株式会社 编码装置、解码装置以及编码/解码方法
CN101164104A (zh) * 2005-04-20 2008-04-16 Qnx软件操作系统(威美科)有限公司 用于改善语音质量和可懂度的系统
CN101903944A (zh) * 2007-12-18 2010-12-01 Lg电子株式会社 用于处理音频信号的方法和装置
CN104584124A (zh) * 2013-01-22 2015-04-29 松下电器产业株式会社 带宽扩展参数生成装置、编码装置、解码装置、带宽扩展参数生成方法、编码方法、以及解码方法
CN105453175A (zh) * 2013-07-22 2016-03-30 弗劳恩霍夫应用研究促进协会 用于对编码音频信号进行解码的设备、方法及计算机程序
CN106463143A (zh) * 2014-03-03 2017-02-22 三星电子株式会社 用于带宽扩展的高频解码的方法及设备
CN105280190A (zh) * 2015-09-16 2016-01-27 深圳广晟信源技术有限公司 带宽扩展编码和解码方法以及装置
US10224048B2 (en) * 2016-12-27 2019-03-05 Fujitsu Limited Audio coding device and audio coding method
EP3576088A1 (en) * 2018-05-30 2019-12-04 Fraunhofer Gesellschaft zur Förderung der Angewand Audio similarity evaluator, audio encoder, methods and computer program

Non-Patent Citations (2)

Title
KOK SENG CHONG, TANAKA N., NOMURA T., SHIMADA O., KIM HANN KUAH, TSUSHIMA M., TAKAMIZAWA Y., SUA HONG NEO, NORIMATSU T., SERIZAWA : "Low power spectral band replication technology for the MPEG-4 audio standard", INFORMATION, COMMUNICATIONS AND SIGNAL PROCESSING, 2003 AND FOURTH PAC IFIC RIM CONFERENCE ON MULTIMEDIA. PROCEEDINGS OF THE 2003 JOINT CONFE RENCE OF THE FOURTH INTERNATIONAL CONFERENCE ON SINGAPORE 15-18 DEC. 2003, PISCATAWAY, NJ, USA,IEEE, vol. 3, 15 December 2003 (2003-12-15) - 18 December 2003 (2003-12-18), pages 1408 - 1412, XP010701164, ISBN: 978-0-7803-8185-8 *
See also references of EP4131261A4

Also Published As

Publication number Publication date
US20230048893A1 (en) 2023-02-16
CN113593586A (zh) 2021-11-02
MX2022012891A (es) 2023-01-11
BR112022020773A2 (pt) 2022-11-29
KR20230002697A (ko) 2023-01-05
EP4131261A4 (en) 2023-05-03
EP4131261A1 (en) 2023-02-08

Similar Documents

Publication Publication Date Title
US10885921B2 (en) Multi-stream audio coding
US20230048893A1 (en) Audio Signal Encoding Method, Decoding Method, Encoding Device, and Decoding Device
US11854560B2 (en) Audio scene encoder, audio scene decoder and related methods using hybrid encoder-decoder spatial analysis
WO2019170955A1 (en) Audio coding
US20230402053A1 (en) Combining of spatial audio parameters
WO2019228423A1 (zh) Stereo signal encoding method and apparatus
WO2021130404A1 (en) The merging of spatial audio parameters
JP2024059711A (ja) チャネル間位相差パラメータ符号化方法および装置
US11900952B2 (en) Time-domain stereo encoding and decoding method and related product
JP2022163058A (ja) ステレオ信号符号化方法およびステレオ信号符号化装置
WO2021244418A1 (zh) Audio encoding method and audio encoding apparatus
WO2021213128A1 (zh) Audio signal encoding method and apparatus
JP7159351B2 (ja) ダウンミックスされた信号の計算方法及び装置
US20220335962A1 (en) Audio encoding method and device and audio decoding method and device
US20240153512A1 (en) Audio codec with adaptive gain control of downmixed signals
WO2021244417A1 (zh) Audio encoding method and audio encoding apparatus
GB2598932A (en) Spatial audio parameter encoding and associated decoding
US20230154473A1 (en) Audio coding method and related apparatus, and computer-readable storage medium
CN116762127A (zh) Quantizing spatial audio parameters
WO2024021732A1 (zh) Audio encoding/decoding method and apparatus, storage medium, and computer program product
US20230154472A1 (en) Multi-channel audio signal encoding method and apparatus
WO2023173941A1 (zh) Multi-channel signal encoding/decoding method, encoding/decoding device, and terminal device
KR20230088409A (ko) 오디오 코덱에 있어서 오디오 대역폭 검출 및 오디오 대역폭 스위칭을 위한 방법 및 디바이스
WO2023179846A1 (en) Parametric spatial audio encoding
WO2024110562A1 (en) Adaptive encoding of transient audio signals

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21788941; Country of ref document: EP; Kind code of ref document: A1
REG Reference to national code
    Ref country code: BR; Ref legal event code: B01A; Ref document number: 112022020773; Country of ref document: BR
ENP Entry into the national phase
    Ref document number: 2021788941; Country of ref document: EP; Effective date: 20221024
ENP Entry into the national phase
    Ref document number: 20227039651; Country of ref document: KR; Kind code of ref document: A
NENP Non-entry into the national phase
    Ref country code: DE
ENP Entry into the national phase
    Ref document number: 112022020773; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20221013