WO2021143692A1 - Audio coding and decoding method and audio coding and decoding device - Google Patents

Audio coding and decoding method and audio coding and decoding device

Info

Publication number
WO2021143692A1
WO2021143692A1 (PCT/CN2021/071328)
Authority
WO
WIPO (PCT)
Prior art keywords
band signal
current
signal
current frame
parameter
Prior art date
Application number
PCT/CN2021/071328
Other languages
English (en)
French (fr)
Inventor
夏丙寅
李佳蔚
王喆
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP21741759.1A (EP4084001A4)
Priority to KR1020227026854A (KR20220123108A)
Priority to JP2022542749A (JP7443534B2)
Publication of WO2021143692A1
Priority to US17/864,116 (US20220358941A1)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G10L19/02 Analysis-synthesis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204 Analysis-synthesis using subband decomposition
    • G10L19/04 Analysis-synthesis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/167 Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • G10L19/18 Vocoders using multiple modes
    • G10L19/20 Vocoders using sound class specific coding, hybrid encoders or object based coding
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Analysis techniques characterised by the type of extracted parameters
    • G10L25/18 Extracted parameters being spectral information of each sub-band
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038 Speech enhancement using band spreading techniques

Definitions

  • This application relates to the technical field of audio signal coding and decoding, and in particular to an audio coding and decoding method and audio coding and decoding equipment.
  • the embodiments of the present application provide an audio coding and decoding method and an audio coding and decoding device, which can improve the quality of decoded audio signals.
  • a first aspect of the present invention provides an audio encoding method. The method includes: acquiring a current frame of an audio signal, the current frame including a high-band signal and a low-band signal; obtaining a first encoding parameter according to the high-band signal and the low-band signal; obtaining a second encoding parameter of the current frame according to the high-band signal, the second encoding parameter including tonal component information; and performing bitstream multiplexing on the first encoding parameter and the second encoding parameter to obtain an encoded bitstream.
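The encoding flow of this first aspect can be sketched in a few lines. Everything below is an illustrative assumption (the function names, the magnitude-threshold stand-in for tonal detection, and the dictionary stand-in for bitstream multiplexing); the patent does not prescribe this implementation:

```python
# Illustrative sketch of the first-aspect encoding flow.  The helper
# behavior (band split by a threshold, magnitude-based "tonal" bins,
# dict packing as "bitstream multiplexing") is assumed for illustration.

def split_bands(frame, threshold_hz, sample_rate_hz):
    """Split a frame of spectral magnitudes at a frequency threshold."""
    hz_per_bin = (sample_rate_hz / 2) / len(frame)
    cut = int(threshold_hz / hz_per_bin)
    return frame[cut:], frame[:cut]  # (high-band, low-band)

def encode_frame(frame, threshold_hz=4000, sample_rate_hz=16000):
    high, low = split_bands(frame, threshold_hz, sample_rate_hz)
    # First encoding parameter: stands in for the core coding parameters
    # derived from both the high band and the low band.
    first_param = {"low_band": low, "high_band": high}
    # Second encoding parameter: tonal component information derived from
    # the high band only (here, bins whose magnitude exceeds 1.0).
    second_param = {"tonal_info": [i for i, v in enumerate(high) if v > 1.0]}
    # "Bitstream multiplexing" reduced to packing both parameter sets.
    return {"first": first_param, "second": second_param}
```

The key structural point the sketch preserves is that the first parameter depends on both bands while the second (tonal) parameter is derived from the high band alone.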
  • the obtaining of the second encoding parameter of the current frame according to the high-band signal includes: detecting whether the high-band signal includes a tonal component; and if the high-band signal includes a tonal component, obtaining the second encoding parameter of the current frame according to the high-band signal.
  • the tonal component information includes at least one of the following: quantity information of the tonal components, position information of the tonal components, amplitude information of the tonal components, or energy information of the tonal components.
  • the second encoding parameter further includes a noise floor parameter.
  • the noise floor parameter is used to indicate the noise floor energy.
  • a second aspect of the present invention provides an audio decoding method. The method includes: obtaining an encoded bitstream; demultiplexing the encoded bitstream to obtain the first encoding parameter of the current frame of the audio signal and the second encoding parameter of the current frame, the second encoding parameter of the current frame including tonal component information; obtaining the first high-band signal of the current frame and the first low-band signal of the current frame according to the first encoding parameter; obtaining the second high-band signal of the current frame according to the second encoding parameter, the second high-band signal including a reconstructed tonal signal; and obtaining the fused high-band signal of the current frame according to the second high-band signal of the current frame and the first high-band signal of the current frame.
  • the first high-band signal includes at least one of: a decoded high-band signal obtained by direct decoding according to the first encoding parameter, or an extended high-band signal obtained by performing band extension based on the first low-band signal.
  • if the first high-band signal includes the extended high-band signal, obtaining the fused high-band signal of the current frame according to the second high-band signal of the current frame and the first high-band signal of the current frame includes: if the value of the reconstructed tonal signal spectrum at the current frequency point of the current subband of the current frame satisfies a preset condition, obtaining the fused high-band signal at the current frequency point according to the spectrum of the extended high-band signal at the current frequency point and the noise floor information of the current subband; or, if the value does not satisfy the preset condition, obtaining the fused high-band signal at the current frequency point according to the reconstructed tonal signal spectrum at the current frequency point.
  • the noise floor information includes a noise floor gain parameter.
  • the noise floor gain parameter of the current subband is obtained based on the width of the current subband, the energy of the spectrum of the extended high-band signal of the current subband, and the noise floor energy of the current subband.
  • obtaining the fused high-band signal of the current frame according to the second high-band signal of the current frame and the first high-band signal of the current frame includes: if the value of the reconstructed tonal signal spectrum at the current frequency point of the current subband of the current frame does not satisfy a preset condition, obtaining the fused high-band signal at the current frequency point according to the reconstructed tonal signal spectrum at the current frequency point; or, if the value satisfies the preset condition, obtaining the fused high-band signal at the current frequency point according to the spectrum of the extended high-band signal at the current frequency point, the spectrum of the decoded high-band signal at the current frequency point, and the noise floor information of the current subband.
  • the noise floor information includes a noise floor gain parameter.
  • the noise floor gain parameter of the current subband is obtained based on the width of the current subband, the noise floor energy of the current subband, the energy of the spectrum of the extended high-band signal of the current subband, and the energy of the spectrum of the decoded high-band signal of the current subband.
  • the method further includes: according to preset indication information or indication information obtained by decoding, selecting at least one signal from among the decoded high-band signal, the extended high-band signal, and the reconstructed tonal signal to obtain the fused high-band signal of the current frame.
  • the second encoding parameter further includes a noise floor parameter for indicating the energy of the noise floor.
  • the preset condition includes: the value of the reconstructed tonal signal spectrum is 0, or is less than a preset threshold.
  • a third aspect of the present invention provides an audio encoder, including: a signal acquisition unit, configured to acquire a current frame of an audio signal, the current frame including a high-band signal and a low-band signal; a parameter acquisition unit, configured to obtain a first encoding parameter according to the high-band signal and the low-band signal, and to obtain a second encoding parameter of the current frame according to the high-band signal, the second encoding parameter including tonal component information; and an encoding unit, configured to perform bitstream multiplexing on the first encoding parameter and the second encoding parameter to obtain an encoded bitstream.
  • the parameter acquisition unit is further specifically configured to: detect whether the high-band signal includes a tonal component; and if the high-band signal includes a tonal component, obtain the second encoding parameter of the current frame according to the high-band signal.
  • the tonal component information includes at least one of the following: quantity information of the tonal components, position information of the tonal components, amplitude information of the tonal components, or energy information of the tonal components.
  • the second encoding parameter further includes a noise floor parameter.
  • the noise floor parameter is used to indicate the noise floor energy.
  • a fourth aspect of the present invention provides an audio decoder, including: a receiving unit, configured to obtain an encoded bitstream; a demultiplexing unit, configured to demultiplex the encoded bitstream to obtain the first encoding parameter of the current frame of an audio signal and the second encoding parameter of the current frame, where the second encoding parameter of the current frame includes tonal component information; an acquisition unit, configured to obtain the first high-band signal of the current frame and the first low-band signal of the current frame according to the first encoding parameter, and to obtain the second high-band signal of the current frame according to the second encoding parameter, the second high-band signal including a reconstructed tonal signal; and a fusion unit, configured to obtain the fused high-band signal of the current frame according to the second high-band signal of the current frame and the first high-band signal of the current frame.
  • the first high-band signal includes at least one of: a decoded high-band signal obtained by direct decoding according to the first encoding parameter, or an extended high-band signal obtained by performing band extension based on the first low-band signal.
  • if the first high-band signal includes the extended high-band signal, the fusion unit is specifically configured to: if the value of the reconstructed tonal signal spectrum at the current frequency point of the current subband of the current frame satisfies a preset condition, obtain the fused high-band signal at the current frequency point according to the spectrum of the extended high-band signal at the current frequency point and the noise floor information of the current subband.
  • the noise floor information includes a noise floor gain parameter.
  • the noise floor gain parameter of the current subband is obtained based on the width of the current subband, the energy of the spectrum of the extended high-band signal of the current subband, and the noise floor energy of the current subband.
  • the fusion unit is specifically configured to: if the value of the reconstructed tonal signal spectrum at the current frequency point of the current subband of the current frame does not satisfy a preset condition, obtain the fused high-band signal at the current frequency point according to the reconstructed tonal signal spectrum at the current frequency point; or, if the value satisfies the preset condition, obtain the fused high-band signal at the current frequency point according to the spectrum of the extended high-band signal at the current frequency point, the spectrum of the decoded high-band signal at the current frequency point, and the noise floor information of the current subband.
  • the noise floor information includes a noise floor gain parameter.
  • the noise floor gain parameter of the current subband is obtained based on the width of the current subband, the noise floor energy of the current subband, the energy of the spectrum of the extended high-band signal of the current subband, and the energy of the spectrum of the decoded high-band signal of the current subband.
  • the fusion unit is further configured to: according to preset indication information or indication information obtained by decoding, select at least one signal from among the decoded high-band signal, the extended high-band signal, and the reconstructed tonal signal to obtain the fused high-band signal of the current frame.
  • the second encoding parameter further includes a noise floor parameter for indicating the energy of the noise floor.
  • the preset condition includes: the value of the reconstructed tonal signal spectrum is 0, or is less than a preset threshold.
  • a fifth aspect of the present invention provides an audio encoding device, including at least one processor, the at least one processor being configured to be coupled with a memory, and to read and execute instructions in the memory, so as to implement the method according to any one of the implementations of the first aspect.
  • a sixth aspect of the present invention provides an audio decoding device, including at least one processor, the at least one processor being configured to be coupled with a memory, and to read and execute instructions in the memory, so as to implement the method according to any one of the implementations of the second aspect.
  • an embodiment of the present application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute the method described in the first or second aspect above.
  • embodiments of the present application provide a computer program product containing instructions, which when run on a computer, cause the computer to execute the method described in the first or second aspect above.
  • an embodiment of the present application provides a communication device.
  • the communication device may include entities such as audio codec equipment or a chip.
  • the communication device includes a processor and, optionally, a memory; the memory is used for storing instructions; the processor is configured to execute the instructions in the memory, so that the communication device executes the method according to any one of the foregoing first or second aspects.
  • this application provides a chip system that includes a processor for supporting audio codec devices in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
  • the chip system further includes a memory, and the memory is used to store necessary program instructions and data of the audio codec device.
  • the chip system can be composed of chips, and can also include chips and other discrete devices.
  • the audio encoder in the embodiment of the present invention encodes the tonal component information, so that the audio decoder can decode the audio signal according to the received tonal component information and can more accurately restore the tonal components in the audio signal, thereby improving the quality of the decoded audio signal.
  • FIG. 1 is a schematic structural diagram of an audio codec system provided by an embodiment of the application
  • FIG. 2 is a schematic flowchart of an audio coding method provided by an embodiment of the application
  • FIG. 3 is a schematic flowchart of an audio decoding method provided by an embodiment of this application.
  • FIG. 4 is a schematic diagram of a mobile terminal according to an embodiment of the application.
  • Fig. 5 is a schematic diagram of a network element according to an embodiment of the application.
  • FIG. 6 is a schematic diagram of the composition structure of an audio coding device provided by an embodiment of the application.
  • FIG. 7 is a schematic diagram of the composition structure of an audio decoding device provided by an embodiment of the application.
  • FIG. 8 is a schematic diagram of the composition structure of another audio coding device provided by an embodiment of the application.
  • FIG. 9 is a schematic diagram of the composition structure of another audio decoding device provided by an embodiment of the application.
  • the audio signal in the embodiment of the present application refers to the input signal in the audio encoding device.
  • the audio signal may include multiple frames.
  • the current frame may specifically refer to a certain frame in the audio signal.
  • the coding and decoding of the audio signal are illustrated using the current frame as an example; the previous frame or the next frame of the current frame in the audio signal can be coded and decoded according to the coding and decoding mode of the current frame.
  • the audio signal in the embodiment of the present application may be a mono audio signal, or may also be a stereo signal.
  • the stereo signal can be an original stereo signal, a stereo signal composed of two signals (a left channel signal and a right channel signal) included in a multi-channel signal, or a stereo signal formed from signals of a multi-channel signal.
  • Fig. 1 is a schematic structural diagram of an audio coding and decoding system according to an exemplary embodiment of the application.
  • the audio codec system includes an encoding component 110 and a decoding component 120.
  • the encoding component 110 is used to encode the current frame (audio signal) in the frequency domain or the time domain.
  • the encoding component 110 can be implemented by software; alternatively, it can also be implemented by hardware; or, it can also be implemented by a combination of software and hardware, which is not limited in the embodiments of the present application.
  • the encoding component 110 encodes the current frame in the frequency domain or the time domain, in a possible implementation manner, the steps shown in FIG. 2 may be included.
  • the encoding component 110 and the decoding component 120 may be connected in a wired or wireless manner, and the decoding component 120 may obtain, through its connection with the encoding component 110, the encoded bitstream generated by the encoding component 110; alternatively, the encoding component 110 may store the generated bitstream in a memory, and the decoding component 120 may read the bitstream from the memory.
  • the decoding component 120 can be implemented by software; alternatively, it can also be implemented by hardware; or, it can also be implemented by a combination of software and hardware, which is not limited in the embodiment of the present application.
  • the decoding component 120 decodes the current frame (audio signal) in the frequency domain or the time domain, in a possible implementation manner, the steps shown in FIG. 3 may be included.
  • the encoding component 110 and the decoding component 120 can be provided in the same device; or, they can also be provided in different devices.
  • the device can be a terminal with audio signal processing functions, such as a mobile phone, tablet computer, laptop computer, desktop computer, Bluetooth speaker, voice recorder, or wearable device; it can also be a network element with audio signal processing capability in a core network or wireless network. This embodiment does not limit this.
  • the encoding component 110 is installed in the mobile terminal 130
  • the decoding component 120 is installed in the mobile terminal 140.
  • the mobile terminal 130 and the mobile terminal 140 are independent of each other and have audio signal processing capabilities.
  • the electronic device may be a mobile phone, a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, or the like. The mobile terminal 130 and the mobile terminal 140 may be connected wirelessly or by wire; a network connection is taken as an example here.
  • the mobile terminal 130 may include an acquisition component 131, an encoding component 110, and a channel encoding component 132, where the acquisition component 131 is connected to the encoding component 110, and the encoding component 110 is connected to the channel encoding component 132.
  • the mobile terminal 140 may include an audio playing component 141, a decoding component 120, and a channel decoding component 142.
  • the audio playing component 141 is connected to the decoding component 120
  • the decoding component 120 is connected to the channel decoding component 142.
  • after the mobile terminal 130 collects the audio signal through the acquisition component 131, the audio signal is encoded by the encoding component 110 to obtain an encoded bitstream; then the channel encoding component 132 encodes the bitstream to obtain a transmission signal.
  • the mobile terminal 130 transmits the transmission signal to the mobile terminal 140 through a wireless or wired network.
  • after receiving the transmission signal, the mobile terminal 140 decodes it through the channel decoding component 142 to obtain a bitstream, decodes the bitstream through the decoding component 120 to obtain the audio signal, and plays the audio signal through the audio playing component 141. It can be understood that the mobile terminal 130 may also include the components included in the mobile terminal 140, and the mobile terminal 140 may also include the components included in the mobile terminal 130.
  • as an example for description, the encoding component 110 and the decoding component 120 are provided in a network element 150 capable of processing audio signals in a core network or wireless network.
  • the network element 150 includes a channel decoding component 151, a decoding component 120, an encoding component 110, and a channel encoding component 152.
  • the channel decoding component 151 is connected to the decoding component 120
  • the decoding component 120 is connected to the encoding component 110
  • the encoding component 110 is connected to the channel encoding component 152.
  • after the channel decoding component 151 receives a transmission signal sent by another device, it decodes the transmission signal to obtain a first encoded bitstream; the decoding component 120 decodes the first bitstream to obtain the audio signal; the encoding component 110 encodes the audio signal to obtain a second encoded bitstream; and the channel encoding component 152 encodes the second bitstream to obtain a transmission signal.
  • the other device may be a mobile terminal with audio signal processing capability; or, it may also be other network elements with audio signal processing capability, which is not limited in this embodiment.
  • the encoding component 110 and the decoding component 120 in the network element can transcode the encoded code stream sent by the mobile terminal.
  • the device installed with the encoding component 110 may be referred to as an audio encoding device.
  • the audio encoding device may also have an audio decoding function, which is not limited in the implementation of this application.
  • the device installed with the decoding component 120 may be referred to as an audio decoding device.
  • the audio decoding device may also have an audio encoding function, which is not limited in the implementation of this application.
  • FIG. 2 depicts the flow of the audio coding method provided by an embodiment of the present invention, including the following steps:
  • 201 Acquire a current frame of an audio signal, where the current frame includes a high-band signal and a low-band signal.
  • the current frame can be any frame in the audio signal, and the current frame can include a high-band signal and a low-band signal.
  • the division between the high-band signal and the low-band signal can be determined by a frequency band threshold: a signal above the threshold is a high-band signal, and a signal below the threshold is a low-band signal.
  • the frequency band threshold can be determined according to the transmission bandwidth and the data processing capabilities of the encoding component 110 and the decoding component 120, and is not limited here.
  • the high-band signal and the low-band signal are relative. For example, a signal below a certain frequency is a low-band signal, and a signal above that frequency is a high-band signal (the signal at that frequency may be assigned either to the low band or to the high band).
  • the frequency varies with the bandwidth of the current frame. For example, when the current frame is a 0-8 kHz wideband signal, the frequency may be 4 kHz; when the current frame is a 0-16 kHz ultra-wideband signal, the frequency may be 8 kHz.
  • the first coding parameters may specifically include: time domain noise shaping parameters, frequency domain noise shaping parameters, frequency spectrum quantization parameters, frequency band extension parameters, and so on.
  • the tonal component information includes at least one of the following: quantity information of the tonal component, position information of the tonal component, amplitude information of the tonal component, or energy information of the tonal component.
  • only one of the amplitude information and the energy information may be included.
  • step 203 may be performed only when the high frequency band signal includes tonal components.
  • the obtaining of the second encoding parameter of the current frame according to the high-band signal may include: detecting whether the high-band signal includes a tonal component; and if the high-band signal includes a tonal component, obtaining the second encoding parameter of the current frame according to the high-band signal.
  • the second encoding parameter may further include a noise floor parameter, for example, the noise floor parameter may be used to indicate noise floor energy.
  • the audio encoder in the embodiment of the present invention encodes the tonal component information, so that the audio decoder can decode the audio signal according to the received tonal component information and can more accurately restore the tonal components in the audio signal, thereby improving the quality of the decoded audio signal.
  • FIG. 3 depicts the flow of an audio decoding method provided by another embodiment of the present invention, including:
  • for the first encoding parameter and the second encoding parameter, reference may be made to the encoding method; details are not repeated here.
  • the first high-band signal includes at least one of: a decoded high-band signal obtained by direct decoding according to the first encoding parameter, or an extended high-band signal obtained by performing band extension according to the first low-band signal.
  • the obtaining the fused high-band signal of the current frame according to the second high-band signal of the current frame and the first high-band signal of the current frame may include: if the value of the reconstructed tone signal spectrum at the current frequency bin of the current subband of the current frame satisfies a preset condition, obtaining the fused high-band signal at the current frequency bin according to the spectrum of the extended high-band signal at the current frequency bin and the noise floor information of the current subband; or if the value of the reconstructed tone signal spectrum at the current frequency bin of the current subband of the current frame does not satisfy the preset condition, obtaining the fused high-band signal at the current frequency bin according to the reconstructed tone signal spectrum at the current frequency bin.
  • the noise floor information may include a noise floor gain parameter.
  • the noise floor gain parameter of the current subband is obtained according to the width of the current subband, the energy of the spectrum of the extended high-band signal of the current subband, and the noise floor energy of the current subband.
  • the obtaining the fused high-band signal of the current frame according to the second high-band signal of the current frame and the first high-band signal of the current frame may include: if the value of the reconstructed tone signal spectrum at the current frequency bin of the current subband of the current frame does not satisfy a preset condition, obtaining the fused high-band signal at the current frequency bin according to the reconstructed tone signal spectrum at the current frequency bin; or if the value of the reconstructed tone signal spectrum at the current frequency bin of the current subband of the current frame satisfies the preset condition, obtaining the fused high-band signal at the current frequency bin according to the spectrum of the extended high-band signal at the current frequency bin, the spectrum of the decoded high-band signal at the current frequency bin, and the noise floor information of the current subband.
  • the noise floor information includes noise floor gain parameters.
  • the noise floor gain parameter of the current subband is obtained according to the width of the current subband, the noise floor energy of the current subband, the energy of the spectrum of the extended high-band signal of the current subband, and the energy of the spectrum of the decoded high-band signal of the current subband.
  • the preset condition includes: the value of the reconstructed tone signal spectrum is 0. In another embodiment of the present invention, the preset condition includes: the value of the reconstructed tone signal spectrum is less than a preset threshold, and the preset threshold is a real number greater than zero.
  • the audio encoder in the embodiment of the present invention encodes the tonal component information, so that the audio decoder can decode the audio signal according to the received tonal component information and can restore the tonal components in the audio signal more accurately, thereby improving the quality of the decoded audio signal.
  • the audio decoding method described in FIG. 3 may further include:
  • at least one signal is selected, according to preset indication information or indication information obtained by decoding, from the decoded high-band signal, the extended high-band signal, and the reconstructed tone signal to obtain the fused high-band signal of the current frame.
  • the frequency spectrum of the decoded high-band signal obtained by direct decoding according to the first coding parameter is denoted as enc_spec[sfb]
  • the spectrum of the extended high-band signal obtained by performing frequency band expansion according to the first low-band signal is denoted as patch_spec[sfb]
  • the spectrum of the reconstructed tone signal is denoted as recon_spec[sfb].
  • the noise floor energy is denoted as E_noise_floor[sfb].
  • the noise floor energy can be obtained, for example, from the noise floor energy parameter E_noise_floor[tile] of the spectrum interval according to the correspondence between spectrum intervals and subbands; that is, the noise floor energy of each sfb in the tile-th spectrum interval is equal to E_noise_floor[tile].
  • obtaining the fused high-band signal of the current frame according to the second high-band signal of the current frame and the first high-band signal of the current frame can be divided into the following cases:
  • merge_spec[sfb][k] = patch_spec[sfb][k], k ∈ [sfb_offset[sfb], sfb_offset[sfb+1])
  • merge_spec[sfb][k] represents the fusion signal spectrum at the kth frequency point of the sfb subband
  • sfb_offset is the subband division table
  • sfb_offset[sfb] and sfb_offset[sfb+1] are the starting points of the sfb-th and (sfb+1)-th subbands, respectively.
  • g_noise_floor[sfb] is the noise floor gain parameter of the sfb-th subband, which is calculated from the noise floor energy parameter of the sfb-th subband and the energy of patch_spec[sfb], namely: g_noise_floor[sfb] = sqrt(sfb_width[sfb] * E_noise_floor[sfb] / E_patch[sfb])
  • sfb_width[sfb] is the width of the sfb-th subband, expressed as:
  • sfb_width[sfb] = sfb_offset[sfb+1] - sfb_offset[sfb]
  • E_patch[sfb] is the energy of patch_spec[sfb], calculated as E_patch[sfb] = ∑_k (patch_spec[sfb][k])^2, where the value range of k is k ∈ [sfb_offset[sfb], sfb_offset[sfb+1]).
  • fusion can be done in two ways: one combines the above three spectra, with recon_spec[sfb] as the main component and the other two adjusted to the noise floor energy level; the other combines enc_spec[sfb] and patch_spec[sfb].
  • the high-band signal spectrum obtained from patch_spec[sfb] and enc_spec[sfb] is adjusted with the noise floor gain and combined with recon_spec[sfb] to obtain the fused signal spectrum.
  • g_noise_floor[sfb] is the noise floor gain parameter of the sfb-th subband, which is calculated from the noise floor energy parameter of the sfb-th subband, the energy of patch_spec[sfb], and the energy of enc_spec[sfb], namely: g_noise_floor[sfb] = sqrt(sfb_width[sfb] * E_noise_floor[sfb] / (E_patch[sfb] + E_enc[sfb]))
  • E_patch[sfb] is the energy of patch_spec[sfb]
  • E_enc[sfb] is the energy of enc_spec[sfb], calculated as E_enc[sfb] = ∑_k (enc_spec[sfb][k])^2
  • the value range of k is k ⁇ [sfb_offset[sfb], sfb_offset[sfb+1]).
  • recon_spec[sfb] is no longer retained, and the fused signal is composed of patch_spec[sfb] and enc_spec[sfb].
  • one of method one and method two can be selected in a preset manner, or the choice can be made by some decision rule, for example, method one is selected when the signal satisfies a certain preset condition.
  • the embodiment of the present invention does not limit the specific selection method.
  • Figure 6 depicts the structure of an audio encoder provided by an embodiment of the present invention, including:
  • the signal acquisition unit 601 is configured to acquire a current frame of an audio signal, where the current frame includes a high-band signal and a low-band signal.
  • the parameter obtaining unit 602 is configured to obtain a first coding parameter according to the high frequency band signal and the low frequency band signal, and obtain a second coding parameter of the current frame according to the high frequency band signal, where the second coding parameter includes tonal component information
  • the coding unit 603 is configured to perform code stream multiplexing on the first coding parameter and the second coding parameter to obtain a coded code stream.
  • Figure 7 illustrates the structure of an audio decoder provided by an embodiment of the present invention, including:
  • the receiving unit 701 is configured to obtain an encoding code stream
  • the demultiplexing unit 702 is configured to demultiplex the code stream to obtain the first coding parameter of the current frame of the audio signal and the second coding parameter of the current frame, where the second coding parameter includes tonal component information
  • the obtaining unit 703 is configured to obtain the first high-band signal of the current frame and the first low-band signal of the current frame according to the first coding parameter; obtain the information of the current frame according to the second coding parameter A second high-band signal, where the second high-band signal includes a reconstructed tone signal;
  • the fusion unit 704 is configured to obtain the fused high-band signal of the current frame according to the second high-band signal of the current frame and the first high-band signal of the current frame.
  • the specific implementation of the audio decoder can refer to the above-mentioned audio decoding method, which will not be repeated here.
  • the embodiment of the present invention also provides a computer-readable storage medium, including instructions, which when run on a computer, cause the computer to execute the above-mentioned audio encoding method or audio decoding method.
  • the embodiment of the present invention also provides a computer program product containing instructions, which when running on a computer, causes the computer to execute the above-mentioned audio encoding method or audio decoding method.
  • An embodiment of the present application also provides a computer storage medium, wherein the computer storage medium stores a program, and the program executes some or all of the steps recorded in the above method embodiments.
  • the audio coding device 1000 includes:
  • the receiver 1001, the transmitter 1002, the processor 1003, and the memory 1004 (the number of processors 1003 in the audio encoding device 1000 may be one or more, and one processor is taken as an example in FIG. 8).
  • the receiver 1001, the transmitter 1002, the processor 1003, and the memory 1004 may be connected by a bus or in other ways, where the bus connection is taken as an example in FIG. 8.
  • the memory 1004 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1003. A part of the memory 1004 may also include a non-volatile random access memory (NVRAM).
  • the memory 1004 stores an operating system and operating instructions, executable modules or data structures, or a subset of them, or an extended set of them.
  • the operating instructions may include various operating instructions for implementing various operations.
  • the operating system may include various system programs for implementing various basic services and processing hardware-based tasks.
  • the processor 1003 controls the operation of the audio encoding device, and the processor 1003 may also be referred to as a central processing unit (CPU).
  • the various components of the audio encoding device are coupled together through a bus system.
  • the bus system may also include a power bus, a control bus, and a status signal bus.
  • various buses are referred to as bus systems in the figure.
  • the methods disclosed in the foregoing embodiments of the present application may be applied to the processor 1003 or implemented by the processor 1003.
  • the processor 1003 may be an integrated circuit chip with signal processing capability. In the implementation process, the steps of the foregoing method can be completed by an integrated logic circuit of hardware in the processor 1003 or instructions in the form of software.
  • the above-mentioned processor 1003 may be a general-purpose processor, a digital signal processing (digital signal processing, DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or Other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory 1004, and the processor 1003 reads the information in the memory 1004, and completes the steps of the foregoing method in combination with its hardware.
  • the receiver 1001 can be used to receive input digital or character information, and generate signal input related to the related settings and function control of the audio coding device.
  • the transmitter 1002 may include a display device such as a display screen, and the transmitter 1002 can be used to output digital or character information through an external interface.
  • the processor 1003 is configured to execute the aforementioned audio coding method.
  • the audio decoding device 1100 includes:
  • the receiver 1101, the transmitter 1102, the processor 1103, and the memory 1104 (the number of processors 1103 in the audio decoding device 1100 may be one or more, and one processor is taken as an example in FIG. 9).
  • the receiver 1101, the transmitter 1102, the processor 1103, and the memory 1104 may be connected by a bus or in other ways, wherein the connection by a bus is taken as an example in FIG. 9.
  • the memory 1104 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1103. A part of the memory 1104 may also include NVRAM.
  • the memory 1104 stores an operating system and operating instructions, executable modules or data structures, or a subset of them, or an extended set of them.
  • the operating instructions may include various operating instructions for implementing various operations.
  • the operating system may include various system programs for implementing various basic services and processing hardware-based tasks.
  • the processor 1103 controls the operation of the audio decoding device, and the processor 1103 may also be referred to as a CPU.
  • the various components of the audio decoding device are coupled together through a bus system, where the bus system may include a power bus, a control bus, and a status signal bus in addition to a data bus.
  • various buses are referred to as bus systems in the figure.
  • the method disclosed in the foregoing embodiment of the present application may be applied to the processor 1103 or implemented by the processor 1103.
  • the processor 1103 may be an integrated circuit chip with signal processing capability. In the implementation process, the steps of the foregoing method can be completed by an integrated logic circuit of hardware in the processor 1103 or instructions in the form of software.
  • the aforementioned processor 1103 may be a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component.
  • the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory 1104, and the processor 1103 reads the information in the memory 1104, and completes the steps of the foregoing method in combination with its hardware.
  • the processor 1103 is configured to execute the aforementioned audio decoding method.
  • when the audio encoding device or the audio decoding device is a chip in a terminal, the chip includes a processing unit and a communication unit.
  • the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute the computer-executable instructions stored in the storage unit, so that the chip in the terminal executes the method of any one of the above-mentioned first aspects.
  • the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit in the terminal located outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
  • processor mentioned in any of the foregoing may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the program of the method in the first aspect.
  • the device embodiments described above are only illustrative; the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units.
  • the physical unit can be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the connection relationship between the modules indicates that they have a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • this application can be implemented by software plus the necessary general-purpose hardware, and certainly can also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memory, dedicated components, and so on.
  • generally, all functions completed by a computer program can be easily implemented with the corresponding hardware, and the specific hardware structures used to achieve the same function can also be diverse, such as analog circuits, digital circuits, or dedicated circuits.
  • however, a software program implementation is the better implementation in most cases.
  • the technical solution of this application, in essence, or the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in each embodiment of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) manner.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Abstract

The embodiments of this application disclose an audio encoding/decoding method and an audio encoding/decoding device, which can improve the quality of a decoded audio signal. The audio encoding method includes: acquiring a current frame of an audio signal, where the current frame includes a high-band signal and a low-band signal; obtaining a first coding parameter according to the high-band signal and the low-band signal; obtaining a second coding parameter of the current frame according to the high-band signal, where the second coding parameter includes tonal component information; and performing bitstream multiplexing on the first coding parameter and the second coding parameter to obtain a coded bitstream.

Description

Audio encoding/decoding method and audio encoding/decoding device
This application claims priority to the Chinese patent application No. 202010033326.X, filed with the China National Intellectual Property Administration on January 13, 2020 and entitled "Audio encoding/decoding method and audio encoding/decoding device", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the technical field of audio signal encoding/decoding, and in particular to an audio encoding/decoding method and an audio encoding/decoding device.
Background
As the quality of life improves, the demand for high-quality audio keeps growing. To better transmit an audio signal over limited bandwidth, the audio signal usually needs to be encoded first, and the coded bitstream is then transmitted to the decoding side. The decoding side decodes the received bitstream to obtain a decoded audio signal, which is used for playback.
How to improve the quality of the decoded audio signal has become a technical problem to be solved urgently.
Summary
The embodiments of this application provide an audio encoding/decoding method and an audio encoding/decoding device, which can improve the quality of a decoded audio signal.
To solve the above technical problem, the embodiments of this application provide the following technical solutions:
A first aspect of the present invention provides an audio encoding method, including: acquiring a current frame of an audio signal, where the current frame includes a high-band signal and a low-band signal; obtaining a first coding parameter according to the high-band signal and the low-band signal; obtaining a second coding parameter of the current frame according to the high-band signal, where the second coding parameter includes tonal component information; and performing bitstream multiplexing on the first coding parameter and the second coding parameter to obtain a coded bitstream.
With reference to the first aspect, in an implementation, the obtaining the second coding parameter of the current frame according to the high-band signal includes: detecting whether the high-band signal includes a tonal component; and if the high-band signal includes a tonal component, obtaining the second coding parameter of the current frame according to the high-band signal.
With reference to the first aspect and the foregoing implementations of the first aspect, in an implementation, the tonal component information includes at least one of the following: quantity information of the tonal components, position information of the tonal components, amplitude information of the tonal components, or energy information of the tonal components.
With reference to the first aspect and the foregoing implementations of the first aspect, in an implementation, the second coding parameter further includes a noise floor parameter.
With reference to the first aspect and the foregoing implementations of the first aspect, in an implementation, the noise floor parameter is used to indicate noise floor energy.
A second aspect of the present invention provides an audio decoding method, including: acquiring a coded bitstream; demultiplexing the coded bitstream to obtain a first coding parameter of a current frame of an audio signal and a second coding parameter of the current frame, where the second coding parameter of the current frame includes tonal component information; obtaining a first high-band signal of the current frame and a first low-band signal of the current frame according to the first coding parameter; obtaining a second high-band signal of the current frame according to the second coding parameter, where the second high-band signal includes a reconstructed tone signal; and obtaining a fused high-band signal of the current frame according to the second high-band signal of the current frame and the first high-band signal of the current frame.
With reference to the second aspect, in an implementation, the first high-band signal includes at least one of: a decoded high-band signal obtained by direct decoding according to the first coding parameter, and an extended high-band signal obtained by performing band extension according to the first low-band signal.
With reference to the second aspect and the foregoing implementations of the second aspect, in an implementation, if the first high-band signal includes the extended high-band signal, the obtaining the fused high-band signal of the current frame according to the second high-band signal of the current frame and the first high-band signal of the current frame includes: if the value of the reconstructed tone signal spectrum at the current frequency bin of the current subband of the current frame satisfies a preset condition, obtaining the fused high-band signal at the current frequency bin according to the spectrum of the extended high-band signal at the current frequency bin and the noise floor information of the current subband; or if the value of the reconstructed tone signal spectrum at the current frequency bin of the current subband of the current frame does not satisfy the preset condition, obtaining the fused high-band signal at the current frequency bin according to the reconstructed tone signal spectrum at the current frequency bin.
With reference to the second aspect and the foregoing implementations of the second aspect, in an implementation, the noise floor information includes a noise floor gain parameter.
With reference to the second aspect and the foregoing implementations of the second aspect, in an implementation, the noise floor gain parameter of the current subband is obtained according to the width of the current subband, the energy of the spectrum of the extended high-band signal of the current subband, and the noise floor energy of the current subband.
With reference to the second aspect and the foregoing implementations of the second aspect, in an implementation, if the first high-band signal includes the decoded high-band signal and the extended high-band signal, the obtaining the fused high-band signal of the current frame according to the second high-band signal of the current frame and the first high-band signal of the current frame includes: if the value of the reconstructed tone signal spectrum at the current frequency bin of the current subband of the current frame does not satisfy a preset condition, obtaining the fused high-band signal at the current frequency bin according to the reconstructed tone signal spectrum at the current frequency bin; or if the value of the reconstructed tone signal spectrum at the current frequency bin of the current subband of the current frame satisfies the preset condition, obtaining the fused high-band signal at the current frequency bin according to the spectrum of the extended high-band signal at the current frequency bin, the spectrum of the decoded high-band signal at the current frequency bin, and the noise floor information of the current subband.
With reference to the second aspect and the foregoing implementations of the second aspect, in an implementation, the noise floor information includes a noise floor gain parameter.
With reference to the second aspect and the foregoing implementations of the second aspect, in an implementation, the noise floor gain parameter of the current subband is obtained according to the width of the current subband, the noise floor energy of the current subband, the energy of the spectrum of the extended high-band signal of the current subband, and the energy of the spectrum of the decoded high-band signal of the current subband.
With reference to the second aspect and the foregoing implementations of the second aspect, in an implementation, if the first high-band signal includes the decoded high-band signal and the extended high-band signal, the method further includes: selecting, according to preset indication information or indication information obtained by decoding, at least one signal from the decoded high-band signal, the extended high-band signal, and the reconstructed tone signal to obtain the fused high-band signal of the current frame.
With reference to the second aspect and the foregoing implementations of the second aspect, in an implementation, the second coding parameter further includes a noise floor parameter used to indicate the noise floor energy.
With reference to the second aspect and the foregoing implementations of the second aspect, in an implementation, the preset condition includes: the value of the reconstructed tone signal spectrum is 0 or is less than a preset threshold.
A third aspect of the present invention provides an audio encoder, including: a signal acquisition unit, configured to acquire a current frame of an audio signal, where the current frame includes a high-band signal and a low-band signal; a parameter acquisition unit, configured to obtain a first coding parameter according to the high-band signal and the low-band signal, and obtain a second coding parameter of the current frame according to the high-band signal, where the second coding parameter includes tonal component information; and an encoding unit, configured to perform bitstream multiplexing on the first coding parameter and the second coding parameter to obtain a coded bitstream.
With reference to the third aspect and the foregoing implementations of the third aspect, in an implementation, the parameter acquisition unit is further specifically configured to: detect whether the high-band signal includes a tonal component; and if the high-band signal includes a tonal component, obtain the second coding parameter of the current frame according to the high-band signal.
With reference to the third aspect and the foregoing implementations of the third aspect, in an implementation, the tonal component information includes at least one of the following: quantity information of the tonal components, position information of the tonal components, amplitude information of the tonal components, or energy information of the tonal components.
With reference to the third aspect and the foregoing implementations of the third aspect, in an implementation, the second coding parameter further includes a noise floor parameter.
With reference to the third aspect and the foregoing implementations of the third aspect, in an implementation, the noise floor parameter is used to indicate noise floor energy.
A fourth aspect of the present invention provides an audio decoder, including: a receiving unit, configured to acquire a coded bitstream; a demultiplexing unit, configured to demultiplex the coded bitstream to obtain a first coding parameter of a current frame of an audio signal and a second coding parameter of the current frame, where the second coding parameter of the current frame includes tonal component information; an acquisition unit, configured to obtain a first high-band signal of the current frame and a first low-band signal of the current frame according to the first coding parameter, and obtain a second high-band signal of the current frame according to the second coding parameter, where the second high-band signal includes a reconstructed tone signal; and a fusion unit, configured to obtain a fused high-band signal of the current frame according to the second high-band signal of the current frame and the first high-band signal of the current frame.
With reference to the fourth aspect, in an implementation, the first high-band signal includes at least one of: a decoded high-band signal obtained by direct decoding according to the first coding parameter, and an extended high-band signal obtained by performing band extension according to the first low-band signal.
With reference to the fourth aspect and the foregoing implementations of the fourth aspect, in an implementation, the first high-band signal includes the extended high-band signal, and the fusion unit is specifically configured to: if the value of the reconstructed tone signal spectrum at the current frequency bin of the current subband of the current frame satisfies a preset condition, obtain the fused high-band signal at the current frequency bin according to the spectrum of the extended high-band signal at the current frequency bin and the noise floor information of the current subband; or if the value of the reconstructed tone signal spectrum at the current frequency bin of the current subband of the current frame does not satisfy the preset condition, obtain the fused high-band signal at the current frequency bin according to the reconstructed tone signal spectrum at the current frequency bin.
With reference to the fourth aspect and the foregoing implementations of the fourth aspect, in an implementation, the noise floor information includes a noise floor gain parameter.
With reference to the fourth aspect and the foregoing implementations of the fourth aspect, in an implementation, the noise floor gain parameter of the current subband is obtained according to the width of the current subband, the energy of the spectrum of the extended high-band signal of the current subband, and the noise floor energy of the current subband.
With reference to the fourth aspect and the foregoing implementations of the fourth aspect, in an implementation, if the first high-band signal includes the decoded high-band signal and the extended high-band signal, the fusion unit is specifically configured to: if the value of the reconstructed tone signal spectrum at the current frequency bin of the current subband of the current frame does not satisfy a preset condition, obtain the fused high-band signal at the current frequency bin according to the reconstructed tone signal spectrum at the current frequency bin; or if the value of the reconstructed tone signal spectrum at the current frequency bin of the current subband of the current frame satisfies the preset condition, obtain the fused high-band signal at the current frequency bin according to the spectrum of the extended high-band signal at the current frequency bin, the spectrum of the decoded high-band signal at the current frequency bin, and the noise floor information of the current subband.
With reference to the fourth aspect and the foregoing implementations of the fourth aspect, in an implementation, the noise floor information includes a noise floor gain parameter.
With reference to the fourth aspect and the foregoing implementations of the fourth aspect, in an implementation, the noise floor gain parameter of the current subband is obtained according to the width of the current subband, the noise floor energy of the current subband, the energy of the spectrum of the extended high-band signal of the current subband, and the energy of the spectrum of the decoded high-band signal of the current subband.
With reference to the fourth aspect and the foregoing implementations of the fourth aspect, in an implementation, if the first high-band signal includes the decoded high-band signal and the extended high-band signal, the fusion unit is further configured to: select, according to preset indication information or indication information obtained by decoding, at least one signal from the decoded high-band signal, the extended high-band signal, and the reconstructed tone signal to obtain the fused high-band signal of the current frame.
With reference to the fourth aspect and the foregoing implementations of the fourth aspect, in an implementation, the second coding parameter further includes a noise floor parameter used to indicate the noise floor energy.
With reference to the fourth aspect and the foregoing implementations of the fourth aspect, in an implementation, the preset condition includes: the value of the reconstructed tone signal spectrum is 0 or is less than a preset threshold.
A fifth aspect of the present invention provides an audio encoding device, including at least one processor, where the at least one processor is configured to be coupled to a memory, and to read and execute instructions in the memory to implement any method of the first aspect.
A sixth aspect of the present invention provides an audio decoding device, including at least one processor, where the at least one processor is configured to be coupled to a memory, and to read and execute instructions in the memory to implement any method of the second aspect.
According to a seventh aspect, an embodiment of this application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute the method of the first aspect or the second aspect.
According to an eighth aspect, an embodiment of this application provides a computer program product containing instructions that, when run on a computer, causes the computer to execute the method of the first aspect or the second aspect.
According to a ninth aspect, an embodiment of this application provides a communication apparatus, which may include an entity such as an audio encoding/decoding device or a chip. The communication apparatus includes a processor and, optionally, a memory; the memory is configured to store instructions; and the processor is configured to execute the instructions in the memory, so that the communication apparatus performs the method of any one of the first aspect or the second aspect.
According to a tenth aspect, this application provides a chip system, which includes a processor configured to support an audio encoding/decoding device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods. In a possible design, the chip system further includes a memory configured to store program instructions and data necessary for the audio encoding/decoding device. The chip system may consist of a chip, or may include a chip and other discrete components.
As can be seen from the above, in the embodiments of the present invention, the audio encoder encodes the tonal component information, so that the audio decoder can decode the audio signal according to the received tonal component information and can restore the tonal components in the audio signal more accurately, thereby improving the quality of the decoded audio signal.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of an audio encoding/decoding system according to an embodiment of this application;
FIG. 2 is a schematic flowchart of an audio encoding method according to an embodiment of this application;
FIG. 3 is a schematic flowchart of an audio decoding method according to an embodiment of this application;
FIG. 4 is a schematic diagram of a mobile terminal according to an embodiment of this application;
FIG. 5 is a schematic diagram of a network element according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of an audio encoding device according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of an audio decoding device according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of another audio encoding device according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of another audio decoding device according to an embodiment of this application.
Detailed Description
The embodiments of this application are described below with reference to the accompanying drawings.
The terms "first", "second", and the like in the specification, claims, and above drawings of this application are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that the terms used in this way are interchangeable where appropriate, and are merely a way of distinguishing objects with the same attributes when describing the embodiments of this application. In addition, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include other units that are not explicitly listed or that are inherent to such a process, method, product, or device.
The audio signal in the embodiments of this application refers to an input signal of an audio encoding device. The audio signal may include a plurality of frames; for example, the current frame may specifically refer to a certain frame in the audio signal. The embodiments of this application use the encoding/decoding of the current frame of the audio signal as an example; the frame preceding or following the current frame in the audio signal can be encoded/decoded correspondingly in the same manner, and the encoding/decoding of those frames is not described one by one. In addition, the audio signal in the embodiments of this application may be a mono audio signal or a stereo signal. The stereo signal may be an original stereo signal, a stereo signal composed of two signals (a left-channel signal and a right-channel signal) included in a multi-channel signal, or a stereo signal composed of two signals generated from at least three signals included in a multi-channel signal, which is not limited in the embodiments of this application.
FIG. 1 is a schematic structural diagram of an audio encoding/decoding system according to an exemplary embodiment of this application. The audio encoding/decoding system includes an encoding component 110 and a decoding component 120.
The encoding component 110 is configured to encode the current frame (an audio signal) in the frequency domain or the time domain. Optionally, the encoding component 110 may be implemented by software, by hardware, or by a combination of software and hardware, which is not limited in the embodiments of this application.
When the encoding component 110 encodes the current frame in the frequency domain or the time domain, in a possible implementation, the steps shown in FIG. 2 may be included.
Optionally, the encoding component 110 and the decoding component 120 may be connected in a wired or wireless manner. The decoding component 120 may obtain, through its connection with the encoding component 110, the coded bitstream generated by the encoding component 110; or the encoding component 110 may store the generated coded bitstream in a memory, and the decoding component 120 reads the coded bitstream from the memory.
Optionally, the decoding component 120 may be implemented by software, by hardware, or by a combination of software and hardware, which is not limited in the embodiments of this application.
When the decoding component 120 decodes the current frame (an audio signal) in the frequency domain or the time domain, in a possible implementation, the steps shown in FIG. 3 may be included.
Optionally, the encoding component 110 and the decoding component 120 may be disposed in the same device, or in different devices. The device may be a terminal with an audio signal processing function, such as a mobile phone, a tablet computer, a laptop or desktop computer, a Bluetooth speaker, a voice recorder, or a wearable device; it may also be a network element with audio signal processing capability in a core network or a radio network, which is not limited in this embodiment.
Schematically, as shown in FIG. 4, this embodiment is described using an example in which the encoding component 110 is disposed in a mobile terminal 130 and the decoding component 120 is disposed in a mobile terminal 140, where the mobile terminal 130 and the mobile terminal 140 are mutually independent electronic devices with audio signal processing capability, for example, mobile phones, wearable devices, virtual reality (VR) devices, or augmented reality (AR) devices, and the mobile terminal 130 and the mobile terminal 140 are connected through a wireless or wired network.
Optionally, the mobile terminal 130 may include an acquisition component 131, the encoding component 110, and a channel encoding component 132, where the acquisition component 131 is connected to the encoding component 110, and the encoding component 110 is connected to the channel encoding component 132.
Optionally, the mobile terminal 140 may include an audio playback component 141, the decoding component 120, and a channel decoding component 142, where the audio playback component 141 is connected to the decoding component 120, and the decoding component 120 is connected to the channel decoding component 142.
After acquiring an audio signal through the acquisition component 131, the mobile terminal 130 encodes the audio signal through the encoding component 110 to obtain a coded bitstream, and then encodes the coded bitstream through the channel encoding component 132 to obtain a transmission signal.
The mobile terminal 130 sends the transmission signal to the mobile terminal 140 through the wireless or wired network.
After receiving the transmission signal, the mobile terminal 140 decodes the transmission signal through the channel decoding component 142 to obtain the coded bitstream, decodes the coded bitstream through the decoding component 120 to obtain the audio signal, and plays the audio signal through the audio playback component. It can be understood that the mobile terminal 130 may also include the components included in the mobile terminal 140, and the mobile terminal 140 may also include the components included in the mobile terminal 130.
Schematically, as shown in FIG. 5, an example in which the encoding component 110 and the decoding component 120 are disposed in the same network element 150 with audio signal processing capability in a core network or radio network is used for description.
Optionally, the network element 150 includes a channel decoding component 151, the decoding component 120, the encoding component 110, and a channel encoding component 152, where the channel decoding component 151 is connected to the decoding component 120, the decoding component 120 is connected to the encoding component 110, and the encoding component 110 is connected to the channel encoding component 152.
After receiving a transmission signal sent by another device, the channel decoding component 151 decodes the transmission signal to obtain a first coded bitstream; the decoding component 120 decodes the coded bitstream to obtain an audio signal; the encoding component 110 encodes the audio signal to obtain a second coded bitstream; and the channel encoding component 152 encodes the second coded bitstream to obtain a transmission signal.
The other device may be a mobile terminal with audio signal processing capability, or another network element with audio signal processing capability, which is not limited in this embodiment.
Optionally, the encoding component 110 and the decoding component 120 in the network element may transcode a coded bitstream sent by a mobile terminal.
Optionally, in the embodiments of this application, a device in which the encoding component 110 is installed may be referred to as an audio encoding device. In actual implementation, the audio encoding device may also have an audio decoding function, which is not limited in the embodiments of this application.
Optionally, in the embodiments of this application, a device in which the decoding component 120 is installed may be referred to as an audio decoding device. In actual implementation, the audio decoding device may also have an audio encoding function, which is not limited in the embodiments of this application.
FIG. 2 depicts the flow of an audio encoding method provided by an embodiment of the present invention, including:
201. Acquire a current frame of an audio signal, where the current frame includes a high-band signal and a low-band signal.
The current frame may be any frame in the audio signal, and the current frame may include a high-band signal and a low-band signal. The division into the high-band signal and the low-band signal may be determined by a band threshold: a signal above the band threshold is the high-band signal, and a signal below the band threshold is the low-band signal. The band threshold may be determined according to the transmission bandwidth and the data processing capabilities of the encoding component 110 and the decoding component 120, which is not limited here.
The high-band signal and the low-band signal are relative: for example, a signal below a certain frequency is the low-band signal, and a signal above that frequency is the high-band signal (the signal at that frequency may be assigned to either the low-band signal or the high-band signal). The frequency varies with the bandwidth of the current frame. For example, when the current frame is a wideband signal of 0-8 kHz, the frequency may be 4 kHz; when the current frame is a super-wideband signal of 0-16 kHz, the frequency may be 8 kHz.
202. Obtain a first coding parameter according to the high-band signal and the low-band signal.
The first coding parameter may specifically include: a time domain noise shaping parameter, a frequency domain noise shaping parameter, a spectrum quantization parameter, a band extension parameter, and so on.
203. Obtain a second coding parameter of the current frame according to the high-band signal, where the second coding parameter includes tonal component information.
In an implementation, the tonal component information includes at least one of the following: quantity information of the tonal components, position information of the tonal components, amplitude information of the tonal components, or energy information of the tonal components, where only one of the amplitude information and the energy information may be included.
In an implementation, step 203 may be performed only when the high-band signal includes a tonal component. In this case, the obtaining the second coding parameter of the current frame according to the high-band signal may include: detecting whether the high-band signal includes a tonal component; and if the high-band signal includes a tonal component, obtaining the second coding parameter of the current frame according to the high-band signal.
In an implementation, the second coding parameter may further include a noise floor parameter; for example, the noise floor parameter may be used to indicate the noise floor energy.
204. Perform bitstream multiplexing on the first coding parameter and the second coding parameter to obtain a coded bitstream.
As can be seen from the above, in the embodiments of the present invention, the audio encoder encodes the tonal component information, so that the audio decoder can decode the audio signal according to the received tonal component information and can restore the tonal components in the audio signal more accurately, thereby improving the quality of the decoded audio signal.
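As a rough, non-normative sketch, steps 201 to 204 can be laid out as follows. The peak-picking criterion and the dictionary "bitstream" are stand-ins invented for this example; they are not the patent's actual tonality detector, parameter set, or multiplexer:

```python
def detect_tonal_components(high_band_spectrum, threshold=4.0):
    # Toy tonality detector: a bin counts as "tonal" if its magnitude exceeds
    # threshold * mean magnitude (an illustrative criterion, not the patent's).
    mean_mag = sum(abs(x) for x in high_band_spectrum) / len(high_band_spectrum)
    return [k for k, x in enumerate(high_band_spectrum) if abs(x) > threshold * mean_mag]

def encode_frame(high_band_spectrum, low_band_spectrum):
    # Step 202: the first coding parameter (noise shaping / quantization / band
    # extension parameters in the text) is reduced to a placeholder here.
    first_params = {"bwe": low_band_spectrum[:2]}
    # Step 203, gated on the presence of tonal components.
    positions = detect_tonal_components(high_band_spectrum)
    second_params = None
    if positions:
        second_params = {
            "num": len(positions),                                   # quantity information
            "pos": positions,                                        # position information
            "amp": [abs(high_band_spectrum[k]) for k in positions],  # amplitude information
        }
    # Step 204: "multiplexing" both parameter sets into one structure.
    return {"first": first_params, "second": second_params}
```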
FIG. 3 depicts the flow of an audio decoding method provided by another embodiment of the present invention, including:
301. Acquire a coded bitstream.
302. Demultiplex the coded bitstream to obtain a first coding parameter of a current frame of an audio signal and a second coding parameter of the current frame, where the second coding parameter of the current frame includes tonal component information.
For the first coding parameter and the second coding parameter, refer to the encoding method; details are not repeated here.
303. Obtain a first high-band signal of the current frame and a first low-band signal of the current frame according to the first coding parameter.
The first high-band signal includes at least one of: a decoded high-band signal obtained by direct decoding according to the first coding parameter, and an extended high-band signal obtained by performing band extension according to the first low-band signal.
304. Obtain a second high-band signal of the current frame according to the second coding parameter, where the second high-band signal includes a reconstructed tone signal.
When the first high-band signal includes the extended high-band signal, the obtaining the fused high-band signal of the current frame according to the second high-band signal of the current frame and the first high-band signal of the current frame may include: if the value of the reconstructed tone signal spectrum at the current frequency bin of the current subband of the current frame satisfies a preset condition, obtaining the fused high-band signal at the current frequency bin according to the spectrum of the extended high-band signal at the current frequency bin and the noise floor information of the current subband; or if the value of the reconstructed tone signal spectrum at the current frequency bin of the current subband of the current frame does not satisfy the preset condition, obtaining the fused high-band signal at the current frequency bin according to the reconstructed tone signal spectrum at the current frequency bin.
The noise floor information may include a noise floor gain parameter. In an implementation, the noise floor gain parameter of the current subband is obtained according to the width of the current subband, the energy of the spectrum of the extended high-band signal of the current subband, and the noise floor energy of the current subband.
If the first high-band signal includes the decoded high-band signal and the extended high-band signal, the obtaining the fused high-band signal of the current frame according to the second high-band signal of the current frame and the first high-band signal of the current frame may include: if the value of the reconstructed tone signal spectrum at the current frequency bin of the current subband of the current frame does not satisfy a preset condition, obtaining the fused high-band signal at the current frequency bin according to the reconstructed tone signal spectrum at the current frequency bin; or if the value of the reconstructed tone signal spectrum at the current frequency bin of the current subband of the current frame satisfies the preset condition, obtaining the fused high-band signal at the current frequency bin according to the spectrum of the extended high-band signal at the current frequency bin, the spectrum of the decoded high-band signal at the current frequency bin, and the noise floor information of the current subband.
The noise floor information includes a noise floor gain parameter. In an implementation, the noise floor gain parameter of the current subband is obtained according to the width of the current subband, the noise floor energy of the current subband, the energy of the spectrum of the extended high-band signal of the current subband, and the energy of the spectrum of the decoded high-band signal of the current subband.
In an embodiment of the present invention, the preset condition includes: the value of the reconstructed tone signal spectrum is 0. In another embodiment of the present invention, the preset condition includes: the value of the reconstructed tone signal spectrum is less than a preset threshold, where the preset threshold is a real number greater than 0.
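A tiny helper capturing the two variants of the preset condition described above; the way the two variants are folded into one function, and the use of the magnitude in the thresholded case, are assumptions for illustration:

```python
def meets_preset_condition(recon_value, threshold=0.0):
    # Variant 1 (threshold == 0): the reconstructed tone spectrum value is exactly 0.
    # Variant 2 (threshold > 0): its magnitude is below a positive preset threshold.
    if threshold == 0.0:
        return recon_value == 0
    return abs(recon_value) < threshold
```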
305. Obtain a fused high-band signal of the current frame according to the second high-band signal of the current frame and the first high-band signal of the current frame.
As can be seen from the above, in the embodiments of the present invention, the audio encoder encodes the tonal component information, so that the audio decoder can decode the audio signal according to the received tonal component information and can restore the tonal components in the audio signal more accurately, thereby improving the quality of the decoded audio signal.
In another embodiment, if the first high-band signal includes the decoded high-band signal and the extended high-band signal, the audio decoding method described in FIG. 3 may further include:
selecting, according to preset indication information or indication information obtained by decoding, at least one signal from the decoded high-band signal, the extended high-band signal, and the reconstructed tone signal to obtain the fused high-band signal of the current frame.
For example, in an embodiment of the present invention, in the sfb-th subband of the high-band signal of the current frame, the spectrum of the decoded high-band signal obtained by direct decoding according to the first coding parameter is denoted as enc_spec[sfb], the spectrum of the extended high-band signal obtained by performing band extension according to the first low-band signal is denoted as patch_spec[sfb], and the spectrum of the reconstructed tone signal is denoted as recon_spec[sfb]. The noise floor energy is denoted as E_noise_floor[sfb]; it can be obtained, for example, from the noise floor energy parameter E_noise_floor[tile] of the spectrum interval according to the correspondence between spectrum intervals and subbands, that is, the noise floor energy of each sfb in the tile-th spectrum interval is equal to E_noise_floor[tile].
For the sfb-th high-band subband, obtaining the fused high-band signal of the current frame according to the second high-band signal of the current frame and the first high-band signal of the current frame can be divided into the following cases:
Case 1:
If only patch_spec[sfb] exists in the sfb-th subband, the fused signal spectrum of the sfb-th subband is expressed as:
merge_spec[sfb][k] = patch_spec[sfb][k], k ∈ [sfb_offset[sfb], sfb_offset[sfb+1])
where merge_spec[sfb][k] represents the fused signal spectrum at the k-th frequency bin of the sfb-th subband, sfb_offset is the subband division table, and sfb_offset[sfb] and sfb_offset[sfb+1] are the starting points of the sfb-th and (sfb+1)-th subbands, respectively.
Case 2:
If only patch_spec[sfb] and enc_spec[sfb] exist in the sfb-th subband, the fused signal spectrum of the sfb-th subband is obtained by fusing the two:
If enc_spec[sfb][k] is zero at the k-th frequency bin of the sfb-th subband, then:
merge_spec[sfb][k] = patch_spec[sfb][k], if enc_spec[sfb][k] = 0
If enc_spec[sfb][k] is not zero at the k-th frequency bin of the sfb-th subband, then:
merge_spec[sfb][k] = enc_spec[sfb][k], if enc_spec[sfb][k] != 0
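Case 2 reduces to a per-bin selection, which can be sketched as follows (plain Python lists stand in for the subband spectra):

```python
def merge_case2(enc_spec, patch_spec):
    # Per frequency bin: keep the directly decoded spectrum where it is nonzero,
    # otherwise fall back to the band-extended spectrum.
    return [e if e != 0 else p for e, p in zip(enc_spec, patch_spec)]
```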
情况3:
若第sfb子带中仅存在patch_spec[sfb]和recon_spec[sfb],则第sfb子带的融合信 号频谱由以上两者融合得到:
若第sfb子带第k频点上,recon_spec[sfb][k]为零,则:
merge_spec[sfb][k]=g noise_floor[sfb]*patch_spec[sfb][k],if recon_spec[sfb][k]=0
其中g noise_floor[sfb]为第sfb子带的噪声基底增益参数,由第sfb子带噪声基底能量参数和patch_spec[sfb]的能量计算得到,即:
Figure PCTCN2021071328-appb-000001
其中,sfb_width[sfb]为第sfb个子带的宽度,表示为:
sfb_width[sfb]=sfb_offset[sfb+1]-sfb_offset[sfb]
其中，E_patch[sfb]为patch_spec[sfb]的能量，计算过程如下：
E_patch[sfb]=∑_k(patch_spec[sfb][k])^2
其中k取值范围是k∈[sfb_offset[sfb],sfb_offset[sfb+1])。
若第sfb子带第k频点上,recon_spec[sfb][k]不为零,则:
merge_spec[sfb][k]=recon_spec[sfb][k],if recon_spec[sfb][k]!=0
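情况3的融合（含噪声基底增益的计算）可以用如下Python草案示意（假设patch_spec、recon_spec为当前子带内的频谱列表，e_noise_floor为该子带的噪声基底能量，变量与函数名均为说明用的假设）：

```python
import math

def merge_case3(patch_spec, recon_spec, e_noise_floor):
    sfb_width = len(patch_spec)               # 子带宽度sfb_width[sfb]
    e_patch = sum(x * x for x in patch_spec)  # patch_spec的能量E_patch[sfb]
    # 噪声基底增益：将频带扩展频谱的能量调整到噪声基底能量水平
    g = math.sqrt(e_noise_floor * sfb_width / e_patch)
    # 逐频点：recon_spec非零处取重建音调频谱，为零处取增益调整后的扩展频谱
    return [r if r != 0 else g * p for p, r in zip(patch_spec, recon_spec)]

print(merge_case3([1.0, 1.0, 1.0, 1.0], [0.0, 5.0, 0.0, 0.0], 0.25))
# → [0.5, 5.0, 0.5, 0.5]
```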
情况4:
若第sfb子带中同时存在enc_spec[sfb]、patch_spec[sfb],以及recon_spec[sfb],则可以将以上三者融合得到融合信号。
融合方式可以分为两种,一种是融合以上三者频谱的方式,以recon_spec[sfb]为主要成分,其他两者能量调整到噪声基底能量水平;另一种是融合enc_spec[sfb]和patch_spec[sfb]的方式。
方式一:
将patch_spec[sfb]和enc_spec[sfb]所得高频信号频谱用噪声基底增益进行调整,并与recon_spec[sfb]结合,得到融合信号频谱。
具体方法如下:
若第sfb子带中第k频点上recon_spec[sfb][k]不为零，则：
merge_spec[sfb][k]=recon_spec[sfb][k],if recon_spec[sfb][k]!=0
若第sfb子带中第k频点上,recon_spec[sfb][k]为零,则:
merge_spec[sfb][k]=g_noise_floor[sfb]*(patch_spec[sfb][k]+enc_spec[sfb][k]),if recon_spec[sfb][k]=0
其中g_noise_floor[sfb]为第sfb子带的噪声基底增益参数，由第sfb子带噪声基底能量参数、patch_spec[sfb]的能量、enc_spec[sfb]的能量计算得到，即：
g_noise_floor[sfb]=sqrt(E_noise_floor[sfb]*sfb_width[sfb]/(E_patch[sfb]+E_enc[sfb]))
其中，E_patch[sfb]为patch_spec[sfb]的能量；
E_enc[sfb]为enc_spec[sfb]的能量，计算过程如下：
E_enc[sfb]=∑_k(enc_spec[sfb][k])^2
其中k取值范围是k∈[sfb_offset[sfb],sfb_offset[sfb+1])。
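方式一的融合可以用如下Python草案示意（在情况3的基础上将patch_spec与enc_spec之和用联合噪声基底增益调整，变量与函数名均为说明用的假设）：

```python
import math

def merge_case4_mode1(patch_spec, enc_spec, recon_spec, e_noise_floor):
    sfb_width = len(patch_spec)
    e_patch = sum(x * x for x in patch_spec)   # E_patch[sfb]
    e_enc = sum(x * x for x in enc_spec)       # E_enc[sfb]
    # 联合增益：将patch+enc之和的能量调整到噪声基底能量水平
    g = math.sqrt(e_noise_floor * sfb_width / (e_patch + e_enc))
    # recon_spec为主要成分；其为零的频点用增益调整后的patch+enc填充
    return [r if r != 0 else g * (p + e)
            for p, e, r in zip(patch_spec, enc_spec, recon_spec)]

out = merge_case4_mode1([1.0, 1.0], [1.0, 1.0], [0.0, 3.0], 1.0)
print(out)  # 约为 [1.4142, 3.0]
```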
方式二:
不再保留recon_spec[sfb],融合信号由patch_spec[sfb]和enc_spec[sfb]构成。
具体实施方式同情况2。
方式一和方式二的选择策略:
上述方式一和方式二两种高频频谱融合方法,可以通过预设方式选择其中一种,或者通过某种方式进行判断,例如在信号满足某种预设条件时选择方式一。本发明实施例对具体的选择方式不做限定。
图6描述了本发明一个实施例提供的音频编码器的结构,包括:
信号获取单元601,用于获取音频信号的当前帧,所述当前帧包括高频带信号和低频带信号。
参数获取单元602，用于根据所述高频带信号和所述低频带信号得到第一编码参数；根据所述高频带信号得到所述当前帧的第二编码参数，所述第二编码参数包括音调成分信息；
编码单元603,用于对所述第一编码参数和所述第二编码参数进行码流复用,以得到编码码流。
该音频编码器的具体实现可以参考上述的音频编码方法,此处不再赘述。
图7描述了本发明一个实施例提供的音频解码器的结构,包括:
接收单元701,用于获取编码码流;
解复用单元702,用于对所述编码码流进行码流解复用,以得到音频信号的当前帧的第一编码参数和所述当前帧的第二编码参数,所述当前帧的第二编码参数包括音调成分信息;
获取单元703，用于根据所述第一编码参数得到所述当前帧的第一高频带信号和所述当前帧的第一低频带信号；根据所述第二编码参数得到所述当前帧的第二高频带信号，所述第二高频带信号包括重建音调信号；
融合单元704,用于根据所述当前帧的第二高频带信号以及所述当前帧的第一高频带信号得到所述当前帧的融合高频带信号。
该音频解码器的具体实现可以参考上述的音频解码方法,此处不再赘述。
需要说明的是,上述装置各模块/单元之间的信息交互、执行过程等内容,由于与本申请方法实施例基于同一构思,其带来的技术效果与本申请方法实施例相同,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。
本发明实施例还提供了一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行上述的音频编码方法或音频解码方法。
本发明实施例还提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述的音频编码方法或音频解码方法。
本申请实施例还提供一种计算机存储介质,其中,该计算机存储介质存储有程序,该程序执行包括上述方法实施例中记载的部分或全部步骤。
接下来介绍本申请实施例提供的另一种音频编码设备,请参阅图8所示,音频编码设备1000包括:
接收器1001、发射器1002、处理器1003和存储器1004(其中音频编码设备1000中的处理器1003的数量可以是一个或多个，图8中以一个处理器为例)。在本申请的一些实施例中，接收器1001、发射器1002、处理器1003和存储器1004可通过总线或其它方式连接，其中，图8中以通过总线连接为例。
存储器1004可以包括只读存储器和随机存取存储器,并向处理器1003提供指令和数据。存储器1004的一部分还可以包括非易失性随机存取存储器(non-volatile random access memory,NVRAM)。存储器1004存储有操作系统和操作指令、可执行模块或者数据结构,或者它们的子集,或者它们的扩展集,其中,操作指令可包括各种操作指令,用于实现各种操作。操作系统可包括各种系统程序,用于实现各种基础业务以及处理基于硬件的任务。
处理器1003控制音频编码设备的操作,处理器1003还可以称为中央处理单元(central processing unit,CPU)。具体的应用中,音频编码设备的各个组件通过总线系统耦合在一起,其中总线系统除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都称为总线系统。
上述本申请实施例揭示的方法可以应用于处理器1003中，或者由处理器1003实现。处理器1003可以是一种集成电路芯片，具有信号的处理能力。在实现过程中，上述方法的各步骤可以通过处理器1003中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器1003可以是通用处理器、数字信号处理器(digital signal processing，DSP)、专用集成电路(application specific integrated circuit，ASIC)、现场可编程门阵列(field-programmable gate array，FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件，可以实现或者执行本申请实施例中公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器，或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成，或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器、闪存、只读存储器、可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1004，处理器1003读取存储器1004中的信息，结合其硬件完成上述方法的步骤。
接收器1001可用于接收输入的数字或字符信息,以及产生与音频编码设备的相关设置以及功能控制有关的信号输入,发射器1002可包括显示屏等显示设备,发射器1002可用于通过外接接口输出数字或字符信息。
本申请实施例中,处理器1003,用于执行前述的音频编码方法。
接下来介绍本申请实施例提供的另一种音频解码设备,请参阅图9所示,音频解码设备1100包括:
接收器1101、发射器1102、处理器1103和存储器1104(其中音频解码设备1100中的处理器1103的数量可以是一个或多个，图9中以一个处理器为例)。在本申请的一些实施例中，接收器1101、发射器1102、处理器1103和存储器1104可通过总线或其它方式连接，其中，图9中以通过总线连接为例。
存储器1104可以包括只读存储器和随机存取存储器,并向处理器1103提供指令和数据。存储器1104的一部分还可以包括NVRAM。存储器1104存储有操作系统和操作指令、可执行模块或者数据结构,或者它们的子集,或者它们的扩展集,其中,操作指令可包括各种操作指令,用于实现各种操作。操作系统可包括各种系统程序,用于实现各种基础业务以及处理基于硬件的任务。
处理器1103控制音频解码设备的操作,处理器1103还可以称为CPU。具体的应用中,音频解码设备的各个组件通过总线系统耦合在一起,其中总线系统除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都称为总线系统。
上述本申请实施例揭示的方法可以应用于处理器1103中，或者由处理器1103实现。处理器1103可以是一种集成电路芯片，具有信号的处理能力。在实现过程中，上述方法的各步骤可以通过处理器1103中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器1103可以是通用处理器、DSP、ASIC、FPGA或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件，可以实现或者执行本申请实施例中公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器，或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成，或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器、闪存、只读存储器、可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1104，处理器1103读取存储器1104中的信息，结合其硬件完成上述方法的步骤。
本申请实施例中,处理器1103,用于执行前述的音频解码方法。
在另一种可能的设计中,当音频编码设备或音频解码设备为终端内的芯片时,芯片包括:处理单元和通信单元,所述处理单元例如可以是处理器,所述通信单元例如可以是输入/输出接口、管脚或电路等。该处理单元可执行存储单元存储的计算机执行指令,以使该终端内的芯片执行上述第一方面任意一项的方法。可选地,所述存储单元为所述芯片内的存储单元,如寄存器、缓存等,所述存储单元还可以是所述终端内的位于所述芯片外部的存储单元,如只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)等。
其中,上述任一处提到的处理器,可以是一个通用中央处理器,微处理器,ASIC,或一个或多个用于控制上述第一方面方法的程序执行的集成电路。
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、ROM、RAM、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。

Claims (35)

  1. 一种音频编码方法,其特征在于,所述方法包括:
    获取音频信号的当前帧,所述当前帧包括高频带信号和低频带信号;
    根据所述高频带信号和所述低频带信号得到第一编码参数;
    根据所述高频带信号得到所述当前帧的第二编码参数,所述第二编码参数包括音调成分信息;
    对所述第一编码参数和所述第二编码参数进行码流复用,以得到编码码流。
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述高频带信号得到所述当前帧的第二编码参数,包括:
    检测所述高频带信号是否包括音调成分;
    若所述高频带信号包括音调成分,根据所述高频带信号得到所述当前帧的第二编码参数。
  3. 根据权利要求1或2所述的方法,其特征在于,所述音调成分信息包括如下至少一种:音调成分的数量信息、音调成分位置信息、音调成分的幅度信息、或音调成分的能量信息。
  4. 根据权利要求1至3任一所述的方法,其特征在于,所述第二编码参数还包括噪声基底参数。
  5. 根据权利要求4所述的方法,其特征在于,所述噪声基底参数包括噪声基底能量。
  6. 一种音频解码方法,其特征在于,所述方法包括:
    获取编码码流;
    对所述编码码流进行码流解复用,以得到音频信号的当前帧的第一编码参数和所述当前帧的第二编码参数,所述当前帧的第二编码参数包括音调成分信息;
    根据所述第一编码参数得到所述当前帧的第一高频带信号和所述当前帧的第一低频带信号;
    根据所述第二编码参数得到所述当前帧的第二高频带信号,所述第二高频带信号包括重建音调信号;
    根据所述当前帧的第二高频带信号以及所述当前帧的第一高频带信号得到所述当前帧的融合高频带信号。
  7. 根据权利要求6所述的方法,其特征在于,所述第一高频带信号包括:根据所述第一编码参数直接解码得到的解码高频带信号,以及根据所述第一低频带信号进行频带扩展得到的扩展高频带信号中的至少一种。
  8. 根据权利要求7所述的方法,其特征在于,若所述第一高频带信号包括所述扩展高频带信号,所述根据所述当前帧的第二高频带信号以及所述当前帧的第一高频带信号得到所述当前帧的融合高频带信号包括:
    若所述当前帧的当前子带的当前频点上的重建音调信号频谱的值满足预设条件,根据所述当前频点上的扩展高频带信号的频谱以及所述当前子带的噪声基底信息得到所述当前频点上的融合高频带信号;或
    若所述当前帧的当前子带的当前频点上的重建音调信号频谱的值不满足预设条件,根据所述当前频点上的重建音调信号频谱得到所述当前频点上的融合高频带信号。
  9. 根据权利要求8所述的方法,其特征在于,所述噪声基底信息包括噪声基底增益参数。
  10. 根据权利要求9所述的方法,其特征在于,所述当前子带的噪声基底增益参数根据所述当前子带的宽度,所述当前子带的扩展高频带信号的频谱的能量,以及所述当前子带的噪声基底能量获得。
  11. 根据权利要求7所述的方法,其特征在于,若所述第一高频带信号包括所述解码高频带信号以及所述扩展高频带信号,所述根据所述当前帧的第二高频带信号以及所述当前帧的第一高频带信号得到所述当前帧的融合高频带信号包括:
    若所述当前帧的当前子带的当前频点上的重建音调信号频谱的值不满足预设条件,根据所述当前频点上的重建音调信号频谱得到所述当前频点上的融合高频带信号;或
    若所述当前帧的当前子带的当前频点上的重建音调信号频谱的值满足预设条件,根据所述当前频点上的扩展高频带信号的频谱,所述当前频点上的解码高频带信号的频谱,以及所述当前子带的噪声基底信息得到所述当前频点上的融合高频带信号。
  12. 根据权利要求11所述的方法,其特征在于,所述噪声基底信息包括噪声基底增益参数。
  13. 根据权利要求12所述的方法,其特征在于,所述当前子带的噪声基底增益参数根据所述当前子带的宽度,所述当前子带的噪声基底能量,所述当前子带的扩展高频带信号的频谱的能量,以及所述当前子带的解码高频带信号的频谱的能量获得。
  14. 根据权利要求7所述的方法,其特征在于,若所述第一高频带信号包括所述解码高频带信号以及所述扩展高频带信号,所述方法还包括:
    根据预设指示信息或解码得到的指示信息,从所述解码高频带信号,扩展高频带信号以及所述重建音调信号中选择至少一个信号得到所述当前帧的融合高频带信号。
  15. 根据权利要求10或13所述的方法,其特征在于,所述第二编码参数包括用于指示所述噪声基底能量的噪声基底参数。
  16. 根据权利要求8或11所述的方法,其特征在于,所述预设条件包括:重建音调信号频谱的值为0或小于预设阈值。
  17. 一种音频编码器,其特征在于,包括:
    信号获取单元,用于获取音频信号的当前帧,所述当前帧包括高频带信号和低频带信号;
    参数获取单元,根据所述高频带信号和所述低频带信号得到第一编码参数;根据所述高频带信号得到所述当前帧的第二编码参数,所述第二编码参数包括音调成分信息;
    编码单元,用于对所述第一编码参数和所述第二编码参数进行码流复用,以得到编码码流。
  18. 根据权利要求17所述的音频编码器,其特征在于,参数获取单元具体还用于:
    检测所述高频带信号是否包括音调成分;
    若所述高频带信号包括音调成分,根据所述高频带信号得到所述当前帧的第二编码参数。
  19. 根据权利要求17或18所述的音频编码器,其特征在于,所述音调成分信息包括如下至少一种:音调成分的数量信息、音调成分位置信息、音调成分的幅度信息、或音调成分的能量信息。
  20. 根据权利要求17至19任一所述的音频编码器,其特征在于,所述第二编码参数还包括噪声基底参数。
  21. 根据权利要求20所述的音频编码器,其特征在于,所述噪声基底参数用于指示噪声基底能量。
  22. 一种音频解码器,其特征在于,包括:
    接收单元,用于获取编码码流;
    解复用单元,用于对所述编码码流进行码流解复用,以得到音频信号的当前帧的第一编码参数和所述当前帧的第二编码参数,所述当前帧的第二编码参数包括音调成分信息;
    获取单元,用于根据所述第一编码参数得到所述当前帧的第一高频带信号和所述当前帧的第一低频带信号;根据所述第二编码参数得到所述当前帧的第二高频带信号,所述第二高频带信号包括重建音调信号;
    融合单元,用于根据所述当前帧的第二高频带信号以及所述当前帧的第一高频带信号得到所述当前帧的融合高频带信号。
  23. 根据权利要求22所述的音频解码器,其特征在于,所述第一高频带信号包括:根据所述第一编码参数直接解码得到的解码高频带信号,以及根据所述第一低频带信号进行频带扩展得到的扩展高频带信号中的至少一种。
  24. 根据权利要求23所述的音频解码器,其特征在于,所述第一高频带信号包括所述扩展高频带信号,所述融合单元具体用于:
    若所述当前帧的当前子带的当前频点上的重建音调信号频谱的值满足预设条件,根据所述当前频点上的扩展高频带信号的频谱以及所述当前子带的噪声基底信息得到所述当前频点上的融合高频带信号;或
    若所述当前帧的当前子带的当前频点上的重建音调信号频谱的值不满足预设条件,根据所述当前频点上的重建音调信号频谱得到所述当前频点上的融合高频带信号。
  25. 根据权利要求24所述的音频解码器,其特征在于,所述噪声基底信息包括噪声基底增益参数。
  26. 根据权利要求25所述的音频解码器,其特征在于,所述当前子带的噪声基底增益参数根据所述当前子带的宽度,所述当前子带的扩展高频带信号的频谱的能量,以及所述当前子带的噪声基底能量获得。
  27. 根据权利要求23所述的音频解码器,其特征在于,若所述第一高频带信号包括所述解码高频带信号以及所述扩展高频带信号,所述融合单元具体用于:
    若所述当前帧的当前子带的当前频点上的重建音调信号频谱的值不满足预设条件,根据所述当前频点上的重建音调信号频谱得到所述当前频点上的融合高频带信号;或
    若所述当前帧的当前子带的当前频点上的重建音调信号频谱的值满足预设条件,根据所述当前频点上的扩展高频带信号的频谱,所述当前频点上的解码高频带信号的频谱,以及所述当前子带的噪声基底信息得到所述当前频点上的融合高频带信号。
  28. 根据权利要求27所述的音频解码器,其特征在于,所述噪声基底信息包括噪声基底增益参数。
  29. 根据权利要求28所述的音频解码器,其特征在于,所述当前子带的噪声基底增益参数根据所述当前子带的宽度,所述当前子带的噪声基底能量,所述当前子带的扩展高频带信号的频谱的能量,以及所述当前子带的解码高频带信号的频谱的能量获得。
  30. 根据权利要求23所述的音频解码器,其特征在于,若所述第一高频带信号包括所述解码高频带信号以及所述扩展高频带信号,所述融合单元还用于:
    根据预设指示信息或解码得到的指示信息,从所述解码高频带信号,扩展高频带信号以及所述重建音调信号中选择至少一个信号得到所述当前帧的融合高频带信号。
  31. 根据权利要求26或29所述的音频解码器,其特征在于,所述第二编码参数包括用于指示所述噪声基底能量的噪声基底参数。
  32. 根据权利要求24或27所述的音频解码器，其特征在于，所述预设条件包括：重建音调信号频谱的值为0或小于预设阈值。
  33. 一种音频编码设备,其特征在于,包括至少一个处理器,所述至少一个处理器用于与存储器耦合,读取并执行所述存储器中的指令,以实现如权利要求1至5中任一项所述的方法。
  34. 一种音频解码设备,其特征在于,包括至少一个处理器,所述至少一个处理器用于与存储器耦合,读取并执行所述存储器中的指令,以实现如权利要求6至16中任一项所述的方法。
  35. 一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行如权利要求1至16任意一项所述的方法。
PCT/CN2021/071328 2020-01-13 2021-01-12 一种音频编解码方法和音频编解码设备 WO2021143692A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP21741759.1A EP4084001A4 (en) 2020-01-13 2021-01-12 AUDIO CODING AND DECODING METHODS AND DEVICES
KR1020227026854A KR20220123108A (ko) 2020-01-13 2021-01-12 오디오 인코딩 및 디코딩 방법 및 오디오 인코딩 및 디코딩 장치
JP2022542749A JP7443534B2 (ja) 2020-01-13 2021-01-12 オーディオ符号化および復号方法ならびにオーディオ符号化および復号デバイス
US17/864,116 US20220358941A1 (en) 2020-01-13 2022-07-13 Audio encoding and decoding method and audio encoding and decoding device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010033326.X 2020-01-13
CN202010033326.XA CN113192523A (zh) 2020-01-13 2020-01-13 一种音频编解码方法和音频编解码设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/864,116 Continuation US20220358941A1 (en) 2020-01-13 2022-07-13 Audio encoding and decoding method and audio encoding and decoding device

Publications (1)

Publication Number Publication Date
WO2021143692A1 true WO2021143692A1 (zh) 2021-07-22

Family

ID=76863590

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/071328 WO2021143692A1 (zh) 2020-01-13 2021-01-12 一种音频编解码方法和音频编解码设备

Country Status (6)

Country Link
US (1) US20220358941A1 (zh)
EP (1) EP4084001A4 (zh)
JP (1) JP7443534B2 (zh)
KR (1) KR20220123108A (zh)
CN (1) CN113192523A (zh)
WO (1) WO2021143692A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113808596A (zh) * 2020-05-30 2021-12-17 华为技术有限公司 一种音频编码方法和音频编码装置
CN113808597A (zh) * 2020-05-30 2021-12-17 华为技术有限公司 一种音频编码方法和音频编码装置
CN114127844A (zh) * 2021-10-21 2022-03-01 北京小米移动软件有限公司 一种信号编解码方法、装置、编码设备、解码设备及存储介质

Citations (5)

Publication number Priority date Publication date Assignee Title
CN1831940A (zh) * 2006-04-07 2006-09-13 安凯(广州)软件技术有限公司 基于音频解码器的音调和节奏快速调节方法
CN102194458A (zh) * 2010-03-02 2011-09-21 中兴通讯股份有限公司 频带复制方法、装置及音频解码方法、系统
CN104584124A (zh) * 2013-01-22 2015-04-29 松下电器产业株式会社 带宽扩展参数生成装置、编码装置、解码装置、带宽扩展参数生成方法、编码方法、以及解码方法
US20180182403A1 (en) * 2016-12-27 2018-06-28 Fujitsu Limited Audio coding device and audio coding method
US20190035413A1 (en) * 2017-07-28 2019-01-31 Fujitsu Limited Audio encoding apparatus and audio encoding method

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
DE602004010188T2 (de) * 2004-03-12 2008-09-11 Nokia Corp. Synthese eines mono-audiosignals aus einem mehrkanal-audiosignal
WO2007052088A1 (en) * 2005-11-04 2007-05-10 Nokia Corporation Audio compression
JP2008058727A (ja) * 2006-08-31 2008-03-13 Toshiba Corp 音声符号化装置
KR101355376B1 (ko) * 2007-04-30 2014-01-23 삼성전자주식회사 고주파수 영역 부호화 및 복호화 방법 및 장치
JP4932917B2 (ja) * 2009-04-03 2012-05-16 株式会社エヌ・ティ・ティ・ドコモ 音声復号装置、音声復号方法、及び音声復号プログラム
RU2689181C2 (ru) * 2014-03-31 2019-05-24 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Кодер, декодер, способ кодирования, способ декодирования и программа
MX2018012490A (es) * 2016-04-12 2019-02-21 Fraunhofer Ges Forschung Codificador de audio para codificar una se?al de audio, metodo para codificar una se?al de audio y programa de computadora en consideracion de una region espectral del pico detectada en una banda de frecuencia superior.
DK4120261T3 (da) * 2018-01-26 2024-01-22 Dolby Int Ab Bagudkompatibel integration af højfrekvensrekonstruktionsteknikker til lydsignaler
KR102474146B1 (ko) * 2018-04-25 2022-12-06 돌비 인터네셔널 에이비 후처리 지연을 저감시킨 고주파 재구성 기술의 통합


Also Published As

Publication number Publication date
EP4084001A4 (en) 2023-03-08
US20220358941A1 (en) 2022-11-10
CN113192523A (zh) 2021-07-30
EP4084001A1 (en) 2022-11-02
JP7443534B2 (ja) 2024-03-05
KR20220123108A (ko) 2022-09-05
JP2023510556A (ja) 2023-03-14


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21741759

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022542749

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20227026854

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2021741759

Country of ref document: EP

Effective date: 20220725

NENP Non-entry into the national phase

Ref country code: DE